Good afternoon, friends. I am back with a new subject this time: business statistics. This course is very important; it is taught in several engineering institutions and in almost all management institutions. The course is especially meant for BBA and MBA students. Many students pursuing a B.Com may also need the topics I am going to teach in this course. The course is also particularly important for part-time professionals who work in different industries and cannot come to educational institutions, and I will be covering many techniques which these professionals can use
in their daily lives. Many of you already know something about statistics. This course is not exactly on statistics; it is more about the application of statistics to different business scenarios. So, what is statistics? At the end of the day, we have certain numbers, and we transform those numbers into useful information, and that information should be actionable. A lot of information is available, but much of it is not actionable, so we need to collect data, process that data and come up with information that we can act on. Statistics, then, is a set of methods for processing and analysing numbers. For example, if you have the sales data of a company for the last one year, how will you forecast sales for the next month or the next year? Statistics also provides methods that help reduce the uncertainty inherent in decision making. What makes a good manager, and how is a good manager different from a bad one? The most important point is how managers make decisions: whether a manager takes the right decision at the right time on the basis of the data he or she has, how that manager extracts information after analysing the data set, and how he or she acts on it. If you look at the different types of statistics,
basically you have 2 broad categories: descriptive statistics and inferential statistics. In descriptive statistics, you are describing something. Let us say you are describing a group of students: you say, "I have 30 students in my class." So you are describing that particular class. Or you say this product has a height of 5 cm, or 5 feet, or whatever it is. So you are describing certain characteristics of a product, an individual, a group or an organisation. In descriptive statistics we generally collect data, summarise it and process it; that is the core of descriptive statistics. But if you draw conclusions about a larger group from the data you have summarised, that is called inferential statistics. In inferential statistics, you take information from a sample of data and use it to learn about the characteristics of the population. There are 2 main types of inferential statistics, estimation and hypothesis testing, and we will deal with both of these topics in inferential statistics.
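As an illustration of the descriptive side, the short Python sketch below summarises a small set of marks using the standard `statistics` module. The marks are hypothetical values chosen only for illustration, and the helper name `describe` is just for this sketch. The point is that descriptive statistics summarises the data you actually have, without making claims about anyone outside it.

```python
import statistics

# Hypothetical marks of the students in one small class (illustrative values only).
marks = [62, 75, 75, 81, 90, 58, 70]

def describe(values):
    """Return a few descriptive summaries of a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),      # central tendency
        "median": statistics.median(values),  # middle value after sorting
        "mode": statistics.mode(values),      # most frequent value
        "min": min(values),
        "max": max(values),
        # Dispersion: statistics.stdev divides by n - 1 (sample standard deviation);
        # use statistics.pstdev if the class is treated as the whole population.
        "stdev": statistics.stdev(values),
    }

summary = describe(marks)
for name, value in summary.items():
    print(f"{name}: {value}")
```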
Now, why study business statistics? As I said, a manager's role is to make decisions. And it is not only managers: top bosses always make decisions, be it the director of an institution, the CEO of a company, a government officer or an army officer. These people keep making decisions, and those decisions should be based on facts and data. Once you have processed the data, you come up with actionable information; that is why decision makers use data and, after processing it, make decisions. Statistics also helps you draw conclusions about large groups of individuals or items using information collected from subsets of those individuals or items. As I said, in inferential statistics you collect data from a sample and try to learn about the characteristics of the population. Let us say you want to know the height of the students in your class. You are not going to measure the height of each and every student. What you generally do is take a sample, measure the heights of the selected students and then infer that the average height of the entire class is, let us say, 5 feet 4 inches or whatever it comes to. That is inferential statistics.
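The height example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the class heights are randomly generated stand-ins (not real measurements), the sample is a simple random sample drawn with `random.sample`, and the sample mean is used as a point estimate of the class mean. Estimation is treated more carefully later in the course.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical heights (in inches) for a class of 60 students.
# In practice these are unknown; we simulate them only to have something to sample from.
class_heights = [round(random.gauss(64, 3), 1) for _ in range(60)]

# Measure only a simple random sample of 10 students instead of everyone.
sample = random.sample(class_heights, k=10)

# Inference: use the sample mean as an estimate of the whole class's mean height.
estimate = statistics.mean(sample)
actual = statistics.mean(class_heights)  # known here only because the data are simulated

print(f"Sample mean (estimate): {estimate:.1f} in")
print(f"Class mean (normally unknown): {actual:.1f} in")
```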
Statistics also helps you make reliable forecasts about business activities, and this is the most important aspect as far as business statistics is concerned; we want to forecast so many things. For example, what would you, as a company, like to forecast? Certainly you would like to forecast the demand for your product next year, next month or next quarter. Besides sales, you need to forecast how many people you will require in your organisation next year, so you can forecast your manpower. You can forecast how much raw material will be needed next year, and you may forecast how much money will be required next year. So forecasting is one of the areas where you can apply statistics: on the basis of past data, you process the data and come up with a number for next year's forecast.
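One of the simplest ways to turn past data into a next-period number is a moving-average forecast. The sketch below only illustrates that idea; it is not necessarily the forecasting method this course will teach later. The monthly sales figures are hypothetical, the function name is made up for this sketch, and the 3-month window is an arbitrary choice.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if window <= 0 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]
    return sum(recent) / window

# Hypothetical monthly sales (units) for the last 12 months.
monthly_sales = [120, 132, 128, 140, 151, 149, 160, 158, 165, 170, 168, 175]

next_month = moving_average_forecast(monthly_sales, window=3)
print(f"Forecast for next month: {next_month:.1f} units")
```

A longer window smooths out more of the month-to-month noise but reacts more slowly to a genuine trend; with steadily rising data like these, the moving average sits below the latest value, so the forecast lags the trend.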
Finally, at the end of the day, statistics improves business processes. There are different business processes: purchasing is one of them, and you also have marketing, human resource management and after-sales service. Once you have data, you can make better decisions in all these areas. So statistics can be used in several places: in business memos and in business research, since, as I said, you can do a lot of research once you have data, and data can come from several sources. You can also use it in technical reports, technical journals, newspaper articles and magazine articles; all of these are different applications of business statistics. As far as this particular course is concerned,
I am going to talk about the following topics. First, introduction, data collection and presentation of data. The very first step in any survey is data collection: how you collect data (there are different methods) and how you present data (again, there are hundreds of methods of data presentation). Then, measures of location and dispersion: I will teach you, in brief, the different measures of central tendency and how to calculate them, the different measures of dispersion, and so on. Next come numerical descriptive measures, basic probability, and discrete and continuous probability distributions; I will teach you the different probability distributions, such as the binomial, Poisson and normal distributions. Sampling and sampling distributions are another very important topic in this course: how you choose samples, what the sample size should be, how to collect data from samples, and which sampling method to use, probability sampling or non-probability sampling. Then I will discuss inferential statistics; as I have already mentioned, there are 2 types of statistics, descriptive and inferential. Confidence interval estimation and one-sample hypothesis testing are part of inferential statistics, that is, estimation and hypothesis testing, and there are different types of hypothesis tests: one-sample, two-sample, and tests involving more than two samples. Then we will cover the chi-squared test, simple linear regression, multiple regression and, finally, forecasting analysis. These are the topics which I will be covering in this particular course. Let us come to the first topic of this particular
course: introduction and data collection. Before you go for data collection and analysis, you must know some basics about data. So, what is data? Several people have defined data in their own ways, and I will share certain definitions with you. Data are a collection of any number of related observations. What could these observations be? I can say that 3 people are absent today in my organisation; that is data. The number of telephone booths installed by the personnel of the telephone department is data. The number of buses going from, let us say, Bombay to Delhi is data. All these observations are nothing but data. A collection of data becomes a data set. Let us say today's temperature is 41 degrees centigrade; this is just one data point. If I write down the temperatures of the last one week, that becomes a data set. So a collection of data is a data set, and a data point, as I said, is a single observation: the temperature of Roorkee today, let us say 41 degrees centigrade, is just a single observation. Raw data is information before it is arranged and analysed. Let us say yesterday's temperature was 41 degrees and today's is also 41 degrees. That is a data set with 2 data points, but it is just raw data, because we have not analysed it. If I take the mean of these 2 data points, it becomes analysed data.
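In code, the distinction is just a single value versus a collection of values, and "analysing" the raw data set can be as simple as taking its mean. The two temperatures below are the 41-degree readings from the example; the variable names are illustrative only.

```python
import statistics

# A single data point: today's temperature in Roorkee (degrees centigrade).
today = 41

# A data set: a collection of data points (yesterday and today). Still raw data.
raw_temperatures = [41, today]

# Analysed data: a summary computed from the raw data set.
mean_temperature = statistics.mean(raw_temperatures)
print(f"{len(raw_temperatures)} data points, mean = {mean_temperature} degrees")
```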
One of the ways of defining data is information plus noise. Every data set has some information and some noise, so as a manager or decision maker, your job is to extract the information from the available data set. Extracting the information and removing the noise are one and the same thing, and let me tell you, it is very difficult. So data is information plus noise, and you should remove the noise from the data, because data generally speak, and you should hear whatever the data say. A good manager always watches the data, listens to the data and then takes the appropriate decision.

Let me give you an example of raw data. Here you have 5 data points, each pairing a high school CGPA with a college CGPA. Let us say the first high school CGPA is 3.6 and the corresponding college CGPA is 2.5. Similarly, here the CGPA is 4 and here it is 3.8. Can you draw any inference from these 2 sets of data, the high school data and the college data? Not easily, because there is no obvious pattern. Going from one pair to the next, the values do not move together consistently: in some pairs both CGPAs rise, while in others one rises (say, from 2.6 to 2.7) and the other falls. So you do not know how these two data sets are correlated. There may well be some correlation between them, but this is just raw data, and you cannot see it without analysing the data.
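Computing a correlation coefficient is one way to go beyond eyeballing the raw pairs. The sketch below implements Pearson's correlation coefficient directly from its definition. The five CGPA pairs are hypothetical, made up for illustration, and are not the values from the table discussed above; the function name `pearson_r` is likewise just for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the spread of each variable.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return cov / (sx * sy)

# Hypothetical (high school CGPA, college CGPA) pairs, for illustration only.
high_school = [3.6, 4.0, 2.6, 3.1, 3.4]
college = [2.5, 3.8, 2.7, 2.9, 3.0]

r = pearson_r(high_school, college)
print(f"r = {r:.2f}")  # +1 = perfect positive, 0 = no linear relation, -1 = perfect negative
```

Correlation measures only the strength of a linear association between the two columns; it does not by itself explain why they move together.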
Now, there are different sources of data; you can classify them as primary sources and secondary sources. Whether you have collected the data from a primary source or a secondary source, you should evaluate it: you should test whether that particular data set is appropriate or not, and there are different criteria on which you should evaluate data. The first one is specifications and methodology. Let us say you have taken secondary data. Secondary data are data which someone else has collected for some other purpose and which you are just referring to for your own use; they may or may not be useful for you. So if you are looking at secondary data, you should look at the specifications and methodology of that particular data. What was the method of data collection, and what was the response rate? Let us say somebody else collected data from 100 people and the response rate was 10%; is that data set useful for your study? You have to check that. So what was the response rate, and how was the analysis done on the secondary data?
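The response-rate check is simple arithmetic. A minimal sketch, using the numbers from the example above (100 people approached, 10% responding, i.e. 10 responses). The function name and the 30% cut-off are hypothetical, chosen only to show the kind of rule a researcher might apply; the cut-off is not a standard threshold.

```python
def response_rate(responses, contacted):
    """Fraction of contacted people who actually responded."""
    if contacted <= 0:
        raise ValueError("contacted must be positive")
    return responses / contacted

# From the example: 100 people contacted, 10 responded.
rate = response_rate(10, 100)
print(f"Response rate: {rate:.0%}")

# Hypothetical rule of thumb for this illustration only: flag low rates
# so the data set gets a closer look before it is reused.
MIN_ACCEPTABLE = 0.30
if rate < MIN_ACCEPTABLE:
    print("Low response rate: check whether respondents are representative before reusing this data.")
```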
Sampling: what sampling method was used, what was the sample size, how many questions were there, and was the questionnaire pilot tested or not? Who did the fieldwork, who controlled the fieldworkers, and so on. Specifications and methodology are a very important point as far as the evaluation of data is concerned, because at the end of the day you want reliable data. Even if the data come from a primary source rather than a secondary one, you still need to look at the reliability of the data, the validity of the data and their generalizability; these are the 3 important points: reliability, validity and generalizability. The second criterion is error and accuracy in secondary data. You should look at the errors that might be present in secondary data: what research design was used for collecting the secondary data, whether it was an exploratory or a conclusive research design, and how the data were collected, whether in person, over the phone, over the internet, through mall intercepts, at home or at the office. Look at all those aspects of data collection when you are evaluating secondary sources of data. Assess accuracy by comparing data from
different sources. Suppose I ask you for India's GDP in a particular year; I should get this information from several sources rather than looking at just one. I can look at sources such as the Centre for Monitoring Indian Economy database, the RBI's website and the Ministry of Finance website; I can simply Google it; I could even ask, let us say, the Finance Minister. So there are several sources, and I should collect the data from different sources and then try to arrive at a particular number.
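Cross-checking sources can be made concrete: collect the same figure from each source, look at how far apart the values are, and investigate before relying on any single one. In the sketch below, the source labels and the numbers are placeholders invented for illustration (they are not real GDP figures), the helper name `compare_sources` is hypothetical, and the 2% tolerance is an arbitrary assumption.

```python
import statistics

# Hypothetical values of the same statistic reported by different sources
# (placeholder numbers, not real GDP data).
reported = {
    "source_a": 100.0,
    "source_b": 101.5,
    "source_c": 99.2,
    "source_d": 112.0,
}

def compare_sources(values, tolerance=0.02):
    """Return the median and the sources deviating from it by more than `tolerance` (relative)."""
    mid = statistics.median(values.values())
    outliers = {
        name: v for name, v in values.items()
        if abs(v - mid) / mid > tolerance
    }
    return mid, outliers

consensus, suspicious = compare_sources(reported)
print(f"Median across sources: {consensus}")
print(f"Sources to double-check: {sorted(suspicious)}")
```

The median is used as the reference point because a single badly wrong source shifts it far less than it would shift the mean.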
The next criterion is currency: how up to date are the data? What is the time lag between data collection and publication, and how frequently are the secondary sources updated? For example, census data in India are updated every 10 years. So check how often the firm supplying the secondary data updates it. Then look at the objective: what was the objective of collecting the secondary data? You may or may not use secondary data directly, but you may get some information from them for your own research, so the objective will tell you whether the data you are taking have any relevance to your research. Then look at the key variables (we will see what a variable means), the units of measurement, the categories used, the relationships examined, and so on. If necessary, reconfigure the data to increase their usefulness; that is the objective here. Finally, look at dependability. When you are looking at a secondary source, you should look at the credibility of that source, the reputation of the organisation from which you are taking the secondary data, and how trustworthy that source is. As far as possible, data should be obtained from an original source. And, as I said, data can be of different types: numeric or non-numeric, structured or unstructured, image data, text data, video data and so on. So you should look at all these issues in secondary
sources. Now, there is something called double counting. Whenever you collect and analyse data, you should take care of double counting. Let us say the truck association of a country or a state claims that 75% of the items you receive at home come by truck. Does it mean that the remaining 25% came by rail, aeroplane or ship? That would be a wrong interpretation. The items you receive at home may have reached your city by ship, rail or aeroplane and then been distributed within the city by truck. Such items are counted as truck deliveries, but they could also be claimed by the rail, shipping or airline industries, so shares computed this way can add up to more than 100%. This is a case of double counting, so avoid such inferences from data.
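A tiny simulation shows why the percentages stop adding up to 100% when one item can use several modes of transport. The four shipments below are hypothetical and were chosen so that the truck share comes out at the 75% from the example.

```python
from collections import Counter

# Hypothetical shipments: each lists every transport mode used on the way to your home.
shipments = [
    ["ship", "truck"],   # arrived by ship, delivered within the city by truck
    ["rail", "truck"],   # arrived by rail, delivered by truck
    ["truck"],           # truck all the way
    ["air"],             # flown in and hand-delivered (hypothetical)
]

# Count each mode at most once per shipment, then express it as a share of all shipments.
mode_counts = Counter(mode for s in shipments for mode in set(s))
shares = {mode: count / len(shipments) for mode, count in mode_counts.items()}

for mode, share in sorted(shares.items()):
    print(f"{mode}: {share:.0%}")
print(f"Sum of shares: {sum(shares.values()):.0%}")  # exceeds 100% because of double counting
```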
Now let us look at a couple of other points related to data. I will define elements, variables and observations. Elements are the entities on which data are collected. Let us say I want to collect data on the height of students; those students are nothing but elements. Entities can be a group of people, a group of organisations, an individual person, a product or a group of products; all of these are elements. A variable is a characteristic of interest for the elements. Let us take the entity to be a group of students and say I am interested in their height; height is then a variable. The height, weight or marks of a student are all variables, so a variable is a characteristic of the elements that you are interested in. The set of measurements collected for a particular element is called an observation. If I measure the heights of a group of students, say 5.5 feet, 6.1 feet and so on, those are observations. The total number of data values in a data set is simply the number of elements multiplied by the number of variables.
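Here is a small sketch of that counting rule, using the student example: each student is an element, height, weight and marks are the variables, and each student's set of measurements is one observation. The student labels and values are hypothetical.

```python
# Hypothetical data set: one observation (a dict of measurements) per element (student).
data_set = {
    "student_1": {"height_ft": 5.5, "weight_kg": 60, "marks": 78},
    "student_2": {"height_ft": 6.1, "weight_kg": 72, "marks": 65},
    "student_3": {"height_ft": 5.8, "weight_kg": 68, "marks": 88},
}

elements = list(data_set)                          # the entities data are collected on
variables = list(next(iter(data_set.values())))    # characteristics of interest
total_values = len(elements) * len(variables)      # elements x variables

# Cross-check by counting every individual data value directly.
assert total_values == sum(len(obs) for obs in data_set.values())

print(f"{len(elements)} elements x {len(variables)} variables = {total_values} data values")
```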
So let us take an example of a data set with elements, variables and observations. Here there are different companies, let us say Dataram, LandCare and so on; all of these are different companies, and we will call them elements, just as a group of students or a group of products are elements. The variables are the characteristics I am interested in, for example earnings per share: for this particular company, earnings per share is 0.86, so that is one variable, and annual sales is another variable. I have also noted the stock exchange on which each company is listed, for example AMEX or the New York Stock Exchange. Each company's set of values is an observation, the whole collection is the data set, and each single observation is a data point. Let us look once again at what we have done
in today's session. We have talked about the course, for whom it has been designed and what its contents are. We have seen the different types of statistics, descriptive statistics and inferential statistics, and we have also seen different definitions of data, data points, data sets, observations, elements, entities and so on. With this, let me complete today's session. In the next session, we will carry forward this particular topic, introduction and data collection. Thank you very much.