QT-Unit 1


Quantitative Methods

Statistics - An overview
• “Statistical thinking will one day be as necessary as the ability to read and write!” By H.G. Wells
• Business environment is competitive where organizations are data-rich but information-poor
• Hence it is significant for decision makers to develop the ability to extract meaningful information from raw
data to make better decisions
• Data is a collection of observations of one or more variables of interest
• Knowledge of statistics helps decision-makers to develop ability to:
- Present & describe information/data so as to improve decisions
- Draw conclusions about a large population based upon information obtained from samples
- Seek relationship between variables
- Obtain reliable forecasts of statistical variables of interest
Growth & development of Statistics
• The word Statistics has different meanings to different people depending on its use
• For a cricket fan, it refers to data relating to runs scored by a cricketer
• For an environmentalist, it refers to information on the quantity of pollutants released into atmosphere by all
types of vehicles in different cities
• For census department, it refers to information on birth rate & gender ratio in different states
• For a share broker, it refers to information on changes in share prices over a period of time
• Sources of such numerical data are both electronic and print media
• After statistical analysis information extracted is presented visually in form of graphs, charts, diagrams &
pictograms.
• Based on conclusions certain decisions are arrived at to problems pertaining to social, political, economic,
and cultural activities
Statistical thinking
• Statistical thinking can be defined as the thought process that focuses on ways to identify, control, and
reduce variations present in all phenomena
• Statistical thinking allows decision maker to recognise & make interpretations of the variations in a process
• Management philosophy acts as a guide for laying a solid foundation for total quality improvement in a given
process
• The use of behavioural tools such as brainstorming, team-building & group decision-making, as well as
statistical methods such as tables, charts & descriptive statistics, is equally necessary for understanding &
improving the processes.
Steps in Process improvement
• Step 1: Specify the aim of the study
• Step 2: Understand how the process works
• Step 3: Assess the current process performance
• Step 4: Identify strategies for improvement
• Step 5: Test the effectiveness of proposed strategy
• Step 6: If successful, implement the strategy
• Step 7: If not successful, return to Step 4 & repeat
Defining Statistics
• Statistics is the art and science of collecting, analysing, presenting & interpreting data
• Quantitative data: Numerical data measured on an interval or ratio scales to describe how much or how
many

Characteristics of Statistics:

1. Statistical data are aggregates of facts:


A single, isolated figure is not statistics.
We cannot compare a single figure given alone.
• E.g. "Raghav scored 45 marks out of 100 in Physics" is not statistical data. However, if we say that Ruchi, Keya &
Maria scored 60, 743 & 59 marks respectively, the group of figures becomes statistics.

2. Statistical data are numerically expressed:


Qualitative statements such as ‘per capita income of India is low’ or ‘the population of India is rapidly rising’ are
not statistics. Rather they are conclusions based on quantitative information. Statistics are in numbers and
quantities. E.g. Population of India has increased from 36.1 crores in 1951 to 84.6 crores in 1991 is an example of
statistics.
3. Data must be collected in a systematic manner:
Suitable planning of data collection is to be made in advance. Data collected in haphazard manner may lead to wrong
conclusions.

4. Figures must be accurate to a reasonable standard:


Data can be enumerated or counted with a reasonable level of accuracy if the area of study is small. E.g. data
regarding age and height of students can be enumerated more precisely in a small class. As the area of study becomes
wider, chances of mistakes in collection of data increase. E.g. when the study regarding age and height of students is
extended to all schools in a state, chances of making mistakes in recording of data increase.

5. Statistics are collected for a predetermined purpose:


We must have a well-defined purpose, with specific aims & objectives, before we collect data. Suppose we want to
compare the performance of students of, say, secondary level of National Open School in one subject or more; we must
specify the subjects and the year for which the comparison is being carried out.
Types of Statistical methods
Statistical methods are classified into following categories:
1. Descriptive statistics:
Includes statistical methods used to summarize and describe the characteristics of a set of data. They are further divided into
two categories:
a) Graphic methods such as bar charts, line graphs, pie charts, etc
b) Numeric methods such as central tendency, dispersion, skewness etc.

2. Inferential statistics:
Consists of procedures used to make inferences about population characteristics on basis of sample results. They are further
divided into two categories:
a) Parametric: The use of parametric methods is based on the assumption that the population, from which the sample is
drawn, is normally distributed. Parametric methods can be used only when data are collected on an interval or ratio
scale.
b) Non-parametric: The basic idea behind non-parametric methods is that no assumption needs to be made about the
parameters of the population being studied. The methods do not depend on the form of the population distribution:
there is no fixed set of parameters, and no particular distribution (normal distribution, etc.) is assumed.
Importance & scope of Statistics
‘A knowledge of statistics is like a knowledge of foreign language or of algebra, it may prove of use at any time
under any circumstances.’
Statistics and the Government:
-Government is required to collect huge amount of statistics for various purposes such as data relating to
prices, income & expenditure, investments, etc.
-Also data is collected on population dynamics in order to initiate and implement various welfare policies and
programmes

Statistics in Economics:
-Statistical methods are extensively useful in economic analysis.
-Studying pattern of prices, production, money in circulation, bank deposits, etc.
-Demand analysis to study the relationship between price and demand of commodities
-Predicting inflation rate, unemployment rate, etc.
Statistics in Business management:
-Statistical reports provide a summary of business activities which enable decision-makers to take effective decisions
with respect to future activities.
Marketing:
-Before launch of a product, the market research team uses statistics to analyse data on purchasing power, habits of
consumers, competitors, pricing, etc.
-Purpose of such a study is to understand the possible market potential for the product.
Production:
-Statistical methods are used to conduct R&D activities to bring improvement in the quality of the existing products
and setting quality control for new ones.
-Statistical data analysis is helpful in make-or-buy decisions: what quantity to make in-house or buy from outside, and when.
Finance:
-Statistical methods help to predict the probable dividend in years to come.
-They are also useful in analysing data on income and expenditure, assets & liabilities, break-even analysis, etc. to
ascertain financial results of various operations.
Personnel:
-Statistical methods are used for manpower planning, wage & salary administration, incentive planning, attrition rate,
etc
-Study of employee-employer relationship requires analysis of various factors such as grievances handling, training &
development, etc.
Limitations of Statistics
1. Statistics does not study qualitative attributes:

2. Statistics does not study Individuals

3. Statistics can be misused


Types of Data
Data type                Information type                              Measurement type

Categorical              Do you practice Yoga?                         Yes/No
Numerical (discrete)     How many books do you have in your library?   Number
Numerical (continuous)   What is your height?                          Cm/Inches


Organising data using data array
• For an easy & systematic research based on large quantity of numerical data, the data set must be first organised &
presented in an appropriate tabular and graphical format.
• The table presents the total number of overtime hours worked by workers for 30 consecutive weeks in a factory. The data
displayed in table are in raw form as numerical observations are not arranged in any particular order or sequence.

Table 1: Raw data of total overtime hours worked by machinists

94 89 88 89 90 94 92 88 87 85

88 93 94 93 94 93 92 88 94 90

93 84 93 84 91 93 85 91 89 95
This data in its present format does not highlight any trend such as the highest, lowest and average weekly
hours. Consequently no meaningful inference can be drawn unless this data is reorganised in a suitable
format.
If a raw data set is arranged in either ascending or descending order, then ordered sequence so obtained is
called an ordered array.

Table 2: Ordered array of total overtime hours worked by machinists

84 84 85 85 87 88 88 88
88 89 89 89 90 90 91 91
92 92 93 93 93 93 93 93
94 94 94 94 94 95
Constructing a frequency distribution
• A tabular summary of data showing the number(frequency) of observations in each of several non-overlapping class
intervals is known as Frequency distribution

• To condense the data into frequency distribution tables, the following steps should be taken:
1) Decide the number of class intervals:
- The decision on the number of class intervals depends largely on the judgement of the investigator and the range of
numerical values in the data set.
- As a general rule, a frequency distribution should have at least 5 class intervals but not more than 15.
- The following rule is used to decide approximate number of classes in a frequency distribution.
- If k represents number of classes and N the total number of observations, then value of k will be the smallest
exponent of the number 2, so that 2^k >= N
- In Table 2 we have N=30 observations. Hence we shall have
2^3 = 8 (<30)
2^4 = 16 (< 30)
2^5 = 32 (>30)
Hence we may choose k=5 as the number of classes
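
The rule above can be sketched in a few lines of Python. This is only an illustration; the function name `number_of_classes` is an assumption and does not come from the slides.

```python
def number_of_classes(n_observations: int) -> int:
    """Return the smallest k such that 2**k >= n_observations."""
    k = 0
    while 2 ** k < n_observations:
        k += 1
    return k


# N = 30 observations, as in Table 2
k = number_of_classes(30)
print(k)  # 5, because 2**4 = 16 < 30 <= 32 = 2**5
```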
2) Determine the width of classes:

-It is desirable that the width of each class interval should be equal in size.
-The width of class interval can be determined by:

Width of class interval (h) = (Largest numerical value – Smallest numerical value) / No. of classes desired (k)

-From Table 2, the range is 95 - 84 = 11 hours
Hence width of class interval = 11/5 = 2.2, rounded up to 3 hours

3) Determine class limits:


-The limits of each class interval need to be defined so that each numerical value of the data set belongs to one and only
one class.
-Each class has two limits: a lower limit and an upper limit.

OT hours                Tally          Frequency

82 but less than 85     //             2
85 but less than 88     ///            3
88 but less than 91     ///// ////     9
91 but less than 94     ///// /////    10
94 but less than 97     ///// /        6
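
As a rough check, the three steps can be reproduced in Python from the raw data in Table 1. This is a sketch under stated assumptions: the starting lower limit of 82 is taken from the table above (the slides do not say how it was chosen), and the variable names are illustrative.

```python
import math

# Raw data from Table 1 (30 weekly overtime totals)
overtime_hours = [
    94, 89, 88, 89, 90, 94, 92, 88, 87, 85,
    88, 93, 94, 93, 94, 93, 92, 88, 94, 90,
    93, 84, 93, 84, 91, 93, 85, 91, 89, 95,
]

k = 5                                              # number of classes (2**5 >= 30)
value_range = max(overtime_hours) - min(overtime_hours)
width = math.ceil(value_range / k)                 # 11 / 5 = 2.2, rounded up
lower = 82                                         # assumed start, as in the table

# Count how many observations fall in each class "lower .. less than upper"
counts = [0] * k
for hours in overtime_hours:
    counts[(hours - lower) // width] += 1

for i, frequency in enumerate(counts):
    start = lower + i * width
    print(f"{start} but less than {start + width}: {frequency}")
```

Each value lands in exactly one class because the class index is computed from its distance to the first lower limit.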
Mid point of class intervals:

-The class mid-point is the point halfway between the boundaries of each class.
-It is obtained by dividing the sum of the upper and lower class limits by two.
Methods of Data classification
i) Exclusive method:
-Data are classified in such a way that the upper limit of a class interval is the lower limit of the succeeding
class interval; a value equal to the upper limit is counted in the succeeding class.
-This ensures continuity of the data

Class intervals                 Frequency

0 but less than 10 (0-10)       5
10 but less than 20 (10-20)     7
20 but less than 30 (20-30)     15
30 but less than 40 (30-40)     10
ii) Inclusive method:
-Data are classified in such a way that both the upper and lower limits of a class interval are included in the
interval itself

Class intervals   Frequency

0-4               5
5-9               7
10-14             15
15-19             10
Descriptive Statistics
Descriptive Statistics tell us about specific trends in our data and describe specific features of our sample.
e.g. a researcher will use descriptive statistics to tell readers about the proportion of men and women who
participated in a study.
The researcher may write something like:
“In this study 40% of the sample were men, whereas 60% were women.”
Or the researcher may inform readers about participants’ average scores on a particular variable in the study.
In this case the researcher may say: “The mean score on the communication competence measure was 14.55.”
The primary descriptive statistics fall into one of two “families”:
i) Measures of central tendency
ii) Measures of dispersion
Measures of central tendency
• Measures of central tendency tell us about a central characteristic of
the data
• Measures of central tendency include:
i) Mean
ii) Median
iii) Mode
• The mean − add up all the numbers and divide by how many numbers
there are.
• The median − is the middle number. It is found by putting the
numbers in order and taking the actual middle number if there is one,
or the average of the two middle numbers if not.
• The mode − is the most commonly occurring number.
Calculate Mean for Raw Data
• The mean is the sum of measurements / number of subjects

• Direct method Formula: (X-bar) = ΣXi / N

• Data:
66, 89, 41, 98, 76, 77, 68, 60, 60, 67, 69, 66, 98, 52, 74, 66, 89, 95, 66,
69
Example for Mean
• Formula: X-bar = ΣXi / N
= 1446 / 20
= 72.3

The mean for these test scores is 72.30
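
The direct method can be verified with a short Python sketch using the same 20 test scores; the variable names are illustrative.

```python
# The 20 test scores from the example
scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]

total = sum(scores)          # sum of all observations
n = len(scores)              # number of observations
mean = total / n             # direct method: total divided by N

print(total, n, round(mean, 2))
```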


Mean for Grouped/Classified Data

• To calculate the mean for grouped data, you need a frequency table that includes a column for the
midpoints (m) and a column for the product of the frequencies times the midpoints (fm).

Formula: X-bar = Σ(fm) / N
Frequency table:
Score      f      m*      fm

41-50      1      45.5    45.5
51-60      3      55.5    166.5
61-70      8      65.5    524
71-80      3      75.5    226.5
81-90      2      85.5    171
91-100     3      95.5    286.5
Total      N = 20         Σ(fm) = 1420
* Find midpoints first
Calculating Mean for Grouped Data:

Direct method
Formula: X-bar = Σ(fm) / N
= 1420 / 20
= 71

The mean for the grouped data is 71.
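
The grouped calculation can be sketched the same way. The class limits and frequencies below are copied from the frequency table; the tuple layout and names are assumptions made for illustration. Note that the grouped mean differs slightly from the raw-data mean of 72.3, because every score in a class is treated as if it sat at the class midpoint.

```python
# (lower limit, upper limit, frequency) for each class in the frequency table
classes = [
    (41, 50, 1),
    (51, 60, 3),
    (61, 70, 8),
    (71, 80, 3),
    (81, 90, 2),
    (91, 100, 3),
]

n = sum(f for _, _, f in classes)                          # N = total frequency
total_fm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # sum of f * midpoint
grouped_mean = total_fm / n

print(n, total_fm, grouped_mean)
```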


Properties of the Mean:
- only for numerical data at the interval or ratio level

- acts as the "balance point" of the data

- can be affected by outliers, which produce a skewed distribution: the tail becomes elongated and the mean
is pulled in the direction of the outlier.

Example…
no outlier:
$30000, 30000, 35000, 25000, 30000 then mean = $30000
but if outlier is present, then:
$130000, 30000, 35000, 25000, 30000 then mean = $50000
Median = exact centre or middle of ordered data (the 50th percentile).
To find it:
• Array (order) the data.
• When sample size n is even, the median falls halfway between the two middle numbers: take the values at
positions (n/2) and (n/2)+1, add them, and divide the total by 2.
• When sample size n is odd, the median is the value at the exact middle position, (n+1)/2.
Example for Raw Data:
• Suppose you have the following set of test scores:
• 66, 89, 41, 98, 76, 77, 68, 60, 60, 67, 69, 66, 98, 52, 74, 66, 89, 95, 66,
69

• 1. Array (put in order) your data:
98 98 95 89 89 77 76 74 69 69
68 67 66 66 66 66 60 60 52 41
N = 20 (N is even)
To calculate:
- find middle numbers(n/2), (n/2 )+1
- add together the two middle numbers
- divide the total by 2

• First middle number: (20/2) = the 10th number


• 2nd middle number: (20/2)+1 = the 11th number
• Look at data:
the middle numbers are 69 and 68
• The median would be (69+68)/2 = 68.5
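
The same procedure, written as a small Python function. This is a minimal sketch; the function name `median_of` is illustrative, and Python's standard `statistics.median` is used only as a cross-check.

```python
import statistics


def median_of(values):
    """Median by the positional rule: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # position (n+1)/2, 1-based
    return (ordered[mid - 1] + ordered[mid]) / 2   # positions n/2 and n/2 + 1


scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]

print(median_of(scores))          # 68.5
print(statistics.median(scores))  # same value from the standard library
```

Sorting in ascending rather than descending order does not change the result, since the two middle values are the same either way.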
Median in case of grouped data
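
The formula on this slide did not survive extraction. For reference, the form commonly given in textbooks is shown below; it is assumed, not confirmed, that the original slide used the same notation.

```latex
\text{Median} = l + \frac{\dfrac{N}{2} - cf}{f} \times h
```

Here $l$ is the lower limit of the median class (the class containing the $N/2$-th observation), $cf$ is the cumulative frequency of the class preceding the median class, $f$ is the frequency of the median class, $h$ is the class width, and $N$ is the total number of observations.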
Example:
Quartiles, Deciles & Percentiles
Formulas:
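
The formulas on these slides were lost in extraction. The standard grouped-data forms, which follow the same pattern as the grouped median, are given here as an assumption about what the slides showed.

```latex
Q_i = l + \frac{\dfrac{iN}{4} - cf}{f} \times h, \quad i = 1, 2, 3
\qquad
D_j = l + \frac{\dfrac{jN}{10} - cf}{f} \times h, \quad j = 1, \dots, 9
\qquad
P_k = l + \frac{\dfrac{kN}{100} - cf}{f} \times h, \quad k = 1, \dots, 99
```

In each case $l$, $f$ and $h$ are the lower limit, frequency and width of the class containing the required observation, and $cf$ is the cumulative frequency of the preceding class.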
Properties of median:
• - for numerical data at the interval or ordinal level

• - "balance point"

• - not affected by outliers

• - median is appropriate when the distribution is highly skewed.

• Mode = can be used for any kind of data, but it is the only measure of central
tendency available for nominal or qualitative data.

• Formula: value that occurs most often or the category or interval with
highest frequency.
Example for Nominal Variables:

Religion      Frequency   cf    Proportion   %      Cum%
Catholic      17          17    .41          41     41
Protestant    4           21    .10          10     51
Jewish        2           23    .05          5      56
Muslim        1           24    .02          2      58
Other         9           33    .22          22     80
None          8           41    .20          20     100

Total         41                1.00         100%

• Central Tendency: MODE = largest category = Catholic
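
Finding the mode of a nominal variable amounts to picking the category with the highest frequency. A minimal sketch, using the frequencies from the table above (the dictionary and variable names are illustrative):

```python
# Category frequencies from the example table
frequencies = {
    "Catholic": 17,
    "Protestant": 4,
    "Jewish": 2,
    "Muslim": 1,
    "Other": 9,
    "None": 8,
}

total = sum(frequencies.values())

# The mode is the category with the largest frequency
mode_category = max(frequencies, key=frequencies.get)

# Percentage share of each category, rounded to whole percent
percentages = {name: round(100 * f / total) for name, f in frequencies.items()}

print(mode_category, total)
print(percentages)
```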


NOTE:
• When distribution is symmetric,
mean = median = mode
• For a skewed distribution, the mean will lie in the direction of the skew.
• i.e. skewed to right (tail pulled to right)
mean > median (positive skew)
• skewed to left (tail pulled to left)
median > mean (negative skew)
Geometric mean
• In many business and economics problems, certain quantities change over a
period of time.
• In such cases, it’s important for decision-maker to know an average percentage
change to represent average growth/declining rate in the variable value over a
period of time.
• To arrive at right decisions, geometric mean needs to be calculated.
• Formula:
• GM = $(x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$
• Equivalently, using logarithms: GM = $\text{Antilog}\left\{ \frac{1}{n} \sum \log x_i \right\}$
Calculation of GM:
Year Growth rate Output at the end of year
2010 5.0 105
2011 7.5 112.87
2012 2.5 115.69
2013 5.0 121.47
2014 10.0 133.61
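
The table stops before the geometric mean itself is computed. One way to finish the calculation is sketched below, under the assumption that each yearly growth rate is first converted to a growth factor (for example, 5% becomes 1.05) and that the output started from a base of 100, as the first row suggests. Both the product form and the logarithm form are shown.

```python
import math

# Yearly growth rates (%) from the table, 2010 to 2014
growth_rates = [5.0, 7.5, 2.5, 5.0, 10.0]

# Convert each percentage to a growth factor, e.g. 5% -> 1.05
factors = [1 + rate / 100 for rate in growth_rates]
n = len(factors)

# Product form: n-th root of the product of the factors
gm_product = math.prod(factors) ** (1 / n)

# Logarithm form: antilog of the mean of the logs
gm_log = math.exp(sum(math.log(x) for x in factors) / n)

average_growth = (gm_product - 1) * 100   # average yearly growth in percent

print(round(gm_product, 4), round(average_growth, 2))
```

The result is an average growth rate of roughly 5.97% per year, slightly below the arithmetic mean of the five rates (6%), which is the usual relationship between the two averages.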
Measures of Dispersion

• Describe how variable the data are: i.e. how spread out around the
mean

• Also called measures of variation or variability


Range (for numerical data)

Range = difference between largest and smallest observations


i.e. if data are $130000, 35000, 30000, 30000, 30000, 30000, 25000,
25000
then range = 130000 - 25000 = $105000
Interquartile Range (Q):
- This is the difference between the 75th and the 25th percentiles (the middle
50%)
- Gives better idea than range of what the middle of the distribution looks like.

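Formula: Q = Q3 - Q1
(where Q3 is the value at position N x .75 and Q1 the value at position N x .25, counting from the smallest
observation in the ordered data)
Using the above data (N = 8): Q3 = 6th case = $30000 and Q1 = 2nd case = $25000
Q = Q3 - Q1 = $30000 - $25000 = $5000
The interquartile range (Q) is $5000.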
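
Both the range and the interquartile range can be reproduced with a short sketch that follows the positional rule used in this example (N x .25 and N x .75 as 1-based positions in the ascending data). Other quartile conventions interpolate between observations and may give different values; the names below are illustrative.

```python
incomes = [130000, 35000, 30000, 30000, 30000, 30000, 25000, 25000]

ordered = sorted(incomes)        # ascending order
n = len(ordered)

data_range = ordered[-1] - ordered[0]

# Positional rule from the example: 1-based positions N*0.25 and N*0.75
q1_position = int(n * 0.25)      # 2nd case
q3_position = int(n * 0.75)      # 6th case
q1 = ordered[q1_position - 1]
q3 = ordered[q3_position - 1]
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```

Unlike the range, the interquartile range is unaffected by the single outlier of $130000.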
Variance and Standard Deviation:

• For raw data at the interval/ratio level.


• Most common measure of variation.
• The numerator in the formula is known as the sum of squares, and the denominator is either the population
size N or, for sample data, n-1 (the sample size minus one)
• The variance is denoted by s^2 and the standard deviation, which is the square root of the variance, by s
Definitional Formula for Variance and Standard
Deviation:
• Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{N}$

• Standard Deviation: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}}$

• (the standard deviation is the square root of the variance; the variance is simply the standard deviation squared)
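
The definitional formula can be sketched in Python with both denominators, N and n-1. The test scores from the mean example are reused here purely as illustrative data; Python's standard `statistics` module provides `pvariance` (denominator N) and `variance` (denominator n-1) as cross-checks.

```python
import math
import statistics

scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]

n = len(scores)
mean = sum(scores) / n

# Numerator: sum of squared deviations from the mean ("sum of squares")
sum_of_squares = sum((x - mean) ** 2 for x in scores)

population_variance = sum_of_squares / n        # denominator N
sample_variance = sum_of_squares / (n - 1)      # denominator n-1

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(round(population_sd, 2), round(sample_sd, 2))
```

The n-1 version is always slightly larger than the N version, and the gap narrows as the sample size grows.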
A working formula for the standard deviation:
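
The formula on this slide was lost in extraction. A commonly used computational form, algebraically equivalent to the definitional formula with N in the denominator, is given here as an assumption about what the slide showed.

```latex
s = \sqrt{\frac{\sum x_i^2}{N} - \bar{x}^2}
```

It avoids computing each deviation $(x_i - \bar{x})$ separately: only the sum of the squared observations and the mean are needed.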
Properties of S:
• always greater than or equal to 0
• the greater the variation about mean,
the greater S is
• n-1 corrects for bias when using sample data. S tends to
underestimate the real population standard deviation
when based on sample data so to correct for this, we use
n-1. The larger the sample size, the smaller difference this
correction makes. When calculating the standard
deviation for the whole population, use N in the
denominator.
NOTE:

• σ, N and Mu (µ) denote population parameters

• s, n and x-bar denote sample statistics


Remember the Rounding Rules!
• Always use as many decimal places as your calculator can handle.

• Round your final answer to 2 decimal places, rounding to the nearest number.

• Engineers Rule: When the last digit is exactly 5 (followed by 0’s), round the digit before the last digit
to the nearest EVEN number.
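
The round-half-to-even rule can be illustrated with Python's `decimal` module, which supports it directly through `ROUND_HALF_EVEN`. This is a sketch only; the helper name `round_half_even` is an assumption, and values are passed as strings so that they are represented exactly.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(value: str, places: int = 2) -> Decimal:
    """Round to `places` decimals; an exact trailing 5 goes to the nearest even digit."""
    quantum = Decimal(10) ** -places          # e.g. Decimal('0.01') for 2 places
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


print(round_half_even("2.345"))   # 2.34 (4 is already even)
print(round_half_even("2.355"))   # 2.36 (5 is odd, so round up to 6)
print(round_half_even("2.3451"))  # 2.35 (not an exact half, ordinary rounding)
```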
