QT-Unit 1
Statistics - An overview
• “Statistical thinking will one day be as necessary as the ability to read and write!” By H.G. Wells
• The business environment is competitive, and organizations are often data-rich but information-poor
• Hence it is important for decision-makers to develop the ability to extract meaningful information from raw
data to make better decisions
• Data is a collection of observations of one or more variables of interest
• Knowledge of statistics helps decision-makers to develop ability to:
- Present & describe information/data so as to improve decisions
- Draw conclusions about a large population based upon information obtained from samples
- Seek relationship between variables
- Obtain reliable forecasts of statistical variables of interest
Growth & development of Statistics
• The word Statistics has different meanings to different people depending on its use
• For a cricket fan, it refers to data relating to runs scored by a cricketer
• For an environmentalist, it refers to information on the quantity of pollutants released into atmosphere by all
types of vehicles in different cities
• For census department, it refers to information on birth rate & gender ratio in different states
• For a share broker, it refers to information on changes in share prices over a period of time
• Sources of such numerical data are both electronic and print media
• After statistical analysis, the extracted information is presented visually in the form of graphs, charts, diagrams &
pictograms.
• Based on the conclusions, decisions are arrived at on problems pertaining to social, political, economic,
and cultural activities
Statistical thinking
• Statistical thinking can be defined as the thought process that focuses on ways to identify, control, and
reduce variations present in all phenomena
• Statistical thinking allows a decision-maker to recognise & interpret the variations in a process
• Management philosophy acts as a guide for laying a solid foundation for total quality improvement in a given
process
• Behavioural tools such as brainstorming, team-building & group decision-making, as well as
statistical methods such as tables, charts & descriptive statistics, are equally necessary for understanding &
improving processes.
Steps in Process improvement
• Step 1: Specify the aim of the study
• Step 2: Understand how the process works
• Step 3: Assess the current process performance
• Step 4: Identify strategies for improvement
• Step 5: Test the effectiveness of proposed strategy
• Step 6: If successful, implement the strategy; if not successful, go back to Step 4 & repeat
Defining Statistics
• Statistics is the art and science of collecting, analysing, presenting & interpreting data
• Quantitative data: Numerical data measured on an interval or ratio scales to describe how much or how
many
Characteristics of Statistics:
2. Inferential statistics:
Consists of procedures used to make inferences about population characteristics on the basis of sample results. These procedures are further
divided into two categories:
a) Parametric: The use of parametric methods is based on the assumption that the population, from which the sample is
drawn, is normally distributed. Parametric methods can be used only when data are collected on an interval or ratio
scale.
b) Non-parametric: Non-parametric methods make no assumption about the parameters or the distribution (normal or
otherwise) of the population being studied. There is no fixed set of parameters, and the methods do not depend on the
form of the population distribution.
Importance & scope of Statistics
‘A knowledge of statistics is like a knowledge of foreign language or of algebra, it may prove of use at any time
under any circumstances.’
Statistics and the Government:
-Government is required to collect a huge amount of statistics for various purposes, such as data relating to
prices, income & expenditure, investments, etc.
-Also data is collected on population dynamics in order to initiate and implement various welfare policies and
programmes
Statistics in Economics:
-Statistical methods are extensively useful in economic analysis.
-Studying pattern of prices, production, money in circulation, bank deposits, etc.
-Demand analysis to study the relationship between the price of commodities and the demand for them
-Predicting inflation rate, unemployment rate, etc.
Statistics in Business management:
-Statistical reports provide a summary of business activities which enable decision-makers to take effective decisions
with respect to future activities.
Marketing:
-Before launch of a product, the market research team uses statistics to analyse data on purchasing power, habits of
consumers, competitors, pricing, etc.
-Purpose of such a study is to understand the possible market potential for the product.
Production:
-Statistical methods are used to conduct R&D activities to bring improvement in the quality of the existing products
and setting quality control for new ones.
-Statistical data analysis helps in deciding the quantity and timing of items to make in-house or buy from outside.
Finance:
-Statistical methods help to predict the probable dividend in years to come.
-They are also useful in analysing data on income and expenditure, assets & liabilities, break-even analysis, etc. to
ascertain financial results of various operations.
Personnel:
-Statistical methods are used for manpower planning, wage & salary administration, incentive planning, attrition rate,
etc.
-Study of employee-employer relationship requires analysis of various factors such as grievances handling, training &
development, etc.
Limitations of Statistics
1. Statistics does not study qualitative attributes:
94 89 88 89 90 94 92 88 87 85
88 93 94 93 94 93 92 88 94 90
93 84 93 84 91 93 85 91 89 95
This data in its present format does not highlight any trend such as the highest, lowest and average weekly
hours. Consequently no meaningful inference can be drawn unless this data is reorganised in a suitable
format.
If a raw data set is arranged in either ascending or descending order, the ordered sequence so obtained is
called an ordered array.
84 84 85 85 87 88 88 88
88 89 89 89 90 90 91 91
92 92 93 93 93 93 93 93
94 94 94 94 94 95
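The reordering above can be reproduced with a short Python sketch. The variable names are illustrative only; the 30 values are the raw observations listed earlier.

```python
# Raw weekly-hours observations, copied from the table above (30 values).
raw_hours = [
    94, 89, 88, 89, 90, 94, 92, 88, 87, 85,
    88, 93, 94, 93, 94, 93, 92, 88, 94, 90,
    93, 84, 93, 84, 91, 93, 85, 91, 89, 95,
]

# An ordered array is the raw data sorted in ascending order.
ordered_array = sorted(raw_hours)

# Once the data are ordered, the extremes can be read off the two ends.
lowest = ordered_array[0]
highest = ordered_array[-1]
average = sum(raw_hours) / len(raw_hours)

print(ordered_array)
print("lowest:", lowest, "highest:", highest, "average:", round(average, 2))
```

Sorting does not change the data; it only makes the lowest value, the highest value and the clustering of observations easier to see.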
Constructing a frequency distribution
• A tabular summary of data showing the number (frequency) of observations in each of several non-overlapping class
intervals is known as a frequency distribution
• To condense the data into frequency distribution tables, the following steps should be taken:
1) Decide the number of class intervals:
- The decision on the number of class intervals depends largely on the judgement of the investigator and the range of
numerical values in the data set.
- As a general rule, a frequency distribution should have at least 5 class intervals but not more than 15.
- The following rule is used to decide approximate number of classes in a frequency distribution.
- If k represents the number of classes and N the total number of observations, then k is the smallest
exponent of 2 such that 2^k >= N
- In Table 2 we have N=30 observations. Hence we shall have
2^3 = 8 (<30)
2^4 = 16 (< 30)
2^5 = 32 (>30)
Hence we may choose k=5 as the number of classes
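The 2^k rule can be expressed as a small loop. This is a minimal sketch; the function name `number_of_classes` is hypothetical and chosen only for illustration.

```python
def number_of_classes(n_observations: int) -> int:
    """Return the smallest k such that 2**k >= n_observations."""
    if n_observations < 1:
        raise ValueError("need at least one observation")
    k = 0
    # Keep increasing k until 2**k reaches or exceeds N.
    while 2 ** k < n_observations:
        k += 1
    return k


k = number_of_classes(30)
print(k)  # 2**4 = 16 < 30 and 2**5 = 32 >= 30, so k is 5
```

The rule gives only an approximate starting point; the investigator's judgement and the 5-to-15 guideline still apply.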
2) Determine the width of classes:
-It is desirable that the width of each class interval should be equal in size.
-The width of class interval can be determined by:
-The class mid-point is the point halfway between the boundaries of each class.
-It is obtained by dividing the sum of the upper and lower class limits by two.
Methods of Data classification
ii) Exclusive method:
-Data are classified in such a way that the upper limit of one class interval is the lower limit of the succeeding
class interval.
-This ensures continuity of the data
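As a rough illustration, the sketch below builds a frequency distribution for the 30 weekly-hours observations using exclusive classes, where each class includes its lower limit and excludes its upper limit. The width rule used here (range divided by k, rounded up) is an assumption for this sketch, since the width formula is not shown in these notes; the variable names are also illustrative.

```python
import math

hours = [
    94, 89, 88, 89, 90, 94, 92, 88, 87, 85,
    88, 93, 94, 93, 94, 93, 92, 88, 94, 90,
    93, 84, 93, 84, 91, 93, 85, 91, 89, 95,
]

k = 5                          # number of classes from the 2^k rule
low, high = min(hours), max(hours)
# Assumed width rule: (largest - smallest) / k, rounded up to a whole number.
width = math.ceil((high - low) / k)

rows = []
for i in range(k):
    lower = low + i * width
    upper = lower + width      # upper limit = lower limit of the next class
    # Exclusive method: lower limit included, upper limit excluded.
    freq = sum(1 for x in hours if lower <= x < upper)
    midpoint = (lower + upper) / 2
    rows.append((lower, upper, midpoint, freq))

for lower, upper, midpoint, freq in rows:
    print(f"{lower}-{upper}  mid-point {midpoint}  frequency {freq}")
```

With these data the last class (96-99) is empty, which suggests four classes of width 3 would also cover the range. If the largest value ever fell exactly on the last upper limit, one more class would be needed, because the exclusive method leaves the upper limit out.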
• Data:
66, 89, 41, 98, 76, 77, 68, 60, 60, 67, 69, 66, 98, 52, 74, 66, 89, 95, 66,
69
Example for Mean
• Formula: X̄ = ΣXi / N
= 1446 / 20
= 72.3
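The arithmetic above can be checked with a few lines of Python. This is a sketch using the 20 scores listed under "Data"; the variable names are illustrative.

```python
scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]

total = sum(scores)   # sum of all observations
n = len(scores)       # number of observations
mean = total / n      # arithmetic mean

print(total, n, mean)
```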
Formula: X̄ = Σ(fm) / N
Frequency table:
Score f m* (fm)
Direct method
Formula: X̄ = Σ(fm) / N
= 1420 / 20
= 71
- "balance point"
Example…
no outlier:
$30000, 30000, 35000, 25000, 30000 then mean = $30000
but if outlier is present, then:
$130000, 30000, 35000, 25000, 30000 then mean = $50000
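The effect of the outlier can be reproduced with Python's standard `statistics` module. This is a small sketch using the two salary lists from the example; the variable names are illustrative.

```python
from statistics import mean

no_outlier = [30000, 30000, 35000, 25000, 30000]
with_outlier = [130000, 30000, 35000, 25000, 30000]

mean_no_outlier = mean(no_outlier)
mean_with_outlier = mean(with_outlier)

# A single extreme value shifts the mean by 20000.
print(mean_no_outlier, mean_with_outlier)
```

Only one of the five values changed, yet the mean moved from 30000 to 50000, a figure higher than four of the five salaries.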
Median = exact centre or middle of ordered data.
The 50th percentile.
Formula:
• Arrange the data in an ordered array.
• When the sample size n is even, the median falls halfway
between the two middle values.
• To calculate: take the values at positions n/2 and (n/2)+1, add them, and divide the
total by 2 to find the exact median.
• When the sample size is odd, the median is the middle value, at position
(n+1)/2
Example for Raw Data:
• Suppose you have the following set of test scores:
• 66, 89, 41, 98, 76, 77, 68, 60, 60, 67, 69, 66, 98, 52, 74, 66, 89, 95, 66,
69
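A minimal sketch of the median rule applied to these 20 scores follows. Positions in the rule are 1-based, while Python list indices are 0-based, so each position is reduced by one; the variable names are illustrative.

```python
scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]

ordered = sorted(scores)   # the data must be arrayed first
n = len(ordered)

if n % 2 == 0:
    # Even n: average the values at positions n/2 and (n/2)+1.
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
else:
    # Odd n: take the value at position (n+1)/2.
    median = ordered[(n + 1) // 2 - 1]

print(ordered)
print(median)
```

Here n = 20 is even, so the median is the average of the 10th and 11th ordered scores (68 and 69), which is 68.5.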
• Mode: the value that occurs most often, or the category or interval with the
highest frequency.
Example for Nominal Variables:
Religion      Frequency   cf   Proportion    %   Cum %
Catholic          17      17      .41        41     41
Protestant         4      21      .10        10     51
Jewish             2      23      .05         5     56
Muslim             1      24      .02         2     58
Other              9      33      .22        22     80
None               8      41      .20        20    100
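For a nominal variable the mode is the category with the highest frequency. The sketch below takes the frequencies from the table, finds the modal category, and recomputes the cumulative frequency and proportion columns; the variable names are illustrative.

```python
# Frequencies by category, taken from the table above.
frequencies = {
    "Catholic": 17,
    "Protestant": 4,
    "Jewish": 2,
    "Muslim": 1,
    "Other": 9,
    "None": 8,
}

total = sum(frequencies.values())

# The mode is the category with the highest frequency.
mode_category = max(frequencies, key=frequencies.get)

# Rebuild the cumulative frequency (cf) and proportion columns.
table = []
cumulative = 0
for category, f in frequencies.items():
    cumulative += f
    table.append((category, f, cumulative, round(f / total, 2)))

for row in table:
    print(row)
print("mode:", mode_category)
```

For nominal data the mode is the only one of the three averages that applies, since categories cannot be added or ordered.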
• Describe how variable the data are: i.e. how spread out around the
mean
Formula: Q = Q3 - Q1
(where Q3 is the case at position N x .75 and Q1 is the case at position N x .25 in the ordered data)
Using above data: Q = Q3 - Q1 = (6th – 2nd case)
= $30000-25000 =$5000
The interquartile range (Q) is $5000.
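The positional rule can be sketched in Python. The eight salaries below are hypothetical, invented only for illustration because the data set behind this example is not reproduced in these notes; they were chosen so that the 2nd and 6th ordered cases are $25000 and $30000. The sketch also assumes N is a multiple of 4, so that both positions are whole numbers.

```python
# Hypothetical salary data (NOT from the original notes), N = 8.
salaries = sorted([20000, 25000, 25000, 30000, 30000, 30000, 35000, 40000])
n = len(salaries)

# Positional rule from the text: Q1 at N x .25, Q3 at N x .75 (1-based positions).
q1_position = n // 4        # 8 x .25 = 2nd case
q3_position = 3 * n // 4    # 8 x .75 = 6th case

q1 = salaries[q1_position - 1]   # convert 1-based position to 0-based index
q3 = salaries[q3_position - 1]
iqr = q3 - q1

print(q1, q3, iqr)
```

Statistical libraries usually interpolate between neighbouring values when locating quartiles, so their results can differ slightly from this simple positional rule.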
Variance and Standard Deviation:
• Standard Deviation:
s =