Stat Introduction Units 1& 2
Introduction - Statistics & Data Analysis
Frequency distribution: sample of daily production in yards of 30 carpet looms
Class                        Frequency
(Group of similar values)    (No. of observations in each class)
2.0 - 2.5                    1
2.6 - 3.1                    0
3.2 - 3.7                    2
3.8 - 4.3                    8
4.4 - 4.9                    5
5.0 - 5.5                    4
Inferential Statistics
• It is the science of using a sample to make generalizations about
the important aspects of a population.
• A descriptive value for a population is called a parameter and a
descriptive value for a sample is called a statistic.
Statistical Data
• Statistical data are the basic raw material of
statistics.
• They refer to those aspects of a problem
situation that can be measured, quantified
or counted.
Data Sources
Data sources can be of two types: secondary and primary.
Secondary data: Data that already exist in some form,
published or unpublished, in an identifiable
secondary source. They are generally available from
published sources, though not necessarily in the
form actually required.
Primary data: Data that do not already exist in
any form and thus have to be collected for the first
time from the primary source(s). By their very nature,
these data require fresh, first-time collection
covering the whole population or a sample drawn
from it.
Types of Data
• In statistics, data are classified into two broad
categories:
Quantitative data: Data that can be quantified in
definite units of measurement.
Discrete data:
e.g. the number of customers visiting a departmental
store every day, the number of incoming flights at an
airport, the number of defective items in a consignment
received for sale.
Continuous data:
e.g. characteristics such as weight, length, height,
thickness, velocity, temperature, etc.
SAMPLE
• Usually populations are so large that a researcher
cannot examine the entire group. Therefore, a
sample is selected to represent the population in a
research study. The goal is to use the results
obtained from the sample to help answer questions
about the population.
• A sample is a subset of the elements of a population.
Methods of Classification
Every item of the collected data has its own characteristics.
These characteristics can be of two types:
(i) Descriptive (e.g. honesty, beauty, etc.):
Characteristics that cannot be measured directly but are
counted on the basis of their presence or absence
(non-measurable characteristics, or attributes).
(ii) Numerical (e.g. height, weight, profit, etc.):
Characteristics that can be measured.
Types of Classification
[Figure: classification trees by attribute: a population divided into Male and Female, and further into Employed and Unemployed]
Quantitative Classification
Value    Frequency
154      8
155      10
156      6
157      2
158      12
159      12

Class        Frequency
1000-1500    15
1500-2000    33
2000-2500    22
2500-3000    18
3000-3500    12
Stem | Leaf
   5 | 2 7
   6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
   7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
   8 | 0 0 2 3 5 8 9
   9 | 1 3 7 7 7 8 9
  10 | 1 4 5 5 9
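A display of this kind splits each value into a stem (all digits except the last) and a leaf (the last digit). The following is a minimal Python sketch, assuming non-negative integer data; the function name `stem_and_leaf` and the short sample list are illustrative and not part of the original notes.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines such as '5 | 2 7' for non-negative integers."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # stem = tens and above, leaf = units digit
        groups[stem].append(leaf)
    return [f"{stem} | {' '.join(map(str, leaves))}"
            for stem, leaves in sorted(groups.items())]

# Illustrative sample only (a few values in the same range as the display).
sample = [52, 57, 62, 62, 101, 104, 91, 80]
lines = stem_and_leaf(sample)
print("\n".join(lines))
```

Sorting the values first keeps the leaves in ascending order within each stem, which is what makes the display readable as a sideways histogram.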
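Scatter Diagram
• Example: Panthers Football Team
[Figure: scatter diagram of number of points scored (y, 0 to 30) against number of interceptions (x, 0 to 3)]
Scatter Diagram
• A Negative Relationship
[Figure: scatter diagram of y against x showing a negative relationship]
Scatter Diagram
• No Apparent Relationship
[Figure: scatter diagram of y against x showing no apparent relationship]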
Tabular and Graphical Procedures
[Diagram: Data divides into Qualitative Data and Quantitative Data]
Bar Diagram
Sales in the Year 1990
---------------------------------------------
Region    Q1      Q2      Q3      Q4
---------------------------------------------
East      20.0    30.0    90.0    60.0
West      30.6    38.6    34.6    31.6
North     45.9    46.9    45.0    43.9
---------------------------------------------
[Figure: horizontal bar diagram of sales by quarter (1st Qtr to 4th Qtr) for East, West and North; axis from 0 to 200]
Bar Diagram
[Figure: second bar diagram of the same 1990 sales data by quarter and region; axis from 0 to 200]
Pie Chart
Sales in the Year 1990
---------------------------------------------
Region    Q1    Q2    Q3    Q4    Total
---------------------------------------------
East      20    30    90    60    200
(In %)    10    15    45    30    100
---------------------------------------------
Note: The angle of 360° at the centre is distributed in proportion to the % share.
[Figure: pie chart with sectors for 1st Qtr, 2nd Qtr, 3rd Qtr and 4th Qtr]
4. Graphical Representation of Data
Pie Chart
[Figure: second pie chart of the same East-region sales by quarter]
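The rule that the 360° at the centre is shared out in proportion to each category's percentage can be sketched in Python as follows. The figures are the East-region quarterly sales from the table; the function name `pie_angles` is illustrative.

```python
# Quarterly sales for the East region, taken from the 1990 sales table.
sales = {"Q1": 20, "Q2": 30, "Q3": 90, "Q4": 60}

def pie_angles(values):
    """Map each category to (percentage share, central angle in degrees)."""
    total = sum(values.values())
    return {k: (100 * v / total, 360 * v / total) for k, v in values.items()}

angles = pie_angles(sales)
for quarter, (pct, deg) in angles.items():
    print(f"{quarter}: {pct:.0f}% of sales, {deg:.0f} degrees")
```

Each angle is simply the percentage share multiplied by 3.6, so the angles always add up to 360°.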
[Figure: histogram of frequency against class (7-12, 13-18, 19-24, 25-30, 31-36, 37-42)]
[Figure: relative frequency histogram over the same classes]
---------------------------------------------
Class    Frequency    Relative frequency
---------------------------------------------
7-12     2            0.08
13-18    5            0.20
19-24    8            0.32
25-30    6            0.24
31-36    3            0.12
37-42    1            0.04
---------------------------------------------
Total    25           1.00
---------------------------------------------
• Frequency Polygon
– A line graph that connects the midpoints of the tops of all the
bars in a histogram
– A graphical representation of a frequency
distribution; it assumes that the classes
have equal width, whereas a histogram may also have
unequal class widths.
– Two or more frequency polygons can be drawn on
the same graph, whereas two histograms cannot.
[Figure: relative frequency polygon for the 25 observations; x-axis: class mid-point (3.5, 9.5, 15.5, 21.5, 27.5, 33.5, 39.5, 45.5); y-axis: relative frequency]
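[Figure: cumulative relative frequency graph; x-axis: class boundaries 7, 13, 19, 25, 31, 37, 43]
--------------------------------------------------------------
Class    Relative frequency    Cumulative relative frequency
--------------------------------------------------------------
7-12     0.08                  0.08
13-18    0.20                  0.28
19-24    0.32                  0.60
25-30    0.24                  0.84
31-36    0.12                  0.96
37-42    0.04                  1.00
--------------------------------------------------------------
Total    1.00
--------------------------------------------------------------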
Ogives (cumulative frequency curves)
• A graph of a cumulative frequency distribution is
called an ogive.
• A cumulative frequency distribution enables
us to see how many observations lie above or
below certain values, rather than merely recording
the number of items within intervals.
• A less-than or a greater-than ogive can be
constructed for a given frequency distribution.
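The cumulative counts behind a less-than or a greater-than ogive can be sketched in Python as follows. The classes are those of the 25-observation example in these notes; the first two frequencies (2 and 5) do not survive in the printed table and are inferred here from the relative and cumulative relative frequencies, so treat them as an assumption.

```python
from itertools import accumulate

classes = ["7-12", "13-18", "19-24", "25-30", "31-36", "37-42"]
freqs = [2, 5, 8, 6, 3, 1]  # first two values inferred, see the note above

n = sum(freqs)

# Less-than ogive: observations at or below the upper limit of each class.
less_than = list(accumulate(freqs))

# Greater-than ogive: observations at or above the lower limit of each class.
greater_than = [n - c for c in [0] + less_than[:-1]]

# Cumulative relative frequencies (less-than form).
cum_rel = [c / n for c in less_than]

for row in zip(classes, less_than, greater_than, cum_rel):
    print(row)
```

The less-than values are plotted against the upper class limits and the greater-than values against the lower class limits.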
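Arithmetic mean for grouped data:
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_k x_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{j=1}^{k} f_j x_j}{\sum_{j=1}^{k} f_j} = \frac{1}{n}\sum_{j=1}^{k} f_j x_j$$
where n = no. of observations; k = no. of classes
x_j = mid-point of j-th class
f_j = frequency of j-th class (Note: the f_j's add to n)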
Exercise: The weights in pounds of a sample of packages
are given below. Calculate the sample mean.
Class        Frequency
10.0-10.9    1
11.0-11.9    4
12.0-12.9    6
13.0-13.9    8
14.0-14.9    12
15.0-15.9    11
16.0-16.9    8
17.0-17.9    7
18.0-18.9    6
19.0-19.9    2
Exercise: The weights in pounds of a sample of
packages are given below. Calculate the sample mean.
Class        Frequency (f)    x (midpoint)    fx
10.0-10.9    1                10.5            10.5
11.0-11.9    4                11.5            46.0
12.0-12.9    6                12.5            75.0
13.0-13.9    8                13.5            108.0
14.0-14.9    12               14.5            174.0
15.0-15.9    11               15.5            170.5
16.0-16.9    8                16.5            132.0
17.0-17.9    7                17.5            122.5
18.0-18.9    6                18.5            111.0
19.0-19.9    2                19.5            39.0
Total        65                               988.5
Sample mean = 988.5 / 65 ≈ 15.21 pounds
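The grouped-data calculation in this exercise can be checked with a short Python sketch. The midpoints and frequencies are copied from the table; the function name `grouped_mean` is illustrative.

```python
# (midpoint x, frequency f) pairs copied from the exercise table.
table = [(10.5, 1), (11.5, 4), (12.5, 6), (13.5, 8), (14.5, 12),
         (15.5, 11), (16.5, 8), (17.5, 7), (18.5, 6), (19.5, 2)]

def grouped_mean(rows):
    """Mean of grouped data: sum of f*x divided by sum of f."""
    n = sum(f for _, f in rows)
    sum_fx = sum(x * f for x, f in rows)
    return sum_fx / n

n = sum(f for _, f in table)
sum_fx = sum(x * f for x, f in table)
mean = grouped_mean(table)
print(f"n = {n}, sum of fx = {sum_fx}, mean = {mean:.2f}")
```

Because every observation in a class is represented by the class midpoint, the result is an approximation to the mean of the raw weights.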
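Weighted mean:
$$\bar{x}_w = \frac{\sum_{j=1}^{n} w_j x_j}{\sum_{j=1}^{n} w_j}$$
where n = no. of observations;
w_j = weight attached to the j-th observation x_j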
Unskilled $5.00 1 4
Semiskilled $7.00 2 3
Skilled $9.00 5 3
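A weighted mean for the table above can be sketched in Python as follows. The column headings are not given in these notes, so this sketch assumes that the second column is an hourly wage and that the last two columns are two alternative sets of weights; the names `weights_a` and `weights_b` are illustrative.

```python
def weighted_mean(values, weights):
    """Sum of w*x divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

wages = [5.00, 7.00, 9.00]   # unskilled, semiskilled, skilled
weights_a = [1, 2, 5]        # assumed: third column of the table
weights_b = [4, 3, 3]        # assumed: fourth column of the table

mean_a = weighted_mean(wages, weights_a)
mean_b = weighted_mean(wages, weights_b)
print(f"weighted mean with first set of weights:  {mean_a:.2f}")
print(f"weighted mean with second set of weights: {mean_b:.2f}")
```

The same three wages give different averages under the two sets of weights, which is the point of weighting: values that count for more pull the mean towards themselves.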
19,17,15,20,23,41,33,21,18,20,18,33,32,29,24,19,18,
20,17,22,55,19,22,25,28,30,44,19,20,39
Spread of values
[Figure: histograms of heart rate (beats) for population 1 and population 2; x-axis: heart rate, about 60 to 96]
• We could quote a range (population 1: 96-62=34 beats; population 2: 88-70=18 beats)
• However, the problem is this range depends just on ... measure
[Figure: two histograms over the values 1 to 25, one for each population]
• Both populations have the same range but clearly population 2 has less spread across most values.
• Better to measure deviation from the mean
Measures of Variability/Dispersion
• Range
• Interquartile range (IQR)
• Variance and standard deviation (average distance of
the observations in the data set from the mean)
• Coefficient of variation (CV)
Range = Value of Highest Observation - Value of Lowest Observation
Example: Let the given observations be 1.5, 1.0, 4.5, 5.0, 0.5
Highest Value = 5.0; and Lowest Value = 0.5
So, Range = 5.0 – 0.5 = 4.5
5. Descriptive Statistics
Measures of Dispersion/Variability
Interquartile Range
- To compute this, we divide the sorted data into 4 equal parts,
each of which contains 25% of the items in the
distribution.
- The quartiles are then the highest values in each of
the first three parts, and the interquartile range is the
difference between the values of the third and first
quartiles (Q3 - Q1).
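The interquartile range can be computed with Python's standard `statistics` module, as in the sketch below. It reuses the 30-value data set listed earlier in these notes purely as an illustration. Note that `statistics.quantiles` interpolates between neighbouring values, so its quartiles can differ slightly from those obtained with the "highest value in each part" rule; quartile conventions vary between textbooks and software.

```python
import statistics

# The 30 observations listed earlier in these notes (used here for illustration).
data = [19, 17, 15, 20, 23, 41, 33, 21, 18, 20, 18, 33, 32, 29, 24, 19, 18,
        20, 17, 22, 55, 19, 22, 25, 28, 30, 44, 19, 20, 39]

# n=4 returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
data_range = max(data) - min(data)

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
print(f"IQR = {iqr}, range = {data_range}")
```

Unlike the range, the IQR ignores the lowest and highest quarters of the data, so a single extreme value such as 55 has little effect on it.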
$$\sigma^2 = \frac{1}{N}\sum_{j=1}^{N}(x_j - \mu)^2$$
where σ² = Population Variance
$$S^2 = \frac{1}{n-1}\sum_{j=1}^{n}(x_j - \bar{x})^2$$
where S² = Sample Variance
CV (in %) = (SD / Arithmetic Mean (AM)) x 100
- Note:
(1) CV is undefined when AM=0
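These measures can be computed with Python's standard `statistics` module, as in the sketch below. It uses the five observations from the range example (1.5, 1.0, 4.5, 5.0, 0.5); basing the CV on the sample standard deviation is an assumption made here, since the notes do not say which SD is meant.

```python
import statistics

data = [1.5, 1.0, 4.5, 5.0, 0.5]  # observations from the range example

mean = statistics.fmean(data)
pop_var = statistics.pvariance(data)   # divides by N
samp_var = statistics.variance(data)   # divides by n - 1
samp_sd = statistics.stdev(data)       # square root of the sample variance

# CV is undefined when the arithmetic mean is zero.
cv = None if mean == 0 else 100 * samp_sd / mean

print(f"mean = {mean}, population variance = {pop_var}")
print(f"sample variance = {samp_var}, sample SD = {samp_sd:.4f}")
print(f"CV = {cv:.2f}%")
```

The sample variance is larger than the population variance for the same numbers because it divides the same sum of squared deviations by n - 1 instead of N.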
Units of Descriptive Statistics
---------------------------------------------------------------
Statistics Unit
---------------------------------------------------------------
Mean Same as the original data
Median --do--
Mode --do--
SD --do--
Variance Square of unit measuring original data
CV Per Cent
------------------------------------------------------------------------
More words about the normal curve:
Chebyshev’s theorem
• According to a theorem devised by the Russian mathematician
P. L. Chebyshev, no matter what the shape of the distribution, at
least 75% of the values will fall within ±2 standard deviations
from the mean of the distribution and at least 89% of the values
will lie within ±3 standard deviations from the mean.
• In the case of a symmetrical bell-shaped curve, we can say that
– About 68% of the values in the population will fall within ±1 standard
deviation from the mean
– About 95% of the values will lie within ±2 standard deviations from the
mean
[Figure: bell-shaped curve over Value (x); 34% of the values lie on each side of the mean within 1 standard deviation, and 47.7% on each side within 2 standard deviations]
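Chebyshev's theorem states that, for any distribution, at least 1 - 1/k² of the values lie within k standard deviations of the mean (for k > 1). A minimal Python sketch of that bound follows; the function name `chebyshev_lower_bound` is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k**2

bounds = {k: chebyshev_lower_bound(k) for k in (2, 3)}
for k, b in bounds.items():
    print(f"within {k} standard deviations: at least {100 * b:.0f}%")
```

These guarantees (75% and about 89%) hold for any shape of distribution, which is why they are weaker than the roughly 95% figure quoted for a symmetrical bell-shaped curve.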
Determine the variance and standard deviation of
the following data set
Question 1
0.04, 0.06, 0.12, 0.14, 0.14, 0.15, 0.17, 0.17, 0.18,
0.19, 0.21, 0.21, 0.22, 0.24, 0.25

Class        Frequency
700-799      4
800-899      7
900-999      8
1000-1099    10
1100-1199    12
1200-1299    17
1300-1399    13
1400-1499    10
1500-1599    9
1600-1699    7
1700-1799    2
1800-1899    1
Questions
Q1: Determine the sample variance and
sample standard deviation of annual charity
payments to a hospital.
Set of payments:
863,903,957,1041,1138,1204,1354,1624,
1698,1745,1802,1883
Frequency 2 14 23 7 4 2