Chapter 1 and 2
(Stat 2181)
Chapter 1
Introduction
Chapter Goals
After completing this chapter, you will be
able to:
• Explain the reasons for studying statistics
• The word “statistics” derives from Status (Latin), Statista
(Italian) and Statistik (German), each meaning “political state”.
• In ancient times, statistics was used for administrative purposes
only.
• People usually think of statistics as a numerical description.
However, statistics can also refer to a discipline that deals with
making sense of data.
• Thus, ‘statistics’ is defined in two senses: plural and singular.
Plural vs Singular
• Statistics in the Plural Sense: statistics means a
collection of numerical facts.
Example
• Data on human population of a region.
• Infants’ birth weight at a public hospital in three
consecutive months.
• Number of man-hours lost in industry in specific
years.
• Statistics in the Singular Sense: statistics refers to the study
and use of theory and methods for the analysis of data arising
from a random process or phenomenon.
Definition and . . .
Data → Statistical tools → Information
Classification of Statistics
o Descriptive statistics comprises a set of methods to
describe the characteristics of a set of data.
o Inferential statistics proceeds from data
characteristics to making generalizations, estimates,
forecasts, or other judgments based on the data.
o Example: On the last 3 Sundays, a car salesman sold 2,
1, and 0 new cars respectively.
o Descriptive: "The salesman averaged 1 new car sold for
the last 3 Sundays."
o Inferential: "The car salesman never sells more than 2
cars on any Sunday."
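The difference between the two statements can be sketched in a few lines of Python. This is only an illustration with assumed variable names: the code summarizes the three observed Sundays (descriptive), whereas the claim about "any Sunday" goes beyond what these three values can show (inferential).

```python
from statistics import mean

# Cars sold on the last 3 Sundays (values from the example).
sales = [2, 1, 0]

# Descriptive statistics: summarize only the data actually observed.
average_sold = mean(sales)
largest_observed = max(sales)

print(f"Average cars sold per Sunday: {average_sold}")
print(f"Largest number sold on an observed Sunday: {largest_observed}")

# Note: largest_observed describes these 3 Sundays only. Claiming that the
# salesman never sells more than this on ANY Sunday is an inference about
# unobserved Sundays, not a description of the data.
```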
Descriptive Statistics
• Collect data
– e.g., Survey
• Present data
– e.g., Tables and graphs
• Summarize data
– e.g., Sample mean $\bar{X} = \frac{\sum X_i}{n}$
Inferential Statistics
• Estimation
– e.g., Estimate the population
mean weight using the sample
mean weight
• Hypothesis testing
– e.g., Test the claim that the
population mean weight is 65 kg
o In generic terms, the field of statistics provides
some of the most fundamental tools and
techniques of the scientific method:
o forming hypotheses,
o designing experiments and observational
studies,
o gathering data,
o summarizing data,
o drawing inferences from data, i.e., estimation
and testing hypotheses.
1.2 Stages in Statistical Investigation
There are five basic stages for any statistical investigation.
1. Collection of Data: the process of collecting
observations (measurements, survey responses, etc.).
2. Organization of Data: the arrangement of data in a suitable
form. It comprises editing, classifying and tabulating.
3. Presentation of Data: the process of displaying data in a
precise manner using tables, graphs and diagrams.
4. Analysis of Data: the process of systematically applying
statistical and/or logical techniques to describe, illustrate, and
evaluate data.
5. Interpretation of Data: drawing conclusions from the analysis,
in particular generalizing characteristics from the sample to the
population.
1.3: Application, Uses and Limitations of Statistics
• Applications
• Statistics is applied in almost all areas of research, such as:
• Industry – control charts and inspection plans.
• Commerce – demand and supply.
• Agriculture – mean comparison (ANOVA).
• Economics – index numbers, time series and estimation.
• Education – formulation of policies to start new courses.
• Planning – data related to production and consumption.
• Medicine – testing the efficacy of a new drug.
• Modern applications, for example, software engineering.
1.3: Application, Uses …
• Uses of Statistics
• Statistics can be used to express facts about different
situations in numbers.
• Statistics presents messy data in a simple and easily
understandable manner.
• We can make comparisons of facts from data.
• Statistics helps to show existing trends and make future
predictions.
• It is also used to run a successful business: a businessman
should estimate the demand and supply of a commodity based
on relevant data.
Limitations
1.4 Types of Variables and Measurement Scales
• Ordinal
– Stage of disease
– Severity of pain
– Level of satisfaction
• Interval
– Temperature
– Exam scores
• Ratio
– Distance
– Length
– Time until death
– Weight
Chapter Two
Presentation
Chapter Goals
After completing this chapter, you are expected to:
• Explain why we collect data
• Identify sources of data
• Describe the various methods of data collection
• Create and interpret diagrams to describe categorical
variables:
– frequency distribution, bar chart, pie chart, pictogram
• Create and interpret graphs to describe numerical
variables:
– frequency distribution, histogram, ogive, stem-and-leaf
plot
2.1 Methods of Data Collection
• Why do we collect data?
– To answer questions,
– To make decisions, and
– To gain a deeper understanding of some
phenomenon.
• Example
– Does lowering the speed limit reduce the number of
fatal traffic accidents?
– What fraction of students in a college belong to
blood group O?
• Data: a plural noun (the singular form is datum) meaning a
set of known or given facts.
• Data can be collected using a survey or an experiment.
2.1.1 Sources of Data
• Primary
– Data generated by the immediate user(s) of the data.
– Surveys, experiments and observational studies are the
most common methods.
– Tends to require more time and expense than secondary
data.
• Secondary
– Data gathered from another source for a similar or
different purpose.
• Internal sources within the researcher’s organization
• External sources, including governmental, trade,
commercial and internet sources.
Sources of Data . . .
Data
• Categorical (Qualitative): defined categories or groups
– Examples: Data on marital status, cause of death, eye color
• Numerical (Quantitative)
– Discrete. Examples: Data on …
– Continuous. Examples: Data on …
Types of Data . . .
Year Food Education Others Total
2005 5400 1500 5500 12400
2006 5700 2000 6000 13700
2007 5900 1800 6200 13900
2008 6000 2100 5800 13900
Methods of Data Collection
The appropriate method depends on the nature of the
investigation and on the resources available.
1. Direct Observation: The investigator observes the
behavior of the subjects/individuals under study.
– Though costly, it is arguably a good method, as it
reduces the chance of recording incorrect data.
2. Enumeration: Well-trained enumerators ask a selected
group of respondents a set of questions listed in a
schedule.
– Can be time consuming if the coverage area is wide.
Data Collection …
5. Indirect Oral Interview: The researcher contacts third
parties, called witnesses, who are capable of supplying the
necessary information.
– Recommended if the information is of a complex
nature or the informants are not inclined to respond.
6. Mailed Enquiry Method: Letters with a set of
questions are sent to the respondents and the responses are
collected afterwards.
– Recommended if the survey covers a large area and the
respondents are widely scattered.
7. Old Records: The researcher uses data collected by
others and stored in some form, such as books,
newspapers, almanacs or even unpublished sources.
2.2 Methods of Data Presentation
• Data in raw form are usually not easy to use for
decision making.
Class Width = 10
Frequency Distribution …
Frequency Distribution
• Qualitative
• Quantitative
– Ungrouped
– Grouped
• Frequency Distribution: a table that presents data in
classes and shows the number of observations in each class.
• Qualitative FD: a frequency distribution where the data to be
presented are only nominal or ordinal.
• Ungrouped FD: a frequency distribution where each distinct
value in the dataset forms a class of its own.
• Grouped FD: a frequency distribution where several values are
grouped into one class.
Steps in the Construction of Grouped FD
1. Find the range R: the difference between the largest and
smallest values in the raw data.
2. Set the number of classes K, usually between 5 and 20, or
use Sturges’ rule: $K = 1 + 3.322 \log_{10} n$, where n is the
number of observations.
3. Estimate the class width W = R/K; round the estimate to
a convenient value.
4. Determine the lower class limit (LCL) of the first class by
selecting a convenient number that is <= the lowest data value.
Then add the class width to it to get the lower class
limit of the second class. Keep adding until the
desired number of classes is reached.
5.1. If the observations are whole numbers (e.g., 12, 23, 78,
etc.), subtract 1 from the lower class limit of the second
class to get the upper class limit of the first class.
5.2. If the observations have one decimal place (e.g., 1.2,
2.3, 7.8, etc.), subtract 0.1 from the lower class
limit of the second class to get the upper class
limit of the first class.
5.3. If the observations have two decimal places (e.g., 1.32,
2.35, 7.84, etc.), subtract 0.01 from the lower
class limit of the second class to get the upper
class limit of the first class.
6. Count the number of observations falling in each class (the
class frequency) and record it against the corresponding class.
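Steps 2, 4 and 5 can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation: the function names are assumptions, and rounding K to the nearest integer is one possible convention. The `unit` parameter stands for the measurement precision used in step 5 (1, 0.1 or 0.01).

```python
import math

def sturges_classes(n):
    """Number of classes K from Sturges' rule (rounded to the nearest integer,
    which is an assumption about how the rule is applied)."""
    return round(1 + 3.322 * math.log10(n))

def class_limits(first_lcl, width, k, unit=1):
    """Return k (lower, upper) class limits.

    unit is the measurement precision: 1 for whole numbers,
    0.1 for one decimal place, 0.01 for two decimal places.
    """
    limits = []
    for i in range(k):
        lower = first_lcl + i * width      # step 4: keep adding the width
        upper = lower + width - unit       # step 5: next LCL minus one unit
        limits.append((lower, upper))
    return limits

# Illustration with whole-number data: n = 60 observations,
# first lower class limit 10, class width 10.
k = sturges_classes(60)
print(k)
print(class_limits(10, 10, k))
```

For decimal data, floating-point rounding may produce limits such as 1.9000000000000001; rounding the results to the data's precision avoids this.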
Relative and Cumulative FD
• Relative frequency table: a table showing the relative
frequency of each class, i.e., the class frequency divided by the
total number of observations n.
– Relative frequency can be expressed as a
percentage.
• Cumulative frequency (cf): the sum of the frequencies
succeeding or preceding a class k, including the frequency
of class k itself.
– The cumulative relative frequency expresses the same
information as a percentage: multiply the cf by 100%/n.
• Less than cf counts the number of observations less than
or equal to the upper class boundary of a class.
• More than cf is obtained by adding the frequencies of
observations greater than the lower class boundary of a class.
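These definitions translate directly into a short Python sketch. The function names and the three-class frequency list are hypothetical, chosen only to illustrate the arithmetic.

```python
from itertools import accumulate

def relative_frequencies(freqs):
    """Each class frequency divided by the total number of observations n."""
    n = sum(freqs)
    return [f / n for f in freqs]

def less_than_cf(freqs):
    """Running total from the first class down to each class."""
    return list(accumulate(freqs))

def more_than_cf(freqs):
    """Running total from the last class up to each class."""
    return list(accumulate(reversed(freqs)))[::-1]

def cumulative_relative_percent(freqs):
    """Less than cf expressed as a percentage (cf * 100 / n)."""
    n = sum(freqs)
    return [cf * 100 / n for cf in less_than_cf(freqs)]

# Hypothetical class frequencies, for illustration only.
freqs = [2, 3, 5]
print(relative_frequencies(freqs))
print(less_than_cf(freqs))
print(more_than_cf(freqs))
print(cumulative_relative_percent(freqs))
```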
Example
• Consider the following data
30 40 41 33 70 51 37 10 31 21 60 44 63 72 23 37 65 14
25 28 64 39 17 74 53 34 51 27 43 45 33 16 23 68 47 32
36 19 48 49 67 60 45 54 44 30 15 38 22 46 61 25 29 55
48 49 35 13 37 36
• Prepare i) absolute frequency distribution;
ii) relative frequency distribution;
iii) less than and more than cumulative
frequency distributions.
Example …
$R = 74 - 10 = 64$, $n = 60$
Using Sturges’ Rule:
$K = 1 + 3.322 \log_{10} 60 = 1 + 3.322(1.778151) = 6.9070 \approx 7$
$W = 64/7 = 9.14 \approx 10$
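With K = 7 and W = 10, the absolute frequency distribution can be tallied as in the sketch below. Starting the first class at 10 (the smallest observation) is an assumption. It is a convenient choice consistent with step 4 of the construction, but any convenient number not exceeding 10 would also work.

```python
data = [
    30, 40, 41, 33, 70, 51, 37, 10, 31, 21, 60, 44, 63, 72, 23, 37, 65, 14,
    25, 28, 64, 39, 17, 74, 53, 34, 51, 27, 43, 45, 33, 16, 23, 68, 47, 32,
    36, 19, 48, 49, 67, 60, 45, 54, 44, 30, 15, 38, 22, 46, 61, 25, 29, 55,
    48, 49, 35, 13, 37, 36,
]

k = 7           # number of classes from Sturges' rule
width = 10      # class width after rounding 9.14 up to a convenient value
first_lcl = 10  # assumed lower class limit of the first class

# Tally each observation into its class by integer division.
counts = [0] * k
for x in data:
    counts[(x - first_lcl) // width] += 1

for i, count in enumerate(counts):
    lower = first_lcl + i * width
    upper = lower + width - 1  # whole-number data: next LCL minus 1
    print(f"{lower}-{upper}: {count}")
```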
Bar Chart
Simple Bar Chart
Year No. of students
2000 3005
2001 3567
2002 3800
2003 4300
2004 3650
2005 5000
Two Way Bar Chart
• To represent data having both negative and positive
values.
• Example
Year 1990 1991 1992 1993
Net Migration 50,000 -5,000 20,000 40,000
Multiple Bar Chart
• To make comparison between two or more variables.
• Example: A number of accounting firms were audited, and
classified according to size status (I [large], II [medium] and
III [small]) and the degree to which income-changing
accounting practices were used in preparing clients' tax
returns.
Degree of Change
Size No changes Some changes Total
Large 23 36 59
Medium 52 61 113
Small 22 21 43
Total 97 118 215
Subdivided Bar Chart
• To show and compare the breakup of one variable into
several components.
Year 2000 2001 2002 2003 2004
No. of females 800 824 856 768 900
No. of males 1389 2450 1245 1655 1445
Total 2189 3274 2101 2423 2345
Broken Bar Chart
• To represent data having broad variations in value.
• One observation may be far larger than the others.
• If we use a scale proportional to the value (frequency), then it
will be almost impossible to see the bars of small values.
• Example
• Represent the data given below using a suitable chart.
Year 1990 1991 1992 1993 1994 1995
Value 899 543 787 35323 121 234
[Figure: the data drawn as a simple bar chart and as a broken bar chart]
Pie Diagram
• A circular diagram where a circle is divided into sectors with
areas proportional to the corresponding components.
• Pie diagrams are useful for displaying the relative frequency
distribution of a categorical variable.
University        Addis Ababa   Gondar   Jimma   Total
No. of students   8000          6000     6000    20000
Sector angles:
Addis Ababa = [8000/20000] × 360° = 144°
Gondar = [6000/20000] × 360° = 108°
Jimma = [6000/20000] × 360° = 108°
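The sector angles can be computed for any set of categories with a few lines of Python. This is a minimal sketch using the enrolment figures from the table; the dictionary and variable names are illustrative.

```python
# Number of students per university (from the table).
students = {"Addis Ababa": 8000, "Gondar": 6000, "Jimma": 6000}

total = sum(students.values())

# Each sector's angle is its share of the total, scaled to 360 degrees.
angles = {name: count * 360 / total for name, count in students.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
```

The angles always sum to 360 degrees, which is a quick check on the arithmetic.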
Pictograms
• Pictograms are useful to present data using pictures.
• Example: Represent the following data using a
pictogram.
Department           Accounting   Statistics   Computer Science   Chemistry
Number of students   150          200          250                200
• Accounting: ● ● ●
• Computer Science: ● ● ● ● ●
• Chemistry: ● ● ● ●
• Statistics: ● ● ● ●
Key: ● = 50 students
Stem and Leaf Plot
• A stem and leaf plot is a special table where each
data value is split into a stem (the first digit or
digits) and a leaf (usually the last digit).
Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Stem Leaf
21 is shown as 2 1
38 is shown as 3 8
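For two-digit whole numbers like these, the split into stem and leaf is just integer division by 10. The sketch below assumes non-negative integers, and the function name is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (all but the last digit),
    keeping the last digit as the leaf."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```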
Stem and . . .
• Give a stem-and-leaf plot for the following data.
3.584, 3.615, 3.586, 3.712, 3.823, 3.616, 3.580, 3.888,
3.617, 3.584, 3.882, 3.912, 3.910, 3.712, 3.580, 3.917
Stem | Leaf
3.58 | 0 0 4 4 6
3.61 | 5 6 7
3.71 | 2 2
3.82 | 3
3.88 | 2 8
3.91 | 0 2 7
Key: 3.58|4 represents 3.584
2.2.4 Graphical Presentation of Data
Histogram Example
Daily High Temperature   Frequency
10 but less than 20      3
20 but less than 30      6
30 but less than 40      5
40 but less than 50      4
50 but less than 60      2
[Figure: Histogram of daily high temperature; x-axis: temperature in degrees, y-axis: frequency; no gaps between bars]
Frequency Polygon
• This is a line graph of class frequencies plotted against
class marks.
• End points must be joined to the x-axis (y = 0) at mid
points of empty classes: one before the first class and the
other after the last class.
• They serve the same purpose as histograms, but are
especially helpful for comparing sets of data.
• Example
• 1. Represent the following data using a frequency polygon.
Class 14.5-24.5 24.5-34.5 34.5-44.5 44.5-54.5 54.5-64.5
Frequency 3 4 8 6 7
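The points to plot for this example can be worked out as in the sketch below: compute each class mark, then add one empty class before the first and one after the last so that the polygon closes on the x-axis. The variable names are illustrative, and the plotting itself is left out.

```python
# Class boundaries and frequencies from the example.
boundaries = [(14.5, 24.5), (24.5, 34.5), (34.5, 44.5), (44.5, 54.5), (54.5, 64.5)]
freqs = [3, 4, 8, 6, 7]

width = boundaries[0][1] - boundaries[0][0]

# Class mark = midpoint of the class boundaries.
marks = [(lower + upper) / 2 for lower, upper in boundaries]

# Close the polygon at the midpoints of the empty classes on either side.
points = (
    [(marks[0] - width, 0)]
    + list(zip(marks, freqs))
    + [(marks[-1] + width, 0)]
)

for x, y in points:
    print(x, y)
```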
Frequency Polygon . . .
• 2. The following frequency distributions refer to test scores
for 28 students in two examinations. Plot frequency polygons
for the two datasets.
Score   0-5   5-10   10-15   15-20   20-25
Test1   3     4      8       6       7
Test2   1     2      5       12      8
Ogive
o The ogive is a frequency polygon (line plot) of the
cumulative frequency or the relative cumulative frequency.
o Example
Price in Birr   Frequency   Less than cf   More than cf
10-20           2           2              26
20-30           3           5              24
30-40           6           11             21
40-50           8           19             15
50-60           5           24             7
60-70           2           26             2
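The two cumulative columns, and the points at which each ogive is plotted, can be reproduced with the sketch below. It assumes the usual convention that the less than ogive is plotted against upper class boundaries and the more than ogive against lower class boundaries; the variable names are illustrative.

```python
from itertools import accumulate

# Classes (price in Birr) and frequencies from the table.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [2, 3, 6, 8, 5, 2]

# Less than cf: running total from the first class.
less_than = list(accumulate(freqs))
# More than cf: running total from the last class.
more_than = list(accumulate(reversed(freqs)))[::-1]

# Points for the two ogives.
less_than_points = [(upper, cf) for (_, upper), cf in zip(classes, less_than)]
more_than_points = [(lower, cf) for (lower, _), cf in zip(classes, more_than)]

print(less_than_points)
print(more_than_points)
```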