Introduction To Statistics - Doc1
The module has ten chapters. The first two chapters deal with general Introductions, basically
defining some basic terms, and with Methods of Data Representation.
The next two chapters are about Descriptive Statistics, dealing with Measures of Central Tendency
(collectively known as averages) and Measures of Variation or Dispersion.
Chapters 5 and 6, Probability and Probability Distributions, deal with Elementary Probability
Theory and two common Discrete Probability Distributions, the Binomial and the Poisson, as well as some
Continuous Probability Densities: the Normal, Chi-square, t and F distributions, which play indispensable
roles in statistical theory and inference.
Chapters 7, 8 and 9 discuss Sampling Distributions, Estimation and Hypothesis Testing, and
Two-Sample Inferences, covering one mean, two means, one proportion and two proportions. The last
chapter deals with Simple Linear Regression and Correlation.
Objectives
CHAPTER 1
INTRODUCTION
CONTENTS
INTRODUCTION
What is Statistics? What is the need to study statistics? How is it employed? These are only some of the
basic questions one has to raise about the field of statistics. This chapter will provide only partial answers to
these questions.
The chapter has six sub-sections that define some important terms, starting with the word “Statistics” itself,
treated as both singular and plural, along with its classifications, applications, uses and limitations; the stages
in any statistical study; scales of measurement; and an overview of methods of data collection.
Objectives:
Statistics has been defined in two ways: some writers define it as ‘statistical data’, i.e. a numerical
statement of facts, while others define it as ‘statistical methods’, that is, the complete body of
principles and techniques used in collecting and analyzing such data.
Classifications of Statistics
Descriptive statistics: refers to the procedures used to collect, organize and summarize
masses of data. The frequency distribution, measures of central tendency such as the mean
and median, and measures of dispersion such as the range and standard deviation belong to this
category of statistics.
Inferential statistics: includes the methods used to find out something about a
population based on a sample. In this form of statistical analysis, descriptive statistics is
linked with probability theory so that an investigator can generalize the results of a study.
2. Organization of data
This is summarization of the data in some meaningful way, like in the form of a table.
4. Analysis of data
This is the process of extracting relevant information from the summarized data, mainly through
the use of elementary mathematical operations.
5. Interpretation of data
The final step is drawing conclusions from the data collected. A valid conclusion must be
drawn on the basis of the analysis. A high degree of skill and experience is necessary for
interpretation.
Uses of statistics
The main function of statistics is to enlarge our knowledge of complex phenomena. The following
are some uses of statistics:
1. It presents facts in a definite and precise form.
2. Data reduction.
3. Measuring the magnitude of variations in data.
4. Furnishes a technique of comparison of different sets of data.
5. Estimating unknown population characteristics.
6. Formulating and testing hypotheses.
7. Studying the relationship between two or more variables.
8. Forecasting future events.
Limitations of statistics
Statistics with all its wide application in every sphere of human activity has its own limitations.
Some of them are given below.
Since statistics is basically a science that deals with sets of numerical data, it is applicable only to
those subjects of enquiry which can be expressed in terms of quantitative measurements.
Qualitative phenomena like honesty, poverty, beauty, intelligence, etc. cannot be expressed
numerically, and statistical analysis cannot be applied to them directly. Nevertheless, statistical
techniques may be applied indirectly by first reducing the qualitative expressions to accurate
quantitative terms. For example, the intelligence of a group of students can be studied on the basis
of their marks in a particular examination.
Statistics does not give any specific importance to individual items; in fact, it deals with an
aggregate of objects. Individual items, when they are taken individually, do not constitute
statistical data and do not serve any purpose for any statistical enquiry.
It is well known that the mathematical and physical sciences are exact. Statistical laws, however, are
not exact; they are only approximations. Statistical conclusions are not universally true; they are
true only on average.
Statistics must be used only by experts; otherwise, statistical methods are the most dangerous
tools in the hands of the inexpert. The use of statistical tools by inexperienced and untrained
persons might lead to wrong conclusions. Statistics can easily be misused by quoting wrong
figures.
Statistical methods do not provide a complete solution to problems, because problems have to be
studied against the background of the country's culture, philosophy or religion.
Thus a statistical study should be supplemented by other evidence.
Data classification can be defined as a method of grouping data according to their similarities;
it is used to study the characteristics of the entire population on the basis of their classes.
The classification of data is generally done on a geographical, chronological, qualitative or
quantitative basis.
i) In geographical classification, data are arranged according to places, areas or regions.
ii) In chronological classification, data are arranged according to their time references.
iii) In qualitative classification, data are arranged according to attributes like sex, marital
status, educational standard, etc.
iv) In quantitative classification, data are arranged according to certain characteristics
that have been measured or counted.
Example 1.1
Data collected based on sex, marital status, educational standard, and so on give rise to qualitative
data.
b) Quantitative data
In quantitative classification, data are arranged according to certain characteristic that has been
counted or measured.
Quantitative variables are again divided into two groups: discrete and continuous.
Discrete data: data whose values are obtained by counting and are described by integers only; the
possible values of such variables are 0, 1, 2, …, that is, they assume only counting numbers.
Example 1.2
The number of students in Dire Dawa University, the number of private cars in Dire Dawa and the
number of books are some examples that produce discrete data.
Continuous data: quantitative figures which can take any numerical value, including fractions.
Their values are obtained by measurement.
Example 1.3
Weight of a person in kg, height, temperature and so on give rise to continuous data.
Definition: A characteristic which shows variability or takes on different values is called a variable.
Quantitative variable – is the one which leads to quantitative data. Hence we can talk about a discrete
variable (yielding discrete data) and a continuous variable (yielding continuous data).
Qualitative variable – similarly, leads to qualitative data.
Proper knowledge about the nature and type of data to be dealt with is essential in order to
specify and apply the proper statistical method for their analysis and inferences. Measurement
scale refers to the property of value assigned to the data based on the properties of order,
distance and fixed zero.
The scales of measurement also show which mathematical operations and statistical
analyses are permissible on the values of the variable.
Accordingly, there are four scales of measurement: nominal, ordinal, interval and ratio scales.
Example 1.4
b) Ordinal scale
This refers to the variables whose values can be ordered or ranked but the difference between data
values either can’t be determined or is meaningless. Comparison is restricted. Ranking and
counting are the only mathematical operations to be done on the values given to these variables.
Example 1.5
ii) Beauty classified as beautiful, more beautiful and most beautiful is ordinal.
c) Interval scale
These variables have the properties of the ordinal scale plus the difference between two values
is constant. There is no true zero origin; that is, zero doesn’t show absence in this case.
Example 1.6
The temperature of a given area may be 0 °C, but this does not mean that there is no heat at all; it
simply indicates a very low temperature.
d) Ratio scale
Ratio scale variables have the properties of the interval scale but in this case there is true zero
origin. That is, zero shows absence of something in this case.
All mathematical operations, such as division, multiplication, logarithms and powers, can be
applied to the values of such variables.
Example 1.7
Income of a person, amount of yield from a plot of land, expenditure and consumption amount.
In all of these cases, a zero value indicates absence: for example, if the yield is zero, there is
no yield at all.
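To see why division is allowed on the ratio scale but not on the interval scale, the short Python sketch below compares a ratio before and after a change of unit. The exchange rate `BIRR_PER_DOLLAR` is a made-up value used only for illustration.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit (an interval-scale change of unit)."""
    return c * 9 / 5 + 32

# Interval scale: 20 °C versus 10 °C
ratio_c = 20 / 10                                                  # 2.0
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)    # 68 / 50 = 1.36
# The ratio changes with the unit, so "twice as hot" has no meaning.

# Ratio scale: incomes of 2000 and 1000 birr
BIRR_PER_DOLLAR = 50.0   # hypothetical exchange rate, illustration only
ratio_birr = 2000 / 1000
ratio_usd = (2000 / BIRR_PER_DOLLAR) / (1000 / BIRR_PER_DOLLAR)
# Both ratios are 2.0: zero means "no income", so the ratio survives a change of unit.

print(ratio_c, round(ratio_f, 2), ratio_birr, ratio_usd)
```

Because the Celsius and Fahrenheit zeros are placed arbitrarily, the ratio of two temperatures depends on the unit, whereas the ratio of two incomes does not.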
Depending on the source of data, there are two methods of data collection:
a) Primary data
Data measured or collected by the investigator or the user directly from the source.
• Two activities are involved: planning and measuring.
a) Planning:
Identify source and elements of the data.
Decide whether to consider sample or census.
b) Secondary data
These are data gathered or compiled from published and unpublished sources or files.
When our source is secondary data, we need to check:
The type and objectives of the situations.
The purpose for which the data were collected and their compatibility with the present
problem.
The appropriateness of the nature and classification of the data to our problem.
That there are no biases or misreporting in the published data.
Note that data which are primary for one may be secondary for the other.
a) It saves money
It is cheaper to assess a sample of size n than a population of size N (n<N).
b) It saves labor
Fewer staff (enumerators, supervisors, data editors) are required in a sample
survey than in a census.
c) It saves time
Since the size is small, it reduces data collection and processing time.
d) It minimizes disturbance
If the process of data collection affects the society, sampling is the only alternative
for data collection.
CHAPTER SUMMARY
Statistics is the science that deals with the methods of data collection, organization,
presentation, analysis and interpretation of the results of the analysis.
There are two classifications of statistics: descriptive and inferential.
Descriptive statistics includes those procedures used to summarize complex data. These
include graphical methods, measures of central tendency and measures of dispersion.
Inferential statistics deals with taking samples and reaching conclusions about a
population, which include estimation and test of hypothesis.
Variables are classified into quantitative and qualitative. Quantitative variables are those
variables whose values can be expressed numerically. The values of qualitative
variables, however, cannot be expressed numerically.
Planning and measurement are the two activities involved while working with primary
data.
The two main kinds of data collection are the census survey and the sample survey. A census
represents complete enumeration, whereas a sample survey means taking part of the
population so as to infer about the whole population from the results of the sample.
Exercises on Chapter 1
CHAPTER 2
CONTENTS
2.1 FREQUENCY DISTRIBUTIONS
INTRODUCTION
In this chapter we will deal with the classification and presentation of data by using
frequency distributions and different types of graphs. Having collected and edited the
data, the next important step is to organize them, that is, to present them in a readily
comprehensible, condensed form that aids in drawing inferences. It is also necessary
that like items be separated from unlike ones.
OBJECTIVES
2.1 FREQUENCY DISTRIBUTIONS
The presentation of data is broadly classified into the following two categories:
• Tabular presentation
• Diagrammatic and Graphic presentation.
The process of arranging data into classes or categories according to their similarities is
technically called classification.
Classification is a preliminary step that prepares the ground for proper presentation of data.
Raw data: recorded information in its originally collected form, whether it is count or
measurement, is referred to as raw data.
Frequency: the number of times a value of the variable occurs in the data.
Frequency array: an array in which the individual items or values of a variable are given
along with their corresponding frequencies.
Example 2.1
A social worker collected the following data on marital status for 25 persons. (M=married,
S=single, W=widowed, D=divorced). Prepare a frequency distribution.
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital
status (M, S, D and W). These types will be used as the classes of the distribution. We follow the
procedure below to construct such a frequency distribution.
Step 1: Prepare a table as shown below.
Percentages are not necessarily part of frequency distribution but they can be added since
they are used in certain types of diagrammatic representations such as pie charts.
Step 5: Find the total for column (3) and (4).
Combining all the steps, one can construct the following frequency distribution.
Class   Tally     Frequency   Percent
M       //// /    6           24
S       //// //   7           28
D       //// //   7           28
W       ////      5           20
Total             25          100
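The tally-and-count procedure of Example 2.1 can be reproduced with Python's `collections.Counter`. This is a minimal sketch that prints each class with its frequency and percentage.

```python
from collections import Counter

# Marital-status data of Example 2.1 (M=married, S=single, W=widowed, D=divorced)
rows = [
    "M S D W D",
    "S S M M M",
    "W D S M M",
    "W D D S S",
    "S W W D D",
]
data = " ".join(rows).split()   # flatten the 5 x 5 grid into 25 observations

counts = Counter(data)          # tally: class -> frequency
n = len(data)
for status in ["M", "S", "D", "W"]:
    f = counts[status]
    print(f"{status}  {f:2d}  {100 * f / n:5.1f}%")
print(f"Total {n}")
```

Counting the grid directly in this way is a useful check on a hand tally.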
In numerical frequency distributions, the data are classified according to numerical size. They are used to
summarize interval and ratio data. Numerical frequency distributions may be discrete (ungrouped) or
continuous (grouped), depending on whether the variable is discrete or continuous.
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Solution:
Step 1: Find the range, Range=Max-Min=90-60=30.
Step 2: Make a table as shown below.
Step 3: Tally the data.
Step 4: Count the frequency and record in the last column.
Mark   Tally   Frequency
60     //      2
62     /       1
63     /       1
65     /       1
70     ////    4
74     /       1
75     /       1
76     //      2
80     ///     3
85     ///     3
90     /       1
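Steps 1 to 4 above can be sketched in Python as follows. The list `marks` is simply the 20 marks of this example typed in row by row.

```python
from collections import Counter

marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

data_range = max(marks) - min(marks)   # Step 1: 90 - 60 = 30
freq = Counter(marks)                  # Steps 3-4: tally and count
for mark in sorted(freq):              # Step 2: one row per distinct mark
    print(mark, "/" * freq[mark], freq[mark])
```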
3. The classes must be mutually exclusive. This means that no data value can fall into
two different classes.
4. The classes must be all inclusive or exhaustive. This means that all data values must
be included.
6. The classes must be equal in width. The exception here is the first or last class. It is
possible to have a "below ..." or "... and above" class. This is often used with ages.
4. Find the class width by dividing the range by the number of classes and rounding up.
There are two things to watch out for here. First, you must round up, not off: normally 3.2
would be rounded to 3, but in rounding up it becomes 4. Second, if the range divided by the
number of classes gives an integer value (no remainder), then you can either add one to the
number of classes or add one to the class width. Sometimes you're locked into a certain number
of classes because of the instructions.
5. Pick a suitable starting point less than or equal to the minimum value. The starting
point is called the lower limit of the first class. Continue to add the class width to this
lower limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower limit of the second
class. Then continue to add the class width to this upper limit to find the rest of the
upper limits.
7. Find the boundaries by subtracting 0.5U from the lower limits and adding 0.5U
to the upper limits. The boundaries are also half-way between the upper limit
of one class and the lower limit of the next class.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying to accomplish, it
may not be necessary to find out the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies.
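Steps 4 to 7 can be sketched in Python, assuming integer data recorded to the nearest unit U and a first lower limit equal to the minimum observation. `build_classes` is a hypothetical helper name; when the range divides evenly it takes the "add one to the class width" option. The minimum 6 and maximum 40 in the usage line are illustrative values only, chosen so that six classes give the limits 6 - 11, ..., 36 - 41 that appear in Example 2.3.

```python
import math

def build_classes(minimum, maximum, k, unit=1):
    """Return (width, class limits, class boundaries) for k classes.

    Assumes integer data recorded to the nearest `unit` (U in the text).
    """
    data_range = maximum - minimum
    width = math.ceil(data_range / k)      # step 4: round up, not off
    if data_range % k == 0:                # exact division: add one to the width
        width += 1
    lower = [minimum + j * width for j in range(k)]       # step 5: lower limits
    upper = [lo + width - unit for lo in lower]           # step 6: upper limits
    half = unit / 2
    boundaries = [(lo - half, up + half) for lo, up in zip(lower, upper)]  # step 7
    return width, list(zip(lower, upper)), boundaries

width, limits, bounds = build_classes(6, 40, 6)
print(width)    # class width
print(limits)   # (lower, upper) class limits
print(bounds)   # (lower, upper) class boundaries
```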
Example 2.3
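Step 5: Select the starting point; let it be the minimum observation, 6. Adding the class width 6
repeatedly gives the lower class limits 6, 12, 18, 24, 30 and 36.
Step 6: Subtracting U = 1 from the lower limit of the second class gives 11, the upper limit of the
first class; then 11, 17, 23, 29, 35 and 41 are the upper class limits.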
So, combining steps 5 and 6, one can construct the following classes:
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
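Subtracting 0.5U = 0.5 from the lower limit and adding 0.5 to the upper limit of the first class gives
its boundaries, 5.5 – 11.5. Then continue adding W (the class width) to both boundaries to obtain the
remaining boundaries. Doing so, one obtains: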
Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5
Step 9: Write the numeric values for the tallies in the frequency column.
Class limit  Class boundary  Class mark  Tally  Freq.  Cf (less than type)  Cf (more than type)  rf  rcf (less than type)
* 100.
Example 2.4
Draw a suitable diagram to represent the following population in a town.
[Pie chart: population of the town by group (Men, Women, Girls, Boys), with slices labelled 15%, 25%, 40% and 20%]
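Each pie-chart sector is drawn with an angle proportional to its share of 360°. A minimal Python sketch, applied to the four percentages shown in Example 2.4 (`pie_angles` is a hypothetical helper name):

```python
def pie_angles(values):
    """Convert shares (percentages or raw amounts) into sector angles in degrees."""
    total = sum(values)
    return [360 * v / total for v in values]

shares = [15, 25, 40, 20]      # the four percentages shown in Example 2.4
angles = pie_angles(shares)    # [54.0, 90.0, 144.0, 72.0]
print(angles)
```

Because the function divides by the total, the same helper also works on raw amounts (for example, money spent per item), not only on percentages.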
Bar Charts
- A set of bars (thick lines or narrow rectangles) representing some magnitude over
time or space.
- They are useful for comparing aggregates over time or space.
- Bars can be drawn either vertically or horizontally.
- There are different types of bar charts. The most common being:
Simple bar chart
Deviation or two way bar chart
Broken bar chart
Component or subdivided bar chart.
-The bars are thick lines (narrow rectangles) of the same width. The magnitude of a
quantity is represented by the height or length of the bar.
Example 2.5
The following data represent sales by product, 1957-1959, of a given company for three
products A, B and C.
[Figure: simple bar chart of sales for products A, B and C]
[Figure: component (subdivided) bar chart of sales in $ by product (A, B, C); horizontal axis: years of production, 1957, 1958 and 1959]
Example 2.7
Draw a multiple bar chart to represent the sales by product from 1957 to 1959.
Solution:
[Figure: multiple bar chart of sales in $ for products A, B and C; horizontal axis: years of production, 1957, 1958 and 1959]
Activity 2.1
Draw a diagram presenting sales by product in 1958, assuming that there was also a product D whose
sales in 1958 were $100,000.
Example 2.8
The following table summarizes the Biostatistics mid-exam scores (out of 35 marks) of 38
students.
A histogram of these data would look like the following:
Frequency Polygon
Frequency Polygon depicts a frequency distribution for discrete or continuous numeric data.
Frequency polygons are graphical devices for understanding the shapes of distributions.
A Histogram can easily be changed to Frequency Polygon by joining the mid points of the
top of the adjacent rectangles of the Histogram with a line. It is also possible to draw
Frequency Polygon without drawing Histogram.
Example 2.9
The following frequency distribution represents the ages of 60 patients at Gambella hospital.
Represent the data by a frequency polygon.
Finally, we plot the class midpoints (on the X axis) against the frequency of each
class (on the Y axis) and connect adjacent points with straight lines.
Note that two artificial class marks at both ends with frequencies of zero have been
added to “tie down” the graph on the X-Axis.
This is a graph showing the cumulative frequency (the less than or more than type) plotted
against upper or lower class boundaries, respectively. That is, class boundaries are plotted
along the horizontal axis and the corresponding cumulative frequencies are plotted along the
vertical axis. The points are then joined by a free hand curve.
1. Less than ogive: a line graph obtained by plotting the less than cumulative frequencies
against the upper boundaries of their respective class intervals.
2. More than ogive: a line graph obtained by plotting the more than cumulative frequencies
against the lower boundaries of their respective class intervals.
Class limit  F  Class boundary  LCB  UCB
3-7 3 2.5-7.5 2.5 7.5
8-12 4 7.5-12.5 7.5 12.5
13-17 6 12.5-17.5 12.5 17.5
18-22 13 17.5-22.5 17.5 22.5
23-27 17 22.5-27.5 22.5 27.5
28-32 6 27.5-32.5 27.5 32.5
33-37 1 32.5-37.5 32.5 37.5
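The points of both ogives for the table above can be computed in Python with `itertools.accumulate`. The sketch below only produces the (boundary, cumulative frequency) pairs; drawing the curves is left to any plotting tool.

```python
from itertools import accumulate

# Class table above (limits 3-7, 8-12, ..., 33-37)
lcb = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5]    # lower class boundaries
ucb = [7.5, 12.5, 17.5, 22.5, 27.5, 32.5, 37.5]   # upper class boundaries
freq = [3, 4, 6, 13, 17, 6, 1]

# Less than cumulative frequency: running total from the first class
less_than = list(accumulate(freq))
n = less_than[-1]

# More than cumulative frequency: observations at or above each lower boundary
more_than = [n - cf + f for cf, f in zip(less_than, freq)]

less_than_points = list(zip(ucb, less_than))   # plotted for the less than ogive
more_than_points = list(zip(lcb, more_than))   # plotted for the more than ogive
print(less_than_points)
print(more_than_points)
```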
[Figure: less than ogive; cumulative frequency (0 to 50) plotted against the upper class boundaries 7.5 to 37.5]
[Figure: more than ogive; cumulative frequency (0 to 50) plotted against the lower class boundaries]
[Figure: ogive plotted against the class boundaries]
There are two types of frequency distribution: grouped and ungrouped frequency
distribution.
Class mark, class boundary, cumulative frequency and relative frequency are some of the
terms used in describing a frequency distribution.
Histograms, frequency polygons and ogives are usually drawn for quantitative data.
A pie chart is a circular chart that is used to display the percentage of the total number
of observations falling in each category.
Bar charts are usually used for count data. The different types of bar charts include the
simple bar chart, deviation bar chart, component bar chart and multiple bar chart.
We have to know the types of graphs and apply them in their appropriate places.
32 38 30 24 24 37 39 34 35 31
23 35 29 34 21 35 35 24 23 26
30 38 25 37 25 39 25 30 27 32
33 30 29 32 33 35 29 33 19 39
22 33 31 20 29 27 31 22 23 36
5. A company has 25 vehicles. The table below shows the summary of yearly fuel
consumption of the vehicles.
Fuel consumption (000’s of liters)  1-1.9  2-2.9  3-3.9  4-4.9  5-5.9  6 and above
Number of vehicles                  2      5      6      7      4      1
i) Give a) The lower class limit of the 3rd class. b) Class boundaries of the 2nd class.
c) Class midpoint of the 4th class. d) Width of the 1st class.
e) How many of the vehicles consumed: i) at least 1950 liters but less than
2950 liters? ii) Less than 3950 liters? iii) At most 4900 liters?
g) What percent of the vehicles consumed: i) At least 2950 liters?
ii) Less than 5950 liters? iii) More than 1950 liters?
6. The table below shows the weight distribution of 25 students in a basketball team.
Weight (kg)   Number of students (cumulative)
Below 50.5    3
Below 55.5    10
Below 60.5    16
Below 65.5    20
Below 70.5    22
Below 75.5    25
iii) How many of the students weigh more than 65.5 kg? Between 55.5 and 70.5 kg?
7) The following table shows the types of cars manufactured by a certain company during
1972-1975.
Cars / Year   1972   1973   1974   1975
Toyota        400    300    380    450
Nissan        260    340    350    390
Isuzu         330    310    445    470
Construct
8) A recent study showed that a typical Ethiopian car owner incurs the following expenses,
on the average, when he leases a car for 3 years. Draw a pie chart to portray this data.
Expenditure item Amount ($)
Lease amount 4,500
Gasoline 1,350
Insurance 1,800
Maintenance 1,350
CHAPTER 3
MEASURES OF CENTRAL TENDENCY
INTRODUCTION
In the previous chapter, you were introduced to the classification and presentation of
data using graphical methods. Graphical methods are important for data analysis; however,
they are inappropriate for statistical inference, since it is difficult to judge the similarity between a
sample frequency histogram and the corresponding population histogram. The two most common
numerical descriptive measures are measures of central tendency and measures of variability.
That is, we seek to describe the center of the distribution and also how the measurements
vary about that center. So, this chapter introduces you to the methods used
to find the average or representative values in a given data set.
Objectives:
Compute and interpret the arithmetic mean, harmonic mean, geometric mean,
median, mode, quartiles, deciles, percentiles and so on.
Measures of central tendency are measures of the location of the middle or the center of a
distribution. The definition of "middle" or "center" is purposely left somewhat vague so that
the term "central tendency" can refer to a wide variety of measures.
-The tendency of statistical data to concentrate at a certain value is called central tendency,
and the various methods that determine the actual value at which the data tend to concentrate
are called measures of central tendency. One of the most important objectives of statistical
analysis is to get one single value that describes the characteristics of the entire data. Such a
value is called the central value or average.
-When we want to make comparison between groups of numbers it is good to have a single
value that is considered to be a good representative of each group. This single value is called
the average of the group.
-Averages are also called measures of central tendency.
-An average which is representative is called typical average and an average which is not
representative and has only a theoretical value is called a descriptive average.
Objectives:
To comprehend the data easily, i.e. to condense the mass of data into one single
value.
To facilitate comparison.
To make further statistical analysis.
The expression $\sum_{i=1}^{N} X_i$ is read, "the sum of X sub i from i equals 1 to N." It means
"add up all the numbers."
Example 3.1
Suppose that the following were scores made on the first homework assignment by five
students in the class: 5, 7, 7, 6 and 8. For this set of five numbers, where N = 5, the
summation could be written:
$$ \sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 5 + 7 + 7 + 6 + 8 = 33 $$
The "i=1" in the bottom of the summation notation tells where to begin the sequence of
summation. If the expression were written with "i=2", the summation would start with the
second number in the set.
The "N" in the upper part of the summation notation tells where to end the sequence of
summation. If there were only three scores, then the summation and example would be:
$$ \sum_{i=1}^{3} X_i = X_1 + X_2 + X_3 = 5 + 7 + 7 = 19 $$
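The start and end indices of the summation sign map directly onto a slice of a list. This minimal Python sketch (the helper `summation` is a hypothetical name) keeps the 1-based indices of the notation and uses the five homework scores of Example 3.1.

```python
scores = [5, 7, 7, 6, 8]   # X_1, ..., X_5 from Example 3.1

def summation(x, start=1, end=None):
    """Sum x_start + ... + x_end using the 1-based indices of sigma notation."""
    if end is None:
        end = len(x)                  # default upper index: N
    return sum(x[start - 1:end])      # Python lists are 0-based, hence start - 1

print(summation(scores))            # i = 1 to 5
print(summation(scores, end=3))     # i = 1 to 3
print(summation(scores, start=2))   # i = 2 to 5
```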
PROPERTIES OF SUMMATION
4.
5.
Example 3.2
a) b) c) d) e)
f) g) h) g)
Note: No single measure satisfies all the above conditions; we choose the one that satisfies
most of the properties!
There are several different measures of central tendency, each with its own advantages and
disadvantages, including:
• The Mean
• The Median
• The Mode
The choice among these averages depends upon which one best fits the property under
discussion.
Mean: There are three types of mean, each suitable for a particular type of data. They
are the arithmetic mean, the geometric mean and the harmonic mean.
When the data are arranged or given in the form of a frequency distribution, i.e. there
are k variate values such that a value $X_i$ has a frequency $f_i$ ($i = 1, 2, \ldots, k$), the
arithmetic mean will be
$$ \bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} $$
where, for grouped data, $X_i$ = the class mark of the ith class and $f_i$ = the frequency of the ith class.
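The formula can be sketched in Python. As an illustration it reuses the class table from the ogive example in Chapter 2 (limits 3-7, ..., 33-37 with frequencies 3, 4, 6, 13, 17, 6, 1); `frequency_mean` is a hypothetical helper name.

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of a frequency distribution: sum(f * X) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

limits = [(3, 7), (8, 12), (13, 17), (18, 22), (23, 27), (28, 32), (33, 37)]
freqs = [3, 4, 6, 13, 17, 6, 1]
class_marks = [(lo + up) / 2 for lo, up in limits]   # 5.0, 10.0, ..., 35.0

mean = frequency_mean(class_marks, freqs)   # 1045 / 50
print(mean)
```

Using class marks treats every observation in a class as if it sat at the class midpoint, so for grouped data the result is an approximation to the mean of the raw values.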
Example 3.3
2) The distribution of age at first marriage of 130 males was as given below
Age in years (X): 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29.
2) Indirect Method
1) If we subtract an arbitrary constant from each observation, the mean is also reduced
by the constant value
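The original data are transformed using some assumed mean (working mean), denoted
by A. If $X_i$ denotes the original value and $d_i = X_i - A$ its deviation from A, then
$$ \bar{X} = A + \frac{\sum_{i=1}^{k} f_i d_i}{\sum_{i=1}^{k} f_i} $$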
Show!
Show!
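The indirect (assumed mean) method can be checked numerically: whatever working mean A is chosen, the result is the same. This Python sketch reuses the class marks and frequencies of the ogive example in Chapter 2; `assumed_mean` is a hypothetical helper name.

```python
def assumed_mean(values, freqs, A):
    """Indirect method: mean = A + sum(f * d) / sum(f), with d = X - A."""
    d = [x - A for x in values]                       # deviations from the working mean
    return A + sum(f * di for f, di in zip(freqs, d)) / sum(freqs)

class_marks = [5, 10, 15, 20, 25, 30, 35]
freqs = [3, 4, 6, 13, 17, 6, 1]

print(assumed_mean(class_marks, freqs, A=20))   # 20 + 45/50
print(assumed_mean(class_marks, freqs, A=25))   # 25 - 205/50
```

Choosing A near the middle of the data keeps the deviations small, which is the practical reason for the method when computing by hand.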
Activity 3.3
1) Suppose that the deviation of the observation from the assumed mean of 7 are
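2. The sum of the squared deviations of a set of items from their mean is the minimum, i.e.
$$ \sum_{i=1}^{n} (X_i - \bar{X})^2 \le \sum_{i=1}^{n} (X_i - A)^2 \quad \text{for any value } A $$
4. If a wrong figure has been used when calculating the mean, the correct mean can be
obtained without repeating the whole process, using:
$$ \text{Correct mean} = \frac{n\bar{X}_{\text{wrong}} - (\text{wrong value}) + (\text{correct value})}{n} $$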
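Properties 2 and 4 can be tried out in Python on the five homework scores of Example 3.1. The helper names are hypothetical, and the data-entry slip (the score 8 entered as 3) is invented purely for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)

def sum_sq_dev(xs, a):
    """Sum of squared deviations of xs from the value a."""
    return sum((x - a) ** 2 for x in xs)

scores = [5, 7, 7, 6, 8]
m = mean(scores)   # 6.6

# Property 2: no trial value A gives a smaller sum of squared deviations than the mean
trial_values = [5, 6, 6.5, 7, 8]
mean_is_minimum = all(sum_sq_dev(scores, m) <= sum_sq_dev(scores, a) for a in trial_values)

def corrected_mean(wrong_mean, n, wrong_value, correct_value):
    """Property 4: fix a mean computed with one wrong figure."""
    return (n * wrong_mean - wrong_value + correct_value) / n

# Hypothetical slip: 8 was entered as 3, giving a wrong mean of 28/5 = 5.6
fixed = corrected_mean(5.6, 5, 3, 8)
print(m, mean_is_minimum, fixed)
```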
Example 3.4
In 2002/03, the average salaries of elementary school teachers in three cities were Birr 24,000,
20,000 and 30,000. If there were 600, 400 and 800 elementary school teachers, respectively, find the
weighted average salary of all the elementary school teachers in the three cities.
Solution.
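One way to carry out the computation for Example 3.4 is sketched below in Python: each city's average salary is weighted by its number of teachers.

```python
salaries = [24_000, 20_000, 30_000]   # average salary per city (Birr)
teachers = [600, 400, 800]            # weights: number of teachers per city

total_pay = sum(w * x for w, x in zip(teachers, salaries))   # 46,400,000
weighted = total_pay / sum(teachers)                          # divided by 1,800 teachers
print(round(weighted, 2))
```

The weighted average (about Birr 25,777.78) is higher than the simple average of the three city means (Birr 24,666.67) because the highest-paying city also has the most teachers.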
Activity 3.4:
a) A student obtained the following percentage in an examination: English 60, Biology 75,
Mathematics 63, Physics 59 and Chemistry 55. Find the student’s weighted arithmetic
mean if weights 1, 2, 1, 3 and 3, respectively, are allotted to the subjects.
Merits:
• It is rigidly defined.
• It is based on all observations.
• It is suitable for further mathematical treatment.
• It is a stable average, i.e. it is affected relatively little by fluctuations of sampling.
• It is easy to calculate and simple to understand.
Demerits:
• It is affected by extreme observations.
• It cannot be used in the case of open-end classes.
• It cannot be determined by inspection.
• It cannot be used when dealing with qualitative characteristics, such as intelligence,
honesty and beauty.
• It can be a number which does not exist in the series of data.
• Sometimes it leads to wrong conclusions if the details of the data from which it is
obtained are not available.
• It gives high weight to high extreme values and less weight to low extreme values.
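The geometric mean of N positive values $X_1, X_2, \ldots, X_N$ is
$$ \text{GM} = \sqrt[N]{X_1 X_2 \cdots X_N} = (X_1 X_2 \cdots X_N)^{1/N} $$
where N = the number of observations.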
Example 3.5
Calculate the geometric mean for the following.
2, 3, 4, 6
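In the case of grouped data, the mid-values of the class intervals are taken as $X_i$.
Taking logarithms, $\log \text{GM}$ becomes the average of the $\log X_i$ values, and the formula for the
geometric mean is:
$$ \log \text{GM} = \frac{1}{N}\sum_{i=1}^{N} \log X_i, \qquad \text{GM} = \operatorname{antilog}\left(\frac{1}{N}\sum_{i=1}^{N} \log X_i\right) $$
In the case of a frequency distribution where each $X_i$ occurs $f_i$ times ($i = 1, 2, \ldots, k$), we have:
$$ \text{GM} = \operatorname{antilog}\left(\frac{\sum_{i=1}^{k} f_i \log X_i}{\sum_{i=1}^{k} f_i}\right) $$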
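The logarithmic formulas translate directly into Python. This minimal sketch (`geometric_mean` is a hypothetical helper; Python's own `statistics.geometric_mean` covers the unweighted case) applies them to the data of Example 3.5.

```python
import math

def geometric_mean(xs, freqs=None):
    """GM = antilog of the (frequency-weighted) mean of log10(X)."""
    if freqs is None:
        freqs = [1] * len(xs)           # raw data: every value counted once
    if any(x <= 0 for x in xs):
        raise ValueError("the geometric mean needs positive values")
    log_mean = sum(f * math.log10(x) for x, f in zip(xs, freqs)) / sum(freqs)
    return 10 ** log_mean               # antilog

gm = geometric_mean([2, 3, 4, 6])   # fourth root of 2*3*4*6 = 144, about 3.464
print(gm)
```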
The population of a country in 1980 was 2 million and in 1990 it was 22 million. What was
the average annual increase during this period?
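Assuming the population grew by the same percentage every year (compound growth), the average yearly growth factor is the geometric mean of the ten yearly ratios, i.e. the 10th root of 22/2. A short Python sketch of that reading:

```python
p_1980, p_1990 = 2.0, 22.0          # population in millions
years = 1990 - 1980

growth_factor = (p_1990 / p_1980) ** (1 / years)   # GM of the yearly growth ratios
rate = growth_factor - 1                           # about 0.271, i.e. roughly 27.1% a year
print(round(rate, 4))
```

Compounding this rate for ten years starting from 2 million reproduces the 22 million of 1990, which is the property that makes the geometric mean the right average for growth ratios.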
Note: The geometric mean is less affected by extreme values than the arithmetic mean and is
useful as a measure of central tendency for some positively skewed distributions.
The H.M. is the reciprocal of the arithmetic mean of the reciprocals of the observations of a
set. It is a suitable measure of central tendency when the data pertain to speeds, rates and
time.
Let $X_1, X_2, \ldots, X_N$ be N variate values in a set; then the harmonic mean is given by:
$$ \text{HM} = \frac{N}{\sum_{i=1}^{N} \frac{1}{X_i}} $$
Example 3.7
Example 3.8
Average Speed
If the data are arranged in the form of a frequency distribution in which an observation $X_i$ has
frequency $f_i$ ($i = 1, 2, \ldots, k$), the harmonic mean is given by:
$$ \text{HM} = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{k} \frac{f_i}{X_i}} $$
It fulfils almost all the properties of a good measure of central tendency, except that it cannot be
calculated when any observation is zero. Its main advantage is that it gives more
weight to small values and less weight to large values.
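A minimal Python sketch of the two harmonic-mean formulas (`harmonic_mean` is a hypothetical helper name). The trip in the usage line, covering equal distances at 40 km/h and at 60 km/h, is an invented illustration of an average-speed problem.

```python
def harmonic_mean(xs, freqs=None):
    """HM = sum(f) / sum(f / X); undefined when an observation is zero."""
    if freqs is None:
        freqs = [1] * len(xs)
    if any(x == 0 for x in xs):
        raise ValueError("the harmonic mean is undefined when an observation is zero")
    return sum(freqs) / sum(f / x for x, f in zip(xs, freqs))

# Hypothetical trip: equal distances driven at 40 km/h and at 60 km/h
avg_speed = harmonic_mean([40, 60])   # 48 km/h, not the arithmetic mean of 50
print(avg_speed)
```

The harmonic mean is correct here because the distances, not the times, are equal: the slower leg takes longer and so should pull the average down more.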
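Given two positive values $x_1$ and $x_2$, the following relationship exists between the HM, GM and AM:
$$ \text{AM} = \frac{x_1 + x_2}{2}, \quad \text{GM} = \sqrt{x_1 x_2}, \quad \text{HM} = \frac{2x_1x_2}{x_1 + x_2}, \qquad \text{GM}^2 = \text{AM} \times \text{HM} $$
and in general $\text{AM} \ge \text{GM} \ge \text{HM}$.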
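The relationship can be checked numerically; the pair 2 and 8 below is an arbitrary illustrative choice.

```python
import math

x1, x2 = 2.0, 8.0
am = (x1 + x2) / 2             # 5.0
gm = math.sqrt(x1 * x2)        # 4.0
hm = 2 * x1 * x2 / (x1 + x2)   # 3.2

print(am, gm, hm, am * hm)     # AM x HM equals GM squared (16)
```

Multiplying the AM and HM formulas cancels the factor $(x_1 + x_2)/2$, leaving $x_1 x_2$, which is exactly $\text{GM}^2$.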
Example 3.9
a) Find the mode of 5, 3, 5, 8, 9
Mode =5
b) Find the mode of 8, 9, 9, 7, 8, 2, and 5.
The data are bimodal: the modes are 8 and 9.
c) Find the mode of 4, 12, 3, 6, and 7.
These data have no mode.
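The three cases of Example 3.9 (one mode, two modes, no mode) can be reproduced with a small Python helper (`modes` is a hypothetical name). Following case (c), it reports no mode when every value occurs only once.

```python
from collections import Counter

def modes(xs):
    """Return the sorted list of most frequent values, or [] if no value repeats."""
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1:              # every value occurs once: no mode, as in case (c)
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5, 3, 5, 8, 9]))          # case (a)
print(modes([8, 9, 9, 7, 8, 2, 5]))    # case (b): bimodal
print(modes([4, 12, 3, 6, 7]))         # case (c): no mode
```

Python's `statistics.multimode` returns the most frequent values as well, but for data in which every value is distinct it returns all of them rather than an empty list, so the "no mode" convention has to be added as above.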
The mode of a set of numbers X1, X2, …, Xn is usually denoted by .
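If the data are given in the form of a continuous frequency distribution, the mode is defined as:
$$ \text{Mode} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, w $$
where $L_{mo}$ = the lower class boundary of the modal class (the class with the highest frequency),
$\Delta_1$ = the frequency of the modal class minus the frequency of the class immediately before it,
$\Delta_2$ = the frequency of the modal class minus the frequency of the class immediately after it, and
$w$ = the class width.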
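A minimal Python sketch of the grouped-data mode formula (`grouped_mode` is a hypothetical helper name). As an illustration it uses the class table given with Example 3.12 later in this chapter (classes 40-44, ..., 70-74), whose lower class boundaries are 39.5, 44.5, ... because the limits are whole numbers.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode = L + d1 / (d1 + d2) * w for a continuous frequency distribution."""
    j = max(range(len(freqs)), key=lambda i: freqs[i])     # modal class (first maximum)
    f_prev = freqs[j - 1] if j > 0 else 0                  # class before the modal class
    f_next = freqs[j + 1] if j + 1 < len(freqs) else 0     # class after the modal class
    d1 = freqs[j] - f_prev
    d2 = freqs[j] - f_next
    return lower_boundaries[j] + d1 / (d1 + d2) * width

lcb = [39.5, 44.5, 49.5, 54.5, 59.5, 64.5, 69.5]
freqs = [7, 10, 22, 15, 12, 6, 3]
mo = grouped_mode(lcb, freqs, 5)   # 49.5 + 12/19 * 5, about 52.66
print(round(mo, 2))
```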
Example 3.10
Activity 3.5
The following is the distribution of the size of certain farms selected at random from a
district. Calculate the mode of the distribution.
In a distribution, the median is the value of the variable which divides the data into two equal
halves.
In an ordered series of data, the median is the observation lying exactly in the middle of the
series. It is the middlemost value in the sense that the number of values less than the median
is equal to the number of values greater than it.
Let X1, X2, …, Xn be the observations; then the numbers arranged in ascending order will be
X[1], X[2], …, X[n], where X[i] is the ith smallest value.
Here, we find that X[1] ≤ X[2] ≤ … ≤ X[n].
Median is denoted by .
Example 3.11
Find the median of the following data.
a) 3,8,4,7,7,5,6,8,7,4,6,8,9,7,6
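Ordered: 3,4,4,5,6,6,6,7,7,7,7,8,8,8,9
Median = 7 (the 8th of the 15 ordered values)
b) 3,4,4,5,6,6,6,7,7,7,7,8,8,8
Median = (6 + 7)/2 = 6.5 (the average of the 7th and 8th of the 14 ordered values)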
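The rule used in Example 3.11 (middle value for odd n, average of the two middle values for even n) can be sketched in Python as follows; the standard library's `statistics.median` follows the same rule.

```python
def median(xs):
    """Median of raw data: sort, then take the middle value(s)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # the ((n + 1)/2)th value
    return (s[mid - 1] + s[mid]) / 2     # mean of the (n/2)th and (n/2 + 1)th values

a = [3, 8, 4, 7, 7, 5, 6, 8, 7, 4, 6, 8, 9, 7, 6]
b = [3, 4, 4, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8]
print(median(a), median(b))
```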
Activity 3.6
a) Actual waiting time for the first job on the selected sample of nine people having different
field of specializations was given below.
Waiting time (in month):11.6, 11.3, 10.7, 18.0, 3.3, 9.2, 8.3, 3.8, 6.8
Calculate the median of the waiting time?
b) The export of agricultural products in million dollars from a country during eight quarters
in 1974 and 1975 was, 29.7, 16.6, 2.3, 14.1, 36.6, 18.7, 3.5, 21.3.
Find the median of the given set of values?
Example 3.12
Solution:
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
Quartiles
The first quartile Q1 is the value below which the first quarter of the given ordered data lies.
The second quartile Q2 is the value that divides the given ordered data into two
equal parts.
The third quartile Q3 is the value below which the first three quarters of the given ordered data lie.
Quartiles are the measurements that divide the series into 4 equal parts. The median is
the 2nd quartile. The first quartile (Q1) is the value of the item which divides the lower
half of the distribution into two equal parts. The third quartile (Q3) is the value of the
item that divides the upper half of the distribution into two equal parts.
For raw (ungrouped) data, first arrange the n observations in increasing order of magnitude. Then the ith quartile is given by
$$Q_i = X_{[i(n+1)/4]}, \quad i = 1, 2, 3.$$
If i(n+1) is not divisible by 4, let q be the quotient and r the remainder of the division of i(n+1) by 4; then
$$Q_i = X_{[q]} + \frac{r}{4}\left(X_{[q+1]} - X_{[q]}\right).$$
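The quotient-and-remainder rule can be sketched in Python as below. The name `positional_quantile` is hypothetical, and clamping a position that falls outside 1..n to the smallest or largest observation is our own assumption, since the text does not say what to do there. Changing `parts` to 10 or 100 gives the decile and percentile versions of the same rule.

```python
def positional_quantile(values, i, parts):
    """i-th quantile of raw data split into `parts` equal parts
    (parts=4: quartiles, 10: deciles, 100: percentiles).

    Position i(n+1)/parts = q + r/parts, so the value is
    X[q] + (r/parts) * (X[q+1] - X[q]) with 1-based positions X[1] <= ... <= X[n].
    """
    x = sorted(values)
    n = len(x)
    q, r = divmod(i * (n + 1), parts)  # quotient and remainder of i(n+1) / parts
    if q < 1:                          # position before X[1]: clamp (assumption)
        return x[0]
    if q >= n:                         # position at or beyond X[n]: clamp (assumption)
        return x[-1]
    return x[q - 1] + (r / parts) * (x[q] - x[q - 1])  # x[q-1] is X[q], x[q] is X[q+1]


data = [2, 4, 6, 8, 10, 12]  # hypothetical data, n = 6
print([positional_quantile(data, i, 4) for i in (1, 2, 3)])  # [3.5, 7.0, 10.5]
```

For Q1, i(n+1) = 7 gives q = 1 and r = 3, so Q1 = X[1] + (3/4)(X[2] − X[1]) = 2 + 1.5 = 3.5. Library routines such as `numpy.percentile` offer several interpolation conventions, so their results need not match this rule exactly.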
Example 3.13
Find the first, second and third quartiles of the following exam results (an exam worth 10%) of 15 students:
4, 8, 9, 7, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9
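For grouped data, the ith quartile is given by
$$Q_i = L_{q_i} + \frac{\frac{iN}{4} - cf}{f_{q_i}} \times w, \quad i = 1, 2, 3,$$
where
Lqi = the lower class boundary of the class in which the ith quartile is located
cf = the cumulative frequency of the class immediately preceding the class containing Qi
fqi = the frequency of the class containing Qi
w = the class width
N = sample size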
Example 3.15
Example 3.16
Find the first and third quartiles of the following data.
DECILES
The values that divide the data set into ten equal parts are called deciles. They are denoted by D1, D2, …, D9, respectively.
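For raw (ungrouped) data arranged in increasing order, the ith decile is given by
$$D_i = X_{[i(n+1)/10]}, \quad i = 1, 2, \ldots, 9.$$
If i(n+1) is not divisible by 10, let q be the quotient and r the remainder of the division of i(n+1) by 10; then
$$D_i = X_{[q]} + \frac{r}{10}\left(X_{[q+1]} - X_{[q]}\right).$$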
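PERCENTILES
The values that divide the data set into a hundred equal parts are called percentiles. They are denoted by P1, P2, …, P99.
For raw (ungrouped) data, first arrange the n observations in increasing order of magnitude; then
$$P_i = X_{[i(n+1)/100]}, \quad i = 1, 2, \ldots, 99,$$
using the same quotient-and-remainder interpolation as for quartiles and deciles (dividing i(n+1) by 100).
For grouped data,
$$P_i = L_{p_i} + \frac{\frac{iN}{100} - cf}{f_{p_i}} \times w,$$
where
Lpi = the lower class boundary of the class in which the ith percentile is located
cf = the cumulative frequency of the class immediately preceding the class containing Pi
fpi = the frequency of the class containing Pi
w = the class width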
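The grouped-data formulas for quartiles, deciles and percentiles share one pattern, L + ((iN/parts − cf)/f) × w, which the following Python sketch implements. The function name `grouped_quantile` and its argument layout are assumptions for illustration; it assumes equal class widths and classes listed in increasing order.

```python
def grouped_quantile(lower_boundaries, width, freqs, i, parts):
    """i-th quantile of grouped data: L + (i*N/parts - cf) / f * w.

    parts=4 gives quartiles, 10 deciles, 100 percentiles.
    """
    N = sum(freqs)
    target = i * N / parts                  # e.g. 7N/10 for D7, 90N/100 for P90
    cf = 0                                  # cumulative frequency of the preceding classes
    for L, f in zip(lower_boundaries, freqs):
        if f > 0 and cf + f >= target:      # first class whose cumulative frequency reaches the target
            return L + (target - cf) / f * width
        cf += f
    raise ValueError("quantile lies outside the distribution")


# The 40-44, ..., 70-74 frequency table from this chapter (N = 75, width 5)
boundaries = [39.5, 44.5, 49.5, 54.5, 59.5, 64.5, 69.5]
frequencies = [7, 10, 22, 15, 12, 6, 3]
print(round(grouped_quantile(boundaries, 5, frequencies, 2, 4), 2))  # median (Q2): 54.16
print(grouped_quantile(boundaries, 5, frequencies, 7, 10))           # D7: 59.0
```

For D7, 7N/10 = 52.5 is first reached in the 55-59 class (cumulative frequency 54, preceding cf = 39), so D7 = 54.5 + (52.5 − 39)/15 × 5 = 59.0.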
Example 3.17
Calculate i) the 7th decile and ii) the 90th percentile from the following table.
i) Here 7N/10 = 345.8. The value 345.8 is first reached at the cumulative frequency 351; hence the class 190-200 is the 7th decile class.
Then D7 = 199.5.
ii) Here 90N/100 = 444.6. The value 444.6 is first reached at the cumulative frequency 465; hence the 90th percentile class is 220-230, so that we have:
Measures of central tendency are statistical measures that give a single value used to represent a set of values in a data set.
Arithmetic mean is the sum of all the values in a data set divided by the total number of observations.
Median is the middle value after the observations are arranged in order of magnitude.
Mode is the value that occurs with the highest frequency in a data set.
Harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the numbers.
Geometric mean is the nth root of the product of n numbers.
Quartiles are the values that divide a given data set into four equal parts.
Deciles are the values that divide a given data set into ten equal parts.
Percentiles are the values that divide a given data set into a hundred equal parts.
Different measures of central tendency have different properties and applications; we are therefore required to apply them in their appropriate places.
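The three means in the summary can be computed side by side with this Python sketch. The function names are our own; the geometric mean here is defined only for positive values and the harmonic mean only for nonzero values. For positive data, HM ≤ GM ≤ AM always holds.

```python
import math
import statistics


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # nth root of the product, computed through logarithms to avoid overflow;
    # defined here only for positive values
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    # reciprocal of the arithmetic mean of the reciprocals (values must be nonzero)
    return len(xs) / sum(1 / x for x in xs)


data = [2, 8]  # hypothetical values
print(arithmetic_mean(data), round(geometric_mean(data), 10), harmonic_mean(data))  # 5.0 4.0 3.2
# The standard library offers equivalents (geometric_mean needs Python 3.8+).
print(statistics.geometric_mean(data), statistics.harmonic_mean(data))
```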
1. The arithmetic mean of two numbers is 13 and their geometric mean is 12. Find the two numbers.
2. The following table shows the distribution of marks (out of 50) of 100 students in a certain exam. The median and mode are given to be 25 and 24, respectively. Calculate the missing frequencies and then the arithmetic mean of the data.
Number of students 14 ? 27 ? 15
3. The mean weight of 150 students in a certain class is 60 kg. The mean weight of boys
is 70 kg and that of girls is 55 kg. Find the number of boys and girls in the class.
4. The ratios of teachers to students in four colleges are 1:8, 2: 15, 1:10 and 2:21. Find
the average ratio of teachers to students.
5. An entrance exam for a job consists of three subjects, English, Mathematics and Office Management, with weights of 20%, 30% and 50%, respectively. Find the average score of a candidate who got 70%, 60% and 50%, respectively, in the three exams.
11. In a surveying class there are 10 freshmen, 6 second-year and 12 third-year students. If the freshmen averaged 70 in an examination, the second-year students averaged 75 and the third-year students averaged 85, find the mean grade for the entire class.
12. The profit of a company increased by 25% during 1992, increased by 40% during 1993, decreased by 20% in 1994 and increased by 10% during 1995. Find the average growth in the profit level over the four-year period.
13. In a 400-meter athletic competition, a participant covers the distance as given below. Find the average speed.
First 80 meters 10
Last 80 meters 10
16. The marks secured out of 100 by a group of students in a school are
given below.
CONTENTS
4.1 INTRODUCTION AND OBJECTIVES OF MEASURING VARIATION
4.2 ABSOLUTE AND RELATIVE MEASURES
4.3 TYPES OF MEASURES OF VARIATION
4.4 MOMENTS, SKEWNESS AND KURTOSIS
INTRODUCTION
In our society, people usually elect a representative who conveys the interests of most of them. Sometimes, however, the representative may convey interests that deviate from those of some of the members, which raises the question: how well does the representative represent them? Similarly, in statistics, we may seek to know how well an average represents the whole set of data.
Objectives
A measure of central tendency alone does not adequately describe a set of observations unless all the observations are the same. So we need some additional information, such as:
1) The extent to which the items in a particular distribution are scattered around the central tendency, i.e., a measure of dispersion.
2) The direction of the scatter, i.e., whether more items are concentrated towards higher or lower values: a measure of skewness.
3) The extent to which the distribution is more peaked or more flat-topped than the normal distribution, i.e., a measure of kurtosis.
Measures of dispersion
The scatter or spread of the items of a distribution is known as dispersion or variation. In other words, the degree to which numerical data tend to spread about an average value is called the dispersion or variation of the data.
Measures of dispersion are statistical measures that quantify the extent to which data are dispersed or spread out.
The measures of dispersion that are expressed in terms of the original unit of a series are termed absolute measures. Such measures are not suitable for comparing the variability of two distributions that are expressed in different units of measurement or that have different average sizes.
Relative measures of dispersion are ratios or percentages of an absolute measure of dispersion to an appropriate measure of central tendency, and are thus pure numbers independent of the units of measurement. For comparing the variability of two distributions (even if they are measured in the same unit), we compute a relative measure of dispersion instead of an absolute measure.
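Various measures of dispersion are in use. The most commonly used measures of dispersion are the range, the interquartile range and quartile deviation, the mean deviation, the variance, the standard deviation and the coefficient of variation.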
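The range is the largest score minus the smallest score. It is a quick and dirty measure of variability, although when a test is given back to students they very often wish to know the range of scores. Because the range is greatly affected by extreme scores, it may give a distorted picture of the variability.
For ungrouped data: R = L − S, where L is the largest and S the smallest observation.
For grouped data: R = UCL − LCL, where UCL is the last upper class limit and LCL is the first lower class limit.
The relative range (coefficient of range) is RR = (L − S)/(L + S).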
Merits:
• It is rigidly defined.
• It is easy to calculate and simple to understand.
Demerits:
• It is not based on all observations.
• It is highly affected by extreme observations.
• It is affected by fluctuations in sampling.
• It is not amenable to further algebraic treatment.
• It cannot be computed in the case of an open-ended distribution.
• It is very sensitive to the size of the sample.
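A minimal Python sketch of the range and the relative range follows; the function name is hypothetical, and it assumes RR = (L − S)/(L + S), the usual coefficient of range, for ungrouped data.

```python
def range_and_relative_range(values):
    """Range R = L - S and relative range RR = (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    r = largest - smallest
    rr = r / (largest + smallest)  # unit-free, so differently scaled data can be compared
    return r, rr


print(range_and_relative_range([3, 7, 5, 9]))  # (6, 0.5)
```

Because only the two extreme values enter the calculation, changing any middle observation leaves both results unchanged, which illustrates the first demerit listed above.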
Activity 4.1
1) Find the range (R) and the relative range (RR) of each of the following, and then identify which data set is more dispersed.
a) The monthly incomes of 10 workers, Xi: 347, 420, 500, 600, 696, 710, 835, 850 and 900.
b) The following age distribution.
Class frequency
6- 10 35
11- 15 23
16- 20 15
21- 25 12
26- 30 9
31- 35 6
2. If the range and relative range of a series are 4 and 0.25, respectively, find the:
a) Smallest observation
b) Largest observation
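The interquartile range (IQR) is the difference between the upper quartile (Q3) and the lower quartile (Q1) of a given group: IQR = Q3 − Q1. It is a suitable measure of dispersion when the data contain extreme values, and it is also a good measure of dispersion for distributions with open-ended classes. Half of the IQR, Q.D = (Q3 − Q1)/2, is called the quartile deviation (semi-interquartile range).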
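The following Python sketch computes the IQR and the quartile deviation with the raw-data quartile rule of the previous chapter (position i(n+1)/4, interpolating on the remainder). The function names are hypothetical, and clamping out-of-range positions is our own assumption. The example data deliberately contain one extreme value (100) to show that the IQR ignores it, while the range would be 98.

```python
def positional_quartile(values, i):
    """Q_i using the position i(n+1)/4 with quotient q and remainder r."""
    x = sorted(values)
    n = len(x)
    q, r = divmod(i * (n + 1), 4)
    if q < 1:
        return x[0]          # clamp below X[1] (assumption)
    if q >= n:
        return x[-1]         # clamp above X[n] (assumption)
    return x[q - 1] + (r / 4) * (x[q] - x[q - 1])


def iqr_and_quartile_deviation(values):
    q1 = positional_quartile(values, 1)
    q3 = positional_quartile(values, 3)
    iqr = q3 - q1
    return iqr, iqr / 2      # Q.D is half the interquartile range


print(iqr_and_quartile_deviation([2, 4, 6, 8, 10, 12, 100]))  # (8.0, 4.0)
```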
Example 4.1
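If xi, i = 1, 2, …, n, with frequencies fi, is the frequency distribution, then the mean deviation from the mean is given by
$$M.D(\bar{X}) = \frac{\sum_{i=1}^{n} f_i \left| x_i - \bar{X} \right|}{\sum_{i=1}^{n} f_i}$$
where $|x_i - \bar{X}|$ represents the modulus, or absolute value, of the deviation $x_i - \bar{X}$, i.e., its negative sign is ignored.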
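Mean deviation from median
$$M.D(\tilde{X}) = \frac{\sum_{i=1}^{n} f_i \left| x_i - \tilde{X} \right|}{\sum_{i=1}^{n} f_i}$$
Since the mean deviation is based on all the observations, it is a better measure of dispersion than the range or the quartile deviation.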
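Both mean deviations can be sketched in Python with a single helper that takes the centre as a parameter. The name `mean_deviation` and the data are hypothetical. The median here is taken from the discrete values listed f_i times each; for grouped data one would use the grouped median formula instead. The mean deviation is always smallest when taken about the median, which the output illustrates.

```python
import statistics


def mean_deviation(values, freqs, centre):
    """Sum of f_i * |x_i - centre| divided by sum of f_i."""
    n = sum(freqs)
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / n


values = [10, 20, 30, 40]   # hypothetical values (or class marks)
freqs = [2, 3, 4, 3]
n = sum(freqs)
mean = sum(f * x for x, f in zip(values, freqs)) / n
# median of the frequency distribution, found by listing each value f_i times
median = statistics.median([x for x, f in zip(values, freqs) for _ in range(f)])

print(round(mean_deviation(values, freqs, mean), 3))    # 8.889
print(round(mean_deviation(values, freqs, median), 3))  # 8.333
```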
Example 4.2
Calculate i) the quartile deviation (Q.D) and ii) the mean deviation (M.D) from the mean and from the median for the following data:
Freq. 6 5 8 15 7 6 3
Solution:
652 -8 659.2
i) Here N=50 ,
Mean, marks
Median =
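Population Variance
Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Population variance (computational form): $\sigma^2 = \dfrac{\sum_{i=1}^{N} X_i^2}{N} - \mu^2$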
Sample Variance
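One would expect the sample variance to simply be the population variance with the population mean replaced by the sample mean. However, one of the major uses of statistics is to estimate the corresponding parameter, and this formula has the problem that the estimated value tends, on average, to underestimate the population variance. Dividing by n − 1 instead of n corrects this bias:
Sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
Sample variance (computational form): $s^2 = \dfrac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}$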
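The difference between the divisor N (population) and n − 1 (sample) is easy to see in a short Python sketch. The function names are our own; the data 5, 10, 12, 17 are the values from the worked table in this section (mean 11, sum of squared deviations 74).

```python
import statistics


def population_variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)          # divide by N


def sample_variance(xs):
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)  # divide by n - 1


data = [5, 10, 12, 17]
print(population_variance(data), round(sample_variance(data), 4))  # 18.5 24.6667
print(statistics.pvariance(data), statistics.variance(data))       # library equivalents
```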
1) The variance removes most of the shortcomings of the measures of dispersion discussed before it.
2) The main demerit of the variance is that its unit is the square of the unit of measurement of the variate values. Generally this value is large, which makes it difficult to judge the magnitude of the variation.
are known we can correct this. We can use the following formula:
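Let n1, X̄1, S1² and n2, X̄2, S2² be the sizes, means and variances of two groups, let X̄c = (n1X̄1 + n2X̄2)/(n1 + n2) be the combined mean, and let d1 = X̄1 − X̄c and d2 = X̄2 − X̄c;
then the combined variance (sometimes called the pooled variance) is given by:
$$S_c^2 = \frac{n_1\left(S_1^2 + d_1^2\right) + n_2\left(S_2^2 + d_2^2\right)}{n_1 + n_2}$$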
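A minimal Python check of the combined-variance formula follows. It assumes each group's variance is computed with divisor n (the population form); with the n − 1 form the formula needs adjusting. The function name and the two groups are hypothetical. The result matches the variance computed directly from the merged data.

```python
import statistics


def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """Variance of two merged groups from each group's size, mean and
    divide-by-n variance: [n1(v1 + d1^2) + n2(v2 + d2^2)] / (n1 + n2)."""
    mean_c = (n1 * mean1 + n2 * mean2) / (n1 + n2)   # combined mean
    d1, d2 = mean1 - mean_c, mean2 - mean_c
    return (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / (n1 + n2)


group1 = [1, 2, 3]        # hypothetical data
group2 = [4, 5, 6, 7]
v = combined_variance(
    len(group1), statistics.mean(group1), statistics.pvariance(group1),
    len(group2), statistics.mean(group2), statistics.pvariance(group2),
)
print(v, statistics.pvariance(group1 + group2))  # both equal 4 (up to rounding)
```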
b) is also .
c) is
Standard Deviation
There is a problem with variances. Recall that the deviations were squared, which means that the units were also squared. To get back to the units of the original data values, the square root must be taken: the standard deviation is σ = √σ² for a population and s = √s² for a sample.
b) is also .
Xi 5 10 12 17 Total
(Xi − X̄)² 36 1 1 36 74
(Here X̄ = 44/4 = 11.)
Activity 4.2
i) The data are given in the form of a frequency distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
ii) The mean and the standard deviation of a set of numbers are 500 and 10, respectively.
a) If 10 is added to each of the numbers in the set, what will be the variance and standard deviation of the new set?
b) If each of the numbers in the set is multiplied by −5, what will be the variance and standard deviation of the new set?
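The coefficient of variation, C.V = (S/X̄) × 100%, expresses the standard deviation as a percentage of the mean and is therefore a pure number. Hence, in situations where the two series either have different units of measurement or have means that differ considerably in size, the coefficient of variation should be used as a measure of dispersion.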
Properties of C.V
1) It is one of the most widely used measures of dispersion, because it is independent of the units of measurement.
2) The smaller the value of the C.V, the more consistent the data, and vice versa.
3) For field experiments, the C.V is generally reported; a low C.V indicates more reliable experimental findings.
Example 4.4
Consider the distribution of yields (per plot) of two paddy varieties. For the first variety, the mean and S.D are 60 kg and 10 kg, respectively. For the second variety, the mean and S.D are 50 kg and 9 kg, respectively.
C.V1 = (10/60) × 100% = 16.67% and C.V2 = (9/50) × 100% = 18%.
This shows that the variability in the first variety is less than that in the second variety.
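The coefficient of variation is a one-line computation. The following Python sketch, with a function name of our own choosing, reproduces the two paddy varieties of Example 4.4.

```python
def coefficient_of_variation(sd, mean):
    """C.V = S / mean * 100, a unit-free percentage."""
    return sd / mean * 100


cv1 = coefficient_of_variation(10, 60)   # first variety
cv2 = coefficient_of_variation(9, 50)    # second variety
print(f"{cv1:.2f}% {cv2:.2f}%")          # 16.67% 18.00%
```

Although the second variety has the smaller standard deviation, it has the larger C.V, because its mean is also smaller.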
Activity 4.3
Two distributions, A and B, have means of 80 inches and 20 kg and standard deviations of 10 inches and 1.5 kg, respectively. Which distribution has greater variability?
Chebyshev's Theorem
Chebyshev's theorem, developed by the Russian mathematician P. L. Chebyshev, specifies the minimum proportion of the data that lie within a given number of standard deviations of the mean.
For any set of data (population or sample) and any constant k greater than one, the proportion of the data that must lie within k standard deviations on either side of the mean is at least
$$1 - \frac{1}{k^2}.$$
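The Python sketch below compares Chebyshev's lower bound with the proportion actually observed in a small, hypothetical data set (mean 5, population S.D 2). The function names are our own. The bound is a guarantee for any data set, so the observed proportion can only be equal to it or larger.

```python
import statistics


def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def proportion_within(xs, k):
    """Observed proportion of xs with |x - mean| <= k * sd (population sd)."""
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return sum(1 for x in xs if abs(x - mean) <= k * sd) / len(xs)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, population S.D 2
for k in (1.5, 2, 3):
    print(k, round(chebyshev_bound(k), 4), proportion_within(data, k))
```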
• Z = (X − X̄)/S gives the deviation of X from the mean in units of standard deviation; it tells us how many standard deviations a given value lies above or below the mean.
• It also helps in hypothesis testing
• It is used to compare two observations coming from different groups.
Two groups of children were trained to perform a certain task for a month and then tested to find out which group learns the task faster. The average time taken to perform the task was 10.4 minutes with an S.D of 1.2 minutes for the first group, and 11.9 minutes with an S.D of 1.3 minutes for the second group. Child A from group 1 took 9.2 minutes, while child B from group 2 took 9.3 minutes. Who was faster in performing the task relative to his/her group?
Group I Group II
Mean = 10.4 Mean = 11.9
S.D = 1.2 S.D = 1.3
XA = 9.2 XB = 9.3
ZA = (9.2 − 10.4)/1.2 = −1 ZB = (9.3 − 11.9)/1.3 = −2
These values indicate that the time taken by child A is one S.D below the average time taken by his/her group, whereas the time taken by child B is two S.D below the mean time of his/her group. Child B is therefore faster in performing the task relative to his/her group.
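The standard-score comparison can be sketched in Python with the group means and standard deviations of the example above; the function name `z_score` is our own.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_a = z_score(9.2, 10.4, 1.2)   # child A, group I
z_b = z_score(9.3, 11.9, 1.3)   # child B, group II
print(round(z_a, 2), round(z_b, 2))  # -1.0 -2.0
```

Here a more negative Z means a time further below the group average, i.e., relatively faster performance.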
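1. The rth moment about the origin (raw moment):
$$M'_r = \frac{1}{n}\sum_{i=1}^{n} X_i^r, \quad r = 1, 2, 3, \ldots$$
- If r = 1, it is the simple arithmetic mean; this is called the first moment.
2. The rth central moment (moment about the mean):
$$M_r = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^r, \quad r = 1, 2, \ldots$$
If r = 2, it is the population variance; this is called the second central moment. If we assume n − 1 ≈ n, it is also the sample variance.
3. The rth moment about any number A, denoted by $M''_r$ and defined as:
$$M''_r = \frac{1}{n}\sum_{i=1}^{n} (X_i - A)^r, \quad r = 1, 2, \ldots$$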
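The three kinds of moments differ only in the point about which deviations are taken. The following Python sketch shows this, using hypothetical data 1, 2, 3, 4, 5 and function names of our own. Central moments are computed as moments about the mean.

```python
def raw_moment(xs, r):
    """rth moment about the origin: sum(x^r) / n."""
    return sum(x ** r for x in xs) / len(xs)


def moment_about(xs, r, a):
    """rth moment about an arbitrary number A: sum((x - A)^r) / n."""
    return sum((x - a) ** r for x in xs) / len(xs)


def central_moment(xs, r):
    """rth central moment: the moment about the mean."""
    return moment_about(xs, r, raw_moment(xs, 1))


data = [1, 2, 3, 4, 5]  # hypothetical data
print(raw_moment(data, 1), raw_moment(data, 2))          # 3.0 11.0
print(central_moment(data, 2), central_moment(data, 3))  # 2.0 0.0
print(moment_about(data, 1, 1))                          # 2.0
```

The first central moment is always 0, and the third is 0 here because the data are symmetric about their mean.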
Remarks: 1) 2) 3)
Activity 4.4
1. Find the first two moments for the following set of numbers 2, 3, 7
2. Find the first three central moments of the numbers in problem 1
3. Find the third moment about the number 3 of the numbers in problem 1.
Skewness
- Skewness is the degree of asymmetry, or departure from symmetry, of a distribution.
- A skewed frequency distribution is one that is not symmetrical.
- Skewness is concerned with the shape of the curve, not its size.
Measures of Skewness
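It is a measure of the direction and degree of asymmetry.
- Denoted by SK.
- There are various measures of skewness.
1. The Pearsonian coefficient of skewness
$$SK = \frac{\text{Mean} - \text{Mode}}{S} \quad \text{or, for a moderately skewed distribution,} \quad SK = \frac{3(\text{Mean} - \text{Median})}{S}$$
SK > 0 indicates a positively skewed distribution, SK = 0 a symmetric one, and SK < 0 a negatively skewed one.
In a negatively skewed distribution, smaller observations are less frequent than larger observations, i.e., the majority of the observations have values above the average.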
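Both forms of the Pearsonian coefficient can be sketched in Python as below. The function names and the input numbers are hypothetical. The median form relies on the empirical relation Mean − Mode ≈ 3(Mean − Median), which only holds for moderately skewed distributions.

```python
def pearson_skewness_mode(mean, mode, sd):
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    return 3 * (mean - median) / sd


def shape(sk):
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetric"


sk1 = pearson_skewness_mode(50, 45, 10)
sk2 = pearson_skewness_median(50, 48, 12)
print(sk1, sk2, shape(sk1))  # 0.5 0.5 positively skewed
```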
Activity 4.5
1. Suppose the mean, the mode, and the standard deviation of a certain distribution are
32, 30.5 and 10 respectively. What is the shape of the curve representing the distribution?
2. For a moderately skewed frequency distribution, the mean is 10 and the median is 8.5.
If the coefficient of variation is 20%, find the Pearsonian coefficient of skewness and
the probable mode of the distribution.
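Kurtosis
Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal distribution. By peakedness, a distribution can be classified into three types:
a) Leptokurtic: a distribution with a relatively high peak.
- A large number of observations are concentrated close to the centre.
b) Mesokurtic: a normal peak.
- The curve is moderately peaked, like the normal curve.
c) Platykurtic: flat-topped.
- The observations have relatively low frequencies and are spread over the middle interval.
The moment coefficient of kurtosis is β2 = M4/M2², where M2 and M4 are the second and fourth central moments; β2 > 3 for a leptokurtic, β2 = 3 for a mesokurtic and β2 < 3 for a platykurtic distribution.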
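The moment coefficient of kurtosis β2 = M4/M2² can be computed with the following Python sketch. The function names are our own, and the exact-equality test for "mesokurtic" uses a small tolerance, which is our own choice. Note that some software reports "excess kurtosis" β2 − 3 instead, so compare conventions before using library output.

```python
def central_moment(xs, r):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** r for x in xs) / len(xs)


def beta2(xs):
    """Moment coefficient of kurtosis: M4 / M2^2."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


def kurtosis_type(b2, tol=1e-9):
    if abs(b2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"


data = [1, 2, 3, 4, 5]  # hypothetical, evenly spread data
b2 = beta2(data)
print(round(b2, 4), kurtosis_type(b2))  # 1.7 platykurtic
```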
Activity 4.6
1. If the first four central moments of a distribution are:
Variability or dispersion is concerned with the extent to which the values in a data set vary from the mean or from one another.
There are different measures of dispersion; these include the range, variance, standard deviation, mean deviation and coefficient of variation.
Range is the difference between the largest and the smallest values in a data set.
Variance is the sum of the squared differences between the individual observations and the mean, divided by the total number of observations N in the case of a population and by n − 1 in the case of a sample.
Standard deviation is the positive square root of the variance.
Coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage.
Different measures of dispersion have different properties and different uses; we have to apply them in their appropriate places.
Exercises on Chapter 4
3. The standard deviation calculated from a set of 32 observations is 5. If the sum of the observations is 80, what is the sum of the squares of the observations?
4. The mean of 5 observations is 3 and the variance is 2. If three of them are 1, 3 and 5, find the remaining two.
5. The distribution of marks of 50 students in statistics out of 50 is given in the table below.
Marks 0-10 10-20 20-30 30-40 40-50
Number of students 5 8 15 16 6
Calculate
a) The range b) The quartile deviation
c) The standard deviation and interpret the result.
6. Two models of radio were subjected to a durability test, and the results were as
follows.
Number of sets
examined
Life(in years)
Model A Model B
State which model has the longer average life and which model has more uniformity.
City 1: 25 24 23 26 17
City 2: 22 21 24 22 20
City 3: 32 27 35 24 28
Which city has the most consistent temperature, based on these data?
8. Suppose Bekele got 90 on a test in which the mean and S.D for the class were 70 and 10, respectively. In another test, Almaz scored 60, where the mean and S.D for her class were 56 and 40, respectively.
a) Who did better relative to his/her class?
b) Which class has students with less similar results?
9. For a moderately skewed frequency distribution, the mean is 10 and the median is 8.5.
If the coefficient of variation is 20%, find the Pearsonian coefficient of skewness and
the probable mode of the distribution.
10. If the standard deviation of a symmetric distribution is 10, what should be the
value of the fourth moment so that the distribution is mesokurtic?
ELEMENTARY PROBABILITY
CONTENTS
5.1. Introduction
In this chapter, there are two main points to be discussed: Possibilities and Probabilities. After
presenting some basic concepts of probability, the next part is about techniques of counting or the
methods used to determine the number of possibilities, which are indispensable to compute
probabilities, then followed by different definitions of probability; and finally, some general rules
and derived theorems of probability will be presented.
Objectives:
Define basic terms in probability such as: sample space, outcome, event, and so on.
5.1 INTRODUCTION
In our daily life, it is not uncommon to hear words that express doubt or uncertainty about the occurrence of certain events. To mention some instances,
These statements show uncertainty about the occurrence of the event in question. In statistics, however, sensible numerical statements can be made about uncertainty, and different approaches can be applied to calculate probabilities.
In general, there are three states of expectation: certainty, impossibility and uncertainty. Probability theory is concerned with the study of random (chance) phenomena; probability is a numerical measure of the chance of occurrence of something (called an event), which shows the degree of uncertainty. Thus, we say that the probabilities of the above three states of expectation are, respectively, one, zero, and between zero and one. Probability theory is the basis for all statistical applications in any field of study.
Since probability theory is closely related to set theory, you may need to revise set theory from mathematics. Probability is also defined in terms of relative frequency, presented in Chapter 2 of this module. Thus, the following is a review of your knowledge of these topics and, of course, of the elementary probability you learned in high school.
• Probability theory is the foundation upon which the logic of inference is built.
• In general, probability is the chance of an outcome of an experiment. It is the measure of how likely
an outcome is to occur.
Example 5.1
If a fair coin is tossed three times, it is possible to enumerate all eight possible sequences of heads (H) and tails (T). But it is not possible to predict which sequence will occur on any given occasion.
3. Outcome: The result of a single trial of a random experiment
4. Sample Space(S): Set of all possible outcomes of a probability experiment.
Example: The sample space of tossing a coin three times is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Sample space can be
Countable (finite or infinite)
Uncountable
5. Event: A subset of the sample space. It is a statement about one or more outcomes of a random experiment. Events are denoted by capital letters such as A.
For example, the event that there are exactly two heads in three tosses of a coin consists of the three points HHT, HTH and THH.
Remark: If S (the sample space) has n members, then there are exactly 2^n subsets or events.
6. Equally Likely Events: Events which have the same chance of occurring.
7. Complement of an event: The complement of event A (denoted by A') consists of all the sample points in the sample space that are not in A.
8. Elementary (simple) Event: an event having only a single element or sample point.
9. Mutually Exclusive (Disjoint) Events: Two events which cannot happen at the same
time.
10. Independent Events: Two events are said to be independent if the occurrence of one
does not affect the probability of the other occurring.
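The definitions above can be made concrete with a small Python sketch built on `itertools` from the standard library. It lists the sample space for three tosses of a coin, forms the event "exactly two heads" and its complement, and counts all subsets of S to confirm the 2^n rule. The variable names are our own.

```python
from itertools import combinations, product

# Sample space for tossing a coin three times
S = ["".join(t) for t in product("HT", repeat=3)]
print(S)  # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']

# The event "exactly two heads" is a subset of S
A = [s for s in S if s.count("H") == 2]
print(A)  # ['HHT', 'HTH', 'THH']

# Complement of A: the sample points of S that are not in A
A_complement = [s for s in S if s not in A]

# Counting every subset (event) of S confirms the 2^n rule
n_events = sum(1 for k in range(len(S) + 1) for _ in combinations(S, k))
print(n_events, 2 ** len(S))  # 256 256
```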
Addition Rule
If event A can occur in m possible ways and event B can occur in n possible ways, there are m + n possible ways for either event A or event B to occur, but only if they have no outcomes in common.
In general, n(A ∪ B) = n(A) + n(B) − n(A ∩ B).
Only A = A − B = A ∩ B'
Both A and B = A ∩ B
Notes: 1) An alternative expression is n(A ∪ B) = n(only A) + n(only B) + n(both A and B).
To list the outcomes of a sequence of events, a useful device called a tree diagram is used.
Example 5.2:
A student goes to the nearest snack bar to have breakfast. He can take tea, coffee or milk with bread, cake or a sandwich. How many possibilities does he have?
Example 5.3
If a man has 3 pairs of trousers, 5 shirts, 2 jackets and 3 pairs of shoes, in how many different ways can he wear his clothes and shoes?
Solution: Using n1 = 3, n2 = 5, n3 = 2 and n4 = 3, the total number of ways of wearing them is n1 × n2 × n3 × n4 = 3 × 5 × 2 × 3 = 90.
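The multiplication principle can be verified by brute-force enumeration with `itertools.product`, which lists every combination of choices; `math.prod` (Python 3.8+) gives the shortcut n1 × n2 × … directly. The lists of drinks and foods restate Example 5.2; the variable names are our own.

```python
from itertools import product
from math import prod

drinks = ["tea", "coffee", "milk"]
foods = ["bread", "cake", "sandwich"]
breakfasts = list(product(drinks, foods))   # every (drink, food) pair
print(len(breakfasts))                      # 9

# Trousers, shirts, jackets and shoes: n1 * n2 * n3 * n4
counts = [3, 5, 2, 3]
outfits = list(product(*(range(n) for n in counts)))  # enumerate every combination
print(len(outfits), prod(counts))           # 90 90
```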
Activity 5.1
The digits 0, 1, 2, 3 and 4 are to be used in four-digit identification cards. How many different cards are possible if
a) Repetitions are permitted.
b) Repetitions are not permitted.
Factorial notation
The symbol n!, read as "n factorial", denotes the product of all positive integers less than or equal to n: n! = n × (n − 1) × … × 2 × 1. By definition, 0! = 1.
Permutation
An arrangement of n objects in a specified order is called a permutation of the objects.
3. The number of permutations of n objects in which k1 are alike, k2 are alike, …, kr are alike is
$$\frac{n!}{k_1!\,k_2!\cdots k_r!}$$
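The permutation counts can be checked in Python with `math.perm` (Python 3.8+) and `itertools.permutations`. The helper `arrangements_with_repeats` is a hypothetical name for the n!/(k1! k2! … kr!) formula; its result for "BOOK" is confirmed by listing the distinct arrangements directly.

```python
from collections import Counter
from itertools import permutations
from math import factorial, perm

# nPr = n! / (n - r)!: ordered selections of 2 of the 5 vowels
print(perm(5, 2), factorial(5) // factorial(3))            # 20 20
print(["".join(p) for p in permutations("aeiou", 2)][:5])  # ['ae', 'ai', 'ao', 'au', 'ea']


def arrangements_with_repeats(word):
    """n! / (k1! k2! ... kr!) for a word whose letters may repeat."""
    total = factorial(len(word))
    for k in Counter(word).values():
        total //= factorial(k)   # each division is exact
    return total


print(arrangements_with_repeats("BOOK"), len(set(permutations("BOOK"))))  # 12 12
print(arrangements_with_repeats("STATISTICS"))                            # 50400
```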
Example 5.4
Find the number of permutations of the five vowels a, e, i, o, u taken two at a time, and list them.
Combination
There are many problems in which we are interested in finding the number of ways in which r objects
can be selected from n distinct objects without regard to the order of selection. Such selections are
called combinations.
Definition: A selection of r objects from a set of n objects without regard to the order of selection is called a combination.
Example 5.5
Given the letters A, B, C and D, list the permutations and combinations for selecting two letters.
Solution:
Permutations (4P2 = 12): AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, DC
Combinations (4C2 = 6): AB, AC, AD, BC, BD, CD
Note that in a permutation AB is different from BA, but in a combination AB is the same as BA.
Example 5.6
In how many ways can 3 letters be selected from the four letters a, b, c and d?
Solution: Since we do not care about the order of selection, we have only the following four cases: abc, abd, acd and bcd.
But recall that the number of permutations of 3 letters out of the 4 is 4P3 = 24, and each selection of three letters can be arranged in 3! = 6 ways; hence the number of selections is 24/6 = 4.
Actually, "combination" means the same as "subset"; in the above case, the number of subsets of 3 elements that can be selected from a set of 4 distinct elements is 4C3 = 4.
This is called the total number of combinations of 3 objects selected from 4 distinct objects.
The number of combinations of n distinct objects taking r of them at a time is given by:
$$_nC_r = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}, \quad \text{for } r = 0, 1, \ldots, n.$$
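The relation nCr = nPr / r! from Example 5.6 can be checked with the standard library (`math.comb` and `math.perm` need Python 3.8+); `n_choose_r` is a hypothetical helper that applies the factorial formula directly.

```python
from itertools import combinations
from math import comb, factorial, perm

print(["".join(c) for c in combinations("abcd", 3)])  # ['abc', 'abd', 'acd', 'bcd']
print(perm(4, 3) // factorial(3), comb(4, 3))         # 4 4


def n_choose_r(n, r):
    """n! / (r! (n - r)!)"""
    return factorial(n) // (factorial(r) * factorial(n - r))


print(n_choose_r(4, 2), comb(4, 2))  # 6 6
```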
Example 5.7
Example 5.8
If a committee of 5 candidates is to be formed out of 10, of which 4 are girls and 6 are boys, how
many committees can be formed if 2 girls are to be included?
There are four different conceptual approaches to the study of probability theory. These are:
• The classical approach.
• The relative frequency approach.
• The axiomatic approach.
• The subjective approach.
Definition
If a random experiment with N equally likely outcomes is conducted and NA of these outcomes are favorable to the event A, then the probability that event A occurs, denoted P(A), is defined as:
$$P(A) = \frac{N_A}{N}$$
Limitations:
If it is not possible to enumerate all the possible outcomes of an experiment.
If the sample points (outcomes) are not mutually exclusive.
If the total number of outcomes is infinite.
If the outcomes are not all equally likely.
Example 5.9
Solution: S = {1, 2, 3, 4, 5, 6}; let E = {3, 5}. For a fair die, P(1) = P(2) = … = P(6) = 1/6; then P(E) = m/n = 2/6 = 1/3.
If one sits for a quiz, the two options (pass/fail) are not equally likely.
Definition
The probability of an event A is the proportion of outcomes favorable to A in the long run, when the experiment is repeated under the same conditions.
Example 5.10
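Solution: If E = the event that the plane will arrive on time, then P(E) = 468/600 = 0.78.
That is, the plane did not arrive on time for 600 − 468 = 132 flights; thus, P(E') = 132/600 = 0.22.
In general, $P(A) = \lim_{n \to \infty} \frac{n(A)}{n}$, or $P(A) \approx \frac{n(A)}{n}$ for a large number of trials n.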
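The long-run idea behind the relative frequency approach can be illustrated with a simulation using Python's `random` module. The function name, the fair-coin probability p = 0.5 and the seed are arbitrary choices for illustration; the exact frequencies printed depend on the seed, but they settle closer to 0.5 as the number of trials grows.

```python
import random


def relative_frequency(trials, p=0.5, seed=2024):
    """Simulate `trials` independent trials with success probability p and
    return the relative frequency of success."""
    rng = random.Random(seed)                          # fixed seed for reproducibility
    successes = sum(rng.random() < p for _ in range(trials))
    return successes / trials


for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))

# Example 5.10: 468 on-time arrivals out of 600 flights
print(468 / 600, 132 / 600)
```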
Activity 5.4
Records show that 60 out of 100,000 bulbs produced are defective. What is the probability that a newly produced bulb is defective?
Axiomatic Approach:
This approach does not give a precise definition of probability, but gives certain axioms, postulates or rules on which probability calculations are based. Then any one of the preceding concepts can be used in applications, as long as it is consistent with these rules.
Let E be a random experiment and S a sample space associated with E. With each event A we associate a real number P(A), called the probability of A, which satisfies the following properties, called the axioms or postulates of probability.
1. P(A) ≥ 0 for every event A.
2. P(S) = 1
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals the sum of the two probabilities, i.e., P(A ∪ B) = P(A) + P(B).
Subjective Approach
It is always based on some prior body of knowledge. Hence subjective measures of
uncertainty are always conditional on this prior knowledge. The subjective approach accepts
Example 5.11
Abebe’s belief about the chances of Ethiopia Buna club winning the FA Cup this year may
be very different from Daniel's. Abebe, using only his knowledge of the current team and
past achievements may rate the chances at 30%. Daniel, on the other hand, may rate the
chances as 10% based on some inside knowledge he has about key players having to be sold
in the next two months.
There are also other rules, but all are derived from the above three postulates.
Some of them are:
a) P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An), if A1, A2, …, An are pairwise mutually exclusive.
b) P(A) ≤ 1; probability never exceeds unity.
c) P(∅) = 0.
d) P(A') = 1 − P(A), where A' is the complement of event A.
e) For any two events A and B, P(A ∪ B) = P(A) + P(B) − P(A ∩ B); this is the general addition rule.
Remark: 1)
2)
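3) For three events A, B and C,
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
4) If an event A must occur together with one of the mutually exclusive events A1, A2, …, An, then
P(A) = P(A1)P(A/A1) + P(A2)P(A/A2) + … + P(An)P(A/An).
5) Suppose that A1, A2, …, An are mutually exclusive events whose union is the sample space. Then, for any event A with P(A) > 0 (Bayes' theorem),
P(Ak/A) = P(Ak)P(A/Ak) / [P(A1)P(A/A1) + … + P(An)P(A/An)], k = 1, 2, …, n.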
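Remarks 4 and 5 can be sketched in Python with exact fractions. The three-machine scenario below is entirely hypothetical: machines A1, A2 and A3 are assumed to make 50%, 30% and 20% of all items (mutually exclusive and exhaustive), with defect rates of 1%, 2% and 3%.

```python
from fractions import Fraction as F

# Hypothetical: P(Ai) and P(D/Ai) for three machines A1, A2, A3
prior = [F(1, 2), F(3, 10), F(1, 5)]
likelihood = [F(1, 100), F(2, 100), F(3, 100)]

# Remark 4 (total probability): P(D) = sum of P(Ai) * P(D/Ai)
p_d = sum(p * l for p, l in zip(prior, likelihood))
print(p_d)  # 17/1000

# Remark 5 (Bayes' theorem): P(Ai/D) = P(Ai) * P(D/Ai) / P(D)
posterior = [p * l / p_d for p, l in zip(prior, likelihood)]
print(posterior)  # [Fraction(5, 17), Fraction(6, 17), Fraction(6, 17)]
```

The posterior probabilities sum to 1, as they must when A1, A2 and A3 partition the sample space.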
Activity 5.5
1. For a student enrolling as a freshman at a certain university, the probability is 0.25 that he/she will get a scholarship and 0.75 that he/she will graduate. If the probability is 0.2 that he/she will get a scholarship and will also graduate, what is the probability that a student who gets a scholarship will graduate?
2) A lot consists of 20 defective and 80 non-defective items, from which two items are chosen without replacement. Find the probability that:
a) both items are defective; b) the second item is defective.
Probability of Independent Events
Remarks: If A1, A2, A3 are to be independent, then they must be pairwise independent,
P(Aj ∩ Ak) = P(Aj)P(Ak), where j ≠ k and j, k = 1, 2, 3, and we must also have
P(A1 ∩ A2 ∩ A3) = P(A1)P(A2)P(A3).
Example 5.12
Solution: a) P(A)P(B) = (0.4)(0.2) = 0.08 = P(A ∩ B). Hence, A and B are independent.
b) P(C)P(D) = (0.5)(0.3) = 0.15 ≠ P(C ∩ D) = 0.10. Hence, C and D are dependent.
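The product test for independence can be sketched in Python as follows. The comparison uses `math.isclose`, because in floating point 0.4 × 0.2 is not exactly 0.08. The three success probabilities in the second part are hypothetical values chosen for illustration; the multiplication rule gives both the probability that all succeed and, through the complement, the probability that at least one succeeds.

```python
import math
from fractions import Fraction


def are_independent(p_a, p_b, p_ab):
    # compare with a tolerance: 0.4 * 0.2 is not exactly 0.08 in floating point
    return math.isclose(p_a * p_b, p_ab)


print(are_independent(0.4, 0.2, 0.08))  # True
print(are_independent(0.5, 0.3, 0.10))  # False

# Hypothetical success probabilities for three students working independently
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
all_solve = math.prod(p)                          # P(all three succeed)
at_least_one = 1 - math.prod(1 - x for x in p)    # 1 - P(all three fail)
print(all_solve, at_least_one)  # 1/24 3/4
```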
Example 5.13
A problem in Statistics is given to three students X, Y, and Z, whose probabilities of solving it are
What is the probability that a) all of them will solve it; b) at least one of them will solve it, if they try independently?
b)
Activity 5.6
2. Three balls are drawn at random, one after another, from a box containing 6 red balls, 4 white balls and 5 blue balls. Find the probability that they are drawn in the order red, white and blue if each ball is
Classical probability concept: The probability of an event is m/n if it can occur in m ways out of a
total of n equally likely ways.
The relative frequency concept of probability: The probability of the occurrence of an event equals
its relative frequency.
2. a) b) ; c)
a) all heads; b) two tails and a head in this order; c) two tails & a head in any order?
3. Among 15 clocks there are two defectives. In how many ways can an inspector choose 3 of the clocks for inspection so that:
a) None of the defectives is included?
PROBABILITY DISTRIBUTIONS
CONTENTS
6.1. DEFINITION OF RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
6.2. INTRODUCTION TO EXPECTATION – MEAN AND VARIANCE OF A RANDOM VARIABLE
6.3. COMMON DISCRETE PROBABILITY DISTRIBUTIONS – BINOMIAL AND POISSON
6.4. COMMON CONTINUOUS PROBABILITY DISTRIBUTIONS – NORMAL, CHI-SQUARE, T AND F
INTRODUCTION
In Chapter 5, the techniques of computing the probability of an event were introduced. In this chapter, we shall study the most commonly used discrete probability distributions, namely the binomial and Poisson distributions, and the continuous probability densities normal, chi-square, t and F, which are very important in statistical inference. We will also mention some of their properties, because we need the results later. But before presenting the probability distributions specifically, we need to define a random variable, a probability distribution, and the mean and variance, in general, of both continuous and discrete random variables.
Objectives:
Random variable: a numerically valued function defined on the sample space. It assigns a real number to each element of the sample space. Generally, random variables are denoted by capital letters and the values of the random variables by small letters.
Discrete random variable: a variable that can assume only a finite or countably infinite number of values. Its values can be counted.
Examples
• Toss a coin n times and count the number of heads.
• Number of children in a family.
• Number of car accidents per week.
• Number of defective items in a given company.
• Number of bacteria per two cubic centimeter of water.
Continuous random variable: a variable that can assume all values between any two given values.
Examples
• Height of students at certain college.
Example 6.1
In an experiment of "flipping a fair coin 3 times", list the elements of the sample space that
are assumed to be equally likely (as this is what is meant by a fair or balanced coin) and the
corresponding values x of the r-v X, the number of heads observed.
Solution: If H stands for heads and T for tails, then the sample space corresponding to this
experiments is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Since X= the number of heads observed, the results are shown in the following table:
Element of S Probability x
HHH 1/8 3
HHT 1/8 2
HTH 1/8 2
HTT 1/8 1
THH 1/8 2
THT 1/8 1
TTH 1/8 1
TTT 1/8 0
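Grouping the eight equally likely outcomes of Example 6.1 by their value of X gives the probability distribution of the number of heads. The following Python sketch does this with `itertools.product`, `collections.Counter` and exact `fractions.Fraction` probabilities; the variable names are our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = ["".join(t) for t in product("HT", repeat=3)]  # 8 equally likely outcomes
counts = Counter(o.count("H") for o in outcomes)          # X = number of heads
distribution = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in distribution.items():
    print(x, p)
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
```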
Activity 6.1
1) Consider an experiment of tossing a coin three times. Let X be the number of heads.
Construct the probability distribution of X.
2) A balanced die is tossed twice. Construct a probability distribution if:
A) X is the sum of the numbers of spots in the two tosses.
B) X is the absolute difference between the numbers of spots in the two tosses.
1) f(x) ≥ 0 for all x.
2) Σ f(x) = 1, the sum being taken over all possible values x.
3) If X is a discrete random variable, then
Example 6.2
Since these values are all non-negative and their sum is 1, the given
function can serve as a pmf of some random variable whose domain is .
Definition: A non-negative function f(x) is called the probability distribution (probability density function) of a
continuous R.V. X if the total area bounded by its curve and the x-axis is 1, and if the area under the
curve bounded by the curve, the x-axis and the vertical lines erected at any two points a and b gives
the probability that X is between a and b, i.e., $P(a < X < b) = \int_a^b f(x)\,dx$.
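To illustrate the definition numerically, here is a minimal Python sketch that approximates areas under a density with the midpoint rule. The density f(x) = 2x on [0, 1] is a hypothetical example chosen only because its areas are easy to check by hand; it does not come from the module.

```python
def f(x):
    """Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area(g, a, b, n=10_000):
    """Approximate the area under g between a and b with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

total = area(f, 0, 1)       # total area under the curve; should be 1
p = area(f, 0.25, 0.5)      # P(0.25 < X < 0.5) = 0.5**2 - 0.25**2 = 0.1875
print(round(total, 6), round(p, 6))
```

Both printed values should be very close to 1 and 0.1875; the midpoint rule is exact for a straight-line density apart from floating-point rounding.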
Example 6.3
b)
Activity 6.2
Definition:
1. Let a discrete random variable X assume the values x1, x2, …, xn with the probabilities
P(x1), P(x2), …, P(xn), respectively. Then the expected value of X, denoted E(X), is
defined as:
$E(X) = x_1P(x_1) + x_2P(x_2) + \dots + x_nP(x_n) = \sum_{i=1}^{n} x_i P(x_i)$
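2. Let X be a continuous random variable with pdf f(x), assuming values in the interval (a, b), such that
$\int_a^b f(x)\,dx = 1$. Then
$E(X) = \int_a^b x\,f(x)\,dx$, where f(x) is the probability density function of X.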
Rules of Expectation
a) E (kX) =kE(X)
b) E(X+k) =E(X) + k
b)
If X and Y are independent random variables, then:
a) E(XY) = E(X)·E(Y)
b) Var(X + Y) = Var(X) + Var(Y)
c) Cov(X, Y) = 0
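As an illustrative check (a sketch under stated assumptions, not part of the module), the product rule and the zero-covariance property for independent random variables can be verified by enumerating two independent fair dice X and Y with exact fractions:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
pairs = list(product(faces, faces))   # 36 equally likely (x, y) outcomes
p = Fraction(1, len(pairs))

def E(g):
    """Expected value of g(X, Y) over the joint distribution."""
    return sum(g(x, y) * p for x, y in pairs)

ex, ey = E(lambda x, y: x), E(lambda x, y: y)
exy = E(lambda x, y: x * y)
cov = exy - ex * ey                                # Cov(X, Y) = E(XY) - E(X)E(Y)
var_x = E(lambda x, y: (x - ex) ** 2)
var_y = E(lambda x, y: (y - ey) ** 2)
var_sum = E(lambda x, y: (x + y - ex - ey) ** 2)   # Var(X + Y)

print(exy, ex * ey, cov)        # E(XY) equals E(X)E(Y); covariance is 0
print(var_sum, var_x + var_y)   # Var(X + Y) equals Var(X) + Var(Y)
```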
Example 6.4
Let a fair die be rolled once. Find the mean number rolled, say X.
Solution: Since S = {1, 2, 3, 4, 5, 6} and all outcomes are equally likely with probability 1/6, we have
E(X) = (1 + 2 + 3 + 4 + 5 + 6)(1/6) = 21/6 = 3.5.
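The same calculation, extended to the variance Var(X) = E[(X − μ)²], can be sketched in Python with exact fractions; this is an illustrative check rather than the module's own method.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                                  # each face is equally likely

mean = sum(x * p for x in faces)                    # E(X) = sum of x * P(x)
variance = sum((x - mean) ** 2 * p for x in faces)  # Var(X) = E[(X - mu)^2]

print(mean, variance)   # 7/2 35/12
```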
Example 6.5
Find the expected value and the variance of the r-v given in
Solution:
= 1.
Activity 6.3
1. What are the expected value and variance of the random variable X, the number of heads
obtained when a coin is tossed three times?
2. Let X be a continuous R.V with distribution
In this section, we shall study two common discrete probability distributions, namely, the
Binomial and Poisson distributions.
1. Binomial Distribution
Definition: The outcomes of a binomial experiment and the corresponding probabilities of
these outcomes are called a binomial distribution.
Let p = probability of success and q = 1 − p = probability of failure on any given trial.
Then the probability of getting x successes in n trials is
$P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, \dots, n$
When using the binomial formula to solve problems, we have to identify three things:
• The number of trials (n)
• The probability of a success on any one trial (p), and
• The number of successes desired (x).
Example 6.6
Find the probability of getting 5 heads and 7 tails in 12 flips of a fair coin.
Solution: Let X be the number of heads. Then p = P(head) = 1/2 and q = P(not a head) = 1/2, so
$P(X = 5) = \binom{12}{5}(1/2)^{5}(1/2)^{7} = 792/4096 \approx 0.1934$.
Example 6.7
If the probability is 0.20 that a person traveling on an EAL flight will be a vegetarian, find the
probability that 3 of 10 people on such a flight will be vegetarians.
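A minimal Python sketch of the binomial formula, applied to Examples 6.6 and 6.7. `binomial_pmf` is a hypothetical helper name; `math.comb` (Python 3.8+) supplies the binomial coefficient.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Example 6.6: 5 heads in 12 flips of a fair coin.
p_66 = binomial_pmf(5, 12, 0.5)
# Example 6.7: 3 vegetarians among 10 passengers when p = 0.20.
p_67 = binomial_pmf(3, 10, 0.20)

print(round(p_66, 4), round(p_67, 4))   # 0.1934 0.2013
```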
Activity 6.4
What is the probability of getting three heads by tossing a fair coin four times?
2. Poisson Distribution
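A random variable X has a Poisson distribution if
$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \dots$
where λ is the average number of occurrences of an event per unit interval (of time or
distance) and x is the number of occurrences in a Poisson process.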
The Poisson distribution depends only on the average number of occurrences per unit of time or
space. It is used as a distribution of rare events, such as:
• Number of misprints.
• Natural disasters like earthquakes.
• Accidents.
• Hereditary.
Note: The Poisson probability distribution provides a close approximation to the binomial
probability distribution when n is large and p is quite small or quite large, with λ = np.
Usually we use this approximation if np ≤ 5. In other words, if n > 20 and np < 5 (or, when p is quite
large, n(1 − p) ≤ 5, counting failures instead of successes), then we may use the Poisson distribution as an approximation to the binomial distribution.
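To see how close the approximation is, the following Python sketch compares an exact binomial probability with its Poisson approximation for a hypothetical case with n = 100 and p = 0.03 (so λ = np = 3), asking for exactly 2 successes. The numbers are illustrative only and are not taken from the module.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Exact binomial probability P(X = x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson probability P(X = x) = e**(-lam) * lam**x / x!."""
    return exp(-lam) * lam**x / factorial(x)

# Hypothetical case: n > 20 and np = 3 < 5, so the approximation should be good.
n, p, x = 100, 0.03, 2
lam = n * p

exact = binomial_pmf(x, n, p)
approx = poisson_pmf(x, lam)
print(round(exact, 4), round(approx, 4))
```

For these values the two probabilities differ by only about 0.001.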
Example 6.8
Suppose that customers enter a waiting line at random at a rate of 4 per minute. Assuming
that the number entering the line during a given time interval has a Poisson distribution, find
the probability that:
, but .
3. A sales firm receives, on average, 3 calls per hour on its toll-free number. For any
given hour, find the probability that it will receive the following.
a) At most 3 calls
b) At least 3 calls
4. If approximately 2% of people are left-handed, find the probability that in a room of 200
people, exactly 5 people are left-handed.
In this section, we will study three important continuous probability distributions that play the
leading role in statistical inference; viz., the normal, t & Chi-Square distributions.
A random variable X is said to have a normal distribution if its probability density function
is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$
2. It is asymptotic to the horizontal axis, i.e., it extends indefinitely in either direction from the mean.
3. It is a continuous distribution, i.e., there are no gaps or holes.
4. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a
different normal distribution. Thus, the normal distribution is completely described by two
parameters: mean and standard deviation.
5. The total area under the curve is 1, and since the curve is symmetric, the area on each side of the
mean is 0.5.
That is, $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
Example 6.9
Find the probabilities that a r-v having the standard N.D will take on a value
a) less than 1.72; b) less than -0.88; c) between 1.30 and 1.75; d) between -0.25 and 0.45.
a) P(Z < 1.72) = P(Z < 0) + P(0 < Z < 1.72) = 0.5 + 0.4573 = 0.9573.
b) P(Z < -0.88) = P(Z > 0.88) = 0.5 - P(0 < Z < 0.88) = 0.5 - 0.3106 = 0.1894.
c) P(1.30 < Z < 1.75) = P(0 < Z < 1.75) - P(0 < Z < 1.30) = 0.4599 - 0.4032 = 0.0567.
d) P(-0.25 < Z < 0.45) = P(-0.25 < Z < 0) + P(0 < Z < 0.45) = 0.0987 + 0.1736 = 0.2723.
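Instead of a printed table, standard normal cumulative probabilities can be computed with the error function, using Φ(z) = ½[1 + erf(z/√2)]. This Python sketch reproduces the four answers of Example 6.9; small differences in the fourth decimal place are expected because table entries are rounded.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z < z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

a = phi(1.72)                 # P(Z < 1.72)
b = phi(-0.88)                # P(Z < -0.88)
c = phi(1.75) - phi(1.30)     # P(1.30 < Z < 1.75)
d = phi(0.45) - phi(-0.25)    # P(-0.25 < Z < 0.45)
print([round(v, 4) for v in (a, b, c, d)])
```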
Activity 6.6
Of a large group of men, 5% are less than 60 inches in height and 40% are between 60 & 65
inches. Assuming a normal distribution, find the mean and standard deviation of heights.
The distribution of the t statistic is called the t distribution or the Student t distribution. The
particular form of the t distribution is determined by its degrees of freedom (df). The
degrees of freedom refer to the number of independent observations in a set of data. When
estimating a mean score or a proportion from a single sample, the number of independent
observations is equal to the sample size minus one. The t distribution can be used with any
statistic having a bell-shaped (i.e., approximately normal) distribution.
The chi-square variable is similar to the t variable in that its distribution is a family of curves
based on the number of degrees of freedom. The symbol for chi-square is χ² (Greek letter chi,
pronounced “ki”). The chi-square distribution is obtained from the values of $(n-1)s^2/\sigma^2$ when
random samples are selected from a normally distributed population whose variance is $\sigma^2$. A
chi-square variable cannot be negative, and the distributions are positively skewed. At about
100 degrees of freedom, the chi-square distribution becomes somewhat symmetrical. The
area under each chi-square distribution is equal to 1.00, or 100%.
In order to find the area under the chi-square distribution, there are three cases to consider:
1) Find the chi-square critical value for a specific α when the hypothesis test is one-tailed
right. In this case, find the α value at the top of the table and the corresponding degrees of
freedom in the left column. The critical value is located where that row and column meet.
Example 6.10
a) The critical chi-square value for 15 degrees of freedom when α = 0.05 and the test is
one-tailed right is 24.996.
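As a numerical cross-check of the table value (a sketch, not how the table was built), the right-tail area beyond 24.996 for 15 degrees of freedom can be approximated by integrating the chi-square density with Simpson's rule; it should come out very close to α = 0.05. The helper names are illustrative.

```python
from math import exp, gamma

def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom (x >= 0)."""
    return x ** (k / 2 - 1) * exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3

k, critical = 15, 24.996
right_tail = 1 - simpson(lambda x: chi2_pdf(x, k), 0.0, critical)
print(round(right_tail, 4))   # area to the right of the critical value
```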
2) Find the chi-square critical value for a specific α when the hypothesis test is one-tailed
left. In this case, the α value must be subtracted from one (i.e., use 1 − α). Then the left side of the
table is used, because the table gives the area to the right of the critical value and the chi-square
statistic cannot be negative.
Example 6.11
Note that after the degrees of freedom reach 30, the chi-square table gives values only for
multiples of 10 (40, 50, 60, etc.). When the exact degrees of freedom one is seeking are not
listed in the table, the nearest smaller value should be used.
When the degrees of freedom ν are greater than or equal to 2, the maximum value of the density
Y occurs when χ² = ν − 2.
We use the t distribution and t tests to examine the probability of a single estimator taking a
particular value; we use the F distribution and F tests to carry out joint hypothesis tests on
more than one estimator.
The motivation behind the F distribution is the case where we have independent samples of two
variables, each drawn from a normal distribution.
Suppose we want to find out whether the variances are the same, i.e., whether $\sigma_X^2 = \sigma_Y^2$. We cannot
observe the population variances; however, we have the sample estimators $S_X^2$ and $S_Y^2$:
$S_X^2 = \frac{1}{m-1}\sum_{i=1}^{m}(X_i - \bar{X})^2$ and $S_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$
If we take $F = S_X^2 / S_Y^2$, then if the two variances are the same, F should be close to 1. If they
are different, F ≠ 1, and the greater the difference, the greater the value of F will be.
Statistical theory shows us that if the two population variances are equal ($\sigma_X^2 = \sigma_Y^2$), the F
ratio will follow the F distribution with (m − 1, n − 1) df (with the larger of the two variances
on top).
The F ratio is often designated $F_{k_1,k_2}$, where the subscripts denote the parameters of the
distribution; here $k_1 = m - 1$ and $k_2 = n - 1$.
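A minimal Python sketch of the F ratio, with the larger sample variance in the numerator. The two samples are hypothetical; in practice the resulting F would be compared with an F-table critical value for the printed degrees of freedom.

```python
from statistics import variance

# Hypothetical samples, assumed drawn independently from two normal populations.
x = [2, 4, 6, 8]        # m = 4 observations
y = [3, 4, 5, 6]        # n = 4 observations

sx2, sy2 = variance(x), variance(y)   # sample variances (divide by m - 1, n - 1)

# Put the larger sample variance on top, with matching degrees of freedom.
if sx2 >= sy2:
    f_ratio, df = sx2 / sy2, (len(x) - 1, len(y) - 1)
else:
    f_ratio, df = sy2 / sx2, (len(y) - 1, len(x) - 1)

print(f_ratio, df)
```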
CHAPTER SUMMARY
, where
For a Normal Distribution, the three averages coincide: mean = median = mode.
If X is continuous, then .
The Chi-square distribution is used to test an association of attributes.
1. Suppose that an examination consists of six true and false questions, and assume that a student
has no knowledge of the subject matter. The probability that the student will guess the
correct answer to the first question is 30%. Likewise, the probability of guessing each of the
remaining questions correctly is also 30%.
a) What is the probability of getting more than three correct answers?
b) What is the probability of getting at least two correct answers?
c) What is the probability of getting at most three correct answers?
d) What is the probability of getting less than five correct answers?
2) The probability that a patient contracting TB will recover from the disease under medical
treatment is 0.6. Of 15 patients contracting the disease,
a) What is the probability that exactly 10 recover?
Assume that the patients are subjected to the same medical treatment.
3. Find the area under the standard normal distribution that lies
a) Between z = 0 and z = 0.96
CONTENTS
INTRODUCTION
Before giving the notion of sampling, we will first define population. In a statistical
investigation, the interest usually lies in the assessment of the general magnitude and the
study of variation with respect to one or more characteristics relating to individuals
belonging to a group. This group of individuals under study is called the population or universe.
Thus, in statistics, a population is an aggregate of objects, animate or inanimate, under study.
The population may be finite or infinite.
Objectives
At the end of this chapter students will be able to
Explain the meaning of sampling theory, sampling unit and sampling frame
Sampling is that part of statistical practice concerned with the selection of individual
observations intended to yield some knowledge about a population of concern, especially for
the purposes of statistical inference. Before having further discussion on the specific type of
sampling methods, it is valuable to be acquainted with the following terms:
1. Sampling
Sampling is the process or method of sample selection from the population.
Sampling can be done either with replacement or without replacement.
In sampling with replacement, a unit is selected from the population with a known probability and is
returned to the population (after its characteristic(s) are recorded) before the next selection is made.
Thus, in this method the population size remains constant at each selection, the probability of
selection remains the same at each draw, and a unit has a chance of being selected more than once.
There are $N^n$ possible samples of size n from a population of N units.
In sampling without replacement, a unit selected from a population of size N is not returned to
the population. Thus, for any subsequent selection, the population size is reduced by one. There
are $\binom{N}{n}$ possible samples of size n.
3. Sample size
The number of sampling units which are selected from a population. The sample size
depends on a number of considerations which are as follows.
e) Precision required.
4. Study Unit
5. Sampling fraction
The ratio of the number of units in the sample to the number of units in the source
population.
6. Sampling frame
Examples
List of households.
List of students in the registrar's office.
A probability sampling scheme is one in which every unit in the population has a known
nonzero probability of being sampled and the process involves random selection.
Probability sampling includes: Simple Random Sampling, Systematic Sampling, Stratified
Sampling, Cluster Sampling or Multistage Sampling.
3. Cluster sampling
Cluster sampling tends to provide best results when the elements within the clusters
are heterogeneous.
It is used in large geographic samples where no list is available of all the units in the
population but the population boundaries can be well-defined.
Cluster sampling must use a random sampling method at each stage. This may result
in a somewhat larger sample than using a simple random sampling method, but it
saves time and money.
For example, to obtain information about the drug habits of all high school students in a
state, you could obtain a list of all the school districts in the state and select a simple
random sample of school districts. Then, within each selected school district, list all the
high schools and select a simple random sample of high schools. Within each selected high
school, list all high school classes, and select a simple random sample of classes. Then use
the high school students in those classes as your sample.
For example, to select a sample of 25 dorm rooms in your college dorm, make a list of all
the room numbers in the dorm. Say there are 100 rooms. Divide the total number of rooms
(100) by the number of rooms you want in the sample (25). The answer is 4. This means
that you are going to select every fourth dorm room from the list. But you must first
consult a table of random numbers. Pick any point on the table, and read across or down
until you come to a number between 1 and 4. This is your random starting point. Say your
random starting point is "3". This means you select dorm room 3 as your first room, and
then every fourth room down the list (3, 7, 11, 15, 19, etc.) until you have 25 rooms
selected.
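The dorm-room procedure can be sketched in Python as follows. Here `random.randint` stands in for the table of random numbers (an assumption for illustration), and `systematic_sample` is a hypothetical helper name.

```python
import random

def systematic_sample(units, n, rng=random):
    """Select every k-th unit after a random start in 1..k, where k = N // n."""
    k = len(units) // n                # sampling interval, here 100 // 25 = 4
    start = rng.randint(1, k)          # random starting point between 1 and k
    return [units[i - 1] for i in range(start, start + k * n, k)]

rooms = list(range(1, 101))            # dorm rooms numbered 1..100
sample = systematic_sample(rooms, 25)
print(sample[:5], len(sample))
```

With a starting point of 3, this reproduces the list 3, 7, 11, 15, 19, … given in the text.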
The researcher chooses the sample based on whom he/she thinks would be appropriate for the
study. Samples are taken based on previous knowledge of the population (from which the
samples are taken), and the specific purpose of the study or investigation. Researchers use
their personal judgment in selecting the sample(s)
B) Convenience Sampling
The selection of units from the population is based on easy availability and/or accessibility.
C) Quota Sampling
D) Snowball Sampling
Sampling Distribution
Because statistics such as x̄ vary from sample to sample, they are random variables. As
such, statistics have probability distributions associated with them. In order to make
probability statements regarding a sample statistic, we need to know the probability
distribution of the sample statistic. That is to say, we need to know the shape, center and
spread of the sample statistic’s distribution.
The sampling distribution of a statistic is a probability distribution for all possible values of
the statistic computed from a sample of size n.
Sampling distribution of the sample mean is a theoretical probability distribution that shows
the functional relationship between the possible values of a given sample mean based on
samples of size n and the probability associated with each value, for all possible samples of
size n drawn from that particular population.
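Example 7.1
Let us take all random samples of size 2 with replacement from the population 2, 4, 6, 8, so that
there are N^n = 4^2 = 16 possible samples. The samples and their means are:
Sample   Mean     Sample   Mean
(2,2)    2        (2,6)    4
(4,2)    3        (4,6)    5
(6,2)    4        (6,6)    6
(8,2)    5        (8,6)    7
(2,4)    3        (2,8)    5
(4,4)    4        (4,8)    6
(6,4)    5        (6,8)    7
(8,4)    6        (8,8)    8
The sampling distribution of the sample mean is:
x̄    Frequency    P(x̄)
2     1            1/16
3     2            2/16
4     3            3/16
5     4            4/16
6     3            3/16
7     2            2/16
8     1            1/16
Since μ = (2 + 4 + 6 + 8)/4 = 5 and μx̄ = Σ x̄ P(x̄) = 80/16 = 5, the mean of the sample means is the same as the population mean.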
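The sampling distribution in Example 7.1 can be regenerated by listing every ordered sample. This Python sketch (illustrative only) draws all 4² = 16 samples of size 2 with replacement, tabulates the sample means, and checks that their mean equals μ.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]
n = 2

# All N**n = 16 ordered samples drawn with replacement.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution of the sample mean: value -> probability.
dist = {m: Fraction(c, len(samples)) for m, c in sorted(Counter(means).items())}

mu = Fraction(sum(population), len(population))
mean_of_means = sum(m * p for m, p in dist.items())
print(len(samples), mu, mean_of_means)   # 16 5 5
```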
Activity 7.1
Suppose we have a population of size N=5, consisting of the age of five children: 6, 8, 10,
12, and 14.
Remark:
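1. In general, if sampling is with replacement, or from an infinite population,
$\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
2. If sampling is without replacement from a finite population of size N,
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$, where $\sqrt{\frac{N-n}{N-1}}$ is called the finite population correction
factor (fpc).
3. In any case, the sample mean is an unbiased estimator of the population mean.
That is, $E(\bar{x}) = \mu$ (show this).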
Sampling may be from a normally distributed population or from a non- normally
distributed population.
When sampling is from a normally distributed population, the distribution of x̄ will
possess the following properties.
1. The distribution of x̄ will be normal.
Activity 7.2
where .
Note: Increasing the sample size decreases the standard error.
Thus, the larger the sample size, the larger the probability that the sample proportion lies within a given distance of the population proportion.
Example 7.2
What is the probability that the difference between the sample proportion and the population
proportion will be less than or equal to 0.05 for the given sample size? What is the probability if
we increase the sample size to 100?
Solution
. Thus,
Thus,
There is a 69.22% chance that the difference between the sample proportion and the
population proportion is not more than 0.05. That is, a larger sample size provides
a higher probability that the value of the sample proportion will be within a specified
distance of the population proportion.
Example 7.3
A new soft drink is being market-tested. It is estimated that 60% of consumers will like the
new drink. A sample of 96 consumers taste-tested the new drink.
(a) Determine the standard error of the proportion.
(b) What is the probability that 70.4% or more of the consumers in the sample will indicate they
like the drink?
(c) What is the probability that 30% or more of the consumers in the sample will indicate they do
not like the drink?
Solution:
(b)
(c) Since 30% or more disliking the drink is the same as 70% or fewer liking it, we need to compute
the probability that at most 70% of consumers will indicate they like the drink.
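A minimal Python sketch of Example 7.3, assuming the normal approximation to the sampling distribution of the sample proportion (without a continuity correction) and using `statistics.NormalDist` for the standard normal CDF:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.60, 96
se = sqrt(p * (1 - p) / n)              # (a) standard error of the sample proportion

z = NormalDist()                        # standard normal distribution
prob_b = 1 - z.cdf((0.704 - p) / se)    # (b) P(p_hat >= 0.704)
prob_c = z.cdf((0.70 - p) / se)         # (c) P(p_hat <= 0.70), i.e. 30% or more dislike it
print(round(se, 4), round(prob_b, 4), round(prob_c, 4))
```

Here the standard error is 0.05, so 70.4% corresponds to z = 2.08 and 70% to z = 2.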
Example 7.4
What is the most important factor for business travelers when they are staying in a hotel?
According to USA Today, 74% of business travelers state that having a smoke-free room is
the most important factor. Assume that the population proportion is p = 0.74 and that a
sample of 200 business travelers will be selected.
(a) What is the probability that the sample proportion will be within of the population
proportion?
(b) Suppose the probability that a sample proportion will be within of the population
proportion is 0.9. What is the sample size n?
Solution:
(a)
Rarely would one construct a sampling distribution of means and derive the standard error of
this distribution in order to determine the error in generalizing to the population. Instead, the
standard error of the sampling distribution of means ($\sigma_{\bar{x}}$) can be estimated from a
single sample as $s_{\bar{x}} = s/\sqrt{n}$.
Example 7.5
Consider the following summarized data for case processing time.
The Central Limit Theorem applies to the sampling distribution of a proportion. The
standard error of the sampling distribution of a proportion can be estimated from a single
sample, in a manner similar to that used with the mean: $s_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$.
Example 7.6
Survey of attitudes towards the death penalty (N=800)
Suppose a random variable X has population mean μ and standard deviation σ and that a
random sample of size n is taken from this population. Then the sampling distribution of x̄
is approximately normal with mean μ and standard deviation σ/√n when n is sufficiently large.
Simply stated: for any population, regardless of its shape, as the sample size increases, the
shape of the sampling distribution of the sample mean, x̄, becomes more normal.
Example 7.7
For a population of 2,000 students living in hostels, the mean monthly expenditure on
three meals is 500 birr with a variance of 144. If sampling is with replacement, find the
probability that a random sample of 36 students from this population yields a mean
expenditure of less than 495 birr per month.
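A minimal sketch of Example 7.7 in Python, assuming (as the example states) sampling with replacement, so no finite population correction is applied, and a normal sampling distribution for the mean by the Central Limit Theorem:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma2, n = 500, 144, 36
se = sqrt(sigma2) / sqrt(n)          # sigma / sqrt(n) = 12 / 6 = 2

z_score = (495 - mu) / se            # (495 - 500) / 2 = -2.5
prob = NormalDist().cdf(z_score)     # P(sample mean < 495)
print(z_score, round(prob, 4))       # -2.5 0.0062
```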
1) Suppose that, for all students who sat an examination in a particular year, the mean score was
450 with a standard deviation of 120. Suppose 400 of the students who took the test during that
particular year were selected at random.
(b) What is the probability that a randomly selected student has an ACT Math score less than
18?
(c) What is the probability that a random sample of 10 ACT test takers had a mean math
score of 18 or less?
Another method of getting information about the population is by taking a small part
of the population, which is technically called a sample. Sampling is used extensively in all
facets of business and government.
Most non-probability types of sampling (judgment, quota, convenience and referral) have a
common weakness: the choice of items selected in the sample is left to the discretion of the
researcher. Some users of non-probability sampling recognize the disadvantages of this type of
sampling but consider that the cost savings and convenience outweigh the disadvantages.
The main disadvantage is that the reliability or accuracy of the sample results cannot be
accurately measured. Therefore, the subsequent discussion involving the reliability of
sample results concerns only probability sampling techniques.
Exercises on Chapter 7
consists of four strata of size 500, 1200, 200, and 100. How large a sample must be taken
3. A population consists of the four numbers, 3,7,11, and 15. Consider all possible samples
of size 2 drawn from this population without replacement. Find
a) b) ;
Verify (c) and (d) from (a) and (b) using suitable formulae.
a) n = 10 and N = 200;
b) n= 20 and N = 200;
c) n = 40 and N = 400;
CONTENTS
INTRODUCTION
The concepts of estimation and hypothesis testing are used in different aspects of human life
and in different fields of study.
Compute and interpret the confidence interval for population mean and proportion
testing a hypothesis
Inference is the process of making interpretations or conclusions from sample data for the
totality of the population.
Inferential statistics uses the sample results to make decisions and draw conclusions about
the population from which the sample is drawn.
In statistics, there are two ways through which inference can be made.
Statistical estimation
Statistical hypothesis testing
Both involve using sample statistics to make inferences about the population parameter.
Statistical Estimation
This is one way of making inference about the population parameter where the investigator
does not have any prior notion about the values or characteristics of the population parameter.
There are two types of estimation:
i. Point estimation: A point estimate is a single value, computed from sample information, that is
used to estimate a parameter. The best point estimate of the population mean is the
sample mean.
ii. Interval estimation: This is the procedure that results in an interval of values as an
estimate for a parameter, i.e., an interval that contains the likely values of the
parameter. It deals with identifying the lower and upper limits for the parameter.
sample size increases; i.e., the estimator θ̂ gets closer to θ as the sample size increases.
3. Relatively Efficient Estimator: The estimator for a parameter with the smallest variance.
This actually compares two or more estimators for one parameter.
Condition-1: The population variance is known and the population is normal, whatever the
sample size.
Recall the Central Limit Theorem, which applies to the sampling distribution of the mean
of a sample. Consider samples of size n drawn from a population whose mean is μ and
standard deviation is σ, with replacement and order important. The population can have any
where α is the probability that the parameter lies outside the interval.
Note: When (as is often the case) we don't know the population standard deviation and n is
large (n ≥ 30), we can approximate σ by the sample standard deviation s, and obtain the
In most practical research, the standard deviation for the population of interest is not known.
In this case, the population standard deviation σ is replaced by the sample standard deviation s,
and σ/√n is replaced by the estimated standard error s/√n. Since s is only an estimate of the true
standard deviation, the standardized sample mean is no longer standard normal. Instead,
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ follows the t-distribution. The t-distribution is also described by its degrees of
freedom. For a sample of size n, the t-distribution will have n − 1 degrees of freedom. The
notation for a t-distribution with n − 1 degrees of freedom is $t_{n-1}$. As the sample size n
increases, the t-distribution becomes closer to the normal distribution, since s approaches the
true standard deviation σ for large n.
- The value of $t_{\alpha/2,\,n-1}$ can be obtained from a t table as the value with an area of α/2 to its right, with
n − 1 degrees of freedom.
Example 8.1:
A random sample of 900 workers showed an average height of 67 inches with a standard
deviation of 5 inches.
a) $\bar{x} = 67$, S = 5, n = 900
b)
Example 8.2
A drug company is testing a new drug that is supposed to reduce blood pressure. From
the six people who are used as subjects, it is found that the average drop in blood pressure is
2.28 points, with a standard deviation of 0.95 points. What is the 95% confidence interval for
the mean change in pressure?
Solution:
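One way to carry out Example 8.2 is a t interval with n − 1 = 5 degrees of freedom, assuming the blood-pressure drops are approximately normal. The sketch below hard-codes the table value t(0.025, 5) = 2.571 rather than calling a statistics library.

```python
from math import sqrt

n, xbar, s = 6, 2.28, 0.95
t_crit = 2.571        # t_{0.025, 5} read from a t table (df = n - 1 = 5)

margin = t_crit * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(round(lower, 2), round(upper, 2))
```

The interval is roughly (1.28, 3.28) points.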
Example 8.3
Suppose we want to estimate a 95% confidence interval for the average quarterly returns of
all fixed-income funds in Ethiopia. We draw a sample of 100 observations and calculate
the sample mean to be 0.05 and the standard deviation 0.03. We assume that those returns
are normally distributed with known variance.
Solution:
n=100
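A Python sketch of the 95% interval for Example 8.3, treating 0.03 as the known standard deviation as the example assumes; `NormalDist().inv_cdf(0.975)` gives the critical value z ≈ 1.96.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, sd = 100, 0.05, 0.03
z = NormalDist().inv_cdf(0.975)       # about 1.96 for 95% confidence

margin = z * sd / sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(round(lower, 4), round(upper, 4))
```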
If P represents the population proportion, then the sample proportion $\hat{p} = x/n$ provides a
good estimate of P; the sample proportion is the point estimate of the
population proportion. To construct a confidence interval for the proportion, the
following conditions should hold:
Conditions: the population proportion is not too close to zero or one, and
the sample size is large (at least 30). Then the $(1-\alpha)100\%$ confidence interval for P is
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Example 8.4
In a sample of 400 people who were questioned regarding their participation in sports, 160
said that they did participate. Construct a 98% confidence interval for P, the proportion of people
in the population who participate in sports.
Solution: $\hat{p} = 160/400 = 0.40$, $z_{0.01} = 2.33$ and $\sqrt{0.40 \times 0.60/400} = 0.0245$, so the interval is $0.40 \pm 2.33(0.0245) = 0.40 \pm 0.057$.
Hence, we can be about 98% confident that the true proportion of people in the
population who participate in sports is between 34.3% and 45.7%.
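The interval in Example 8.4 can be checked with a short Python sketch using the large-sample formula p̂ ± z·√(p̂(1 − p̂)/n); for 98% confidence, z = z₀.₀₁ ≈ 2.33.

```python
from math import sqrt
from statistics import NormalDist

x, n = 160, 400
p_hat = x / n                                  # 0.40
z = NormalDist().inv_cdf(0.99)                 # about 2.33 for 98% confidence
margin = z * sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(round(lower, 3), round(upper, 3))
```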
In the above equation, z₀ is the critical value of z used in conjunction with the specified level
of significance (α level), while z₁ is the value of z corresponding to the designated probability
of Type II error (β level). In determining sample size for testing the mean, z₀ and z₁ always
have opposite algebraic signs.
As a result, the two products in the numerator are always added. Also, the
above equation can be used in conjunction with either one-tail or two-tail tests, and any
fractional sample size is rounded up. Finally, the sample size should be large enough to
warrant use of the normal probability distribution in conjunction with P0 and P1.
Hypothesis Testing
Definitions
Statistical hypothesis
Null hypothesis
This is a claim or statement about a population parameter that is usually assumed to be true
from the very beginning until it is declared false. It is a statistical hypothesis that states a
hypothesis of equality or the hypothesis of no difference between a parameter and a specific
value. It is usually denoted by H0.
Alternative hypothesis: Is a claim or statement about a population parameter that will be true
if the null hypothesis is false. It is a statistical hypothesis that states a hypothesis of
difference between a parameter and a specific value. It is usually denoted by H1 or HA.
Types and size of errors:
Testing a hypothesis is based on sample data, which may involve sampling and non-sampling
errors.
Type I error: Rejecting the null hypothesis when it is actually true. The significance
level (α) can be interpreted as the probability of rejecting the null hypothesis when it
is actually true. The distribution of the test statistic under the null hypothesis
determines the probability of a type I error.
α = P(type I error) = level of significance
Type II error: Occurs when a false null hypothesis is not rejected. The null
hypothesis is actually false, but we wrongly fail to reject it.
β represents the probability that H0 is not rejected when H0 is actually false.
Note: The two types of errors that occur in tests of hypothesis depend on each other. We
cannot lower the values of α and β simultaneously for a test of hypothesis with a fixed
sample size. Lowering the value of α will raise the value of β, and lowering the value of β
will raise the value of α. However, we can decrease both α and β simultaneously by
increasing the sample size.
The following table gives a summary of possible results of any hypothesis test:
1. VS
2. VS
3. VS
Condition-1
If the population standard deviation σ is known, whatever the sample size, and
sampling is from a normal distribution, the test statistic is $Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
After specifying α, we have the following test criteria corresponding to the above three
hypotheses.
Note: When we don't know the population standard deviation and n is large (n ≥ 30), we
can approximate σ by the sample standard deviation s, and obtain the following test
statistic: $Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Condition-2
After specifying α, we have the following test criteria corresponding to the above three
hypotheses.
Example 8.5
Solution:
critical region is
is the acceptance region
5. Compute the test value
, , n=150
6. Decision:
Example 8.6
Ten individuals are chosen at random from a population, and their heights are found to be, in
inches, 63, 63, 66, 67, 68, 69, 70, 71 and 71. Given that the average height of the population is
66 inches, can we conclude from these data that the height of individuals is decreasing?
(Use and assume the normality of the population)
Solution:
VS
, , n=10
6. Decision:
VS
6. Decision:
Example 8.8
VS
5. Decision:
The procedure to make tests of hypothesis about the population proportion for large
samples is similar in many aspects to the population mean. The procedure includes the same
seven steps. Similarly, the test can be two-tailed or one tailed. When the sample size is large,
the sample proportion is approximately normally distributed with its mean equal to P and
standard deviation $\sqrt{P(1-P)/n}$. If the hypothesized value of the population proportion is
denoted by P0, then one can formulate two-sided (1) and one-sided (2 and 3) hypotheses as
follows:
1. VS
2. VS
3. VS
Example 8.9
A manufacturing company has submitted a claim that 100% of items produced by a certain
process are non defective. An improvement in the process is being considered that the feel
1. (actually ) VS
2.
3. Critical Region: Z>1.645
4. Computation
5. Decision: Reject H0
6. Conclusion: At α = 0.05, we have evidence to say that the improvement has reduced
the proportion of defectives.
Example 8.10
The unemployment rate in a given country at a given period is believed to be 10%. The
government embarked on a series of projects to reduce unemployment. It was of interest to
determine whether unemployment decreases as a result of the projects. A random sample of
500 people was chosen, and 48 of them were found to be unemployed. Test at 1% level of
significance whether the government projects reduced the unemployment rate.
4. Critical Region:
5. Computation
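The computation in Example 8.10 can be sketched as a left-tailed one-sample test for a proportion, assuming H0: p = 0.10 against H1: p < 0.10 and the normal approximation (these hypotheses are a reading of the question, since the written steps are not legible in the text):

```python
from math import sqrt
from statistics import NormalDist

p0, n, x = 0.10, 500, 48
alpha = 0.01

p_hat = x / n                                    # 0.096
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # test statistic under H0
z_crit = NormalDist().inv_cdf(alpha)             # about -2.33 for a left-tailed test

reject = z_stat < z_crit
print(round(z_stat, 3), round(z_crit, 3), reject)
```

With z ≈ −0.30, which is not below the critical value of about −2.33, this sketch does not reject H0 at the 1% level.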
Activity 8.1
A large sample of 200 students from the students of a certain high school is interviewed and
85 of them are found to use city bus. Can you conclude that at least 40% of the students
use city bus? Use a 0.05 level of significance.
In the previous section, we saw how to test hypotheses for numeric data given in
the form of a mean or proportion. It is also possible to apply hypothesis testing to categorical
data.
Suppose that we have a population consisting of observations having two attributes or
qualitative characteristics say A and B.
If the attributes are independent, then the probability of possessing both A and B is PA × PB.
A B1 B2 . . Bj . Bc Total
The chi-square test is used to test the hypothesis of independence of two attributes.
Remarks:
Example 8.12
                  Non-smoker   Moderate smoker   Heavy smoker   Total
Hypertension      21 (33.35)   36 (29.97)        30 (23.68)     87
No hypertension   48 (35.65)   26 (32.03)        19 (25.32)     93
Total             69           62                49             180
(Expected frequencies under independence are shown in parentheses.)
At a level of significance α, test whether the presence or absence of hypertension depends on smoking habit.
Solution
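A Python sketch of the chi-square test of independence for the table in Example 8.12. Since the significance level is not legible in the text, α = 0.05 is assumed here, with the table value χ²(0.05, 2) = 5.991 hard-coded.

```python
observed = [
    [21, 36, 30],   # hypertension: non-smoker, moderate smoker, heavy smoker
    [48, 26, 19],   # no hypertension
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected frequency under independence: (row total * column total) / grand total.
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)
critical = 5.991        # chi-square table value for df = 2 at the assumed alpha = 0.05
print(round(chi2, 2), df, chi2 > critical)
```

The statistic is about 14.46 with 2 degrees of freedom, which exceeds 5.991, so under the assumed α the sketch concludes that hypertension status and smoking habit are associated.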
Activity 8.2
A researcher is interested in assessing the effect of literacy on family planning use. Accordingly,
he collected data and tabulated the findings in the following manner. Can we say there is
association between educational status and family planning use?
Yes      a 63    b 49    112
No       c 15    d 33     48
Total      78      82    160
There are two types of inference: estimation and tests of hypothesis.
There are two types of estimation: point estimation and interval estimation.
The degree of confidence, the maximum allowable error, and the population standard deviation are the three important factors needed to determine the sample size for a particular problem.
A test of hypothesis is the procedure we follow to decide whether to accept or reject a hypothesis.
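For estimating a single population mean, the confidence level (through $z_{\alpha/2}$), the maximum allowable error $E$ and the population standard deviation $\sigma$ combine as $n = (z_{\alpha/2}\,\sigma / E)^2$, rounded up to the next whole number. The following is a minimal Python sketch. The function name and the inputs in the usage line are illustrative and are not taken from the text.

```python
import math


def sample_size_for_mean(z, sigma, max_error):
    """Smallest n with z * sigma / sqrt(n) <= max_error, i.e. (z * sigma / E)^2 rounded up."""
    return math.ceil((z * sigma / max_error) ** 2)


# Hypothetical inputs: 95% confidence (z = 1.96), sigma = 2, maximum allowable error E = 0.5
print(sample_size_for_mean(1.96, 2, 0.5))
```

Rounding up, rather than to the nearest integer, guarantees that the error bound is met.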
2. The mean lifetime of a sample of 16 light bulbs is 1570 hours with a standard deviation of 110 hours. Test the hypothesis that there is some improvement in the mean lifetime of light bulbs at $\alpha = 0.05$.
3. A sociologist claims that the average age of murder victims in a small city is less than or equal to 23.2 years. A sample of 18 recent victims had a mean age of 22.6 years. At $\alpha = 0.05$, test the sociologist's claim. The population standard deviation is 2 years.
4. A sample of 50 days showed that a fast food restaurant served an average of 182 customers during lunch time. The standard deviation of the sample was 8. Find the 95% CI for the population mean $\mu$.
5. The president of a large university wants to estimate the average age of the students presently enrolled. From past studies, the standard deviation is known to be 2 years. A sample of 50 students is selected, and the mean is found to be 23.2 years. Construct a 95% CI for the population mean.
7. A theory predicts that the proportions of beans in the four groups A, B, C, D should be in the ratio 9:3:3:1. In an experiment with 1600 beans, the numbers in the four groups were 882, 313, 287 and 118. Do the observed data support the theory?
                 Son
Father       Bold    Not bold
Bold          85        59
Not bold      65        91
Using α = 5%, test whether there is an association between father and son regarding boldness.
9. A random sample of 200 retired men was classified according to education and number of children, as shown below.
Number of children
CONTENTS
PROPORTIONS
INTRODUCTION
Dear learner, in the previous chapter you were introduced to the two problems of statistical inference, namely statistical estimation and tests of hypothesis, though restricted to one mean and one proportion. This chapter is a natural continuation of the previous one.
The general focus of this chapter is on testing hypotheses and constructing confidence
intervals about parameters (means and proportions) from two populations, thereby enabling
you to meet the following objectives:
Test hypotheses and construct confidence intervals about the difference between two
population means and proportions using data from large samples.
Test hypotheses and establish confidence intervals about the difference between two
population means and proportions using data from small samples when the
population variances are unknown and the populations are normally distributed.
Example 9.1
Suppose you want to compare two different methods of production, A and B, to see which,
on average, requires less time. You could decide to use either of the two following
sampling plans:
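(a) Independent samples: (1) Randomly select a group of people to use method A and record the time each takes to complete the task. (2) Do the same thing for a different group of people using method B.
(b) Paired samples: (1) Randomly select one group of people. (2) Have each person use method A and record the time. (3) Have each person use method B and record the time.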
Although two-sample inference is the simplest kind of Experimental Design, most of the
important concepts of Experimental Design are illustrated in the two-sample case:
Goal: Compare two population means $\mu_A$ and $\mu_B$ by comparing the sample means $\bar{x}$ and $\bar{y}$ from two random samples, one taken from population A and the other from population B.
Data layout:
Sample A Sample B
x1 y1
x2 y2
x3 y3
: :
$x_{n_A}$    $y_{n_B}$
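(3) The analysis then proceeds slightly differently depending on whether the population standard deviations are known/given or not:
If $\sigma_A$ and $\sigma_B$ are known: Statistic: $\bar{x} - \bar{y}$, with standard error $\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}$. Distribution: z
If $\sigma_A$ and $\sigma_B$ are unknown: Statistic: $\bar{x} - \bar{y}$, with standard error $\sqrt{s_A^2/n_A + s_B^2/n_B}$. Distribution: t
Degrees of freedom, $\nu$:
$\nu = \frac{(a + b)^2}{a^2/(n_A - 1) + b^2/(n_B - 1)}$
where $a = s_A^2/n_A$ and $b = s_B^2/n_B$.
Furthermore, the formula will usually not give an integer value, and it is recommended that you round the result down to the nearest integer.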
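Confidence interval for $\mu_A - \mu_B$: $(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}$ (σ's known) or $(\bar{x} - \bar{y}) \pm t_{\alpha/2}\sqrt{s_A^2/n_A + s_B^2/n_B}$ (σ's unknown)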
Note: In the vast majority of applications, $D_0$ is 0 because we are usually interested simply in testing whether the two means are equal (i.e., whether $\mu_A - \mu_B = 0$ against $\mu_A - \mu_B < 0$, $> 0$, or $\neq 0$).
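Test statistic for $H_0: \mu_A - \mu_B = D_0$: $z = \frac{(\bar{x} - \bar{y}) - D_0}{\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}}$ (σ's known) or $t = \frac{(\bar{x} - \bar{y}) - D_0}{\sqrt{s_A^2/n_A + s_B^2/n_B}}$ (σ's unknown)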
Note: In the case where $\sigma_A$ and $\sigma_B$ are unknown, the text gives an additional method for comparing the population means. This method "pools" the values of $s_A$ and $s_B$ together.
The good news is that you can usually ignore the pooled method. The statistics literature indicates that it offers little or no advantage over the unpooled method described above (for the $\sigma_A$, $\sigma_B$ unknown case), and it can give misleading results when the two population variances differ.
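The unpooled procedure (standard error $\sqrt{s_A^2/n_A + s_B^2/n_B}$ together with the degrees-of-freedom formula) can be sketched in Python as follows. This is a minimal illustration using only the standard library, and the function name `welch_t_test` is hypothetical. The readings in the usage lines are the shock absorber data from the paired example in this chapter, deliberately (and mistakenly) treated as independent samples.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 divisor


def welch_t_test(sample_a, sample_b, d0=0.0):
    """Unpooled two-sample t statistic with Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    a = variance(sample_a) / n_a  # s_A^2 / n_A
    b = variance(sample_b) / n_b  # s_B^2 / n_B
    t = (mean(sample_a) - mean(sample_b) - d0) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n_a - 1) + b ** 2 / (n_b - 1))
    return t, df, math.floor(df)  # round df down before looking up the t table


# Shock absorber readings from the paired example, (mistakenly) treated as independent
brand_a = [8.8, 10.5, 12.5, 9.7, 9.6, 13.2]
brand_b = [8.4, 10.1, 12.0, 9.3, 9.0, 13.0]
t, df, df_table = welch_t_test(brand_a, brand_b)
print(f"t = {t:.4f}, df = {df:.2f} -> use {df_table} df")
```

To complete the test, compare `t` with $t_{\alpha/2}$ for `df_table` degrees of freedom from the t table.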
Example 9.2
The problem explicitly states that independent samples are used, but you could have seen that just by noticing that the sample sizes $n_A = 17$ and $n_B = 12$ are different (i.e., the samples couldn't possibly have been paired).
Since the population standard deviations are not given/known, we must use the t distribution
for conducting hypothesis tests and constructing confidence intervals:
Goal: Compare two population means $\mu_A$ and $\mu_B$ by taking one random sample of items and measuring them under two different conditions, A and B. The basic idea is that many extraneous sources of variation in the population can be filtered out by pairing, which leaves a clearer picture of the true difference between the means.
For example, think of testing a new drug by measuring people's responses before (A) and after (B) they take the drug. By comparing the ith person's individual responses, $x_i$ versus $y_i$ (before and after), all of the extraneous factors related to this individual's lifestyle are automatically "filtered out", and the difference $x_i - y_i$ measures only the actual response of that person to the drug.
Data layout:
Item   Sample A   Sample B   Difference
1      x1         y1         d1 = x1 − y1
2      x2         y2         d2 = x2 − y2
3      x3         y3         d3 = x3 − y3
:      :          :          :
n      xn         yn         dn = xn − yn
Note: The two sample sizes must be equal since the same n items in the random sample are
being measured twice.
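Statistics (sample mean and standard deviation) calculated from the differences, $d_i = x_i - y_i$:
Mean of the differences: $\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$; standard deviation of the differences: $s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}}$
The analysis then proceeds exactly as if you were doing single-sample inference for a mean, using the t distribution with $n - 1$ degrees of freedom:
Statistic: $t = \frac{\bar{d} - D_0}{s_d/\sqrt{n}}$; confidence interval: $\bar{d} \pm t_{\alpha/2}\, s_d/\sqrt{n}$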
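The paired procedure reduces to a one-sample t computation on the differences. The following is a minimal standard-library sketch of it. The function name is illustrative, and the usage lines use the shock absorber readings from the worked example in this section.

```python
import math
from statistics import mean, stdev


def paired_t_test(before, after, d0=0.0):
    """One-sample t statistic on the differences d_i = x_i - y_i, with n - 1 df."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    d = [x - y for x, y in zip(before, after)]
    n = len(d)
    d_bar, s_d = mean(d), stdev(d)  # stdev() uses the n - 1 divisor
    t = (d_bar - d0) / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1


brand_a = [8.8, 10.5, 12.5, 9.7, 9.6, 13.2]  # manufacturer
brand_b = [8.4, 10.1, 12.0, 9.3, 9.0, 13.0]  # competitor
d_bar, s_d, t, df = paired_t_test(brand_a, brand_b)
print(f"d_bar = {d_bar:.6f}, s_d = {s_d:.6f}, t = {t:.2f}, df = {df}")
```

The result is then compared with $t_{\alpha/2}$ for $n - 1$ degrees of freedom, exactly as in the one-sample case.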
(a) These samples are definitely "paired" because each car is measured twice, once for shock absorber A and once for B.
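Car   A (manufacturer)   B (competitor)   d = A − B
1     8.8                8.4              0.4
2     10.5               10.1             0.4
3     12.5               12.0             0.5
4     9.7                9.3              0.4
5     9.6                9.0              0.6
6     13.2               13.0             0.2
$\bar{d} = 0.416667$, $s_d = 0.132916$, so $t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{0.416667}{0.132916/\sqrt{6}} \approx 7.68$ with $n - 1 = 5$ degrees of freedom.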
We can conclude that these data do show a difference between the mean strengths of the two brands of shock absorbers.
What if you make a mistake in the beginning and think that these samples are independent?
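$t = \frac{\bar{x} - \bar{y}}{\sqrt{s_A^2/n_A + s_B^2/n_B}} = \frac{10.7167 - 10.3}{\sqrt{3.0697/6 + 3.3040/6}} = 0.4043$.
Next, the degrees-of-freedom formula gives $\nu \approx 9.99$, which rounds down to 9.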
As you can quickly see, t = 0.4043 doesn't fall in either tail of the rejection region, so the (false) conclusion would be that there is no difference between the two population means.
The moral of this story: Mistakenly using the independent samples test (in those cases
when the paired samples test should be used) can lead to incorrect conclusions (so be careful
to correctly identify when to use the independent versus paired samples approach).
Data layout:
Sample A                                Sample B
$X_A$ = # of "successes" out of $n_A$    $Y_B$ = # of "successes" out of $n_B$
$\hat{p}_A = X_A/n_A$                    $\hat{p}_B = Y_B/n_B$
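The analysis proceeds a little differently depending on whether you are doing a confidence interval or a hypothesis test:
Confidence interval: Statistic: $\hat{p}_A - \hat{p}_B$, with standard error $\sqrt{\hat{p}_A\hat{q}_A/n_A + \hat{p}_B\hat{q}_B/n_B}$, where $\hat{q}_A = 1 - \hat{p}_A$ and $\hat{q}_B = 1 - \hat{p}_B$
Hypothesis test: Statistic: $z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p}\hat{q}\,(1/n_A + 1/n_B)}}$
where $\hat{p} = \frac{X_A + Y_B}{n_A + n_B}$ (the pooled proportion) and $\hat{q} = 1 - \hat{p}$.
Note: The text limits its hypothesis tests for proportions to the most common case, where $D_0 = 0$. The standard error above is based on the assumption that $D_0 = 0$.
$(\hat{p}_A - \hat{p}_B) \pm z_{\alpha/2}\sqrt{\hat{p}_A\hat{q}_A/n_A + \hat{p}_B\hat{q}_B/n_B}$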
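Both two-proportion procedures fit in a short Python sketch. The test uses the pooled proportion in its standard error, and the confidence interval uses the unpooled standard error. The function name and the counts in the usage line (45 of 100 versus 30 of 100) are hypothetical and are used only for illustration.

```python
import math


def two_proportion_inference(x_a, n_a, y_b, n_b, z_crit=1.96):
    """Return (z statistic for H0: pA - pB = 0, confidence interval for pA - pB)."""
    p_a, p_b = x_a / n_a, y_b / n_b
    diff = p_a - p_b
    # Hypothesis test: pooled proportion p_hat = (X_A + Y_B) / (n_A + n_B)
    p_pool = (x_a + y_b) / (n_a + n_b)
    se_test = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_test
    # Confidence interval: unpooled standard error
    se_ci = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - z_crit * se_ci, diff + z_crit * se_ci)
    return z, ci


# Hypothetical counts: 45 successes out of 100 versus 30 out of 100, 95% interval
z, ci = two_proportion_inference(45, 100, 30, 100)
print(f"z = {z:.3f}, CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

Passing a different `z_crit` (for example 1.645 for 90%) changes only the interval, not the test statistic.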
The method for finding the minimum necessary sample sizes, $n_A$ and $n_B$, for estimating either $p_A - p_B$ or $\mu_A - \mu_B$ is the same: set the desired margin of error, ME, that you are willing to accept equal to the half-width of the confidence interval and solve for the sample sizes.
Since this will result in one equation with two unknowns ($n_A$ and $n_B$), we usually have to impose some other condition on the two sample sizes. One of the most frequently used conditions is that one sample be a fixed (specified) multiple of the other. So, let us assume that:
$n_A = r \cdot n_B$
where r is a constant that you specify in advance. For example, samples from population A
might be cheaper to obtain than samples from population B, so you might want to specify
that twice as many sampled items are taken from A as from B. In that case, you would use a
value of r = 2.
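For estimating $\mu_A - \mu_B$, setting $ME = z_{\alpha/2}\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}$ with $n_A = r \cdot n_B$ gives samples of size (rounded up):
$n_B = \frac{z_{\alpha/2}^2\,(\sigma_A^2/r + \sigma_B^2)}{ME^2}$ and $n_A = r \cdot n_B$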
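For estimating $p_A - p_B$, set $ME = z_{\alpha/2}\sqrt{p_A q_A/n_A + p_B q_B/n_B}$, where $q = 1 - p$.
Solve to find $n_B = \frac{z_{\alpha/2}^2\,(p_A q_A/r + p_B q_B)}{ME^2}$ and $n_A = r \cdot n_B$.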
Notes: (a) The text only discusses the case where r = 1 (i.e., equal sample sizes).
(b) Also, to use this formula you first have to come up with reasonable guesses (estimates or bounds) for $p_A$ and $p_B$.
(c) The most conservative (i.e., largest sample size) choice is to use $p_A = p_B = 0.5$. Otherwise, use upper (or lower) bounds on $p_A$ and $p_B$ if you know of some.
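The sample-size rule with $n_A = r \cdot n_B$ can be sketched in Python for both cases: $n_B = z_{\alpha/2}^2(\sigma_A^2/r + \sigma_B^2)/ME^2$ for means, and the same expression with $p q$ in place of $\sigma^2$ for proportions, each rounded up. The function names and the inputs in the usage lines are hypothetical.

```python
import math


def two_sample_sizes_means(z, sigma_a, sigma_b, me, r=1.0):
    """n_B = z^2 (sigma_A^2 / r + sigma_B^2) / ME^2 rounded up; n_A = r * n_B."""
    n_b = math.ceil(z ** 2 * (sigma_a ** 2 / r + sigma_b ** 2) / me ** 2)
    return math.ceil(r * n_b), n_b


def two_sample_sizes_proportions(z, p_a, p_b, me, r=1.0):
    """Same idea with p * q in place of sigma^2."""
    n_b = math.ceil(z ** 2 * (p_a * (1 - p_a) / r + p_b * (1 - p_b)) / me ** 2)
    return math.ceil(r * n_b), n_b


# Conservative proportions (pA = pB = 0.5), equal sizes, ME = 0.05 at 95% confidence
print(two_sample_sizes_proportions(1.96, 0.5, 0.5, 0.05))
# Means with hypothetical sigma_A = sigma_B = 10, ME = 2, twice as many from A (r = 2)
print(two_sample_sizes_means(1.96, 10, 10, 2, r=2))
```

Rounding $n_B$ up before multiplying by r keeps both samples at least as large as the formula requires.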
The F distribution can be shown to be the appropriate probability model for the ratio of the
variances of two samples taken independently from the same normally distributed
population, with there being a different F distribution for every combination of the degrees
of freedom (df) associated with each sample. For each sample, df = n − 1. The statistic that is used to test the null hypothesis that two population variances are equal is $F = s_1^2 / s_2^2$.
Since each sample variance is an unbiased estimator of the same population variance, the long-run expected value of this ratio is about 1.0. (Note: The expected value is not exactly 1.0, but rather is $df_2/(df_2 - 2)$ for $df_2 > 2$, for mathematical reasons that are outside the scope of this outline.) However, for any given pair of samples the sample variances are not likely
to be identical in value, even though the null hypothesis is true. Since this ratio is known to
follow an F distribution, this probability distribution can be used in conjunction with testing
the difference between two variances. A necessary mathematical assumption is that the two populations are normally distributed. Unlike the t procedures for means, the F test for variances is quite sensitive to departures from normality, so its results should be interpreted with caution when the populations may not be normal.
Example 9.4
For a random sample of $n_1 = 10$ light bulbs, the mean bulb life is $\bar{x}_1 = 400$ hrs, with $s_1 = 200$. For another brand of bulb whose useful life is assumed to be normally distributed, a random sample of $n_2 = 8$ has a sample mean of $\bar{x}_2 = 4300$ hours and a sample standard deviation of $s_2 = 250$. Test the null hypothesis that the samples were obtained from populations with equal variances, using the 10 percent level of significance and the F distribution.
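The computed ratio is $F = s_1^2/s_2^2 = 200^2/250^2 = 0.64$, with $df_1 = 9$ and $df_2 = 7$. Since the computed F ratio is neither smaller than the lower critical value 0.304 nor larger than the upper critical value 3.68, it is in the region of acceptance of the null hypothesis. Thus, the assumption that the variances of the two populations are equal cannot be rejected at the 10 percent level of significance.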
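The variance-ratio test in Example 9.4 can be reproduced with a tiny Python sketch. The critical values are passed in as read from an F table (0.304 and 3.68 for df = (9, 7) at the 10% level, as in the example), so no statistics library is assumed. The function name is illustrative.

```python
def variance_ratio_test(s1, s2, lower_crit, upper_crit):
    """Compute F = s1^2 / s2^2 and check it against two-tailed critical values."""
    f = s1 ** 2 / s2 ** 2
    reject = f < lower_crit or f > upper_crit
    return f, reject


# Example 9.4: s1 = 200 (n1 = 10), s2 = 250 (n2 = 8), so df = (9, 7)
f, reject = variance_ratio_test(200, 250, lower_crit=0.304, upper_crit=3.68)
print(f"F = {f:.2f}, reject H0: {reject}")
```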
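Confidence intervals for $\mu_A - \mu_B$: $(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}$ (σ's known) and $(\bar{x} - \bar{y}) \pm t_{\alpha/2}\sqrt{s_A^2/n_A + s_B^2/n_B}$ (σ's unknown)
Confidence interval for $p_A - p_B$: $(\hat{p}_A - \hat{p}_B) \pm z_{\alpha/2}\sqrt{\hat{p}_A\hat{q}_A/n_A + \hat{p}_B\hat{q}_B/n_B}$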
The method for finding the minimum necessary sample sizes, $n_A$ and $n_B$, for estimating either $p_A - p_B$ or $\mu_A - \mu_B$ is the same: set the desired margin of error, ME, that you are willing to accept equal to the half-width of the confidence interval and solve for the sample sizes.
The statistic that is used to test the null hypothesis that two population variances are equal is $F = s_1^2/s_2^2$.
Exercises on Chapter 9
2. The average length of twenty trout caught in a lake was 10.8 inches with standard
deviation of 2.3 inches, and the average length of fifteen trout caught in another lake was
9.7 inches with a standard deviation of 1.5 inches. Construct a 90 percent confidence
interval for the difference in the true mean lengths of trout in the two lakes.
3. A farmer tried Feed A on 256 cattle and Feed B on 144 cattle. The mean weight of cattle
given Feed A was found to be 1350 pounds with a standard deviation of 180 pounds. On
the other hand, the mean weight of the cattle given Feed B was found to be 1430 pounds
with a standard deviation of 210 pounds. At the 5 percent level of significance, is Feed B
significantly better than Feed A? Find the p-value.
4. At a certain university twelve voters were picked at random from those who are in favor
of impeachment of the president, and ten were selected at random from those who are
against. The following table gives their ages.
In favor 27 34 28 30 29 50 30 44 29 32 41 35
Against 31 36 43 40 32 48 30 29 42 49
5. Paired t-test.
Dr. Williams claims that the special diet he recommends significantly reduces weight. A sample of eight persons was selected, and they were put on the diet for a period of 6 weeks. The table below shows the weights (in pounds) of those eight persons before and after dieting.
a) Construct a 99% confidence interval for the mean difference $\mu_d$ in weight before and after using the diet recommended by Dr. Williams. Use the paired-difference interval $\bar{d} \pm t_{\alpha/2}\, s_d/\sqrt{n}$.
b) Using a 1% level of significance, can you conclude that the mean weight loss for
all persons due to this special diet is greater than zero?
6. In a study to estimate the proportion of residences in a certain city and its suburbs that
subscribe to a certain magazine, it is found that 63 of 120 urban residences subscribe
while only 34 of 125 suburban residences subscribe. Find a 90% confidence interval for
the difference in the proportions of urban and suburban residences that subscribe to Writer's Digest.
7. A jar containing 130 mosquitoes was sprayed with an insecticide of Brand A and it was
found that 98 of them were killed. When another jar containing 150 mosquitoes of the same
type was sprayed with Brand B, 120 of them were killed. At the 2 percent level of
significance, do the two brands differ in their effectiveness?
CONTENTS
INTRODUCTION
Most of the analyses discussed in the previous chapters deal with the one-variable case. Sometimes, however, we are interested in determining the degree of relationship between two or more variables, and even in estimating by how much one variable changes when a related variable changes by one unit. Regression and correlation analysis are used to study relationships among variables. This chapter introduces you to these and related issues.
Objectives
Draw a scatter diagram to identify the type of relationship that exists between variables
Linear regression and correlation analysis study and measure the linear relationship among two or more variables. When only two variables are involved, the analysis is referred to as simple correlation and simple linear regression analysis; when there are more than two variables, the terms multiple regression and partial correlation are used.
10.1.1 DEFINITION
Correlation Analysis: deals with measuring the closeness of the relationship described by the regression equation.
We say there is correlation when the two series of items vary together directly or inversely.
In simple linear regression analysis, two variables are under study: one independent and one dependent.
Independent variable: a variable whose value is used to estimate the value of the dependent variable. It is denoted by X.
INTRODUCTION TO STATISTICS: Stat 281
The simple linear regression model is $Y_i = B_0 + B_1 X_i + \varepsilon_i$, where $B_0$ and $B_1$ represent the intercept and the slope (they are called the parameters, or regression coefficients).
The random error term, $\varepsilon_i$, is included in the model to represent the following two phenomena:
1. Missing or omitted variables
2. Random variation
Assumptions
6. For each value of X, the distribution of population errors has the same (constant) standard deviation, which is denoted by $\sigma_\varepsilon$.
Note:
One of the methods that help us find the estimates of $B_0$ and $B_1$ is the least squares method, or ordinary least squares (OLS) method.
The resulting estimates of $B_0$ and $B_1$, denoted by $\hat{B}_0$ and $\hat{B}_1$ respectively, are called the least squares estimates.
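Note: This method gives the values $\hat{B}_0$ and $\hat{B}_1$ such that the sum of squared errors is minimum, i.e., we minimize
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{B}_0 - \hat{B}_1 X_i)^2$ (SS Residual)
and differentiate it with respect to $\hat{B}_0$ and $\hat{B}_1$, setting each derivative equal to zero, i.e.
$\sum Y_i = n\hat{B}_0 + \hat{B}_1 \sum X_i$ and $\sum X_i Y_i = \hat{B}_0 \sum X_i + \hat{B}_1 \sum X_i^2$
When we solve the two equations, we get the least squares values
$\hat{B}_1 = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - (\sum X_i)^2}$ and $\hat{B}_0 = \bar{Y} - \hat{B}_1 \bar{X}$
Then the estimated regression line (the least squares regression line) is given by
$\hat{Y} = \hat{B}_0 + \hat{B}_1 X$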
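The closed-form least squares solution can be computed directly from the sums $\sum X_i$, $\sum Y_i$, $\sum X_i Y_i$ and $\sum X_i^2$. The following is a minimal Python sketch. The function name and the small data set in the usage lines are hypothetical and are not taken from the examples in this chapter.

```python
def least_squares_line(xs, ys):
    """Return (B0_hat, B1_hat) from the normal-equation solution."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b0 = sum_y / n - b1 * sum_x / n  # intercept: Y-bar - B1_hat * X-bar
    return b0, b1


# Hypothetical data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
print(f"Y_hat = {b0:.2f} + {b1:.2f} X")
```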
Interpretation of $\hat{B}_0$ and $\hat{B}_1$
The value of $\hat{B}_0$ gives the predicted (mean) value of Y for X = 0. The value of $\hat{B}_1$ gives the average change in Y (dependent variable) due to a one-unit change in X (independent variable).
Example 10.1
Find the least squares regression line for the data on the final marks & number of
hours spent on studying
Xi 8 5 13 10 6 18 15 2 9 11
Yi 65 94 72 70 54 90 85 33 56 29
Solution
The Least square regression line is
$\hat{B}_0 = 29.88$ indicates the expected mark of a student who spends zero hours studying.
CORRELATION
SCATTER DIAGRAM
COVARIANCE
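If $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ are n pairs of observations of the variables X and Y in a bivariate distribution, then
$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$
and
$\sigma_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2}$, $\sigma_Y = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$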
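COEFFICIENT OF CORRELATION
$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{[n\sum X_i^2 - (\sum X_i)^2]\,[n\sum Y_i^2 - (\sum Y_i)^2]}}$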
Solution:
Since r is positive and close to 1, it indicates a strong positive linear relationship between the number of hours spent studying and the final marks.
(Note that $-1 \le r \le 1$: if r = −1, there is a perfect negative linear relationship; if r = 1, there is a perfect positive linear relationship; and if r = 0, there is no linear relationship between the dependent variable Y and the independent variable X.)
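The computational form of r uses the same sums as the least squares slope, plus $\sum Y_i^2$. Below is a minimal Python sketch of it. The function name and the data in the usage lines are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """r = (n Sxy - Sx Sy) / sqrt((n Sxx - Sx^2) * (n Syy - Sy^2))."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data for illustration only
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))
```

Because it is a ratio, r is unchanged whether the covariance and standard deviations use the 1/n or the 1/(n − 1) divisor.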
Example 10.3
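The equations of the two regression lines are 3X + 2Y = 26 and 6X + Y = 31.
Solution:
3X + 2Y = 26 ….(*)
6X + Y = 31 ….(**)
Assume (*) is the regression line of Y on X: $Y = 13 - \frac{3}{2}X$, so $b_{YX} = -\frac{3}{2}$. Assume (**) is the regression line of X on Y: $X = \frac{31}{6} - \frac{1}{6}Y$, so $b_{XY} = -\frac{1}{6}$.
i) $r^2 = b_{YX}\, b_{XY} = \frac{1}{4}$, and since both regression coefficients are negative, r = −0.5.
(Since $r^2 = 0.25 \le 1$, our assumption that (*) and (**) are the linear regressions of Y on X and X on Y, respectively, is valid.)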
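The reasoning in Example 10.3 can be checked mechanically. This sketch writes each line as aX + bY = c, a representation chosen here for convenience. The Y-on-X line gives $b_{YX} = -a/b$ and the X-on-Y line gives $b_{XY} = -b/a$. Then $r^2 = b_{YX} b_{XY}$, and r takes the common sign of the two coefficients. The sketch also returns the intersection of the lines, which is $(\bar{X}, \bar{Y})$ because both regression lines pass through the means. The function name is illustrative.

```python
import math


def r_from_regression_lines(line_y_on_x, line_x_on_y):
    """Each line is a tuple (a, b, c) meaning a*X + b*Y = c."""
    a1, b1, c1 = line_y_on_x
    a2, b2, c2 = line_x_on_y
    b_yx = -a1 / b1  # slope of the regression of Y on X
    b_xy = -b2 / a2  # slope of the regression of X on Y
    r_squared = b_yx * b_xy
    if not 0 <= r_squared <= 1:
        raise ValueError("invalid assignment: try swapping the Y-on-X and X-on-Y lines")
    r = math.copysign(math.sqrt(r_squared), b_yx)  # r shares the sign of the coefficients
    # Both lines pass through (X-bar, Y-bar): solve the 2x2 system by Cramer's rule
    det = a1 * b2 - a2 * b1
    x_bar = (c1 * b2 - c2 * b1) / det
    y_bar = (a1 * c2 - a2 * c1) / det
    return b_yx, b_xy, r, (x_bar, y_bar)


print(r_from_regression_lines((3, 2, 26), (6, 1, 31)))
```

Swapping the roles of the two lines here would give $r^2 = 4 > 1$. That is why the assumption in the example is the valid one.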
Y on X: $b_{YX} = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - (\sum X_i)^2}$ ..........(1)
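Spearman's rank correlation coefficient is $r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$, where $D = R_X - R_Y$ is the difference between the two ranks of each item and n is the number of ranked items.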
Example 10.4
Aster and Almaz were asked to rank 7 different types of lipstick. See if there is a correlation between the tastes of the two ladies.
Lipsticks A B C D E F G
Aster 2 1 4 3 5 7 6
Almaz 1 3 2 4 5 6 7
Solution:
           A    B    C    D    E    F    G    Total
RX         2    1    4    3    5    7    6
RY         1    3    2    4    5    6    7
D=RX−RY    1   −2    2   −1    0    1   −1
D²         1    4    4    1    0    1    1    12
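Here n = 7, so $r_s = 1 - \frac{6(12)}{7(7^2 - 1)} = 1 - \frac{72}{336} \approx 0.79$. Yes, there is a positive correlation between the two rankings. (Note that $-1 \le r_s \le 1$: if $r_s = -1$, there is perfect negative correlation; if $r_s = 1$, there is perfect positive correlation; and if $r_s = 0$, there is no correlation.)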
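The rank-correlation formula can be applied directly to the two rankings in Example 10.4. The following is a minimal Python sketch. It assumes there are no tied ranks, since this simple formula is exact only in that case, and the function name is illustrative.

```python
def spearman_rank_correlation(ranks_x, ranks_y):
    """r_s = 1 - 6 * sum(D^2) / (n (n^2 - 1)), valid when there are no tied ranks."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


aster = [2, 1, 4, 3, 5, 7, 6]  # ranks of lipsticks A to G
almaz = [1, 3, 2, 4, 5, 6, 7]
print(round(spearman_rank_correlation(aster, almaz), 4))
```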
CHAPTER SUMMARY
Many relationships among variables exist in the real world. One way to determine whether a relationship exists is to use the statistical techniques known as correlation and regression.
The strength and direction of the relationship are measured by the value of the correlation coefficient, which can assume values between and including −1 and +1.
The closer the coefficient is to +1 or −1, the stronger the relationship between the variables. A value of +1 or −1 indicates a perfect relationship. A positive relationship between two variables means that for small values of the independent variable, the values of the dependent variable will be small, and for large values of the independent variable, the values of the dependent variable will be large.
A negative relationship between two variables means that for small values of the independent variable, the values of the dependent variable will be large, and for large values of the independent variable, the values of the dependent variable will be small.
A relationship can be linear or curvilinear. To determine its shape, one draws a scatter plot of the variables. If the relationship is linear, the data can be approximated by a straight line called the regression line or the line of best fit.
Exercises on Chapter 10
1. A study reported in a medical journal suggested that the peak heart rate an individual can reach during intensive exercise decreases with age. A cardiologist wanted to do his own study: individuals were put on a treadmill at 6 miles per hour, and their ages and heart rates were recorded as follows.
Age(X) 30 30 40 20 20 45 30 45 50
Heart rate(Y) 190 180 180 200 195 170 180 175 165
a) Find the least square regression of Y on X.
b) For an 80-year-old man, what peak heart rate do you predict?
, , , , 2 =775, n=100.
3. The equations of two regression lines between two variables are expressed as:
6x + y = 31 and 3x + 2y = 26.
c) If , find and .
APPENDICES
APPENDIX A
CUMULATIVE AREA OF THE STANDARD NORMAL CURVE from 0 to z
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986
3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990
3.1 .4990 .4991 .4991 .4991 .4992 .4992 .4992 .4992 .4993 .4993
3.2 .4993 .4993 .4994 .4994 .4994 .4994 .4994 .4995 .4995 .4995
3.3 .4995 .4995 .4995 .4996 .4996 .4996 .4996 .4996 .4996 .4997
3.4 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4998
3.5 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998
3.6 .4998 .4998 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.7 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.8 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.9 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000
APPENDIX B
CUMULATIVE AREA OF THE STUDENT t CURVE WITH DEGREES OF FREEDOM n-1
The t- Distribution
APPENDIX C
CUMULATIVE AREA OF RIGHT TAIL AREAS FOR THE CHI-SQUARE
DISTRIBUTION WITH N-1 DEGREES OF FREEDOM