Introduction To Statistics - Doc1

INTRODUCTION TO STATISTICS STAT 281
INTRODUCTION TO THE COURSE
The module has ten chapters. The first two chapters give a general introduction: they define some
basic terms and present Methods of Data Presentation.
The next two chapters are about Descriptive Statistics, dealing with Measures of Central Tendency
(collectively known as averages) and Measures of Variation or Dispersion.
Chapters 5 and 6, Probability and Probability Distributions, deal with Elementary Probability
Theory; two common Discrete Probability Distributions, the Binomial and the Poisson; and some Continuous
Probability Densities, namely the Normal, Chi-Square, t and F distributions, which play indispensable roles in statistical
theory and inference.
Chapters 7, 8 and 9 discuss Sampling Distributions, Estimation and Hypothesis Testing, and
inferences on one mean, two means, one proportion and two proportions. The last
chapter deals with Simple Linear Regression and Correlation.
Sufficient examples as well as activities are provided whenever necessary.
Objectives
By the end of the course, the student should be able to:
• Explain the basic concepts of statistics.
• Identify the different types of probability distributions.
• Collect and organize data.
• Identify the different types of sampling techniques.
• Analyze data collected from a sample and draw conclusions from them.
CHAPTER 1
INTRODUCTION
CONTENTS
1.1 DEFINITIONS AND CLASSIFICATION OF STATISTICS
1.2 STAGES IN STATISTICAL INVESTIGATION
1.3 DEFINITION OF SOME TERMS
1.4 APPLICATIONS, USES AND LIMITATIONS OF STATISTICS
1.5 SCALES OF MEASUREMENT
1.6 INTRODUCTION TO METHODS OF DATA COLLECTION
INTRODUCTION
What is Statistics? Why do we need to study it? How is it employed? These are some of the
basic questions one has to raise about the field of statistics. This chapter provides only partial answers to
these questions.
The chapter has six sub-sections. They define some important terms, starting with the word “Statistics” itself
(treated as both singular and plural), along with its classifications, applications, uses and limitations; the stages in any
statistical study; the scales of measurement; and an overview of methods of data collection.
Objectives:
After completing this chapter, students are expected to be able to:
• Explain the meaning and uses of statistics.
• Differentiate between descriptive and inferential statistics.
• Differentiate between types of variables.
• Describe the four levels of measurement.
1.1 DEFINITION AND CLASSIFICATION OF STATISTICS

Definitions of Statistics
Statistics has been defined in two ways. Some writers define it as ‘statistical data’, i.e. numerical
statements of facts, while others define it as ‘statistical methods’, that is, the complete body of
principles and techniques used in collecting and analyzing such data.
• Statistics as numerical data (plural meaning)
Prepared by Big Bang, August, 2017 GC

Statistics are measurements, enumerations or estimates analyzed and presented so as to
exhibit important interrelationships among them (A. M. Tuttle).
• Statistics as a statistical method (singular meaning)
Statistics may be defined as ‘the methods and techniques of collecting, organizing,
presenting, analyzing and interpreting numerical data’.
Classifications of Statistics
• Descriptive statistics: refers to the procedures used to collect, organize and summarize
masses of data. Frequency distributions, measures of central tendency such as the mean
and median, and measures of dispersion such as the range and standard deviation belong to this
category of statistics.
• Inferential statistics: includes the methods used to find out something about a
population based on a sample. In this form of statistical analysis, descriptive statistics is
linked with probability theory so that an investigator can generalize the results of a study
from the sample to the population.
1.2 Stages in statistical investigation

There are five stages or steps in any statistical investigation:
1. Collection of data
This is the first step and the foundation of the entire investigation. It is the process of
measuring, gathering and assembling the raw data upon which the statistical investigation is to be
based. Careful planning is essential before collecting the data. There are different methods of
data collection, such as census survey and sample survey, and the investigator should
use the method appropriate to the study.
2. Organization of data
This is the summarization of the data in some meaningful way, for example in the form of a table.
3. Presentation of the data
This is the process of re-organizing, classifying, compiling and summarizing data to
present them in a meaningful form. The collected data may be presented in tabular,
diagrammatic or graphic form.
4. Analysis of data
This is the process of extracting relevant information from the summarized data, mainly through
the use of elementary mathematical operations.
5. Interpretation of data
The final step is drawing conclusions from the data collected. A valid conclusion must be
drawn on the basis of the analysis. A high degree of skill and experience is necessary for the
interpretation.
1.3 Definition of some terms in Statistics

Data are the raw materials of statistical investigations; they arise whenever measurements are
made or observations are recorded.
There are two groups of data:
i) Primary data: data which are collected directly from the units or individual respondents for the
purpose of a certain study.
ii) Secondary data: data which have been collected and statistically treated by a certain agency,
and which are used again for some other purpose.
Some more basic terms include:
• Population: the total collection of elements to be studied, which have one or more
specific characteristics in common.
Example: the population consisting of DDU summer students for some study.
• Sample: any subset of a population selected in order to draw conclusions about
the entire population.
• Parameter: a numerical measurement which describes some characteristic of a
population.
• Sample statistic: a numerical measurement which describes some characteristic of
a sample.
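The distinction between a parameter and a sample statistic can be sketched in Python. This is only an illustration: the population of student ages below is made up, and the seed, population size and sample size are arbitrary assumptions.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 students (made-up values for illustration).
rng = random.Random(281)  # fixed seed so the sketch is reproducible
population = [rng.randint(18, 40) for _ in range(1000)]

# Parameter: a numerical measure computed from the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of the population, here 50 students drawn at random
# without replacement.
sample = rng.sample(population, 50)

# Sample statistic: the same measure computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean age): {population_mean:.2f}")
print(f"Statistic (sample mean age):     {sample_mean:.2f}")
```

The two printed values will generally be close but not identical, which is why a sample statistic is treated as an estimate of the parameter.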
1.4 Applications, Uses and Limitations of statistics
Some of the applications of statistics:
• Statistics is applied in almost all fields of human endeavor.
• Almost all human beings encounter and use numerical facts in their daily life.
• It is applied in processes such as the development of drugs and the assessment of the extent of
environmental pollution.
• It is used in industry, especially in quality control.
Uses of statistics
The main function of statistics is to enlarge our knowledge of complex phenomena. The following
are some uses of statistics:
1. It presents facts in a definite and precise form.
2. Data reduction.
3. Measuring the magnitude of variations in data.
4. Furnishing a technique for comparing different sets of data.
5. Estimating unknown population characteristics.
6. Formulating and testing hypotheses.
7. Studying the relationship between two or more variables.
8. Forecasting future events.
Limitations of statistics
Statistics, for all its wide application in every sphere of human activity, has its own limitations.
Some of them are given below.
1. Statistics is not suitable for the study of qualitative phenomena
Since statistics deals with sets of numerical data, it is applicable only to
those subjects of enquiry which can be expressed in terms of quantitative
measurements. Qualitative phenomena like honesty, poverty, beauty and
intelligence cannot be expressed numerically, so statistical analysis cannot be directly
applied to them. Nevertheless, statistical techniques may be applied
indirectly by first reducing the qualitative expressions to quantitative terms. For
example, the intelligence of a group of students can be studied on the basis of their marks in a
particular examination.
2. Statistics does not study individuals
Statistics does not give any specific importance to individual items; it deals with
aggregates of objects. Individual items, taken individually, do not constitute
statistical data and do not serve any purpose in a statistical enquiry.
3. Statistical laws are not exact
Mathematical and physical sciences are regarded as exact, but statistical laws are
only approximations. Statistical conclusions are not universally true; they are
true only on the average.
4. Statistics can be easily misused
Statistics should be used by those trained in it; in the hands of the inexpert, statistical methods are
dangerous tools. The use of statistical tools by inexperienced and untrained
persons might lead to wrong conclusions. Statistics can also be misused by quoting wrong
figures.
5. Statistics is only one of the methods of studying a problem
Statistical methods do not provide a complete solution to a problem, because problems must be
studied taking the background of the country's culture, philosophy or religion into consideration.
Thus a statistical study should be supplemented by other evidence.
1.5 SCALES OF MEASUREMENT

CLASSIFICATION OF DATA
Data classification can be defined as a method of grouping data according to their similarities,
used to study the characteristics of the entire population on the basis of the resulting classes.
The classification of data is generally done on a geographical, chronological, qualitative or
quantitative basis.
i) In geographical classification, data are arranged according to places, areas or regions.
ii) In chronological classification, data are arranged according to their time references.
iii) In qualitative classification, data are arranged according to attributes like sex, marital
status, educational standard, etc.
iv) In quantitative classification, data are arranged according to certain characteristics
that have been measured or counted.
Data can also be classified according to different aspects such as:

I. Depending on the type of variable
a) Qualitative data (categorical data)
In qualitative classification, data are arranged according to attributes.
Example 1.1
Data collected based on sex, marital status, educational standard, and so on give rise to qualitative
data.
Sex: male or female.
Marital status: married, single, divorced, widowed.
Educational standard: literate or illiterate.
Rank of instructors: graduate assistant, assistant lecturer, lecturer, and so on.
b) Quantitative data
In quantitative classification, data are arranged according to a certain characteristic that has been
counted or measured.
Quantitative data are again divided into two groups: discrete and continuous.
Discrete data: are described by integers only, and their values are obtained by counting. The
possible values for such variables are 0, 1, 2, …; that is, they assume only whole-number values.
Example 1.2
The number of students in Dire Dawa University, the number of private cars in Dire Dawa and
the number of books are some examples that produce discrete data.
Continuous data: are quantitative figures which can take any numerical value, including fractions.
Their values are obtained by measurement.
Example 1.3
The weight of a person in kg, height, temperature and so on give rise to continuous data.
II) Depending on time reference
a) Time series data: are data collected over a long period of time.
b) Cross-sectional data: are data collected at a particular point in time across a range of
places or units.
Definition: A characteristic which shows variability or takes on different values is called a variable.
• Quantitative variable: one which leads to quantitative data. Hence we can talk about a discrete
variable (yielding discrete data) and a continuous variable (yielding continuous data).
• Qualitative variable: similarly, one which leads to qualitative data.
III) Depending on scales/Level of measurement
Proper knowledge about the nature and type of the data to be dealt with is essential in order to
specify and apply the proper statistical method for their analysis and for inference. A measurement
scale describes the values assigned to the data in terms of three properties: order,
distance and fixed zero.
The scales of measurement also show which mathematical operations and which statistical
analyses are permissible on the values of the variable.
Accordingly, there are four scales of measurement: nominal, ordinal, interval and ratio scales.
a) Nominal scale variables
These are qualitative variables that consist of names, labels or categories of individuals. In
nominal scales, numbers are assigned to the categories simply for coding purposes. It is not possible
to compare two individuals based on the numbers assigned to them, because these numbers share none of the
properties of the numbers we deal with in ordinary arithmetic.
Example 1.4
Sex, Religion, Nationality, color, are nominal variables.
b) Ordinal scale
This refers to variables whose values can be ordered or ranked, but the difference between data
values either cannot be determined or is meaningless. Comparison is restricted. Ranking and
counting are the only mathematical operations that can be done on the values of these variables.
Example 1.5
i) Rank of instructors in a university as graduate assistant, lecturer and professor is ordinal.
ii) Beauty classified as beautiful, more beautiful and most beautiful is ordinal.
c) Interval scale
These variables have the properties of the ordinal scale and, in addition, the difference between two values
is meaningful. There is no true zero origin; that is, zero does not show absence in this case.
Example 1.6
The temperature of a given area may be 0 °C. But this does not mean that there is no heat at all; it
simply indicates that it is cold.
d) Ratio scale
Ratio scale variables have the properties of the interval scale, but in this case there is a true zero
origin. That is, zero shows the absence of the quantity being measured.
All mathematical operations, such as division, multiplication, logarithms and powers, are
permissible on the values of such variables.
Example 1.7
Income of a person, amount of yield from a plot of land, expenditure and consumption amount.
In all of these cases, if the variable assumes the value zero, it indicates the absence of the
quantity. For example, if yield is zero, there is no yield at all.
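As a rough illustration of why ratios are meaningful on a ratio scale but not on an interval scale, the sketch below compares two Celsius temperatures with two incomes. The numbers are made up, and the conversion to the Kelvin scale (which has a true zero) is added here only for contrast; it is not part of the module's own examples.

```python
# Interval scale: Celsius temperatures. Differences are meaningful, ratios are not.
temp_a_c, temp_b_c = 10.0, 20.0
difference_c = temp_b_c - temp_a_c     # a meaningful 10-degree difference
naive_ratio = temp_b_c / temp_a_c      # 2.0, but 20 °C is not "twice as hot" as 10 °C

# The same temperatures on the Kelvin scale, which has a true zero.
temp_a_k = temp_a_c + 273.15
temp_b_k = temp_b_c + 273.15
kelvin_ratio = temp_b_k / temp_a_k     # roughly 1.035, far from 2

# Ratio scale: income has a true zero, so the ratio is meaningful.
income_a, income_b = 2000.0, 4000.0    # hypothetical monthly incomes
income_ratio = income_b / income_a     # person B earns twice as much as person A

print(f"Celsius difference: {difference_c}")
print(f"Celsius 'ratio': {naive_ratio}, Kelvin ratio: {kelvin_ratio:.3f}")
print(f"Income ratio: {income_ratio}")
```

The temperature difference is the same on both scales, while the ratio changes with the choice of zero; the income ratio does not depend on any such choice.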
1.6 INTRODUCTION TO METHODS OF DATA COLLECTION
Depending on the source of data, there are two methods of data collection:
a) Primary method
Data are measured or collected by the investigator or the user directly from the source.
• Two activities are involved: planning and measuring.
i) Planning:
• Identify the source and elements of the data.
• Decide whether to consider a sample or a census.
• If sampling is preferred, decide on the sample size, selection method, etc.
• Decide on the measurement procedure.
• Set up the necessary organizational structure.
ii) Measuring: there are different options. Some of the sources for collecting primary data are:
• Focus group
• Telephone interview
• Mailed questionnaires
• Door-to-door survey
• Mall intercept
• New product registration
• Personal interview
• Experiments
b) Secondary method
These are data gathered or compiled from published and unpublished sources or files.
When our source is secondary data, we need to check:
• The type and objectives of the situation in which the data were collected.
• The purpose for which the data were collected and its compatibility with the present
problem.
• The appropriateness of the nature and classification of the data to our problem.
• That there are no biases or misreporting in the published data.
Note that data which are primary for one user may be secondary for another.
SCOPE OR COVERAGE OF DATA COLLECTION

In general, there are two methods of data collection in terms of coverage: census and sample survey.
i) Census survey or complete enumeration
It is a process of investigating the characteristics of each and every member of the
population. It is a survey in which observations are made on the entire population.
Advantages of census survey
• It is more representative than a sample survey.
• It is more accurate.
• It is complete and exact when the domain is small.
Disadvantages of census survey
• Completeness is practically impossible when the population is large.
• It consumes time, money and human power.
ii) Sample survey
• It is an investigation in which some part of a population is taken to infer about the
whole population.
• It is appropriate when budget and time are insufficient for a census.
Advantages of sample survey over census survey
a) It saves money
It is cheaper to assess a sample of size n than a population of size N (n < N).
b) It saves labor
Fewer staff (enumerators, supervisors, data editors) are required in a sample
survey than in a census.
c) It saves time
Since the sample size is small, it reduces data collection and processing time.
d) It minimizes disturbance
If the process of data collection affects the society, sampling is the only alternative
for data collection.
CHAPTER SUMMARY
• Statistics is the science that deals with the methods of data collection, organization,
presentation and analysis, and the interpretation of the results of the analysis.
• There are two classifications of statistics: descriptive and inferential.
• Descriptive statistics includes those procedures used to summarize complex data. These
include graphical methods, measures of central tendency and measures of dispersion.
• Inferential statistics deals with taking samples and reaching conclusions about a
population; it includes estimation and tests of hypotheses.
• Variables are classified into quantitative and qualitative. Quantitative variables are those
variables whose values can be expressed numerically. The values of qualitative
variables, however, cannot be expressed numerically.
• Planning and measurement are the two activities involved in working with primary
data.
• The two main kinds of data collection are census survey and sample survey. A census
represents complete enumeration, whereas a sample survey means taking part of the
population so as to infer about the whole population from the results of the sample.
Exercises on Chapter 1
1. Broadly, define the term ‘Statistics’.

2. Mention some of the uses and limitations of Statistics.
3. Classify the following variables as qualitative, quantitative, discrete or continuous.
a) Number of courses that students take at DDU during this summer.
b) The amount of rainfall at Dire Dawa over the last ten years.
c) The attitudes of the parents towards bringing up their children.
d) Type of automobiles people drive.
e) Length of steel bars produced in a given production run.
f) Weight of a bar of soap.
4. An insurance company has insured 250,000 cars over the last five years. The company
would like to know the number of cars involved in one or more accidents over this time
period. It selects 500 clients at random from the files and makes a record of clients who were
involved in one or more accidents. Based on this information, identify:
a) The population. b) The sample. c) The variable of interest to the insurance company.
d) The type of statistics used. e) The scope of data collection.
5. Classify the following statements into descriptive or inferential statistics:
a) The average age of students at our school is 22 years.
b) 20% of students in my summer Biology class are married.
c) It is expected that there will be 1200 car accidents in the next three months in Ethiopia.
d) Two thirds of all doctors interviewed smoke cigarettes.
e) A firm reported that the average life span of a product is estimated to be 8 months.
6. Are the following data nominal, ordinal, interval or ratio? Explain your answers.
a) Type of automobiles people drive. b) Students' I.D. card numbers.
c) Number of errors made on a production line. d) Time it took to run 42 km.
e) Phone numbers. f) Temperature readings in Fahrenheit.
g) Military rank. h) Birth place of students. i) Number of customers of a bank.

CHAPTER 2
METHODS OF DATA PRESENTATION
CONTENTS
2.1 FREQUENCY DISTRIBUTIONS
2.2 DIAGRAMMATIC AND GRAPHICAL PRESENTATION OF DATA
INTRODUCTION
In this chapter we will deal with the classification and presentation of data by using
frequency distributions and different types of graphs. Having collected and edited the
data, the next important step is to organize them, that is, to present them in a readily
comprehensible, condensed form that helps in drawing inferences from them. It is also necessary
that like items be separated from unlike ones.
OBJECTIVES
At the end of this chapter, the student is expected to be able to:
• Explain the meaning of frequency and frequency distribution.
• Construct both grouped and ungrouped frequency distributions.
• Put raw data into a frequency distribution.
• Compute class marks, class boundaries, relative frequencies and cumulative
frequencies.
• Know the types of graphs and apply them in their appropriate places.
• Draw a histogram, frequency polygon, ogive, pie chart and bar chart, and apply them
to the appropriate data sets.
2.1 FREQUENCY DISTRIBUTIONS
The presentation of data is broadly classified into the following two categories:
• Tabular presentation
• Diagrammatic and graphic presentation
The process of arranging data into classes or categories according to similarities is technically
called classification.
Classification is a preliminary step that prepares the ground for proper presentation of data.
Before looking at frequency distributions, we have to define some basic terms.
Raw data: recorded information in its originally collected form, whether counts or
measurements, is referred to as raw data.
Array: an arrangement of raw data in ascending or descending order of magnitude.
Frequency: the number of times a value of the variable is repeated in the
data.
Frequency array: an array where the individual items or values of a variable are given
along with the corresponding frequencies.
Frequency distribution: a tabular summary of a set of data showing the frequency of items
in each of several non-overlapping classes or categories.
Types of frequency distributions
There are three basic types of frequency distributions:
• Categorical frequency distribution
• Ungrouped frequency distribution
• Grouped frequency distribution
These fall into two groups: categorical and numerical, where a numerical frequency distribution may be ungrouped or grouped.
1) Categorical frequency Distribution
Used for data that can be placed in specific categories, such as nominal or ordinal data.
Example 2.1
A social worker collected the following data on marital status for 25 persons. (M=married,
S=single, W=widowed, D=divorced). Prepare a frequency distribution.
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution:
Since the data are categorical, the categories themselves can be used as classes. There are four types of marital
status (M, S, D and W). These types will be used as the classes of the distribution. We follow the
procedure below to construct such a frequency distribution.
Step 1: Prepare a table as shown below.
Class   Tally   Frequency   Percent
(1)     (2)     (3)         (4)
M
S
D
W
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tallies and place the result in column (3).
Step 4: Find the percentage of values in each class by using
$\% = \frac{f}{n} \times 100$, where f = frequency of the class and n = total number of values.
Percentages are not necessarily part of a frequency distribution, but they can be added since
they are used in certain types of diagrammatic representations such as pie charts.
Step 5: Find the totals for columns (3) and (4).
Combining all the steps, one can construct the following frequency distribution.
Class   Tally      Frequency   Percent
(1)     (2)        (3)         (4)
M       //// /     6           24
S       //// //    7           28
D       //// //    7           28
W       ////       5           20
Total              25          100
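The tallying and percentage steps of Example 2.1 can be sketched in Python as a cross-check. This is a minimal sketch using the standard library's `collections.Counter`; the variable names are illustrative only.

```python
from collections import Counter

# Marital status data from Example 2.1, row by row
# (M = married, S = single, D = divorced, W = widowed).
data = "M S D W D S S M M M W D S M M W D D S S S W W D D".split()

counts = Counter(data)   # frequency of each class (Steps 2 and 3)
n = len(data)            # total number of values

# Step 4: percent for each class, (f / n) * 100
table = {cls: (counts[cls], counts[cls] / n * 100) for cls in ["M", "S", "D", "W"]}

for cls, (f, pct) in table.items():
    print(f"{cls}  {f:>2}  {pct:.0f}%")

# Step 5: totals for the frequency and percent columns
total_percent = sum(pct for _, pct in table.values())
print(f"Total {n}  {total_percent:.0f}%")
```

Counting by program is a convenient way to verify a hand tally, since miscounts are easy to make when the categories are interleaved.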
2) Numerical Frequency Distribution
In such frequency distributions, the data are classified according to numerical size. They are used to
summarize interval and ratio data. Numerical frequency distributions may be discrete (ungrouped) or
continuous (grouped), depending on whether the variable is discrete or continuous.
Discrete (Ungrouped) frequency Distribution
• It is a table of all the raw score values that could possibly occur in the data,
along with the number of times each value actually occurred.
• Such a distribution is often constructed for a small set of data on a discrete variable.
To construct an ungrouped frequency distribution, we need the following steps:
• First find the smallest and largest raw scores in the collected data.
• Arrange the data in order of magnitude and count the frequency.
• To facilitate counting, one may include a column of tallies as shown above.
Example 2.2
The following data represent the marks of 20 students. Construct an ungrouped frequency
distribution.
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Solution:
Step 1: Find the range, Range=Max-Min=90-60=30.
Step 2: Make a table as shown below.
Step 3: Tally the data.
Step 4: Count the frequency and record in the last column.
Mark   Tally   Frequency
60     //      2
62     /       1
63     /       1
65     /       1
70     ////    4
74     /       1
75     /       1
76     //      2
80     ///     3
85     ///     3
90     /       1
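The same construction for Example 2.2 can be sketched in Python, again with `collections.Counter` from the standard library. The tally column is rendered simply as one slash per observation.

```python
from collections import Counter

# Marks of the 20 students from Example 2.2.
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

# Step 1: range = maximum - minimum
data_range = max(marks) - min(marks)

# Steps 3 and 4: count how many times each distinct mark occurs
frequency = Counter(marks)

print(f"Range = {data_range}")
print("Mark  Tally  Frequency")
for mark in sorted(frequency):
    print(f"{mark:>4}  {'/' * frequency[mark]:<5}  {frequency[mark]}")
```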
Grouped (Continuous) frequency Distribution
This is a frequency distribution in which several numbers are grouped in one class.
When the range of the data is large, the data must be grouped into classes that are more than
one unit in width.
Definition of some common terms
• Class limits: separate one class in a grouped frequency distribution from another.
The limits could actually appear in the data, and there are gaps between the upper limit
of one class and the lower limit of the next.
• Unit of measurement (U): the distance between two possible consecutive
measures. It is usually taken as 1, 0.1, 0.01, 0.001, ….
• Class boundaries: separate one class in a grouped frequency distribution from
another. The boundaries have one more decimal place than the raw data and
therefore do not appear in the data. There is no gap between the upper boundary of
one class and the lower boundary of the next class.
The lower class boundary is obtained by subtracting 0.5U from the corresponding
lower class limit, and the upper class boundary is obtained by adding 0.5U to the
corresponding upper class limit.
• Class width: the difference between the upper and lower class boundaries of any
class. It is also the difference between the lower limits of any two consecutive classes
or the difference between any two consecutive class marks.
• Class mark (midpoint): the average of the lower and upper class limits or, equivalently, the
average of the lower and upper class boundaries.
• Cumulative frequency: the number of observations less than or equal to, or more than or equal
to, a specific value.
• Cumulative frequency above: the total frequency of all values greater than or
equal to the lower class boundary of a given class.
• Cumulative frequency below: the total frequency of all values less than or
equal to the upper class boundary of a given class.
• Cumulative Frequency Distribution (CFD): the tabular arrangement of class
intervals together with their corresponding cumulative frequencies. It can be of the “more
than” or “less than” type, depending on the type of cumulative frequency used.
• Relative frequency (rf): the frequency divided by the total frequency. This
gives the proportion of values falling in that class.
• Relative cumulative frequency (rcf): the cumulative frequency divided by the
total frequency. It gives the proportion of the values which are less than the
upper class boundary or more than the lower class boundary.
Guidelines for classes
1. There should be between 5 and 20 classes.
2. The class width should preferably be an odd number. When the class limits are integers, this guarantees that the class
midpoints are integers instead of decimals.
3. The classes must be mutually exclusive. This means that no data value can fall into
two different classes.
4. The classes must be all-inclusive or exhaustive. This means that all data values must
be included.
5. The classes must be continuous. There are no gaps in a frequency distribution.
Classes that have no values in them must be included (unless they are the first or last
class, in which case they are dropped).
6. The classes must be equal in width. The exception here is the first or last class. It is
possible to have a "below ..." or "... and above" class. This is often used with ages.
Constructing a Grouped Frequency Distribution
1. Find the largest and smallest values.
2. Compute the range: R = Maximum - Minimum.
3. Select the number of classes desired. This is usually between 5 and 20, or use
Sturges’ rule of thumb:
$k = 1 + 3.32 \log_{10} n$, where k is the number of classes desired and n is the total number of
observations.
4. Find the class width by dividing the range by the number of classes and rounding up:
$w = R / k$. There are two things to watch out for here. First, you must round up, not off.
Normally 3.2 would be rounded to 3, but in rounding up it becomes 4. Second, if the range
divided by the number of classes gives an integer value (no remainder), then you can
either add one to the number of classes or add one to the class width. Sometimes
you're locked into a certain number of classes because of the instructions.
5. Pick a suitable starting point less than or equal to the minimum value. The starting
point is called the lower limit of the first class. Continue to add the class width to this
lower limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower limit of the second
class. Then continue to add the class width to this upper limit to find the rest of the
upper limits.
7. Find the boundaries by subtracting 0.5U from the lower limits and adding 0.5U
to the upper limits. The boundaries are also half-way between the upper limit
of one class and the lower limit of the next class.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying to accomplish, it
may not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies.
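Steps 3 to 6 above can be sketched in Python as follows. The function names are hypothetical, the data are assumed to be whole numbers (so U = 1 by default), and when the range divided by the number of classes is a whole number the sketch takes the "add one to the class width" option; adding one to the number of classes would be equally valid.

```python
import math

def sturges_classes(n):
    """Step 3: number of classes k = 1 + 3.32 * log10(n), rounded up."""
    return math.ceil(1 + 3.32 * math.log10(n))

def class_width(data_range, k):
    """Step 4: width w = R / k, rounded up (not off).

    If R / k has no remainder, one is added to the width.
    """
    w = data_range / k
    if w == int(w):
        return int(w) + 1
    return math.ceil(w)

def class_limits(start, width, k, unit=1):
    """Steps 5 and 6: (lower limit, upper limit) for each of the k classes."""
    lowers = [start + i * width for i in range(k)]
    uppers = [low + width - unit for low in lowers]
    return list(zip(lowers, uppers))

# Hypothetical data set: 50 whole-number observations ranging from 12 to 57.
n, minimum, maximum = 50, 12, 57
k = sturges_classes(n)
w = class_width(maximum - minimum, k)
limits = class_limits(minimum, w, k)
print(k, w, limits)
```

For these made-up figures the last class extends past the maximum, which is expected: rounding the width up ensures that every observation falls in some class.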
Example 2.3
Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solution:
Step 1: Find the highest and the lowest values: H = 39, L = 6.
Step 2: Find the range: R = H - L = 39 - 6 = 33.
Step 3: Select the number of classes desired using Sturges’ formula:
k = 1 + 3.32 log(20) = 5.32 ≈ 6 (rounding up).
Step 4: Find the class width: w = R/k = 33/6 = 5.5 ≈ 6 (rounding up).
Step 5: Select the starting point; let it be the minimum observation. Then
6, 12, 18, 24, 30, 36 are the lower class limits.
Step 6: Find the upper class limits.
E.g. the upper limit of the first class = 12 - U = 12 - 1 = 11. Then
11, 17, 23, 29, 35, 41 are the upper class limits.
So, combining steps 5 and 6, one can construct the following classes:
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
Step 7: Find the class boundaries.
E.g. for the first class, lower class boundary = 6 - U/2 = 5.5 and
upper class boundary = 11 + U/2 = 11.5.
Then continue adding w to both boundaries to obtain the remaining boundaries. By doing so, one
can obtain the following class boundaries:
Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5
Step 8: Tally the data.
Step 9: Write the numeric values for the tallies in the frequency column.
Step 10: Find cumulative frequency.
Step 11: Find relative frequency or/and relative cumulative frequency.
The complete frequency distribution follows:
Class     Class         Class  Tally   Freq.  Cf (less    Cf (more    rf    rcf (less
limit     boundary      mark                  than type)  than type)        than type)
6 – 11    5.5 – 11.5    8.5    //      2      2           20          0.10  0.10
12 – 17   11.5 – 17.5   14.5   //      2      4           18          0.10  0.20
18 – 23   17.5 – 23.5   20.5   //////  7      11          16          0.35  0.55
24 – 29   23.5 – 29.5   26.5   ////    4      15          9           0.20  0.75
30 – 35   29.5 – 35.5   32.5   ///     3      18          5           0.15  0.90
36 – 41   35.5 – 41.5   38.5   //      2      20          2           0.10  1.00
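The columns of this table can be reproduced with a short Python sketch. It takes the class width (6), the unit of measurement (1) and the lower class limits from Steps 4 and 5 of the example as given; the dictionary keys are illustrative names, not standard terminology.

```python
# Data from Example 2.3.
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]

width, unit = 6, 1                       # class width w and unit of measurement U
lower_limits = [6, 12, 18, 24, 30, 36]   # lower class limits from Step 5
n = len(data)

rows = []
cf_less = 0
for low in lower_limits:
    high = low + width - unit                    # upper class limit
    freq = sum(low <= x <= high for x in data)   # class frequency
    cf_less += freq                              # "less than" cumulative frequency
    rows.append({
        "limits": (low, high),
        "boundaries": (low - unit / 2, high + unit / 2),
        "mark": (low + high) / 2,
        "freq": freq,
        "cf_less": cf_less,
        "cf_more": n - (cf_less - freq),         # "more than" cumulative frequency
        "rf": freq / n,
        "rcf_less": cf_less / n,
    })

for row in rows:
    print(row)
```

The "more than" cumulative frequency of a class is obtained here as the total frequency minus the frequencies of all classes before it, which is equivalent to summing the frequencies from that class to the last one.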
2.2 DIAGRAMMATIC AND GRAPHIC PRESENTATION OF DATA.
These are techniques for presenting data in visual displays using diagrams and pictures.
Importance: -
• They have greater attraction.
• They facilitate comparison.
• They are easily understandable.
Diagrammatic presentation of data
-Diagrams are appropriate for presenting discrete as well as qualitative data.
-The three most commonly used diagrammatic presentations for discrete as well as
qualitative data are:
• Pie charts
• Pictogram
• Bar chart
Pie chart
A pie chart is a circular chart divided into sectors, illustrating relative magnitudes or
frequencies of the classes of a given variable. A pie chart usually represents categorical data, but it
can also be used for discrete quantitative data. The angle of each sector has to be
proportional to the relative frequency of the class it represents:
\[ \text{Angle of sector} = \frac{\text{class frequency}}{\text{total frequency}} \times 360^\circ, \qquad \text{Percentage} = \frac{\text{class frequency}}{\text{total frequency}} \times 100 \]
Example 2.4
Draw a suitable diagram to represent the following population in a town.
Men    Women   Girls   Boys
2500   2000    4000    1500
Solution: Draw a pie-chart.

Step 1: Find the percentage.
Step 2: Find the number of degrees for each class.
Step 3: Using a protractor and compass, draw each sector and label it with its name and
corresponding percentage.
Class   Frequency   Percent   Degree
Men     2500        25        90
Women   2000        20        72
Girls   4000        40        144
Boys    1500        15        54
[Figure: pie chart with sectors Men 25%, Women 20%, Girls 40%, Boys 15%.]
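The percentages and sector angles of Example 2.4 can be checked with a short Python sketch. The function name pie_sectors is hypothetical; it simply applies percentage = 100 × f / total and angle = 360 × f / total to each class.

```python
def pie_sectors(frequencies):
    """Return {class: (percent, degrees)} for a pie chart."""
    total = sum(frequencies.values())
    return {name: (100 * f / total, 360 * f / total)
            for name, f in frequencies.items()}

population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
sectors = pie_sectors(population)
for name, (percent, degrees) in sectors.items():
    print(f"{name}: {percent:.0f}% -> {degrees:.0f} degrees")
```

The angles always add up to 360 degrees, which is a quick check before drawing the chart.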
Bar Charts
- A bar chart is a set of bars (thick lines or narrow rectangles) representing some magnitude
over time or space.
- They are useful for comparing aggregates over time or space.
- Bars can be drawn either vertically or horizontally.
- There are different types of bar charts. The most common being:
 Simple bar chart
 Deviation or two way bar chart
 Broken bar chart
 Component or sub divided bar chart.
 Multiple bar charts.
Simple Bar Chart
-They are used to display data on one variable.
-They are thick lines (narrow rectangles) having the same breadth. The magnitude of a
quantity is represented by the height or length of the bar.
Example 2.5
The following data represent the sales by product, 1957-1959, of a given company for three
products A, B and C.
Product   Sales ($) in 1957   Sales ($) in 1958   Sales ($) in 1959
A         12                  14                  18
B         24                  21                  18
C         24                  35                  54
A simple bar chart for sales by product in the year 1957 is:
[Figure: simple bar chart, "Sales ($) in 1957". Bars: A = 12, B = 24, C = 24.]
Component Bar chart

-When there is a desire to show how a total (or aggregate) is divided into its component
parts, we use a component bar chart.
-The bars represent the total value of a variable, with each total broken into its component parts;
different colours or designs are used for identification.
Example 2.6
Draw a component bar chart to represent the sales by product from 1957 to 1959.
Solution:
[Figure: component bar chart, "Sales by product in 1957-1959". X axis: years of production (1957, 1958, 1959); Y axis: sales in $ (0 to 100); each bar is divided into product A, product B and product C.]
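Before drawing a component bar chart by hand, it helps to know where each component starts and ends on the bar. The following Python sketch computes these segment positions for the sales data of Example 2.5; the function name stacked_segments is hypothetical, and the products are stacked in the order A, B, C.

```python
sales = {"A": [12, 14, 18], "B": [24, 21, 18], "C": [24, 35, 54]}
years = [1957, 1958, 1959]

def stacked_segments(sales, years):
    """For each year, return (product, bottom, top) for every component of the bar."""
    segments = {}
    for j, year in enumerate(years):
        base = 0
        parts = []
        for product, values in sales.items():
            parts.append((product, base, base + values[j]))
            base += values[j]          # the next component starts where this one ends
        segments[year] = parts
    return segments

segments = stacked_segments(sales, years)
totals = [segments[y][-1][2] for y in years]   # total height of each bar
print(segments[1957])
print(totals)
```

The top of the last component in each year is the total sales for that year, which is the full height of the bar.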
Multiple Bar charts
- These are used to display data on more than one variable.

- They are used for comparing different variables at the same time.
Example 2.7
Draw a multiple bar chart to represent the sales by product from 1957 to 1959.
Solution:
[Figure: multiple bar chart, "Sales by Product in 1957-1959". X axis: years of production (1957, 1958, 1959); Y axis: sales in $ (0 to 60); one bar per product (A, B, C) for each year.]
Broken Bar diagram

This chart is used to present data involving a few extreme figures, for which it is difficult to
accommodate the corresponding bars within the graph paper. In this case, we
use broken bars: each bar is drawn in pieces, with a jump in the numerical scale at the break.
Activity 2.1
Draw a diagram presenting the sales by product in 1958, assuming that there was a product D whose
sales in 1958 were $100,000.
Graphical Presentation of data

The histogram, frequency polygon and cumulative frequency curve (ogive) are the most
commonly applied graphical representations for continuous data.
Procedures for constructing statistical graphs

• Draw and label the X and Y axes.
• Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y
axis.
• Represent the class boundaries (for the histogram or ogive) or the midpoints (for the
frequency polygon) on the X axis.
• Plot the points.
• Draw the bars or lines to connect the points.
Histogram
A histogram is a graph which displays the data by using vertical bars of various heights to represent
frequencies. Class boundaries are placed along the horizontal axis. Class marks and class
limits are sometimes used as the quantity on the X axis. Unlike in a bar graph, the bars of a
histogram must be adjacent.
Example 2.8
The following table summarizes the Biostatistics mid exam score of 38 students out of 35
marks.
If we want to draw a histogram for this data, it would look like the following:
[Figure: histogram of Biostatistics marks in mid exam.]
Frequency Polygon
A frequency polygon depicts a frequency distribution for discrete or continuous numeric data.
Frequency polygons are a graphical device for understanding the shapes of distributions.
A histogram can easily be changed into a frequency polygon by joining the midpoints of the
tops of the adjacent rectangles of the histogram with straight lines. It is also possible to draw a
frequency polygon without drawing a histogram.
Example 2.9
The following frequency distribution represents the ages of 60 patients at Gambella hospital.
Represent the data by a frequency polygon.
Then we have to identify the midpoints of each interval.
Finally we have to plot the midpoints (on the X axis) against the frequency of each
class (on the Y axis) and connect adjacent points with straight lines.
Note that two artificial class marks at both ends with frequencies of zero have been
added to “tie down” the graph on the X-Axis.
Ogive (cumulative frequency polygon)
This is a graph showing the cumulative frequency (the less than or more than type) plotted
against upper or lower class boundaries, respectively. That is, class boundaries are plotted
along the horizontal axis and the corresponding cumulative frequencies are plotted along the
vertical axis. The points are then joined by a free hand curve.
There are two types of ogive:
1. Less than ogive: a line graph obtained by plotting the less than cumulative frequencies
against the upper boundaries of their respective class intervals.
2. More than ogive: a line graph obtained by plotting the more than cumulative frequencies
against the lower boundaries of their respective class intervals.
Example 2.10
Draw both cumulative frequency curves for the following data.
Class limit   F    Class boundary   LCB    UCB
3-7           3    2.5-7.5          2.5    7.5
8-12          4    7.5-12.5         7.5    12.5
13-17         6    12.5-17.5        12.5   17.5
18-22         13   17.5-22.5        17.5   22.5
23-27         17   22.5-27.5        22.5   27.5
28-32         6    27.5-32.5        27.5   32.5
33-37         1    32.5-37.5        32.5   37.5
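The points needed for the two ogives of Example 2.10 are the cumulative frequencies paired with the class boundaries. A minimal Python sketch of that calculation follows; the variable names are illustrative only.

```python
from itertools import accumulate

freqs = [3, 4, 6, 13, 17, 6, 1]
lcb = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5]   # lower class boundaries
ucb = [7.5, 12.5, 17.5, 22.5, 27.5, 32.5, 37.5]  # upper class boundaries

total = sum(freqs)
less_than = list(accumulate(freqs))               # running total from the first class
more_than = [total - c + f for c, f in zip(less_than, freqs)]  # total from each class onward

less_than_points = list(zip(ucb, less_than))      # plotted against upper boundaries
more_than_points = list(zip(lcb, more_than))      # plotted against lower boundaries
print(less_than_points)
print(more_than_points)
```

Joining the first list of points gives the less than ogive, and joining the second gives the more than ogive.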
The less than ogive curve:
[Figure: less than ogive. X axis: upper class boundary (7.5 to 37.5); Y axis: less than cumulative frequency (0 to 50).]
The more than ogive curve:
[Figure: more than ogive. X axis: lower class boundary; Y axis: more than cumulative frequency (0 to 50).]
The less than and more than ogive curves together:
[Figure: both ogives on one graph. X axis: class boundaries (7.5 to 37.5); Y axis: less than and more than cumulative frequency (0 to 50).]
CHAPTER SUMMARY
 Frequency is the number of times a value appears in a data set
 There are two types of frequency distribution: grouped and ungrouped frequency
distribution.
 Class mark, class boundary, cumulative frequency and relative frequency are some
of the most important quantities we compute for a given frequency distribution
 Histogram, frequency polygon, and ogive are usually drawn for quantitative data
 Pie chart is a circular chart that is used to display the percentage of the total number
of measurements falling in to different categories.
 Bar charts are usually used for count data. The different types of bar charts include
simple bar chart, deviation bar chart, component bar chart and multiple bar chart.
 We have to know the types of graphs and apply them in their appropriate places.
1. Identify the type/s of classification to summarize results from:
a) Type of cars sold by a company.

b) Amount of deposit made by customers of a bank in a given day.
c) Amount of coffee exported from 5 regions in a country.
d) Number of cars sold to a certain company for three consecutive years.
2. The following data show the high temperatures in °C for 50 randomly selected days:
32 38 30 24 24 37 39 34 35 31
23 35 29 34 21 35 35 24 23 26
30 38 25 37 25 39 25 30 27 32
33 30 29 32 33 35 29 33 19 39
22 33 31 20 29 27 31 22 23 36
a) Construct a grouped frequency distribution with suitable number of classes.

b) Convert the distribution obtained in (a) in to a cumulative
i) less than and ii) More than distribution.
c) Construct a histogram, frequency polygon, and both ogives.
3. The following data shows the average yearly consumption of meat in kilograms
for 40 families.
12.6 17.8 19.9 19.0 10.4 20.6 13.2 22.5
14.0 15.6 19.1 20.4 20.6 18.6 18.0 15.9
13.7 14.9 18.7 18.4 20.1 24.2 19.3 13.9
11.7 16.7 15.3 18.3 17.4 23.4 22.0 17.9
21.7 18.9 14.4 9.9 16.0 16.8 10.8 16.2
a) Construct a continuous frequency distribution with suitable number of classes.
b) Construct the less than ogive.
c) Construct the relative frequency distribution.
4. A frequency distribution with 6 classes of equal size is constructed to present data
which has been recorded in integers. If the class midpoint of the 3rd class interval is 20
and the class width is 5, write down all the classes.
5. A company has 25 vehicles. The table below shows the summary of yearly fuel
consumption of the vehicles.
Fuel consumption in 000's of liters   1-1.9   2-2.9   3-3.9   4-4.9   5-5.9   6 and above
Number of vehicles                    2       5       6       7       4       1
i) Give a) The lower class limit of the 3rd class. b) Class boundaries of the 2nd class.
c) Class midpoint of the 4th class. d) Width of the 1st class.
e) How many of the vehicles consumed: i) at least 1950 liters but not more than or equal
to 2950 liters? ii) Less than 3950 liters? iii) At most 4900 liters?
g) What percent of the vehicles consumed: i) At least 2950 liters?
ii) Less than 5950 liters? iii) More than 1950 liters?
6. The table below shows the weight distribution of 25 students in basket ball team.
Weight in kgs Number of students
Below 50.5 3
Below 55.5 10
Below 60.5 16
Below 65.5 20
Below 70.5 22
Below 75.5 25
i) Form the continuous frequency distribution, where the unit of measurement is 1.
ii) Determine the class limits and the class marks.
iii) How many of the students weigh more than 65.5 kgs? Between 55.5 and 70.5 kgs?
7) The following table shows the type of cars manufactured by a certain company during
1972-1975.
Years
Cars 1972 1973 1974 1975
Toyota 400 300 380 450
Nissan 260 340 350 390
Isuzu 330 310 445 470
Construct
a) A simple bar chart for the total number of cars manufactured.

b) Multiple bar charts.
c) Component bar chart.
d) Percentage component bar chart.
8) A recent study showed that a typical Ethiopian car owner incurs the following expenses,
on the average, when he leases a car for 3 years. Draw a pie chart to portray this data.
Expenditure item Amount ($)
Lease amount 4,500
Gasoline 1,350
Insurance 1,800
Maintenance 1,350
CHAPTER 3
MEASURES OF CENTRAL TENDENCY

CONTENTS
3.1. INTRODUCTION AND OBJECTIVES OF MEASURING CENTRAL TENDENCY
3.2. THE SUMMATION NOTATION
3.3. PROPERTIES OF MEASURES OF CENTRAL TENDENCY
3.4. TYPES OF MEASURES OF CENTRAL TENDENCY
INTRODUCTION
In the previous chapter, you were introduced to the classification and presentation of
data using graphical methods. Graphical methods are important for data analysis; however,
they are inappropriate for statistical inference, since it is difficult to judge how similar a
sample frequency histogram is to the corresponding population histogram. The two most common
numerical descriptive measures are measures of central tendency and measures of variability.
That is, we seek to describe the center of the distribution and also how the measurements
vary about the center of the distribution. So, this chapter introduces you to the methods used
to find the average or representative values in a given data set.
Objectives:
At the end of this chapter the student is expected to be able to:
 Discuss the meanings and uses of the measure of central tendency
 Decide the appropriate measures of central tendency
 Compute and interpret the arithmetic mean, harmonic mean, geometric mean,
median, mode, quartiles, deciles, percentiles and so on.
3.1 INTRODUCTION AND OBJECTIVES OF MEASURING CENTRAL
TENDENCY
Measures of central tendency are measures of the location of the middle or the center of a
distribution. The definition of "middle" or "center" is purposely left somewhat vague so that
the term "central tendency" can refer to a wide variety of measures.
-The tendency of statistical data to get concentrated at a certain value is called central tendency,
and the various methods that determine the actual value at which the data tend to concentrate
are called measures of central tendency. One of the most important objectives of statistical
analysis is to get one single value that describes the characteristics of the entire data. Such a
value is called the central value or average.
-When we want to make comparison between groups of numbers it is good to have a single
value that is considered to be a good representative of each group. This single value is called
the average of the group.
-Averages are also called measures of central tendency.
-An average which is representative is called typical average and an average which is not
representative and has only a theoretical value is called a descriptive average.
Objectives:
 To comprehend the data easily, i.e. to condense the mass of data into one single
value.
 To facilitate comparison.
 To make further statistical analysis.
3.2 THE SUMMATION NOTATION
Let $X_1, X_2, X_3, \ldots, X_N$ be a number of measurements, where N is the total number of observations
and $X_i$ is the ith observation.
Very often in statistics an algebraic expression of the form $X_1 + X_2 + X_3 + \cdots + X_N$ is used in a
formula to compute a statistic. It is tedious to write an expression like this very often, so
mathematicians have developed a shorthand notation to represent a sum of scores, called the
summation notation.
The symbol $\sum_{i=1}^{N} X_i$ is mathematical shorthand for $X_1 + X_2 + X_3 + \cdots + X_N$.
The expression is read, "the sum of X sub i from i equals 1 to N." It means "add up all the
numbers."
Example 3.1
Suppose that the following were the scores made on the first homework assignment by five
students in the class: 5, 7, 7, 6, and 8. In this example set of five numbers, where N=5, the
summation could be written:
\[ \sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 5 + 7 + 7 + 6 + 8 = 33 \]
The "i=1" at the bottom of the summation notation tells where to begin the sequence of
summation. If the expression were written with "i=2", the summation would start with the
second number in the set.
The "N" in the upper part of the summation notation tells where to end the sequence of
summation. If there were only three scores, then the summation and example would be:
\[ \sum_{i=1}^{3} X_i = X_1 + X_2 + X_3 = 5 + 7 + 7 = 19 \]
Sometimes, if the summation notation is used in an expression and the expression must be
written a number of times, as in a proof, then a shorthand notation for the shorthand notation
is employed. When the summation sign "$\sum$" is used without additional notation, then "i=1"
and "N" are assumed.
PROPERTIES OF SUMMATION
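1. $\sum_{i=1}^{N} k = Nk$, where k is any constant
2. $\sum_{i=1}^{N} kX_i = k\sum_{i=1}^{N} X_i$, where k is any constant
3. $\sum_{i=1}^{N} (a + bX_i) = Na + b\sum_{i=1}^{N} X_i$, where a and b are any constants
4. $\sum_{i=1}^{N} (X_i + Y_i) = \sum_{i=1}^{N} X_i + \sum_{i=1}^{N} Y_i$
5. $\sum_{i=1}^{N} X_i Y_i = X_1Y_1 + X_2Y_2 + \cdots + X_NY_N$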
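The standard properties of summation can be verified numerically. The sketch below uses the five homework scores of Example 3.1 as X; the Y values and the constants k, a and b are hypothetical and chosen only for illustration.

```python
X = [5, 7, 7, 6, 8]          # scores from Example 3.1
Y = [1, 2, 3, 4, 5]          # hypothetical second variable
N = len(X)
k, a, b = 3, 2, 4            # hypothetical constants

sum_X = sum(X)
sum_const = sum(k for _ in X)                   # sum of a constant: equals N * k
sum_scaled = sum(k * x for x in X)              # equals k * sum(X)
sum_linear = sum(a + b * x for x in X)          # equals N * a + b * sum(X)
sum_pair = sum(x + y for x, y in zip(X, Y))     # equals sum(X) + sum(Y)
sum_prod = sum(x * y for x, y in zip(X, Y))     # sum of products, term by term

print(sum_X, sum_const, sum_scaled, sum_linear, sum_pair, sum_prod)
```

Note that the sum of products is in general not equal to the product of the sums, which is a common mistake when using the notation.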
Example 3.2
Activity 3.1
Considering the following data determine
X Y
5 6
7 7
7 8
6 7
8 8
a) b) c) d) e)
f) g) h) g)
3.3 PROPERTIES OF MEASURES OF CENTRAL TENDENCY

A good measure of central tendency (or a typical average) should
have the following properties:
 It should be rigidly defined, which means that it should have a definite value.
 It should be based on all observations under investigation.
 It should not be affected by extreme observations.
 It should be capable of further algebraic treatment.
 It should be affected as little as possible by fluctuations of sampling, i.e. it should be
stable with sampling.
 It should be easy to calculate and simple to understand.
 It should be unique and always exist.
Note: No measure satisfies all of the above conditions; we choose the one that satisfies
most of the properties.
3.4 TYPES OF MEASURES OF CENTRAL TENDENCY
There are several different measures of central tendency; each having its advantage and
disadvantage, including:
• The Mean
• The Median
• The Mode
The choice of these averages depends up on which one best fits the property under
discussion.
Mean: There are three types of mean which are suitable for a particular type of data. They
are:
a) Arithmetic mean or Average

b) Geometric mean
c) Harmonic mean
3.4.1 The Arithmetic Mean
It is divided into two types: the simple arithmetic mean and the weighted arithmetic mean.
1) Simple Arithmetic Mean:
Different methods exist for grouped and ungrouped data. These are the direct method and
the indirect method.
1) Direct method
- The mean is defined as the sum of the magnitudes of the items divided by the number of
items.
The mean of $X_1, X_2, X_3, \ldots, X_n$ is denoted by A.M or $\bar{X}$, and is given by:
\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} \]
When the data are arranged or given in the form of a frequency distribution, i.e. there
are k variate values such that a value $X_i$ has a frequency $f_i$ (i=1,2,...,k), then the
arithmetic mean will be
\[ \bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} \]
where k is the number of classes and $\sum_{i=1}^{k} f_i = n$.
Arithmetic Mean for Grouped Data

If data are given in the form of a continuous frequency distribution, then the arithmetic
mean is obtained as follows:
\[ \bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} \]
where $X_i$ = the class mark of the ith class and $f_i$ = the frequency of the ith class.
Example 3.3
Find the arithmetic mean for the following frequency distribution
Class interval Frequency Class mark

2-5 3 3.5
6-8 2 7
9-12 5 10.5
13-15 4 14
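The mean of the grouped data in Example 3.3 is the sum of frequency times class mark divided by the total frequency. A minimal Python sketch of that calculation is given below; the function name grouped_mean is hypothetical.

```python
def grouped_mean(class_marks, freqs):
    """Arithmetic mean of grouped data: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(class_marks, freqs)) / sum(freqs)

class_marks = [3.5, 7, 10.5, 14]
freqs = [3, 2, 5, 4]
mean = grouped_mean(class_marks, freqs)
print(mean)   # 133 / 14
```

Here the sum of the products is 10.5 + 14 + 52.5 + 56 = 133 and the total frequency is 14, so the mean is 9.5.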
Activity 3.2
1) Daily cash earnings of 15 workers working in different industries are as follows:

11.63,8.22,12.56,12.14,29.23,18.23,11.49,11.30,17.00,9.16,8.64,27.56,8.23,19.77,12.81
Find the average daily earning of a worker?
2) The distribution of age at first marriage of 130 males was as given below
Age in years(X):18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29.
No. of males (f): 2, 1, 4, 8, 10, 12, 17, 19, 18, 14, 13, 12.
Compute the average age of males at first marriage?
3) Calculate the mean for the following age distribution.

Class frequency
6- 10 35
11- 15 23
16- 20 15
21- 25 12
26- 30 9
31- 35 6
2) Indirect Method
Coding of data: a linear transformation of the data may be regarded as coding. In
coding we shift the origin and change the scale.
The effect of coding on the mean is given below.
1) If we subtract an arbitrary constant from each observation, the mean is also reduced
by that constant.
2) If we divide each observation of a set by an arbitrary constant, the mean is divided
by the same constant.
Note: In the case of addition or multiplication, the word 'reduced' (or 'divided') should be
replaced by 'increased' (or 'multiplied') in the above statements.
The original data are transformed using some assumed mean (working mean) denoted
by A. Let $x_i$ denote the original values and $d_i = x_i - A$ the deviations from the assumed mean; then
\[ \bar{X} = A + \frac{\sum_{i=1}^{n} d_i}{n} \]
Show!
When the data are arranged or given in the form of a frequency distribution,
\[ \bar{X} = A + \frac{\sum_{i=1}^{k} f_i d_i}{\sum_{i=1}^{k} f_i} \]
For grouped data, with $d_i = (x_i - A)/w$, where $x_i$ is the class mark and w is the class width,
\[ \bar{X} = A + \frac{\sum_{i=1}^{k} f_i d_i}{n}\, w \]
Show!
Activity 3.3
1) Suppose that the deviation of the observation from the assumed mean of 7 are
1, -1, -2, -2, 0, -3, -2, 2, 0,-3
a) Find the true mean.

b) Find the original observation
2) Find the mean of the marks obtained by 51 students, using A = 48.5 and w = 10:
xi   28.5   38.5   48.5   58.5   68.5
fi   4      12     15     13     7
Special properties of Arithmetic mean

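1. The sum of the deviations of a set of items from their mean is always zero, i.e.
\[ \sum_{i=1}^{n} (X_i - \bar{X}) = 0 \]
2. The sum of the squared deviations of a set of items from their mean is the minimum, i.e.
\[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \le \sum_{i=1}^{n} (X_i - A)^2, \text{ for any constant } A. \]
3. If $\bar{X}_1$ is the mean of $n_1$ observations, $\bar{X}_2$ is the mean of $n_2$ observations, etc., and
$\bar{X}_k$ is the mean of $n_k$ observations, then the mean of all the observations in all groups,
often called the combined mean, is given by:
\[ \bar{X}_c = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + \cdots + n_k} \]
4. If a wrong figure has been used when calculating the mean, the correct mean can be
obtained without repeating the whole process using:
\[ \text{Correct mean} = \text{wrong mean} + \frac{\text{correct value} - \text{wrong value}}{n} \]
where n is the total number of observations.
5. The effect of transforming the original series on the mean.
a) If a constant k is added to (or subtracted from) every observation, then the new mean
will be the old mean ± k respectively, i.e. $\bar{X}_{new} = \bar{X}_{old} \pm k$.
b) If every observation is multiplied by a constant k, then the new mean will be
k × old mean, i.e. $\bar{X}_{new} = k\bar{X}_{old}$.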
2) Weighted Mean
When different items are to be given different degrees of importance, a weighted mean is
appropriate.
Weights are assigned to each item in proportion to its relative importance.
Let $X_1, X_2, \ldots, X_n$ be the values of the items of a series and $W_1, W_2, \ldots, W_n$ their corresponding
weights; then the weighted mean, denoted $\bar{X}_w$, is defined as:
\[ \bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} \]
Example 3.4
In 2002/03, the average salaries of elementary school teachers in three cities were Birr 24,000,
20,000 and 30,000. If there were 600, 400 and 800 elementary school teachers in the respective
cities, find the weighted average salary of all the elementary school teachers in the three cities.
Solution.
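The weighted mean for Example 3.4 can be worked out with a short Python sketch, taking the number of teachers in each city as the weight of that city's average salary. The function name weighted_mean is hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

salaries = [24000, 20000, 30000]   # average salary in each city (Birr)
teachers = [600, 400, 800]         # number of teachers in each city (weights)
average_salary = weighted_mean(salaries, teachers)
print(round(average_salary, 2))    # 46,400,000 / 1,800
```

The result, about Birr 25,777.78, is higher than the simple average of the three city means (Birr 24,666.67) because the city with the highest salary also has the most teachers.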
Activity 3.4:
a) A student obtained the following percentage in an examination: English 60, Biology 75,
Mathematics 63, Physics 59, and chemistry 55. Find the student’s weighted arithmetic
mean if weights 1, 2, 1, 3, 3, respectively, are allotted to the subjects.
b) A teacher allots weights 2 to homework, 3 to mid exam and 5 to final exam. If a student
scores 90, 50, and 60 for HW, ME and FE, respectively, what is his/her average
academic performance?
Merits and Demerits of Arithmetic Mean
Merits:
• It is rigidly defined.
• It is based on all observations.
• It is suitable for further mathematical treatment.
• It is a stable average, i.e. it is not affected by fluctuations of sampling to some extent.
• It is easy to calculate and simple to understand.
Demerits:
• It is affected by extreme observations.
• It can not be used in the case of open end classes.
• It can not be determined by the method of inspection.
• It can not be used when dealing with qualitative characteristics, such as intelligence,
honesty, beauty.
• It can be a number which does not exist in a series of data.
• Some times it leads to wrong conclusion if the details of the data from which it is
obtained are not available.
• It gives high weight to high extreme values and less weight to low extreme values.
3.4.2 Geometric Mean (G.M)

The geometric mean is important for particular types of data, for which it gives a good mean
value. If the variate values are measured as ratios, proportions or
percentages, the geometric mean gives a better measure of central tendency than the other means.
The G.M of N variate values is the Nth root of their product.
Like the arithmetic mean, it depends on all observations. It is affected by extreme values,
but not to the extent of the arithmetic mean. However, it has one great drawback: it
cannot be calculated if any one or more values are zero or negative.
Suppose that $X_1, X_2, \ldots, X_N$ are N variate values; then their G.M is given as
\[ G.M = \sqrt[N]{X_1 X_2 \cdots X_N} = (X_1 X_2 \cdots X_N)^{1/N} \]
In case $X_1, X_2, \ldots, X_k$ have the corresponding frequencies $f_1, f_2, \ldots, f_k$, then
\[ G.M = \left(X_1^{f_1} X_2^{f_2} \cdots X_k^{f_k}\right)^{1/N}, \quad \text{where } N = \sum_{i=1}^{k} f_i \]
Example 3.5
Calculate the geometric mean for the following.
2, 3, 4, 6
\[ G.M = \sqrt[4]{2 \times 3 \times 4 \times 6} = \sqrt[4]{144} \approx 3.46 \]
In case of grouped data, the mid-values of the class intervals are considered as $X_i$.
In terms of logarithms, log G.M is the average of the $\log X_i$ values, and the formula for the
geometric mean is:
\[ \log G.M = \frac{1}{N}\sum_{i=1}^{N} \log X_i, \quad \text{for } i = 1, 2, \ldots, N. \]
In case of a frequency distribution where each $X_i$ occurs $f_i$ times (i=1,2,...,k), we have:
\[ \log G.M = \frac{1}{N}\sum_{i=1}^{k} f_i \log X_i, \quad \text{where } N = \sum_{i=1}^{k} f_i \text{ for } i = 1, 2, \ldots, k. \]
Then, taking the antilog of both sides, we obtain the G.M.
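Both ways of computing the geometric mean (the Nth root of the product, and the antilog of the average logarithm) can be compared in a short Python sketch. The function names are hypothetical; math.prod requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """Nth root of the product of N positive values."""
    return math.prod(values) ** (1 / len(values))

def geometric_mean_log(values):
    """Antilog (base 10) of the average of the log values."""
    return 10 ** (sum(math.log10(x) for x in values) / len(values))

data = [2, 3, 4, 6]
gm_direct = geometric_mean(data)
gm_log = geometric_mean_log(data)
print(round(gm_direct, 4), round(gm_log, 4))
```

For the data 2, 3, 4, 6 the product is 144, so both functions return a value close to 3.4641. The logarithmic form is the safer choice for long series, where the direct product may become very large.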
Geometric mean for the second purpose is given
Where n is the length of the period
Example 3.6
The population of a country in 1980 was 2 million and in 1990 it was 22 million. What was
the average annual increase during this period?
Here n = 11 years, and
Note: The geometric mean is less affected by extreme values than the arithmetic mean and is
useful as a measure of central tendency for some positively skewed distributions.
3.4.3 Harmonic Mean (H.M)
The H.M is the inverse of the arithmetic mean of the reciprocals of the observations of a
set. It is a suitable measure of central tendency when the data pertains to speed, rates, and
time.
Let $X_1, X_2, \ldots, X_N$ be N variate values in a set; then the harmonic mean is given by:
\[ H.M = \frac{N}{\sum_{i=1}^{N} \frac{1}{X_i}}, \quad \text{for } i = 1, 2, \ldots, N. \]
Example 3.7
Find the harmonic mean of the following data: 2, 1, 4, 3.
Example 3.8
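A car driver covered the first 10 km at a speed of 40 km/h and the next 10 km at a speed of 60 km/h.
What is the average speed of the car driver over the 20 km?
\[ \text{Average speed} = \frac{\text{total distance}}{\text{total time}} = \frac{20}{\frac{10}{40} + \frac{10}{60}} = \frac{2}{\frac{1}{40} + \frac{1}{60}} = 48 \text{ km/h} \]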
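The harmonic mean of Examples 3.7 and 3.8 can be computed with a minimal Python sketch. The function name harmonic_mean is hypothetical; for Example 3.8 the harmonic mean gives the average speed because the two stretches have equal length (10 km each).

```python
def harmonic_mean(values):
    """N divided by the sum of the reciprocals of N non-zero values."""
    return len(values) / sum(1 / x for x in values)

hm = harmonic_mean([2, 1, 4, 3])       # Example 3.7
avg_speed = harmonic_mean([40, 60])    # Example 3.8, speeds in km/h
print(round(hm, 2), round(avg_speed, 2))
```

For Example 3.7 the sum of the reciprocals is 1/2 + 1 + 1/4 + 1/3 = 25/12, so the harmonic mean is 48/25 = 1.92. For Example 3.8 the result is 48 km/h, which is lower than the arithmetic mean of 50 km/h.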
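If the data are arranged in the form of a frequency distribution in which an observation $X_i$ has
frequency $f_i$ (i=1, 2, ..., k), the harmonic mean is given by
\[ H.M = \frac{N}{\sum_{i=1}^{k} \frac{f_i}{X_i}}, \quad \text{where } N = \sum_{i=1}^{k} f_i \text{ for } i = 1, 2, \ldots, k. \]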
It fulfils almost all properties of a good measure of central tendency, except when any
observation is zero, it can not be calculated. Its main advantage is that it gives more
weightage to small values and less weightage to large values.
Relationship between AM, GM and HM
Given two positive values $x_1$ and $x_2$, there is a relationship between HM, GM and AM.
This relationship exists in two cases:
1. If $x_1 = x_2$, then G.M = HM = AM.
2. If $x_1 \ne x_2$, then AM > GM > HM.
Combining the two relationships, we find that:
\[ AM \ge GM \ge HM \]
Another relationship:
\[ GM = \sqrt{AM \times HM} \]
3.4.4 The Mode
The mode is a value which occurs most frequently in a set of values, and which occurs more
than once.
- The mode may not exist and even if it does exist, it may not be unique.
- In case of discrete distribution, the value having the maximum frequency is the modal
value.
- If in a set of observed values, all values occur once or equal number of times, then, there is
no mode.
Example 3.9
a) Find the mode of 5, 3, 5, 8, 9
Mode =5
b) Find the mode of 8, 9, 9, 7, 8, 2, and 5.
The data are bimodal: the modes are 8 and 9.
c) Find the mode of 4, 12, 3, 6, and 7.
No mode for this data.
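The rules used in Example 3.9 (the most frequent value is the mode, there may be several modes, and there is no mode when all values occur equally often) can be sketched in Python as follows. The function name modes is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means there is no mode."""
    counts = Counter(values)
    if len(set(counts.values())) == 1:   # every value occurs equally often
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5, 3, 5, 8, 9]))          # one mode
print(modes([8, 9, 9, 7, 8, 2, 5]))    # two modes (bimodal)
print(modes([4, 12, 3, 6, 7]))         # no mode
```

Returning a list keeps the three cases (no mode, one mode, several modes) in a single form.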
The mode of a set of numbers $X_1, X_2, \ldots, X_n$ is usually denoted by $\hat{X}$.
Mode for Grouped data
If data are given in the form of a continuous frequency distribution, the mode is defined as:
\[ \hat{X} = L_{mod} + \left(\frac{f_{mo} - f_1}{(f_{mo} - f_1) + (f_{mo} - f_2)}\right) W \]
Where: $\hat{X}$ = the mode of the distribution
$L_{mod}$ = the lower class boundary of the modal class
$f_{mo}$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class succeeding the modal class
W = the size of the modal class
Note: The modal class is the class with the highest frequency.
Example 3.10
Find the mode for the frequency distribution given by below.
Class interval Frequency

3-6 4
6-9 8
9-12 10
12-15 3
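The mode of the grouped data in Example 3.10 can be found with a minimal Python sketch of the usual interpolation inside the modal class. The function name grouped_mode is hypothetical, and a frequency of zero is assumed for a missing neighbouring class when the modal class is the first or the last class.

```python
def grouped_mode(boundaries, freqs):
    """Mode = L + (f_mo - f1) / ((f_mo - f1) + (f_mo - f2)) * W for the modal class."""
    m = freqs.index(max(freqs))                       # index of the modal class
    lower, upper = boundaries[m]
    f_mo = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                 # class before (assumed 0 if none)
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0    # class after (assumed 0 if none)
    return lower + (f_mo - f1) / ((f_mo - f1) + (f_mo - f2)) * (upper - lower)

boundaries = [(3, 6), (6, 9), (9, 12), (12, 15)]
freqs = [4, 8, 10, 3]
mode = grouped_mode(boundaries, freqs)
print(round(mode, 2))
```

Here the modal class is 9-12, so the mode is 9 + (2 / (2 + 7)) × 3, which is about 9.67.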
Activity 3.5
The following is the distribution of the size of certain farms selected at random from a
district. Calculate the mode of the distribution.
Size of farms No. of farms
5- 15 8
15- 25 12
25- 35 17
35- 45 29
45- 55 31
55- 65 5
65- 75 3
Merits and Demerits of Mode
Merits:
• It is not affected by extreme observations.
• Easy to calculate and simple to understand.
• It can be calculated for distribution with open end class.
• It can be used for qualitative data as well.
Demerits:
• It is not rigidly defined.
• It is not based on all observations
• It is not suitable for further mathematical treatment.
• It is not a stable average, i.e. it is affected by fluctuations of sampling to some extent.
• Often its value is not unique.
3.4.5 The Median and other quantiles (quartiles, deciles, percentiles)
In a distribution, median is the value of the variable which divides the data in to two equal
halves.
In an ordered series of data, the median is an observation lying exactly in the middle of the
series. It is the middle most value in the sense that the number of values less than the median
is equal to the number of values greater than it.
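Let $X_1, X_2, \ldots, X_n$ be the observations; then the numbers arranged in ascending order will be
$X_{[1]}, X_{[2]}, \ldots, X_{[n]}$, where $X_{[i]}$ is the ith smallest value.
Here, we find that $X_{[1]} \le X_{[2]} \le \cdots \le X_{[n]}$.
The median is denoted by $\tilde{X}$.
Median for ungrouped data
\[ \tilde{X} = \begin{cases} X_{[(n+1)/2]}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(X_{[n/2]} + X_{[n/2+1]}\right), & \text{if } n \text{ is even} \end{cases} \]
Example 3.11
Find the median of the following data.
a) 3,8,4,7,7,5,6,8,7,4,6,8,9,7,6
Arrange the given data in either increasing or decreasing order:
3,4,4,5,6,6,6,7,7,7,7,8,8,8,9
Since n = 15 is odd, Median = $X_{[8]}$ = 7
b) 3,4,4,5,6,6,6,7,7,7,7,8,8,8
Since n = 14 is even, Median = $\frac{1}{2}(X_{[7]} + X_{[8]}) = \frac{1}{2}(6 + 7) = 6.5$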
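The median of ungrouped data can be sketched in Python by sorting the observations and taking the middle value, or the average of the two middle values when the number of observations is even. The function name median is hypothetical; the two data sets are those of Example 3.11.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                   # the (n+1)/2-th ordered value
    return (xs[mid - 1] + xs[mid]) / 2   # average of the n/2-th and (n/2 + 1)-th values

median_a = median([3, 8, 4, 7, 7, 5, 6, 8, 7, 4, 6, 8, 9, 7, 6])
median_b = median([3, 4, 4, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8])
print(median_a, median_b)
```

Part a) has 15 observations, so the median is the 8th ordered value; part b) has 14 observations, so the median is the average of the 7th and 8th ordered values.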
Activity 3.6
a) Actual waiting time for the first job on the selected sample of nine people having different
field of specializations was given below.
Waiting time (in month):11.6, 11.3, 10.7, 18.0, 3.3, 9.2, 8.3, 3.8, 6.8
Calculate the median of the waiting time?
b) The export of agricultural products in million dollars from a country during eight quarters
in 1974 and 1975 was, 29.7, 16.6, 2.3, 14.1, 36.6, 18.7, 3.5, 21.3.
Find the median of the given set of values?
Median for grouped data.

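If data are given in the form of a continuous frequency distribution, the median is defined as:
\[ \text{Median} = L_{med} + \left(\frac{\frac{n}{2} - c}{f_{med}}\right) W \]
Where: $L_{med}$ = lower class boundary of the median class.
$f_{med}$ = frequency of the median class.
c = cumulative frequency (less than type) of the class preceding the median class.
W = the size of the median class.
n = total number of observations.
Note: The median class is the class with the smallest cumulative frequency (less than type)
greater than or equal to n/2.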
Example 3.12
Find the median wage of the following distribution.
Wages (in Rs)    2000-3000   3000-4000   4000-5000   5000-6000   6000-7000
No. of workers   3           5           20          10          5
Solution:
Wages (in Rs)   No. of workers   Cf
2000-3000       3                3
3000-4000       5                8
4000-5000       20               28
5000-6000       10               38
6000-7000       5                43
Here n/2 = 43/2 = 21.5.
So, the first cf ≥ 21.5 is 28 and the corresponding class is 4,000-5,000, so the median class is
4,000-5,000, and
\[ \text{Median} = 4000 + \left(\frac{21.5 - 8}{20}\right) \times 1000 = 4675 \]
Activity 3.7
Find the median of the following distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
Merits and Demerits of Median

Merits:
• Median is a positional average and hence not influenced by extreme observations.
• Can be calculated in the case of open end intervals.
• Median can be located even if the data are incomplete.
Demerits:
• It is not a good representative of data if the number of items is small.
• It is not amenable to further algebraic treatment.
• It is susceptible to sampling fluctuation (likely to be affected by sampling fluctuation).
Empirical relationship between mean, median and mode
Mean = Median = Mode, for a symmetrical distribution
Mean − Mode = 3(Mean − Median), for a unimodal, moderately skewed (asymmetrical) frequency distribution
QUARTILES, DECILES AND PERCENTILES
Quartiles
Quartiles are values which divide the ordered data into 4 equal parts. Hence there are three
quartiles, $Q_1$, $Q_2$ and $Q_3$.
 The first quartile Q1 is the value below which the first quarter of the ordered data lies.
 The second quartile Q2 is the value that divides the ordered data into two
equal parts.
 The third quartile Q3 is the value below which three quarters of the ordered data lie.
The median is the 2nd quartile. The first quartile (Q1) is the value of the item which divides the lower
half of the distribution into two equal parts. The third quartile (Q3) is the value of the
item that divides the upper half of the distribution into two equal parts. That is, Q3 is the
value of the $\frac{3(n+1)}{4}$th item in the series.
For raw (ungrouped) data, first arrange the n observations in increasing order of
magnitude. Then the ith quartile is given by
\[ Q_i = \left[\frac{i(n+1)}{4}\right]^{th} \text{ value of the ordered data}, \quad i = 1, 2, 3 \]
In dividing i(n+1) by 4, there may be a remainder r. Let q be the quotient and r be the
remainder of the division; then
\[ Q_i = X_{[q]} + \frac{r}{4}\left(X_{[q+1]} - X_{[q]}\right) \]
Example 3.13
Find the first, the second and third quartile for the following data. (exam result 10%) of
15 students 4,8,9,7,6,6,6,7,7,8,8,8,9,9,
Example 3.14
The following are yields of barley from 14 plots
30,32,35,38,40,42,48,49,52,55,58,60,62, and 65 . Find the 1st and 3rd quartiles.
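The quartiles of Example 3.14 can be located with a short Python sketch of the position method: divide i(n+1) by 4, take the quotient q as the position of the lower neighbouring value, and move the fraction r/4 of the way towards the next value. The function name quantile_by_position and its parts argument are hypothetical; setting parts to 10 or 100 gives deciles or percentiles in the same way. Returning the smallest or largest observation when the position falls outside the data is an assumption made here.

```python
def quantile_by_position(data, i, parts=4):
    """i-th quantile of raw data using the i(n+1)/parts position rule."""
    xs = sorted(data)
    n = len(xs)
    q, r = divmod(i * (n + 1), parts)   # quotient and remainder of the division
    if q < 1:                           # position before the first value (assumption)
        return xs[0]
    if q >= n:                          # position at or beyond the last value (assumption)
        return xs[-1]
    return xs[q - 1] + (r / parts) * (xs[q] - xs[q - 1])

yields = [30, 32, 35, 38, 40, 42, 48, 49, 52, 55, 58, 60, 62, 65]
q1 = quantile_by_position(yields, 1)
q3 = quantile_by_position(yields, 3)
print(q1, q3)
```

With n = 14, the first quartile is at position 15/4 = 3.75, giving 35 + 0.75 × (38 − 35) = 37.25, and the third quartile is at position 45/4 = 11.25, giving 58 + 0.25 × (60 − 58) = 58.5.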
The ith quartile for a grouped frequency distribution is given by
\[ Q_i = L_{q_i} + \left(\frac{\frac{iN}{4} - F_{pq_i}}{f_{q_i}}\right) W, \quad i = 1, 2, 3 \]
Where $Q_i$ = the ith quartile
$L_{q_i}$ = the lower class boundary of the class in which the ith quartile is located
$F_{pq_i}$ = the cumulative frequency of the class immediately preceding the class containing $Q_i$
$f_{q_i}$ = the frequency of the class containing $Q_i$
W = width of the class containing $Q_i$
N = sample size
Example 3.15
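Calculate the three quartiles for the following data.
Marks   No. of students (f)   Less than cf
0-10    6                     6
10-20   5                     11
20-30   8                     19
30-40   15                    34
40-50   7                     41
50-60   6                     47
60-70   3                     50
Total   50
N/4 = 12.5. The first c.f ≥ 12.5 is 19, so the corresponding class containing Q1 is 20-30, and
Q1 = 20 + ((12.5 − 11)/8) × 10 ≈ 21.88
2N/4 = 25. The first c.f ≥ 25 is 34, so the corresponding class containing Q2 is 30-40, and
Q2 = 30 + ((25 − 19)/15) × 10 = 34
3N/4 = 37.5. The first c.f ≥ 37.5 is 41, so the corresponding class containing Q3 is 40-50, and
Q3 = 40 + ((37.5 − 34)/7) × 10 = 45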
Example 3.16
Find the 1st and 3rd quartiles for the following data.
Marks               0-10   10-20   20-30   30-40   40-50   50-60   60-70
No. of students        6       5       8      15       7       6       3
Less than c.f.         6      11      19      34      41      47      50
The first c.f. greater than N/4 = 12.5 is 19, so the corresponding class containing Q1 is 20-30.
The first c.f. greater than 3N/4 = 37.5 is 41, so the corresponding class containing Q3 is 40-50.
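The grouped-data calculation can be sketched as follows for the marks distribution of Examples 3.15 and 3.16. This is a sketch under stated assumptions, not the module's own working: the function name `grouped_quantile` is hypothetical, the quartile position is taken as iN/4, and values are assumed to be spread evenly within the quartile class.

```python
def grouped_quantile(boundaries, freqs, fraction):
    """Quantile of a grouped distribution by linear interpolation inside its class.

    boundaries: list of (lower, upper) class boundaries in increasing order
    freqs:      class frequencies, one per class
    fraction:   0.25 for Q1, 0.5 for Q2, 0.75 for Q3 (0.7 would give D7, etc.)
    """
    total = sum(freqs)
    target = fraction * total              # e.g. iN/4 for the i-th quartile
    cum_before = 0                         # cumulative frequency of preceding classes
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cum_before + f >= target:
            width = upper - lower
            return lower + width / f * (target - cum_before)
        cum_before += f
    raise ValueError("fraction must lie between 0 and 1")


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [6, 5, 8, 15, 7, 6, 3]

q1 = grouped_quantile(classes, freqs, 0.25)
q2 = grouped_quantile(classes, freqs, 0.50)
q3 = grouped_quantile(classes, freqs, 0.75)
print(round(q1, 3), round(q2, 3), round(q3, 3))   # 21.875 34.0 45.0
```

Under these assumptions the quartile classes come out as 20-30, 30-40 and 40-50, in agreement with the classes identified in the examples.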
DECILES
The values that divide the data set into ten equal parts are called deciles. They are denoted
by D1, D2, …, D9 respectively.
For raw (ungrouped) data, first arrange the n observations in increasing order of magnitude.
Then the ith decile is given by
$$D_i = \text{value of the } \left[\frac{i(n+1)}{10}\right]^{\text{th}} \text{ item of the ordered data}, \quad i = 1, 2, \ldots, 9.$$
In dividing i(n+1) by 10 there may be a remainder. Let q be the quotient and r be the
remainder of the division; then
$$D_i = X_{(q)} + \frac{r}{10}\left(X_{(q+1)} - X_{(q)}\right).$$
The ith decile for a grouped frequency distribution is given by
$$D_i = L_{D_i} + \frac{w}{f_{D_i}}\left(\frac{iN}{10} - F_{pD_i}\right), \quad i = 1, 2, \ldots, 9,$$
where
- $D_i$ is the ith decile
- $L_{D_i}$ is the lower class boundary of the class in which the ith decile is located
- $F_{pD_i}$ is the cumulative frequency of the class immediately preceding the class containing $D_i$
- $f_{D_i}$ is the frequency of the class containing $D_i$
- $w$ is the width of the class containing $D_i$
- $N$ is the sample size
PERCENTILES
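The values that divide the data set into one hundred equal parts are called percentiles. They are
denoted by P1, P2, …, P99.
For raw (ungrouped) data, first arrange the n observations in increasing order of magnitude.
Then the ith percentile is given by
$$P_i = \text{value of the } \left[\frac{i(n+1)}{100}\right]^{\text{th}} \text{ item of the ordered data}, \quad i = 1, 2, \ldots, 99.$$
In dividing i(n+1) by 100 there may be a remainder. Let q be the quotient and r be the
remainder of the division; then
$$P_i = X_{(q)} + \frac{r}{100}\left(X_{(q+1)} - X_{(q)}\right).$$
The ith percentile for a grouped frequency distribution is given by
$$P_i = L_{P_i} + \frac{w}{f_{P_i}}\left(\frac{iN}{100} - F_{pP_i}\right), \quad i = 1, 2, \ldots, 99,$$
where
- $P_i$ is the ith percentile
- $L_{P_i}$ is the lower class boundary of the class in which the ith percentile is located
- $F_{pP_i}$ is the cumulative frequency of the class immediately preceding the class containing $P_i$
- $f_{P_i}$ is the frequency of the class containing $P_i$
- $w$ is the width of the class containing $P_i$
- $N$ is the sample size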
Example 3.17
Calculate i) the 7th decile, and ii) the 90th percentile from the following table.
Monthly per capita      No. of families     C.f.
expenditure classes
140-150                       17              17
150-160                       29              46
160-170                       42              88
170-180                       72             160
180-190                       84             244
190-200                      107             351
200-210                       49             400
210-220                       34             434
220-230                       31             465
230-240                       16             481
240-250                       12             493
Solution:
i) The number 345.8 is first contained in the cumulative frequency 351; hence the class
190-200 is the 7th decile class. Then
$$D_7 = 190 + \frac{10}{107}(345.8 - 244) \approx 199.5.$$
ii) The number 444.60 is first contained in the cumulative frequency 465; hence the 90th percentile class is
220-230, so that we have:
$$P_{90} = 220 + \frac{10}{31}(444.6 - 434) \approx 223.4.$$

CHAPTER SUMMARY
• Measures of central tendency are the statistical measures used to find a single
value that represents the values in a data set.
• The arithmetic mean is the sum of all the values in a data set divided by the total number of
observations.
• The median is the middle value after the observations are arranged in order of
magnitude.
• The mode is the value that occurs with the highest frequency in a data set.
• The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the numbers.
• The geometric mean is the nth root of the product of n numbers.
• Quartiles are the values that divide a given data set into four equal parts.
• Deciles are the values that divide a given data set into ten equal parts.
• Percentiles are the values that divide a given data set into one hundred equal parts.
• Different measures of central tendency have different properties and applications; we
are therefore required to apply them in their appropriate places.

1. The arithmetic mean of two numbers is 13 and their geometric mean is 12.
Find a) the numbers b) the HM.
2. The following table shows the distribution of marks of 100 students in a certain exam
out of 50. The median and mode are given to be 25 and 24 respectively. Calculate the
missing frequencies and then the arithmetic mean of the data.
Marks                 0-10   10-20   20-30   30-40   40-50
Number of students      14       ?      27       ?      15
3. The mean weight of 150 students in a certain class is 60 kg. The mean weight of boys
is 70 kg and that of girls is 55 kg. Find the number of boys and girls in the class.
4. The ratios of teachers to students in four colleges are 1:8, 2: 15, 1:10 and 2:21. Find
the average ratio of teachers to students.
5. An entrance exam for a job consists of three subjects, English, Mathematics and Office
management, having weights of 20%, 30% and 50% respectively. Find the average score of a
candidate who got 70%, 60% and 50%, respectively, in the three exams.
6. In a class there are 30 females and 70 males. If the females averaged 60 in an examination
and the males averaged 72, find the mean for the entire class.
7. The average weight of 10 students was calculated to be 65 kg. Later it was discovered that
one weight was misread as 40 kg instead of 80 kg. Calculate the correct average weight.

8. The mean of n Tetracycline capsules X1, X2, …, Xn is known to be 12 gm. A new set
of capsules of another drug is obtained by the linear transformation Yi = 2Xi – 0.5
(i = 1, 2, …, n). What will be the mean of the new set of capsules?
9. The mean of a set of numbers is 500.
a. If 10 is added to each of the numbers in the set, then what will be
the mean of the new set?
b. If each of the numbers in the set is multiplied by -5, then what will
be the mean of the new set?
10. Ato Ayele spent Birr 100 on each of the following two items: A and B. If the prices
of the items are Birr 30 and Birr 10 per kg respectively, find the average price of the
items per kg.
11. In a surveying class there are 10 freshmen, 6 second-year and 12 third-year students.
The freshmen averaged 70 in an examination, the second years averaged 75 and the
third years averaged 85. Find the mean grade for the entire class.
12. The profit of a company increased by 25% during the year 1992, increased by 40%
during the year 1993, decreased by 20% in the year 1994 and increased by 10%
during the year 1995. Find the average growth in the profit level over the four-year
period.
13. In a 400-meter athletic competition a participant covers the distance as given below.
Find the average speed.
Distance             Speed (meters per second)
First 80 meters      10
Next 240 meters      7.5
Last 80 meters       10

14. The following is the distribution of marks obtained by 500 students in statistics.
Marks more than        0    10    20    30    40    50
Number of students   500   460   400   200   100    30
a) Calculate the most suitable average.
b) Obtain Q1, Q3, D2, D7, P28, P40 and P80 and interpret the results.
15. Suppose the prices of the items in a certain shop are distributed as shown below:
Price      Number of items
10-19      27
21-29      A
31-39      28
41-49      B
51-59      19
Total      N
If 75% of the items have a price less than or equal to 45 and most of the items have a price of 34,
find the missing frequencies.
16. The marks secured out of 100 by a group of students in a school are
given below.
Marks        Number of students
Below 10     15
Below 20     35
Below 30     60
Below 40     84
Below 50     106
Below 60     120
Below 70     125
Determine the median and modal marks and interpret the results.

CHAPTER 4
Measures of Dispersion (Variation)
CONTENTS
4.1 INTRODUCTION AND OBJECTIVES OF MEASURING VARIATION
4.2 ABSOLUTE AND RELATIVE MEASURES
4.3 TYPES OF MEASURES OF VARIATION
4.4 MOMENTS, SKEWNESS AND KURTOSIS
INTRODUCTION
In our society, people usually elect a representative to convey the interests of most of
them. But sometimes the representative may convey interests that deviate from the
interests of some of the members. That is, the question is “How well does the representative
represent them?” Similarly, in statistics, we may seek to know how well an average
represents the whole set of data.
Objectives
At the end of this chapter the student is expected to be able to:
• Explain the meaning and uses of the measures of dispersion
• Decide the appropriate measure of dispersion for a purpose
• Compute the range, quartile deviation, inter-quartile range and mean deviation, and
decide which one is the best measure of dispersion
• Compute and interpret the variance and standard deviation
• Compute the coefficient of variation and standard score, and interpret the results of the
above and other measures of dispersion
4.1 INTRODUCTION AND OBJECTIVES OF MEASURING VARIATION
A measure of central tendency alone does not adequately describe a set of observations unless
all observations are the same. So we need some additional information, such as:
1) The extent to which the items in a particular distribution are scattered around the central
tendency, i.e. a measure of dispersion.
2) The direction of the scatter, that is, whether more items are concentrated towards higher or lower
values, i.e. a measure of skewness.
3) The extent to which the distribution is more peaked or more flat-topped than the normal
distribution, i.e. a measure of kurtosis.
Measure of dispersion
The scatter or spread of the items of a distribution is known as dispersion or variation. In other
words, the degree to which numerical data tend to spread about an average value is called
the dispersion or variation of the data.
Measures of dispersion are statistical measures which provide ways of measuring the extent
to which data are dispersed or spread out.
Objectives of measuring Variation:

• To judge the reliability of measures of central tendency
• To control variability itself.
• To compare two or more groups of numbers in terms of their variability.
• To make further statistical analysis.
Desirable properties of a measure of dispersion
The desirable properties of a statistical average also apply to a good measure of
dispersion.
4.2 ABSOLUTE AND RELATIVE MEASURES
Measures of dispersion which are expressed in terms of the original units of a series are
termed absolute measures. Such measures are not suitable for comparing the variability of
two distributions which are expressed in different units of measurement or which differ in
average size.
A relative measure of dispersion is the ratio or percentage of a measure of absolute
dispersion to an appropriate measure of central tendency, and is thus a pure number
independent of the units of measurement. For comparing the variability of two distributions
(even if they are measured in the same unit), we compute the relative measure of dispersion
instead of the absolute measure of dispersion.
4.3 TYPES OF MEASURES OF VARIATION
Various measures of dispersion are in use. The most commonly used measures of
dispersion are:
Absolute measures        Relative measures
Range                    Relative range
Quartile deviation       Coefficient of quartile deviation
Mean deviation           Coefficient of mean deviation
Variance                 Coefficient of variation
Standard deviation       Standard scores
4.3.1 The Range (R) and Relative Range (RR)
The range is the largest score minus the smallest score. It is a quick and dirty measure of
variability, although when a test is given back to students they very often wish to know the
range of scores. Because the range is greatly affected by extreme scores, it may give a distorted
picture of the scores. The following two distributions have the same range, 13, yet appear to
differ greatly in the amount of variability.
Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
For this reason, among others, the range is not the most important measure of variability.
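The point can be checked numerically with a short Python sketch. The helper names `value_range` and `mean_abs_dev` are illustrative only; the second helper computes the average absolute deviation from the mean, used here simply as one measure that takes every observation into account.

```python
import statistics

dist1 = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
dist2 = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]


def value_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)


def mean_abs_dev(data):
    """Average absolute deviation of the observations from their mean."""
    m = statistics.mean(data)
    return sum(abs(x - m) for x in data) / len(data)


for name, data in (("Distribution 1", dist1), ("Distribution 2", dist2)):
    print(name, "range =", value_range(data),
          "mean absolute deviation =", round(mean_abs_dev(data), 2))
```

Both ranges equal 13, while the average absolute deviation of Distribution 2 is much smaller, because only its single extreme score (45) lies far from the rest.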
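For ungrouped data: $R = X_{\max} - X_{\min}$, where $X_{\max}$ is the maximum observation and $X_{\min}$ is the
minimum observation.
For grouped data: $R = UCL_{\text{last}} - LCL_{\text{first}}$, where $UCL_{\text{last}}$ is the last upper class limit and
$LCL_{\text{first}}$ is the first lower class limit.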
Merits and Demerits of range
Merits:
• It is rigidly defined.
• It is easy to calculate and simple to understand.
Demerits:
• It is not based on all observation.
• It is highly affected by extreme observations.
• It is affected by fluctuation in sampling.
• It is not amenable to further algebraic treatment.
• It cannot be computed in the case of an open-ended distribution.
• It is very sensitive to the size of the sample.
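Relative Range (RR)
It is also sometimes called the coefficient of range and is given by:
For ungrouped data:
$$RR = \frac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}}$$
For grouped data:
$$RR = \frac{UCL_{\text{last}} - LCL_{\text{first}}}{UCL_{\text{last}} + LCL_{\text{first}}}$$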
Activity 4.1
1) Find R and RR, and then identify which data set is more dispersed.
a) For the monthly income of 10 workers Xi: 347, 420, 500, 600, 696, 710, 835, 850, and
900.
b) For the following age distribution.
Class frequency
6- 10 35
11- 15 23
16- 20 15
21- 25 12
26- 30 9
31- 35 6
2. If the range and relative range of a series are 4 and 0.25 respectively, then what is the value
of:
a) Smallest observation
b) Largest observation
4.3.2 The Quartile Deviation (QD) and Coefficient of Quartile Deviation (COQD)
The inter-quartile range (IQR) is the difference between the upper quartile (Q3) and the lower quartile (Q1) of a given
data set, i.e. IQR = Q3 − Q1. It is a suitable measure of dispersion when the data contain extreme values. It is also a good
measure of dispersion for a distribution having open-ended classes.
Example 4.1
If Q1 = 12 and Q3 = 45, then IQR = 45 − 12 = 33.
Quartile deviation and coefficient of Q.D.
Quartile deviation is half of the IQR, i.e.
$$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2}.$$
It is an absolute measure of dispersion.
To compare the variability of two series, a relative measure known as the coefficient of quartile
deviation is used, which is symbolically expressed as:
$$\text{Coefficient of Q.D.} = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}.$$
4.3.3 The Mean (Average) Deviation and Coefficient of Mean Deviation
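If $x_i / f_i$, i = 1, 2, …, n is the frequency distribution, then the mean deviation from the
mean is given by
$$\text{M.D.}(\bar{X}) = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - \bar{X}|, \qquad N = \sum_{i=1}^{n} f_i,$$
where $|x_i - \bar{X}|$ represents the modulus or absolute value of the deviation $x_i - \bar{X}$, in which the
negative sign is ignored.
The mean deviation from the median (Md) is
$$\text{M.D.}(Md) = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - Md|.$$
Since the mean deviation is based on all the observations, it is a better measure of dispersion than
the range or the quartile deviation.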
Example 4.2
Calculate i) the quartile deviation (Q.D), and ii) the mean deviation (M.D) from the mean and from the
median, for the following data:
Marks:  0-10  10-20  20-30  30-40  40-50  50-60  60-70
Freq.      6      5      8     15      7      6      3
Solution:
Marks   MV (x)    f   |x − Md|   d = (x − 35)/10   f|x − Md|    fd   |x − mean|   f|x − mean|
0-10       5      6      29           -3              174       -18     28.4        170.4
10-20     15      5      19           -2               95       -10     18.4         92.0
20-30     25      8       9           -1               72        -8      8.4         67.2
30-40     35     15       1            0               15         0      1.6         24.0
40-50     45      7      11            1               77         7     11.6         81.2
50-60     55      6      21            2              126        12     21.6        129.6
60-70     65      3      31            3               93         9     31.6         94.8
Total            50                                   652        -8                 659.2
i) Here N = 50, and Q.D. = (Q3 − Q1)/2, where Q1 and Q3 are the first and third quartiles of this
distribution (the data of Example 3.16).
ii) Mean = 35 + 10 × (−8/50) = 33.4 marks
M.D (from mean) = 659.2/50 = 13.184
Median = 30 + (10/15)(25 − 19) = 34 marks
M.D (from median) = 652/50 = 13.04
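The two mean deviations of Example 4.2 can be reproduced from the class midpoints and frequencies. The sketch below is illustrative only: the helper name `mean_deviation` is hypothetical, and the median of 34 marks is taken from the grouped-median formula, 30 + (10/15)(25 − 19), rather than recomputed in the code.

```python
midpoints = [5, 15, 25, 35, 45, 55, 65]   # class marks of 0-10, 10-20, ..., 60-70
freqs = [6, 5, 8, 15, 7, 6, 3]
n = sum(freqs)                            # total frequency N = 50

mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
median = 34                               # assumed: grouped-median formula result


def mean_deviation(center):
    """Frequency-weighted average of absolute deviations from `center`."""
    return sum(f * abs(x - center) for x, f in zip(midpoints, freqs)) / n


md_from_mean = mean_deviation(mean)
md_from_median = mean_deviation(median)
print(round(mean, 1), round(md_from_mean, 3), round(md_from_median, 2))
# 33.4 13.184 13.04
```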

4.3.4 The Variance, the Standard deviation and the coefficient of Variation
Population Variance
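If we divide the variation (the sum of the squared deviations from the mean) by the number of values in the population, we get
something called the population variance. This variance is the "average squared
deviation from the mean".
For ungrouped data:
$$\text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$$
For grouped data:
$$\text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{k} f_i (X_i - \mu)^2, \qquad N = \sum_{i=1}^{k} f_i$$
Sample Variance
One would expect the sample variance to simply be the population variance with the
population mean replaced by the sample mean. However, one of the major uses of a statistic
is to estimate the corresponding parameter, and this formula has the problem that the estimated
value is not, on average, the same as the parameter: it tends to underestimate the population variance.
To counteract this, the sum of the squares of the
deviations is divided by one less than the sample size.
For ungrouped data:
$$\text{Sample variance} = S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
For grouped data:
$$\text{Sample variance} = S^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i (X_i - \bar{X})^2, \qquad n = \sum_{i=1}^{k} f_i$$
We usually use the following computational formulas:
$$S^2 = \frac{\sum X_i^2 - n\bar{X}^2}{n-1} \ \text{(ungrouped data)}, \qquad S^2 = \frac{\sum f_i X_i^2 - n\bar{X}^2}{n-1} \ \text{(grouped data)}.$$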
Properties of the Variance
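1) The variance has mostly removed the shortcomings which are present in the measures of
dispersion given before it.
2) The main demerit of the variance is that its unit is the square of the unit of measurement of the
variate values. Generally this value is large, which makes it difficult to judge the
magnitude of the variation.
3) The variance gives more weight to the extreme values than to those which are
near the mean value, because the differences are squared in the variance.
4) If a wrong value has been used in calculating the variance, and n, $\bar{X}$ and $S^2$
are known, we can correct this. One way is to recover the sums and adjust them:
let $X_w$ be the wrong value and $X_c$ the correct value; then
$$\sum X_{\text{corr}} = n\bar{X} - X_w + X_c, \qquad \sum X^2_{\text{corr}} = (n-1)S^2 + n\bar{X}^2 - X_w^2 + X_c^2,$$
and the corrected mean and variance are recomputed from these sums.
5) If a sample of $n_1$ elements has a variance $S_1^2$ and a sample of $n_2$ elements has a variance $S_2^2$,
then the combined variance (called the pooled variance) is given by:
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
6) If the variance of the observations $X_1, X_2, \ldots, X_n$ is $S^2$, then the variance of
a) $X_1 + k, X_2 + k, \ldots, X_n + k$ is also $S^2$, where k is a constant number.
b) $X_1 - k, X_2 - k, \ldots, X_n - k$ is also $S^2$.
c) $kX_1, kX_2, \ldots, kX_n$ is $k^2 S^2$.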
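The effect of adding a constant or multiplying by a constant can be checked numerically. The sketch below uses Python's standard `statistics.variance` (the sample variance, with divisor n − 1); the data values and the constant k = 3 are arbitrary choices made for illustration.

```python
import math
import statistics

data = [5, 17, 12, 10]
k = 3                                    # any constant

s2 = statistics.variance(data)           # sample variance, divisor n - 1

shifted = [x + k for x in data]          # add k to every observation
scaled = [k * x for x in data]           # multiply every observation by k

var_shifted = statistics.variance(shifted)
var_scaled = statistics.variance(scaled)

# Adding a constant leaves the variance unchanged;
# multiplying by k multiplies the variance by k squared.
print(math.isclose(var_shifted, s2), math.isclose(var_scaled, k**2 * s2))
```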
Standard Deviation
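There is a problem with variances. Recall that the deviations were squared. That means that
the units were also squared. To get the units back to those of the original data values, the
square root must be taken.
$$\text{Population standard deviation} = \sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N}\sum (X_i - \mu)^2}$$
$$\text{Sample standard deviation} = S = \sqrt{S^2} = \sqrt{\frac{1}{n-1}\sum (X_i - \bar{X})^2}$$
For a grouped frequency distribution, each squared deviation is weighted by its class frequency $f_i$,
with $X_i$ taken as the class mark.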
The following steps are used to calculate the sample variance and standard deviation:
1. Find the arithmetic mean.
2. Find the difference between each observation and the mean.
3. Square these differences.
4. Sum the squared differences.
5. Since the data are a sample, divide the sum (from step 4 above) by the number of
observations minus one, i.e., n − 1 (where n is the number of observations in the data
set). This gives the sample variance.
6. Take the square root to get the standard deviation.
Remark: If the standard deviation of a set of data is small, the values are concentrated
around the mean; if the standard deviation is large, the values are scattered widely
around the mean.
Properties of standard deviation
1) It is considered to be the best measure of dispersion and is used widely.
2) There is, however, one difficulty with it. If the units of measurement of the variables of two
series are not the same, then their variability cannot be compared by comparing the
values of the standard deviations.
3) If the standard deviation of the observations $X_1, X_2, \ldots, X_n$ is $S$, then the standard
deviation of
a) $X_1 + k, X_2 + k, \ldots, X_n + k$ is also $S$, where k is a constant number.
b) $X_1 - k, X_2 - k, \ldots, X_n - k$ is also $S$.
c) $kX_1, kX_2, \ldots, kX_n$ is $|k|S$.
Example 4.3
Find the variance and standard deviation of the following sample data: 5, 17, 12, 10.
Solution:
$\bar{X} = 11$
$X_i$                     5    10    12    17    Total
$(X_i - \bar{X})^2$      36     1     1    36      74
$S^2 = 74/3 = 24.67 \Rightarrow S = \sqrt{S^2} = \sqrt{24.67} = 4.97$
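The six calculation steps can be followed line by line in Python for the data of Example 4.3. This is a minimal sketch; the variable names are illustrative.

```python
import math

data = [5, 17, 12, 10]
n = len(data)

mean = sum(data) / n                          # step 1: arithmetic mean
deviations = [x - mean for x in data]         # step 2: observation minus mean
squared = [d ** 2 for d in deviations]        # step 3: square the differences
sum_of_squares = sum(squared)                 # step 4: sum the squared differences
variance = sum_of_squares / (n - 1)           # step 5: divide by n - 1 (sample)
std_dev = math.sqrt(variance)                 # step 6: square root

print(mean, sum_of_squares, round(variance, 2), round(std_dev, 2))
# 11.0 74.0 24.67 4.97
```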
Activity 4.2
i) Find the variance and standard deviation of the following data, given in the form of a frequency distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
ii) The mean and the standard deviation of a set of numbers are 500 and 10 respectively.
a) If 10 is added to each of the numbers in the set, then what will be the variance and
standard deviation of the new set?
b) If each of the numbers in the set is multiplied by -5, then what will be the
variance and standard deviation of the new set?
The Coefficient of Variation (C.V)
• It is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage:
$$C.V = \frac{S}{\bar{X}} \times 100\%$$
• In situations where either the two series have different units of measurement, or their
means differ sufficiently in size, the coefficient of variation should be used as a measure of
dispersion.
Properties of C.V
1) It is one of the most widely used measures of dispersion because of its virtues.
2) The smaller the value of C.V, the more consistent the data, and vice versa.
3) For fixed experiments, C.V is generally reported. A low C.V indicates greater
reliability of the experimental findings.
Example 4.4
Consider the distribution of yields (per plot) of two paddy varieties. For the first variety, the
mean and S.d are 60 kg and 10 kg respectively. For the second variety, the mean and S.d are
50 kg and 9 kg respectively. Then
C.V (first variety) = (10/60) × 100 = 16.7% and C.V (second variety) = (9/50) × 100 = 18%.
This shows that the variability in the first variety is less than that in the second variety.
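The comparison in Example 4.4 can be written as a small Python sketch. The function name `coefficient_of_variation` is illustrative; it simply applies the definition of C.V as the standard deviation divided by the mean, expressed as a percentage.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100


cv_first = coefficient_of_variation(10, 60)    # first paddy variety
cv_second = coefficient_of_variation(9, 50)    # second paddy variety

print(round(cv_first, 1), round(cv_second, 1))   # 16.7 18.0
print("first variety is less variable:", cv_first < cv_second)
```

Although the first variety has the larger standard deviation in kilograms, its spread is smaller relative to its mean, which is why the relative measure is the one to compare.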
Activity 4.3
Two distributions A and B have means of 80 inches and 20 kg and standard deviations of 10 inches and 1.5 kg
respectively. Which distribution has greater variability?
Chebyshev's Theorem
• Developed by the Russian mathematician Chebyshev, the theorem specifies the proportion of the
data lying within a given spread, expressed in terms of the standard deviation.
• For any set of data (population or sample) and any constant k (greater than one), the
proportion of the data that must lie within k standard deviations on either side of the mean,
i.e. within $(\bar{X} - kS,\ \bar{X} + kS)$, is at least $1 - \frac{1}{k^2}$; that is, the proportion of items falling beyond k standard
deviations from the mean is at most $\frac{1}{k^2}$.
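A short Python sketch can tabulate the Chebyshev bound 1 − 1/k² and compare it with what is actually observed in a data set. The function names are illustrative, the data are Distribution 2 from Section 4.3.1, and the population standard deviation (`statistics.pstdev`) is used for the check.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def proportion_within(data, k):
    """Observed proportion of values within k population standard deviations."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)


scores = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]
for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 3), round(proportion_within(scores, k), 3))
```

For k = 2 the bound is 0.75 and for k = 3 it is about 0.889; the observed proportions can be higher, since the theorem only gives a guaranteed minimum.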
The Empirical or Normal Rule:

Chebyshev’s theorem applies to any distribution regardless of its shape. However, when a
distribution is bell-shaped (or what is called normal), the following statements, which make up
the empirical rule, are true.
• Approximately 68% of the data values fall within one standard deviation of the mean, i.e.
within $(\bar{X} - S,\ \bar{X} + S)$.
• Approximately 95% of the data values fall within two standard deviations of the mean, i.e.
within $(\bar{X} - 2S,\ \bar{X} + 2S)$.
• Approximately 99.7% of the data values fall within three standard deviations of the mean,
i.e. within $(\bar{X} - 3S,\ \bar{X} + 3S)$.
4.3.5 The Standard Scores (Z-scores)
• If X is a measurement from a distribution with mean $\bar{X}$ and standard deviation S,
then its value in standard units is
$$Z = \frac{X - \bar{X}}{S}$$
• Z gives the deviation from the mean in units of standard deviation; it tells us
how many standard deviations a given value lies above or below the mean.
• It also helps in hypothesis testing.
• It is used to compare two observations coming from different groups.

Example 4.5
Two groups of children were trained to perform a certain task for a month and then tested to
find out which group was faster to learn the task. For the first group, the average time taken to perform the task
was 10.4 minutes with an s.d. of 1.2 minutes; for the second group it was 11.9 minutes with an s.d. of 1.3 minutes.
Child A from group 1 took 9.2 minutes, while child B from group 2 took 9.3 minutes. Who
was faster in performing the task relative to his/her group?
              Group I          Group II
Mean          10.4             11.9
S.d           1.2              1.3
Time taken    XA = 9.2         XB = 9.3
$$Z_A = \frac{9.2 - 10.4}{1.2} = -1, \qquad Z_B = \frac{9.3 - 11.9}{1.3} = -2$$
These values indicate that the time taken by child A is one S.d below the average time taken
by his/her group. The time taken by child B is two S.d below the mean time taken by his/her
group; child B is, therefore, faster in performing the task relative to his/her group.
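The standard scores in Example 4.5 can be verified with a few lines of Python. The function name `z_score` is illustrative; it applies the definition Z = (X − mean)/S.

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_a = z_score(9.2, mean=10.4, sd=1.2)   # child A, group 1
z_b = z_score(9.3, mean=11.9, sd=1.3)   # child B, group 2

print(round(z_a, 2), round(z_b, 2))     # -1.0 -2.0
# A more negative z means a shorter time relative to the child's own group.
print("child B is relatively faster:", z_b < z_a)
```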
4.4 MOMENTS, SKEWNESS AND KURTOSIS
Moments are used to measure skewness and kurtosis.
If X is a variable that assumes the values X1, X2, …, Xn, then
1. The rth moment is defined as:
$$\overline{X^r} = \frac{X_1^r + X_2^r + \cdots + X_n^r}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i^r, \qquad r = 1, 2, 3, \ldots$$
- If r = 1, it is the simple arithmetic mean; this is called the first moment.
2. The rth moment about the mean (the rth central moment)
- Denoted by $M_r$ and defined as:
$$M_r = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^r, \qquad r = 1, 2, \ldots$$
If r = 2, it is the population variance; this is called the second central moment. If we assume
n − 1 ≈ n, it is also the sample variance.
3. The rth moment about any number A
- Denoted by $M'_r$ and defined as:
$$M'_r = \frac{1}{n}\sum_{i=1}^{n} (X_i - A)^r, \qquad r = 1, 2, \ldots$$
Remarks: 1) 2) 3)
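The three kinds of moments differ only in the point from which deviations are measured, which the following Python sketch makes explicit. The function names and the data set [1, 2, 4, 5, 8] are hypothetical, chosen only for illustration.

```python
def moment_about(data, r, a):
    """r-th moment about the number a: average of (x - a)**r."""
    return sum((x - a) ** r for x in data) / len(data)


def raw_moment(data, r):
    """r-th moment (about zero); r = 1 gives the arithmetic mean."""
    return moment_about(data, r, 0)


def central_moment(data, r):
    """r-th moment about the mean."""
    return moment_about(data, r, raw_moment(data, 1))


values = [1, 2, 4, 5, 8]                 # hypothetical data

print(raw_moment(values, 1))             # 4.0  (the mean)
print(raw_moment(values, 2))             # 22.0
print(central_moment(values, 1))         # 0.0  (deviations from the mean sum to zero)
print(central_moment(values, 2))         # 6.0  (population variance)
print(moment_about(values, 3, 3))        # 25.0 (third moment about A = 3)
```

Note that the second central moment equals the second raw moment minus the square of the mean: 22 − 4² = 6.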
Activity 4.4
1. Find the first two moments for the following set of numbers 2, 3, 7
2. Find the first three central moments of the numbers in problem 1
3. Find the third moment about the number 3 of the numbers in problem 1.
Skewness
- Skewness is the degree of asymmetry, or departure from symmetry, of a distribution.
- A skewed frequency distribution is one that is not symmetrical.
- Skewness is concerned with the shape of the curve, not its size.
- If the frequency curve (smoothed frequency polygon) of a distribution has a longer tail to
the right of the central maximum than to the left, the distribution is said to be skewed to
the right or to have positive skewness. If it has a longer tail to the left of the central
maximum than to the right, it is said to be skewed to the left or to have negative
skewness.
- For a moderately skewed distribution, the following relation holds among the three
commonly used measures of central tendency:
Mean − Mode = 3(Mean − Median)
Measures of Skewness
- A measure of skewness is a measure of the direction and degree of asymmetry.
- Denoted by $\alpha_3$.
- There are various measures of skewness.
1. The Pearsonian coefficient of skewness
$$\alpha_3 = \frac{\text{Mean} - \text{Mode}}{S} = \frac{3(\text{Mean} - \text{Median})}{S}$$
2. The moment coefficient of skewness
$$\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{M_3}{\sigma^3}$$
where $M_2$ and $M_3$ are the second and third central moments.
Note: The shape of the curve is determined by the value of $\alpha_3$.
• If $\alpha_3 = 0$ then the distribution is symmetric.
• If $\alpha_3 > 0$ then the distribution is positively skewed.
• If $\alpha_3 < 0$ then the distribution is negatively skewed.
Remarks:
In a positively skewed distribution, smaller observations are more frequent than
larger observations, i.e. the majority of the observations have a value below the
average, and the distribution has a long tail in the positive direction.
In a negatively skewed distribution, smaller observations are less frequent than larger
observations, i.e. the majority of the observations have a value above the average.
Activity 4.5
1. Suppose the mean, the mode, and the standard deviation of a certain distribution are
32, 30.5 and 10 respectively. What is the shape of the curve representing the distribution?
2. For a moderately skewed frequency distribution, the mean is 10 and the median is 8.5.
If the coefficient of variation is 20%, find the Pearsonian coefficient of skewness and
the probable mode of the distribution.
Kurtosis
Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal
distribution. The peakedness of a distribution can be classified into three types:
a) Leptokurtic: - A distribution having a relatively high peak.
- A large number of observations have the same or nearly the same values.
b) Mesokurtic: - Normal peak.
- The curve is properly peaked.
c) Platykurtic: - Flat-topped.
- The observations are spread over the middle interval, each value having a
low frequency.
Measures of kurtosis:
- It is a measure of peakedness.
- Denoted by $\alpha_4$ and given by
$$\alpha_4 = \frac{M_4}{M_2^2} = \frac{M_4}{\sigma^4}$$
where $M_2$ and $M_4$ are the second and fourth central moments.
Note: The peakedness depends on the value of $\alpha_4$.
• If $\alpha_4 > 3$ then the curve is leptokurtic.
• If $\alpha_4 = 3$ then the curve is mesokurtic.
• If $\alpha_4 < 3$ then the curve is platykurtic.
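The moment-based measures of shape can be sketched in Python as below. This is an illustration under stated assumptions: skewness is taken as M3 / M2^(3/2), kurtosis as M4 / M2², the Pearsonian coefficient as 3(mean − median)/S, and all data values and summary figures are hypothetical.

```python
def central_moment(data, r):
    """r-th moment about the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** r for x in data) / len(data)


def moment_skewness(data):
    """Third central moment divided by the cube of the standard deviation."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def moment_kurtosis(data):
    """Fourth central moment divided by the square of the variance."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


def pearson_skewness(mean, median, sd):
    """Pearsonian coefficient of skewness from summary figures."""
    return 3 * (mean - median) / sd


symmetric = [1, 2, 3, 4, 5]              # hypothetical symmetric data
right_tailed = [1, 2, 4, 5, 8]           # hypothetical data with a longer right tail

print(moment_skewness(symmetric))                   # 0.0 for symmetric data
print(round(moment_skewness(right_tailed), 3))      # positive: skewed to the right
print(round(moment_kurtosis(right_tailed), 3))      # below 3: platykurtic
print(pearson_skewness(mean=50, median=47, sd=6))   # 1.5 (hypothetical summary values)
```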
Activity 4.6
1. If the first four central moments of a distribution are:
a) Compute a measure of skewness

b) Compute a measure of kurtosis and give your interpretation.
2. If the standard deviation of a symmetric distribution is 10, what should be the value of
the fourth moment so that the distribution is mesokurtic?
CHAPTER SUMMARY
• Variability or dispersion is concerned with the extent to which the values in a data set
vary from the mean or from one another.
• There are different measures of dispersion; these include the range, variance,
standard deviation, mean deviation and coefficient of variation.
• The range is the difference between the largest and the smallest value in a data set.
• The variance is the sum of the squares of the differences between the mean and the
individual observations, divided by the total number of observations in the case
of a population and by n − 1 in the case of a sample.
• The standard deviation is the positive square root of the variance.
• The coefficient of variation is the ratio of the standard deviation to the arithmetic
mean, expressed as a percentage.
• Different measures of dispersion have different properties and different uses; we
have to apply them in their appropriate places.

1. In a moderately asymmetrical distribution the mode minus the mean is 2.4 and median is
24.8. The coefficient of variation is 20%. Find the mode, the mean and the standard
deviation.
2. The following data are given for 20 observations.
=306, =5490, =16, =10.
Find arithmetic mean and standard deviation for the 20 observations.
3. The standard deviation calculated from a set of 32 observations is 5. If the sum of the
observations is 80, what is the sum of square of the observations?
4. The mean of 5 observations is 3 and variance is 2. If three of them are 1, 3 and 5. Find the
remaining two.
5. The distribution of marks of 50 students in statistics out of 50 is given in the table below.
Marks                 0-10   10-20   20-30   30-40   40-50
Number of students       5       8      15      16       6
Calculate
a) The range b) The quartile deviation
c) The standard deviation, and interpret the result.
6. Two models of radio were subjected to a durability test, and the results were as
follows.
Life (in years)     Number of sets examined
                    Model A     Model B
0-2                     5           2
2-4                    16           7
4-6                    13          12
6-8                     7          19
8-10                    5           9
10-12                   4           1
State which model has a longer average life and which model has more uniformity.
7. A meteorologist is interested in the consistency of temperatures in three cities. During a
given week the following data were collected. The temperatures for five days of the week in
the three cities were
City 1: 25 24 23 26 17
City 2: 22 21 24 22 20
City 3: 32 27 35 24 28
Which city has the most consistent temperature, based on these data?
8. Suppose Bekele got 90 on a test in which the mean and S.D for the class were 70 and
10 respectively. In another test Almaz scored 60, where the mean and S.D for her class
were 56 and 40 respectively.
a) Who was better off relative to his/her class?
b) Which class has students with less similar results?
9. For a moderately skewed frequency distribution, the mean is 10 and the median is 8.5.
If the coefficient of variation is 20%, find the Pearsonian coefficient of skewness and
the probable mode of the distribution.
10. If the standard deviation of a symmetric distribution is 10, what should be the
value of the fourth moment so that the distribution is mesokurtic?
CHAPTER 5
ELEMENTARY PROBABILITY
CONTENTS
5.1. Introduction
5.2. Definition of some probability terms
5.3. Counting rules: addition and multiplication rules,
permutation and combination
5.4. Probability of an event
5.5. Some probability rules
5.6. Conditional probability and independence
In this chapter, there are two main points to be discussed: Possibilities and Probabilities. After
presenting some basic concepts of probability, the next part is about techniques of counting or the
methods used to determine the number of possibilities, which are indispensable to compute
probabilities, then followed by different definitions of probability; and finally, some general rules
and derived theorems of probability will be presented.
Objectives:
After studying this chapter, you should be able to:
Define basic terms in probability such as: sample space, outcome, event, and so on.
Describe the addition and multiplication rules of counting.
Differentiate between permutations and combinations.
Evaluate the probability of an event through the various methods.
Describe the different rules of probability.
5.1 INTRODUCTION
In our daily life, it is not uncommon to hear words which express doubt or
uncertainty about the happening of certain events. To mention some instances:
“If by chance you meet her, please convey my heart-felt greeting”;
“Probably, he might not take the class today”;
“There is a 50-50 chance of surviving cancer,” his physician said;
“For a mathematician, there is a fat chance of passing this course”; etc.
These statements show uncertainty about the happening of the event in question. In
Statistics, however, sensible numerical statements can be made about uncertainty, and
different approaches can be applied to calculate probabilities.
A numerical measure of uncertainty is provided by a very important branch of Mathematics
known as the Theory of Probability.
In general, there are three states of expectation: certainty, impossibility, and uncertainty.
Probability Theory is concerned with the study of random (chance) phenomena; probability is a numerical
measure of the chance of occurrence of something (called an event), which shows the degree of
uncertainty. Thus, we say that the probabilities of the above three expectations are, respectively, one,
zero, and between zero and one. Probability Theory is the basis for all statistical applications in any
field of study.
Since probability theory is closely related to set theory, one needs to revise that topic from
mathematics. Probability is also defined in terms of relative frequency, presented in chapter two of
this module. Thus, the following is a review of your knowledge of these topics, and of course of
your knowledge of elementary probability from high school.
• Probability theory is the foundation upon which the logic of inference is built.
• It helps us to cope with uncertainty.
• In general, probability is the chance of an outcome of an experiment. It is a measure of how likely
an outcome is to occur.
5.2 DEFINITIONS OF SOME PROBABILITY TERMS
1. Experiment: Any process of observation or measurement, or any process which generates
well-defined outcomes.
2. Probability Experiment (Random Experiment): An experiment that can be repeated
any number of times under similar conditions and for which it is possible to enumerate all
possible outcomes without being able to predict an individual outcome.
Example 5.1
If a fair coin is tossed three times, it is possible to enumerate all possible eight sequences of
head (H) and tail (T). But it is not possible to predict which sequence will occur at any
occasion.
3. Outcome: The result of a single trial of a random experiment
4. Sample Space(S): Set of all possible outcomes of a probability experiment.
Example: The sample space of an experiment consisting of three tosses of a coin is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
A sample space can be
• Countable (finite or infinite)
• Uncountable
5. Event: It is a subset of the sample space, that is, a statement about one or more
outcomes of a random experiment. Events are denoted by capital letters such as A.
For example, the event that there are exactly two heads in three tosses of a coin
consists of the three sample points HTH, HHT and THH.
Remark: If S (the sample space) has n members, then there are exactly $2^n$ subsets or events.
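As an illustration (not part of the original module), the three-toss sample space, the event "exactly two heads", and the remark about $2^n$ events can be checked with a short Python sketch. The variable names are chosen here only for the example.

```python
from itertools import combinations, product

# Sample space for three tosses of a coin: every sequence of H and T.
sample_space = ["".join(seq) for seq in product("HT", repeat=3)]

# Event A: exactly two heads in the three tosses.
event_a = [outcome for outcome in sample_space if outcome.count("H") == 2]

# Count every subset (event) of S by listing the subsets of each size.
num_events = sum(
    1
    for size in range(len(sample_space) + 1)
    for _ in combinations(sample_space, size)
)

print(sample_space)                           # the 8 outcomes
print(event_a)                                # the 3 sample points of A
print(num_events, 2 ** len(sample_space))     # both counts agree
```

Listing the subsets directly is only feasible for small sample spaces; the count $2^n$ grows very quickly with n.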
6. Equally Likely Events: Events which have the same chance of occurring.
7. Complement of an event: The complement of event A (denoted by $A^c$) consists of
all the sample points in the sample space that are not in A.
8. Elementary (simple) Event: an event having only a single element or sample point.
9. Mutually Exclusive (Disjoint) Events: Two events which cannot happen at the same
time.
10. Independent Events: Two events are said to be independent if the occurrence of one
does not affect the probability of the other occurring.
11. Dependent Events: Two events are dependent if the occurrence of the first event changes
the probability that the second event occurs.
5.3 COUNTING RULES: ADDITION, MULTIPLICATION RULES, PERMUTATION AND COMBINATION
In order to calculate probabilities, we have to know
• The number of elements of an event
• The number of elements of the sample space.
That is, in order to judge what is probable, we have to know what is possible.
• In order to determine the number of outcomes, one can use several rules of counting.
- The addition rule
- The multiplication rule
- Permutation rule
- Combination rule
Addition Rule
If event A can occur in m possible ways and event B can occur in n possible ways,
there are m+n possible ways for either event A or event B to occur,
but only if there are no outcomes in common between them.
In general, $n(A \cup B) = n(A) + n(B) - n(A \cap B)$.
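Observe the following regions of a Venn diagram of two overlapping sets A and B:
Only A = A - B
Only B = B - A
Both A & B = $A \cap B$
Notes: 1) An alternative expression is: $n(A \cup B)$ = n(only A) + n(only B) + n(both A & B).
2) If $A \cap B = \emptyset$, then $n(A \cup B)$ = n(A) + n(B).
3) n(A) = n(only A) + $n(A \cap B)$; and n(B) = n(only B) + $n(A \cap B)$.
4) Alternative expressions: A - B = $A \cap B^c$, and B - A = $B \cap A^c$.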
To list the outcomes of the sequence of events, a useful device called tree diagram is used.
Example 5.2:
A student goes to the nearest snack bar to have breakfast. He can take tea, coffee, or milk with
bread, cake, or sandwich. How many possibilities does he have?
Solution: The tree diagram gives the following 9 possibilities:
Tea: bread, cake, sandwich
Coffee: bread, cake, sandwich
Milk: bread, cake, sandwich
The Multiplication Rule
If a choice consists of k steps of which the first can be made in $n_1$ ways, the second can be
made in $n_2$ ways, …, and the kth can be made in $n_k$ ways, then the whole choice can be made in
$n_1 \cdot n_2 \cdots n_k$ ways.
Example 5.3
If a man has 3 pairs of trousers, 5 shirts, 2 jackets and 3 pairs of shoes, in how many different ways
can he wear his clothes and shoes?
Solution: Using $n_1 = 3$, $n_2 = 5$, $n_3 = 2$ and $n_4 = 3$, the total number of ways of wearing is
$n_1 \cdot n_2 \cdot n_3 \cdot n_4 = 3 \cdot 5 \cdot 2 \cdot 3 = 90$.
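The multiplication rule can also be checked by brute-force enumeration. The following Python sketch (an illustration added here, with placeholder labels for the clothing items) lists every choice for Examples 5.2 and 5.3 and compares the counts with the products of the numbers of options.

```python
from itertools import product
from math import prod

# Example 5.2: one drink combined with one food item.
drinks = ["tea", "coffee", "milk"]
foods = ["bread", "cake", "sandwich"]
breakfasts = list(product(drinks, foods))

# Example 5.3: items are only numbered, since the text gives just the counts.
trousers, shirts, jackets, shoes = range(3), range(5), range(2), range(3)
outfits = list(product(trousers, shirts, jackets, shoes))

print(len(breakfasts), len(drinks) * len(foods))   # enumeration vs. n1*n2
print(len(outfits), prod([3, 5, 2, 3]))            # enumeration vs. n1*n2*n3*n4
```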
Activity 5.1
The digits 0, 1, 2, 3, and 4 are to be used in a 4-digit identification card. How many different
cards are possible if
a) Repetitions are permitted.
b) Repetitions are not permitted.
Factorial notation
The symbol "n!", read as " n factorial", denotes the product of all positive integers less than or equal
to n.
Let n be a non-negative integer. Then, n factorial, denoted by n!, is defined as
n!= n*(n-1)*(n-2)*…*2*1, where 0!=1.
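Note the following relationships:
n! = n(n-1)! = n(n-1)(n-2)! = n(n-1)(n-2)(n-3)!, etc.
Hence $\frac{n!}{(n-1)!} = n$, $\frac{n!}{(n-2)!} = n(n-1)$, and so on. In general, we can have
$\frac{n!}{(n-r)!} = n(n-1)\cdots(n-r+1)$.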
Permutation
An arrangement of n objects in a specified order is called permutation of the objects.
Permutation Rules:
1. The number of permutations of n distinct objects taken all together is n!,
where n! = n*(n-1)*(n-2)*…*2*1.
2. The arrangement of n objects in a specified order using r objects at a time is called the
permutation of n objects taken r at a time. It is written as $_nP_r$ and the formula is
$_nP_r = \frac{n!}{(n-r)!}$.
3. The number of permutations of n objects in which $k_1$ are alike, $k_2$ are alike, …, $k_m$ are alike is
$\frac{n!}{k_1!\,k_2!\cdots k_m!}$.
4. The number of arrangements of n objects around a circle is (n-1)!.
5. The number of ways of partitioning a set of n things into k cells where there are $n_1$
elements in the first cell, $n_2$ elements in the second cell, …, $n_k$ elements in the kth cell is
$\frac{n!}{n_1!\,n_2!\cdots n_k!}$.
Example 5.4
Find the permutations of two of the five vowels a, e, i, o, u; and list them.
Solution: There are n1*n2=5*4=20 different permutations, listed as follows:
ae,ai,ao,au, ea,ei,eo,eu, ia,ie,io,iu, oa,oe,oi,ou, ua,ue,ui,uo.
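The listing in Example 5.4 can be reproduced mechanically. This Python sketch (added for illustration) generates the ordered pairs of vowels and checks the count against the usual permutation formula n!/(n-r)!, here with n = 5 and r = 2.

```python
from itertools import permutations
from math import factorial, perm

vowels = "aeiou"

# Every ordered selection of two different vowels.
pairs = ["".join(p) for p in permutations(vowels, 2)]

# The same count from the formula n! / (n - r)! and from math.perm.
by_formula = factorial(5) // factorial(5 - 2)

print(pairs)
print(len(pairs), by_formula, perm(5, 2))
```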
Activity 5.2
1. Suppose we have the letters A, B, C, D.
a) How many permutations are there taking all the four?
b) How many permutations are there taking two letters at a time?
2. How many different permutations can be made from the letters in the word
“MISSISSIPPI”?
3. In how many ways can people be assigned to 1 triple and 2 double rooms?
4. In how many ways can a party of 7 people arrange themselves
a) in a row of 7 chairs?
b) around a circular table?
Combination
There are many problems in which we are interested in finding the number of ways in which r objects
can be selected from n distinct objects without regard to the order of selection. Such selections are
called combinations.
Definition: A selection of r objects from a set of n objects without regard
to the order of selection is called a combination.
Example 5.5
Given the letters A, B, C, and D, list the permutations and combinations for selecting two
letters.
Solutions:
Permutations: AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC
Combinations: AB, AC, AD, BC, BD, CD
Note that in permutation AB is different from BA. But in combination AB is the same as BA.
The number of combinations of r objects selected from n objects is denoted by $_nC_r$ or
$\binom{n}{r}$ and is given by the formula: $_nC_r = \frac{n!}{(n-r)!\,r!}$
Example 5.6
In how many ways can 3 letters be selected from the four letters a, b, c & d?
Solution: Since we do not care about their order of selection, we have only the following four cases:
abc, abd, acd, & bcd.
But recall that the number of permutations of 3 letters out of the 4 is $_4P_3 = 24$, and we know that each
set of three letters can be arranged in 3! = 6 ways.
Thus, we have $(3!) \cdot 4 = {}_4P_3$, from which we get $4 = {}_4P_3 / 3! = {}_4C_3$, say.
Actually, "combination" means the same as "subset"; in the above case, the number of subsets of 3
elements that can be selected from a set of 4 distinct elements is $_4C_3 = 4$.
This is called the total number of combinations of 3 objects selected from 4 distinct objects.
Notation: $_nC_r$ is used to denote the combination of n objects taking r of them at a time.
The number of combinations of n distinct objects taking r of them at a time is given by:
$_nC_r = \frac{n!}{r!\,(n-r)!}$, for $r = 0, 1, \dots, n$.
Example 5.7
In how many ways can a committee of 2 students be formed out of 6?
Solution: We substitute n = 6 and r =2, to get 6C2=15.
Example 5.8
If a committee of 5 candidates is to be formed out of 10, of which 4 are girls and 6 are boys, how
many committees can be formed if 2 girls are to be included?
Solution: It can be seen as a two-stage selection. Since 2 of the 4 girls can be selected in $n_1 = {}_4C_2 = 6$
ways, and 3 of the 6 boys in $n_2 = {}_6C_3 = 20$ ways, the total number of committees is
$n_1 \cdot n_2 = 6 \cdot 20 = 120$.
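The counts in Examples 5.6 to 5.8 can be verified with Python's standard library. The sketch below is only an illustration; it reads Example 5.8 as requiring exactly 2 girls and 3 boys, which is an interpretation of the wording.

```python
from itertools import combinations
from math import comb, factorial, perm

# Example 5.6: unordered selections of 3 letters from a, b, c, d.
triples = ["".join(c) for c in combinations("abcd", 3)]

# Each combination corresponds to 3! permutations, so nCr = nPr / r!.
four_choose_three = perm(4, 3) // factorial(3)

# Example 5.7: committees of 2 students out of 6.
committees_of_two = comb(6, 2)

# Example 5.8: 2 of the 4 girls and 3 of the 6 boys (two-stage selection).
committees_with_two_girls = comb(4, 2) * comb(6, 3)

print(triples, four_choose_three)
print(committees_of_two, committees_with_two_girls)
```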
5.4 PROBABILITY OF AN EVENT
There are four different conceptual approaches to the study of probability theory. These are:
• The classical approach.
• The relative frequency approach.
• The axiomatic approach.
• The subjective approach.
The classical approach
This approach is used when:
- All outcomes are equally likely and mutually exclusive.
- The total number of outcomes is finite, say N.
Definition
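If a random experiment with N equally likely outcomes is conducted and out of these $N_A$
outcomes are favorable to the event A, then the probability that event A occurs, denoted P(A),
is defined as: $P(A) = \frac{N_A}{N} = \frac{\text{number of outcomes favorable to } A}{\text{total number of outcomes}}$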
Limitation: The classical approach cannot be used:
• If it is not possible to enumerate all the possible outcomes of an experiment.
• If the sample points (outcomes) are not mutually exclusive.
• If the total number of outcomes is infinite.
• If the outcomes are not all equally likely.
Example 5.9
What is the probability that a 3 or a 5 will turn up in rolling a fair die?
Solution: S = {1, 2, 3, 4, 5, 6}; let E = {3, 5}. For a fair die, P(1) = P(2) = … = P(6) = 1/6; then,
P(E) = $N_E / N$ = 2/6 = 1/3.
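The classical definition amounts to counting favorable outcomes and dividing by the total. A minimal Python sketch, assuming equally likely outcomes (the function name classical_probability is chosen here for illustration):

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """Return N_A / N, assuming all outcomes in sample_space are equally likely."""
    outcomes = set(sample_space)
    favorable = set(event) & outcomes   # ignore anything outside the sample space
    return Fraction(len(favorable), len(outcomes))


die = {1, 2, 3, 4, 5, 6}
p_three_or_five = classical_probability({3, 5}, die)
print(p_three_or_five)   # exact fraction for Example 5.9
```

Using Fraction keeps the result exact, so it can be compared directly with hand calculations.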
Activity 5.3
1. A fair die is tossed once. What is the probability of getting
a) The number 4?
b) An odd number?
c) A number greater than 4?
d) Either 1 or 2 or … or 6?
2. A box of 80 candles consists of 30 defective and 50 non-defective candles. If 10 of
these candles are selected at random, what is the probability that:
a) All will be defective?
b) 6 will be non-defective?
c) All will be non-defective?
Limitations of the classical definition
At times, it is impossible to apply this concept. Two instances can be mentioned:
i) If the outcomes are not equally likely to occur.
If one sits for a quiz, the two options (pass/fail) are not equally likely.
If one jumps from a running train, is his survival/death equally likely?
ii) If the total number of outcomes is infinite.
The Relative Frequency or a posteriori Approach
This is based on the relative frequencies of outcomes belonging to an event.
Definition
The probability of an event A is the proportion of outcomes favorable to A in the long run,
when the experiment is repeated under the same conditions.
Example 5.10
If the records of Ethiopian Air Lines show that 468 of 600 of its flights from B/Dar to Addis arrived
on time, what is the probability that any one of similar flights will arrive on time?
Solution: If E = the event that the plane will arrive on time, then: $P(E) = \frac{468}{600} = 0.78$.
Note: The probability of not arriving on time is $P(E^c)$.
That is, the plane didn't arrive on time for 600 – 468 = 132 flights; thus, $P(E^c) = \frac{132}{600} = 0.22$.
In general, $P(E) + P(E^c) = 1$, or $P(E^c) = 1 - P(E)$.
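The following Python sketch (added for illustration) computes the relative frequencies of Example 5.10 and then shows the "long run" idea with a simulated fair coin. The simulation, its seed, and the trial counts are assumptions made only for this example.

```python
import random
from fractions import Fraction

# Example 5.10: relative frequency of on-time flights.
on_time, flights = 468, 600
p_on_time = Fraction(on_time, flights)
p_not_on_time = 1 - p_on_time
print(float(p_on_time), float(p_not_on_time))

# Long-run behaviour: relative frequency of heads for a simulated fair coin.
rng = random.Random(0)   # fixed seed so the run is repeatable


def relative_frequency(trials):
    """Proportion of simulated tosses that come up heads."""
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials


for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

As the number of trials grows, the printed relative frequency tends to settle near 0.5, which is what the a posteriori definition relies on.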
Activity 5.4
If records show that 60 out of 100,000 bulbs produced are defective, what is the probability
that a newly produced bulb is defective?
Axiomatic Approach:
This approach does not give a precise definition of probability but gives certain axioms or postulates
on which probability calculations are based. Then, any one of the preceding concepts can be
used in applications as long as it is consistent with these rules.
Let E be a random experiment and S be a sample space associated with E. With each event A we
associate a real number P(A), called the probability of A, which satisfies the following properties,
called axioms of probability or postulates of probability.
1. $P(A) \geq 0$
2. P(S) = 1
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals
the sum of the two probabilities, i.e. $P(A \cup B) = P(A) + P(B)$.
Subjective Approach
Subjective probability is always based on some prior body of knowledge. Hence subjective measures of
uncertainty are always conditional on this prior knowledge. The subjective approach accepts
unreservedly that different people (even experts) may have vastly different beliefs about the
uncertainty of the same event.
Example 5.11
Abebe’s belief about the chances of Ethiopia Buna club winning the FA Cup this year may
be very different from Daniel's. Abebe, using only his knowledge of the current team and
past achievements may rate the chances at 30%. Daniel, on the other hand, may rate the
chances as 10% based on some inside knowledge he has about key players having to be sold
in the next two months.
5.5 Some probability rules
There are also other rules, but all are derived from the above three postulates.
Some of them are:
a) $P(A_1 \cup A_2 \cup \dots \cup A_n) = P(A_1) + P(A_2) + \dots + P(A_n)$, if $A_1, A_2, \dots, A_n$ are pairwise mutually
exclusive.
b) $P(A) \leq 1$, probability never exceeds unity.
c) $P(\emptyset) = 0$.
d) $P(A^c) = 1 - P(A)$, where $A^c$ is the complement of event A.
e) For any two events A and B, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$; this is the general addition
rule.
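These rules can be checked numerically on a small sample space. The sketch below (an illustration, using a fair die and two events chosen here for the example) verifies the general addition rule, the complement rule, and the probability of the empty event.

```python
from fractions import Fraction

S = frozenset(range(1, 7))   # sample space of one roll of a fair die


def P(event):
    """Classical probability of an event, assuming equally likely outcomes."""
    return Fraction(len(event & S), len(S))


A = frozenset({2, 4, 6})     # "even number" (example event)
B = frozenset({4, 5, 6})     # "greater than 3" (example event)

print(P(A | B), P(A) + P(B) - P(A & B))   # general addition rule
print(P(S - A), 1 - P(A))                 # complement rule
print(P(frozenset()))                     # empty event
```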
5.6 CONDITIONAL PROBABILITY AND INDEPENDENCE

Conditional Events: If the occurrence of one event has an effect on the occurrence of
the other event, then the two events are conditional or dependent events.
Conditional probability of an event
The conditional probability of an event A given that B has already occurred is denoted by
P(A/B). Since B is known to have occurred, it becomes the new sample space replacing the
original sample space.
From this we are led to the definition
$P(A/B) = \frac{P(A \cap B)}{P(B)}$, $P(B) \neq 0$, or $P(A \cap B) = P(A/B) \cdot P(B)$
Remark: 1) $P(A^c/B) = 1 - P(A/B)$
2) $P(B^c/A) = 1 - P(B/A)$
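3) For three events A, B, and C,
$P(A \cap B \cap C) = P(A) \cdot P(B/A) \cdot P(C/A \cap B)$.
4) If an event A must result in one of the mutually exclusive events $A_1, A_2, \dots, A_n$,
then $P(A) = P(A_1) \cdot P(A/A_1) + P(A_2) \cdot P(A/A_2) + \dots + P(A_n) \cdot P(A/A_n)$.
5) Suppose that $A_1, A_2, \dots, A_n$ are mutually exclusive events whose union is the sample
space. Then, for any event A with P(A) > 0,
$P(A_k/A) = \frac{P(A_k) \cdot P(A/A_k)}{\sum_{i=1}^{n} P(A_i) \cdot P(A/A_i)}$, $k = 1, 2, \dots, n$, is called Bayes’ rule.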
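To make remarks 4) and 5) concrete, here is a small Python sketch of the total probability rule and Bayes' rule. The numbers are hypothetical and not from the module: three machines produce 50%, 30% and 20% of a factory's items, with defect rates of 1%, 2% and 3%.

```python
from fractions import Fraction

# Hypothetical figures: P(A_i) = share of output of machine i,
# P(A/A_i) = probability an item is defective given it came from machine i.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
defect_rates = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

# Total probability: P(A) = sum of P(A_i) * P(A/A_i).
p_defective = sum(p * d for p, d in zip(priors, defect_rates))

# Bayes' rule: P(A_k/A) = P(A_k) * P(A/A_k) / P(A).
posteriors = [p * d / p_defective for p, d in zip(priors, defect_rates)]

print(p_defective)       # probability that a randomly chosen item is defective
print(posteriors)        # probability each machine produced a defective item
print(sum(posteriors))   # the posterior probabilities add up to 1
```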
Activity 5.5
1. For a student enrolling as a freshman in a certain university, the probability is 0.25 that
he/she will get a scholarship and 0.75 that he/she will graduate. If the probability is 0.2 that
he/she will get a scholarship and will also graduate, what is the probability that a student who
gets a scholarship will graduate?
2. A lot consists of 20 defective and 80 non-defective items from which two items are
chosen without replacement. Find the probability that:
a) both items are defective, b) the second item is defective.
Probability of Independent Events
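If the probability of B occurring is not affected by the occurrence or nonoccurrence of A, then
we say that A and B are independent events, i.e. P(B/A) = P(B). This is equivalent to
$P(A \cap B) = P(A) \cdot P(B)$.
Remarks: If $A_1, A_2, A_3$ are to be independent, then they must be pairwise independent,
$P(A_j \cap A_k) = P(A_j) \cdot P(A_k)$, where $j \neq k$ and j, k = 1, 2, 3, and we must also have
$P(A_1 \cap A_2 \cap A_3) = P(A_1) \cdot P(A_2) \cdot P(A_3)$.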
Example 5.12
Given that P(A) = 0.4, P(B) = 0.2, $P(A \cap B)$ = 0.08,
P(C) = 0.5, P(D) = 0.3, $P(C \cap D)$ = 0.10.
a) Are A and B independent? b) Are C and D independent?
Solution: a) P(A) P(B) = (0.4)(0.2) = 0.08 = $P(A \cap B)$. Hence, A and B are independent.
b) P(C) P(D) = (0.5)(0.3) = 0.15 ≠ $P(C \cap D)$ = 0.10. Hence, C and D are dependent.
The notion of independence can be extended to more than two events:
Example 5.13
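A problem in Statistics is given to three students X, Y, and Z, whose probabilities of solving it are
1/2, 3/4, and 1/4, respectively. What is the probability that
a) all of them will solve it; b) any one (i.e. at least one) of them will solve it, if they try independently?
Solution: We are given that P(X) = 1/2, P(Y) = 3/4, and P(Z) = 1/4.
a) $P(X \cap Y \cap Z) = P(X) \cdot P(Y) \cdot P(Z) = \frac{1}{2} \cdot \frac{3}{4} \cdot \frac{1}{4} = \frac{3}{32}$.
b) $P(X \cup Y \cup Z) = 1 - P(X^c) \cdot P(Y^c) \cdot P(Z^c) = 1 - \frac{1}{2} \cdot \frac{1}{4} \cdot \frac{3}{4} = \frac{29}{32}$.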
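The independence checks of Example 5.12 and the calculations of Example 5.13 can be sketched in Python as follows. This is an illustration only; part b) is read here as "at least one student solves the problem", which is an interpretation of "any one".

```python
from fractions import Fraction
from math import isclose


def independent(p_a, p_b, p_a_and_b):
    """True when P(A and B) equals P(A) * P(B), up to floating-point rounding."""
    return isclose(p_a * p_b, p_a_and_b)


# Example 5.12
print(independent(0.4, 0.2, 0.08))   # A and B
print(independent(0.5, 0.3, 0.10))   # C and D

# Example 5.13: three students solving independently.
p_x, p_y, p_z = Fraction(1, 2), Fraction(3, 4), Fraction(1, 4)
p_all_solve = p_x * p_y * p_z
p_at_least_one = 1 - (1 - p_x) * (1 - p_y) * (1 - p_z)
print(p_all_solve, p_at_least_one)
```

The "at least one" probability is obtained through the complement: the problem stays unsolved only if all three students fail.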
Activity 5.6
1. A fair die is tossed twice. Find the probability of getting a 4, 5, or 6 on the first toss and a
1, 2, 3 or 5 on the second toss.
2. Three balls are drawn successively at random from a box containing 6 red balls, 4 white balls and 5 blue balls.
Find the probability that they are drawn in the order red, white and blue if each ball is
a) replaced b) not replaced
CHAPTER SUMMARY
Mutually exclusive events are those which do not occur together.
Independent events are those which do not affect each other.
A & B are independent if and only if: $P(A \cap B) = P(A) \cdot P(B)$.
The number of permutations of r objects selected out of n different objects is given by $_nP_r = \frac{n!}{(n-r)!}$.
All of n different objects can be arranged in n! different ways.
The number of combinations of n distinct objects selecting r of them at a time is: $_nC_r = \frac{n!}{r!\,(n-r)!}$.
Classical probability concept: The probability of an event is m/n if it can occur in m ways out of a
total of n equally likely ways.
The relative frequency concept of probability: The probability of the occurrence of an event equals
its relative frequency.
The three axioms of probability are:
Probability is non-negative: $P(A) \geq 0$.
Probability of a sample space is unity: $P(S) = 1$.
$P(A \cup B) = P(A) + P(B)$ if A & B are mutually exclusive.
• $n(A \cup B) = n(A) + n(B) - n(A \cap B)$, number of elements in a union of sets.
• $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, for any two events.
• For any three events A, B, and C,
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.
1. If find
2. a) b) ; c)
3. A coin is loaded so that , and . If the coin is tossed three times,

what is the probability of getting?
a) all heads; b) two tails and a head in this order; c) two tails & a head in any order?
3. Among 15 clocks there are two defectives. In how many ways can an inspector choose
3 of the clocks for inspection, so that
a) None of the defectives is included
b) Only one defective is included
c) Two defectives are included
4. In how many ways can a committee of 5 people be chosen out of 9 people?

5. Out of 5 Mathematicians and 7 Statisticians, a committee consisting of 2 Mathematicians
and 3 Statisticians is to be formed. In how many ways can this be done if
a) There is no restriction
b) One particular Statistician should be included
c) Two particular Mathematicians cannot be included on the committee.
6. A committee of 5 people must be selected out of 5 men and 8 women. In how many ways
can the selection be made if there are three women on the committee?
7. A recent survey asked 100 people if they thought women in the armed forces should be
permitted to participate in combat. The results of the survey are shown in the table.
Gender    Yes   No   Total
Male      32    18   50
Female     8    42   50
Total     40    60   100
Find the probability that
a) The respondent answered “yes”, given that the respondent was female
b) The respondent was male, given that the respondent answered “no”
8. Consider the following 2 × 2 table that shows the incidence of myocardial infarction
(denoted MI) for women who had used oral contraceptives and women who had
never used oral contraceptives.
                                 MI Yes   MI No   Totals
Used oral contraceptives           55       65      120
Never used oral contraceptives     25      125      150
Totals                             80      190      270
Assume that the proportions in the table represent the “infinite population” of adult
women. Let A = {woman used oral contraceptives} and let B = {woman had an MI
episode}
Find a) P(A), P(B), $P(A^c)$, and $P(B^c)$.
b) What is $P(A \cap B)$?
c) What is $P(A \cup B)$?
d) Are A and B mutually exclusive?
e) What are P(A|B) and P(B|A)?
f) Are A and B independent?
CHAPTER 6
PROBABILITY DISTRIBUTIONS
CONTENTS
6.1. DEFINITION OF RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
6.2. INTRODUCTION TO EXPECTATION – MEAN AND VARIANCE OF A RANDOM VARIABLE
6.3. COMMON DISCRETE PROBABILITY DISTRIBUTIONS – BINOMIAL AND POISSON
6.4. COMMON CONTINUOUS PROBABILITY DISTRIBUTIONS - NORMAL, CHI-SQUARE, T AND F
INTRODUCTION
In chapter 5, the techniques of computing the probability of an event have been introduced.
In this chapter, we shall study the most commonly used discrete probability distributions;
namely, the Binomial and Poisson distributions; and three continuous probability densities:
normal, chi-square and t distributions, which are very important in statistical inference. We
will also mention some of their properties, because we need the results later. But before
presenting the specific probability distributions, we need to define a random variable, a
probability distribution, and, in general, the mean and variance of discrete as well as
continuous random variables.
Objectives:
After studying this chapter, you should be able to:
• Describe a random variable and its probability distribution.
• Evaluate the mathematical expectation of a random variable.
• Evaluate probabilities of discrete and continuous random variables.
• Identify the appropriate distribution under a given situation.
6.1. DEFINITION OF RANDOM VARIABLES AND PROBABILITY DISTRIBUTION
Random variable: A numerical-valued function defined on the sample space. It assigns a
real number to each element of the sample space. Generally, random variables are denoted
by capital letters and the values of the random variables are denoted by small letters.
Random variables are of two types: Discrete and Continuous.
Discrete random variable: A variable which can assume only a countable number of
values, i.e. its values can be counted.
Examples
• Toss a coin n times and count the number of heads.
• Number of children in a family.
• Number of car accidents per week.
• Number of defective items in a given company.
• Number of bacteria per two cubic centimeter of water.
Continuous random variable: A variable that can assume all values between any two
given values.
Examples
• Height of students at certain college.
• Mark of a student.
• Life time of light bulbs.
• Length of time required to complete a given training.
Probability distribution: It consists of the values a random variable can assume and the
corresponding probabilities of those values; that is, it is a function that assigns a probability to each
value of the random variable.
A probability distribution can be discrete or continuous.
Discrete probability distribution: A formula, a table, a graph or other device used to
specify all possible values of the discrete random variable (R.V) X along with their
respective probabilities.
Example 6.1
In an experiment of "flipping a fair coin 3 times", list the elements of the sample space that
are assumed to be equally likely (as this is what is meant by a fair or balanced coin) and the
corresponding values x of the r-v X, the number of heads observed.
Solution: If H stands for heads and T for tails, then the sample space corresponding to this
experiments is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Since X= the number of heads observed, the results are shown in the following table:
Element of S   Probability   x
HHH            1/8           3
HHT            1/8           2
HTH            1/8           2
HTT            1/8           1
THH            1/8           2
THT            1/8           1
TTH            1/8           1
TTT            1/8           0
Thus, we can write X(HHH) = 3, X(HHT) = 2, …, X(TTT) = 0,
and P(X = 3) = 1/8 = the probability that the r-v X is 3, P(X = 2) = 3/8, P(X = 1) = 3/8, and P(X = 0) = 1/8.
Note that the possible values of X are: 0, 1, 2, 3.
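The table of Example 6.1 can be built programmatically. This Python sketch (added for illustration) maps each outcome of three fair coin flips to its number of heads and collects the resulting probability distribution of X.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of flipping a fair coin three times.
outcomes = ["".join(seq) for seq in product("HT", repeat=3)]

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability distribution of X: P(X = x) = (outcomes with x heads) / 8.
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in pmf.items():
    print(x, p)
print(sum(pmf.values()))   # the probabilities add up to 1
```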
Activity 6.1
1) Consider an experiment of tossing a coin three times. Let X be the number of heads.
Construct the probability distribution of X.
2) A balanced die is tossed twice. Construct a probability distribution if:
A) X is the sum of the number of spots in the two trials.
B) X is the absolute difference of the number of spots in the trials.
Properties of discrete probability distribution
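1) $P(X = x) \geq 0$ for every value x of X
2) $\sum_{x} P(X = x) = 1$
3) If X is a discrete random variable, then $P(a \leq X \leq b) = \sum_{a \leq x \leq b} P(X = x)$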
Example 6.2
Check whether the following function can serve as a pmf of a discrete r-v X:
Solution: Substituting the different values of x, we get
Since these values are all non-negative and the sum is 1, the given
function can serve as a pmf of some random variable whose domain is .
B) Continuous probability distribution
Definition: A non-negative function f(x) is called the probability density function (pdf) of a
continuous R.V X if the total area bounded by the curve and the x-axis is 1, and if the area under
the curve between perpendiculars erected at any two points a and b gives
the probability that X is between a and b.
Example 6.3
Suppose that the r-v X is continuous with the following pdf:
a) Check that f(x) satisfies the two conditions of being a p.d.f.;
b) Evaluate P(X<0.5).
Solution: a) Obviously, for 0 < x < 1, f(x) > 0, and
Hence, f(x) is the pdf of some random variable X.
Note: since f(x) is zero in the other two intervals:
b)
Activity 6.2
Let X be a continuous r-v with the following pdf:
a) Check that f(x) satisfies the conditions of being a p.d.f.
b) Find .
Properties of continuous probability distribution
a) The total area under the curve is one, i.e. $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
b) $P(a < X < b) = \int_a^b f(x)\,dx$ = the area under the curve between the points a and b.
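c) $f(x) \geq 0$
d) $P(a \leq X \leq b) = P(a < X \leq b) = P(a \leq X < b) = P(a < X < b)$
e) The probability of a fixed value is zero, i.e. $P(X = a) = 0$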
6.2 INTRODUCTION TO EXPECTATION
Definition:
1. Let a discrete random variable X assume the values $X_1, X_2, \dots, X_n$ with the probabilities
$P(X_1), P(X_2), \dots, P(X_n)$ respectively. Then the expected value of X, denoted as E(X), is
defined as:
$E(X) = X_1 P(X_1) + X_2 P(X_2) + \dots + X_n P(X_n) = \sum_{i=1}^{n} X_i P(X_i)$
2. Let X be a continuous random variable assuming the values in the interval (a, b) such that
$\int_a^b f(x)\,dx = 1$, then $E(X) = \int_a^b x f(x)\,dx$.
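Mean and Variance of a random variable
Let X be a given random variable.
1. The expected value of X is its mean:
Mean of X = E(X)
2. The variance of X is given by:
Variance of X = Var(X) = $E(X^2) - [E(X)]^2$
where $E(X^2) = \sum_{i=1}^{n} X_i^2 P(X_i)$ if X is discrete, and $E(X^2) = \int_a^b x^2 f(x)\,dx$ if X is continuous.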
Rules of Expectation
1) Let X be a R.V and k be a real number, then
a) E (kX) =kE(X)
b) E(X+k) =E(X) + k
2) Let X and Y be R.V on the sample space, then
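a) $E(X + Y) = E(X) + E(Y)$
b) $Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X, Y)$
where Cov(X, Y) = the covariance between X and Y = E(XY) - E(X).E(Y)
3) Let X and Y be independent R.V, then
a) E(XY) = E(X).E(Y)
b) $Var(X + Y) = Var(X) + Var(Y)$
c) Cov(X, Y) = 0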
Example 6.4
Let a fair die be rolled once. Find the mean number rolled, say X.
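Solution: Since S = {1, 2, 3, 4, 5, 6} and all are equally likely with prob. of 1/6, we have
$E(X) = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5$.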
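The mean of Example 6.4, together with the variance computed as E(X²) − [E(X)]², can be checked with a short Python sketch. The helper names expected_value and variance are chosen here for illustration.

```python
from fractions import Fraction


def expected_value(pmf):
    """E(X) = sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E(X^2) - [E(X)]^2."""
    mean = expected_value(pmf)
    second_moment = sum(x * x * p for x, p in pmf.items())
    return second_moment - mean ** 2


# Fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_value(die))   # mean number rolled
print(variance(die))         # variance of the number rolled
```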
Example 6.5
Find the expected value and the variance of the r-v given in
Solution:
= 1.
=
Activity 6.3
1. What is the expected value and Variance of a random variable X obtained by tossing a
coin three times where X is the number of heads?
2. Let X be a continuous R.V with distribution
Then find a) P(1 < X < 1.5); b) E(X); c) Var(X); and d) .
6.3 COMMON DISCRETE PROBABILITY DISTRIBUTIONS
In this section, we shall study two common discrete probability distributions, namely, the
Binomial and Poisson distributions.
1. Binomial Distribution
A binomial experiment is a probability experiment that satisfies the following four
requirements, called the assumptions of a binomial distribution.
1. The experiment consists of n identical trials.
2. Each trial has only one of the two possible mutually exclusive outcomes, success or a
failure.
3. The probability of each outcome does not change from trial to trial, and
4. The trials are independent, thus we must sample with replacement.
Examples of binomial experiments
• Tossing a coin 20 times to see how many tails occur.
• Asking 200 people if they watch BBC news.
• Registering a newly produced product as defective or non defective.
• Asking 100 people if they favor the ruling party.
• Rolling a die to see if a 5 appears.
Definition: The outcomes of a binomial experiment together with the corresponding probabilities of
these outcomes are called a Binomial Distribution.
Let p = probability of success and q = 1 - p = probability of failure on any given trial.
Then the probability of getting x successes in n trials becomes
$P(X = x) = \binom{n}{x} p^x q^{n-x}$, $x = 0, 1, 2, \dots, n$.
And this is sometimes written as $X \sim Bin(n, p)$.
When using the binomial formula to solve problems, we have to identify three things:
• The number of trials (n)
• The probability of a success on any one trial (p) and
• The number of successes desired (x).
Example 6.6
Find the probability of getting 5 heads and 7 tails in 12 flips of a fair coin.
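Solution: Given n = 12 trials. Let X be the number of heads.
Then, p = Prob. of getting a head = 1/2, and q = prob. of not getting a head = 1/2.
∴ The probability of getting k heads in 12 tosses of a coin is
$P(X = k) = \binom{12}{k} (1/2)^k (1/2)^{12-k} = \binom{12}{k} (1/2)^{12}$.
And for k = 5, $P(X = 5) = \binom{12}{5} (1/2)^{12} = \frac{792}{4096} \approx 0.1934$.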
Example 6.7
If the probability is 0.20 that a person traveling on an EAL flight will be a vegetarian, find the
probability that 3 of 10 people on such a flight will be vegetarians.
Solution: Let X be the number of vegetarians. Given n = 10, p = 0.20, k = 3; then,
$P(X = 3) = \binom{10}{3} (0.2)^3 (0.8)^7 \approx 0.2013$.
Remark: If X is a binomial random variable with parameters n and p, then
E(X) = np and Var(X) = npq
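A minimal Python sketch of the binomial probabilities, assuming the usual formula P(X = x) = C(n, x) p^x (1 − p)^(n − x). It re-computes Examples 6.6 and 6.7 and checks the remark on the mean and variance numerically; the function name binomial_pmf is chosen here for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Example 6.6: 5 heads in 12 flips of a fair coin.
p_five_heads = binomial_pmf(5, 12, 0.5)

# Example 6.7: 3 vegetarians among 10 passengers when p = 0.20.
p_three_vegetarians = binomial_pmf(3, 10, 0.20)

# Mean and variance for n = 10, p = 0.20, computed from the distribution itself.
n, p = 10, 0.20
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

print(round(p_five_heads, 4), round(p_three_vegetarians, 4))
print(round(mean, 6), n * p)             # compare with np
print(round(var, 6), n * p * (1 - p))    # compare with npq
```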
Activity 6.4
What is the probability of getting three heads by tossing a fair coin four times?
2. Poisson Distribution
A random variable X is said to have a Poisson distribution if its probability distribution is
given by:
$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$, $x = 0, 1, 2, \dots$
where λ is the average number of occurrences of an event in the unit length of interval or
distance and x is the number of occurrences in a Poisson process.
The Poisson distribution depends only on the average number of occurrences per unit of time or
space. It is used as a distribution of rare events, such as:
• Number of misprints per page.
• Natural disasters like earthquakes.
• Accidents.
• Hereditary.
• Arrivals.
The process that gives rise to such events is called Poisson process.
If X is a Poisson random variable with parameter λ, then
E(X) = λ, Var(X) = λ.
Note: The Poisson probability distribution provides a close approximation to the binomial
probability distribution when n is large and p is quite small or quite large, with λ = np.
Usually we use this approximation if np < 5. In other words, if n > 20 and np < 5 or n(1-p) ≤ 5,
then we may use the Poisson distribution as an approximation to the binomial distribution.
Example 6.8
Suppose that customers enter a waiting line at random at a rate of 4 per minute. Assuming
that the number entering the line during a given time interval has a Poisson distribution, find
the probability that:
a) One customer enters during a given one-minute interval of time;
b) At least one customer enters during a given half-minute time interval.
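Solution: a) Given λ = 4 per min, $P(X = 1) = \frac{4^1 e^{-4}}{1!} = 4e^{-4} \approx 0.0733$.
b) Per half-minute, the expected number of customers is 2, which is a new parameter.
$P(X \geq 1) = 1 - P(X = 0)$, but $P(X = 0) = \frac{2^0 e^{-2}}{0!} = e^{-2} \approx 0.1353$, so $P(X \geq 1) \approx 0.8647$.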
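The calculations of Example 6.8 can be reproduced with a few lines of Python, assuming the usual Poisson formula P(X = x) = λ^x e^(−λ) / x!. The function name poisson_pmf is chosen here for illustration.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the mean number is lam."""
    return lam ** x * exp(-lam) / factorial(x)


# a) Exactly one customer in one minute, with a rate of 4 per minute.
p_one_customer = poisson_pmf(1, 4)

# b) At least one customer in half a minute, so the parameter becomes 2.
p_at_least_one = 1 - poisson_pmf(0, 2)

print(round(p_one_customer, 4))
print(round(p_at_least_one, 4))
```

Part b) uses the complement, since "at least one" is everything except "none".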
Activity 6.5
1. If 1.6 accidents can be expected at an intersection on any given day, what is the probability
that there will be 3 accidents on any given day?
2. If there are 200 typographical errors randomly distributed in a 500-page manuscript, find
the probability that a given page contains exactly 3 errors.
3. A sales firm receives, on average, 3 calls per hour on its toll-free number. For any
given hour, find the probability that it will receive the following.
a) At most 3 calls
b) At least 3 calls
c) Five or more calls
4. If approximately 2% of the people are left-handed, find the probability that in a room of 200
people, there are exactly 5 people who are left-handed.
6.4 COMMON CONTINUOUS PROBABILITY DISTRIBUTIONS
In this section, we will study three important continuous probability distributions that play the
leading role in statistical inference; viz., the normal, t & Chi-Square distributions.
6.4.1 Normal Distribution
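A random variable X is said to have a normal distribution if its probability density function
is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$, $-\infty < x < \infty$,
where the mean $\mu$ ($-\infty < \mu < \infty$) and the standard deviation $\sigma$ ($\sigma > 0$) are parameters of the normal distribution.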
Properties of Normal Distribution:
1. It is bell shaped, symmetrical about its mean, and mesokurtic. The maximum
ordinate is at x = μ and is given by $f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$.
2. It is asymptotic to the x-axis, i.e., it extends indefinitely in either direction from the mean.
3. It is a continuous distribution, i.e. there are no gaps or holes.
4. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a
different normal distribution. Thus, the normal distribution is completely described by two
parameters: mean and standard deviation.
5. Total area under the curve sums to 1, i.e., the area of the distribution on each side of the
mean is 0.5.
6. It is unimodal, i.e., values mound up only in the center of the curve.
7. Median = Mean = Mode = μ, located at the center of the distribution.
8. The probability that a random variable will have a value between any two points is equal to
the area under the curve between those points.
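Note: To facilitate the use of the normal distribution, the following distribution, known as the
standard normal distribution, was derived by using the transformation $Z = \frac{X - \mu}{\sigma}$.
That is, $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$, $-\infty < z < \infty$.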
Properties of the Standard Normal Distribution:
Same as a normal distribution, but
• Mean is zero
• Variance is one
• Standard Deviation is one
- Areas under the standard normal distribution curve have been tabulated in various ways. The
most common ones are the areas between Z = 0 and a positive value of Z.
- Given a normally distributed random variable X with mean μ and standard deviation σ,
$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right)$.
Example 6.9
Find the probabilities that a r-v having the standard normal distribution will take on a value
a) less than 1.72; b) less than -0.88; c) between 1.30 & 1.75; d) between -0.25 & 0.45.
Solution: Making use of the Z table, we find that
a) P(Z < 1.72) = P(Z < 0) + P(0 < Z < 1.72) = 0.5 + 0.4573 = 0.9573.
b) P(Z < -0.88) = P(Z > 0.88) = 0.5 - P(0 < Z < 0.88) = 0.5 - 0.3106 = 0.1894.
c) P(1.30 < Z < 1.75) = P(0 < Z < 1.75) – P(0 < Z < 1.30) = 0.4599 – 0.4032 = 0.0567.
d) P(-0.25 < Z < 0.45) = P(-0.25 < Z < 0) + P(0 < Z < 0.45) = 0.0987 + 0.1736 = 0.2723.
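Where a Z table is not at hand, the same areas can be obtained from the cumulative distribution function of the standard normal distribution. The sketch below uses Python's statistics.NormalDist; results may differ from table-based answers in the fourth decimal place because table entries are rounded.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

p_a = Z.cdf(1.72)                  # P(Z < 1.72)
p_b = Z.cdf(-0.88)                 # P(Z < -0.88)
p_c = Z.cdf(1.75) - Z.cdf(1.30)    # P(1.30 < Z < 1.75)
p_d = Z.cdf(0.45) - Z.cdf(-0.25)   # P(-0.25 < Z < 0.45)

for label, value in zip("abcd", (p_a, p_b, p_c, p_d)):
    print(label, round(value, 4))
```

For a general normal variable X, NormalDist(mu, sigma).cdf(x) gives P(X < x) directly, without standardizing by hand.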
Activity: Find the following: a) P(-0.45 < Z < -0.25); b)P(Z>1.75).
Activity 6.6
Of a large group of men, 5% are less than 60 inches in height and 40% are between 60 & 65
inches. Assuming a normal distribution, find the mean and standard deviation of heights.
6.4.2 Student’s t Distribution
In statistics, as long as the sample size is large enough, the sampling distribution of many statistics
can be approximated by the Standard Normal Distribution. But when the sample size is small, statisticians rely on the
distribution of the t statistic (also known as the t score), whose value is given by:
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
where $\bar{x}$ is the sample mean, μ is the population mean, s is the standard deviation of the
sample, and n is the sample size.
The distribution of the t statistic is called the t distribution or the Student t distribution. The
particular form of the t distribution is determined by its Degrees of Freedom (df). The
degrees of freedom refer to the number of independent observations in a set of data. When
estimating a mean score or a proportion from a single sample, the number of independent
observations is equal to the sample size minus one. The t distribution can be used with any
statistic having a bell-shaped distribution (i.e., approximately normal).
The t distribution has the following properties:
 The mean of the distribution is equal to 0.

 The variance is equal to v / (v - 2), where v is the degrees of freedom and v > 2.
 With infinite degrees of freedom, the t distribution is the same as the standard normal
distribution.
 The t distribution is similar to standard normal distribution in the following ways
 It is bell-shaped.
 It is symmetric about the mean.
 The mean, median, and mode are equal to zero and located at the center of the
distribution.
 The curve never touches the x axis.
 The t distribution differs from standard normal distribution in the following ways.
 The variance is greater than one
 The t distribution is actually a family of curves based on the concept of degrees of
freedom, which is related to sample size.
 As the sample size increases, the t distribution approaches the standard normal
distribution.
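As an illustration of the t statistic and of the variance property v / (v - 2), here is a short sketch. The sample values and the hypothesized mean of 10 are made up for the example, and the function names are assumptions rather than part of the module.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu):
    """t = (sample mean - mu) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / sqrt(n))


def t_variance(v):
    """Variance of the t distribution with v degrees of freedom (defined for v > 2)."""
    if v <= 2:
        raise ValueError("the variance v / (v - 2) requires v > 2")
    return v / (v - 2)


# Hypothetical sample of n = 5 measurements, tested against mu = 10.
sample = [9.8, 10.2, 10.4, 9.9, 10.1]
t = t_statistic(sample, mu=10)
df = len(sample) - 1  # degrees of freedom = n - 1

print(round(t, 4), df)
# The variance is always above 1 and shrinks towards 1 as df grows.
print([round(t_variance(v), 4) for v in (3, 10, 30, 1000)])
```

The second line of output shows the variance falling towards one, which is the numerical counterpart of the t distribution approaching the standard normal distribution.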
6.4.3 Chi-Square Distribution
The chi-square variable is similar to the t variable in that its distribution is a family of curves
based on the number of degrees of freedom. The symbol for chi-square is $\chi^2$ (Greek letter chi,
pronounced “ki”). The chi-square distribution is obtained from the values of $(n-1)s^2/\sigma^2$ when
random samples are selected from a normally distributed population whose variance is $\sigma^2$. A
chi-square variable cannot be negative, and the distributions are positively skewed. At about
100 degrees of freedom, the chi-square distribution becomes somewhat symmetrical. The
area under each chi-square distribution is equal to 1.00 or 100%.
In order to find the area under the chi-square distribution, there are three cases to consider:
1) Find the chi-square critical value for a specific α when the hypothesis test is one-tailed
right. In this case, find the α value at the top of the table and the corresponding degrees of
freedom in the left column. The critical value is located where the row and the column meet.
Example 6.10
The critical chi-square value for 15 degrees of freedom when α = 0.05 and the test is
one-tailed right is 24.996.
2) Find the chi-square critical value for a specific α when the hypothesis test is one-tailed
left. In this case, the α value must be subtracted from one, because the table gives the area
to the right of the critical value and the statistic cannot be negative. The left side of the
table is then used.
Example 6.11
The critical value for 10 df when α = 0.05 and the test is one-tailed left is 3.940.
3) Find the chi-square critical value for a specific α when the hypothesis test is two-tailed.
When a two-tailed test is conducted, the area must be split. For example, to find the critical
chi-square values for 22 degrees of freedom when α = 0.05, we use the area to the right of
the larger value, 0.025 (0.05/2), and the area to the right of the smaller value, 0.975 (1 - 0.05/2).
Hence, one must use the table columns for 0.025 and 0.975; with 22 degrees of freedom,
the critical values are 36.781 and 10.982 respectively.
Note that after the degrees of freedom reach 30, the chi-square table only gives values for
multiples of 10 (40, 50, 60, etc.). When the exact degrees of freedom one is seeking are not
specified in the table, the closest smaller value should be used.
The chi-square distribution has the following properties:

 The mean of the distribution is equal to the number of degrees of freedom (v): μ = v.
 The variance is equal to two times the number of degrees of freedom: $\sigma^2 = 2v$.
 When the degrees of freedom are greater than or equal to 2, the maximum value for
Y (the height of the curve) occurs when $\chi^2 = v - 2$.
 As the degrees of freedom increase, the chi-square curve approaches a normal
distribution.
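The mean and variance properties can be illustrated by simulation. The sketch below relies on a standard fact that is not stated in this section: a chi-square variable with v degrees of freedom can be generated as the sum of v squared independent standard normal values. The function name, the seed, and the number of draws are arbitrary choices for the illustration.

```python
import random
from statistics import mean, pvariance


def simulate_chi_square(v, draws, seed=0):
    """Simulate chi-square values as sums of v squared standard normal draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(v)) for _ in range(draws)]


v = 5
values = simulate_chi_square(v, draws=20000)
sim_mean = mean(values)       # expected to be close to v
sim_var = pvariance(values)   # expected to be close to 2 * v

print(round(sim_mean, 2), round(sim_var, 2))
print(min(values) >= 0)  # a chi-square variable cannot be negative
```

Because this is a simulation, the printed values only approximate v and 2v; the agreement improves as the number of draws increases.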
6.4.5 The F distribution
We use the t distribution and t tests to examine the probability of a single estimator taking a
particular value; we use the F distribution and F tests to carry out joint hypothesis testing on
more than one estimator.
The motivation behind the F distribution arises where we have independent samples of two
variables, each drawn from a normal distribution.
Example 6.12
 $X_1, X_2, \ldots, X_m$: random sample of size m from a normal distribution
 $Y_1, Y_2, \ldots, Y_n$: random sample of size n from a normal distribution
Suppose we want to find out whether the population variances are the same, that is, whether
$\sigma_X^2 = \sigma_Y^2$. We cannot observe the population variances; however, we have the
sample estimators $S_X^2$ and $S_Y^2$:
$S_X^2 = \frac{1}{m-1}\sum_{i=1}^{m}(X_i - \bar{X})^2$ and $S_Y^2 = \frac{1}{n-1}\sum_{j=1}^{n}(Y_j - \bar{Y})^2$
If we take $F = \frac{S_X^2}{S_Y^2}$:
If the two sample variances are the same, F = 1. If they are different, F ≠ 1, and the greater the
difference, the greater the value of F will be.
Statistical theory shows us that if the two population variances are equal ($\sigma_X^2 = \sigma_Y^2$), the F
ratio will follow the F distribution with (m-1, n-1) df (with the larger of the two variances
on the top).
The F ratio is often designated $F_{k_1,k_2}$, where the subscripts denote the parameters of the
distribution; here $k_1 = m-1$ and $k_2 = n-1$.
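The F ratio described above can be computed directly from two samples. The following sketch uses made-up data and a hypothetical helper `f_ratio`; it follows the convention of placing the larger sample variance in the numerator and reports the matching degrees of freedom.

```python
from statistics import variance


def f_ratio(x, y):
    """Return (F, df_numerator, df_denominator), larger sample variance on top."""
    sx2 = variance(x)  # sample variance with divisor m - 1
    sy2 = variance(y)  # sample variance with divisor n - 1
    if sx2 >= sy2:
        return sx2 / sy2, len(x) - 1, len(y) - 1
    return sy2 / sx2, len(y) - 1, len(x) - 1


# Hypothetical samples, for illustration only.
x = [12, 15, 11, 14, 13]        # m = 5, sample variance 2.5
y = [10, 18, 9, 20, 13, 14]     # n = 6, sample variance 18.8

F, df1, df2 = f_ratio(x, y)
print(round(F, 2), df1, df2)
```

The resulting F value would then be compared with a critical value from an F table with (df1, df2) degrees of freedom.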
Properties of the F distribution

1. The F distribution is skewed to the right and ranges between zero and infinity (i.e.
it only takes positive values)
2. As the df , the F distribution approaches the normal distribution
139

3. The square of a t-distributed r.v. has an F distribution with 1 and k df in the
numerator and denominator respectively
i.e. tk2=F1,k
CHAPTER SUMMARY
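 Any function p(x) can be a pmf if: $p(x) \ge 0$ for all x, and $\sum_x p(x) = 1$.
 For a random variable X, $E(X) = \sum_x x\,p(x)$ and
$Var(X) = E(X^2) - [E(X)]^2$, where $E(X^2) = \sum_x x^2\,p(x)$.
 $E(aX + b) = aE(X) + b$, and $Var(aX + b) = a^2\,Var(X)$, where a and b are constants.
 The binomial pmf is given by: $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, x = 0, 1, ..., n.
 For a binomial random variable, $E(X) = np$, and $Var(X) = np(1-p)$.
 For the Poisson distribution: $P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$, x = 0, 1, 2, ...; E(X) = λ = Var(X).
 For f(x) to be a pdf, we need $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$; then P(a < X < b) = $\int_a^b f(x)\,dx$.
 The pdf of a Normal Distribution is: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$; $-\infty < x < \infty$.
 For a Normal Distribution, the three averages coincide: mean = median = mode.
 $Z = \frac{X - \mu}{\sigma}$ has mean 0 and variance 1.
 If X is continuous, then $P(X = a) = 0$ for any value a.
 The Chi-square distribution is used to test an association of attributes.
If $\bar{X}$ and $S^2$ are the mean and the variance of a random sample of size n from a normal population
with mean μ and variance $\sigma^2$, then $t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t distribution with (n-1) degrees of freedom.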
1. Suppose that an examination consists of six true and false questions, and assume that a student
has no knowledge of the subject matter. The probability that the student will guess the
correct answer to the first question is 30%. Likewise, the probability of guessing each of the
remaining questions correctly is also 30%.
a) What is the probability of getting more than three correct answers?
b) What is the probability of getting at least two correct answers?
c) What is the probability of getting at most three correct answers?
d) What is the probability of getting less than five correct answers?
2. The probability that a patient contracting IB will recover from the disease under medical
treatment is 0.6. Suppose that 15 patients contract the disease.
a) What is the probability that exactly 10 recover?
b) What is the expected number of patients who will recover?
c) What is the variance of the number of patients who will recover?
Assume that the patients are subjected under the same medical treatment.
3. Find the area under the standard normal distribution which lies
a) Between Z=0 and Z=0.96
b) Between Z=-1.45 and Z=0
c) To the right of Z=-0.35
d) To the left of Z=0.35
e) Between Z=-0.67 and Z=0.75
f) Between Z=0.25 and Z=1.25
4. Find the value of Z if
a) The normal curve area between 0 and z(positive) is 0.4726
b) The area to the left of z is 0.9868
5. A random variable X has a normal distribution with mean 80 and standard deviation 4.8.
What is the probability that it will take a value:
a) Less than 87.2
b) Greater than 76.4
c) Between 81.2 and 86.0
6. A normal distribution has mean 62.4.Find its standard deviation if 20.05% of the area
under the normal curve lies to the right of 72.9
7. A random variable has a normal distribution with σ =5.Find its
mean if the probability that the random variable will assume a value less than 52.5 is 0.6915.
CHAPTER 7
SAMPLING AND SAMPLING DISTRIBUTIONS
CONTENTS
7.1. METHODS OF SAMPLING (SIMPLE RANDOM SAMPLING)
7.2. SAMPLING DISTRIBUTION OF THE SAMPLE MEAN
7.3. SAMPLING DISTRIBUTION OF THE SAMPLE PROPORTION
7.4. STANDARD ERRORS
7.5. THE CENTRAL LIMIT THEOREM
INTRODUCTION
Before giving the notion of sampling, we first define population. In a statistical
investigation, the interest usually lies in the assessment of the general magnitude and the
study of variation with respect to one or more characteristics relating to individuals
belonging to a group. This group of individuals under study is called the population or
universe. Thus, in statistics, a population is an aggregate of objects, animate or inanimate,
under study. The population may be finite or infinite.
Objectives
At the end of this chapter students will be able to:
 Explain the meaning of sampling theory, sampling unit and sampling frame
 Differentiate between different sampling techniques
 Determine the sampling distribution of the mean and the
sampling distribution of the proportion
7.1. METHODS OF SAMPLING
7.1.1. The concept of sampling
Sampling is that part of statistical practice concerned with the selection of individual
observations intended to yield some knowledge about a population of concern, especially for
the purposes of statistical inference. Before having further discussion on the specific type of
sampling methods, it is valuable to be acquainted with the following terms:
1. Sampling
Sampling is the process or method of sample selection from the population.
Sampling can be done either with replacement or without replacement.
1.1 Sampling with replacement (swr):
In this case, a unit is selected from a population with a known probability and, after its
characteristic(s) are recorded, the unit is returned to the population before the next selection
is made. Thus, in this method, at each selection the population size remains constant,
the probability at each selection or draw remains the same, and a unit has a chance of
being selected more than once. There are $N^n$ possible samples of size n from a
population of N units.
1.2 Sampling without replacement (swor):
In this selection procedure, if a unit from a population of size N is selected, it is not returned to
the population. Thus, for any subsequent selection, the population size is reduced by one. There
are $\binom{N}{n}$ possible samples of size n from a population of N units (ignoring the order of selection).
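The two ways of counting samples can be illustrated by listing them for a small population. This is a sketch under two assumptions: samples drawn with replacement are counted as ordered draws (giving N^n), and samples drawn without replacement are counted without regard to order (giving N choose n). The population values are chosen only for illustration.

```python
from itertools import combinations, product
from math import comb

population = [2, 4, 6, 8]   # N = 4 units
n = 2                       # sample size

# With replacement: every ordered draw of n units, repeats allowed.
with_replacement = list(product(population, repeat=n))

# Without replacement: every unordered set of n distinct units.
without_replacement = list(combinations(population, n))

N = len(population)
print(len(with_replacement), N ** n)        # both equal N^n
print(len(without_replacement), comb(N, n)) # both equal N choose n
print(without_replacement)
```

If the order of selection were taken into account when sampling without replacement, the count would instead be N!/(N-n)!, which is 12 for this population.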
2. Sampling unit
The ultimate unit to be sampled, or the element of the population to be sampled, is called the
sampling unit.
Examples
 If somebody studies the socio-economic status of households, the household is the
sampling unit.
 If one studies the performance of freshman students in some college, the student is the
sampling unit.
3. Sample size
The number of sampling units which are selected from a population. The sample size
depends on a number of considerations which are as follows.
a) The purpose for which the sample is drawn.
b) The type of population from which the sample is to be drawn.
c) Availability of technical people or equipment needed.
d) Resources allotted for the study in terms of time and money.
e) Precision required.
4. Study Unit
The unit on which information is collected is called study unit.
5. Sampling Fraction (Sampling Interval)
The sampling fraction is the ratio of the number of units in the sample to the number of units
in the source population (n/N); its reciprocal (N/n) is the sampling interval.
6. Sampling frame
Sampling frame is the list of all the units in the source population from which a sample is to
be taken.
Examples
 List of households.
 List of students in the registrar office.
7. Errors in sample survey
There are two types of errors:
a) Sampling error:
- It is the discrepancy between the population value and the sample value due to the
fact that the sample is not a perfect representation of the population.
- It may arise due to inappropriate sampling techniques.
b) Non-sampling errors: errors due to procedural bias, such as:
 Incorrect responses (called response or observational error).
 Measurement error or lack of preciseness of definition.
 Errors at different stages in processing the data, such as editing and
tabulating of data.
Reasons for Sampling
 Reduced cost: The finances required to cover the whole population can hardly be made
available.
 Greater speed: Studying the whole population takes too much time, and often the
study becomes outdated by the time it is complete.
 Greater accuracy: Complete enumeration (census study) adds many errors which are
reduced or eliminated by sampling.
 The only option when the population is infinite: In case the population is infinite or
consists of an uncountable number of units, a complete study is impossible.
Because of the above considerations, in practice we take a sample and draw conclusions about
the population values such as the population mean and population variance, known as
parameters of the population.
Sometimes taking a census makes more sense than using a sample. Some of the reasons
include:
 Universality
 Qualitativeness
 Detailedness
 Non-representativeness
7.1.2. Sampling Techniques
There are two types of sampling techniques, broadly speaking.
A) Random Sampling or Probability Sampling
A probability sampling scheme is one in which every unit in the population has a known
nonzero probability of being sampled and the process involves random selection.
Probability sampling includes: Simple Random Sampling, Systematic Sampling, Stratified
Sampling, Cluster Sampling or Multistage Sampling.
1. Simple Random Sampling:

 It is a method of selecting items from a population such that every possible sample of a
specific size has an equal chance of being selected. In this case, sampling may be
with or without replacement. Equivalently, all elements in the population have the same
pre-assigned nonzero probability of being included in the sample.
 This could be accomplished by writing each study unit's name on a slip of paper and
selecting an adequate number of them using the Lottery Method. It can also be done by
assigning a number to each sampling unit and then selecting the sample using a Table of
Random Numbers or a computer application.
Table of Random Numbers
Tables of random numbers are tables of the digits 0, 1, 2, …, 9, each digit having an equal
chance of selection at any draw. For convenience, the numbers are put in blocks of five. In
using these tables to select a simple random sample, the steps are:
i) Number the units in the population from 1 to N (prepare the frame of the population).
ii) Then proceed in the following way:
If the first digit of N is a number between 5 and 9 inclusive, the following method of
selection is adequate. Suppose N=528 and we want n=10.
Select three columns from the table of random numbers, say columns 25 to 27. Go down the
three columns selecting the first 10 distinct numbers between 001 and 528. These are 36, 509,
364, 417, 348, 127, 149, 186, 439, and 329. Then the units with these roll numbers form our
sample.
Note: If sampling is without replacement, reject all the numbers that come up more than once.
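When a computer application is used instead of a printed table, the same selection can be sketched as follows. This is a minimal illustration using Python's standard `random` module; the function name and the seed are assumptions made for the example, and the units drawn will differ from the ten roll numbers listed above.

```python
import random


def simple_random_sample(N, n, seed=None):
    """Draw n distinct unit numbers from 1..N, i.e. sampling without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(1, N + 1), n)


# N = 528 units in the frame, n = 10 units wanted, as in the text.
chosen = simple_random_sample(528, 10, seed=1)
print(sorted(chosen))
```

`random.sample` never returns the same unit twice, which plays the role of rejecting repeated numbers when reading a random number table.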
2. Stratified Random Sampling

 The population is divided into non-overlapping and exhaustive groups called
strata.
 A separate sample is taken from each stratum using Simple or Systematic Random
Sampling techniques.
 Elements in the same stratum should be more or less homogeneous, while elements in
different strata should differ.
 It is applied if the population is heterogeneous.
 The main advantage is that it improves the representativeness of the sample and allows
reasonable comparison among strata. The major limitation is that it requires a separate
sampling frame for each stratum.
 Some of the criteria for stratification are: characteristics of the population (sex,
age, ethnic origin, occupation, etc.) and geographical location.
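A stratified sample amounts to drawing a separate simple random sample from each stratum. The sketch below assumes a frame that is already split by sex into two hypothetical strata; the unit labels, stratum sizes, and per-stratum sample sizes are invented for the illustration.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample (without replacement) from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, n_per_stratum[name])
        for name, units in strata.items()
    }


# Hypothetical frame: 60 female and 40 male students, listed separately.
strata = {
    "female": [f"F{i}" for i in range(1, 61)],
    "male": [f"M{i}" for i in range(1, 41)],
}

# Sample sizes per stratum, here chosen in proportion to stratum size (10% each).
picked = stratified_sample(strata, {"female": 6, "male": 4}, seed=7)
print(picked)
```

Choosing the per-stratum sizes in proportion to the stratum sizes is one common option; other allocations are possible.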
3. Cluster sampling
 The population is divided into separate groups of elements called clusters. Each
element of the population belongs to one and only one cluster.
 A simple random sample of the clusters is then taken. All elements within each
sampled cluster form the sample.
 Cluster sampling tends to provide the best results when the elements within the clusters
are heterogeneous.
 It is used in large geographic samples where no list is available of all the units in the
population but the population boundaries can be well defined.
 Cluster sampling must use a random sampling method at each stage. This may result
in a somewhat larger sample than using a simple random sampling method, but it
saves time and money.
 Cluster sampling is useful when it is difficult or costly to generate a simple random
sample.
For example, to obtain information about the drug habits of all high school students in a
state, you could obtain a list of all the school districts in the state and select a simple
random sample of school districts. Then, within each selected school district, list all the
high schools and select a simple random sample of high schools. Within each selected high
school, list all high school classes, and select a simple random sample of classes. Then use
the high school students in those classes as your sample.
4. Systematic Random Sampling

This method selects units at a fixed interval throughout the sampling frame after a random
start.
 It is obtained by numbering each subject of the population and then selecting every kth
number.
 Here are the steps you need to follow in order to achieve a systematic random
sample:
 Number the units in the population from 1 to N,
 Decide on the n (sample size) that you need,
 Calculate the sampling interval k (k = N/n),
 Randomly select an integer between 1 and k; suppose it is j,
 The jth unit is selected first, and then the (j+k)th, (j+2k)th, ... units, until the
required sample size is reached
The general advantage of Systematic Random Sampling is that it is easier and less
time consuming to perform. In some situations it can also be conducted without a sampling
frame. However, this method can be biased when there is a cyclic pattern in the order of the
subjects.
 For example, to select a sample of 25 dorm rooms in your college dorm, make a list of all
the room numbers in the dorm. Say there are 100 rooms. Divide the total number of rooms
(100) by the number of rooms you want in the sample (25). The answer is 4. This means
that you are going to select every fourth dorm room from the list. But you must first
consult a table of random numbers. Pick any point on the table, and read across or down
until you come to a number between 1 and 4. This is your random starting point. Say your
random starting point is "3". This means you select dorm room 3 as your first room, and
then every fourth room down the list (3, 7, 11, 15, 19, etc.) until you have 25 rooms
selected.
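The dorm-room example can be sketched in code. The function below is an illustration only; it takes the random start as an argument so that the start of 3 used in the example can be reproduced, whereas in practice the start would be drawn at random, for instance with `random.randint(1, k)`.

```python
def systematic_sample(N, n, start):
    """Select every k-th unit from 1..N, beginning at `start`, where k = N // n."""
    k = N // n  # sampling interval
    if not 1 <= start <= k:
        raise ValueError("the random start must lie between 1 and k")
    return [start + i * k for i in range(n)]


# 100 rooms, 25 wanted, so k = 4; the random start in the example is 3.
rooms = systematic_sample(100, 25, start=3)
print(rooms[:5], "...", rooms[-1])
print(len(rooms))
```

The first few selected rooms are 3, 7, 11, 15 and 19, matching the list in the example, and exactly 25 rooms are selected.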
B) Non-Random Sampling or Non-Probability Sampling
 It is a sampling technique in which the choice of individuals for a sample depends on
convenience, personal choice or interest.
 It is any sampling method where some elements of the population have no chance of
selection or where the probability of selection can't be accurately determined.
 In non-probability sampling, the sample is less likely to be representative of the
population; thus information about the relationship between sample and population is
limited, making it difficult to extrapolate from the sample to the population.
 Non-probability sampling is used when there is no sampling frame to conduct
probability sampling, or when it is impossible to conduct probability sampling due to
economic and feasibility factors.
 Non-probability sampling is divided into Purposive, Convenience, Quota and
Snowball Sampling.
A) Judgmental or Purposive Sampling
The researcher chooses the sample based on who he/she thinks would be appropriate for the
study. Samples are taken based on previous knowledge of the population (from which the
samples are taken) and the specific purpose of the study or investigation. Researchers use
their personal judgment in selecting the sample(s).
B) Convenience Sampling
The selection of units from the population is based on easy availability and/or accessibility.
C) Quota Sampling
It starts with systematically setting a “Quota” to represent subgroups of a population. Then
data is collected to meet the predefined Quota.
D) Snowball Sampling
The researcher begins by identifying someone who meets the inclusion criteria of the study.
Then the study subject is asked to recommend others whom s/he may know who also
meet the criteria.
Sampling Distribution
Because statistics such as $\bar{x}$ vary from sample to sample, they are random variables. As
such, statistics have probability distributions associated with them. In order to make
probability statements regarding a sample statistic, we need to know the probability
distribution of the sample statistic. That is to say, we need to know the shape, center and
spread of the sample statistic’s distribution.
The sampling distribution of a statistic is a probability distribution for all possible values of
the statistic computed from a sample of size n.
 There are commonly three properties of interest of a given sampling distribution.

 Its Mean
 Its Variance
 Its Functional form.
7.2 SAMPLING DISTRIBUTION OF THE SAMPLE MEAN
Sampling distribution of the sample mean is a theoretical probability distribution that shows
the functional relationship between the possible values of a given sample mean based on
samples of size n and the probability associated with each value, for all possible samples of
size n drawn from that particular population.
Steps for the construction of Sampling Distribution of the mean
1. From a finite population of size N, randomly draw all possible samples of size n.
2. Calculate the mean for each sample.
3. Summarize the means obtained in step 2 in terms of a frequency distribution or relative
frequency distribution.
Example 7.1
Consider a population consisting of the values 2, 4, 6, 8.
Let us take all possible random samples of size 2 with replacement from this population; the
number of such samples is $N^n = 4^2 = 16$.
Sample   Mean     Sample   Mean
(2,2)    2        (2,6)    4
(4,2)    3        (4,6)    5
(6,2)    4        (6,6)    6
(8,2)    5        (8,6)    7
(2,4)    3        (2,8)    5
(4,4)    4        (4,8)    6
(6,4)    5        (6,8)    7
(8,4)    6        (8,8)    8
Sampling distribution of the sample mean:
Mean     Frequency   Probability
2        1           1/16
3        2           2/16
4        3           3/16
5        4           4/16
6        3           3/16
7        2           2/16
8        1           1/16
$\mu_{\bar{X}} = \sum \bar{x}\,P(\bar{x}) = \frac{2(1) + 3(2) + 4(3) + 5(4) + 6(3) + 7(2) + 8(1)}{16} = \frac{80}{16} = 5$, where $\mu_{\bar{X}}$ = the mean of the sample means.
Since $\mu = \frac{2 + 4 + 6 + 8}{4} = 5$, the mean of the sample means is the same as the population mean.
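The construction in Example 7.1 can be reproduced by brute-force enumeration. The sketch below lists all ordered samples of size 2 drawn with replacement, tabulates the sample means, and uses exact fractions so that no rounding is involved; the variable names are chosen for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]
n = 2

# All N^n ordered samples of size n drawn with replacement.
samples = list(product(population, repeat=n))

# Frequency of each sample mean, then its probability.
counts = Counter(Fraction(sum(s), n) for s in samples)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

mean_of_means = sum(m * p for m, p in dist.items())
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in dist.items())

pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

for m, p in dist.items():
    print(m, counts[m], p)
print(mean_of_means, pop_mean)      # the two means coincide
print(var_of_means, pop_var / n)    # variance of the mean equals sigma^2 / n
```

Besides confirming that the mean of the sample means equals the population mean, the enumeration shows that the variance of the sample mean equals the population variance divided by the sample size, a standard result for sampling with replacement.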
Activity 7.1
Suppose we have a population of size N=5, consisting of the age of five children: 6, 8, 10,
12, and 14.
Take samples of size 2 with replacement and construct sampling distribution of the sample
mean.
Remark:
1. In general, if sampling is with replacement or from an infinite
population, $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$.
2. If sampling is without replacement from a finite population,
$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}$, where the term $\frac{N - n}{N - 1}$ is called the finite population correction
factor (fpc).
3. In any case the sample mean is an unbiased estimator of the population mean.
That is, $E(\bar{X}) = \mu$ (show this).
 Sampling may be from a normally distributed population or from a non-normally
distributed population.
 When sampling is from a normally distributed population, the distribution of $\bar{X}$ will
possess the following properties:
1. The distribution of $\bar{X}$ will be normal.
2. The mean of $\bar{X}$ is equal to the population mean, i.e. $\mu_{\bar{X}} = \mu$.
3. The variance of $\bar{X}$ is equal to the population variance divided by the sample size, i.e. $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$.
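The effect of the finite population correction can be checked on a small population. This sketch assumes the standard formulas: the variance of the sample mean is σ²/n when sampling with replacement, and σ²/n multiplied by (N - n)/(N - 1) when sampling without replacement. The helper name `var_of_mean` and the population values are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def var_of_mean(pop_var, n, N=None):
    """Variance of the sample mean; pass N to apply the fpc (N - n) / (N - 1)."""
    base = Fraction(pop_var) / n
    if N is None:                 # with replacement, or infinite population
        return base
    return base * Fraction(N - n, N - 1)


population = [2, 4, 6, 8]         # population mean 5, population variance 5
N, n = len(population), 2

# Direct check: enumerate every sample of size n drawn without replacement.
means = [Fraction(sum(s), n) for s in combinations(population, n)]
grand_mean = sum(means) / len(means)
enumerated_var = sum((m - grand_mean) ** 2 for m in means) / len(means)

print(var_of_mean(5, n))          # with replacement
print(var_of_mean(5, n, N=N))     # without replacement, using the fpc
print(enumerated_var)             # agrees with the fpc formula
```

The sample mean stays unbiased in both cases; only its variance is reduced when units are not replaced.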
Activity 7.2
If the uric acid values in normal adult males are approximately normally distributed with
mean 5.7 mg and standard deviation 1 mg, find the probability that a sample of size 9 will
yield a mean:
i. greater than 6
ii. between 5 and 6
iii. less than 5.2
7.3 SAMPLING DISTRIBUTION OF THE SAMPLE PROPORTION
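As $n \to \infty$, the sample proportion $\hat{p}$ is approximately normally distributed:
$\hat{p} \sim N\left(p, \sigma_{\hat{p}}^2\right)$, where $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$.
Thus, for some constants $a < b$, $P(a \le \hat{p} \le b) \approx P\left(\frac{a - p}{\sigma_{\hat{p}}} \le Z \le \frac{b - p}{\sigma_{\hat{p}}}\right)$,
where Z is the standard normal random variable.
Note: Since $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$, increasing the sample size will decrease the standard error.
Thus, the larger the sample size is, the larger is $P(|\hat{p} - p| \le c)$ for a fixed distance c (since the interval
$\left[-c/\sigma_{\hat{p}},\; c/\sigma_{\hat{p}}\right]$ for Z is larger than the one with a smaller sample size).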
Example 7.2
What is the probability that the difference between the sample proportion and the population
proportion will be less than or equal to 0.05 when the sample size is n = 30? What is the probability as
we increase the sample size to 100?
Solution
$P(|\hat{p} - p| \le 0.05) = P\left(|Z| \le \frac{0.05}{\sigma_{\hat{p}}}\right)$. Thus, for n = 30,
$P(|\hat{p} - p| \le 0.05) = P(-0.56 \le Z \le 0.56) = 0.4246$.
There is a 42.46% chance that the difference between the sample proportion and the
population proportion is not more than 0.05 when n = 30.
As the sample size is increased to 100, the standard error $\sigma_{\hat{p}}$ decreases.
Thus, $P(|\hat{p} - p| \le 0.05) = P(-1.02 \le Z \le 1.02) = 0.6922$.
There is a 69.22% chance that the difference between the sample proportion and the
population proportion is not more than 0.05 when n = 100. That is, the larger sample size will
provide a higher probability that the value of the sample proportion will be within a specific
distance of the population proportion.
Example 7.3
A new soft drink is being market tested. It is estimated that 60% of consumers will like the
new drink. A sample of 96 taste-tested the new drink.
(a) Determine the standard error of the proportion
(b)What is the probability that equal to or more than 70.4% of consumers will indicate they
like the drink?
(c) What is the probability that equal to or more than 30% of consumers will indicate they do
not like the drink?
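Solution:
(a) $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.6 \times 0.4}{96}} = 0.05$
(b) $P(\hat{p} \ge 0.704) = P\left(Z \ge \frac{0.704 - 0.6}{0.05}\right) = P(Z \ge 2.08) = 0.5 - 0.4812 = 0.0188$
(c) We need to compute the probability that at most 70% of consumers will indicate they
like the drink: $P(\hat{p} \le 0.70) = P\left(Z \le \frac{0.70 - 0.6}{0.05}\right) = P(Z \le 2.00) = 0.5 + 0.4772 = 0.9772$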
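The three parts of Example 7.3 can be worked numerically from the figures given in the problem (p = 0.60, n = 96). This is a minimal sketch using the normal approximation to the sampling distribution of the proportion; it assumes Python 3.8 or later for `statistics.NormalDist`, and the variable names are chosen for the illustration.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.60, 96

# (a) Standard error of the sample proportion.
se = sqrt(p * (1 - p) / n)

# Normal approximation to the sampling distribution of the sample proportion.
sampling_dist = NormalDist(mu=p, sigma=se)

# (b) P(sample proportion >= 0.704).
prob_b = 1 - sampling_dist.cdf(0.704)

# (c) "At least 30% do not like it" is the same event as "at most 70% like it".
prob_c = sampling_dist.cdf(0.70)

print(round(se, 4), round(prob_b, 4), round(prob_c, 4))
```

The results correspond to Z values of 2.08 in part (b) and 2.00 in part (c), so they can also be read from a standard normal table.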
Example 7.4
What is the most important factor for business travelers when they are staying in a hotel?
According to USA Today, 74% of business travelers state that having a smoke-free room is
the most important factor. Assume that the population proportion is p = 0.74 and that a
sample of 200 business travelers will be selected.
(a) What is the probability that the sample proportion will be within of the population
proportion?
(b) Suppose the probability that a sample proportion will be within of the population
mean is 0.9. What is the sample size n?
Solution:
(a)
(b)
7.4 STANDARD ERRORS
7.4.1 Standard Error of a Sample Mean
Rarely would one construct a sampling distribution of means and derive the standard error of
this distribution in order to determine the error in generalizing to the population. Instead, the
standard error of a sampling distribution of means ($\sigma_{\bar{X}}$) can be estimated from the standard
error of the mean of a single sample: $S_{\bar{X}} = \frac{S}{\sqrt{N}}$
Example 7.5
Consider the following summarized data for case processing time.
$\bar{X}$ = 72 days, S = 3 days, N = 80.
$S_{\bar{X}} = \frac{3}{\sqrt{80}} \approx 0.335$ days
7.4.2 Standard Error of a Proportion
The Central Limit Theorem applies to the sampling distribution of a proportion. The
standard error of the sampling distribution of a proportion can be estimated from a single
sample, in a manner similar to that used with the mean: $S_P = \sqrt{\frac{PQ}{N}}$
Example 7.6
Survey of attitudes towards the death penalty (N = 800)
P = proportion favorable = 0.60
Q = proportion unfavorable = 0.40
Standard error of P: $S_P = \sqrt{\frac{0.60 \times 0.40}{800}} \approx 0.017$
95% confidence interval:
P ± 1.96 (S_P) = 0.60 ± 1.96 (0.017)
95% interval: (0.5667 to 0.6333)
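Both standard errors in Section 7.4 can be computed with a few lines of code. The sketch below assumes the usual estimators, S/√N for a mean and √(PQ/N) for a proportion; the function names are illustrative. To mirror the hand calculation of Example 7.6, the standard error is rounded to three decimals before the interval is formed.

```python
from math import sqrt


def se_mean(s, n):
    """Estimated standard error of a sample mean: s / sqrt(n)."""
    return s / sqrt(n)


def se_proportion(p, n):
    """Estimated standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


def confidence_interval(estimate, se, z=1.96):
    """Two-sided interval estimate +/- z * se (z = 1.96 gives 95%)."""
    return estimate - z * se, estimate + z * se


# Example 7.5: S = 3 days, N = 80.
print(round(se_mean(3, 80), 4))

# Example 7.6: P = 0.60, N = 800.
se_p = se_proportion(0.60, 800)
low, high = confidence_interval(0.60, round(se_p, 3))
print(round(se_p, 4), round(low, 4), round(high, 4))
```

Using the unrounded standard error (about 0.0173) would widen the interval very slightly, to roughly 0.566 to 0.634.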
7.5. THE CENTRAL LIMIT THEOREM
Suppose a random variable X has population mean μ and standard deviation σ and that a
random sample of size n is taken from this population. Then the sampling distribution of
$\bar{X}$ becomes approximately normal with mean $\mu_{\bar{X}} = \mu$ and variance $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$ as the sample size n
increases ($n \to \infty$).
Simply stated: For any population, regardless of its shape, as the sample size increases, the
shape of the sampling distribution of the sample mean, $\bar{X}$, becomes more normal.
Example 7.7
For a population of 2,000 students living in hostels, the monthly mean expenditure on
three meals is 500 birr with a variance of 144. If sampling is with replacement, find the
probability that a random sample of 36 students from this population yields a mean
expenditure of less than birr 495 per month.
Solution: Given μ = 500, σ = 12, n = 36,
$P(\bar{X} < 495) = P\left(Z < \frac{495 - 500}{12/\sqrt{36}}\right) = P(Z < -2.5) = 0.5 - 0.4938 = 0.0062$
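Example 7.7 can be worked numerically with the normal approximation given by the Central Limit Theorem. The sketch below uses the figures stated in the example (mean 500 birr, variance 144, n = 36) and assumes Python 3.8 or later for `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

mu = 500          # population mean expenditure (birr)
sigma = sqrt(144) # population standard deviation
n = 36            # sample size

# Standard error of the sample mean (sampling with replacement).
se = sigma / sqrt(n)

# By the Central Limit Theorem the sample mean is approximately N(mu, se^2).
z = (495 - mu) / se
prob = NormalDist(mu=mu, sigma=se).cdf(495)

print(se, z, round(prob, 4))
```

The standard error is 2 birr, the corresponding Z value is -2.5, and the probability is about 0.0062, which is the value obtained from a standard normal table for P(Z < -2.5).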
Activity 7.3
1) Suppose that, for all students who sat an examination in a particular year, the mean score was
450 with a standard deviation of 120. Suppose 400 of the students who took the test during that
particular year were selected at random.
a) Determine the standard error of the mean.
b) What is the probability that their scores have a mean
i) greater than 456
ii) between 440 and 460
2) In 2000, as reported by ACT Research Service, the mean ACT Math score was μ = 20.7. If
ACT Math scores are normally distributed with σ = 5, answer the following questions.
(a) Describe the sampling distribution of the sample mean, $\bar{X}$.
(b) What is the probability that a randomly selected student has an ACT Math score less than
18?
(c) What is the probability that a random sample of 10 ACT test takers had a mean math
score of 18 or less?
CHAPTER SUMMARY
We emphasized in this chapter that data can be collected by taking into consideration
each and every member of the population or by taking some members of the population as
representatives. A complete enumeration of all elements of a universe is known as a census
survey. A census survey is important because no element of chance enters the data
collected using this method, and hence the highest accuracy can be obtained.
A census survey is most appropriate when the population size is small and the elements of the
population are heterogeneous. But the problem with this method is that it requires more
money, time and energy. Thus, this method is often beyond the reach of individual researchers.
Another method of getting information about the population is by taking a small portion
of the population, technically called a sample. Sampling is used extensively in all
facets of business and government.
The purpose of selecting a sample from a population is to generalize about a certain
phenomenon of the population. Most business and government decisions are based on
incomplete data because a study of the total population would be either too costly or too time
consuming. If every item in the population has a definable chance of being selected, the
sample is called a probability sample. Simple random, systematic, stratified and cluster
sampling techniques are methods of assuring random selection of items from the population.
If the selection of every item in a sample is not governed by the laws of chance, the sampling
is considered non-probability sampling.
Most non-probability types of sampling (Judgment, Quota, Convenience and Referral) have a
common weakness: the choice of items selected in the sample is left to the discretion of the
researcher. Some users of non-probability sampling recognize the disadvantages of this type of
sampling but consider that the cost saving and convenience outweigh the disadvantages.
The main disadvantage is that the reliability or accuracy of the sample results cannot be
accurately measured. Therefore, the subsequent discussion involving the reliability of
sample results concerns only probability sampling techniques.
1. a) Consider a population consisting of 12 items, numbered 1 through 12. List all
possible systematic samples of size 3.
b) List all possible systematic samples of size 4 from a population consisting
of 14 items, identified as the numbers 01 through 14 .
2. A stratified sample of size 80 is to be taken from a population of size 2000, which
consists of four strata of size 500, 1200, 200, and 100. How large a sample must be taken
from each stratum if the allocation is to be proportional?
3. A population consists of the four numbers, 3,7,11, and 15. Consider all possible samples
of size 2 drawn from this population without replacement. Find
a) the population mean; b) the population standard deviation;
c) the mean of the sampling distribution of means;
d) the standard deviation of the sampling distribution of means.
Verify (c) and (d) from (a) and (b) using suitable formulae.
4. What is the value of the fpc when
a) n = 10 and N = 200;
b) n= 20 and N = 200;
c) n = 40 and N = 400;
d) n = 400 and N = 4,000?
CHAPTER 8
ESTIMATION AND HYPOTHESIS TESTING
CONTENTS
8.1. POINT AND INTERVAL ESTIMATION OF THE MEAN
8.2. POINT AND INTERVAL ESTIMATION OF THE PROPORTION
8.3. SAMPLE SIZE DETERMINATION
8.4. HYPOTHESIS TESTING ABOUT THE MEAN
8.5. HYPOTHESIS TESTING ABOUT THE PROPORTION
8.6. TESTS OF ASSOCIATION
INTRODUCTION
Statistical inference involves the procedures of reaching conclusions about a population or
populations based on sample results. Our inference may be the estimation of a population
parameter or the testing of an idea (hypothesis) about a population or populations. Thus we
generally divide statistical inference into two parts: estimation and tests of hypotheses. A
hypothesis is an idea about something around us, and the procedure we follow to accept or
reject a hypothesis is called a test of the hypothesis. To accept or reject a hypothesis, we
base ourselves on sample evidence. If the sample evidence does not agree with what is
hypothesized about a population, we reject the hypothesis.
The concepts of estimation and hypothesis testing are used in different aspects of human life
and in different fields of study.

Objectives
At the end of this chapter students are expected to be able to:
• Explain the meaning of estimation and hypothesis
• State the desirable properties of a point estimator
• Discriminate between point estimation and interval estimation
• Compute and interpret the confidence interval for the population mean and proportion
• Discriminate between one-tailed and two-tailed tests
• Discriminate between Type I and Type II errors
• State the steps of testing a hypothesis
• Explain how to apply the normal distribution and when to use the z-distribution in
testing a hypothesis
Concepts of Statistical Estimation and Hypothesis Testing
Inference is the process of making interpretations or conclusions from sample data for the
totality of the population.
Inferential statistics uses the sample results to make decisions and draw conclusions about
the population from which the sample is drawn.
In statistics there are two ways through which inference can be made:
• Statistical estimation
• Statistical hypothesis testing
Both involve using sample statistics to make inferences about the population parameter.

[Figure: diagram relating Population, Sample, Numerical data, Analyzed data and Inference]
Statistical Estimation
This is one way of making inference about the population parameter where the investigator
does not have any prior notion about the values or characteristics of the population parameter.
There are two types of estimation:
i. Point estimation: a single value, computed from sample information, that is used
to estimate a parameter. The best point estimate of the population mean is the
sample mean.
ii. Interval estimation: the procedure that results in an interval of values as an
estimate for a parameter, that is, an interval that contains the likely values of the
parameter. It deals with identifying the upper and lower limits of a parameter.
Estimator and Estimate

An estimator is the rule or random variable that helps us to approximate a population
parameter, whereas an estimate is one of the possible values which an estimator can assume.
For example, the sample mean $\bar{X}$ is an estimator of the population mean $\mu$, and the
value $\bar{x}$ computed from one particular sample is an estimate, which is one of the
possible values of $\bar{X}$.
Properties of the best estimator

The following are some qualities of a good estimator:
o It should be unbiased.
o It should be consistent.
o It should be relatively efficient.
To explain these properties, let $\hat{\theta}$ be an estimator of θ.

1. Unbiased Estimator: An estimator whose expected value is the value of the parameter
being estimated, i.e. $E(\hat{\theta}) = \theta$.

2. Consistent Estimator: An estimator which gets closer to the value of the parameter as the
sample size increases, i.e. $\hat{\theta}$ gets closer to θ as the sample size increases.
3. Relatively Efficient Estimator: The estimator for a parameter with the smallest variance.
This actually compares two or more estimators for one parameter.
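As an illustration of the first two properties, the short derivation below checks them for the sample mean. It is only a sketch and assumes (this is not stated above) that $X_1, \dots, X_n$ are independent observations from a population with mean $\mu$ and finite variance $\sigma^2$.

```latex
% Unbiasedness: the expected value of the sample mean equals the parameter.
E(\bar{X}) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
           = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
           = \frac{n\mu}{n} = \mu

% Consistency: the variance of the sample mean shrinks to zero as n grows
% (independence is used to add the variances).
\operatorname{Var}(\bar{X}) = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
                            = \frac{\sigma^{2}}{n} \longrightarrow 0
                            \quad \text{as } n \to \infty
```

An unbiased estimator whose variance tends to zero in this way gets closer to the parameter as the sample size increases, which is the sense of consistency used above.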
8.1. POINT AND INTERVAL ESTIMATION OF POPULATION MEAN
8.1.1. Point estimation of the population mean: μ

A statistic used to estimate a parameter is also called a point estimate, since we are
estimating the parameter value. A point estimator is the mathematical rule we use to compute
the point estimate. For instance, the sum of the $X_i$ over n is the point estimator used to
compute the estimate of the population mean. That is, $\bar{X} = \frac{\sum X_i}{n}$ is a
point estimator of the population mean.
8.1.2. Confidence interval estimation of the population mean

Although $\bar{X}$ possesses nearly all the qualities of a good estimator, because of sampling
error we know that our sample statistic is not likely to be exactly equal to the population
parameter. We will have to be satisfied knowing that the statistic is "close to" the
parameter. That leads to the obvious question, what is "close"?
We can phrase the latter question differently: How confident can we be that the value of the
statistic falls within a certain "distance" of the parameter? Or, how likely is it that an
interval of a given width around the statistic covers the parameter's value? This range is the
confidence interval. A confidence interval is a specific interval estimate of a parameter
determined by using data obtained from a sample and the specific confidence level of the
estimate.
The confidence level is the probability that the procedure used to construct the interval
produces an interval that contains the value of the parameter. There are different conditions
to be considered when constructing confidence intervals for the population mean.
Condition-1: If the population variance is known and the population is normal, whatever the
sample size
Recall the Central Limit Theorem, which applies to the sampling distribution of the mean
of a sample. Consider samples of size n drawn, with replacement and with order important,
from a population whose mean is μ and standard deviation is σ. The population can have any
frequency distribution. The sampling distribution of $\bar{X}$ will have a mean
$\mu_{\bar{X}} = \mu$ and a standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, and
approaches a normal distribution as n gets large.

This allows us to use the normal distribution curve for computing confidence intervals. Since
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has a standard normal distribution, we can write
$$\mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = \bar{X} \pm \varepsilon,$$
where $\varepsilon = Z\,\sigma/\sqrt{n}$ is a measure of error.
- For the interval estimator to be good the error should be small. How can it be made small?
• By making n large
• By having small variability
• By taking Z small
- To obtain the value of Z, we have to attach this to a theory of chance. That is, there is an
area of size $1-\alpha$ such that:
$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$$
Where: α is the probability that the interval does not contain the parameter, and
$Z_{\alpha/2}$ is the value of the standard normal variable to the right of which a
probability of α/2 lies, i.e. $P(Z > Z_{\alpha/2}) = \alpha/2$.
If the population has a normal distribution and σ is known, then a $100(1-\alpha)\%$
confidence interval for μ is given by:
$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Note: When (as is often the case) we don't know the population standard deviation σ and n is
large ($n \ge 30$), we can approximate it by the sample standard deviation S, and obtain the
following (good) approximation of the confidence interval for μ:
$$\bar{X} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}}$$
Here $Z_{\alpha/2}$ is the Z-value with an area of α/2 to its right (obtained from a table).

Condition-2: If the population variance is not known, n is small (n < 30) and the population
is normal:
In most practical research, the standard deviation for the population of interest is not known.
In this case, the standard deviation σ is replaced by the estimated standard deviation S, so
that the standard error of the mean is estimated by $S/\sqrt{n}$. Since S is only an estimate
of the true value of the standard deviation, the standardized sample mean is no longer
normally distributed, as it is when $\sigma/\sqrt{n}$ is known. Instead, it follows the
t-distribution. The t-distribution is described by its degrees of
freedom. For a sample of size n, the t-distribution will have n-1 degrees of freedom. The
notation for a t-distribution with n-1 degrees of freedom is $t_{n-1}$. As the sample size n
increases, the t-distribution becomes closer to the normal distribution, since the estimated
standard error approaches the true standard error for large n.
$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t-distribution with n-1 degrees of freedom.
- The value of $t_{\alpha/2}$ can be obtained from a table with an area of α/2 to the right
with n-1 degrees of freedom.
Therefore, the $100(1-\alpha)\%$ confidence interval for μ when the population is normally
distributed and σ is not known is given by:
$$\bar{X} \pm t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}$$
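Example 8.1:
A random sample of 900 workers showed an average height of 67 inches with a standard
deviation of 5 inches.
a) Find a 95% confidence interval for the mean height of all workers.
b) Find a 99% confidence interval for the mean height of all workers.
Solution:
a) $\bar{X} = 67$, S = 5, n = 900
$\alpha = 0.05$, $Z_{\alpha/2} = Z_{0.025} = 1.96$ from the table.
The required interval will be:
$$\bar{X} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}} = 67 \pm 1.96\cdot\frac{5}{\sqrt{900}} = 67 \pm 0.33 = (66.67,\ 67.33)$$
b) $\alpha = 0.01$, $Z_{\alpha/2} = Z_{0.005} = 2.58$ from the table.
$$\bar{X} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}} = 67 \pm 2.58\cdot\frac{5}{\sqrt{900}} = 67 \pm 0.43 = (66.57,\ 67.43)$$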
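Example 8.2
A drug company is testing a new drug which is supposed to reduce blood pressure. For
the six people who are used as subjects, it is found that the average drop in blood pressure is
2.28 points, with a standard deviation of 0.95 points. What is the 95% confidence interval for
the mean change in pressure?
Solution:
$\bar{X} = 2.28$, S = 0.95, n = 6
$\alpha = 0.05$, $t_{\alpha/2} = t_{0.025} = 2.571$ from the table, with n - 1 = 5 degrees of freedom.
$$\bar{X} \pm t_{\alpha/2}\frac{S}{\sqrt{n}} = 2.28 \pm 2.571\cdot\frac{0.95}{\sqrt{6}} = 2.28 \pm 1.00 = (1.28,\ 3.28)$$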
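Example 8.3
Suppose we want to estimate a 95% confidence interval for the average quarterly returns of
all fixed-income funds in Ethiopia. We draw a sample of 100 observations and calculate
the sample mean to be 0.05 and the standard deviation 0.03. We assume that those returns
are normally distributed with known variance.
Solution:
$\bar{X} = 0.05$, σ = 0.03, n = 100
$\alpha = 0.05$, $Z_{\alpha/2} = Z_{0.025} = 1.96$ from the table.
The confidence interval is:
$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 0.05 \pm 1.96\cdot\frac{0.03}{\sqrt{100}} = 0.05 \pm 0.0059 = (0.0441,\ 0.0559)$$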
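The interval formulas of this section can also be checked numerically. The following is a minimal Python sketch, not part of the original module: the function names `z_critical` and `mean_interval` are illustrative choices, the normal quantile comes from the standard library's `statistics.NormalDist`, and the t value 2.571 is simply typed in from the table because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Z value with area alpha/2 to its right, where alpha = 1 - confidence."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def mean_interval(xbar: float, sd: float, n: int, critical: float) -> tuple[float, float]:
    """Return (xbar - margin, xbar + margin) with margin = critical * sd / sqrt(n)."""
    margin = critical * sd / sqrt(n)
    return xbar - margin, xbar + margin


# Example 8.1(a): large sample, so S is used in place of sigma.
ci_81 = mean_interval(67, 5, 900, z_critical(0.95))
# Example 8.2: small sample; 2.571 is the table value of t(0.025) with 5 d.f.
ci_82 = mean_interval(2.28, 0.95, 6, 2.571)
# Example 8.3: variance assumed known.
ci_83 = mean_interval(0.05, 0.03, 100, z_critical(0.95))

for name, (low, high) in (("8.1(a)", ci_81), ("8.2", ci_82), ("8.3", ci_83)):
    print(f"Example {name}: ({low:.4f}, {high:.4f})")
```

Because the code uses the exact normal quantile rather than the rounded table value, the limits may differ from hand calculations in the last decimal place.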

8.2. POINT AND INTERVAL ESTIMATION OF THE POPULATION
PROPORTION
If P represents the population proportion, then the sample proportion $\hat{p} = X/n$, where X
is the number of sample members having the characteristic of interest, provides a
good estimate of P. Therefore, the sample proportion is the point estimator of the
population proportion. To construct the confidence interval for the proportion we need the
following conditions:
Conditions: The population proportion is not too close to zero or one, and
the sample size is large (at least 30).
• Under these conditions, the sampling distribution of $\hat{p}$ can be approximated by a
normal distribution that has mean P and standard deviation $\sqrt{P(1-P)/n}$.

To construct a confidence interval for P, we can now adopt the same argument that was used
in finding a confidence interval for μ and write:
$$P\left(-Z_{\alpha/2} < \frac{\hat{p} - P}{\sqrt{P(1-P)/n}} < Z_{\alpha/2}\right) = 1 - \alpha$$
Hence a $(1-\alpha)100\%$ confidence interval for the population proportion P is given by:
$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}}$$
Since P is unknown, an approximate $(1-\alpha)100\%$ confidence interval for the population
proportion P is given by:
$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
if the sample size is large (usually n > 30).
Example 8.4
In a sample of 400 people who were questioned regarding their participation in sports, 160
said that they did participate. Construct a 98% confidence interval for P, the proportion of
people in the population who participate in sports.
Solution:
Let X be the number of people in the sample who participate in sports.
X = 160, n = 400, α = 0.02. Hence $\hat{p} = X/n = 160/400 = 0.4$ and $Z_{\alpha/2} = Z_{0.01} = 2.33$.
As a result, an approximate 98% confidence interval for P is given by:
$$0.4 \pm 2.33\sqrt{\frac{0.4(0.6)}{400}} = 0.4 \pm 0.057 = (0.343,\ 0.457)$$
Hence, we can be about 98% confident that the true proportion of people in the
population who participate in sports is between 34.3% and 45.7%.
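The same calculation can be sketched in a few lines of Python. This is an illustration only: `proportion_interval` is a hypothetical helper name, and it implements the approximate large-sample interval, with the sample proportion used inside the square root.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes: int, n: int, confidence: float) -> tuple[float, float]:
    """Approximate large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)      # Z value with area alpha/2 to its right
    margin = z * sqrt(p_hat * (1.0 - p_hat) / n)     # Z times the estimated standard error
    return p_hat - margin, p_hat + margin


# Data of Example 8.4: 160 of 400 people participate in sports, 98% confidence.
low, high = proportion_interval(160, 400, 0.98)
print(f"{low:.3f} to {high:.3f}")
```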
8.3 SAMPLE SIZE DETERMINATION
Before a sample is actually collected, the required sample size for testing a hypothesis
concerning the population proportion can be determined by specifying (1) the hypothesized
value of the proportion, (2) a specific alternative value of the proportion such that the
difference from the null-hypothesized value is considered important, (3) the level of
significance to be used in the test, and (4) the probability of Type II error that is to be
permitted. The formula for determining the minimum sample size required for testing a
hypothesized value of the proportion is
$$n = \left[\frac{z_0\sqrt{P_0(1-P_0)} - z_1\sqrt{P_1(1-P_1)}}{P_1 - P_0}\right]^2$$
In the above equation, $P_0$ is the hypothesized value of the proportion, $P_1$ is the specific
alternative value, $z_0$ is the critical value of z used in conjunction with the specified level
of significance (α level), and $z_1$ is the value of z with respect to the designated probability
of Type II error (β level). As in determining sample size for testing the mean, $z_0$ and $z_1$
always have opposite algebraic signs.
The result is that the two products in the numerator will always be accumulated. Also, the
above equation can be used in conjunction with either one-tail or two-tail tests, and any
fractional sample size is rounded up. Finally, the sample size should be large enough to
warrant use of the normal probability distribution in conjunction with $P_0$ and $P_1$.
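To see how the four specifications combine, here is a small Python sketch of a sample-size calculation for testing a proportion. It assumes the usual formula $n = \left[\left(z_0\sqrt{P_0(1-P_0)} - z_1\sqrt{P_1(1-P_1)}\right)/(P_1 - P_0)\right]^2$; the function name and all four numbers in the usage lines (P0 = 0.10, P1 = 0.15, α = 0.05, β = 0.10, upper-tail test) are illustrative assumptions, not values from the module.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_for_proportion_test(p0: float, p1: float, z0: float, z1: float) -> int:
    """Minimum n from n = [(z0*sqrt(p0(1-p0)) - z1*sqrt(p1(1-p1))) / (p1 - p0)]^2."""
    numerator = z0 * sqrt(p0 * (1 - p0)) - z1 * sqrt(p1 * (1 - p1))
    # Any fractional sample size is rounded up.
    return ceil((numerator / (p1 - p0)) ** 2)


# Hypothetical upper-tail test of H0: P = 0.10 against the alternative value P1 = 0.15.
z0 = NormalDist().inv_cdf(1 - 0.05)   # positive: cuts off alpha = 0.05 in the upper tail
z1 = NormalDist().inv_cdf(0.10)       # negative: beta = 0.10, opposite sign to z0
n_required = sample_size_for_proportion_test(0.10, 0.15, z0, z1)
print(n_required)
```

Because $z_0$ and $z_1$ have opposite signs, subtracting the second product adds its magnitude, which is what "accumulated" means above.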
Hypothesis Testing
A statistical hypothesis test is a method of making statistical decisions using experimental
data. It is a common method of drawing inferences about a population based on
statistical evidence from a sample.
Definitions
Statistical hypothesis: An assertion, statement, or claim about the population whose
plausibility is to be evaluated on the basis of the sample data.
Test statistic: A statistic whose value serves to determine whether to reject or accept the
hypothesis to be tested. It is a random variable.
Statistical test: A test or procedure used to evaluate a statistical hypothesis; its outcome
depends on the sample data.
There are two types of hypothesis:
Null hypothesis: A claim or statement about a population parameter that is assumed to be true
from the very beginning until it is declared false. It is a statistical hypothesis that states a
hypothesis of equality or the hypothesis of no difference between a parameter and a specific
value. It is usually denoted by H0.
Alternative hypothesis: A claim or statement about a population parameter that will be true
if the null hypothesis is false. It is a statistical hypothesis that states a hypothesis of
difference between a parameter and a specific value. It is usually denoted by H1 or HA.
Types and size of errors:
• Testing a hypothesis is based on sample data, which may involve sampling and non-sampling
errors.
• Type I error: Rejecting the null hypothesis when it is actually true. The significance
level (α) can be interpreted as the probability of rejecting the null hypothesis when it
is actually true. The distribution of the test statistic under the null hypothesis
determines the probability of a type I error.
α = P(type I error) = level of significance
• Type II error: Occurs when a false null hypothesis is not rejected. The null
hypothesis is actually false but we wrongly fail to reject it.
β represents the probability that H0 is not rejected when actually H0 is false. The
distribution of the test statistic under the alternative hypothesis determines the
probability of a type II error.
β = P(type II error)
• The power of a test is the probability of correctly rejecting a false null
hypothesis. The value of (1 - β) is called the power of a test.
1 - β = Power of test
Note: The two types of errors that occur in tests of hypothesis depend on each other. We
cannot lower the values of α and β simultaneously for a test of hypothesis for a fixed
sample size. Lowering the value of α will raise the value of β, and lowering the value of
β will raise the value of α. However, we can decrease both α and β simultaneously by
increasing the sample size.
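The trade-off described in the note can be made concrete with a small numerical sketch. The Python code below computes β for an upper-tail Z test of a mean with known σ; the function name `type_ii_error` and the numbers used (μ0 = 100, true mean 110, σ = 40) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0: float, mu1: float, sigma: float, n: int, alpha: float) -> float:
    """beta for the upper-tail Z test of H0: mu = mu0 when the true mean is mu1 > mu0."""
    std_normal = NormalDist()
    standard_error = sigma / sqrt(n)
    # H0 is rejected when the sample mean exceeds this cut-off.
    cutoff = mu0 + std_normal.inv_cdf(1 - alpha) * standard_error
    # beta = P(sample mean <= cutoff) computed under the true mean mu1.
    return std_normal.cdf((cutoff - mu1) / standard_error)


for n in (25, 100):
    for alpha in (0.01, 0.05):
        beta = type_ii_error(mu0=100, mu1=110, sigma=40, n=n, alpha=alpha)
        print(f"n={n:3d}  alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

For a fixed n the smaller α gives the larger β, while moving from n = 25 to n = 100 lowers β at both levels of α.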
The following table gives a summary of possible results of any hypothesis test:

                      Actual situation (condition)
Decision              H0 is true          H0 is false
                      (H1 is false)       (H1 is true)
Do not reject H0      Correct decision    Type II error
Reject H0             Type I error        Correct decision
General steps in hypothesis testing:

1. State the appropriate hypotheses.
2. Select the level of significance, α.
3. Select an appropriate test statistic.
4. Identify the critical region.
5. Compute the value of the test statistic.
6. Make the decision.
7. Summarize the results.
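8.4. HYPOTHESIS TESTS ABOUT THE MEAN:
Let $\mu_0$ denote the hypothesized value of the population mean μ. The hypotheses take one of
three forms:
1. $H_0: \mu = \mu_0$ VS $H_1: \mu \neq \mu_0$
2. $H_0: \mu = \mu_0$ VS $H_1: \mu > \mu_0$
3. $H_0: \mu = \mu_0$ VS $H_1: \mu < \mu_0$
Condition-1
If the population standard deviation σ is known, whatever the sample size, and
sampling is from a normal distribution:
The formula for the test statistic is:
$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
After specifying α we have the following test criteria corresponding to the above three
hypotheses.

Null hypothesis        Alternative hypothesis      Decision rule: reject H0 if
$H_0: \mu = \mu_0$     $H_1: \mu \neq \mu_0$       $|Z_{cal}| > Z_{\alpha/2}$
$H_0: \mu = \mu_0$     $H_1: \mu > \mu_0$          $Z_{cal} > Z_{\alpha}$
$H_0: \mu = \mu_0$     $H_1: \mu < \mu_0$          $Z_{cal} < -Z_{\alpha}$

Note: When we don't know the population standard deviation and n is large ($n \ge 30$), we
can approximate it by the sample standard deviation S, and obtain the following test
statistic:
$$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
- The decision rule is the same as in the table above.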
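Condition-2
When the population standard deviation σ is unknown, the population is normally or
approximately normally distributed, and the sample size is small (n < 30):
The formula for the test statistic is:
$$t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
After specifying α we have the following test criteria corresponding to the above three
hypotheses.

Null hypothesis        Alternative hypothesis      Decision rule: reject H0 if
$H_0: \mu = \mu_0$     $H_1: \mu \neq \mu_0$       $|t_{cal}| > t_{\alpha/2}(n-1)$
$H_0: \mu = \mu_0$     $H_1: \mu > \mu_0$          $t_{cal} > t_{\alpha}(n-1)$
$H_0: \mu = \mu_0$     $H_1: \mu < \mu_0$          $t_{cal} < -t_{\alpha}(n-1)$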
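As a sketch of how the small-sample statistic is computed from raw data, the Python code below evaluates $t = (\bar{x} - \mu_0)/(S/\sqrt{n})$ with n - 1 degrees of freedom. The data set and the hypothesized mean of 12.0 are hypothetical, and the critical value 2.571 (two-tailed, α = 0.05, 5 degrees of freedom) is taken from a t table, since Python's standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample: list[float], mu0: float) -> tuple[float, int]:
    """Return (t_cal, degrees of freedom) for a one-sample t test of H0: mu = mu0."""
    n = len(sample)
    # stdev() uses the n - 1 divisor, i.e. it is the sample standard deviation S.
    t_cal = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t_cal, n - 1


weights = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4]   # hypothetical measurements
t_cal, df = t_statistic(weights, mu0=12.0)
t_table = 2.571                                  # t(0.025) with 5 d.f., from the table
print(f"t_cal = {t_cal:.2f}, d.f. = {df}, reject H0: {abs(t_cal) > t_table}")
```

For these numbers $|t_{cal}|$ is below the table value, so the two-tailed test does not reject H0.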
Example 8.5
Ethio Telecom provides telephone service in an area. According to the company’s records,
the average length of all calls placed was 12.5 minutes. A sample of 150 such calls placed
through this company produced a mean length of 13 minutes with a standard deviation of 2.6
minutes. Can you conclude that the mean length of all current calls is different from 12.5
minutes? Use the 0.05 level of significance and assume that the distribution of all calls is
normal.
Solution:
Let μ = population mean
1. State the null and alternative hypotheses:
$H_0: \mu = 12.5$ (The mean length of all current calls is 12.5 minutes)
$H_1: \mu \neq 12.5$ (The mean length of all current calls is different from 12.5 minutes)
2. Select the level of significance: α = 0.05 (given)
3. Select an appropriate test statistic:
The Z-statistic is appropriate because the sample size is large.
4. Identify the critical region:
Here we have two critical regions since we have a two-tailed hypothesis. The
critical region is $|Z_{cal}| > Z_{0.025} = 1.96$, and
$-1.96 \le Z_{cal} \le 1.96$ is the acceptance region.
5. Computations:
$\bar{X} = 13$, S = 2.6, n = 150
$$Z_{cal} = \frac{13 - 12.5}{2.6/\sqrt{150}} = \frac{0.5}{0.2123} = 2.36$$
6. Decision:
Reject H0, since $Z_{cal} = 2.36$ is not in the acceptance region.
7. Conclusion: At the 5% level of significance, we have evidence to say that the average length
of all such calls is not equal to 12.50 minutes.
Example 8.6
Ten individuals are chosen at random from a population and their heights are found to be, in
inches, 63, 63, 66, 67, 68, 69, 70, 71 and 71. The average height of
the population is held to be 66 inches. Can we conclude that the average height is decreasing?
(Use α = 0.05 and assume the normality of the population)
Solution:
Let μ = population mean
1. $H_0: \mu = 66$ VS $H_1: \mu < 66$
2. α = 0.05
3. The t-statistic is appropriate because the population standard deviation is unknown
and the sample size is small.
4. Critical region: $t_{cal} < -t_{0.05}(9) = -1.833$;
$t_{cal} \ge -1.833$ is the acceptance region.
5. Computations: $t_{cal} = \frac{\bar{X} - 66}{S/\sqrt{n}}$ with n = 10. The heights listed
above average more than 66 inches, so $t_{cal}$ is positive.
6. Decision:
Do not reject H0, since a positive $t_{cal}$ lies in the acceptance region.
7. Conclusion: At the 5% level of significance, we do not have evidence to say that the average
height of an individual is less than 66 inches.
Example 8.7
A national magazine claims that the average college student watches less television than the
national average. The national average is 29.4 hours per week with a standard deviation of 2
hours. A sample of 25 college students has a mean of 27 hours. Test the claim at α = 0.01
and assume normality of the population.
Solution:
1. $H_0: \mu = 29.4$ VS $H_1: \mu < 29.4$
2. α = 0.01
3. The Z-statistic is appropriate because the population standard deviation is known.
4. Critical region: $Z_{cal} < -Z_{0.01} = -2.33$;
$Z_{cal} \ge -2.33$ is the acceptance region for the null hypothesis.
5. Computations: $\bar{X} = 27$, σ = 2, n = 25
$$Z_{cal} = \frac{27 - 29.4}{2/\sqrt{25}} = \frac{-2.4}{0.4} = -6$$
6. Decision:
Reject H0, since $Z_{cal} = -6$ is not in the acceptance region.
7. Conclusion: At the 1% level of significance, there is evidence that the average college
student watches less television.
Example 8.8
An authority from a district power station of the town told reporters recently that the average
monthly electric bill of households in AA is not more than Birr 100. A random sample of
400 households from the city produced a mean bill of Birr 105 with a standard deviation of
Birr 40. Test the claim of the authority at the 5% level of significance.
Solution:
1. $H_0: \mu \le 100$ VS $H_1: \mu > 100$
2. Select the level of significance: α = 0.05 (given)
3. The Z-statistic is appropriate because the sample size is large, even if the population is
non-normal.
4. Critical region: $Z_{cal} > Z_{0.05} = 1.645$;
$Z_{cal} \le 1.645$ is the acceptance region for the null hypothesis.
5. Computations: $\bar{X} = 105$, S = 40, n = 400
$$Z_{cal} = \frac{105 - 100}{40/\sqrt{400}} = \frac{5}{2} = 2.5$$
6. Decision: Reject H0, since $Z_{cal} = 2.5 > 1.645$.
7. Conclusion: At the 5% level of significance the claim of the authority is not correct.
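The Z tests of Examples 8.5, 8.7 and 8.8 all follow the same pattern, which the Python sketch below reproduces. The helper names `z_statistic` and `reject_h0` are illustrative and not from the text; critical values come from `statistics.NormalDist` rather than a table, so they differ slightly from 1.96, 2.33 and 1.645 without changing any of the decisions.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(xbar: float, mu0: float, sd: float, n: int) -> float:
    """Z_cal = (xbar - mu0) / (sd / sqrt(n))."""
    return (xbar - mu0) / (sd / sqrt(n))


def reject_h0(z_cal: float, alpha: float, tail: str) -> bool:
    """tail is 'two' (H1: mu != mu0), 'right' (H1: mu > mu0) or 'left' (H1: mu < mu0)."""
    std_normal = NormalDist()
    if tail == "two":
        return abs(z_cal) > std_normal.inv_cdf(1 - alpha / 2)
    critical = std_normal.inv_cdf(1 - alpha)
    return z_cal > critical if tail == "right" else z_cal < -critical


z_85 = z_statistic(13, 12.5, 2.6, 150)    # Example 8.5, two-tailed, alpha = 0.05
z_87 = z_statistic(27, 29.4, 2, 25)       # Example 8.7, left-tailed, alpha = 0.01
z_88 = z_statistic(105, 100, 40, 400)     # Example 8.8, right-tailed, alpha = 0.05

print(f"8.5: Z = {z_85:.2f}, reject H0: {reject_h0(z_85, 0.05, 'two')}")
print(f"8.7: Z = {z_87:.2f}, reject H0: {reject_h0(z_87, 0.01, 'left')}")
print(f"8.8: Z = {z_88:.2f}, reject H0: {reject_h0(z_88, 0.05, 'right')}")
```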
8.5. TESTS ABOUT A POPULATION PROPORTION: P
The procedure for testing hypotheses about the population proportion P for large
samples is similar in many respects to that for the population mean. The procedure includes the
same seven steps. Similarly, the test can be two-tailed or one-tailed. When the sample size is
large, the sample proportion $\hat{p}$ is approximately normally distributed with its mean
equal to P and standard deviation equal to $\sqrt{P(1-P)/n}$. Hence, we use the normal
distribution to perform a test of hypothesis about the population proportion for a large
sample. The sample size is considered to be large when nP and n(1-P) are both greater than 5.
Suppose the assumed or hypothesized value of P (the parameter of the binomial distribution) is
denoted by $P_0$; then one can formulate two-sided (1) and one-sided (2 and 3) hypotheses as
follows:
1. $H_0: P = P_0$ VS $H_1: P \neq P_0$
2. $H_0: P = P_0$ VS $H_1: P > P_0$
3. $H_0: P = P_0$ VS $H_1: P < P_0$
The choice of $H_1$ depends on the prior information we have on the values of P.
The test statistic is:
$$Z_{cal} = \frac{\hat{p} - P_0}{\sqrt{P_0(1-P_0)/n}}$$
Decision Rule:

Null hypothesis      Alternative hypothesis     Decision rule: reject H0 if
$H_0: P = P_0$       $H_1: P \neq P_0$          $|Z_{cal}| > Z_{\alpha/2}$
$H_0: P = P_0$       $H_1: P > P_0$             $Z_{cal} > Z_{\alpha}$
$H_0: P = P_0$       $H_1: P < P_0$             $Z_{cal} < -Z_{\alpha}$
Example 8.9
A manufacturing company has submitted a claim that 90% of items produced by a certain
process are non-defective. An improvement in the process is being considered that they feel
will lower the proportion of defectives below the current 10%. In an experiment 100 items
are produced with the new process and 5 are defective. Is this evidence sufficient to conclude
that the method has been improved? Use a 0.05 level of significance.
Solution: As usual, we follow the steps:
1. $H_0: P = 0.10$ (actually $P \ge 0.10$) VS $H_1: P < 0.10$
2. α = 0.05
3. Critical Region: $Z_{cal} < -Z_{0.05} = -1.645$
4. Computation: $\hat{p} = 5/100 = 0.05$
$$Z_{cal} = \frac{0.05 - 0.10}{\sqrt{0.10(0.90)/100}} = \frac{-0.05}{0.03} = -1.67$$
5. Decision: Reject H0, since $Z_{cal} = -1.67 < -1.645$
6. Conclusion: At α = 0.05 we have evidence to say that the improvement has reduced
the proportion of defectives.
Example 8.10
The unemployment rate in a given country at a given period is believed to be 10%. The
government embarked on a series of projects to reduce unemployment. It was of interest to
determine whether unemployment decreased as a result of the projects. A random sample of
500 people was chosen, and 48 of them were found to be unemployed. Test at the 1% level of
significance whether the government projects reduced the unemployment rate.
Solution: As usual, we follow the steps:
1. $H_0: P = 0.10$ VS $H_1: P < 0.10$
2. α = 0.01
3. Test statistic: the Z-statistic is appropriate because the sample size is large.
4. Critical Region: $Z_{cal} < -Z_{0.01} = -2.33$
5. Computation: $\hat{p} = 48/500 = 0.096$
$$Z_{cal} = \frac{0.096 - 0.10}{\sqrt{0.10(0.90)/500}} = \frac{-0.004}{0.0134} = -0.30$$
6. Decision: Do not reject H0, since $Z_{cal} = -0.30 > -2.33$
7. Conclusion: At the 1% level of significance, there is no evidence that the government
projects reduced unemployment.
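The test statistic for a proportion, $Z = (\hat{p} - P_0)/\sqrt{P_0(1-P_0)/n}$, can be verified for the two examples above with a short Python sketch. The function name `proportion_z` is an illustrative choice; the critical values -1.645 and -2.33 are the usual table values for left-tailed tests at α = 0.05 and α = 0.01.

```python
from math import sqrt


def proportion_z(successes: int, n: int, p0: float) -> float:
    """Z_cal = (p_hat - p0) / sqrt(p0 * (1 - p0) / n) for a large-sample test of H0: P = p0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


z_89 = proportion_z(5, 100, 0.10)      # Example 8.9: 5 defectives in 100 items
z_810 = proportion_z(48, 500, 0.10)    # Example 8.10: 48 unemployed in 500 people

# Left-tailed tests: reject H0 when Z_cal is below the negative critical value.
print(f"Example 8.9:  Z = {z_89:.2f}, reject H0: {z_89 < -1.645}")
print(f"Example 8.10: Z = {z_810:.2f}, reject H0: {z_810 < -2.33}")
```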
Activity 8.1
A large sample of 200 students from a certain high school is interviewed and
85 of them are found to use the city bus. Can you conclude that at least 40% of the students
use the city bus? Use a 0.05 level of significance.
8.6. Test of Association
In the previous sections we saw how to test hypotheses for numeric data given in
the form of a mean or a proportion. It is also possible to apply hypothesis testing to
categorical data.
Suppose that we have a population consisting of observations having two attributes or
qualitative characteristics, say A and B.
If the attributes are independent then the probability of possessing both A and B is
$P_A \cdot P_B$,
where $P_A$ is the probability that a member has attribute A and
$P_B$ is the probability that a member has attribute B.
Suppose A has r mutually exclusive and exhaustive classes and
B has c mutually exclusive and exhaustive classes.
The entire set of data can be represented using an r × c contingency table.
A \ B    B1     B2     ...    Bj     ...    Bc     Total
A1       O11    O12    ...    O1j    ...    O1c    R1
A2       O21    O22    ...    O2j    ...    O2c    R2
...
Ai       Oi1    Oi2    ...    Oij    ...    Oic    Ri
...
Ar       Or1    Or2    ...    Orj    ...    Orc    Rr
Total    C1     C2     ...    Cj     ...    Cc     n
The chi-square test procedure is used to test the hypothesis of independence of two attributes.
The statistic is given by:
$$\chi^2_{cal} = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - e_{ij})^2}{e_{ij}}$$
Where $O_{ij}$ = the number of units that belong to category i of A and j of B.
$e_{ij}$ = the expected frequency of units that belong to category i of A and j of B, given by
$$e_{ij} = \frac{R_i \times C_j}{n}$$
Where $R_i$ = the i-th row total,
$C_j$ = the j-th column total,
n = total number of observations.
Remarks:
- The null and alternative hypotheses may be stated as:

H0: There is no association between A and B.
H1: not H0 (There is association between A and B).
Decision Rule:
- Reject H0 (independence) at the α level of significance if the calculated value of $\chi^2$
exceeds the tabulated value with degrees of freedom equal to (r-1)(c-1).
Example 8.12
In an experiment to study the dependence of hypertension on smoking habits, the following
data were taken on 180 individuals (expected frequencies are in parentheses):

                   Non-smokers    Moderate smokers    Heavy smokers    Total
Hypertension       21 (33.35)     36 (29.97)          30 (23.68)       87
No hypertension    48 (35.65)     26 (32.03)          19 (25.32)       93
Total              69             62                  49               180

At α = 0.05, test whether the presence or absence of hypertension depends on smoking habit.
Solution
H0: Presence or absence of hypertension is independent of smoking habit.
H1: H0 is not true.
$$\chi^2_{cal} = \frac{(21-33.35)^2}{33.35} + \frac{(36-29.97)^2}{29.97} + \cdots + \frac{(19-25.32)^2}{25.32} = 14.46$$
The tabulated value is $\chi^2_{0.05}(2) = 5.99$, with (2-1)(3-1) = 2 degrees of freedom.
Decision: Since 14.46 > 5.99 we reject the null hypothesis.
Conclusion: Smoking habit and the presence or absence of hypertension are related.
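The chi-square statistic for a contingency table is easy to compute directly from the observed counts. The Python sketch below (the function name `chi_square_independence` is an illustrative choice) builds each expected frequency as row total times column total over n and sums the contributions, using the hypertension data of Example 8.12. The critical value 5.99 is still read from a chi-square table.

```python
def chi_square_independence(observed: list[list[float]]) -> tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n   # expected frequency under independence
            chi2 += (o_ij - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows: hypertension, no hypertension; columns: non-, moderate and heavy smokers.
smoking = [[21, 36, 30], [48, 26, 19]]
chi2_cal, df = chi_square_independence(smoking)
print(f"chi-square = {chi2_cal:.2f}, d.f. = {df}, reject H0: {chi2_cal > 5.99}")
```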
Activity 8.2
A researcher is interested in assessing the effect of literacy on family planning use.
Accordingly, he collected data and tabulated the findings in the following manner. Can we say
there is an association between educational status and family planning use?

FP Use     Educational Status           Total
           Illiterate     Literate
Yes        a = 63         b = 49        112
No         c = 15         d = 33        48
Total      78             82            160

CHAPTER SUMMARY
In this chapter we have seen some important points such as:
• Statistical inference involves the procedures of reaching conclusions about a
population based on sample results.
• There are two types of inference: estimation and tests of
hypothesis.
• There are two types of estimation: point estimation and interval
estimation.
• In point estimation a single sample result is used to approximate the
population parameter value, while in interval estimation a range of values is used to
estimate the population parameter value.
• The formula that we use for a particular confidence interval estimation
depends on the availability of the population variance and the size of the sample under
consideration.
• The degree of confidence, the maximum allowable error and the variability of the population
are the three important factors needed in the determination of the sample size for a
particular problem.
• A hypothesis is an idea about a given population parameter.
• A test of hypothesis is the procedure we follow either to accept or reject the hypothesis.
• The type of distribution we use for a particular problem
depends on the sampling distribution of the sample statistic under consideration. For
testing about the mean, the sample size and the availability of the population variance are the
two most important factors in determining the distribution to be used for a test.
Exercises for Chapter 8
1. A travel agent estimates that the average cost of a three-day trip to a park is 915.60. People
who scheduled the trip paid an average of 927. The population standard deviation is 35.
At α = 0.05, test whether the travel agent's estimate is correct.
2. The mean lifetime of a sample of 16 light bulbs is 1570 hours with a standard deviation of 110
hours. Test the hypothesis that there is some improvement in the mean lifetime of light
bulbs at α = 0.05.
3. A sociologist claims that the average age of murder victims in a small city is less than or
equal to 23.2 years. A sample of 18 recent victims had a mean age of 22.6. At α = 0.05, test the
sociologist's claim. The population standard deviation is 2 years.
4. A sample of 50 days showed that a fast food restaurant served an average of 182 customers
during lunch time. The standard deviation of the sample was 8. Find the 95% CI for the mean μ.
5. The president of a large university wants to estimate the average age of the students
presently enrolled. From past studies the standard deviation is known to be 2 years. A
sample of 50 students is selected and the mean is found to be 23.2 years. Construct a 95% CI
for the population mean.
6. A sample of 16 private-duty nurses showed an average salary of 480 birr. The standard
deviation of the sample was 64. Construct the 95% CI for the mean salary of all private-duty
nurses.
7. A theory predicts that the proportions of beans in the 4 groups A, B, C, D should be in the
ratio 9:3:3:1. In an experiment among 1600 beans, the numbers in the four groups are
882, 313, 287 and 118. Do the observed data support the theory?
8. A geneticist took a random sample of 300 men to study whether there is association
between father and son regarding boldness. He obtained the following results.

              Son
Father        Bold    Not bold
Bold          85      59
Not bold      65      91

Using α = 5%, test whether there is association between father and son regarding boldness.
9. A random sample of 200 men, all retired, was classified according to education and
number of children, as shown below.

                       Number of children
Education level        0-1    2-3    Over 3
Elementary             14     37     32
Secondary and above    31     59     27

CHAPTER 9
TWO SAMPLE INFERENCES
CONTENTS
9.1. INFERENCES ABOUT DIFFERENCES BETWEEN MEANS
9.2. INFERENCES ABOUT DIFFERENCES BETWEEN
PROPORTIONS
9.3. INFERENCES CONCERNING VARIANCES
INTRODUCTION
Dear learner, in the previous chapter, you have been introduced to the two problems of
statistical inference; namely, statistical estimation and tests of hypothesis, though restricted
to one mean and one proportion. This chapter is a natural continuation of the previous.
The general focus of this chapter is on testing hypotheses and constructing confidence
intervals about parameters (means and proportions) from two populations, thereby enabling
you to meet the following objectives:
• Test hypotheses and construct confidence intervals about the difference between two
population means and proportions using data from large samples.
• Test hypotheses and establish confidence intervals about the difference between two
population means and proportions using data from small samples when the
population variances are unknown and the populations are normally distributed.
• Test hypotheses and construct confidence intervals about two population variances
when the two populations are normally distributed.
9.1. INFERENCES ABOUT DIFFERENCES BETWEEN MEANS
In single-sample inference (Chapters 7 & 8) the process is always the same:

(1) Obtain a random sample and
(2) Conduct the appropriate analysis (hypothesis test or interval estimate)
In two-sample inference, you now get to be involved more directly in deciding how to obtain
the sample data:
Example 9.1
Suppose you want to compare two different methods of production, A and B, to see which,
on average, requires less time. You could decide to use either of the two following
sampling plans:
“Independent-samples approach”
(1) Have a random sample of 20 people use method A and measure the time each takes to
complete the production task.
(2) Do the same thing for a different random sample of, say, 30 people using method B.
(3) Compare the average completion times for the two groups.
“Paired-samples approach”
(1) Obtain one sample of 25 people.
(2) Have each person use method A.
(3) Then have each person use method B.
(4) Compare the method A results to the method B results for each person.

Deciding on how to obtain the data for comparing two (or more) averages is called
Experimental Design (also called Design of Experiments and abbreviated DOE).
Although two-sample inference is the simplest kind of Experimental Design, most of the
important concepts of Experimental Design are illustrated in the two-sample case:
(1) The independent-samples method (for comparing 2 averages) described above
generalizes to what is called the completely randomized design.
(2) Similarly, the paired-samples method generalizes to something called the
randomized block design.
9.1.1. Comparing two means using independent samples
Goal: Compare two population means $\mu_A$ and $\mu_B$ by comparing the sample means
$\bar{x}$ and $\bar{y}$ from two random samples, one taken from population A and the other
from population B.
Data layout:
Sample A    Sample B
x1          y1
x2          y2
x3          y3
:           :
xnA         ynB
Note: The two sample sizes nA and nB don’t have to be equal.
Statistics (sample means & standard deviations) calculated from data:

Sample A                Sample B
$\bar{x}$ and sA        $\bar{y}$ and sB
(3) The analysis then proceeds slightly differently depending on whether the
population standard deviations are known/given or not:

If both σA and σB are known:
Statistic: $\bar{x} - \bar{y}$
Standard Error: $\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}$
Distribution: z

If both σA and σB are unknown:
Statistic: $\bar{x} - \bar{y}$
Standard Error: $\sqrt{s_A^2/n_A + s_B^2/n_B}$
Distribution: t
Degrees of Freedom, ν:
$$\nu = \frac{(w_A + w_B)^2}{\dfrac{w_A^2}{n_A - 1} + \dfrac{w_B^2}{n_B - 1}}$$
where $w_A = s_A^2/n_A$ and $w_B = s_B^2/n_B$.
Note: ν must always lie between min(nA-1, nB-1) and nA+nB-2.
Furthermore, the formula will usually not give an integer value, and it is recommended that
you round your result down to the nearest integer.

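(4) Confidence interval estimate of $\mu_A - \mu_B$
If $\sigma_A$ and $\sigma_B$ are known: $(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}$
If $\sigma_A$ and $\sigma_B$ are unknown: $(\bar{x} - \bar{y}) \pm t_{\alpha/2}\sqrt{s_A^2/n_A + s_B^2/n_B}$,
d.f. = $\nu$ (from the formula in step 3)
(5) Hypothesis test of $H_0: \mu_A - \mu_B = D_0$
Note: In the vast majority of applications, $D_0$ is 0 because we are usually interested
in simply testing whether the two means are equal or not (i.e., whether $\mu_A - \mu_B = 0$
or $\mu_A - \mu_B < 0$, or $> 0$, or $\neq 0$).
If $\sigma_A$ and $\sigma_B$ are known: $z = \dfrac{(\bar{x} - \bar{y}) - D_0}{\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}}$
If $\sigma_A$ and $\sigma_B$ are unknown: $t = \dfrac{(\bar{x} - \bar{y}) - D_0}{\sqrt{s_A^2/n_A + s_B^2/n_B}}$,
d.f. = $\nu$ (from the formula in step 3)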
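The unknown-standard-deviation case of steps (3) to (5) can be sketched in Python as follows. This is only a minimal illustration working from summary statistics: the function name `welch_t` and the numbers passed to it are made up for the sketch and do not come from the text.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b, d0=0.0):
    """Unpooled two-sample t statistic and its approximate degrees of freedom.

    Hypothetical helper for illustration; inputs are sample summary statistics.
    """
    w_a = sd_a ** 2 / n_a          # s_A^2 / n_A
    w_b = sd_b ** 2 / n_b          # s_B^2 / n_B
    se = math.sqrt(w_a + w_b)      # standard error of (x-bar - y-bar)
    t = ((mean_a - mean_b) - d0) / se
    nu = (w_a + w_b) ** 2 / (w_a ** 2 / (n_a - 1) + w_b ** 2 / (n_b - 1))
    return t, nu

# Illustrative (made-up) summary statistics for samples A and B
t_stat, nu = welch_t(10.0, 2.0, 16, 8.0, 3.0, 9)
df = math.floor(nu)                # round down, as recommended in the note
print(round(t_stat, 4), round(nu, 4), df)
```

The computed t would then be compared with the tabulated critical value for `df` degrees of freedom.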
Note: In the case where $\sigma_A$ and $\sigma_B$ are unknown, the text gives an additional method for
comparing the population means. This method "pools" the values of $s_A$ and $s_B$
together.
You can ignore the method based on pooling because it has been shown in the statistics
literature that this method is unnecessary and doesn't lead to any better results than the
method described above for the case where $\sigma_A$ and $\sigma_B$ are unknown.
So, simply use one of the two methods ($\sigma_A$, $\sigma_B$ known or $\sigma_A$, $\sigma_B$ unknown) described in these
notes when using independent samples.
Example 9.2
The problem explicitly states that independent samples are used, but you could have seen
that by just noticing that the sample sizes nA = 17 and nB = 12 are different (i.e., the samples
couldn't possibly have been paired).
Since the population standard deviations are not given/known, we must use the t distribution
for conducting hypothesis tests and constructing confidence intervals.
9.1.2. Comparing two means using paired samples
Goal: Compare two population means $\mu_A$ and $\mu_B$ by taking one random sample of items and
measuring them under two different conditions, A and B. The basic idea behind this is that
many extraneous sources of variation in the population can be filtered out by pairing, which
then leaves a clearer picture of the true difference between the means.
For example, think of testing a new drug by measuring people's responses before (A) and
after (B) they take the drug. By comparing the i-th person's individual responses, xi versus yi
(before and after), all of the extraneous factors related to this individual's lifestyle are
automatically "filtered out" and the difference xi-yi only measures the actual response of that
person to the drug.
In general, pairing is a better thing to do (than independent samples) if pairing is physically
possible for the situation you are studying.
Data layout:
item #    sample A    sample B    difference (A - B)
1         x1          y1          d1
2         x2          y2          d2
3         x3          y3          d3
:         :           :           :
n         xn          yn          dn
Note: The two sample sizes must be equal since the same n items in the random sample are
being measured twice.
Statistics (sample mean and standard deviation) calculated from the differences, $d_i = x_i - y_i$:
Mean of the differences: $\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$
Standard deviation of the differences: $s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar{d})^2}$
The analysis then proceeds exactly as if you were doing single-sample inference for a mean
using the t distribution:
Statistic: $\bar{d}$
Standard error: $s_d/\sqrt{n}$ (n = the number of pairs)
Distribution: t (d.f. = n-1)
$(1-\alpha)100\%$ confidence interval estimate of $\mu_A - \mu_B$:
$\bar{d} \pm t_{\alpha/2}\, s_d/\sqrt{n}$ (d.f. = n-1)
Hypothesis test of $H_0: \mu_A - \mu_B = D_0$
Test statistic: $t = \dfrac{\bar{d} - D_0}{s_d/\sqrt{n}}$ (d.f. = n-1)
Example 9.3
(a) These samples are definitely "paired" because each car is measured twice, once for
shock absorber A and once for B.
Car #    Brand A           Brand B         difference (A-B)
         (manufacturer)    (competitor)
1        8.8               8.4             .4
2        10.5              10.1            .4
3        12.5              12.0            .5
4        9.7               9.3             .4
5        9.6               9.0             .6
6        13.2              13.0            .2
$\bar{d}$ = .416666, $s_d$ = .132916
Test of $H_0: \mu_A - \mu_B = 0$ versus a 2-sided alternative using $\alpha$ = 0.05:
$t = \dfrac{\bar{d} - 0}{s_d/\sqrt{n}} = \dfrac{.416666}{.132916/\sqrt{6}} = 7.6787$. The critical t value ($\alpha$ = .05) is
±t.025(d.f. = 6-1 = 5) = ±2.571. Since t = 7.6787 exceeds t.025 = 2.571,
we can conclude that the data show that there is a difference between the mean
strengths of the two brands of shock absorbers.
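What if you make a mistake in the beginning and think that these samples are independent?
$t = \dfrac{\bar{x} - \bar{y}}{\sqrt{s_A^2/n_A + s_B^2/n_B}} = \dfrac{10.7167 - 10.3}{\sqrt{3.0697/6 + 3.3040/6}} = 0.4043$.
Next,
$w_A = s_A^2/n_A = 3.0697/6 = 0.5116$ and $w_B = s_B^2/n_B = 3.3040/6 = 0.5507$
$\nu = \dfrac{(w_A + w_B)^2}{w_A^2/(n_A - 1) + w_B^2/(n_B - 1)} = \dfrac{(0.5116 + 0.5507)^2}{0.5116^2/5 + 0.5507^2/5} = 9.986$, which rounds down to
$\nu$ = 9. Therefore, the critical values are ±t.025(d.f. = 9) = ±2.262.
As you can quickly see, t = 0.4043 doesn't fall in either tail of the rejection region, so the
(false) conclusion would be that there is no difference between the two population means.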
The moral of this story: Mistakenly using the independent samples test (in those cases
when the paired samples test should be used) can lead to incorrect conclusions (so be careful
to correctly identify when to use the independent versus paired samples approach).
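The contrast between the two analyses of Example 9.3 can be checked with a short Python sketch. It uses only the shock absorber data from the table above and the standard library; the variable names are chosen for illustration.

```python
import math
import statistics

# Shock absorber data from Example 9.3 (one pair of measurements per car)
brand_a = [8.8, 10.5, 12.5, 9.7, 9.6, 13.2]
brand_b = [8.4, 10.1, 12.0, 9.3, 9.0, 13.0]
n = len(brand_a)

# Correct analysis: paired samples, one-sample t test on the differences
diffs = [a - b for a, b in zip(brand_a, brand_b)]
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)              # sample standard deviation (n - 1)
t_paired = d_bar / (s_d / math.sqrt(n))    # d.f. = n - 1 = 5

# Mistaken analysis: treating the two columns as independent samples
w_a = statistics.variance(brand_a) / n     # s_A^2 / n_A
w_b = statistics.variance(brand_b) / n     # s_B^2 / n_B
t_indep = (statistics.mean(brand_a) - statistics.mean(brand_b)) / math.sqrt(w_a + w_b)
nu = (w_a + w_b) ** 2 / (w_a ** 2 / (n - 1) + w_b ** 2 / (n - 1))

print(f"paired:      t = {t_paired:.4f}, d.f. = {n - 1}")
print(f"independent: t = {t_indep:.4f}, d.f. = {nu:.3f}")
```

The car-to-car variation inflates the standard error in the independent-samples calculation, which is why the same data give a much smaller t value there.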
9.2 INFERENCES ABOUT THE DIFFERENCES BETWEEN PROPORTIONS
Goal: Compare two population proportions $p_A$ and $p_B$ by comparing the sample proportions
$\hat{p}_A$ and $\hat{p}_B$ from two random samples, one taken from population A and the other from
population B.
Data layout:
Sample A                   Sample B
XA = # of "successes"      YB = # of "successes"
nA = sample size           nB = sample size
Note: The two sample sizes nA and nB do not have to be equal.
Statistics (sample proportions) calculated from data:
Sample A                   Sample B
$\hat{p}_A = X_A/n_A$      $\hat{p}_B = Y_B/n_B$
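The analysis proceeds a little differently depending on whether you are doing a confidence
interval or a hypothesis test:
Confidence interval:
Statistic: $\hat{p}_A - \hat{p}_B$
Standard error: $\sqrt{\hat{p}_A\hat{q}_A/n_A + \hat{p}_B\hat{q}_B/n_B}$
Hypothesis test:
Statistic: $\hat{p}_A - \hat{p}_B$
Standard error: $\sqrt{\hat{p}\hat{q}\,(1/n_A + 1/n_B)}$
Distribution: Z, in both cases.
Here $\hat{q}_A = 1 - \hat{p}_A$, $\hat{q}_B = 1 - \hat{p}_B$, $\hat{p} = \dfrac{X_A + Y_B}{n_A + n_B}$ and $\hat{q} = 1 - \hat{p}$.
Note: The text limits its hypothesis tests for proportions to the most common case, where $D_0$
is 0. The hypothesis-test standard error above is based on the assumption that $D_0$ = 0.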
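$(1-\alpha)100\%$ confidence interval estimate of $p_A - p_B$:
$(\hat{p}_A - \hat{p}_B) \pm z_{\alpha/2}\sqrt{\hat{p}_A\hat{q}_A/n_A + \hat{p}_B\hat{q}_B/n_B}$
Hypothesis test of $H_0: p_A - p_B = 0$
Test statistic: $z = \dfrac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p}\hat{q}\,(1/n_A + 1/n_B)}}$ where $\hat{p} = \dfrac{X_A + Y_B}{n_A + n_B}$ and $\hat{q} = 1 - \hat{p}$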
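The two-proportion test statistic and confidence interval can be sketched in Python as below. This is a minimal illustration, not a full procedure: the function names and the counts (60 of 100 versus 45 of 100) are made up for the sketch, and the normal quantile comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def two_proportion_z(x_a, n_a, x_b, n_b):
    """z statistic for H0: pA - pB = 0, using the pooled proportion."""
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)
    se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se0

def two_proportion_ci(x_a, n_a, x_b, n_b, conf=0.95):
    """Confidence interval for pA - pB, using the unpooled standard error."""
    p_a, p_b = x_a / n_a, x_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    diff = p_a - p_b
    return diff - z_crit * se, diff + z_crit * se

# Illustrative (made-up) counts of successes and sample sizes
z = two_proportion_z(60, 100, 45, 100)
low, high = two_proportion_ci(60, 100, 45, 100)
print(round(z, 3), round(low, 4), round(high, 4))
```

Note that the test uses the pooled standard error while the interval does not, matching the two columns of the analysis above.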
Sample size formulas for estimating $p_A - p_B$ or $\mu_A - \mu_B$
The method for finding the minimum necessary sample sizes, nA and nB, for estimating either $p_A - p_B$ or
$\mu_A - \mu_B$ is the same: set the desired margin of error, ME, that you are willing to accept equal to the
half-width of the confidence interval and solve for the sample sizes.
Since this will result in one equation with two unknowns (nA and nB), we usually have to
impose some other condition on the two sample sizes. One of the most frequently used
conditions is that one sample be a fixed (specified) multiple of the other. So, let us assume
that:
$n_A = r \cdot n_B$
where r is a constant that you specify in advance. For example, samples from population A
might be cheaper to obtain than samples from population B, so you might want to specify
that twice as many sampled items are taken from A as from B. In that case, you would use a
value of r = 2.
For estimating $\mu_A - \mu_B$:
Set $ME = z_{\alpha/2}\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}$. Then use the fact that $n_A = r \cdot n_B$ to write
$ME = z_{\alpha/2}\sqrt{\sigma_A^2/(r\,n_B) + \sigma_B^2/n_B}$ and solve for $n_B$. Therefore, we would use
samples of size:
$n_B = \dfrac{z_{\alpha/2}^2\,(\sigma_A^2/r + \sigma_B^2)}{ME^2}$ and $n_A = r\,n_B$
Note: the text only discusses the case of equal sample sizes (when r = 1).
For estimating $p_A - p_B$:
Set $ME = z_{\alpha/2}\sqrt{p_A q_A/n_A + p_B q_B/n_B}$, where $q_A = 1 - p_A$ and $q_B = 1 - p_B$.
Then use the fact that $n_A = r \cdot n_B$ to get $ME = z_{\alpha/2}\sqrt{p_A q_A/(r\,n_B) + p_B q_B/n_B}$
and solve for $n_B$.
Therefore, we would use samples of size:
$n_B = \dfrac{z_{\alpha/2}^2\,(p_A q_A/r + p_B q_B)}{ME^2}$ and $n_A = r\,n_B$
Notes: (a) The text only discusses the case where r = 1 (i.e., equal sample sizes).
(b) Also, to use this formula you have to first come up with reasonable guesses
(estimates or bounds) for $p_A$ and $p_B$.
(c) The most conservative (i.e., largest sample size) thing to do is use $p_A = p_B = .5$.
Otherwise, use upper (or lower) bounds on $p_A$ and $p_B$ if you know of some.
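The sample size calculations above can be sketched in Python as follows. The function names, the default 95% confidence level and the inputs are assumptions made for illustration; results are rounded up because sample sizes must be whole numbers.

```python
import math
from statistics import NormalDist

def sizes_for_means(sigma_a, sigma_b, me, r=1.0, conf=0.95):
    """Sample sizes (nA, nB) for estimating muA - muB within margin of error me."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)        # z_{alpha/2}
    n_b = math.ceil(z ** 2 * (sigma_a ** 2 / r + sigma_b ** 2) / me ** 2)
    return math.ceil(r * n_b), n_b

def sizes_for_proportions(p_a, p_b, me, r=1.0, conf=0.95):
    """Sample sizes (nA, nB) for estimating pA - pB within margin of error me."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_b = math.ceil(z ** 2 * (p_a * (1 - p_a) / r + p_b * (1 - p_b)) / me ** 2)
    return math.ceil(r * n_b), n_b

# Conservative planning values pA = pB = 0.5, margin of error 0.05
print(sizes_for_proportions(0.5, 0.5, 0.05))         # equal sample sizes (r = 1)
print(sizes_for_proportions(0.5, 0.5, 0.05, r=2))    # twice as many items from A
# Illustrative (made-up) planning values for two means
print(sizes_for_means(10, 10, 2))
```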
9.3 INFERENCES CONCERNING VARIANCES
The F distribution can be shown to be the appropriate probability model for the ratio of the
variances of two samples taken independently from the same normally distributed
population, with there being a different F distribution for every combination of the degrees
of freedom (df) associated with each sample. For each sample, df = n - 1. The statistic that is
used to test the null hypothesis that two population variances are equal is
$$F_{df_1,\,df_2} = \frac{s_1^2}{s_2^2}, \qquad df_1 = n_1 - 1,\; df_2 = n_2 - 1.$$
Since each sample variance is an unbiased estimator of the same population variance, the
long-run expected value of the above ratio is about 1.0. (Note: The expected value is not
exactly 1.0, but rather is $df_2/(df_2 - 2)$, for mathematical reasons that are outside of the scope
of this outline.) However, for any given pair of samples the sample variances are not likely
to be identical in value, even though the null hypothesis is true. Since this ratio is known to
follow an F distribution, this probability distribution can be used in conjunction with testing
the difference between two variances. Although a necessary mathematical assumption is that
the two populations are normally distributed, the F test has been demonstrated to be
relatively robust, and insensitive to departures from normality when each population is
unimodal and the sample sizes are about equal.
Example 9.4
For a random sample of n1 = 10 light bulbs, the mean bulb life is $\bar{x}_1$ = 4000 hours, with s1 = 200.
For another brand of bulb whose useful life is also assumed to be normally distributed, a random
sample of n2 = 8 has a sample mean of $\bar{x}_2$ = 4300 hours and a sample standard deviation of
s2 = 250. Test the null hypothesis that the samples were obtained from populations with equal
variances, using the 10 percent level of significance for the test, by use of the F distribution:
$H_0: \sigma_1^2 = \sigma_2^2$ versus $H_1: \sigma_1^2 \neq \sigma_2^2$
$F_{9,7} = \dfrac{s_1^2}{s_2^2} = \dfrac{200^2}{250^2} = \dfrac{40000}{62500} = 0.64$
For the test at the 10 percent level of significance, the upper 5 percent point for F and the
lower 5 percent point for F are the critical values:
$F_{9,7}$ (upper 5%) = 3.68
$F_{9,7}$ (lower 5%) = $1/F_{7,9}$ (upper 5%) = 1/3.29 = 0.304
Since the computed F ratio is neither smaller than 0.304 nor larger than 3.68, it is in the
region of acceptance of the null hypothesis. Thus, the assumption that the variances of the
two populations are equal cannot be rejected at the 10 percent level of significance.
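The arithmetic of Example 9.4 can be reproduced with a few lines of Python. The critical values are typed in by hand: 3.68 is the value quoted in the example, and 3.29 is assumed here to be the tabulated upper 5 percent point of F with (7, 9) degrees of freedom, whose reciprocal gives the lower critical value 0.304.

```python
# Sample standard deviations and sample sizes from Example 9.4
s1, n1 = 200.0, 10
s2, n2 = 250.0, 8

f_ratio = s1 ** 2 / s2 ** 2        # test statistic F = s1^2 / s2^2
df1, df2 = n1 - 1, n2 - 1          # numerator and denominator d.f.

f_upper = 3.68                     # upper 5% point of F(9, 7), as quoted in the example
f_lower = 1 / 3.29                 # lower 5% point of F(9, 7) = 1 / upper 5% point of F(7, 9)

reject_h0 = f_ratio < f_lower or f_ratio > f_upper
print(f"F({df1}, {df2}) = {f_ratio:.2f}, reject H0: {reject_h0}")
```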
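CHAPTER SUMMARY
 Confidence interval estimate of $\mu_A - \mu_B$:
$(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}$ ($\sigma_A$, $\sigma_B$ known) or $(\bar{x} - \bar{y}) \pm t_{\alpha/2}\sqrt{s_A^2/n_A + s_B^2/n_B}$ ($\sigma_A$, $\sigma_B$ unknown)
 $(1-\alpha)100\%$ confidence interval estimate of $p_A - p_B$:
$(\hat{p}_A - \hat{p}_B) \pm z_{\alpha/2}\sqrt{\hat{p}_A\hat{q}_A/n_A + \hat{p}_B\hat{q}_B/n_B}$
 Hypothesis test of $H_0: p_A - p_B = 0$
Test statistic: $z = \dfrac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p}\hat{q}\,(1/n_A + 1/n_B)}}$ where $\hat{p} = \dfrac{X_A + Y_B}{n_A + n_B}$ and $\hat{q} = 1 - \hat{p}$
 The method for finding the minimum necessary sample sizes, nA and nB, for estimating either $p_A - p_B$ or
$\mu_A - \mu_B$ is the same: set the desired margin of error, ME, that you are willing to accept equal to the
half-width of the confidence interval and solve for the sample sizes.
 The statistic that is used to test the null hypothesis that two population variances are
equal is $F = s_1^2/s_2^2$ with $df_1 = n_1 - 1$ and $df_2 = n_2 - 1$.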
1. There are two populations. A sample of size 120 from one of the populations gave a mean
of 15 and a standard deviation of 1.3. A sample of size 88 from the other population gave a
mean of 13.5 and a standard deviation of 1.5. Find a 98% confidence interval for the
difference between the population means. Given sample information:
2. The average length of twenty trout caught in a lake was 10.8 inches with a standard
deviation of 2.3 inches, and the average length of fifteen trout caught in another lake was
9.7 inches with a standard deviation of 1.5 inches. Construct a 90 percent confidence
interval for the difference in the true mean lengths of trout in the two lakes.
$\bar{x}_1$ = 10.8, $s_1$ = 2.3, $n_1$ = 20
$\bar{x}_2$ = 9.7, $s_2$ = 1.5, $n_2$ = 15
Confidence level: .90; pooled variance: yes
3. A farmer tried Feed A on 256 cattle and Feed B on 144 cattle. The mean weight of cattle
given Feed A was found to be 1350 pounds with a standard deviation of 180 pounds. On
the other hand, the mean weight of the cattle given Feed B was found to be 1430 pounds
with a standard deviation of 210 pounds. At the 5 percent level of significance, is Feed B
significantly better than Feed A? Find the p-value.
4. At a certain university, twelve voters were picked at random from those who are in favor
of impeachment of the president, and ten were selected at random from those who are
against. The following table gives their ages.
In favor 27 34 28 30 29 50 30 44 29 32 41 35
Against 31 36 43 40 32 48 30 29 42 49
At a 10% level of significance, is it true that the mean age of those in favor of impeachment
differs significantly from the mean age of those against? Use a two-sample t-test with pooled
variance.
$H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \neq 0$
5. Paired t-test.
Dr. Williams claims that the special diet that he recommends significantly reduces weight. A
sample of eight persons was selected and they were put on the diet for a period of 6 weeks.
The table below shows the weights (in pounds) of those eight persons before and after dieting.
Before 182 180 195 178 177 221 198 208
After 168 183 187 169 161 204 194 196
a) Construct a 99% confidence interval for the mean difference $\mu_d$ in weight before
and after using the diet recommended by Dr. Williams. Use the paired-difference
interval $\bar{d} \pm t_{\alpha/2}\, s_d/\sqrt{n}$.
b) Using a 1% level of significance, can you conclude that the mean weight loss for
all persons due to this special diet is greater than zero?
6. In a study to estimate the proportion of residences in a certain city and its suburbs that
subscribe to a certain magazine, it is found that 63 of 120 urban residences subscribe
while only 34 of 125 suburban residences subscribe. Find a 90% confidence interval for
the difference in the proportion of urban and suburban residences that subscribe to writer's
digest.
7. A jar containing 130 mosquitoes was sprayed with an insecticide of Brand A and it was
found that 98 of them were killed. When another jar containing 150 mosquitoes of the same
type was sprayed with Brand B, 120 of them were killed. At the 2 percent level of
significance, do the two brands differ in their effectiveness?
$H_0: p_1 = p_2$, $H_a: p_1 \neq p_2$ (two-tailed test)
CHAPTER 10
SIMPLE LINEAR REGRESSION AND CORRELATION
CONTENTS
10.1. SIMPLE LINEAR REGRESSION (REGRESSION OF Y ON X)
10.2. THE COVARIANCE AND THE CORRELATION COEFFICIENT
10.3. THE RANK CORRELATION COEFFICIENT
INTRODUCTION
Most of the analyses discussed in the previous chapters deal with the one-variable case.
Sometimes, however, we are interested in determining the degree of relationship between two or
more variables, and even in estimating by how much one variable changes when a related
variable changes by one unit. Regression and correlation analysis are used to study
relationships among variables. This chapter introduces you to these and related issues.
Objectives
After completion of this chapter students will be able to:
 Explain the meaning of regression
 Explain the meaning of correlation
 Draw a scatter diagram to identify the type of relationship that exists between
variables
 Differentiate between dependent and independent variables
 Compute and interpret the regression coefficients
 Compute and interpret the coefficient of linear regression
10.1 SIMPLE LINEAR REGRESSION
Linear regression and correlation study and measure the linear relationship among
two or more variables. When only two variables are involved, the analysis is referred to as
simple correlation and simple linear regression analysis, and when there are more than two
variables the terms multiple regression and partial correlation are used.
10.1.1 DEFINITION
Regression Analysis: is a statistical technique that can be used to develop a mathematical
equation showing how variables are related.
Correlation Analysis: deals with the measurement of the closeness of the relationship
which is described in the regression equation.
We say there is correlation when the two series of items vary together directly or inversely.
In simple linear regression analysis, two variables are under study: one independent and one
dependent.
i) The independent (explanatory) variable
A variable whose value is used to estimate the value of the dependent variable. It is
denoted by X.
ii) The dependent (response) variable
A variable whose value is estimated from the independent variable. It is denoted by Y.
10.1.2 FITTING LINEAR REGRESSION BY LEAST SQUARES METHOD
Regression equation of Y on X and X on Y
The simple linear regression model of Y on X is given by
$$Y_i = B_0 + B_1 X_i + \varepsilon_i$$
where $B_0$ and $B_1$ represent the intercept and the slope (they are called parameters, or
regression coefficients) and
$\varepsilon_i$ is the random error term.
The random error term, $\varepsilon_i$, is included in the model to represent the following two
phenomena.
1. Missing or omitted variables
2. Random variation
Assumptions
1. The random error term, $\varepsilon_i$, has a mean equal to zero.
2. The errors associated with different observations are independent.
3. For any given $X_i$, the distribution of errors is normal.
4. The distribution of population errors for each X has the same (constant) standard deviation,
which is denoted by $\sigma$.
Note: Taken together, these assumptions say that the errors are independent with $\varepsilon_i \sim N(0, \sigma^2)$.
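Estimation of the regression coefficients
One of the methods that help us to find the estimates of $B_0$ and $B_1$ is the least squares method
(or ordinary least squares method, OLS).
The resulting estimates of $B_0$ and $B_1$, denoted by $\hat{B}_0$ and $\hat{B}_1$ respectively, are called the least
squares estimates.
Note: This method gives the values $\hat{B}_0$ and $\hat{B}_1$ such that the sum of squared errors is
minimum, i.e. we minimize
$$SSE = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{B}_0 - \hat{B}_1 X_i)^2 \quad \text{(SS Residual)}$$
where $Y_i$ = the actual value
$\hat{Y}_i = \hat{B}_0 + \hat{B}_1 X_i$ = the estimated value
To minimize SSE we take the partial derivatives with respect to $\hat{B}_0$ and $\hat{B}_1$ and get the values of
$\hat{B}_0$ and $\hat{B}_1$ by equating the derivatives to zero, i.e.
$$\frac{\partial SSE}{\partial \hat{B}_0} = -2\sum (Y_i - \hat{B}_0 - \hat{B}_1 X_i) = 0, \qquad \frac{\partial SSE}{\partial \hat{B}_1} = -2\sum X_i (Y_i - \hat{B}_0 - \hat{B}_1 X_i) = 0$$
These lead to the following normal equations:
$$\sum Y_i = n\hat{B}_0 + \hat{B}_1 \sum X_i$$
$$\sum X_i Y_i = \hat{B}_0 \sum X_i + \hat{B}_1 \sum X_i^2$$
When we solve the two equations, we get the least squares values of $B_0$ and $B_1$:
$$\hat{B}_1 = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - (\sum X_i)^2}, \qquad \hat{B}_0 = \bar{Y} - \hat{B}_1 \bar{X}$$
Then the estimated regression line (the least squares regression line) will be given by
$$\hat{Y} = \hat{B}_0 + \hat{B}_1 X$$
which is sometimes called the regression of Y on X.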
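The least squares computation can be sketched in Python as follows. The function name `least_squares` and the five data points are made up for illustration; the slope and intercept are computed from the sums that appear in the normal equations.

```python
def least_squares(xs, ys):
    """Least squares intercept and slope for the regression of Y on X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope from solving the two normal equations
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Intercept: the fitted line passes through (x-bar, y-bar)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# Illustrative (made-up) data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares(xs, ys)
print(f"Y-hat = {b0:.2f} + {b1:.2f} X")

# The residuals of a least squares fit sum to zero (first normal equation)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
```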
Interpretation of $\hat{B}_0$ and $\hat{B}_1$
The value of $\hat{B}_0$ gives the predicted or the mean value of Y for X = 0. The value of $\hat{B}_1$ gives
the average change in Y (dependent variable) due to a change of one unit in X (independent
variable).
Example 10.1
Find the least squares regression line for the data on the final marks (Y) and the number of
hours spent on studying (X).
Xi    8    5   13   10    6   18   15    2    9   11
Yi   65   94   72   70   54   90   85   33   56   29
Solution
The least squares regression line is $\hat{Y} = 29.88 + 3.59X$.
 $\hat{B}_1 = 3.59$: on average, the final mark of a student increases by 3.59 for a one
hour increase in the number of hours spent on studying.
 $\hat{B}_0 = 29.88$ indicates the expected mark of a student who spent zero hours
on studying.
The least squares regression model is given by $Y_i = 29.88 + 3.59X_i + e_i$, where $e_i$ is the residual.
10.2 THE COVARIANCE AND CORRELATION COEFFICIENT
CORRELATION
Correlation is a statistical method used to determine whether a relationship between
variables exists.
MEASURING SIMPLE LINEAR CORRELATION
SCATTER DIAGRAM
- is a graph that portrays the relationship between two variables.
[Figure: scatter diagrams illustrating a strong +ve linear relationship, a weak +ve linear
relationship, a –ve linear relationship and a weak –ve linear relationship]
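COVARIANCE
If (X1, Y1), (X2, Y2), …, (Xn, Yn) are n pairs of observations of the variables X and Y in a
bivariate distribution, then
$$s_{xy} = \mathrm{Cov}(X, Y) = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$
and, equivalently,
$$s_{xy} = \frac{1}{n-1}\left(\sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n}\right)$$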
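COEFFICIENT OF CORRELATION (PEARSON CORRELATION COEFFICIENT)
This is a numerical measure of the strength and direction of the linear relationship between two
variables. The symbol for the sample correlation coefficient is r and that of the population
correlation coefficient is $\rho$ (rho).
The sample correlation r is calculated as
$$r = \frac{s_{xy}}{s_x s_y} = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - (\sum X_i)^2\right]\left[n\sum Y_i^2 - (\sum Y_i)^2\right]}}$$
where $s_{xy}$ is the sample covariance and $s_x$, $s_y$ are the sample standard deviations of X and Y.
Example 10.2
Calculate the correlation coefficient for Example 10.1 above.
Solution:
Since r is positive and close to 1, it indicates that there is a strong positive linear relationship
between the number of hours spent on studying and the final marks.
(i.e. r lies in the range $-1 \le r \le 1$: if r = -1, there is a perfect negative linear relationship; if r = 1,
there is a perfect positive linear relationship; and if r = 0, there is no linear relationship between the
dependent variable Y and the independent variable X.)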
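A minimal Python sketch of the covariance and correlation computations is given below. The function names and the five data points are made up for illustration, and the covariance uses the n-1 divisor, which is an assumption consistent with the sample standard deviations used elsewhere in these notes.

```python
import math

def sample_covariance(xs, ys):
    """Sample covariance of paired observations, using the n - 1 divisor."""
    n = len(xs)
    return (sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n) / (n - 1)

def pearson_r(xs, ys):
    """Pearson correlation coefficient from the computational (sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

# Illustrative (made-up) data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
cov_xy = sample_covariance(xs, ys)
r = pearson_r(xs, ys)
print(round(cov_xy, 4), round(r, 4))
```

The same r is obtained by dividing the covariance by the product of the two sample standard deviations, since the divisors cancel.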
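RELATIONSHIPS AMONG REGRESSION SLOPES, CORRELATION COEFFICIENT, COVARIANCE & VARIANCE
Regression coefficient of X on Y ($b_{XY}$):
i) $b_{XY} = \dfrac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum Y_i^2 - (\sum Y_i)^2}$
ii) $b_{XY} = \dfrac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(Y)}$
iii) $b_{XY} = r\,\dfrac{s_X}{s_Y}$
Regression coefficient of Y on X ($b_{YX}$):
i) $b_{YX} = \dfrac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - (\sum X_i)^2}$
ii) $b_{YX} = \dfrac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}$
iii) $b_{YX} = r\,\dfrac{s_Y}{s_X}$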
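Example 10.3
Two variables have the regression lines with equations:
3X + 2Y = 26 and 6X + Y = 31
Calculate i) the correlation coefficient between X and Y
ii) the standard deviation of Y if the variance of X is 25
Solution:
Let us suppose that
3X + 2Y = 26 ….(*)
6X + Y = 31 ……(**)
are the lines of regression of Y on X and X on Y respectively.
From (*), $Y = 13 - \frac{3}{2}X$, so $b_{YX} = -\frac{3}{2}$. From (**), $X = \frac{31}{6} - \frac{1}{6}Y$, so $b_{XY} = -\frac{1}{6}$. Hence
$r^2 = b_{YX} \cdot b_{XY} = \left(-\frac{3}{2}\right)\left(-\frac{1}{6}\right) = \frac{1}{4}$, so $r = \pm 0.5$.
$r^2$ is called the coefficient of determination, which is a better indicator of the strength of a
relationship than the correlation coefficient. It is better because it identifies the percentage of
variation of the dependent variable that is directly attributable to the variation of the
independent variable.
But since both the regression coefficients are negative,
i) r = -0.5
(Since $r^2 < 1$, our assumption that (*) and (**) are the lines of regression of Y on X and X on Y
respectively is true.)
ii) $b_{YX} = r\,\dfrac{\sigma_Y}{\sigma_X}$, where $b_{YX} = B_1$ is the slope of the regression of Y on X. With $\sigma_X = \sqrt{25} = 5$,
$-\frac{3}{2} = -0.5 \cdot \dfrac{\sigma_Y}{5}$, so $\sigma_Y = 15$.
Similarly, the coefficient of regression of X on Y indicates the change in the value of
variable X corresponding to a unit change in the value of variable Y and is $b_{XY} = r\,\dfrac{\sigma_X}{\sigma_Y}$.
The correlation coefficient is the geometric mean of the two regression coefficients:
$r = \pm\sqrt{b_{XY} \cdot b_{YX}}$, taking the common sign of the regression coefficients.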
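The reasoning of Example 10.3 can be checked numerically with a short Python sketch. It assumes, as in the solution, that 3X + 2Y = 26 is the regression of Y on X and 6X + Y = 31 is the regression of X on Y; the variable names are chosen for illustration.

```python
import math

# Slopes read off the two regression lines of Example 10.3
b_yx = -3 / 2     # 3X + 2Y = 26  ->  Y = 13 - 1.5 X      (Y on X)
b_xy = -1 / 6     # 6X + Y = 31   ->  X = 31/6 - (1/6) Y  (X on Y)

r_squared = b_yx * b_xy                          # must not exceed 1
# r is the geometric mean of the slopes and carries their common sign
r = math.copysign(math.sqrt(r_squared), b_yx)

sigma_x = math.sqrt(25)                          # variance of X is 25
sigma_y = b_yx * sigma_x / r                     # from b_yx = r * sigma_y / sigma_x

# Swapping the roles of the two lines gives slopes -6 and -2/3, whose product exceeds 1
swapped_r_squared = (-6) * (-2 / 3)

print(r_squared, r, sigma_y, swapped_r_squared)
```

The swapped assignment gives a product greater than 1, which is impossible for a squared correlation, so only the assignment used in the solution is admissible.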
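To show the relationship, start from the computational formulas:
Y on X: $b_{YX} = \dfrac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - (\sum X_i)^2}$ ..........(1)
X on Y: $b_{XY} = \dfrac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum Y_i^2 - (\sum Y_i)^2}$ ..........(2)
$r = \dfrac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - (\sum X_i)^2\right]\left[n\sum Y_i^2 - (\sum Y_i)^2\right]}}$ ..........(3)
The sample variances and covariance can be written in terms of the same sums:
$\mathrm{Var}(X) = \dfrac{n\sum X_i^2 - (\sum X_i)^2}{n(n-1)}$ ..........(4)
$\mathrm{Cov}(X, Y) = \dfrac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n(n-1)}$ ..........(5)
$\mathrm{Var}(Y) = \dfrac{n\sum Y_i^2 - (\sum Y_i)^2}{n(n-1)}$ ..........(6)
Using equations 4 and 5 in equation 1, we get: $b_{YX} = \dfrac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}$
Using equations 5 and 6 in equation 2, we get: $b_{XY} = \dfrac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(Y)}$
Using equations 4, 5 and 6 in equation 3, we get: $r = \dfrac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$
Hence $b_{YX} \cdot b_{XY} = \dfrac{\mathrm{Cov}(X, Y)^2}{\mathrm{Var}(X)\,\mathrm{Var}(Y)} = r^2$.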
10.3. THE RANK CORRELATION COEFFICIENT
We calculate what is called Spearman's rank correlation coefficient as follows:
Steps
i. Rank the different items in X and Y.
ii. Find the difference of the ranks in each pair; denote them by Di.
iii. Use the following formula:
$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}$$
where
D = the difference between paired ranks
n = the number of pairs
Example 10.4
Aster and Almaz were asked to rank 7 different types of lipsticks. See if there is correlation
between the tastes of the ladies.
Lipsticks   A   B   C   D   E   F   G
Aster       2   1   4   3   5   7   6
Almaz       1   3   2   4   5   6   7
Solution:
RX          2   1   4   3   5   7   6   Total
RY          1   3   2   4   5   6   7
D=RX-RY     1  -2   2  -1   0   1  -1
D2          1   4   4   1   0   1   1   12
$r_s = 1 - \dfrac{6(12)}{7(7^2 - 1)} = 1 - \dfrac{72}{336} = 0.786$
Yes, there is positive correlation. (i.e. $r_s$ lies in the range $-1 \le r_s \le 1$: if $r_s$ = -1, there is perfect
negative correlation; if $r_s$ = 1, there is perfect positive correlation; and if it is 0, there is no correlation.)
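The rank correlation of Example 10.4 can be computed with a short Python sketch. The function name is made up for illustration, and the formula used assumes the ranks contain no ties, as is the case in this example.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rank correlation from two lists of ranks (no ties assumed)."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks given to lipsticks A to G in Example 10.4
aster = [2, 1, 4, 3, 5, 7, 6]
almaz = [1, 3, 2, 4, 5, 6, 7]

sum_d2 = sum((a - b) ** 2 for a, b in zip(aster, almaz))
r_s = spearman_from_ranks(aster, almaz)
print(sum_d2, round(r_s, 4))
```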
CHAPTER SUMMARY
Many relationships among variables exist in the real world. One way to determine whether a
relationship exists is to use the statistical techniques known as correlation and regression.
The strength and direction of the relationship is measured by the value of the correlation
coefficient. It can assume values between and including -1 and +1.
The closer the coefficient is to +1 or -1, the stronger the relationship is between the variables.
A value of +1 or -1 indicates a perfect relationship. A positive relationship between two
variables means that for small values of the independent variable, the values of the
dependent variable will be small, and for large values of the independent variable, the
values of the dependent variable will be large.
A negative relationship between two variables means that for small values of the
independent variable the values of the dependent variable will be large, and for large
values of the independent variable the values of the dependent variable will be small.
A relationship can be linear or curvilinear. To determine the shape, one draws a scatter plot of
the variables. If the relationship is linear, the data can be approximated by a straight line
called the regression line or the line of best fit.
1. A study was reported in a medical journal suggesting that the peak heart rate an
individual can reach during intensive exercise decreases with age. A cardiologist
wanted to do his own study. Individuals ran on a treadmill at 6 miles per hour, and their
ages and heart rates were recorded as follows.
Age(X) 30 30 40 20 20 45 30 45 50
Heart rate(Y) 190 180 180 200 195 170 180 175 165
a) Find the least square regression of Y on X.
b) For an 80 years old man, what peak heart rate do you predict?
c) Calculate the Pearsonian coefficient of correlation.
2. Given the following data:
, , , , 2 =775, n=100.
Based on the above data find
a) The two lines of regression. b) The sample variances of X and Y.
c) The sample covariance
d) The Karl Pearson’s coefficient of correlation and interpret your result.
3. The equations of two regression lines between two variables are expressed as:
6x + y = 31 and 3x + 2y = 26.
a) Identify which of the two can be called regression of Y on X and which of X on Y.
b) Find: the most probable value of Y when x = 5, , , and r.
c) If , find and .
APPENDICES
APPENDIX A
CUMULATIVE AREA OF THE STANDARD NORMAL CURVE from 0 to z
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0754
0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879
0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
0.6 .2257 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2517 .2549
0.7 .2580 .2611 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852
0.8 .2881 .2910 .2939 .2967 .2995 .3023 .3051 .3078 .3106 .3133
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389
1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319
1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767
2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936
2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986
3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990
3.1 .4990 .4991 .4991 .4991 .4992 .4992 .4992 .4992 .4993 .4993
3.2 .4993 .4993 .4994 .4994 .4994 .4994 .4994 .4995 .4995 .4995
3.3 .4995 .4995 .4995 .4996 .4996 .4996 .4996 .4996 .4996 .4997
3.4 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4998
3.5 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998
3.6 .4998 .4998 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.7 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.8 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.9 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000
APPENDIX B
CUMULATIVE AREA OF THE Student-t CURVE WITH DEGREES OF FREEDOM n-1
The t-Distribution
d.f. t.100 t.050 t.025 t.010 t.005 d.f.
1 3.078 6.314 12.706 31.821 63.657 1
2 1.886 2.920 4.303 6.965 9.925 2
3 1.638 2.353 3.182 4.541 5.841 3
4 1.533 2.132 2.776 3.747 4.604 4
5 1.476 2.015 2.571 3.365 4.032 5
6 1.440 1.943 2.447 3.143 3.707 6
7 1.415 1.895 2.365 2.998 3.499 7
8 1.397 1.860 2.306 2.896 3.355 8
9 1.383 1.833 2.262 2.821 3.250 9
10 1.372 1.812 2.228 2.764 3.169 10
11 1.363 1.796 2.201 2.718 3.106 11
12 1.356 1.782 2.179 2.681 3.055 12
13 1.350 1.771 2.160 2.650 3.012 13
14 1.345 1.761 2.145 2.624 2.977 14
15 1.341 1.753 2.131 2.602 2.947 15
16 1.337 1.746 2.120 2.583 2.921 16

17 1.333 1.740 2.110 2.567 2.898 17
18 1.330 1.734 2.101 2.552 2.878 18
19 1.328 1.729 2.093 2.539 2.861 19
20 1.325 1.725 2.086 2.528 2.845 20
21 1.323 1.721 2.080 2.518 2.831 21

22 1.321 1.717 2.074 2.508 2.819 22
23 1.319 1.714 2.069 2.500 2.807 23
24 1.318 1.711 2.064 2.492 2.797 24
25 1.316 1.708 2.060 2.485 2.787 25
26 1.315 1.706 2.056 2.479 2.779 26
27 1.314 1.703 2.052 2.473 2.771 27
28 1.313 1.701 2.048 2.467 2.763 28
29 1.311 1.699 2.045 2.462 2.756 29
∞ 1.282 1.645 1.960 2.326 2.576 ∞
APPENDIX C
CUMULATIVE AREA OF RIGHT TAIL AREAS FOR THE CHI-SQUARE
DISTRIBUTION WITH N-1 DEGREES OF FREEDOM
Right tail areas for the Chi-Square Distribution
d.f. .050 .025 .010 .005 d.f.
1 3.841 5.024 6.635 7.879 1
2 5.991 7.378 9.210 10.597 2
3 7.815 9.348 11.345 12.838 3
4 9.488 11.143 13.277 14.860 4
5 11.070 12.832 15.086 16.750 5
6 12.592 14.449 16.812 18.548 6
7 14.067 16.013 18.475 20.278 7
8 15.507 17.535 20.090 21.955 8
9 16.919 19.023 21.666 23.589 9
10 18.307 20.483 23.209 25.188 10
11 19.675 21.920 24.725 26.757 11
12 21.026 23.337 26.217 28.300 12
13 22.362 24.736 27.688 29.819 13
14 23.685 26.119 29.141 31.319 14
15 24.996 27.488 30.578 32.801 15
16 26.296 28.845 32.000 34.267 16
17 27.587 30.191 33.409 35.718 17
18 28.869 31.526 34.805 37.156 18
19 30.144 32.852 36.191 38.582 19
20 31.410 34.170 37.566 39.997 20
21 32.671 35.479 38.932 41.401 21
22 33.924 36.781 40.289 42.796 22
23 35.172 38.076 41.638 44.181 23
24 36.415 39.364 42.980 45.558 24
25 37.652 40.646 44.314 46.928 25
26 38.885 41.923 45.642 48.290 26
27 40.113 43.194 46.963 49.645 27
28 41.337 44.461 48.278 50.993 28
29 42.557 45.722 49.588 52.336 29
30 43.773 46.979 50.892 53.672 30
Prepared by Big Bang, August, 2017 GC
