Statistics For Economics Module Teaching


STATISTICS FOR ECONOMICS

Statistics for Economists

ODA BULTUM UNIVERSITY


COLLEGE OF BUSINESS AND ECONOMICS
DEPARTMENT OF ECONOMICS

STATISTICS FOR ECONOMICS MODULE


TEACHING MATERIAL FOR UNDERGRADUATE
ECONOMICS STUDENTS IN ODA BULTUM UNIVERSITY
COLLEGE OF BUSINESS AND ECONOMICS
DEPARTMENT OF ECONOMICS

Prepared
By:
Abdella Mohammed Ahmed (M.Sc.)

JULY, 2024
CHIRO, ETHIOPIA




CHAPTER ONE
INTRODUCTION
This chapter introduces basic concepts of statistics, such as the definition of statistics and the two broad divisions of statistics: descriptive and inferential statistics. It also deals with the importance of statistics in general and its application in economics in particular. Basic statistical terms and concepts are also explained here so that students will become familiar with them and be able to understand the subsequent chapters without difficulty.
Objectives:
After this chapter, students will be able to:
- Define statistics
- Identify the two divisions of statistics
- Understand the meaning of some basic statistical terms and concepts
- List applications of statistics in the field of economics
- Identify reasons why we use samples instead of a complete enumeration of the entire population (a census)

1.1 What is Statistics?


Statistics is the science that deals with the methods of collecting, organizing and analyzing data and interpreting the results, which often leads to the drawing of conclusions. It comprises not only scientific methods for collecting, organizing, summarizing, presenting and analyzing data, but also methods for drawing valid conclusions and making reasonable decisions on the basis of such analysis. Statistics is the art of learning from data.
In a narrower sense, the term statistics is used to denote the data themselves or numerical summaries derived from the data, such as the mean, median, mode, range, etc. Thus, we speak of employment statistics, accident statistics, and so on.

Statistics is also used to mean either statistical data or statistical method. When it is used in
the sense of statistical data, it refers to quantitative aspects of things, and is a numerical
description. The other aspect of statistics is as a body of theories and techniques employed
in analyzing the numerical information and using it to make wise decisions.


The science of statistics is very essential for research and decision making processes in all
aspects of human life. The primary purpose of statistics is to provide information for the
decision-making process. That is why statistics is called a partner in decision making.

1.2 Classification of Statistics


Statistics can be classified into two divisions: descriptive statistics and inferential statistics.
1. Descriptive statistics refers to the procedures used to organize and summarize masses of data. Examples include frequency distributions, measures of central tendency (the mean, median and mode), and measures of dispersion such as the range, variance and standard deviation.
2. Inferential statistics is the area of statistics in which conclusions about a large body of data (the population) are reached by examining only part of those data (a sample). Inferential statistics includes the methods used to find out something about the population based on a sample.

1.3 Function of Statistics


- Statistics presents facts in a definite form: statistics uses numerical data, and conclusions stated numerically are definite and hence more convincing than conclusions stated qualitatively.
- Statistics simplifies complex masses of data: complex data may be reduced to totals, averages, percentages, etc. and presented either graphically or diagrammatically using different statistical procedures.
- Statistics classifies numerical facts: classifying data into two or more groups helps us to understand the basic characteristics of the data.
- Statistics furnishes a technique of comparison: certain facts, by themselves, may be meaningless unless they can be compared with similar facts at other places or at other periods of time.


1.4 Application of Statistics (in Economics)


Statistical data and methods of statistical analysis render valuable assistance in the proper understanding of economic problems and in the formulation of economic policy. Economic problems almost always involve facts that can be expressed numerically.
Statistical methods are extensively used in all branches of economics:
i. Time-series analysis is used for studying the behavior of prices, production and consumption of commodities.
ii. Index numbers are useful in economic planning as they indicate the change in economic variables over a specified period of time.
iii. Demand analysis is used to study the relationship between the price of a commodity and the quantity demanded (or supplied).
iv. Statistical methods are useful for forecasting economic variables based on existing data.

1.5 Basic concepts


These are some basic concepts or terms in statistics that we usually come across in most statistics courses and in econometrics. They are:
Population: the totality of the observations with which we are concerned, i.e., the collection of all individuals, items or data within the scope of the study.
Sample: a subset of a population. The members of a sample may be selected using either probability or non-probability sampling techniques.
Parameter: a descriptive measure of the population that has a fixed value. Examples are the population mean and the population standard deviation.
Statistic: a descriptive measure of a sample; it takes different values when different samples are taken.
Census: the set of measurements from the whole population (a measurement of each item in the population). A census is a complete enumeration of the entire population.


Sampling error: the difference between a population value (parameter) and the corresponding sample value (statistic). Sampling error occurs when the sample does not perfectly represent the population from which it was selected.
Sample size: the number of measurements to be collected in the sample.
Sampling frame: the list of population elements from which the sample will be drawn.
Let us illustrate the above basic concepts using the following data.
Suppose there are only 10 students in group 1 (Economics department). We want to know the academic performance of the group. The variable that we are most interested in is the grade point average (GPA). We collected the following data from the whole population (10 students), i.e., a census.

Students(N) 1 2 3 4 5 6 7 8 9 10
GPA(X) 2.0 2.7 3.2 1.0 2.5 3.5 1.5 3.0 2.5 2.8

Population mean (µ) = ΣX/N = 24.7/10 = 2.47 (parameter)
For a number of reasons we may not be able to collect data from the whole population; in that case we collect information from a sample using appropriate sampling techniques.
Let us take three samples (sample 1, sample 2 and sample 3), each with sample size four (n = 4).

Sample1(X1) Sample2(X2) Sample3(X3)


1.0 2.0 1.5
2.5 2.5 2.0
2.7 3.0 1.0
3.5 2.8 2.8


ΣX1 = 9.7,   ΣX2 = 10.3,   ΣX3 = 7.3

Sample mean (X̄) = statistic
X̄1 = ΣX1/n1 = 9.7/4 = 2.425,   X̄2 = ΣX2/n2 = 10.3/4 = 2.575,   X̄3 = ΣX3/n3 = 7.3/4 = 1.825
From the above figures you can see that there is a difference between the parameter (2.47) and the sample means (statistics = 2.425, 2.575, 1.825). This difference is said to be sampling error. In any statistical analysis our interest is to reduce, if not eliminate, sampling error.
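The arithmetic above is easy to verify with a few lines of code. The following Python sketch (illustrative only; it is not part of the original module) computes the parameter from the census data and the three sample statistics, and reports each sampling error as the gap between them.

population_gpa = [2.0, 2.7, 3.2, 1.0, 2.5, 3.5, 1.5, 3.0, 2.5, 2.8]

# Parameter: the population mean, computed from a census of all 10 students.
mu = sum(population_gpa) / len(population_gpa)

# Statistics: the means of the three samples of size n = 4 listed above.
samples = {
    "sample 1": [1.0, 2.5, 2.7, 3.5],
    "sample 2": [2.0, 2.5, 3.0, 2.8],
    "sample 3": [1.5, 2.0, 1.0, 2.8],
}

print(f"population mean (parameter) = {mu:.3f}")
for name, sample in samples.items():
    x_bar = sum(sample) / len(sample)      # sample mean (statistic)
    sampling_error = x_bar - mu            # statistic minus parameter
    print(f"{name}: mean = {x_bar:.3f}, sampling error = {sampling_error:+.3f}")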
1.6 Reasons for Sampling
Even if a decision maker can take a census, there are often reasons to sample. The reasons
fall into the following categories:
1. Time constraint: the major advantage of sampling is that it is much faster than
taking a census.
2. Cost constraint: the cost of taking a census is greater than that of a sample.
3. Improved accuracy: the result of a sample may be more accurate than the result of a census. A person gathering data from fewer sources tends to be more complete and thorough in both gathering and tabulating the data, so there are likely to be fewer human errors than in a census.
4. Impossibility of a census: sometimes taking a complete census to gather information is economically impossible. Obtaining the information may require a change in, or the destruction of, the item from which information is being gathered.
Summary
Statistics is the science that deals with the methods of collection, organization, analysis of
data and interpretation of the result. It is the art of learning from data. It is concerned with
the collection of data, their subsequent description, and their analysis, which often leads to
the drawing of conclusions. Statistics can be classified into two divisions: descriptive statistics and inferential statistics.
Descriptive statistics: refers to the procedures used to organize and summarize masses of
data. Inferential statistics is an area of statistics in which conclusions about a large body of
data (population) are reached by examining only part of those data (sample).


Statistics can be used to present facts in a definite form, simplify complex masses of data, classify numerical facts, and furnish a technique of comparison. A population is treated as the universe, and a sample is a fraction or segment of that universe. A parameter is a descriptive measure of the population, whereas a statistic is a descriptive measure of a sample. Usually there is a difference between a parameter and the corresponding statistic, and this difference is said to be sampling error.
Even if we can take a census, there are reasons to take a sample instead; some of these are: time constraints, cost, improved accuracy, and the impossibility of a census.

Exercise

1. Define statistics briefly and explain how useful statistics is for your field of study.

2. Compare and contrast the following pairs of statistical terms:

a. population and sample

b. parameter and statistic

c. sample size and sampling error

3. Explain the reasons why, in most cases, we prefer sampling to a census or complete enumeration of the entire population.

4. What does statistics mean in the sense of statistical data?

5. Explain the two main classifications of statistics and give examples of each.

6. "Statistics is a method of decision-making in the face of uncertainty on the basis of numerical data." Comment and explain with suitable illustrations.

7. "Statistics is a partner in the decision-making process." Explain this concept.

8. Suppose you determined the grade point average (GPA) of some members of the group. What would this represent: a census or a sample?


Chapter two
Probability

Introduction
We live in a world in which we are unable to forecast the future with complete certainty. Our need to cope with uncertainty leads us to the study and use of probability theories. Probability concepts, rules and principles can be applied in many disciplines such as statistics, economics, management, etc. Probability and statistics are related in an important way: probability is used as a tool in statistics, and economics uses statistics as a tool.
In this chapter, we will see definitions of basic probability concepts such as experiments, outcomes, events, and the sample space. We will also see methods of assigning probability: the classical, relative frequency and subjective approaches. How to find the probability of events using the general addition rule and the multiplication rule is also treated in this chapter. Examples are given in each subunit, and exercises are given at the end of the chapter for your better understanding.
Objectives:
After this chapter, the students will be able to:
- Define probability and other related concepts
- Assign probabilities to different outcomes
- Find the probability of events in an experiment

2. Elementary Probability
Probability is a part of our everyday lives. In personal and managerial decisions we face uncertainty and use probability theories. We live in a world in which we are unable to forecast the future with complete certainty. Our need to cope with uncertainty leads us to the study and use of probability theories.
Probability, as a general concept, can be defined as the chance of an event occurring; its value lies between 0 and 1.
Probability concepts, rules and principles can be applied in many disciplines like statistics,
economics, management, etc.


2.1 The Role of Probability in Statistics


Probability and statistics are related in an important way. Probability is used as a tool in statistics; it allows you to evaluate the reliability of inferences about the population when you have only sample information.
When the population is known, probability is used to describe the likelihood of observing a particular sample outcome. When the population is unknown and only a sample from that population is available, probability is used in making statements about the makeup of the population, that is, in making statistical inferences.

2.2 Definition of Basic Concepts (Experiment, Outcome, Events and Sample Space)
The first technical term we will study is the experiment; we will see that most of the other terms in probability follow quite naturally from it.
Experiment: any activity that yields a result or an outcome is called an experiment. Normally, there are a variety of possible outcomes of an experiment, and the one that occurs when the experiment is performed is a matter of chance.
Example: consider the experiment of tossing a coin; there are two possible outcomes, head (H) and tail (T). The process of tossing a fair coin represents an experiment.
Event: an event is a collection (or set) of some of the possible outcomes of an experiment; in other words, an event is a subset of the sample space. In the coin-tossing experiment, the occurrence of a head or of a tail on the upper face of the coin represents an event. We say that an event occurs if, when we perform the experiment, the observed outcome belongs to that event.
Sample space: the set of all possible outcomes of an experiment.
Examples 2.1:
1. If we toss a coin once, the possible outcomes are head (H) or tail (T), and the sample space (S) is S = {H, T}.
2. If we toss a die, there are six possible outcomes; therefore, the sample space is S = {1, 2, 3, 4, 5, 6}.
3. Consider the experiment of flipping two coins; the sample space is S = {HH, HT, TH, TT}.


Probability is a concept that is used to measure the likelihood of occurrence of different


possible events. So given an experiment that could result in any one of the many outcomes,
probability is used to measure the likelihood of occurrence of these outcomes.
2.3. Approaches to define probabilities
There are three basic types of defining (determining) the probability of the occurrence of an
event. These are
1. Classical probability approach
2. Empirical or relative frequency probability approach
3. Subjective probability approach

1. Classical probability approach


Classical probability uses sample space to determine the numerical probability that an event
will happen. Classical probability assumes that all outcomes (events) in the sample space
are equally likely and mutually exclusive. Equally likely events are events that have the
same probability of occurrence, with no special reason for a particular outcome to occur more frequently than the others. On the other hand, mutually exclusive events are events that cannot occur simultaneously.
For example, when a single die is rolled, each number found on different face of the die
have equal chance to appear on the upper face. In addition, if a card is selected from an
ordinary deck of 52 cards, each card has the same chance of being selected.
Given these assumptions, if the sample space of an experiment contains n(S) outcomes and n(A) is the number of outcomes in event A, then the probability of event A can be determined as:
Probability (A) = (number of outcomes in A) / (number of outcomes in the sample space)
P(A) = n(A)/n(S)
Example: In tossing a die, the possible outcomes, or the sample space of the experiment, are
S = {1, 2, 3, 4, 5, 6},  so n(S) = 6.

If A is the event that the number appearing on top is an even number, event A is defined as
A = {2, 4, 6},  so n(A) = 3, and
P(A) = n(A)/n(S) = 3/6 = 1/2
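A minimal Python sketch of the classical rule P(A) = n(A)/n(S) for this die example (the code is illustrative and not part of the original module):

from fractions import Fraction

# Classical approach: every outcome in the sample space is assumed equally likely.
sample_space = {1, 2, 3, 4, 5, 6}
event_a = {x for x in sample_space if x % 2 == 0}   # "an even number shows on top"

p_a = Fraction(len(event_a), len(sample_space))
print(p_a)        # 1/2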
2. Relative Frequency or Empirical Probability approach
The difference between classical and empirical probability is that classical probability assumes that certain outcomes are equally likely, while empirical probability relies on actual experiments to determine the likelihood of outcomes. In other words, in this approach the probability of an event can be estimated only through repeated experimentation; in such experiments, the relative frequency n/N of the event can serve as its probability.
Suppose, for example, that a researcher asked 25 people if they liked the taste of a new soft drink. The responses were classified as "Yes", "No" or "Undecided". The results were categorized in a frequency distribution as shown:
Response frequency
Yes 15
No 8
Undecided 2
Total 25
Probabilities can now be computed for the various categories using the relative frequency approach; e.g., the probability of selecting a person who liked the taste is 15/25 or 3/5.
Given a frequency distribution, the probability of an event being in a given class is:
P(E) = n/N, where n is the frequency for the class (f) and N is the total frequency of the distribution.
Response frequency Relative frequency (probability)
Yes 15 0.6
No 8 0.32
Undecided 2 0.08
Total 25 1.00
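The relative frequencies in the table above can likewise be produced with a short, illustrative Python snippet (the dictionary below simply restates the survey frequency table):

# Relative-frequency (empirical) probabilities for the soft-drink survey.
frequencies = {"Yes": 15, "No": 8, "Undecided": 2}
total = sum(frequencies.values())          # N = 25

for response, n in frequencies.items():
    print(f"P({response}) = {n}/{total} = {n / total:.2f}")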


3. Subjective Probability
In subjective probability, the probability of the occurrence of an event is determined based on an educated guess or estimate. This guess is based on the person's professional and life experience and evaluation of the situation.
For example, a physician might say that, because of his diagnosis, there is a 30% chance that the patient will need an operation.
A probability function is a real-valued function defined on the class of all subsets of the sample space S; the value associated with a subset A is denoted by P(A). The assignment of probabilities to events must satisfy the following three rules:
1. P(S) = 1
2. P(A) ≥ 0 for every event A that is a subset of the sample space S
3. P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅ (A and B are mutually exclusive)
This means the probability of an event must be non-negative and the probability of the union of two mutually exclusive events is the sum of their probabilities.

2.4 Principles of Counting, Permutations and Combinations


Principles of counting: In many cases, we shall be able to solve a probability problem by
counting the number of points in the sample space without actually listing each element.
The fundamental principle of counting often referred to as multiplication principle, is
stated as follows:
If an operation can be performed in N ways and, for each of these, a second operation can be performed in M ways, then the two operations can be performed together in N × M ways.
Alternatively, if a choice consists of two steps, the first of which has N possible choices and the second M possible choices, then taken together there are N × M choices.
Example 1: How many sample points are in the sample space when a pair of dice is thrown
once?
Solution: the first die can land in any one of six ways (N = 6 ways). For each of these 6 ways, the second die can also land in six ways (M = 6 ways); therefore, the pair of dice can land in N × M = (6)(6) = 36 ways. See the following table 2.1; there are 36 possible outcomes.
Table 2.1 there are 36 possible outcomes in an experiment of throwing a pair of dice once


                       The second die
                   1     2     3     4     5     6
The first die  1  1,1   1,2   1,3   1,4   1,5   1,6
               2  2,1   2,2   2,3   2,4   2,5   2,6
               3  3,1   3,2   3,3   3,4   3,5   3,6
               4  4,1   4,2   4,3   4,4   4,5   4,6
               5  5,1   5,2   5,3   5,4   5,5   5,6
               6  6,1   6,2   6,3   6,4   6,5   6,6

Example 2: A new car buyer has a choice of five body styles, two engine styles, and eight different colors. How many different car choices does the buyer have?
Solution: there are 5 × 2 × 8 = 80 different choices among the cars that could be ordered.

Permutation: an ordered arrangement of a group of objects. The number of permutations of N distinct objects is N! (read "N factorial").
Consider the three letters a, b, and c. The possible permutations are abc, acb, bac, bca, cab and cba. Thus we see that there are 6 distinct arrangements. We could arrive at the answer 6 without actually listing the different orders: there are N1 = 3 choices for the first position, then N2 = 2 for the second, leaving only N3 = 1 choice for the last position, giving a total of N1 × N2 × N3 = (3)(2)(1) = 6 permutations. In general, N distinct objects can be arranged in N(N-1)(N-2)...(3)(2)(1) ways. We represent this product by the symbol N!, read "N factorial"; by definition, 1! = 1 and 0! = 1.
The number of permutations of n distinct objects arranged in a circle is (n-1)!. The number of distinct permutations of n things of which n1 are of one kind, n2 of a second kind, ..., nk of a kth kind is
n! / (n1! n2! ... nk!)
The number of permutations of n distinct objects taken r at a time is
nPr = n! / (n - r)!
where n = the total number of objects in the group and r = the number of objects actually selected.
Example 1: In how many different ways can a party of 7 persons arrange themselves


a) In a row of 7 chairs?
Solution: 7! = 7 × 6 × 5 × 4 × 3 × 2 × 1 = 5040 ways
b) Around a circular table?
Solution: (n-1)! = (7-1)! = 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720 ways
Example 2: How many distinct permutations can be formed from all the letters of each word? a) them  b) unusual  c) sociological

Solution:
a) "them" has four different letters; therefore, the number of permutations is 4! = 4 × 3 × 2 × 1 = 24 ways.
b) The word "unusual" has 3 u's; therefore, the number of permutations is 7!/3! = 840 ways.
c) In the word "sociological" the repeated letters are o = 3, l = 2, i = 2 and c = 2, so the number of permutations is 12!/(3! 2! 2! 2!) = 9,979,200 ways.
Example 3: Two lottery tickets are drawn from 20 tickets for the first and second prizes. Find the number of sample points in the sample space.

Solution: the total number of sample points is
20P2 = 20! / (20 - 2)! = 380

Example 4: How many different ways can 3 red, 4 yellow and 2 blue bulbs be arranged in a
string of Christmas tree lights with 9 sockets?
Solution: the total number of distinct arrangements is
9! / (3! 4! 2!) = 1260 arrangements
In many problems, we are interested in the number of ways of selecting r objects from n without regard to order. These selections are counted using a principle of counting called combinations.


Combination: a selection of a group of objects or persons from a larger group made without regard to order. If order is not important, we can use the combination formula to count the number of combinations. The number of combinations of n objects taken r at a time is given by
nCr = n! / ((n - r)! r!)
Example: A bag contains 6 white, 7 red and 5 black balls. Find the chance or probability that three balls drawn at random are all white.
Solution: Before we find the probability, we have to find the relevant combinations. Since the number of balls is 18 (= 6 + 7 + 5), how many combinations of three balls can be drawn from 18 balls? This is the combination of 3 balls out of 18:
18C3 = 18! / ((18 - 3)! 3!) = 816 ways
We can draw 3 balls from 18 balls in 816 different ways.
There are 6 white balls in the bag; how many combinations of 3 balls can be drawn from these 6 white balls? This is the combination of 3 balls out of 6:
6C3 = 6! / ((6 - 3)! 3!) = 20 ways
We can draw 3 white balls out of 6 balls in 20 different ways.
Therefore, the probability that three balls drawn at random are all white is:
20/816 = 5/204 ≈ 0.025
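All of the counting results in this section can be checked with Python's standard math module (math.perm and math.comb require Python 3.8 or later); the sketch below is illustrative rather than part of the module:

import math

print(math.factorial(7))                            # 5040 ways to seat 7 people in a row
print(math.factorial(7 - 1))                        # 720 ways around a circular table
print(math.factorial(7) // math.factorial(3))       # 840 permutations of "unusual"
print(math.perm(20, 2))                             # 380 ways to award 1st and 2nd prizes
print(math.comb(18, 3))                             # 816 ways to draw 3 balls from 18
print(math.comb(6, 3))                              # 20 ways to draw 3 white balls from 6

# Probability that all three balls drawn are white:
p_all_white = math.comb(6, 3) / math.comb(18, 3)
print(round(p_all_white, 4))                        # ~0.0245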

2.5 Some Rules of Probability


In applications of probability, there is a need to combine the probabilities of related events in some meaningful way. Two of the fundamental methods of combining probabilities are the addition rule and the multiplication rule.

The Addition Rule and Mutually Exclusive Events
If A and B are any two events in the sample space S, then
P(A ∪ B) = P(A) + P(B) - P(A ∩ B), which is the same as
P(A or B) = P(A) + P(B) - P(A and B)


For three events,
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(A ∩ C) - P(B ∩ C) + P(A ∩ B ∩ C)

Example 1: The probability that Abebe passes microeconomics is 2/3, and the probability that he passes statistics is 4/9. If the probability of passing both courses is 1/4, what is the probability that Abebe will pass at least one of these courses?
Solution:
If M is the event "passing microeconomics" and S the event "passing statistics," then by the addition rule we have
P(M ∪ S) = P(M) + P(S) - P(M ∩ S) = 2/3 + 4/9 - 1/4 = 31/36
If the events are mutually exclusive, the addition rule is modified as
P(A ∪ B) = P(A) + P(B)
P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
This is because, if the events are mutually exclusive, P(A ∩ B) = 0. Mutually exclusive events are events such that, when one of them occurs, none of the other events can occur at the same time.
Example 2: Find the probability of getting a 6, a 4 or a 2 on one roll of a die.
Solution:
Since the events are mutually exclusive,
P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
where A is the event that the number 6 appears on the upper face of the die, B the event that the number 4 appears, and C the event that the number 2 appears.
P(A) = n(A)/n(S) = 1/6
P(B) = n(B)/n(S) = 1/6


P(C) = n(C)/n(S) = 1/6

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
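A short, illustrative check of the two addition-rule examples above, using exact fractions (not part of the original module):

from fractions import Fraction

# General addition rule: the course-passing example.
p_m = Fraction(2, 3)          # P(passing microeconomics)
p_s = Fraction(4, 9)          # P(passing statistics)
p_both = Fraction(1, 4)       # P(passing both)
print(p_m + p_s - p_both)     # 31/36 = P(at least one course passed)

# Mutually exclusive events: P(6 or 4 or 2) on one roll of a die.
print(Fraction(1, 6) * 3)     # 1/2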

Conditional Probability, Independence and the Multiplication Rule

There is a probability rule that can be used to calculate the probability of the intersection of several events. However, this rule depends on the important statistical concept of independence or dependence of events.
Two events A and B are said to be independent if and only if the probability of one event is not influenced by the occurrence of the other event.
The probability of an event A, given that event B has occurred, is called the conditional probability of A given B, denoted by P(A|B) and defined as
P(A|B) = P(A ∩ B) / P(B),  provided P(B) ≠ 0
Similarly, the conditional probability of B, given that A has occurred, is
P(B|A) = P(A ∩ B) / P(A),  provided P(A) ≠ 0
Now let us redefine independence of events in terms of conditional probability. Two events A and B are said to be independent if and only if either
P(A|B) = P(A) or P(B|A) = P(B); otherwise, the events are said to be dependent.

Once we have determined whether events are independent or dependent, we define the multiplication rule as follows. The probability that both events A and B occur is
P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
If A and B are independent, P(A ∩ B) = P(A) P(B).


Similarly, if A, B and C are mutually independent events, then the probability that A, B and C all occur is
P(A ∩ B ∩ C) = P(A) P(B) P(C)
Example: The following table gives the classification of all employees of a company by sex and college graduation status.
              College graduate (G)   Not a college graduate (N)   Total
Males (M)              7                        20                 27
Females (F)            4                         9                 13
Total                 11                        29                 40

If one of these employees is selected at random for membership on the employee-management committee, what is the probability that this employee is a female and a college graduate?
Solution: we are to calculate the probability of the intersection of the events "female" (denoted by F) and "college graduate" (denoted by G). The probability will be calculated using the formula:
P(F and G) = P(F ∩ G) = P(F) P(G|F)
Notice that there are 13 females among the 40 employees; hence, the probability that a female is selected is P(F) = 13/40.
There are 4 college graduates among the 13 female employees; hence the conditional probability of G given F is P(G|F) = 4/13.
Therefore the joint probability of F and G is
P(F ∩ G) = P(F) P(G|F) = (13/40)(4/13) = 4/40 = 0.10
In the same manner, we can compute the three other joint probabilities for the table as follows:
P(M and G) = P(M ∩ G) = P(M) P(G|M) = (27/40)(7/27) = 7/40 = 0.175


P(M and N) = P(M ∩ N) = P(M) P(N|M) = (27/40)(20/27) = 20/40 = 0.500
P(F and N) = P(F ∩ N) = P(F) P(N|F) = (13/40)(9/13) = 9/40 = 0.225
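The joint and conditional probabilities from this 2 × 2 table can be reproduced with the following illustrative Python sketch (the helper function p() is hypothetical, introduced only for this example):

from fractions import Fraction

# Cell counts of the employee table, keyed by (sex, graduate status).
counts = {("M", "G"): 7, ("M", "N"): 20, ("F", "G"): 4, ("F", "N"): 9}
total = sum(counts.values())                               # 40 employees

def p(sex=None, grad=None):
    """Probability of the cell(s) matching the given sex and/or graduate status."""
    matching = sum(n for (s, g), n in counts.items()
                   if (sex is None or s == sex) and (grad is None or g == grad))
    return Fraction(matching, total)

p_f = p(sex="F")                      # 13/40
p_f_and_g = p(sex="F", grad="G")      # 4/40 = 1/10
p_g_given_f = p_f_and_g / p_f         # 4/13

print(p_f, p_g_given_f, p_f * p_g_given_f)   # 13/40  4/13  1/10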

Complement Rule
The complement of an event A is the event that A does not occur, that is, the event consisting of all sample points that are not in event A. It is denoted by A′. Since A and A′ are mutually exclusive and together make up the whole sample space, the probability of event A and the probability of its complement sum to 1:
P(A) + P(A′) = 1
P(A′) = 1 - P(A)

Example 2.13: If the probabilities that an automobile mechanic will service 3, 4, 5, 6, 7, or 8 or more cars on any given workday are, respectively, 0.12, 0.19, 0.28, 0.24, 0.10 and 0.07, what is the probability that he will service at least 5 cars on his next day of work?
Solution: let E be the event that at least 5 cars are serviced. Now P(E) = 1 - P(E′), where E′ is the event that fewer than 5 cars are serviced.
Since P(E′) = 0.12 + 0.19 = 0.31, it follows that
P(E) = 1 - 0.31 = 0.69
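A one-line check of the complement-rule calculation (illustrative only):

# Complement rule for the mechanic example: P(at least 5) = 1 - P(fewer than 5).
p_cars = {3: 0.12, 4: 0.19, 5: 0.28, 6: 0.24, 7: 0.10, 8: 0.07}   # 8 stands for "8 or more"
p_fewer_than_5 = p_cars[3] + p_cars[4]
print(round(1 - p_fewer_than_5, 2))        # 0.69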
Summary
Probability is a part of our everyday lives. In personal and managerial decisions we face uncertainty and use probability theories. We live in a world in which we are unable to forecast the future with complete certainty. Our need to cope with uncertainty leads us to the study and use of probability theories.
Probability, as a general concept, can be defined as the chance of an event occurring; its value lies between 0 and 1.
A permutation is an ordered arrangement of a group of objects, whereas a combination is a selection of a group of objects or persons from a larger group made without regard to order. We can assign probabilities to outcomes using the classical, empirical (relative frequency) and subjective probability approaches.


In applications of probability, there is a need to combine the probabilities of related events in some meaningful way. Two of the fundamental methods of combining probabilities are the addition rule and the multiplication rule.
We can apply the general addition rule to compute the probabilities of events that are not mutually exclusive. The fundamental principle of counting, often referred to as the multiplication principle, states that if an operation can be performed in N ways and, for each of these, a second operation can be performed in M ways, then the two operations can be performed together in N × M ways. The multiplication rule is used to find joint probabilities and conditional probabilities of events.
Exercises

1. A newspaper editor is going to assign two reporters to cover a political convention. The assignment will be made from a pool of six women and four men. How many groups of two reporters can be assigned if

a. both are to be women?

b. both are to be men?

c. there is to be one of each sex?

2. In a high school graduating class of 100 students, 54 studied Mathematics, 69 studied History and 35 studied both Mathematics and History. If one of these students is selected at random, find the probability that

a. the student took Mathematics or History

b. the student took neither of these subjects

3. At a large bank, 6% of the employees are computer programmers, 50% of the employees are female, and 2% of the employees are female computer programmers. If an employee is selected randomly, what is the probability that

a. the employee is a computer programmer, given that the employee is female?


Chapter three
3. Random Variables and Probability Distributions

3.1. Introduction
The concept of a probability space that completely describes the outcomes of a random experiment was developed in Chapter Two. In this chapter, we develop the idea of a function defined on the outcomes of a random experiment, which is essentially the definition of a random variable: the value of a random variable is a numerically valued random phenomenon.
A random variable can be discrete or continuous. In this chapter, we will see the probability distribution of a discrete random variable and the probability density function of a continuous random variable. Finally, we will look at the cumulative distribution functions of discrete and continuous random variables and at the expected values, variances and standard deviations of discrete and continuous random variables.

Objectives:
At the end of this chapter, the students will be able to:
- Define a random variable
- Determine the probability distribution and the cumulative distribution function
- Find the expected value, the variance and the standard deviation of a probability distribution

3.2. Definition of a random variable


Consider a random experiment with sample space S, and let w be a sample point in S. We are interested in assigning a real number to each w ∈ S. A random variable, X(w), is a single-valued real function that assigns a real number, called the value of X(w), to each sample point w ∈ S. That is, it is a mapping of the sample space onto the real line.

Generally, a single letter X, instead of the function X(w), is used to represent a random variable. Therefore, in the remainder of the module we use X to denote a random variable. The


sample space S is called the domain of the random variable X. In addition, the collection of
all numbers that are values of X is called the range of the random variable X.

Example 3.1. Suppose that a coin is tossed twice, so that the sample space is S = {TT, TH, HT, HH}. Let X represent the number of heads that come up. With each sample point we can associate a number for X, as shown in the table below. Thus, in the case of TT (i.e., 0 heads), X = 0; for TH or HT (1 head), X = 1; and for HH (i.e., 2 heads), X = 2. It follows that X is a random variable.

Table 3.1
Sample Point TT TH HT HH
X 0 1 1 2

It should be noted that many other random variables could also be defined on this sample
space, for example the square of the number of heads, the number of heads minus the
number of tails, etc.
A random variable can be discrete or continuous. A random variable is discrete if the number of values it can assume forms a countable set; that is, this set has either a finite number of elements or its elements are countably infinite in that they can be put into a one-to-one correspondence with the positive integers. For instance, if the random variable X represents the number of points obtained in the roll of a single six-sided die, then X takes on a finite set of possible outcomes: 1, 2, 3, 4, 5, 6. But if the random variable X is defined according to the rule "roll a single six-sided die repeatedly until a 4 appears for the first time," then this could happen on the first roll (X = 1), on the second roll (X = 2), on the third roll (X = 3), and so on. Clearly, there are infinitely many possibilities, and thus X assumes the countably infinite set of values 1, 2, 3, ....

A random variable X is continuous if it can assume an infinite or uncountable number of values over some interval. For instance, if on a line segment of length L two points A and B are chosen at random, then a random variable can be defined as X = |A - B|, the distance


between A and B. Clearly X assumes an infinite number of values on the interval 0 ≤ X ≤


L. In fact, variables measured in terms of temperature, kilogram and so on, can take on
essentially any real value over some appropriate interval.

3.3 Discrete Probability Distributions


Let X be a discrete random variable and suppose that the possible values which it can assume are given by x1, x2, x3, ..., arranged in increasing order of magnitude. Suppose also that these values are assumed with probabilities given by
P(X = xk) = f(xk),   k = 1, 2, 3, ...
It is convenient to introduce the probability function, also referred to as the probability distribution, given by
P(X = x) = f(x)
In general, f(x) is a probability function if
1. f(x) ≥ 0
2. Σx f(x) = 1
where the sum in condition (2) is taken over all possible values of x. A graph of f(x) is called a probability graph.

Example 3.2. Find the probability function corresponding to the random variable X of Example 3.1 and construct a probability graph.

Solution: Assuming that the coin is fair, we have
P(TT) = 1/4,  P(TH) = 1/4,  P(HT) = 1/4,  P(HH) = 1/4
Then
P(X = 0) = P(TT) = 1/4
P(X = 1) = P(HT ∪ TH) = P(HT) + P(TH) = 1/4 + 1/4 = 1/2
P(X = 2) = P(HH) = 1/4


The probability distribution function is given by

Table 3.2: probability distribution of example 3.2


x 0 1 2

f(x) ¼ ½ 1/4

A probability graph can be obtained by use of a bar chart, as indicated in figure 3.1, or a histogram, as indicated in figure 3.2. In the bar chart, the sum of the ordinates is 1, while in the histogram the sum of the rectangular areas is 1. In the case of the histogram, we can think of the random variable X as being made continuous, e.g., X = 1 means that it lies between 0.5 and 1.5.

[Figure 3.1: Bar chart of the probability distribution of Example 3.2]

[Figure 3.2: Histogram of the probability distribution of Example 3.2]
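Instead of listing the sample points by hand, the probability function of Example 3.2 can be built directly from the sample space. The following Python sketch is illustrative (it is not part of the module) and assumes a fair coin:

from fractions import Fraction
from itertools import product
from collections import Counter

# Sample space S = {HH, HT, TH, TT}, with X = number of heads in each outcome.
sample_space = list(product("HT", repeat=2))            # 4 equally likely points
counts = Counter(outcome.count("H") for outcome in sample_space)

f = {x: Fraction(n, len(sample_space)) for x, n in sorted(counts.items())}
print(f)        # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}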


Distribution Functions for Discrete Random Variables


The cumulative distribution function, or briefly the distribution function, for a random variable X is defined by
F(x) = P(X ≤ x)
where x is any real number, i.e., -∞ < x < ∞. The distribution function can be obtained from the probability function by noting that
F(x) = P(X ≤ x) = Σ_{u ≤ x} f(u)
where the sum on the right is taken over all values of u for which u ≤ x; conversely, the probability function can be obtained from the distribution function.
If X takes on only a finite number of values x1, x2, ..., xn, then the distribution function is given by
F(x) = 0                       for -∞ < x < x1
     = f(x1)                   for x1 ≤ x < x2
     = f(x1) + f(x2)           for x2 ≤ x < x3
       ...
     = f(x1) + ... + f(xn)     for xn ≤ x < ∞

Example 3.3. Let a random experiment involve the rolling of a pair of fair six-sided dice, and let the random variable X be defined as the sum of the faces showing; see the following table (table 3.3). The sample space S (the domain of X) and the range of X are shown in table 3.3, and the probabilities associated with the values of X together with the cumulative distribution function F(X) appear in table 3.4.


Table 3.3: the sum of the upturned faces in rolling a pair of fair dice
                       The second die
                   1    2    3    4    5    6
The first die  1   2    3    4    5    6    7
               2   3    4    5    6    7    8
               3   4    5    6    7    8    9
               4   5    6    7    8    9   10
               5   6    7    8    9   10   11
               6   7    8    9   10   11   12

Table 3.4: the probability distribution and cumulative probability of example 3.3
X           f(X)       F(X)
X1 = 2      1/36       1/36
X2 = 3      2/36       3/36
X3 = 4      3/36       6/36
X4 = 5      4/36       10/36
X5 = 6      5/36       15/36
X6 = 7      6/36       21/36
X7 = 8      5/36       26/36
X8 = 9      4/36       30/36
X9 = 10     3/36       33/36
X10 = 11    2/36       35/36
X11 = 12    1/36       36/36 = 1
            Σ f(Xi) = 1


f x 

7 36

6 36
5 36
4 36
2 36
2 36

1 36 …
11 2 3 4 5 6 7 8 9 10 11 11
12
0 X
Figure 3.3: probability distribution graph of example 3.3
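Table 3.4 can be reproduced by enumerating the 36 equally likely outcomes; the sketch below (illustrative, not part of the module) prints f(X) and the running cumulative F(X):

from fractions import Fraction
from itertools import product
from collections import Counter

# X = sum of the faces of two fair six-sided dice; there are 36 sample points.
outcomes = Counter(a + b for a, b in product(range(1, 7), repeat=2))

cumulative = Fraction(0)
for x in sorted(outcomes):
    f_x = Fraction(outcomes[x], 36)
    cumulative += f_x
    print(f"X = {x:2d}: f(X) = {f_x},  F(X) = {cumulative}")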

Example 3.4. a) Find the distribution function for the random variable X of Example 3.2. b) Obtain its graph.
Solution:
a) The distribution function is
F(x) = 0      for -∞ < x < 0
     = 1/4    for 0 ≤ x < 1
     = 3/4    for 1 ≤ x < 2
     = 1      for 2 ≤ x < ∞

b) The graph of F(x) is shown in figure 3.4. It is a step function: the jump at x = 0 is 1/4 = f(0) = F(0) - F(-∞), the jump at x = 1 is 1/2 = f(1) = F(1) - F(0), and the jump at x = 2 is 1/4 = f(2) = F(2) - F(1).

[Figure 3.4: Graph of the distribution function F(x) of Example 3.4]

The following things about the above distribution function, which are true in general, should be noted:
1. The magnitudes of the jumps at 0, 1, 2 are 1/4, 1/2, 1/4, which are precisely the ordinates in figure 3.4. This fact enables one to obtain the probability function from the distribution function.
2. Because of the appearance of the graph in figure 3.4, it is often called a staircase function or step function. The value of the function at an integer is obtained from the higher step; thus the value at 1 is 3/4 and not 1/4. This is expressed mathematically by stating that the distribution function is continuous from the right at 0, 1, 2.
3. As we proceed from left to right (i.e., going upstairs) the distribution function either remains the same or increases, taking on values from 0 to 1. Because of this it is said to be a monotonically increasing function.


3.4 Continuous Probability Distributions

If X is a continuous random variable, the probability that X takes on any one particular value is generally zero. Therefore, we cannot define a probability function in the same way as for a discrete random variable. In order to arrive at a probability distribution for a continuous random variable, we note that the probability that X lies between two different values is meaningful.
In general, for a continuous random variable X,
1. f(x) ≥ 0
2. ∫_{-∞}^{∞} f(x) dx = 1
where the second condition is a mathematical statement of the fact that a real-valued random variable must certainly lie between -∞ and ∞. We then define the probability that X lies between a and b by
P(a < X < b) = ∫_a^b f(x) dx
A function f(x) which satisfies the above requirements is called a probability function or probability distribution for a continuous random variable, but it is more often called a probability density function or simply a density function. Any function f(x) satisfying properties 1 and 2 above will automatically be a density function, and required probabilities can then be obtained from P(a < X < b) = ∫_a^b f(x) dx.

Example 3.5: (a) Find the constant c such that the function
f(x) = cx²    for 0 < x < 3
     = 0      otherwise
is a density function, and (b) compute P(1 < X < 2).


Solution:
(a) Since f(x) satisfies property (1) if c ≥ 0, it must satisfy property (2) in order to be a density function. Now
∫_{-∞}^{∞} f(x) dx = ∫_0^3 cx² dx = 9c = 1,  so c = 1/9
(b) P(1 < X < 2) = ∫_1^2 (1/9)x² dx = 7/27
In case f(x) is continuous, which we shall assume unless otherwise stated, the probability that X is equal to any particular value is zero. In such a case we can replace either or both of the signs < by ≤; thus
P(1 ≤ X ≤ 2) = P(1 ≤ X < 2) = P(1 < X ≤ 2) = P(1 < X < 2) = 7/27
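If SymPy is available, the constant c and the probability in Example 3.5 can be checked symbolically; the following sketch is illustrative and assumes the sympy package is installed:

import sympy as sp

# Choose c so that f(x) = c*x**2 on (0, 3) integrates to 1, then compute P(1 < X < 2).
x, c = sp.symbols("x c", positive=True)

c_value = sp.solve(sp.Eq(sp.integrate(c * x**2, (x, 0, 3)), 1), c)[0]
print(c_value)                                        # 1/9

p_1_2 = sp.integrate(c_value * x**2, (x, 1, 2))
print(p_1_2)                                          # 7/27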

Example 3.6: The distribution function for a random variable X is
F(x) = 1 - e^(-2x)    for x > 0
     = 0              for x ≤ 0
Find (a) the density function, (b) the probability that X > 2, and (c) the probability that -3 < X < 4.
Solution:
(a) f(x) = dF(x)/dx = 2e^(-2x) for x > 0, and 0 otherwise
(b) P(X > 2) = ∫_2^∞ 2e^(-2u) du = [-e^(-2u)]_2^∞ = e^(-4)
Another method:
By definition, P(X ≤ 2) = F(2) = 1 - e^(-4). Hence,
P(X > 2) = 1 - (1 - e^(-4)) = e^(-4)
(c) P(-3 < X < 4) = ∫_{-3}^0 0 du + ∫_0^4 2e^(-2u) du = [-e^(-2u)]_0^4 = 1 - e^(-8)


Another method:
P(-3 < X < 4) = P(X < 4) - P(X < -3) = F(4) - F(-3) = (1 - e^(-8)) - 0 = 1 - e^(-8)
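The same kind of symbolic check works for Example 3.6: differentiate F(x) to recover the density and integrate it to recover the probabilities (again an illustrative sketch that assumes SymPy is installed):

import sympy as sp

x = sp.symbols("x", positive=True)
F = 1 - sp.exp(-2 * x)                             # distribution function for x > 0

f = sp.diff(F, x)                                  # density: 2*exp(-2*x) for x > 0
print(f)

print(sp.integrate(f, (x, 2, sp.oo)))              # P(X > 2)     = exp(-4)
print(sp.integrate(f, (x, 0, 4)))                  # P(-3 < X < 4) = 1 - exp(-8)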

Example 3.7. Assume that X is a continuous random variable with the following probability density function:
f(x) = A(2x - x²)    for 0 < x < 2
     = 0             otherwise
a) What is the value of A?
b) Find P(X > 1).

Solution:
a) ∫_{-∞}^{∞} f(x) dx = 1 requires ∫_0^2 A(2x - x²) dx = 1.
Thus we obtain
A[x² - x³/3]_0^2 = A(4 - 8/3) = 4A/3 = 1,  so A = 3/4


b) P(X > 1) = ∫_1^∞ f(x) dx = (3/4)∫_1^2 (2x - x²) dx = (3/4)[x² - x³/3]_1^2 = (3/4)(4/3 - 2/3) = 1/2


3.5. The Expected Values of Random Variable, Moment, Skewness and Kurtosis

Expected value of random variable


If X is a random variable, then the expectation (expected value or mean) of X, denoted by E(X), is defined by
E(X) = Σ xi f(xi)            if X is discrete
E(X) = ∫ x f(x) dx           if X is continuous
where f(xi) is the probability distribution of the variable X.

Thus, the expected value of X is a weighted average of the possible values that X can take, where each value is weighted by the probability that X takes that value.

The first moment is the expected value of X, E(X).

We can also define the central moments (or moments about the mean) of a random variable; these are the moments of the difference between a random variable and its expected value. The nth central moment is defined by
E[(X - µ)^n] = Σ (xi - µ)^n f(xi)           if X is discrete
E[(X - µ)^n] = ∫ (x - µ)^n f(x) dx          if X is continuous

The central moment for the case n = 2 is very important and carries a special name, the variance, which is usually denoted by σx². Thus,
σx² = E[(X - µ)²] = Σ (xi - µ)² f(xi)       if X is discrete
σx² = E[(X - µ)²] = ∫ (x - µ)² f(x) dx      if X is continuous

The positive square root of the variance is called the standard deviation and is given by
σx = √var(X) = √E[(X - µ)²]


The variance (or standard deviation) is a measure of the dispersion or scatter of the values of the random variable about the sample mean (x̄) or the population mean (µ). If the values tend to be concentrated near the mean, the variance (or standard deviation) is small, while if the values tend to be distributed far from the mean, the variance is large.

[Figure 3.5: Normal distributions with the same mean (µ) but with different standard deviations (σ), illustrating a small variance and a large variance]

Example 3.8: Find the expected value, variance and standard deviation of
f(x) = x/2    for 0 < x < 2
     = 0      otherwise

Solution:
Mean = E(X) = ∫_0^2 x (x/2) dx = 4/3


σ² = E[(X - 4/3)²] = ∫_0^2 (x - 4/3)² (x/2) dx = 2/9
σ = √(2/9) = √2/3
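A symbolic check of Example 3.8 (illustrative; assumes SymPy is installed):

import sympy as sp

# Mean, variance and standard deviation of f(x) = x/2 on (0, 2).
x = sp.symbols("x", positive=True)
f = x / 2

mean = sp.integrate(x * f, (x, 0, 2))                       # E(X) = 4/3
variance = sp.integrate((x - mean) ** 2 * f, (x, 0, 2))     # E[(X - mu)^2] = 2/9
std_dev = sp.sqrt(variance)                                 # sqrt(2)/3

print(mean, variance, std_dev)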

Some Theorems on Variance

1. σ² = E[(X - µ)²] = E(X²) - µ² = E(X²) - [E(X)]², where µ = E(X)
2. If c is any constant, Var(cX) = c² Var(X)
3. The quantity E[(X - a)²] is a minimum when a = µ = E(X)
4. If X and Y are independent random variables,
   Var(X + Y) = Var(X) + Var(Y), or σ²(X+Y) = σ²(X) + σ²(Y)
   Var(X - Y) = Var(X) + Var(Y), or σ²(X-Y) = σ²(X) + σ²(Y)

Example 3.9: Consider the random variables X1, X2 and X3 with the following probability distributions:
f(x1) = 1/4   for 0 ≤ x ≤ 4,     and 0 otherwise
f(x2) = 1/3   for 0.5 ≤ x ≤ 3.5, and 0 otherwise
f(x3) = 1/2   for 1 ≤ x ≤ 3,     and 0 otherwise

From direct computation of the mean value of these random variables, we see that


E(X1) = E(X2) = E(X3) = 2. However, their spreads about the mean value are different; see figure 3.6.

[Figure 3.6: Probability distributions of X1, X2 and X3 with the same expected value or mean (µ = 2) but with different standard deviations (σ)]

In terms of variance, we therefore say that x1 has the largest variance, while x3 has the
smallest variance.
Example 3.10: Let X be a continuous random variable with
f(x) = 1/4    for 2 ≤ x ≤ 6
     = 0      otherwise
Find the expected value and variance.

Solution:
E(X) = ∫_{-∞}^{∞} x f(x) dx = ∫_2^6 (x/4) dx = [x²/8]_2^6 = 36/8 - 4/8 = 4
σx² = E[(X - µ)²] = ∫_{-∞}^{∞} (x - µ)² f(x) dx = ∫_2^6 ((x - 4)²/4) dx = 4/3


Properties of expected value and variance


The following are important properties of the expected value and variance of a random variable:
1. The expected value of a constant c is the constant itself. That is, E(c) = c, for every constant c.
2. The expected value of the product of a constant c and a random variable X is equal to the constant c times the expected value of the random variable. That is, E(cX) = cE(X).
3. The expected value of a linear function of a random variable is the same linear function of its expectation. That is, E(a + bX) = a + bE(X).
4. The expected value of the product of two independent random variables is equal to the product of their individual expected values. That is, E(XY) = E(X)E(Y).
5. The expected value of the sum of two independent random variables is equal to the sum of their individual expected values. That is, E(X + Y) = E(X) + E(Y).
6. The variance of the product of a constant and a random variable X is equal to the constant squared times the variance of the random variable X. That is, Var(cX) = c²Var(X).
7. The variance of the sum (or difference) of two independent random variables equals the sum of their individual variances. That is, Var(X ± Y) = Var(X) + Var(Y).

Moments
A moment of a random variable X is defined as the expected value of some particular
function of X. In general, the moments of a probability distribution amount to a collection
of descriptive measures that can be used to characterize the location and shape of the
distribution. Hence, a probability distribution can be completely specified in terms of its
moments. As we shall now see, moments of a random variable typically are defined in
terms of having either zero or the expectation of X as the reference point.

For X a discrete random variable, the rth moment about zero is
µ'r = E(X^r) = Σi xi^r f(xi)



(Note that the first moment about zero is the mean of X, i.e., µ'1 = E(X) = µ.) The rth central moment of X, that is, the rth moment about the mean of X, is
µr = E[(X - µ)^r] = Σi (xi - µ)^r f(xi)
If X is a continuous random variable with probability density f(x), then, provided the following integrals exist, we may correspondingly define
µ'r = E(X^r) = ∫_{-∞}^{∞} x^r f(x) dx
and
µr = E[(X - µ)^r] = ∫_{-∞}^{∞} (x - µ)^r f(x) dx

It is easily verified that:


a) The zeroth central moment of X is one: µ0 = E[(X - µ)⁰] = E(1) = 1
b) The first central moment of X is zero: µ1 = E(X - µ) = E(X) - µ = 0
c) The second central moment of X is the variance of X: µ2 = E[(X - µ)²] = V(X)

The first moment about zero locates the mean, or measures the central tendency, of a probability distribution, and the second moment about the mean describes its shape in terms of variation or dispersion about the mean. Additional information about the shape of a probability distribution, as characterized by measures of skewness and kurtosis, is provided by the third and fourth central moments of X, respectively. In particular, we shall develop standardized (independent of units and taken relative to σ) measures of skewness and kurtosis.

Skewness
Skewness is the degree of asymmetry, or departure from symmetry, of a distribution. If a
distribution has a longer tail to the right of the central maximum than to the left, the
distribution is said to be skewed to the right, or to have positive skewness. If the reverse is
true, it is said to be skewed to the left, or to have negative skewness.


In this regard, the third central moment of X is


\mu_3 = E[(X - \mu)^3]

and the standardized third moment, or the coefficient of skewness, is

\alpha_3 = \frac{\mu_3}{\sigma^3} = \frac{\mu_3}{(\mu_2)^{3/2}},

where the sign of \alpha_3 is determined by that of \mu_3; that is, for uni-modal probability
distributions:
a. If \mu_3 > 0, then \alpha_3 > 0 and thus X's probability distribution is positively skewed, or
skewed to the right.
b. If \mu_3 < 0, then \alpha_3 < 0 and thus X's probability distribution is negatively skewed, or
skewed to the left.
c. If \mu_3 = 0, then \alpha_3 = 0 and thus X's probability distribution is symmetrical (about the
mean).

Kurtosis
Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal
distribution. A distribution having a relatively high peak is called leptokurtic, while a
distribution which is flat-topped is called platykurtic. The normal distribution, which is
neither very peaked nor very flat-topped, is called mesokurtic.

The fourth central moment of X is

\mu_4 = E[(X - \mu)^4]

and the standardized fourth moment, or the coefficient of kurtosis, is

\alpha_4 = \frac{\mu_4}{\sigma^4} = \frac{\mu_4}{(\mu_2)^2}

If the peak of X's probability distribution mirrors that of a normal distribution, then \alpha_4 = 3.
If \alpha_4 > 3 (respectively, \alpha_4 < 3), then the peak of the probability distribution is sharper
(respectively, flatter) than that of a normal distribution.


For purposes of computational expedience, we may express any central moment of a
random variable X in terms of its moments about zero. Specifically, since

(X - \mu)^r = \sum_{j=0}^{r} (-1)^j \binom{r}{j} \mu^j X^{r-j},

it follows that

\mu_r = E[(X - \mu)^r] = \sum_{j=0}^{r} (-1)^j \binom{r}{j} \mu^j \mu'_{r-j}.
From the properties of the expectation operator explained earlier, we can readily
demonstrate that:

\mu_2 = \mu'_2 - (\mu'_1)^2;
\mu_3 = \mu'_3 - 3\mu'_1\mu'_2 + 2(\mu'_1)^3;
\mu_4 = \mu'_4 - 4\mu'_1\mu'_3 + 6(\mu'_1)^2\mu'_2 - 3(\mu'_1)^4.

If we standardize the random variable X to obtain Z = (X - \mu)/\sigma, then, since E(Z) = 0, the rth
central moment of Z can be expressed in terms of the rth central moment of X as

\mu_r(Z) = E(Z^r) = E\left[\left(\frac{X-\mu}{\sigma}\right)^r\right] = \frac{E[(X-\mu)^r]}{\sigma^r} = \frac{\mu_r(X)}{\sigma^r(X)}

Also, V(Z) = \mu_2(Z) = 1, \alpha_3(Z) = \alpha_3(X), and \alpha_4(Z) = \alpha_4(X). Hence standardizing a random
variable X affects its mean and variance but not its standardized third and fourth moments.

Table 4.5
x        f(x)
1        0.2
2        0.3
3        0.4
5        0.1
Total    1.0


Example 4.6.1. Given the discrete probability distribution in Table 4.5, determine and
interpret its standardized third and fourth moments, or the coefficients of skewness and
kurtosis. From (4.21):

\mu = \mu'_1 = E(X) = \sum_i x_i f(x_i) = 1(0.2) + 2(0.3) + 3(0.4) + 5(0.1) = 2.50,
\mu'_2 = E(X^2) = \sum_i x_i^2 f(x_i) = 1(0.2) + 4(0.3) + 9(0.4) + 25(0.1) = 7.50,
\mu'_3 = E(X^3) = \sum_i x_i^3 f(x_i) = 1(0.2) + 8(0.3) + 27(0.4) + 125(0.1) = 25.90,
\mu'_4 = E(X^4) = \sum_i x_i^4 f(x_i) = 1(0.2) + 16(0.3) + 81(0.4) + 625(0.1) = 99.90.

Then, from the relationships developed earlier, we get

\mu_2 = V(X) = \mu'_2 - \mu^2 = 7.5 - (2.5)^2 = 1.25,
\mu_3 = \mu'_3 - 3\mu'_1\mu'_2 + 2(\mu'_1)^3 = 0.90,
\mu_4 = \mu'_4 - 4\mu'_1\mu'_3 + 6(\mu'_1)^2\mu'_2 - 3(\mu'_1)^4 = 4.96.

Therefore, the coefficients of skewness and kurtosis are, respectively,

\alpha_3 = \frac{\mu_3}{(\mu_2)^{3/2}} = 0.64, \qquad \alpha_4 = \frac{\mu_4}{(\mu_2)^2} = 3.17.

Since \alpha_3 > 0, this discrete probability distribution is slightly skewed to the right. Moreover,
with \alpha_4 > 3, the distribution has a peak that is slightly sharper than that of a normal
distribution.
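A short Python sketch (added for illustration; it is not part of the original text) reproduces the computation of Example 4.6.1 directly from the pmf in Table 4.5, using the same raw-to-central-moment relations developed above.

```python
# Probability mass function from Table 4.5
pmf = {1: 0.2, 2: 0.3, 3: 0.4, 5: 0.1}

def raw_moment(r):
    """rth moment about zero: E(X^r) = sum of x^r * f(x)."""
    return sum(x ** r * p for x, p in pmf.items())

m1, m2, m3, m4 = (raw_moment(r) for r in range(1, 5))
mu2 = m2 - m1 ** 2                                       # variance, 1.25
mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                     # 0.90
mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4  # about 4.96
alpha3 = mu3 / mu2 ** 1.5                                # coefficient of skewness, about 0.64
alpha4 = mu4 / mu2 ** 2                                  # coefficient of kurtosis, about 3.17
print(mu2, mu3, mu4, alpha3, alpha4)
```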
Example 4.6.2. Let the probability density function for a continuous random variable X be

f(x) = \begin{cases} 2x, & 0 \le x \le 1; \\ 0, & \text{elsewhere.} \end{cases}

Then:

\mu = \mu'_1 = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^1 2x^2\,dx = \tfrac{2}{3}x^3 \big]_0^1 = 0.667,
\mu'_2 = E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^1 2x^3\,dx = \tfrac{1}{2}x^4 \big]_0^1 = 0.500,
\mu'_3 = E(X^3) = \int_{-\infty}^{\infty} x^3 f(x)\,dx = \int_0^1 2x^4\,dx = \tfrac{2}{5}x^5 \big]_0^1 = 0.400,
\mu'_4 = E(X^4) = \int_{-\infty}^{\infty} x^4 f(x)\,dx = \int_0^1 2x^5\,dx = \tfrac{1}{3}x^6 \big]_0^1 = 0.333.

Then, from the relationships developed earlier, we get

\mu_2 = V(X) = \mu'_2 - \mu^2 = 0.0556,
\mu_3 = \mu'_3 - 3\mu'_1\mu'_2 + 2(\mu'_1)^3 = -0.0074,
\mu_4 = \mu'_4 - 4\mu'_1\mu'_3 + 6(\mu'_1)^2\mu'_2 - 3(\mu'_1)^4 = 0.0074.

Finally, the coefficients of skewness and kurtosis are, respectively,

\alpha_3 = \frac{\mu_3}{(\mu_2)^{3/2}} \approx -0.57, \qquad \alpha_4 = \frac{\mu_4}{(\mu_2)^2} = 2.40.

With \alpha_3 < 0 we see that this continuous probability distribution is moderately skewed to the
left, and \alpha_4 < 3 indicates that its peak is a bit flatter than that of a normal distribution.
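For the continuous case, the same calculation can be checked numerically. The sketch below (an added illustration; the midpoint Riemann sum and the number of subintervals are arbitrary choices) approximates the raw moments of f(x) = 2x on [0, 1] and recovers the coefficients reported above.

```python
def f(x):
    return 2 * x          # density of Example 4.6.2 on [0, 1]

n = 100_000
h = 1.0 / n
grid = [(i + 0.5) * h for i in range(n)]   # midpoints of the subintervals

def raw_moment(r):
    """Approximate E(X^r) = integral of x^r f(x) dx by a midpoint sum."""
    return sum(x ** r * f(x) * h for x in grid)

m1, m2, m3, m4 = (raw_moment(r) for r in range(1, 5))
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
print(mu3 / mu2 ** 1.5, mu4 / mu2 ** 2)    # about -0.566 and 2.40
```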

Summary
Generally, a random variable is a variable whose values are determined by the outcomes of a
random experiment, together with their associated probabilities. A random variable is usually
represented by a single letter, such as X, and the collection of all numbers that are values of X
is called the range of the random variable X.
This chapter developed the concept of a function defined on the outcomes of random
phenomena. These functions, which are called random variables, can be classified into two
types: discrete random variables, which have a set of possible values that is either finite or
countably infinite, and continuous random variables, which can assume an uncountable set
of possible values.
Associated with both types of random variables is the concept of the cumulative distribution
function, which gives the probability that a random variable X takes on a value that is less
than or equal to a given value x.
Similarly the probability density function is a nonnegative function associated with a
continuous random variable such that integrating the probability density function
between two distinct values of the random variable gives the probability that the random
variable takes a value that lies between these two values. Thus, the area under the curve
defined by the probability density function is the probability that the random variable lies
between the values limiting the area. Because of this, the probability that a continuous
random variable takes on a particular value is zero since the area associated with a point is


zero. Understanding the concept of a random variable is key to understanding the rest of the
topics.

Exercises
1. Suppose that the error in the reaction temperature, in °C, for a controlled laboratory
experiment is a continuous random variable X having the probability density function

f(x) = \begin{cases} \dfrac{x^2}{3}, & -1 < x < 2 \\ 0, & \text{otherwise} \end{cases}

a. Is this a valid probability density function? Why?
b. Find P(0 \le X \le 1)
c. Find the cumulative distribution function.
2. The cumulative distribution function of the random variable X is defined by

F(x) = \begin{cases} 0, & x < 2 \\ A(x - 2), & 2 \le x < 6 \\ 1, & x \ge 6 \end{cases}

a. What is the value of A? (1/4)
b. With the above value of A, what is P(X > 4)? (1/2)
c. With the above value of A, what is P(3 \le X \le 5)? (1/2)

3. Given the following cumulative distribution function,

F(x) = \begin{cases} 0, & x < 0 \\ 1/2, & 0 \le x < 2 \\ 5/6, & 2 \le x < 3 \\ 1, & x \ge 3 \end{cases}

find
a. f(0) (1/2)
b. f(2) (1/3)
c. f(3) (1/6)


4. Is the expression F(t) = (1 + e^{-t})^{-1} a legitimate (continuous) cumulative distribution
function? (yes)
5. Given the following discrete probability distribution,

x 1 5 7 9
f(x) 1/6 2/6 2/6 1/6

Find
a. The expected value of x E(X)
b. The variance of x V(X)
6. Given the probability density function

f(x) = \begin{cases} 2(1 - x), & 0 \le x \le 1 \\ 0, & \text{elsewhere} \end{cases}
Find
a. The expected value of x, E(X)
b. The variance of x, V(X)
7. Verify that if X is a discrete or continuous random variable whose variance exists,
then for constants a and b:
a. V(a) = 0
b. V(a + x) = V(x)
c. V(a + bx) = b²V(x)
d. V(x) = E(x²) − [E(x)]²

8. Let the cumulative distribution function for a random variable X be

F(x) = \begin{cases} 0, & x < 0 \\ 2x - x^2, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}

Find:
a) P(X \le 1/4)


b) P(1/3 \le X \le 3/4)
c) the probability density function

9. Comment on the following statement: the probability mass function of a discrete
random variable X has 1 as an upper bound, but the probability density function of a
continuous random variable X need not be bounded.
Is the function

f(x) = \begin{cases} \tfrac{3}{2}x^{1/2}, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}

bounded over its domain? Is it a probability density function?
10. A test engineer discovered that the cumulative distribution function of the lifetime of a
piece of equipment, in years, is given by

F(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-x/5}, & x \ge 0 \end{cases}
a) What is the expected lifetime of the equipment?
b) What is the variance of the lifetime of the equipment?


Chapter four
4. Special Probability Distributions and Densities

Introduction
Chapter 3 dealt with the general probability behaviour of random variables. Random
variables with special probability distributions are encountered in different
fields of the social and natural sciences, including business and economics. The
objective of this chapter is to describe some of these distributions, including
their expected values and variances. These include the discrete probability
distributions: the Bernoulli distribution, the binomial distribution, the
hypergeometric distribution, and the Poisson distribution. The continuous
uniform distribution and the normal distribution are also discussed here.
Examples are given for each distribution, and there are exercises at the end of
the chapter so that you can understand each distribution very well.
Objectives:
After this chapter, the students will be able to:
 Identify the different types of special probability distributions
 Use these special probability distributions to solve problems
 Find the expected value and variance of each special probability
distribution


4.1. Discrete distribution


In this subsection, we will see some standard discrete distributions. A few of these appear
repeatedly throughout your statistics course.

4.1.1. The Bernoulli distribution: This is perhaps one of the simplest possible discrete
random variables. We say that a random variable x has the Bernoulli (p) distribution if and
only if its probability mass function is given by
f(x) = P(X = x) = p^x (1 - p)^{1-x} for x = 0, 1
where 0 < p < 1; here, p is often referred to as a parameter. In applications, we may collect
dichotomous data, for example simply record whether an item is defective (x = 0) or non-
defective (x = 1), whether an individual is married (x = 0) or unmarried (x = 1) or whether a
vaccine works (x = 0) or does not work (x = 1), and so on. In such situation, P stands for
P(x = 1) and 1-P stands for P (x = 0). In other words, Bernoulli distribution deals with a
simple experiment that may result in either of two possible outcomes. We call an
experiment with two possible outcomes Bernoulli trials and we label the two outcomes
success (S) and failure (f). Here the probability of success is the complement of failure.

The Binomial Distribution


An experiment that consists of n fixed repeated independent Bernoulli trials, each with
probability of success p, is called a binominal experiment with n trials and parameter p.
We say that a discrete random variable X has the Binomial (n, P) distribution if and only if
its probability mass function (PMF) is given by
f(x) = P(X = x) = \binom{n}{x} p^x (1 - p)^{n-x} = \binom{n}{x} p^x q^{n-x}, \quad \text{where } q = 1 - p

Where 0<P<1. Here again P is referred to as a parameter. Observe that the Bernoulli (P)
distribution is the same as the Binomial (1, P) distribution.


The Binomial (n, p) distribution arises as follows: consider repeating the Bernoulli
experiment independently n times, where each time one observes the outcome (0 or 1)
and where p = P(X = 1) remains the same throughout.

Binomial distribution is a process in which


a) The process is performed under the same conditions for a fixed and finite number of
times, say “n”.
b) Each trial is independent of the other trials, i.e. the probability of an outcome for any
particular trial is not influenced by the outcomes of the other trials.
c) Each trial has two mutually exclusive possible outcomes, such as "success" or
"failure", "non-defective" or "defective", "yes" or "no", "hit" or "miss", and so on.
The outcomes are usually called success or failure for convenience.
d) The probability of success, P remains constant from trial to trial (so is the
probability of failure q where q = 1-p)

Example4.1: In a short multiple choice quiz suppose that there are ten unrelated questions,
each with five suggested choices as the possible answers, each question has exactly one
correct answer given. An unprepared student guessed all the answers in that quiz. Suppose
that each correct (wrong) answer to a question carries one (zero) point.
a. Find the probability that the student gets 0/10
b. Find the probability that the student gets at least 8
Solution:
Let X stand for the student's quiz score.
We can postulate that X has the Binomial(n = 10, p = 1/5) distribution; then

P(X = x) = \binom{10}{x} \left(\tfrac{1}{5}\right)^x \left(\tfrac{4}{5}\right)^{10-x}


10   1 
0 10
4 0
a) P( x  0)         0.10737  ( the probabilit y that the student get )
 0  5 5 10
10   1   4 
8 2
10   1   4 
9 1
10   1 
10 0
4 5
b) P( x  8)                       7.7926 x 10
 8  5  5  9  5 8 10   5  5
 the probabilit y that the studnet ' s result is  8
Example 4.2: We toss a coin five times and we are interested in the number of heads among
the possible outcomes. Let X represent the number of heads; find the probability distribution
of X and graph the distribution.
Solution:
The probability distribution of X is

No. of heads (x)       0      1      2      3      4      5
f(x) = P(X = x)      1/32   5/32  10/32  10/32   5/32   1/32

This probability distribution can be found using the binomial formula. For example,

f(0) = P(X = 0) = \binom{5}{0}\left(\tfrac{1}{2}\right)^0\left(\tfrac{1}{2}\right)^5 = \tfrac{1}{32}
f(1) = P(X = 1) = \binom{5}{1}\left(\tfrac{1}{2}\right)^1\left(\tfrac{1}{2}\right)^4 = \tfrac{5}{32}

Figure: bar graph of the probability distribution of X (probabilities 1/32, 5/32, 10/32, 10/32, 5/32, 1/32 at x = 0, 1, ..., 5).


Expected value and standard deviation of the Binomial Distribution

The expected value of a discrete random variable can always be computed from

E(x) = \sum_{i=1}^{n} x_i f(x_i)

For a binomial random variable X, the mean is

\mu = E(x) = np

Example 4.1 (continued): if p = 1/5 and n = 10, then E(x) = 10(1/5) = 2.
Example 4.2 (continued): if p = 1/2 and n = 5, then E(x) = 5(1/2) = 2.5.
The variance (\sigma^2) and standard deviation (\sigma) of the binomial distribution are

\sigma^2 = npq = np(1 - p), \qquad \sigma = \sqrt{npq}

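The binomial probabilities above are easy to compute directly. The following Python sketch (an added illustration using only the standard library; not part of the original module) re-does Example 4.1 and reports the mean np and variance npq.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 1 / 5                     # Example 4.1: ten questions, one chance in five per guess
print(binom_pmf(0, n, p))            # about 0.10737 = P(score of 0/10)
print(sum(binom_pmf(x, n, p) for x in (8, 9, 10)))   # about 7.79e-05 = P(score >= 8)
print(n * p, n * p * (1 - p))        # mean np = 2.0 and variance npq = 1.6
```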
4.1.3. Multinomial Distribution


A binomial experiment is one in which there are only two possible outcomes. When the
possible outcomes or categories number more than two, the experiment is said to be a
multinomial experiment and the distribution a multinomial distribution:

P(X_1 = n_1 \text{ and } X_2 = n_2 \text{ and } \dots \text{ and } X_k = n_k) = \frac{n!}{n_1!\,n_2! \cdots n_k!} (p_1)^{n_1} (p_2)^{n_2} \cdots (p_k)^{n_k}

A multinomial experiment is simply an extension of a binomial experiment and applies when
there are two or more classes or categories.
Example 4.3: Suppose that we want to know the probability of drawing two red, one white,
and zero blue marbles from a box that contains four red, two white, and two blue marbles. If
we randomly draw one marble at a time and replace it before drawing the next, the probability
is found as follows.
Solution:

P(X_R = 2 \text{ and } X_W = 1 \text{ and } X_B = 0) = \frac{n!}{n_R!\,n_W!\,n_B!} (P_R)^{n_R} (P_W)^{n_W} (P_B)^{n_B}
= \frac{3!}{2!\,1!\,0!} (0.50)^2 (0.25)^1 (0.25)^0 = 0.188
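A small Python sketch (added illustration, not from the original text) evaluates the multinomial formula above for the marble example.

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """P(X1 = n1, ..., Xk = nk) for a multinomial experiment with n = sum(counts) trials."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p ** c
    return coef * prob

# Example 4.3: two red, one white, zero blue in three draws with replacement.
print(multinomial_pmf([2, 1, 0], [0.50, 0.25, 0.25]))   # about 0.188
```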


4.1.4. The Hyper geometric Probability Distribution


If the number of elements in the population is large relative to the number in the sample, the
probability of selecting a success on a single trial is equal to the proportion P of successes in
the population. Because the population is large in relation to the sample size, the probability
will remain constant (for all practical purposes) from trial to trial, and the number X of
successes in the sample will follow a binomial probability distribution. However, if the
number of elements in the population is small in relation to the sample size (n/N > 0.05), or
if the sampling is without replacement, the probability of a success for a given trial is
dependent on the outcomes of preceding trials. Then the number X of successes follows
what is known as the hypergeometric probability distribution.
Suppose a population contains M successes and N − M failures. The probability of exactly k
successes in a random sample of size n is

P(X = k) = \frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}}

The mean and variance of the hypergeometric distribution are

\mu = n\left(\frac{M}{N}\right), \qquad \sigma^2 = n\left(\frac{M}{N}\right)\left(\frac{N-M}{N}\right)\left(\frac{N-n}{N-1}\right)

Example 4.4: A case of wine has 12 bottles, 3 of which contain spoiled wine. A sample of 4
bottles is randomly selected from the case.
1. Find the probability distribution for x, the number of bottles of spoiled wine in the
sample.
2. What are the mean (\mu) and variance (\sigma^2) of x?
Solution:

P(x) = \frac{\binom{3}{x}\binom{9}{4-x}}{\binom{12}{4}}


The possible values for x are 0, 1, 2 and 3, therefore


1)  P(0) = \frac{\binom{3}{0}\binom{9}{4}}{\binom{12}{4}} = \frac{1(126)}{495} = \frac{126}{495} = \frac{14}{55}
    P(1) = \frac{\binom{3}{1}\binom{9}{3}}{\binom{12}{4}} = \frac{3(84)}{495} = \frac{252}{495} = \frac{28}{55}
    P(2) = \frac{\binom{3}{2}\binom{9}{2}}{\binom{12}{4}} = \frac{3(36)}{495} = \frac{108}{495} = \frac{12}{55}
    P(3) = \frac{\binom{3}{3}\binom{9}{1}}{\binom{12}{4}} = \frac{1(9)}{495} = \frac{9}{495} = \frac{1}{55}

2)  \mu = 4\left(\frac{3}{12}\right) = 1
    \sigma^2 = 4\left(\frac{3}{12}\right)\left(\frac{9}{12}\right)\left(\frac{12-4}{12-1}\right) = 0.5455

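The hypergeometric probabilities of Example 4.4 can be verified with a few lines of Python (added illustration, standard library only; not part of the original text).

```python
from math import comb

def hypergeom_pmf(k, N, M, n):
    """P(X = k): k successes in a sample of n drawn without replacement
    from a population of N items that contains M successes."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

N, M, n = 12, 3, 4                       # Example 4.4: 12 bottles, 3 spoiled, sample of 4
print([hypergeom_pmf(k, N, M, n) for k in range(4)])    # 14/55, 28/55, 12/55, 1/55
mean = n * M / N                                         # 1.0
var = n * (M / N) * ((N - M) / N) * ((N - n) / (N - 1))  # about 0.5455
print(mean, var)
```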
4.1.5. The Poisson Probability Distribution


Another discrete random variable that has numerous practical applications is the Poisson
random variable. Its probability distribution provides a good model for data that represent
the number of occurrences of a specified event in a given unit of time or space.
Examples of Poisson distribution are:
 the number of calls received during a given period of time
 the number of machine breakdowns during a given day
 the number of traffic accidents on a given road
 typing error made by a typist on a page
In each example, x represents the number of events that occur in a period of time or space
during which an average of \mu such events can be expected to occur.

Let µ be the average number of times that an event occurs in a certain period of time or
space. The probability of k occurrences of this event is

P(X = k) = \frac{\mu^k e^{-\mu}}{k!}

for values of k = 0, 1, 2, 3, …
The mean and standard deviation of the Poisson distribution are, respectively,

\text{Mean} = \mu, \qquad \text{Standard deviation} = \sigma = \sqrt{\mu} \qquad (\text{the symbol } e \approx 2.71828)

Example 4.5: The average number of traffic accidents on a certain section of highway (for
example, from Addis to Adama) is two (\mu = 2) per week.
1. Find the probability of:
a. No accidents on this section of highway during a week, P(X = 0)
b. At most three accidents on this section of highway during 2 weeks, P(X \le 3)
Solution:

P(X = k) = \frac{\mu^k e^{-\mu}}{k!}

a)  P(X = 0) = \frac{2^0 e^{-2}}{0!} = 0.135335

b)  P(X \le 3) = P(0) + P(1) + P(2) + P(3)

    P(0) = \frac{4^0 e^{-4}}{0!} = 0.018316 \qquad (\mu \text{ in this case is } 2 \times 2 \text{ weeks} = 4)
    P(1) = \frac{4^1 e^{-4}}{1!} = 0.073263
    P(2) = \frac{4^2 e^{-4}}{2!} = 0.146525
    P(3) = \frac{4^3 e^{-4}}{3!} = 0.195367
    P(X \le 3) = 0.433471
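The same Poisson calculation in Python (an added illustration, not part of the original module):

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return mu ** k * exp(-mu) / factorial(k)

print(poisson_pmf(0, 2))                              # part (a): about 0.1353
mu_two_weeks = 2 * 2                                  # mean over two weeks is 4
print(sum(poisson_pmf(k, mu_two_weeks) for k in range(4)))   # part (b): about 0.4335
```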


4.2. Some Special Probability Density (Continuous case)


4.2.1. The Uniform Density (Distribution)
A continuous random variable X is said to have a uniform distribution over the interval [a, b]
if its probability density function is given by

f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}

It is used to model events that are equally likely to occur at any time within a given time
interval.
The graph of the uniform distribution is shown below.

Figure 4.2: Uniform distribution graph (the density is constant at height 1/(b − a) over [a, b]).

The cumulative distribution function of a uniform distribution is given by

F(x) = P(X \le x) = \begin{cases} 0, & x < a \\ \dfrac{x-a}{b-a}, & a \le x < b \\ 1, & x \ge b \end{cases}
The expected value of X is given by

E(x) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_a^b \frac{x}{b-a}\,dx = \left[\frac{x^2}{2(b-a)}\right]_a^b = \frac{b^2 - a^2}{2(b-a)} = \frac{(b+a)(b-a)}{2(b-a)} = \frac{a+b}{2}

The second moment of X is given by

E(x^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_a^b \frac{x^2}{b-a}\,dx = \left[\frac{x^3}{3(b-a)}\right]_a^b = \frac{b^3 - a^3}{3(b-a)} = \frac{b^2 + ab + a^2}{3}

Thus, the variance of X is given by

\sigma_x^2 = E(x^2) - (E(x))^2 = \frac{b^2 + ab + a^2}{3} - \frac{b^2 + 2ab + a^2}{4} = \frac{b^2 - 2ab + a^2}{12} = \frac{(b-a)^2}{12}

Example 4.3. The time that Abebe, the teaching assistant, takes to grade a paper is
uniformly distributed between 5 minutes and 10 minutes. Find the mean and variance of the
time to grade a paper.

Solution: Let X be a random variable that denotes the time it takes Abebe to grade a paper.
Then the mean or expected value E(X) and the variance \sigma_x^2 are

E(x) = \frac{10 + 5}{2} = 7.5, \qquad \sigma_x^2 = \frac{(10 - 5)^2}{12} = \frac{25}{12}
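A two-line Python helper (added illustration, not from the original text) gives the same mean and variance for any uniform interval [a, b].

```python
def uniform_mean_var(a, b):
    """Mean and variance of a Uniform(a, b) random variable."""
    return (a + b) / 2, (b - a) ** 2 / 12

print(uniform_mean_var(5, 10))   # (7.5, 2.0833...), i.e. 25/12, as in the grading example
```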


4.2.2. The Normal distribution

It has been observed that most business and economic variables generate continuous data
whose behaviour is often best described by a bell-shaped continuous curve. Since this is
what we normally come across in the case of most populations on these variables, a bell-
shaped curve has come to be universally known as normal curve. Accordingly, the
probability distribution described by a normal curve is called the normal (probability)
distribution.

The normal distribution has come to acquire a wide range of applications in many areas of
human knowledge. It is used in almost all data-based research in the fields of
agriculture, trade, business, and industry. As will be noticed a little later, much of the theory
of inductive statistics, concerning estimation of unknown population parameter and testing
hypotheses on the basis of sample statistics, has been developed using the concepts of
normal curve.

4.2.2.1. Normal Distribution Equation and Its Parameters


A continuous variable that yields a bell-shaped curve as shown in figure 4.3 is called
normal variable. The normal curve describes the probability distribution of the normal
variable X, which is said to be normally distributed with parameters mean  and standard
deviation 
Denoted as n(X; \mu, \sigma), in which n is read as "normal", a normal distribution is completely
defined by the mathematical equation for the normal curve

f(x) = n(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}

in which \pi = 3.14159, e = 2.71828, and -\infty < x < \infty.


Figure 4.3 A normal distribution curve

The parameters \mu and \sigma are critical values in the normal distribution equation. Other terms
being constant, the exponent

-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2 \quad \text{or} \quad -\frac{(x-\mu)^2}{2\sigma^2}

is the only operational part of the equation. It shows the deviation of the value of the normal
variable X from its mean \mu. The larger these deviations, the higher the value of the standard
deviation \sigma (or variance \sigma^2), which is the denominator in the exponent.

It may be seen that \mu lies at the center of the normal curve and indicates the central value of
the normal distribution. The standard deviation \sigma is a measure of the extent of the spread of
X values from the central value \mu. Thus, while \mu fixes the position or the level of the
distribution on the X-axis, \sigma determines the spread of the distribution along the X-axis on
both sides of the central value.

In the light of the above, consider the following three situations, as shown in Figures 4.4 and 4.5.

1. A change in \mu, with the standard deviation \sigma remaining the same, shifts the curve along the
X-axis without changing its spread. This is shown in Figure 4.4.


1

µ1 µ2

Figure 4.4 Normal curves with different means ( 1> µ 2) and 1 = 2

2. A change in \sigma, with the mean \mu remaining the same, changes the shape or spread of the
normal curve. This is shown in Figure 4.5.

Figure 4.5: Normal curves with different standard deviations (\sigma_2 > \sigma_1) and equal means (\mu_1 = \mu_2).

3. An increase in \sigma increases the spread of the normal curve equally on both sides of the
central value. It lowers the normal curve in height, irrespective of whether or not there is any
change in \mu. A decrease in \sigma, on the contrary, reduces the spread of the normal curve and
increases its height. The inverse relationship between the extent of spread of the


normal curve and its height at the central value can be easily grasped by observing figure
4.5. This is so because the total area under any two normal curves must always be 1.

4.2.2.2. Properties of the Normal Curve


As the positioning and spread of the normal curve are determined only by the values of \mu and
\sigma, there are different normal curves (different in spread and positioning) for
different values of these two parameters. This enables us to state the following properties of
the normal curve.
1. The normal curve is not a single curve representing only one continuous
distribution. It represents a family of normal curves, since, for each different value
of \mu and/or \sigma, there is a specific normal curve different in positioning on the X-axis
and spread around the central value.
2. A change in the value of \mu displaces the entire curve to a different level, whereas a
change in \sigma changes the spread and determines its height.
3. The normal curve is completely determined by the values of \mu and \sigma, which are its
two parameters.
4. The mode of the normal distribution occurs at the point on the X-axis where the curve
reaches its maximum height. Since most observations tend to cluster around the
mean value, the point of the mode coincides with the point of the mean. That is, the mean
and mode are equal at the point where the curve attains its maximum height.
5. Being bell-shaped, the normal curve is perfectly symmetrical about the vertical axis
through the mean. As a result, 50 percent of the area lies to the right of the mean and the
other 50 percent to its left.
6. Perfect symmetry also means that the mean, median and mode are all equal in the
case of a normal distribution. As the normal curve is neither too peaked nor too flat,
it is mesokurtic.

4.2.2.3. The Standard Normal Distribution


Since each normally distributed variable has its own mean (\mu) and standard deviation (\sigma), as
stated earlier, the shape and location of these curves will vary. In practical applications, then,

one would have to have a table of areas under the curve. In order to simplify this situation,
statisticians use what is called the standard normal distribution.
The standard normal distribution is a distribution with mean of 0 and a standard deviation of
1. One advantage of all normally distributed variables is that they can be transformed into
the standard normal distribution by using the formula for the standard score (Z):

value  mean X 
Z or Z
s tan dard devaition 
As we stated earlier, the area under the normal distribution curve is used to solve practical
application problems; hence the major emphasis of this section is to show the procedures for
finding the area under the normal distribution curve for any Z value. Once the X values are
transformed using the above formula, they are called Z values. The Z value is actually
the number of standard deviations that a particular X value lies away from its mean (i.e. below
or above the mean). For example, Z = ±2 implies that the value X is 2 standard deviations
above or below the mean.


Figure 4.6: Normal distribution with X-values (\mu - 3\sigma to \mu + 3\sigma) and corresponding Z-values (-3 to 3). About 68.26% of the area lies within \mu \pm \sigma, 95.44% within \mu \pm 2\sigma, and 99.74% within \mu \pm 3\sigma.

Figure 4.6 shows the graph of the probability density function of Z with mean equal to zero
and standard deviation equal to one. This curve is symmetric, bell-shaped, centered on the
mean of zero, and has most of its area contained within the range ±3 (99.74%).

Example 4.7: A continuous manufacturing process produces items whose weights are
normally distributed with a mean weight of 800 g and standard deviation of 300 g. A
random sample of 16 items is to be drawn from the process.
a. What is the probability that the arithmetic mean of the sample
exceeds 900 g ? Interpret the results.


b. Find the value of the sample arithmetic mean within which the
middle 95 percent of all sample means will fall.
Solution: (a) We are given the following information:
\mu = 800 g, \sigma = 300 g, and n = 16.
Since the population is normally distributed, the distribution of the sample mean is normal with
mean and standard deviation equal to

\mu_{\bar{x}} = \mu = 800, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{300}{\sqrt{16}} = \frac{300}{4} = 75

The required probability P(\bar{x} > 900) is represented by the shaded area in Figure 4.7 of a
normal curve. Hence

Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{900 - 800}{75} = 1.33, \qquad P(\bar{x} > 900) = P(Z > 1.33) = 0.0918

Hence, 9.18 percent of all possible samples of size n = 16 will have a sample mean value
greater than 900 g.

Figure 4.7: Normal curve with the area above \bar{x} = 900 (Z = 1.33) shaded, equal to 0.0918.
(b) Since Z = ±1.96 bounds the middle 95 percent of the area under the normal curve, as shown
in Figure 4.8, we use the formula for Z to solve for the values of \bar{x} in terms of the known
values as follows:


\bar{x}_1 = \mu_{\bar{x}} - Z\sigma_{\bar{x}} = 800 - 1.96(75) = 653\ g
\bar{x}_2 = \mu_{\bar{x}} + Z\sigma_{\bar{x}} = 800 + 1.96(75) = 947\ g

Therefore, the middle 95% of all sample means fall within [653, 947].

Figure 4.8: Normal curve with the middle 95 percent of the area (0.9500) between \bar{x}_1 = 653 and \bar{x}_2 = 947.
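Standard normal areas such as those used in Example 4.7 can be computed without a table via the error function. The sketch below is an added illustration (the text's table value 0.0918 corresponds to rounding Z to 1.33); it uses Python's math.erf.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 800, 300, 16
se = sigma / sqrt(n)                       # standard error of the sample mean, 75
print(1 - phi((900 - mu) / se))            # P(sample mean > 900), about 0.091
print(mu - 1.96 * se, mu + 1.96 * se)      # middle 95% of sample means: 653 to 947
```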

Example 4.8: In a normal distribution, 31 percent of the items are under 45 and 8 percent
are over 64. Find the mean and standard deviation of the distribution.
Solution: Since 31 percent of the items are under 45, the area to the left of the ordinate at X =
45 is 0.31, and obviously the area to the right of this ordinate up to the mean is (0.5 − 0.31) =
0.19. The value of Z corresponding to this area is −0.5. Hence

Z = \frac{45 - \mu}{\sigma} = -0.5 \quad \text{or} \quad \mu - 0.5\sigma = 45

As 8 percent of the items are above 64, the area to the right of the ordinate at 64 is 0.08. The
area to the left of the ordinate at X = 64 down to the mean ordinate is (0.5 − 0.08) = 0.42, and
the value of Z corresponding to this area is 1.4. Hence

Z = \frac{64 - \mu}{\sigma} = 1.4 \quad \text{or} \quad \mu + 1.4\sigma = 64


Subtracting the first equation from the second gives 1.9\sigma = 19, or \sigma = 10. Substituting into the
first equation, we get \mu - 0.5(10) = 45, or \mu = 50.
Thus, the mean of the distribution is 50 and the standard deviation is 10.

Figure 4.9: Normal curve showing the areas 31% below 45, 19% and 42% on either side of the mean \mu, and 8% above 64.
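The two simultaneous equations of Example 4.8 can also be solved mechanically; a minimal sketch (added illustration, not part of the original text):

```python
# mu + z_low * sigma = x_low  and  mu + z_high * sigma = x_high
z_low, x_low = -0.5, 45      # 31% of the items lie below 45
z_high, x_high = 1.4, 64     # 8% of the items lie above 64

sigma = (x_high - x_low) / (z_high - z_low)   # 19 / 1.9 = 10
mu = x_low - z_low * sigma                    # 45 + 0.5 * 10 = 50
print(mu, sigma)                              # 50.0 10.0
```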


Summary
This chapter introduced some of the many classes of random variables. The Bernoulli
random variable is used to model experiments that have only two possible outcomes, which
are referred to as success and failure. The random variable that is used to denote the number
of successes that occurred in those n Bernoulli trials is the Binomial random variable and
the distribution is Binomial distribution. One popular area of application of probability is
quality control. Some of the items coming off a product line are good and some are bad. If
we know beforehand the fraction of the items in a production batch that are good, we may
want to know the probability that a sample contains a specified number of bad items. The
random variable that is used to denote this number is the hypergeometric random variable
and the distribution is the hypergeometric distribution.
The Poisson random variable is used to count the number of events over a given interval,
space, etc. Example, the number of equipment failures over a given interval.

The two commonly used continuous random variables are the uniformly distributed random
variable, which is used to model events that are equally likely to occur at any time within a
given interval, and the normally distributed random variable, which is used to model events
that have a higher probability of occurrence around the mean value and a smaller probability
of occurrence the farther away from the mean value.


Exercise

1. In a class there are 40 students taking the course Statistics for Economists. The students'
results out of 100% are normally distributed with mean 70 and variance 64. The top 10%
of the students are to get an "A" grade and the bottom 5% are to get an "F" grade.
a. How many students' results are within the range 62 to 78?
b. Determine the minimum mark that a student should get in order to get an "A"
grade.
c. For what range of marks do students get an "F" grade?

2. There are different types of distributions in statistics. Discuss the difference in


application of Binomial, Poisson and Normal probability distributions?

3. Suppose that a sample of households is randomly selected from all the households in the
city in order to estimate the percentage in which the head of the household is
unemployed. A literature indicates that the percentage of unemployment in the city is
10%. If a random sample of 5 household is to be selected from the households in the
city. What is the probability that all five heads of the household are employed
4. A shipment of 8 similar microcomputers to a retail outlet contains 3 that are defective. If
a school makes a random purchases of 2, of these computers, let X be a random variable
whose value is the number of defective computers.
a. Find the probability distribution for the number of defectives?
b. Find the mean and variance of the distribution?
5. The probability that a person dies from a certain respiratory infection is 0.002. Find
the probability that fewer than 5 of the next 2000 so infected will die. (0.6288)
6. Lots of 40 components each are called acceptable if they contain no more than 3
defective. The procedure for sampling the lot is to select 5 components at random
and to reject the lot if a defective is found. What is the probability that exactly 1
defective will be found in the sample if there are 3 defective in the entire lot?
(0.3011)


7. The number of mistakes counted in one hundred typed pages of a typist revealed
that she made 2.8 mistakes on average per page. Find the probability that in a page
typed by her

a. there are no mistakes (0.061)


b. there are two or less mistakes (0.471)

8. A stockist has 20 items in a lot. Out of these, 12 are non-defective and 8 are
defective. A customer selects 3 at random.

a. What is the probability that all the three items are non-defective? (0.193)
b. What is the probability that out of these three items, two are non-defective and
one is defective?(0.463)

9. The lifetimes of certain kinds of electric devices have a mean of 300 hours and a
standard deviation of 25 hours. Assuming that the distribution of these lifetimes,
which are measured to the nearest hour, can be approximated closely with a normal
curve:
a. Find the probability that any one of these electric devices will have a
lifetime of more than 350 hours. (0.0228)
b. What percentage will have lifetimes of 300 hours or less? (50%)
c. What percentage will have lifetimes from 220 to 260 hours? (5.41%)


Chapter five
5. Joint Distributions
Introduction
We have so far been concerned with the properties of a single random variable defined on a
given sample space. Sometimes we encounter problems that deal with two or more random
variables defined on the same sample space. In this chapter, we consider such joint probability
distributions, or multivariate random variables.
The random variable ideas discussed earlier can be easily generalized to two or more
random variables. We consider the typical case of two random variables which are either
both discrete or both continuous. The chapter also discusses the concepts of covariance and
correlation coefficient of two random variables. In cases where one variable is discrete and
the other continuous, appropriate modifications are easily made; generalizations to more than
two variables can also be made.
Objectives;
After this chapter, the students will be able to:
 identify univariate and bivariate random variables
 find probabilities from joint probability distributions
 find the marginal probability distributions of a joint distribution
 calculate the covariance and correlation coefficient of two random variables


In Chapter III, we dealt with a single random variable (for example, X). A random variable
could represent a characteristic of an item or an individual; for instance, X could represent
the age of an individual. However, sometimes we may be interested in two characteristics of
an item or individual, for example the level of education of an individual, which can be
represented by X, and income, which can be represented by Y. In this case, we have two
random variables and the distribution is said to be a joint distribution. As in Chapter III,
the random variables could be discrete or continuous.

5.1. Discrete Case


If X and Y are two discrete random variables we define the joint probability function of X
and Y by
P (X=x,Y=y) = f(x,y)
where 1. f(x, y) \ge 0
      2. \sum_x \sum_y f(x, y) = 1

i.e. the sum of the probabilities over all pairs (x, y) is equal to one.


Suppose that X can assume any one of m values x1, x2, . . . ,xm and Y can assume
any one of n values y1, y2, . . . yn. Then the probability of the event that X=xj, and Y=yk is
given by
P(X=xj, Y=yk) = f(xj, yk)
A joint probability function for X and Y can be represented by a joint probability
table as in Table 5.1. The probability that X=xj is obtained by adding all entries in the row
corresponding to xj and is given by
n
P( X  x j )  f1 ( x j )   f ( x j , yk )
k 1

For j=1,2,. . . ,m these are indicated by the entry totals in the extreme right hand column or
margin of Table 5.1. Similarly the probability that Y=yk is obtained by adding all entries in
the column corresponding to y and is given by
n
P(Y  y k )  f 2 ( y k )   f ( x j , y k )
j 1


For k=1,2,. . . ,n these are indicated by the entry totals in the bottom row or margin of Table
5.1
Table 5.1 Joint probability distribution

   X \ Y     y1          y2          …    yn          Totals
   x1        f(x1,y1)    f(x1,y2)    …    f(x1,yn)    f1(x1)
   x2        f(x2,y1)    f(x2,y2)    …    f(x2,yn)    f1(x2)
   ..        ..          ..          ..   ..          ..
   xm        f(xm,y1)    f(xm,y2)    …    f(xm,yn)    f1(xm)
   Totals    f2(y1)      f2(y2)      …    f2(yn)      1 (Grand Total)

Because the probabilities P(X = x_j) = f_1(x_j) and P(Y = y_k) = f_2(y_k) are obtained from the
margins of the table, we often refer to f_1(x_j) and f_2(y_k) (or simply f_1(x) and f_2(y)) as the
marginal probability functions of X and Y respectively. It should also be noted that

\sum_{j=1}^{m} f_1(x_j) = 1 \qquad \text{and} \qquad \sum_{k=1}^{n} f_2(y_k) = 1

This can be written as:

\sum_{j=1}^{m} \sum_{k=1}^{n} f(x_j, y_k) = 1

This is simply the statement that the total probability of all entries is 1. The grand total of 1
is indicated in the lower right-hand corner of the table.
The joint distribution function of X and Y is defined by

F(x, y) = P(X \le x, Y \le y) = \sum_{u \le x} \sum_{v \le y} f(u, v)

In Table 5.1, F(x, y) is the sum of all entries for which x_j \le x and y_k \le y.


5.2.Continuous Case
The case where both variables are continuous is obtained easily by analogy with the discrete
case on replacing sums by integrals. Thus the joint probability function for the random
variables X and Y (or, as it is more commonly called, the joint density function of X and Y)
is defined by
1. f(x, y) \ge 0
2. \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1

The joint distribution function of X and Y in this case is defined by

F(x, y) = P(X \le x, Y \le y) = \int_{v=-\infty}^{y} \int_{u=-\infty}^{x} f(u, v)\,du\,dv

It follows that

\frac{\partial^2 F}{\partial x\, \partial y} = f(x, y)

i.e. the density function is obtained by differentiating the distribution function with respect
to x and y. We also obtain the marginal distribution functions

P(X \le x) = F_1(x) = \int_{u=-\infty}^{x} \int_{v=-\infty}^{\infty} f(u, v)\,dv\,du

P(Y \le y) = F_2(y) = \int_{v=-\infty}^{y} \int_{u=-\infty}^{\infty} f(u, v)\,du\,dv

We call the above equations the marginal distribution functions, or simply the distribution
functions, of X and Y respectively. The derivatives with respect to x and y are then called the
marginal density functions, or simply the density functions, of X and Y, and are given by

f_1(x) = \int_{-\infty}^{\infty} f(x, v)\,dv, \qquad f_2(y) = \int_{-\infty}^{\infty} f(u, y)\,du


Example 5.1 The joint probability function of two discrete random variables X and Y is
given by f(x, y) = c(2x + y), where x and y can assume all integers such that 0 \le x \le 2,
0 \le y \le 3, and f(x, y) = 0 otherwise.
a) Find the value of the constant c.
b) Find P(X = 2, Y = 1).
c) Find P(X \ge 1, Y \le 2).
Solution:
a) The sample points (x, y) for which the probabilities are different from zero are indicated in
Figure 5.1. The probabilities associated with these points, given by c(2x + y), are shown in
Table 5.2. Since the grand total, 42c, must equal 1, we have c = 1/42.

Table 5.2

   X \ Y    0     1     2     3     Totals
   0        0     c     2c    3c    6c
   1        2c    3c    4c    5c    14c
   2        4c    5c    6c    7c    22c
   Totals   6c    9c    12c   15c   42c

Figure 5.1: The sample points (x, y), x = 0, 1, 2 and y = 0, 1, 2, 3, at which f(x, y) > 0.
b) From Table 5.2 we see that

P(X = 2, Y = 1) = 5c = \frac{5}{42}

c) From Table 5.2 we see that

P(X \ge 1, Y \le 2) = \sum_{x \ge 1} \sum_{y \le 2} f(x, y) = (2c + 3c + 4c) + (4c + 5c + 6c) = 24c = \frac{24}{42} = \frac{4}{7}


as indicated by the entries shown shaded in the table

Example 5.2 Find the marginal probability functions (a) of X and (b) of Y for the random
variables of example 5.1
Solution:
a) The marginal probability function for X is given by P(X = x) = f1(x) and can be obtained
from the margin totals in the right-hand column of Table 5.2. From these we see that
P(X = x) = f_1(x) = \begin{cases} 6c = 1/7, & x = 0 \\ 14c = 1/3, & x = 1 \\ 22c = 11/21, & x = 2 \end{cases}

Check: \frac{1}{7} + \frac{1}{3} + \frac{11}{21} = 1
b) The marginal probability function for Y is given by P(Y = y) = f_2(y) and can be obtained
from the margin totals in the last row of Table 5.2. From these we see that

P(Y = y) = f_2(y) = \begin{cases} 6c = 1/7, & y = 0 \\ 9c = 3/14, & y = 1 \\ 12c = 2/7, & y = 2 \\ 15c = 5/14, & y = 3 \end{cases}

Check: \frac{1}{7} + \frac{3}{14} + \frac{2}{7} + \frac{5}{14} = 1
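The marginal distributions of Examples 5.1 and 5.2 can be recovered by summing the joint table by rows and by columns; the Python sketch below (an added illustration using exact fractions, not part of the original text) does exactly that.

```python
from fractions import Fraction

# Joint pmf of Example 5.1: f(x, y) = (2x + y)/42 for x = 0,1,2 and y = 0,1,2,3.
joint = {(x, y): Fraction(2 * x + y, 42) for x in range(3) for y in range(4)}

f1 = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in range(3)}
f2 = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in range(4)}

print(f1)                    # marginal of X: 1/7, 1/3, 11/21
print(f2)                    # marginal of Y: 1/7, 3/14, 2/7, 5/14
print(sum(joint.values()))   # total probability 1
```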

5.3. Independent Random variables


Two variables X and Y are said to be independent if and only if the probability of one
variable is not influenced by the occurrence of the other variable.
Suppose that X and Y are discrete random variables. If the events X = x and Y = y are
independent events for all x and y, then we say that X and Y are independent random
variables. In such a case
P(X = x, Y = y) = P(X = x) P(Y = y)
or equivalently


f(x,y) = f1(x) f2(y)


Conversely, if for all x and y the joint probability function f(x,y) can be expressed as the
product of a function of x alone and a function of y alone (which are then the marginal
probability functions of X and Y), X and Y are independent. If however, f(x, y) cannot be
so expressed then X and Y are dependent.

If X and Y are continuous random variables, we say that they are independent random
variables if the events X \le x and Y \le y are independent events for all x and y. In such a case
we can write
P(X \le x, Y \le y) = P(X \le x) P(Y \le y)
or equivalently
F(x,y) = F1(x) F2(y)
where F1(x) and F2(y) are the (marginal) distribution functions of X and Y respectively.

Example 5.3 Show that the random variables X and Y of example 5.1 are dependent.
Solution:
If the random variables X and Y are independent then we must have, for all x and y.
P(X = x, Y = y) = P(X = x) P(Y = y)
But, as seen from example 5.1(b) and 5.2,
P(X = 2, Y = 1) = \frac{5}{42}, \qquad P(X = 2) = \frac{11}{21}, \qquad P(Y = 1) = \frac{3}{14}

P(X = 2, Y = 1) \ne P(X = 2)\,P(Y = 1)
The result also follows from the fact that the joint probability function, (2x + y)/42, cannot
be expressed as a function of x alone times a function of y alone.

5.5. Conditional Distributions


We already know that if P(A) > 0,

P(B \mid A) = \frac{P(A \cap B)}{P(A)}


If X and Y are discrete random variables and we have the events (A: X = x), (B: Y = y), then
the above conditional probability becomes

P(Y = y \mid X = x) = \frac{f(x, y)}{f_1(x)}

where f(x, y) = P(X = x, Y = y) is the joint probability function and f_1(x) is the marginal
probability function for X. We define

f(y \mid x) = \frac{f(x, y)}{f_1(x)}

and call it the conditional probability function of Y given X. Similarly, the conditional
probability function of X given Y is

f(x \mid y) = \frac{f(x, y)}{f_2(y)}

We shall sometimes denote f(x \mid y) and f(y \mid x) by f_1(x \mid y) and f_2(y \mid x) respectively.
These ideas are easily extended to the case where X and Y are continuous random variables.
For example, the conditional density function of Y given X is

f(y \mid x) = \frac{f(x, y)}{f_1(x)}

where f(x, y) is the joint density function of X and Y, and f_1(x) is the marginal density
function of X. We can, for example, find the probability of Y being between c and d given
that x < X < x + dx:

P(c < Y < d \mid x < X < x + dx) = \int_c^d f(y \mid x)\,dy
Generalizations of these results are also available.

Example 5.4. The joint probability function of two discrete random variables X and Y is
given by f(x, y) = c(2x + y), where x and y can assume all integers such that 0 \le x \le 2,
0 \le y \le 3, and f(x, y) = 0 otherwise. Find: a) f(y \mid 2)   b) P(Y = 1 \mid X = 2)


Solution

a. f(y \mid x) = \frac{f(x, y)}{f_1(x)} = \frac{(2x + y)/42}{f_1(x)}, so that with X = 2,

f(y \mid 2) = \frac{(4 + y)/42}{11/21} = \frac{4 + y}{22}

b. P(Y = 1 \mid X = 2) = f(1 \mid 2) = \frac{5}{22}

5.6. Conditional Expectation, Variance, and Moments


If X and Y have joint density function f(x, y), then the conditional density function of Y given
X is f(y \mid x) = f(x, y)/f_1(x), where f_1(x) is the marginal density function of X. We can define
the conditional expectation, or conditional mean, of Y given X by

E(Y \mid X = x) = \int_{-\infty}^{\infty} y\, f(y \mid x)\,dy

Where “X = x” is to be interpreted as x < X < x + dx in the continuous case.


We note the following properties:
1. E(Y \mid X = x) = E(Y) when X and Y are independent
2. E(Y) = \int_{-\infty}^{\infty} E(Y \mid X = x) f_1(x)\,dx


It is often convenient to calculate expectations by use of property 2, rather than directly.

Example 5.5. The average travel time to a distant city is c hours by car or b hours by bus. A
man cannot decide whether to drive or take the bus, so he tosses a coin. What is his
expected travel time?
Solution:
Here we are dealing with the joint distribution of the outcome of the toss, X, and the travel
time, Y, where Y = Y_car if X = 0 and Y = Y_bus if X = 1. Presumably, both Y_car and Y_bus
are independent of X, so that by Property 1 above
E(Y \mid X = 0) = E(Y_car \mid X = 0) = E(Y_car) = c
and E(Y \mid X = 1) = E(Y_bus \mid X = 1) = E(Y_bus) = b

Then property 2 (with the integral replaced by a sum) gives, for a fair coin,


cb
E (Y )  E (Y / X  0) P( X  0)  E (Y / X  1) PX  1) 
2
In a similar manner, we can define the conditional variance of Y given X as

E[(Y - \mu_2)^2 \mid X = x] = \int_{-\infty}^{\infty} (y - \mu_2)^2 f(y \mid x)\,dy

where \mu_2 = E(Y \mid X = x). Also, we can define the rth conditional moment of Y given X
about any value a as

E[(Y - a)^r \mid X = x] = \int_{-\infty}^{\infty} (y - a)^r f(y \mid x)\,dy


The usual theorems for variance and moments extend to conditional variance and moments.

5.7. Variance for Joint Distributions: Covariance


The results given above for one variable can be extended to two or more variables. Thus,
for example, if X and Y are two continuous random variables having joint density function
f(x, y), the means or expectations of X and Y are

\mu_X = E(X) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x f(x, y)\,dx\,dy, \qquad \mu_Y = E(Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y f(x, y)\,dx\,dy

and the variances are

\sigma_X^2 = E[(X - \mu_X)^2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)^2 f(x, y)\,dx\,dy

\sigma_Y^2 = E[(Y - \mu_Y)^2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (y - \mu_Y)^2 f(x, y)\,dx\,dy

Another quantity which arises in the case of two variables X and Y is the covariance,
defined by

\sigma_{XY} = Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]

In terms of the joint density function f(x, y) we have

\sigma_{XY} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f(x, y)\,dx\,dy

Similar remarks can be made for two discrete random variables. In such a case,

\mu_X = \sum_x \sum_y x f(x, y), \qquad \mu_Y = \sum_x \sum_y y f(x, y)

\sigma_{XY} = \sum_x \sum_y (x - \mu_X)(y - \mu_Y) f(x, y)


where the sums are taken over all the discrete values of X and Y
The following are some important theorems on covariance.

\sigma_{XY} = E(XY) - E(X)E(Y) = E(XY) - \mu_X \mu_Y

If X and Y are independent random variables, then

\sigma_{XY} = Cov(X, Y) = 0

Var(X \pm Y) = Var(X) + Var(Y) \pm 2\,Cov(X, Y), \quad \text{that is,} \quad \sigma_{X \pm Y}^2 = \sigma_X^2 + \sigma_Y^2 \pm 2\sigma_{XY}

Also,

|\sigma_{XY}| \le \sigma_X \sigma_Y

Correlation coefficient
If X and Y are independent, then Cov(X, Y) = \sigma_{XY} = 0. On the other hand, if X and Y are
completely dependent, for example when X = Y, then Cov(X, Y) = \sigma_{XY} = \sigma_X \sigma_Y. From
this we are led to a measure of the dependence of the variables X and Y given by

\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}

which is a dimensionless quantity. We call \rho the correlation coefficient, or coefficient of
correlation. From the inequality |\sigma_{XY}| \le \sigma_X \sigma_Y we see that -1 \le \rho \le 1. In the case
where \rho = 0 (i.e. the covariance is zero) we call the variables X and Y uncorrelated. In such a
case, however, the variables may or may not be independent.
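As a worked check of these definitions, the sketch below (an added illustration, not part of the original text) computes the covariance and correlation coefficient for the joint distribution of Example 5.1; the nonzero covariance confirms the dependence shown in Example 5.3.

```python
from fractions import Fraction

joint = {(x, y): Fraction(2 * x + y, 42) for x in range(3) for y in range(4)}

EX  = sum(x * p for (x, y), p in joint.items())
EY  = sum(y * p for (x, y), p in joint.items())
EXY = sum(x * y * p for (x, y), p in joint.items())
EX2 = sum(x * x * p for (x, y), p in joint.items())
EY2 = sum(y * y * p for (x, y), p in joint.items())

cov = EXY - EX * EY                        # sigma_XY = -20/147, about -0.136
varX, varY = EX2 - EX ** 2, EY2 - EY ** 2
rho = float(cov) / (float(varX) ** 0.5 * float(varY) ** 0.5)
print(float(cov), rho)                     # about -0.136 and -0.18
```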

Exercise

1. The joint density function of the random variables X and Y is given by

f(x, y) = \begin{cases} 8xy, & 0 \le x \le 1,\ 0 \le y \le x \\ 0, & \text{otherwise} \end{cases}
Find
a. the marginal density of X
b. the marginal density of Y


c. the conditional density of X


d. the conditional density of Y

2. Suppose X and Y have the following joint distribution

Y
X -3 2 4 Sum
1 0.1 0.2 0.2 0.5
3 0.3 0.1 0.1 0.5
Sum 0.4 0.3 0.3

a. find the distribution of X and Y


b. find cov( x, y ) , i.e the covariance of X and Y
c. find the correlation coefficient of X and Y
d. are X and Y independent?
3. Suppose that the random variables X and Y have the joint density function

f(x, y) = \begin{cases} c(2x + y), & 2 < x < 6,\ 0 < y < 5 \\ 0, & \text{otherwise} \end{cases}

Find:
a. the constant c
b. the marginal density functions of X and Y
c. the marginal distribution functions of X and Y
d. P(3 < X < 4, Y > 2)
e. P(X > 3)
f. P(X + Y > 4)
g. the joint distribution function
h. whether X and Y are independent


4. The joint probability mass function of two random variables X and Y is given by

f(x, y) = \begin{cases} k(2x + y), & x = 1, 2;\ y = 1, 2 \\ 0, & \text{otherwise} \end{cases}

where k is a constant.
a. What is the value of k?
b. Find the marginal probability mass functions of X and Y.
c. Are X and Y independent?
5. The joint probability mass function of two random variables X and Y is given by

f(x, y) = \begin{cases} \frac{1}{18}(2x + y), & x = 1, 2;\ y = 1, 2 \\ 0, & \text{otherwise} \end{cases}

a) What is f(y \mid x)?
b) What is f(x \mid y)?

6. If X and Y have the joint density function

f(x, y) = \begin{cases} \frac{3}{4} + xy, & 0 < x < 1,\ 0 < y < 1 \\ 0, & \text{otherwise} \end{cases}

Find:
a) f(y \mid x)
b) P(Y > \tfrac{1}{2} \mid \tfrac{1}{2} < X < \tfrac{1}{2} + dx)


Chapter six
Sampling distribution

Introduction
The main objective of statistical analysis is to know the actual value of different parameters
of a given population. One way of knowing the parameters can be through conducting
census. A census means complete enumeration of the entire population and determining the
value of the parameter of interest. However, in most cases a census is not feasible from a
practical point of view due to cost, time, labor and other constraints. As an alternative to a
census, one can use a sampling approach to determine the same thing. Sampling is the process
of selecting a sample from a population. That is, random samples of a given size are taken
from the population and the characteristics of these samples are properly analyzed to infer the
characteristics of the population from the samples taken.
When random samples of a certain size are repeatedly drawn from a given population to
determine a sample statistic, the computed value of the sample statistic (e.g. the sample mean)
will differ from sample to sample. Since the sample statistics are based on samples of a certain
size, they are random variables, and each follows a probability distribution of its own called a
sampling distribution.
A sampling distribution has its own properties, upon which rules for generalizing about a
population based on a sample drawn from it are built. In this chapter, we will study the
properties of some statistics a bit in depth and introduce widely used sampling distributions
such as the t, F, and \chi^2 distributions.
Objectives of the chapter
After this chapter, the student will be able to:
- be familiar with concepts like statistic, parameter, and random variable
- define a sampling distribution
- identify the distributions of different sample statistics (sample mean, proportion,
variance)
- describe the properties of the sampling distributions of sample statistics


- State some of the properties of standard probability distributions: the t, F, and χ² distributions.
6.1. Statistic, Random variable, Parameter and Sampling distribution
Before we start to study the properties of sampling distribution and its application, let us
get familiar with the following concepts.

Random variable
A variable is a random variable if its value is determined by a random experiment. If a variable X is said to be a random variable, it represents a phenomenon of interest in which the observed outcome of an activity is determined entirely by chance. It is unpredictable and varies depending on the particular outcome of the experiment measured. For example, suppose you toss a die and record X as the number observed on the upper face. The variable X can take on any of the six values 1, 2, 3, 4, 5 and 6, depending on the random outcome of the experiment. Since the value of X cannot be determined before the experiment, X is a random variable. X can also be the number of occurrences of an event, such as the number of telephone calls received during a given time interval.
Statistic
A statistic is a numerical descriptive measure calculated from a sample. In other words, it is a summary measure that describes a characteristic of a sample. In most cases, it refers to the sample mean and sample variance. If X1, X2, . . ., Xn are a random sample, then

   X̄ = ( Σ Xi ) / n   is called the sample mean, and
   S² = Σ (Xi − X̄)² / (n − 1)   is called the sample variance.

The values of X̄ and S² represent statistics.
For example: Consider a population consisting of five observations: 3, 6, 9, 12, and 15. If a random sample of size n = 3 is selected without replacement, find the sample mean and sample variance (the statistics for the sample drawn).
Solution: Suppose the sample drawn from the population is 3, 6, 9. Then

   X̄ = ( Σ Xi ) / n = (3 + 6 + 9) / 3 = 6


   Xi     Xi − X̄     (Xi − X̄)²
   3      −3          9
   6       0          0
   9       3          9
                      Σ (Xi − X̄)² = 18

   S² = Σ (Xi − X̄)² / (n − 1) = 18 / 2 = 9
   S = √9 = 3

Values such as X̄ = 6 and S² = 9 represent statistics, summary values of the sample.

Parameter
A parameter is a numerical descriptive measure that characterizes a population. In other words, it is a summary measure that describes some characteristic of the population. Since it is determined from the observations of the whole population, the value of a parameter is usually unknown in the case of a large population. Parameters include the population mean and variance, among others.
The mean and the variance of the above population are parameters of that population. That is,

   µ = (3 + 6 + 9 + 12 + 15) / 5 = 9

is a parameter of the population, namely the population mean. Here we can determine the population parameter because the population under study is finite.
Sampling distribution
The sampling distribution provides the basis for determining the level of confidence or reliability with which a particular value of a sample statistic can be used as an estimate of a parameter. It also serves as the necessary ground for evaluating a particular hypothesis stated with reference to a parameter. Both of these processes require a clear understanding of the various sampling distributions and of the properties defining the relationship between a given sample statistic and the corresponding population parameter. Therefore, let us first describe what a sampling distribution means and examine the properties of the sampling distributions of different statistics.
As stated in the introduction, sampling is used as an alternative to a census to determine the characteristics of a population. That is, a random sample of a given size is taken from a given


population, upon which we base the estimates of the parameters of that population. However, when samples are drawn repeatedly from a population, a particular sample may or may not be representative. In other words, sample statistics such as the sample mean and sample variance are random variables, because different samples can lead to different values of the statistics.
Since the value of a statistic has its own number of occurrences (frequency) across different samples, the probability of obtaining a given value of the statistic can be determined from these frequencies. A sample statistic together with its probabilities of occurrence constitutes a sampling distribution.
Definition: The sampling distribution of a statistic is the probability distribution for the
possible values of the statistic that results when random samples of size n are repeatedly
drawn from the population.
Example: Consider a population consisting of the N = 5 numbers 3, 6, 9, 12, 15. If a random sample of size n = 3 is selected without replacement, find the sampling distribution of the sample mean, X̄.
Solution: There are 10 possible random samples of size n = 3, and each sample is equally likely to be drawn, with probability 1/10. These samples, along with the calculated values of X̄, are given as follows:
   Sample   Sample values   X̄ (sample mean)
   1        3, 6, 9         6
   2        3, 6, 12        7
   3        3, 6, 15        8
   4        3, 9, 12        8
   5        3, 9, 15        9
   6        3, 12, 15       10
   7        6, 9, 12        9
   8        6, 9, 15        10
   9        6, 12, 15       11
   10       9, 12, 15       12


Sampling distributions for the sample mean


   X̄      P(X̄)
6 0.1
7 0.1
8 0.2
9 0.2
10 0.2
11 0.1
12 0.1

Exercise: Given the ages of 5 children as follows:


Child ( X ) Age
1 2
2 4
3 6
4 8
5 10
If random samples of size 2 are drawn without replacement from this population, construct the sampling distribution of the sample mean.

6. 2 Sampling distribution of the sample mean


As stated above, the sampling distribution of the sample mean is the probability distribution of all possible sample means of a given size selected from a population. Suppose we are interested in taking a sample of size n from a population having N elements. There are C(N, n) = N!/(n!(N − n)!) samples that could be taken from the population if the sampling is without replacement. For each sample we compute the mean, giving C(N, n) means, and we construct their frequency distribution, called the sampling distribution of the sample mean.


Illustration: Suppose that a babysitter has 5 children under her supervision. The average age of these five children is 6 years, and the age of each child is given as


Child (x) Age


1 2
2 4
3 6
4 8
5 10
Now let us take random samples of size 2 without replacement from this population. There are C(5, 2) = 10 such possible samples. These samples, along with their means, are:

   Sample values    Mean (X̄)
2, 4 3
2, 6 4
2, 8 5
2, 10 6
4, 6 5
4, 8 6
4, 10 7
6, 8 7
6, 10 8
8, 10 9
Then organize this distribution of sample means into a frequency distribution and a probability distribution:

   Sample mean   Frequency   Relative frequency   Probability
   3             1           1/10                 0.1
   4             1           1/10                 0.1
   5             2           2/10                 0.2
   6             2           2/10                 0.2
   7             2           2/10                 0.2
   8             1           1/10                 0.1
   9             1           1/10                 0.1


This probability distribution of sample means is referred to as the sampling distribution of the sample mean.
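The construction above can be verified with a short computational sketch (an illustration only, using the Python standard library):

   from itertools import combinations
   from collections import Counter

   ages = [2, 4, 6, 8, 10]          # the population of five children
   n = 2                             # sample size, drawn without replacement

   # Enumerate every possible sample and its mean
   means = [sum(s) / n for s in combinations(ages, n)]

   # Frequency and probability of each possible sample mean
   counts = Counter(means)
   total = len(means)                # C(5, 2) = 10 samples
   for m in sorted(counts):
       print(f"mean={m:4.1f}  frequency={counts[m]}  probability={counts[m] / total:.1f}")

   # The mean of the sampling distribution equals the population mean (6 years)
   print("mean of sample means:", sum(means) / total)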
Now, having defined the meaning of sampling distribution of sample mean, let us study
some of its important properties.
If a random sample of size n is selected from a population with mean µ and variance σ², the sampling distribution of the sample mean X̄ will have mean µ and standard deviation σ/√n. That is, if x1, x2, . . ., xn constitutes a random sample from a population with mean µ and variance σ², then E(X̄) = µ and Var(X̄) = σ²/n.
Proof

   X̄ = (1/n) Σ Xi = (X1 + X2 + X3 + ... + Xn)/n

   E(X̄) = E[(1/n)(X1 + X2 + ... + Xn)]
        = (1/n)[E(X1) + E(X2) + ... + E(Xn)]
        = (1/n)(nµ) = µ

   Var(X̄) = Var[(1/n)X1 + (1/n)X2 + ... + (1/n)Xn]
          = Σ (1/n)² σ² = nσ²/n² = σ²/n

(The variance of a linear combination of independent random variables is the sum of the squared coefficients times the variances of the random variables.)
Using the above example, we can illustrate that the population mean and the mean of
sampling distribution of sample means are equal.


   Sample mean (X̄)   Probability of X̄
   3                  0.1
   4                  0.1
   5                  0.2
   6                  0.2
   7                  0.2
   8                  0.1
   9                  0.1

Then, E(X̄) = Σ X̄ P(X̄)
           = 3(0.1) + 4(0.1) + 5(0.2) + 6(0.2) + 7(0.2) + 8(0.1) + 9(0.1)
           = 6
   µ = Σ Xi / N = (2 + 4 + 6 + 8 + 10) / 5 = 30 / 5 = 6

Thus, if samples of n random and independent observations are repeatedly and independently drawn from a population, then as the number of samples becomes large, the mean of the sample means approaches the true population mean. Moreover, the variance of the sampling distribution of X̄, σ²/n, decreases as the sample size n increases. This means that as the sample size becomes larger, the sampling distribution of X̄ concentrates around the population mean. Thus, larger samples result in greater certainty about our inference concerning the population mean.
In statistics, the degree of precision or reliability of an estimator of a population parameter is measured by the standard error of the estimator. In this case, the precision with which the sample mean X̄ estimates the population mean is measured by the standard error of the sample mean.
The standard error of an estimator is the standard deviation of the statistic used as an estimator of a population parameter. Therefore, the standard deviation of X̄ is referred to as the standard error of the mean, SE(X̄) = √Var(X̄). If the variance of the sample mean


is denoted by σ²/n, then the corresponding standard error of X̄ is given as

   SE(X̄) = √Var(X̄) = √(σ²/n) = σ/√n

If the sample size n is not a small fraction of the population size N, then the individual sample members are not distributed independently of one another. Since a population member cannot be included more than once in a sample, the probability of a specific population member being the second observation depends on which member was chosen as the first observation. Thus, the observations are not selected independently. In this case the variance of the sample mean is

   Var(X̄) = (σ²/n) · (N − n)/(N − 1)

where (N − n)/(N − 1) is often called the finite population correction factor, and

   SE(X̄) = (σ/√n) · √((N − n)/(N − 1))

When N is large relative to the sample size n, (N − n)/(N − 1) is approximately equal to 1 and SE(X̄) = σ/√n.
As a general rule of thumb, the finite population correction factor is used if the sample size is more than 5 percent of the population.
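As an illustration of the correction factor (with made-up numbers; the population size N = 100, sample size n = 20 and σ = 15 below are hypothetical):

   import math

   sigma = 15.0   # hypothetical population standard deviation
   N = 100        # hypothetical population size
   n = 20         # sample size (more than 5% of N, so the correction matters)

   se_infinite = sigma / math.sqrt(n)
   fpc = math.sqrt((N - n) / (N - 1))      # finite population correction factor
   se_finite = se_infinite * fpc

   print(f"SE without correction: {se_infinite:.3f}")   # about 3.354
   print(f"correction factor:     {fpc:.3f}")           # about 0.899
   print(f"SE with correction:    {se_finite:.3f}")     # about 3.015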

We have now developed expressions for the mean and standard deviation of the sampling distribution of the sample mean X̄. However, we also have to know the shape of the distribution of X̄ to make inferences about population parameters. So let us define the distribution of the sample mean.
If X̄ is the mean of a random sample of size n from a normally distributed population with mean µ and variance σ², its sampling distribution is normally distributed with mean µ and variance σ²/n, regardless of the sample size.


If the parent population is non-normal, the sampling distribution of X̄ will still be approximately normally distributed for large samples. This is guaranteed by the central limit theorem, stated below.

Central limit theorem


If random samples of n observations are drawn from a population with any probability distribution having mean µ and standard deviation σ, then, when n is large, the sampling distribution of the sample mean X̄ is approximately normal with mean µ and variance σ²/n:

   X̄ ~ N(µ, σ²/n)

Thus, for sufficiently large samples the sampling distribution of the sample mean is approximately normal. How large must the sample size n be so that the normal distribution provides a good approximation for the sampling distribution of X̄? The answer depends on the shape of the sampled population: the greater the skewness of the sampled population distribution, the larger the sample size must be before the normal distribution is an adequate approximation for the sampling distribution of X̄. For most sampled populations, sample sizes of n ≥ 30 will suffice for the normal approximation to be reasonable.
Many estimators that are used to make inferences about population parameters are sums or averages of sample measurements. Therefore, we restate the central limit theorem in a form that enables us to carry out statistical analysis about population parameters based on the average of sample measurements. Thus, if X1, X2, . . ., Xn is a set of n independent random variables having identical distributions with mean µ and variance σ², then the distribution of the random variable Z defined as

   Z = (X̄ − µ) / (σ/√n)

is approximately normal with mean 0 and variance 1 as the sample size becomes large (n → ∞).
If we are able to convert the random variable X̄ into a standard normal variable, it is possible to describe the behavior of the sample mean X̄ by calculating the probability of observing certain values of X̄ in repeated sampling.


Example: A soft drink vending machine is set so that the amount of drink dispensed is a
random variable with mean of 200 milliliters and standard deviation of 15 milliliters. What
is the probability that the mean amount dispensed in a random sample of size 36 is at least
204 milliliters?
Solution: The distribution of X̄ has mean µ = 200 and standard error

   SE(X̄) = σ/√n = 15/√36 = 2.5

According to the central limit theorem, the sample mean is approximately normally distributed and can be converted to a standard normal variable as

   Z = (X̄ − µ) / SE(X̄) = (204 − 200) / 2.5 = 1.6

The probability that the sample mean is at least 204 is P(X̄ ≥ 204) = P(Z ≥ 1.6).
From the standard normal Z-table, P(Z > 1.6) = 0.0548. Thus, the probability that the sample mean will be greater than 204 is 0.0548.
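A quick numerical check of this example (a sketch, assuming SciPy is available):

   from math import sqrt
   from scipy.stats import norm

   mu, sigma, n = 200, 15, 36
   se = sigma / sqrt(n)                 # standard error = 2.5

   z = (204 - mu) / se                  # z = 1.6
   prob = 1 - norm.cdf(z)               # upper-tail probability P(Z >= 1.6)
   print(round(se, 2), round(z, 2), round(prob, 4))   # 2.5, 1.6, about 0.0548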

Exercise: A bulb manufacturer claims that the life of its bulbs is normally distributed with mean 36,000 hours and standard deviation 4,000 hours. A random sample of 16 bulbs had an average life of 34,500 hours. If the manufacturer's claim is correct, what is the probability that the sample mean is smaller than 34,500 hours?

6.3 Sampling distribution of a sample proportion


Just as we use the sample mean to make inferences about the population mean, the sample proportion is used to make inferences about the population proportion. This method of statistical analysis is mostly applicable to qualitative data, to determine the proportion or percentage of elements that belong to a given category. For example, it is possible to relate the proportion of successes in a sample to the proportion of successes in the population: based on the percentage of defective radios in a sample from the assembly line, we can draw conclusions about the percentage of defective radios in the entire population.


To make such inferences, however, knowledge of the sampling distribution of the proportion is required. Therefore, let us characterize the sampling distribution of the sample proportion, starting with its meaning.
The sampling distribution of the sample proportion is the probability distribution of the sample proportions of all possible samples of the same size drawn from the same population.
Let P represent the sample proportion and π represent the population proportion.
The sample proportion P is defined as P = X/n, where X is the number of successes (the number of items in the sample with the characteristic we are interested in) and n is the sample size. Similarly, π can be defined as π = X/N, where N is the population size and X is the number of successes in the population. So a sample of size n is taken from the population, and the proportion of elements with the feature of interest is computed to determine P. Repeated samples of the same size are taken, and the frequency and then the probability of each sample proportion are determined to construct the sampling distribution of the sample proportion.

Example: Suppose that we have a population of five students who are asked whether they want to become statisticians or not. The answers to the question are given below.
Student Answer
1 Yes(Y)
2 No (N)
3 No (N)
4 Yes(Y)
5 No (N)
The number of students who want to become statisticians in this population is two. Hence N = 5 and X = 2. The population proportion of students who want to become statisticians is π = X/N = 2/5 = 0.4, or 40 percent. Now, let us take all possible samples


of size 4 from this population of size 5 and, for each sample, compute the sample proportion (P) of students who want to become statisticians.

Possible samples proportion (P)


1,2,3,4 (Y, N, N, Y) 2 out of 4 = 0.50
1,2,3,5 (Y,N,N,N) 1 out of 4 = 0.25
1,2,4,5 (Y,N,Y,N) 2 out of 4 = 0.50
2,3,4,5 (N,N,Y,N) 1 out of 4 = 0.25
1,3,4,5 (Y,N,Y,N) 2 out of 4 = 0.50
ΣP = 2
If a given population has N elements, there are C(N, n) possible samples of size n that can be drawn without replacement. Since N = 5 and n = 4, the number of samples of size n = 4 drawn from a population of size N = 5 is C(5, 4) = 5!/(4!1!) = 5, as indicated above. So now we can construct the sampling distribution for the above example as follows.

   Sample proportion (P)   Frequency   Probability
   0.50                    3           3/5 = 0.6
   0.25                    2           2/5 = 0.4

Exercise: Suppose that a bulb manufacturer produces 6 bulbs a week, out of which 2 are defective. If a random sample of 4 bulbs is taken to determine the number of defective bulbs,
a. Determine the number of possible samples using the formula C(N, n).
b. Construct the sampling distribution of the sample proportion (of defective bulbs).


Properties of the sampling distribution of sample proportion

1. The mean of the sampling distribution of the proportion for random samples of size n is equal to the population proportion.

   E(P) = E(X/n) = (1/n) E(X)

but the expected value of X is E(X) = nπ, so

   E(P) = (1/n) E(X) = nπ/n = π

2. The variance of the sampling distribution of the sample proportion P is π(1 − π)/n, i.e. the variance of the population distribution divided by n.

   Var(X) = nπ(1 − π)

   Var(P) = Var(X/n) = (1/n²) Var(X) = π(1 − π)/n

Since the standard deviation of P (the standard error of P) is the square root of the variance,

   σ_P = √( π(1 − π)/n )

As with the variance of the sample mean, we can use the finite population correction factor when the population is not large compared to the sample size:

   Var(P) = ( π(1 − π)/n ) · (N − n)/(N − 1)

   σ_P = √( π(1 − π)/n ) · √( (N − n)/(N − 1) )

3. The distribution of the sample proportion is approximately normal for large sample sizes. By the central limit theorem, P can be converted to a standard normal variable by subtracting π from P and dividing by the standard error:

   Z = (P − π) / σ_P


Example: 45% of all graduate students pursuing their doctorate degree at Addis Ababa University are married. If a sample of 200 graduate students is selected at random, what is the probability that the proportion of married students in this sample would be between 40% and 48%?
Solution: The distribution of the proportion across samples of 200 graduate students each follows the normal distribution with population proportion π = 0.45 and standard error σ_P. The standard error of the proportion is

   σ_P = √( π(1 − π)/n ) = √( 0.45(0.55)/200 ) = √0.0012375 ≈ 0.035

To find the probability that the proportion of married students in the sample of 200 would be between 40% and 48%, we must find the area between 0.40 and 0.48 under a normal curve centered at π = 0.45 with σ_P = 0.035.

The area between 0.40 and 0.45 is found after converting the value to a standard normal variable:

   Z1 = (P1 − π)/σ_P = (0.40 − 0.45)/0.035 = −0.05/0.035 = −1.43

From the Z-table, the area between −1.43 and 0 equals 0.4236. Similarly,

   Z2 = (P2 − π)/σ_P = (0.48 − 0.45)/0.035 = 0.03/0.035 = 0.86

The area from the table for Z = 0.86 equals 0.3051.
Therefore, the total area = 0.4236 + 0.3051 = 0.7287, so P(0.40 < P < 0.48) = 0.7287.
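The same answer can be reproduced directly from the normal distribution function (a sketch, assuming SciPy is available):

   from math import sqrt
   from scipy.stats import norm

   pi, n = 0.45, 200
   se = sqrt(pi * (1 - pi) / n)          # standard error, about 0.0352

   # P(0.40 < sample proportion < 0.48) under the normal approximation
   prob = norm.cdf(0.48, loc=pi, scale=se) - norm.cdf(0.40, loc=pi, scale=se)
   print(round(se, 4), round(prob, 4))   # about 0.0352 and 0.73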


Exercise: According to the Internal Revenue Service, 75% of all tax returns lead to a refund. If a random sample of 100 tax returns is taken,
A. What is the mean of the sample proportion of returns leading to refunds?
B. What is the variance of the sample proportion?
C. What is the standard error of the sample proportion?
D. Determine the probability that the sample proportion exceeds 0.80.

6.4. Sampling distribution of the sample variance


In the preceding sections, we studied some of the properties of the sampling distributions of the sample mean and sample proportion, which are used to make inferences about the population mean and proportion respectively. In this section, we consider the sampling distribution of the sample variance, which is used for inference about the population variance.
Consider a random sample of n observations drawn from a population with unknown mean µ and unknown variance σ². If the sample members are x1, x2, . . ., xn, the population variance σ² is defined as

   σ² = E[(X − µ)²]

The sample variance S² is defined as

   S² = Σ (Xi − X̄)² / (n − 1)

and its square root is termed the sample standard deviation.
Here we divide by n − 1 for a random sample of n observations because, having computed the sample mean, only n − 1 of the deviations can vary freely.
Given the above definition of the sample variance, let us determine its mean and its distribution.
The mean (expected value) of the sample variance equals the population variance:

   E(S²) = σ²

Proof: From the chapter on expectation, write S² = Σ(Xi − X̄)²/(n − 1) and expand the sum of squared deviations about µ:

   Σ (Xi − X̄)² = Σ [(Xi − µ) − (X̄ − µ)]²
              = Σ (Xi − µ)² − 2(X̄ − µ) Σ (Xi − µ) + n(X̄ − µ)²
              = Σ (Xi − µ)² − 2n(X̄ − µ)² + n(X̄ − µ)²
              = Σ (Xi − µ)² − n(X̄ − µ)²

Taking expectations,

   E[ Σ (Xi − X̄)² ] = Σ E[(Xi − µ)²] − n E[(X̄ − µ)²]

The expectation of (Xi − µ)² is the population variance σ², and the expectation of (X̄ − µ)² is the variance of the sample mean, σ²/n. Hence

   E[ Σ (Xi − X̄)² ] = nσ² − n(σ²/n) = (n − 1)σ²

So

   E(S²) = E[ Σ (Xi − X̄)² / (n − 1) ] = (1/(n − 1)) (n − 1)σ² = σ²

This implies that the sample variance S² is an unbiased estimator of the population variance σ². It means that, in repeated sampling, the average of all the sample estimates will equal the target parameter σ².
As we have seen in the preceding topics, identifying the sampling distribution of a sample statistic is essential for making inferences about a population parameter. Therefore, let us identify the distribution of the sample variance.


Consider the distribution of S² under repeated random sampling from a normal distribution. Theoretically, since a variance cannot be negative, the sampling distribution of the sample variance starts from S² = 0. Its shape is non-symmetric and changes with the sample size and the value of σ². Just as we standardize random variables to form the Z-distribution, the sample variance can be standardized to form a distribution called the chi-square distribution.
Given a random sample of n observations from a normally distributed population whose population variance is σ², with resulting sample variance S², the quantity

   (n − 1)S² / σ²

has a chi-square (χ²) distribution with n − 1 degrees of freedom.

[Figure: the chi-square density f(χ²_v), plotted against χ²]

When the population mean µ is not known, a particular sample mean X̄ based on a random sample of size n may be used as an unbiased estimate of µ. Therefore, we can define χ²_v as

   χ²_v = Σ ( (Xi − X̄)/σ )² = Σ (Xi − X̄)² / σ²

Since the sample variance is defined as

   S² = Σ (Xi − X̄)² / (n − 1),   or   Σ (Xi − X̄)² = (n − 1)S²,


we can write the χ² variable as

   χ²_v = (n − 1)S² / σ²

The chi-square distribution has many important applications. Some of its uses are:
- Test of independence of attributes
- Test of goodness of fit
- Test for the equality of population variances and test for homogeneity
The calculated value of χ² is compared with the critical value at a particular level of significance and degrees of freedom. If χ²_cal > χ²_critical, then the null hypothesis is rejected in favor of the alternative hypothesis.

The chi-square distribution has several important mathematical properties. Some of them are the following.
1. If X1, X2, . . ., Xn are independent random variables having standard normal distributions, then

   Y = Σ Xi²

has the chi-square distribution with V = n degrees of freedom.
2. If X1, X2, . . ., Xn are independent random variables having chi-square distributions with V1, V2, . . ., Vn degrees of freedom, then

   Y = Σ Xi

has the chi-square distribution with V1 + V2 + ... + Vn degrees of freedom.
3. The mean and variance of the chi-square distribution are equal to the number of degrees of freedom and twice the number of degrees of freedom:

   E(χ²_v) = V   and   Var(χ²_v) = 2V,   where V is the degrees of freedom.

That is,

   E(χ²) = E[ (n − 1)S²/σ² ] = ((n − 1)/σ²) E(S²),   and since E(S²) = σ²,

   E(χ²) = ((n − 1)/σ²) σ² = n − 1


To obtain the variance of S², note that

   Var(χ²) = Var[ (n − 1)S²/σ² ] = ((n − 1)/σ²)² Var(S²)

and Var(χ²) = 2(n − 1). Hence

   Var(S²) = 2(n − 1) σ⁴ / (n − 1)² = 2σ⁴ / (n − 1)

For many applications involving the population variance we need values of the cumulative distribution of χ², especially in the upper and lower tails of the distribution. To make inferences about the population variance, the calculated value of χ² is compared with the tabulated value of χ² for the given degrees of freedom. For convenience of interpretation, the χ² value listed under a column headed by a specific value of α is denoted χ²(α, v). It means the probability is α that a random sample of size n produces a χ² value greater than the tabulated value χ²(α, v), for degrees of freedom V = n − 1.
For example, the tabulated χ² value for v = n − 1 = 10 degrees of freedom under the column heading α = 0.05 is χ²(0.05, 10) = 18.3. It means the probability is α = 0.05 that the χ² value computed from a sample of size n = 11 is greater than χ²(0.05, 10) = 18.3.

[Figure: chi-square density curve with v = 10 degrees of freedom, showing the upper-tail area α = 0.05 above the tabulated value]
This tabulated value χ²(0.05, 10) = 18.3 means that the area α = 0.05 is the probability that a χ² value based on a sample of size n = 11 is greater than 18.3; it is the value of the χ² distribution with v = 10 degrees of freedom above which the area is α.
The probability above can be stated as

   P(χ²₁₀ > χ²(0.05, 10)) = P(χ² > 18.3) = α = 0.05
   P(χ² < 18.3) = P(0 < χ² < 18.3) = 1 − α = 0.95

In general, χ²(α, v) is the value such that the area to its right under the chi-square curve with v degrees of freedom is equal to α. Critical values can be stated for both tails:

   P(χ² > K_u) = 0.05   (upper tail)
   P(χ² < K_L) = 0.05   (lower tail)

where K_u is the upper-tail critical value and K_L is the lower-tail critical value. For v = 10,

   P(χ²₁₀ < 3.94) = 0.05   and   P(χ²₁₀ > 18.31) = 0.05

That is, χ²(α, v) is such that, if X is a random variable having a chi-square distribution with v degrees of freedom, then

   P(X > χ²(α, v)) = α
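These tabulated values can be reproduced numerically (a sketch, assuming SciPy is available):

   from scipy.stats import chi2

   df = 10                               # degrees of freedom, v = n - 1 with n = 11
   upper = chi2.ppf(0.95, df)            # value with 5% of the area above it, about 18.31
   lower = chi2.ppf(0.05, df)            # value with 5% of the area below it, about 3.94
   print(round(upper, 2), round(lower, 2))
   print(round(1 - chi2.cdf(18.3, df), 3))   # P(chi-square_10 > 18.3), about 0.05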
Example
A cement manufacturer claims that concrete prepared from his product has a relatively stable compressive strength and that the strength, measured in kilograms per square centimeter, lies within a range of 40 kg/cm². A sample of n = 10 measurements produced a mean and a


variance equal to X̄ = 312 and s² = 195. Do these data present sufficient evidence to reject the manufacturer's claim that the population variance is equal to 100 (a range of 40 corresponds to roughly 4σ, i.e. σ = 10)?
The claim of the manufacturer can be rejected if the calculated value of chi-square exceeds the critical value χ²(0.05, 9) = 16.919 from the table.

   χ² = (n − 1)s² / σ² = 9(195) / 100 = 1755 / 100 = 17.55

Since the observed chi-square value 17.55 is greater than the critical value, we can reject the manufacturer's claim.
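A computational version of the same test (a sketch, assuming SciPy is available):

   from scipy.stats import chi2

   n, s2, sigma2_claimed = 10, 195, 100

   chi2_stat = (n - 1) * s2 / sigma2_claimed        # 9 * 195 / 100 = 17.55
   critical = chi2.ppf(0.95, df=n - 1)              # upper 5% point, about 16.92
   p_value = 1 - chi2.cdf(chi2_stat, df=n - 1)

   print(round(chi2_stat, 2), round(critical, 2), round(p_value, 3))
   # 17.55 > 16.92, so the claim sigma^2 = 100 is rejected at the 5% level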

6.5 The F-distribution

There are cases where statistical analysis involves comparing two population variances. You might need to compare the precision of one measuring device with that of another, the stability of one manufacturing process with that of another, or even the variability in the grading procedure of one college instructor with that of another.
One way to compare two population variances, σ1² and σ2², is to use the ratio of the sample variances, s1²/s2². When independent random samples are drawn from two normal populations with equal variances, σ1² = σ2², the ratio s1²/s2² has a probability distribution in repeated sampling that is termed the F-distribution.
The F-distribution is the sampling distribution of the ratio of two independent random variables with chi-square distributions, each divided by its respective degrees of freedom. If U and V are independent random variables having chi-square distributions with v1 and v2 degrees of freedom, then

   F = (U/v1) / (V/v2)


is a random variable having an F-distribution, whose values vary with every pair of samples of sizes n1 and n2.
Substituting U = (n1 − 1)s1²/σ1² and V = (n2 − 1)s2²/σ2² (chi-square variables with n1 − 1 and n2 − 1 degrees of freedom) in the above equation,

   F = [ ((n1 − 1)s1²/σ1²) / (n1 − 1) ] / [ ((n2 − 1)s2²/σ2²) / (n2 − 1) ]
     = (s1²/σ1²) / (s2²/σ2²)
     = σ2² s1² / (σ1² s2²)

If s1² and s2² are the variances of independent random samples of sizes n1 and n2 from normal populations with variances σ1² and σ2², then

   F(v1, v2) = σ2² s1² / (σ1² s2²)

is a random variable having an F-distribution with n1 − 1 and n2 − 1 degrees of freedom.

The critical values of the F-distribution are tabulated, as in the case of the chi-square and Z-distributions. F(α, v1, v2) is the value such that the area to its right under the curve of the F-distribution with v1 and v2 degrees of freedom is equal to α. That is, F(α, v1, v2) is defined by P(F > F(α, v1, v2)) = α.
Example: if v1 = 10, v2 = 20 and α = 0.05, then F(0.05, 10, 20) = 2.35 and

   P(F(10, 20) > 2.35) = 0.05

To test whether the variances of two populations are equal or not, compare the calculated value of F with the critical value of F.
Example: The research staff of an investment firm was interested in determining whether there is a difference in the variance of maturities of AA-rated industrial bonds and CC-rated industrial bonds. A random sample of AA-rated bonds resulted in a sample variance s1² = 123.35, and an


independent random sample of 11 CC-rated bonds resulted in a sample variance s2² = 8.02. Test whether the population variances of the two populations are equal.
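A sketch of how such a comparison could be carried out (assuming SciPy is available; the AA-rated sample size is not given in the text, so n1 = 17 below is only an illustrative placeholder):

   from scipy.stats import f

   n1, s1_sq = 17, 123.35     # n1 is a hypothetical placeholder for the AA-rated sample size
   n2, s2_sq = 11, 8.02       # CC-rated sample from the text

   F_stat = s1_sq / s2_sq                          # ratio of sample variances
   critical = f.ppf(0.95, dfn=n1 - 1, dfd=n2 - 1)  # upper 5% point of F(16, 10)

   print(round(F_stat, 2), round(critical, 2))
   # If F_stat exceeds the critical value, the hypothesis of equal variances is rejected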

6.6 The t-distribution

In the previous topics, we have seen that for random samples from a normal population with mean µ and variance σ², the sample mean X̄ has a normal distribution with mean µ and variance σ²/n. In other words, Z = (X̄ − µ)/(σ/√n) has a standard normal distribution if the sampled population is normal or the sample size used is large. However, if the sample size used is small, making inferences about the population mean based on the Z-distribution as a test statistic involves two types of problems.

1. The shape of the sampling distribution of X̄ (and of the Z statistic) depends on the shape of the sampled population. We can no longer assume that the distribution of X̄ is approximately normal, because the central limit theorem ensures normality only for samples that are sufficiently large. (The sampling distribution of the sample mean is, however, exactly normal if the sampled population is normal.)
2. The population standard deviation is usually unknown. Even though it is possible to estimate the population standard deviation with the sample standard deviation s, s is a poor approximation of the population standard deviation when the sample size is small.
In the case where the population standard deviation is unknown, the standard normal statistic cannot be used. It is natural to replace the unknown σ by the sample standard deviation s. This gives a distribution called Student's t-distribution, after Gosset, who developed the probability distribution of the statistic

   t = (X̄ − µ) / (s/√n)

Given a random sample of n observations, with mean X̄ and standard deviation s, from a normally distributed population with mean µ, the random variable t follows the Student's t distribution with (n − 1) degrees of freedom. The shape of the Student's t distribution is rather similar to that of the standard normal distribution. Both distributions have mean zero, and

the probability density functions of both are symmetric about their mean. However, the density function of the Student's t distribution has a larger dispersion (variability) than the standard normal distribution. The actual amount of variability in the sampling distribution of t depends on the sample size n.
As the number of degrees of freedom increases (i.e. as the sample size increases), the Student's t-distribution becomes increasingly similar to the standard normal distribution. This is intuitively reasonable and follows from the fact that for a large sample size the sample standard deviation is a very precise estimator of the population standard deviation. Conversely, the smaller the degrees of freedom associated with the t-statistic, the more variable its sampling distribution will be.

If x1, x2, . . ., xn are sample values drawn from a normal population with mean µ and variance σ², the standard normal random variable can be defined as

   Z = (X̄ − µ) / (σ/√n)

which follows a normal distribution with mean 0 and variance 1. For the same n sample values, the sum of squared standardized deviations gives a χ² variable, which follows a χ² distribution with n − 1 degrees of freedom:

   Y = Σ Zi² = Σ (Xi − X̄)² / σ²

A sample statistic t can then be defined as the ratio of the standard normal variable Z to the square root of the chi-square variable Y divided by its degrees of freedom:

   t = Z / √( Y/(n − 1) )
     = [ (X̄ − µ)/(σ/√n) ] / √( Σ(Xi − X̄)² / ((n − 1)σ²) )
     = (X̄ − µ) / (S/√n)

In repeated sampling, these t-values follow a t-distribution with n − 1 degrees of freedom.


By virtue of the central limit theorem as n tends to be large, the sample standard error
approaches population standard deviation. That is when n becomes large; the t-statistic
approaches the standard normal variable. That is If n>30, then s   so that
X  X 
t  Z  .
 
n n
In order to base inferences about population mean on student‟s t-distribution critical values
are tabulated for different degree of freedom. t,v represents the area to the right under the
curve of the t-distribution with v degrees of freedom is equal to . That is if t is a random
variable having t-distribution with v degrees of freedom, then P (t > t,v) =  .Since the
density function is symmetric, t1-,v = -t,v
The tabulated t values are denoted by t. The area under the t-distribution curve above t is
 and the one below t is 1-.



Tabulated t and t1-=-tvalues


The areas under the t-distribution curve can be interpreted in terms of probabilities by taking, say, a t-distribution based on a sample of size n = 15. Thus, for v = n − 1 = 14, the t value above which the area under the t-curve is α = 0.05 is t(0.05) = 1.76. It means the probability that a t value computed from a random sample of size n = 15 is greater than t(0.05) = 1.76 is α = 0.05, which may be stated as

   P(t > t(0.05)) = P(t > 1.76) = 0.05

Similarly, the t value below which the area under the t-distribution curve is α = 0.05 is t(0.95) = −t(0.05) = −1.76. It means the probability that a t value based on a random sample of size n = 15 is less than −t(0.05) = −1.76 is α = 0.05. It is stated as P(t < −t(0.05)) = P(t < −1.76) = 0.05.

=0.05 =0.05

-1.76 1.76
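These t values can be checked numerically (a sketch, assuming SciPy is available):

   from scipy.stats import t

   df = 14                                # degrees of freedom for a sample of size n = 15
   upper = t.ppf(0.95, df)                # t value with 5% of the area above it
   print(round(upper, 3))                 # about 1.761
   print(round(1 - t.cdf(1.76, df), 3))   # P(t > 1.76), about 0.05
   print(round(t.cdf(-1.76, df), 3))      # P(t < -1.76), about 0.05 (by symmetry)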

There are various uses of the t-distribution. A few of them are:
- Hypothesis testing for the population mean.
- Hypothesis testing for the difference between two population means with independent samples.
- Hypothesis testing for the difference between two population means with dependent samples.
- Hypothesis testing for an observed coefficient of correlation, including partial and rank correlations.
- Hypothesis testing for an observed regression coefficient.


Chapter seven
Estimation
Introduction
In the previous chapters of this module, we dealt with concepts that are used in statistical inference; in particular, probability and probability distributions were considered. Statistical inference relates sample characteristics to population characteristics in order to draw conclusions about population parameters. This is necessary because examining the entire population to determine its characteristics is usually not possible due to constraints such as capital, time and other resources. The processes known as estimation and hypothesis testing are used to draw such conclusions about unknown population parameters on the basis of sample statistics. Estimation, which is the subject of this chapter, means estimating or predicting the value of a population parameter based on sample observations. Hypothesis testing, on the other hand, means making a decision about the value of a parameter based on some preconceived value and a sample statistic.

Objective of the chapter


After this chapter, the student will be able to:
- Describe concepts like estimator, estimate, and point and interval estimation
- Identify the different properties of estimators (unbiasedness, efficiency, consistency and sufficiency)
- Describe different methods of estimation (maximum likelihood, least squares and method of moments)
- Conduct point estimation and interval estimation
- Construct and interpret confidence intervals for different population parameters

7.1. Estimator and estimate


Let the population parameter to be estimated from a sample be denoted by θ (read as theta). It may be the mean µ, the proportion P, or the population variance σ². The mean, for instance, can be estimated by either the sample mean X̄ or the sample median X_md. The sample mean X̄, used as an estimator of the population mean µ, is an example of an estimator. That is, an estimator of a population

parameter is a random variable that is used to estimate the parameter, whereas a specific value of that random variable is an estimate.
In other words, estimators are expressed as functions of random variables, and an estimate is a single number.
For example, if X1 = 10, X2 = 6, X3 = 5, X4 = 7 are sample values taken from a population to estimate the population mean, then

   X̄ = (Σ Xi)/n = (10 + 6 + 5 + 7)/4 = 7

The expression for X̄ is an estimator of the population mean, and its value 7 is called an estimate. Thus, an estimator is a rule, usually expressed as a formula, that tells us how to calculate an estimate based on the information in the sample. Hence, estimators are also known as statistics.

Point estimate and interval estimate



A point estimate is a single value of an estimator θ̂. It is obtained from a random sample of a given size drawn from the population whose parameter θ is to be estimated. For example, a particular sample mean X̄ is a point estimate of the population mean.

The following are two important features of a point estimate:

- A point estimate is almost invariably different from the actual value of the parameter. This is because its value is derived from a random sample whose values vary from sample to sample, though it is expected to lie close to the population parameter on either side.
- As the parameter to be estimated is unknown, neither the error in the point estimate nor its accuracy can be evaluated. This greatly reduces the practical utility of a point estimate. However, a measure of error and accuracy can easily be introduced by means of an appropriate probability statement, which expresses the estimate as an interval rather than a single number.

Based on sample data, two numbers are calculated to form an interval within which the parameter is expected to lie.


An interval estimate expresses an estimate as an interval or range that most likely contains the value of the population parameter. For example, the statement that the average age of the university campus students lies between, say, 16 and 24 years is an interval estimate, expressed as (16 yrs < µ < 24 yrs).
The advantage of an interval estimate is that it allows us to assign a definite probability that the interval contains the parameter being estimated. It also indicates the magnitude of the error of estimation, which serves as a measure of how accurately and precisely a parameter has been estimated.

7.2 Desirable properties of an estimator


We have seen that, to make inferences about a population parameter, we use a sample statistic (estimator) computed from sample information. But how do we know that the statistic computed from the sample observations is the best estimator of the population parameter? A good estimator should be highly reliable and have the following desirable properties: unbiasedness, consistency, efficiency, and sufficiency.

1. Unbiasedness
An estimator of a parameter is said to be unbiased if the mean (expected value) of its distribution is equal to the true value of the parameter. If the mean of the sampling distribution is not equal to the parameter, the statistic (estimator) is said to be a biased estimator of the parameter. That is, a statistic θ̂ is an unbiased estimator of the parameter θ if and only if E(θ̂) = θ. Sometimes θ̂ will overestimate and other times underestimate the parameter, but it follows from the notion of expectation that if the sampling procedure is repeated many times, then, on average, the value obtained for an unbiased estimator will equal the population parameter. Unbiasedness, however, does not mean that the estimate we get from any particular sample is equal to the population parameter, or even very close to θ. Rather, if we could indefinitely draw random samples from the population, compute an estimate each time, and then average these estimates over all random samples, we would obtain the population parameter.


For an estimator that is not unbiased, we define its bias as follows: if W is an estimator of θ, its bias is defined as Bias(W) = E(W) − θ.
Let us show that the sample mean, sample variance, and sample proportion are unbiased estimators of their corresponding population parameters. That is,
1. E(X̄) = µ
2. E(S²) = σ²
3. E(P) = π

Proof
1. E(X̄) = µ
Consider a sample of n observations taken from the population. Then

   X̄ = (Σ xi)/n = (x1 + x2 + ... + xn)/n = (1/n)x1 + (1/n)x2 + ... + (1/n)xn

   E(X̄) = (1/n)E(x1) + (1/n)E(x2) + ... + (1/n)E(xn)

But E(xi) = µ for i = 1, 2, 3, . . ., n. Thus, the equation can be rewritten as

   E(X̄) = (1/n)(µ + µ + ... + µ) = nµ/n = µ

Thus, the sample mean is an unbiased estimator of the population mean.

2. If S² is the variance of a random sample from an infinite population with finite variance σ², then E(S²) = σ².


Proof: Recall the formula for the sample variance,

   S² = Σ (xi − x̄)² / (n − 1)

so that

   E(S²) = (1/(n − 1)) E[ Σ (xi − x̄)² ]   ............ (1)

Expanding the squared deviation, (xi − x̄)² = xi² − 2x̄xi + x̄², and using Σ xi = nx̄,

   Σ (xi − x̄)² = Σ xi² − 2x̄ Σ xi + nx̄² = Σ xi² − 2nx̄² + nx̄² = Σ xi² − nx̄²

so E(S²) = (1/(n − 1)) [ Σ E(xi²) − n E(x̄²) ].

Since Var(x) = E(x²) − [E(x)]², we have σ² = E(x²) − µ², so

   E(x²) = σ² + µ²   ............ (2)

Similarly, Var(x̄) = E(x̄²) − [E(x̄)]² and Var(x̄) = σ²/n, so

   E(x̄²) = σ²/n + µ²   ............ (3)

Substituting equations (2) and (3) into equation (1),

   E(S²) = (1/(n − 1)) [ n(σ² + µ²) − n(σ²/n + µ²) ]
         = (1/(n − 1)) (nσ² + nµ² − σ² − nµ²)
         = (1/(n − 1)) (n − 1)σ²
         = σ²

This implies that the sample variance S² is an unbiased estimator of the population variance.
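The unbiasedness of S² can also be illustrated by simulation (a sketch, assuming NumPy is available): averaging many sample variances computed with the n − 1 divisor recovers the population variance, while the n divisor underestimates it.

   import numpy as np

   rng = np.random.default_rng(0)
   mu, sigma, n, reps = 5.0, 2.0, 10, 100_000   # population variance sigma^2 = 4

   samples = rng.normal(mu, sigma, size=(reps, n))
   s2_unbiased = samples.var(axis=1, ddof=1)    # divisor n - 1
   s2_biased = samples.var(axis=1, ddof=0)      # divisor n

   print(round(s2_unbiased.mean(), 3))   # close to 4.0
   print(round(s2_biased.mean(), 3))     # close to (n-1)/n * 4 = 3.6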
Illustration: Consider the sampling distributions of the sample mean and sample median for a random sample of n = 3 measurements from a population defined by the probability distribution shown below.

   X       0     3     12
   P(X)    1/3   1/3   1/3

The sampling distributions of the sample mean X̄ and the sample median M are:

   X̄       0      1      2      3      4      5      6      8      9      12
   P(X̄)    1/27   3/27   3/27   1/27   3/27   6/27   3/27   3/27   3/27   1/27

   M       0      3      12
   P(M)    7/27   13/27  7/27

Show that X̄ is an unbiased estimator of µ and that M is a biased estimator of µ.
Solution: The expected value of a discrete random variable X is defined to be E(X) = Σ x P(x), where the summation is over all values of x. Then

   µ = E(X) = Σ x P(x) = (0 + 3 + 12)/3 = 5


The expected value of the sample mean is

   E(X̄) = Σ X̄ P(X̄) = 135/27 = 5 = µ

This implies that the sample mean is an unbiased estimator of the population mean. To show that the sample median is a biased estimator of the population mean, let us find the expected value of the sample median:

   E(M) = Σ M P(M) = 123/27 ≈ 4.56

Since the expected value of the sample median is not equal to µ, the sample median is a biased estimator of the population mean.

Weaknesses of the concept of unbiasedness:
- Some reasonable and even some very good estimators are not unbiased.
- Some unbiased estimators are poor estimators of the population parameter.
Exercise 1
Consider the probability distribution shown below.

   X       0     1     4
   P(X)    1/3   1/3   1/3

a) Find the sampling distribution of the sample mean for a random sample of n = 2 measurements from this distribution.
b) Show that the sample mean is an unbiased estimator of the population mean.
Exercise 2
If X1, X2, . . ., Xn constitute a random sample from a normal population with mean µ, show that Σ(Xi − µ)²/n is an unbiased estimator of σ².

2. Efficiency
Unbiasedness only ensures that the sampling distribution of an estimator has a mean value equal to the parameter it is supposed to estimate. This is fine, but we also need to know how spread out the distribution of the estimator is. The degree of dispersion of an estimator is what the efficiency of an estimator measures.


An estimator is said to be efficient if its value remains stable from sample to sample taken randomly from the same population. It is the estimator whose distribution is most closely concentrated about the population parameter being estimated; that is, it has minimum variance compared with other estimators of the population parameter. Such estimators are reliable and give more information about the population parameter.
Suppose there are several unbiased estimators of θ. Then, among the unbiased estimators of the population parameter, the estimator with minimum variance is said to be the most efficient estimator of θ. Let θ̂ and θ̃ be two unbiased estimators of θ based on the sample observations. Then θ̂ is said to be more efficient than θ̃ if Var(θ̂) < Var(θ̃). This means that the distribution of θ̂ is more tightly centered about θ than the distribution of θ̃; in other words, the sampling distribution of θ̃ is more variable than the sampling distribution of θ̂.

To check whether a given unbiased estimator has the smallest variance or not, we check the following condition (the Cramér-Rao lower bound):
If θ̂ is an unbiased estimator of θ and

   Var(θ̂) = 1 / ( n · E[ (∂ ln f(x)/∂θ)² ] )

then θ̂ is a minimum variance unbiased estimator of θ, where f(x) is the value of the population probability density at x and n is the size of the random sample.
Example: Show that X̄ is a minimum variance unbiased estimator of the mean µ of a normal population.
Solution: Since the normal density function for the random variable x is given by

   f(x) = (1/(σ√(2π))) · e^( −(1/2)((x − µ)/σ)² )   for −∞ < x < ∞


it follows that

   ln f(x) = −ln(σ√(2π)) − (1/2)((x − µ)/σ)²

so that

   ∂ ln f(x)/∂µ = (1/σ)·((x − µ)/σ)

and hence

   E[ (∂ ln f(x)/∂µ)² ] = (1/σ²) · E[ ((x − µ)/σ)² ] = (1/σ²) · 1 = 1/σ²

Thus

   1 / ( n · E[ (∂ ln f(x)/∂µ)² ] ) = 1 / (n · (1/σ²)) = σ²/n

Since X̄ is unbiased and Var(X̄) = σ²/n, it follows that X̄ is a minimum variance unbiased estimator of µ.
estimator of µ.
The efficiency of estimators can also be determined by comparing the variances of estimators of the same parameter. If θ̂1 and θ̂2 are two unbiased estimators of the parameter θ of a given population and the variance of θ̂1 is less than the variance of θ̂2, we say θ̂1 is relatively more efficient than θ̂2. In addition, we use the ratio Var(θ̂1)/Var(θ̂2) as a measure of the efficiency of θ̂1 relative to θ̂2.
One way of comparing estimators that are not necessarily unbiased is to compute the mean square error (MSE) of the estimator. If θ̂ is an estimator of θ, then the MSE of θ̂ is defined as MSE(θ̂) = E[(θ̂ − θ)²]. It measures the dispersion around the true value of the parameter. The estimator with the least MSE is regarded as the estimator with the least variability around the population parameter to be estimated.
Exercise
If x̄1 and x̄2 are the means of independent random samples of sizes n1 and n2 from a normal population with mean µ and variance σ², show that the variance of the unbiased estimator ω x̄1 + (1 − ω) x̄2 is a minimum when ω = n1/(n1 + n2).


3. Consistency
Consistency refers to the effect of the sample size on the accuracy of an estimator. A statistic is said to be a consistent estimator of a population parameter if it approaches the parameter as the sample size increases. In other words, for a large sample size n, the estimator will take on values that are very close to the corresponding parameter.
Let θ̂ be an estimator of θ based on a sample y1, y2, . . ., yn of size n. Then θ̂ is a consistent estimator of θ if, for every ε > 0, the probability that the absolute difference between θ̂ and θ is greater than ε approaches zero as the sample size becomes large:

   P(|θ̂ − θ| > ε) → 0 as n → ∞,   often expressed as   plim θ̂ = θ

where plim means probability limit.
In other words, if θ̂ is an unbiased estimator of the parameter θ and Var(θ̂) → 0 as n → ∞, then θ̂ is a consistent estimator of θ.
Consistency is an asymptotic property, that is, a limiting property of an estimator. It means that when n is sufficiently large, we can be practically certain that the error made with a consistent estimator will be less than any small pre-assigned positive constant.
Alternatively, a sufficient condition for consistency is that MSE(θ̂) tends to zero as n increases indefinitely.
increases indefinitely.
Example-1: show that for a random sample from a normal population, the sample variance
S2 is a consistent estimator of δ2.
Solution: Since S2 is an unbiased estimator of δ2 let us show that var (S2)  0 as n   . In
the previous chapter, we have shown that for a random sample from a normal population:
2 4
var( S 2 )  . It follows that var (S2)  0 as n   .This show that S2 is a
n 1
consistent estimator of the variance of normal population.
Example-2: let x1, x2 ....... xn be a random sample from a distribution with mean  and

variance  2 . Show that the sample mean is consistent estimator of  .



Solution: From elementary statistics, it is known that E(X̄) = µ and Var(X̄) = σ²/n. Since E(X̄) = µ regardless of the sample size, X̄ is unbiased. Moreover, as n increases indefinitely, Var(X̄) tends toward zero. Hence, X̄ is a consistent estimator of µ.

4. Sufficiency
An estimator is said to be sufficient if it uses all the information about the population parameter contained in the sample.
For example, the sample mean uses all the sample values in its computation, while the mode and the median do not. Hence, the mean is a better estimator in this sense.
The statistic θ̂ is a sufficient estimator of the parameter θ if and only if, for each value of θ̂, the conditional probability distribution or density of the random variables x1, x2, . . ., xn given θ̂ does not depend on θ.
An estimator of a population parameter should fulfill the properties discussed above to be considered a good estimator.

7.3. Methods of estimation

Broadly, three different methods are most often used to obtain estimators with desirable properties. These are:
1. The method of moments and its extension, the generalized method of moments (GMM)
2. The maximum likelihood method (ML)
3. The least squares method of estimation (LS), or OLS estimation.

7.3. 1 Method of moment


We studied in the previous section the sample average as an unbiased estimator of the population average and the sample variance as an unbiased estimator of the population variance. These estimators are examples of method of moments estimators. The mean of y, E(y), is also called the first moment of y, and the expected value of the square of y, E(y²), is the second moment of y. In general, the expected value of y^r represents the rth moment of


Thus, method of moments estimation of a parameter θ is the process of equating sample moments to their expected values in the population distribution. If the population random variable has a known probability distribution with unknown parameter θ, the first moment of the variable X,
$$m_1 = E(X) = g(\theta),$$
is some function of the unknown parameter. The method of moments then generates an estimator $\hat{\theta}$ for θ by solving the equation
$$M_1 = g(\hat{\theta}).$$
That is, the value $\hat{\theta}$ that makes the first sample moment equal to $g(\hat{\theta})$ is the method of moments estimator of θ.
Let X₁, X₂, ..., Xₙ be a random sample of a random variable X. The average value of the kth powers of the sample values x₁, x₂, ..., xₙ,
$$M_k = \frac{1}{n}\sum_{i=1}^{n} x_i^k,$$
is called the kth sample moment, for k = 1, 2, 3, ...

Thus, if a population has r parameters, the method of moments consists of solving the system of equations
$$M_k = E(X^k), \qquad k = 1, 2, \ldots, r,$$
for the r parameters.

Example: Given a random sample of size n from a uniform population on the interval (θ, 1), use the method of moments to obtain a formula for estimating the parameter θ.

Solution: The equation to be solved is $M_1 = E(X)$, where $M_1 = \bar{x}$ and, for this uniform population, $E(X) = \frac{\theta + 1}{2}$. Setting $\bar{x} = \frac{\hat{\theta} + 1}{2}$, we can write the estimate of θ as $\hat{\theta} = 2\bar{x} - 1$.
Generally, if x₁, x₂, ..., xₙ is a random sample of a random variable X whose probability distribution depends on unknown parameters θ₁, θ₂, ..., θₖ, the method of moments estimators
for the parameters are given by setting sample moments equal to population moments and
solving the resulting equations simultaneously.

The following example illustrates the estimation procedure of the method of moments.

Suppose T₁, T₂, ..., Tₙ represent independent times to failure for a piece of equipment, assumed to have an exponential lifetime with (unknown) parameter λ. The first moment of the population random variable is $E(X) = \frac{1}{\lambda}$ and the first sample moment is
$$M_1 = \frac{1}{n}\sum x_i = \bar{X}.$$
The method of moments estimator for λ is obtained by solving $\bar{X} = \frac{1}{\lambda}$, which gives $\hat{\lambda} = \frac{1}{\bar{x}}$. If n = 5 and the observed values are 30.4, 7.8, 1.4, 13.1, 67.3, then $\bar{x} = \frac{120}{5} = 24$. The estimate for λ is $\hat{\lambda} = \frac{1}{24} \approx 0.042$.
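The same method of moments calculation can be checked numerically. The following minimal sketch (the variable names are mine, not from the text) reproduces the exponential-lifetime example by equating the first sample moment to 1/λ.

```python
# Method of moments for an exponential lifetime: solve x_bar = 1/lambda.
times = [30.4, 7.8, 1.4, 13.1, 67.3]      # observed failure times from the example

x_bar = sum(times) / len(times)            # first sample moment
lambda_hat = 1 / x_bar                     # method of moments estimate of the rate

print(f"sample mean = {x_bar:.1f}")        # 24.0
print(f"lambda_hat  = {lambda_hat:.3f}")   # about 0.042
```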
7.3.2 Method of maximum likelihood
The maximum likelihood estimation method is used to estimate unknown parameters when the probability distribution of the population is known. The estimation procedure takes the likelihood function of the sample values and chooses as the estimates of the unknown parameters those values that maximize this likelihood function. This means we look at the sample values and choose as our estimates of the unknown parameters the values for which the probability, or probability density, of getting the observed sample values is a maximum. Before dealing with the detailed mechanics of the maximum likelihood estimation procedure, it is important to examine the structure of the likelihood function itself.
First consider a discrete random variable X, and let x₁, x₂, ..., xₙ be a random sample of X, with x₁, x₂, ..., xₙ the observed sample values. Then the probability of observing the values that did in fact occur in the sample is
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = P(X_1 = x_1)\,P(X_2 = x_2)\cdots P(X_n = x_n) = p_X(x_1)\,p_X(x_2)\cdots p_X(x_n) = L_X(\theta)$$


$L_X(\theta)$ represents the likelihood function; that is, it expresses the probability of observing the numbers actually obtained as a function of θ. The likelihood function is the joint probability distribution of the data, treated as a function of the unknown parameter.
In other words, suppose x₁, x₂, ..., xₙ are the values of a random sample from a population with parameter θ and density function f(x, θ). Because of the random sampling assumption, the joint distribution of x₁, x₂, ..., xₙ is simply the product of the densities, which is the likelihood function of the sample:
$$L(\theta; x_1, \ldots, x_n) = f(x_1, \theta)\,f(x_2, \theta)\cdots f(x_n, \theta)$$


Generally, the product of the density functions of n independent and identically distributed observations gives the likelihood function.
Given the likelihood function, which is the joint probability distribution of the data, the maximum likelihood estimator of the population parameter is the value that maximizes the likelihood function. That is, according to the maximum likelihood principle, out of all possible values of θ, the value that makes the likelihood of the observed data largest should be chosen.
Usually, it is more convenient to work with the log likelihood function, which is obtained by taking the natural log of the likelihood function:
$$\ln L(\theta; x_1, \ldots, x_n) = \sum \ln f(x_i; \theta)$$

As with all maximization and minimization problems, the value of the estimator that maximizes the likelihood function can be found by trial and error. More commonly, however, the likelihood function is maximized by the methods of calculus. The necessary condition for maximization of the likelihood function, or of its log, is
$$\frac{\partial \ln L(\theta; \text{data})}{\partial \theta} = 0,$$
which is called the likelihood equation.

The root of the likelihood equation gives rise to the maximum likelihood estimator.

Example-1: Assume x₁, x₂, ..., xₙ is a random sample of a normally distributed random variable X with mean μ and variance σ². The likelihood function, with the two unknowns μ and σ², is given by:


$$L(\mu, \sigma^2) = \prod_{i=1}^{n} n(x_i; \mu, \sigma) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right\}$$

Find the maximum likelihood estimates of these two parameters.
Solution: Taking the natural logarithm of L(μ, σ²), the log of the joint probability distribution of the random variables becomes:
$$\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$
To maximize the likelihood function, take the partial derivatives of this log likelihood with respect to μ and σ²:
$$\frac{\partial \ln L(\mu, \sigma^2)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) \qquad (1)$$

$$\frac{\partial \ln L(\mu, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{\sum(x_i - \mu)^2}{2(\sigma^2)^2} \qquad (2)$$
Setting these two partial derivatives equal to zero gives the solution
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}.$$
Substituting this value into the second equation gives
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$
This implies that $\bar{x}$ is the maximum likelihood estimator of μ and $\hat{\sigma}^2$ is the maximum likelihood estimator of σ².


Example-2: If x₁, x₂, ..., xₙ is a random sample from an exponential population with mean θ, find the maximum likelihood estimator of its parameter θ.
Solution:
The likelihood function of the exponential population is given by
$$L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i; \theta) = \left(\frac{1}{\theta}\right)^{n} e^{-\frac{1}{\theta}\sum_{i=1}^{n} x_i}$$
Take the natural log of both sides and differentiate with respect to θ:


$$\ln L(\theta; x_i) = n\ln\frac{1}{\theta} - \frac{1}{\theta}\sum_{i=1}^{n} x_i$$

$$\frac{d\ln L}{d\theta} = -\frac{n}{\theta} + \frac{1}{\theta^2}\sum x_i$$
Equating this derivative to zero and solving for θ, we get the maximum likelihood estimate
$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{X}$$
Hence, the maximum likelihood estimator is $\hat{\theta} = \bar{X}$.

Exercise
Let X be a gamma random variable with parameters r and λ. Assume X₁, X₂, ..., Xₙ is a random sample of X and the likelihood function for the sample is
$$L_x(r, \lambda) = \prod_{i=1}^{n} f_x(x_i) = \prod_{i=1}^{n} \frac{\lambda^r x_i^{r-1}}{\Gamma(r)}\, e^{-\lambda x_i}.$$
Find the maximum likelihood estimator of the population parameter.
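As a quick numerical check of the exponential result above, the sketch below (illustrative only; the data and the search grid are assumptions, not part of the module) evaluates the log likelihood over a grid of θ values and confirms that it peaks at the sample mean.

```python
import numpy as np

data = np.array([30.4, 7.8, 1.4, 13.1, 67.3])   # reuse the earlier failure-time data

def log_lik(theta, x):
    # Exponential with mean theta: ln L = -n*ln(theta) - sum(x)/theta
    return -len(x) * np.log(theta) - x.sum() / theta

grid = np.linspace(1, 100, 10_000)
theta_hat = grid[np.argmax(log_lik(grid, data))]

print(f"grid maximizer = {theta_hat:.2f}")    # close to 24
print(f"sample mean    = {data.mean():.2f}")  # 24.00, the analytical MLE
```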
7.3.3 Least squares estimation method
We have studied the method of moments and the maximum likelihood method for estimating unknown population parameters. Under this topic, we study a third method of estimating population parameters, called least squares estimation. It is especially applicable to models that involve the values of two or more variables. The least squares method is used to estimate unknown parameters in an assumed relationship between variables using regression analysis. This analysis, however, is concerned with what is known as statistical dependence between variables, not a functional or deterministic relationship. In a statistical relation between variables, we deal with random or stochastic variables, that is, variables that have probability distributions.
For example, the dependence of crop yield on temperature, rainfall, and other weather conditions is statistical in nature. The explanatory variables, although certainly important, do not enable an agronomist to predict crop yield exactly, because of errors involved in measuring the variables as well as other factors (variables) that collectively affect the yield but are difficult to identify individually. Variables having such a relation can be expressed as
$$Y_i = \beta_1 + \beta_2 X_i + u_i$$


where Yᵢ represents the variable to be explained (dependent variable) and Xᵢ is the explanatory variable (independent variable). uᵢ represents the disturbance term (the variation in Y caused by factors not included in the model), that is, all factors affecting the dependent variable other than the independent variable stated in the model. β₁ and β₂ represent population parameters that show the relationship between the dependent and independent variables. The least squares estimation method is used to estimate these population parameters based on observed sample data.

The estimated value of the dependent variable is
$$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i,$$
so that $Y_i = \hat{Y}_i + \hat{u}_i$ and the residual is
$$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i.$$

Now, given n pairs of observations on Y and X, the least squares estimators are determined in such a manner that the estimated value of Y is as close as possible to the actual value. One might think of choosing the estimators that minimize the sum of the residuals. However, the sum of the residuals equals zero even when the individual residuals are widely scattered. To avoid this situation, the sum of squared residuals is minimized when we use the least squares method of estimation. Therefore, the method of least squares provides us with unique estimates of β₁ and β₂ that give the smallest possible value of $\sum \hat{u}_i^2$. This can be done using differential calculus as follows:
$$\min \sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$$

Take the partial derivatives of the minimization problem with respect to $\hat{\beta}_1$ and $\hat{\beta}_2$, set them equal to zero, and solve the two equations simultaneously to find $\hat{\beta}_1$ and $\hat{\beta}_2$:
$$\frac{\partial(\sum \hat{u}_i^2)}{\partial \hat{\beta}_1} = -2\sum(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) = 0 \;\Rightarrow\; \sum Y_i = n\hat{\beta}_1 + \hat{\beta}_2\sum X_i \qquad (1)$$
Using the basic properties of the summation operator, equation (1) can be rewritten as
$$\bar{Y} = \hat{\beta}_1 + \hat{\beta}_2\bar{X} \qquad\Rightarrow\qquad \hat{\beta}_1 = \bar{Y} - \hat{\beta}_2\bar{X}$$

$$\frac{\partial(\sum \hat{u}_i^2)}{\partial \hat{\beta}_2} = -2\sum(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)X_i = 0 \;\Rightarrow\; \sum X_i Y_i = \hat{\beta}_1\sum X_i + \hat{\beta}_2\sum X_i^2 \qquad (2)$$
After some algebraic manipulation of equation (2) we can obtain
$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{n\sum X_i Y_i - \sum X_i\sum Y_i}{n\sum X_i^2 - (\sum X_i)^2}$$

Example: Given the following data, determine the estimators of the population relationship between the two variables X and Y.

Y: 3.5  4.3  5.2  5.8  6.4  7.3  7.2  7.5  7.8  8.3
X: 6    8    9    12   10   15   17   20   18   24

Solution:

  Y     X    XY      X²    Y²
  3.5   6    21.0    36    12.25
  4.3   8    34.4    64    18.49
  5.2   9    46.8    81    27.04
  5.8   12   69.6    144   33.64
  6.4   10   64.0    100   40.96
  7.3   15   109.5   225   53.29
  7.2   17   122.4   289   51.84
  7.5   20   150.0   400   56.25
  7.8   18   140.4   324   60.84
  8.3   24   199.2   576   68.89

ΣY = 63.3   ΣX = 139   ΣXY = 957.3   ΣX² = 2239   ΣY² = 423.49

The model is $Y_i = \beta_1 + \beta_2 X_i + u_i$.


$$\hat{\beta}_2 = \frac{n\sum X_i Y_i - \sum X_i\sum Y_i}{n\sum X_i^2 - (\sum X_i)^2} = \frac{10(957.3) - (139)(63.3)}{10(2239) - (139)^2} = 0.252$$

$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2\bar{X} = \left(\frac{63.3}{10}\right) - 0.252\left(\frac{139}{10}\right) = 2.823$$

Hence the estimated regression line is $\hat{Y}_i = 2.823 + 0.252\,X_i$.
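The least squares estimates above can be verified with a few lines of code. The sketch below is only a numerical cross-check (the variable names are mine); it applies the same summation formulas to the example data.

```python
import numpy as np

x = np.array([6, 8, 9, 12, 10, 15, 17, 20, 18, 24], dtype=float)
y = np.array([3.5, 4.3, 5.2, 5.8, 6.4, 7.3, 7.2, 7.5, 7.8, 8.3])
n = len(x)

# Slope and intercept from the normal equations.
b2 = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x ** 2).sum() - x.sum() ** 2)
b1 = y.mean() - b2 * x.mean()

print(f"beta2_hat = {b2:.3f}")   # about 0.252
print(f"beta1_hat = {b1:.3f}")   # about 2.823
```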

7.4. Interval estimation


Introduction
Point estimators, though simple to determine, have some drawbacks. First, a point estimate from a sample may not exactly locate the population parameter. For example, the average of a sample may or may not be equal or close to the average of the population. If the sample average differs from the population average, the point estimate does not indicate the extent of the possible error, even though this error can be reduced by increasing the sample size. A point estimate may be a researcher's best guess about the population value, but by its nature it provides no information about how close the estimate is likely to be to the population parameter. Because the population parameter is not known, we cannot know how close the estimate is for a particular sample. However, we can make statements involving probabilities, and this is where interval estimation comes in.
Secondly, a point estimate does not indicate how confident we can be that the estimate is close to the parameter it is estimating. For instance, we cannot know exactly how close $\bar{X}$ may be to μ, nor do we know the probability that $\bar{X}$ will be within a given distance of μ for a sample of size n.
To overcome these drawbacks, statisticians use another type of estimation known as interval estimation. In this method, we first find a point estimate. Then we use this estimate to construct an interval on both sides of the point estimate, within which we can be reasonably confident that the true parameter will lie. Such an estimator of a parameter is called an interval estimator.


An interval estimator, or confidence interval, is a rule or formula that tells us how to calculate an interval estimate that contains the parameter of interest with a certain probability, called the confidence coefficient and designated by 1-α. The confidence coefficient shows the area under the sampling distribution. For example, most of the time researchers use α = 5%, i.e. 95% confidence intervals. This means that the probability that the interval will contain the estimated parameter is 0.95. Depending upon the accuracy of estimation we want to achieve, it is possible to increase or decrease this certainty by changing the confidence coefficient.

7.4.1 Interval estimation and confidence intervals


As stated above, a point estimate provides no information about how close the estimate is likely to be to the population parameter. However, one can assess the uncertainty of a point estimator by finding its standard deviation. Reporting the standard deviation of the estimator along with the point estimate provides some information on the accuracy of our estimate, but it makes no direct statement about where the population value is likely to lie in relation to the estimate. To overcome this limitation, confidence intervals are constructed.
Suppose x is a random variable whose probability distribution depends on an unknown parameter θ. Given a random sample x₁, x₂, x₃, ..., xₙ, the two statistics L₁ and L₂ form a 100(1-α)% confidence interval for θ if
$$P(L_1 \le \theta \le L_2) = 1 - \alpha$$
no matter what the unknown value of θ.
For some specified value of (1-α), the interval L₁ ≤ θ ≤ L₂, i.e. [L₁, L₂], represents a (1-α)100% confidence interval that contains the unknown value of θ. The end points L₁ and L₂ are called the lower and upper confidence limits. This leads to our saying that we are 100(1-α) percent confident that the interval contains the true population parameter value.

7.4.1.1 Interval estimate of the population mean (population variance known)

Suppose that the mean of a random sample is to be used to estimate the mean of a normal population with known variance σ². The sampling distribution of $\bar{X}$ for random samples of
size n from a normal population with mean μ and variance σ² is a normal distribution with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. Transforming the sampling distribution of sample means into the standard normal distribution,
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \quad\Rightarrow\quad Z\frac{\sigma}{\sqrt{n}} = \bar{X} - \mu \quad\Rightarrow\quad \mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$$
Since μ falls within a range of values equidistant from $\bar{X}$, the interval estimate of the population mean of a normal distribution equals the sample mean plus or minus its standard error times the table value of Z for the indicated level of significance:
$$\mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$$
If the mean of a random sample of size n from a normal population with known variance σ² is used as an estimator of the population mean, the probability is 1-α that the error will be less than $Z_{\alpha/2}\,\sigma/\sqrt{n}$, i.e.
$$P\left(\left|\bar{x} - \mu\right| < Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \qquad\text{or}\qquad P\left(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}\right) = 1 - \alpha,$$
where the area 1-α lies under the standard normal curve between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$. Hence
$$P\left(-Z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le Z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(-Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
$$P\left(\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
Here α is the acceptable error level, $Z_{\alpha/2}$ is the Z value cutting off an area of α/2 in each of the right and left tails of the standard normal distribution, and (1-α) is the level of confidence.

For example, if α = 0.05 and $Z_{\alpha/2}$ = 1.96, then the 95% confidence interval for the population mean follows from
$$0.95 = P\left(-1.96 \le Z \le 1.96\right) = P\left(-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = P\left(\bar{x} - \frac{1.96\,\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \frac{1.96\,\sigma}{\sqrt{n}}\right)$$
If σ = 10, n = 25 and $\bar{x}$ = 140,
$$P\left(140 - 1.96\left(\frac{10}{\sqrt{25}}\right) \le \mu \le 140 + 1.96\left(\frac{10}{\sqrt{25}}\right)\right) = 0.95$$
$$P\left(136.08 \le \mu \le 143.92\right) = 0.95$$
The population mean falls between 136.08 and 143.92 (roughly 136 and 144) with 95% confidence.


Example 2
Suppose that shopping times for customers at a local grocery store are normally distributed. A random sample of 16 shoppers at the local grocery store had a mean shopping time of 25 minutes. Assume σ = 6 minutes. Find a 95% confidence interval for the population mean.
Solution: The 95% confidence interval estimate of the population mean is given by
$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$$
$$25 - \frac{1.96(6)}{\sqrt{16}} \le \mu \le 25 + \frac{1.96(6)}{\sqrt{16}}$$
$$25 \pm 2.94$$
$$22.06 \le \mu \le 27.94$$
This result can be interpreted as follows: based on the 16 observations, there is 95% confidence that the unknown population mean falls between about 22 minutes and 28 minutes.
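The arithmetic of this interval is easy to check in code. The sketch below (purely illustrative) recomputes the 95% limits for Example 2 from the normal critical value.

```python
import math

x_bar, sigma, n = 25.0, 6.0, 16      # sample mean, known sigma, sample size
z = 1.96                             # critical value for 95% confidence

margin = z * sigma / math.sqrt(n)
print(f"95% CI: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")  # (22.06, 27.94)
```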
Exercise: Unoccupied seats on flights cause airlines to lose revenue. Suppose a large airline wants to estimate its average number of unoccupied seats per flight over the past year. To accomplish this, the records of 225 flights are randomly selected, and the number of unoccupied seats is noted for each of the sampled flights. The sample mean and standard deviation are:
x̄ = 11.6 seats, s = 4.1 seats.
Estimate μ, the mean number of unoccupied seats per flight during the past year, using a 90% confidence interval.

7.4.1.2 Interval estimation of the mean of a normal population with unknown variance
In the preceding section, confidence intervals for the mean of a normal population when the population variance is known were derived. Now we study the case, of considerable practical importance, where the value of the population variance is unknown and the sample size is small.
In such a situation, another probability distribution, called the t-distribution, is used to undertake confidence interval estimation for the unknown population parameter. As discussed under the topic of
sampling distributions, given a random sample of n observations with mean $\bar{x}$ and estimated standard deviation s from a normally distributed population, the random variable t follows a t-distribution with (n-1) degrees of freedom:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Therefore, if $\bar{x}$ and s are the values of the mean and standard deviation of a random sample of size n from a normal population, then
$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
is a (1-α)100% confidence interval for the mean of the population when the standard deviation is unknown and the sample size is small.
In other words, if x₁, x₂, ..., xₙ is a random sample of a normal random variable with mean μ and variance σ², then the interval (L₁, L₂) can be defined by
$$L_1 = \bar{x} - \frac{s\,t_{\alpha/2,\,n-1}}{\sqrt{n}} \qquad\text{and}\qquad L_2 = \bar{x} + \frac{s\,t_{\alpha/2,\,n-1}}{\sqrt{n}},$$
which is a 100(1-α)% confidence interval for μ; or it can be expressed as
$$P\left(-t_{\alpha/2} \le t \le t_{\alpha/2}\right) = 1 - \alpha, \qquad\text{where } t = \frac{\bar{x} - \mu}{s/\sqrt{n}},$$
$$P\left(\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right) = 1 - \alpha.$$

Example: Gasoline prices rose drastically during the early years of this century. Suppose that a recent study was conducted using truck drivers with equivalent years of experience to test-run 24 trucks of a particular model over the same highway. Estimate the population mean fuel consumption for this model of truck with 90% confidence if the fuel consumption, in miles per gallon, for the 24 trucks was:
15.5 21.0 18.5 19.3 19.7 16.9 20.2 14.5
16.5 19.2 18.7 18.2 18.0 17.5 18.5 20.5
18.6 19.1 19.8 18.0 19.8 18.2 20.3 21.8


Solution: n = 24, x̄ = 18.68, n-1 = 23, S = 1.695

$$\bar{x} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$$
$$18.68 - t_{0.05,\,23}\frac{1.695}{\sqrt{24}} \le \mu \le 18.68 + t_{0.05,\,23}\frac{1.695}{\sqrt{24}}$$
$$18.1 \le \mu \le 19.3$$
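The same 90% t-interval can be reproduced with standard scientific-Python tools; this short sketch is only a cross-check of the example, not part of the module.

```python
import numpy as np
from scipy import stats

mpg = np.array([15.5, 21.0, 18.5, 19.3, 19.7, 16.9, 20.2, 14.5,
                16.5, 19.2, 18.7, 18.2, 18.0, 17.5, 18.5, 20.5,
                18.6, 19.1, 19.8, 18.0, 19.8, 18.2, 20.3, 21.8])

n = len(mpg)
x_bar, s = mpg.mean(), mpg.std(ddof=1)
t_crit = stats.t.ppf(0.95, df=n - 1)          # t_{0.05, 23} for a 90% interval
margin = t_crit * s / np.sqrt(n)

print(f"x_bar = {x_bar:.2f}, s = {s:.3f}")
print(f"90% CI: ({x_bar - margin:.1f}, {x_bar + margin:.1f})")  # about (18.1, 19.3)
```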

Exercise
1. A random sample of 64 sales invoices was taken from a large population of sales invoices. The average value was found to be Birr 2,000 with a standard deviation of Birr 540. Find a 90% confidence interval for the true mean value of all the sales.
2. The quality control manager at a factory manufacturing light bulbs wants to estimate the average life of a large shipment of light bulbs. The standard deviation is known to be 100 hours. A random sample of 50 light bulbs gave a sample average life of 350 hours. Determine a 95% confidence interval estimate of the true average life of the light bulbs in the shipment.

7.4.1.3 Interval estimation for difference of two means


If there are two populations with means μ₁ and μ₂ and standard deviations σ₁ and σ₂, respectively, the most efficient point estimator of the difference μ₁-μ₂ is the sample statistic $\bar{x}_1 - \bar{x}_2$. A point estimate of μ₁-μ₂ is obtained by selecting two independent random samples of sizes n₁ and n₂, one from each of the two populations, computing the two sample means
$$\bar{x}_1 = \frac{\sum x_i}{n_1} \qquad\text{and}\qquad \bar{x}_2 = \frac{\sum x_i}{n_2},$$
and taking the difference $\bar{x}_1 - \bar{x}_2$.
The desired confidence interval for μ₁-μ₂ can be obtained in terms of the sampling distribution of $\bar{x}_1 - \bar{x}_2$, provided that the two populations are approximately normal or the sample sizes n₁ and n₂ are both greater than or equal to 30. When either of these conditions is satisfied, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal with mean
$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$$
and standard deviation
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}};$$
then


$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
has a standard normal distribution. If the variances of the two populations are known, the probability that Z takes a value between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ is 1-α. That is,
$$P\left(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}\right) = 1 - \alpha$$
Substituting for Z and solving for μ₁-μ₂ gives:
$$P\left((\bar{x}_1 - \bar{x}_2) - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) = 1 - \alpha$$
Thus the (1-α)100% confidence interval for μ₁-μ₂ is given by the double inequality
$$(\bar{x}_1 - \bar{x}_2) - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Example: The strength of the wire produced by company A has a mean of 4,500 kg and a standard deviation of 200 kg. Company B's wire has a mean of 4,000 kg and a standard deviation of 300 kg. A sample of 50 wires from company A and 100 wires from company B are selected at random for testing the strength. Find 99 percent confidence limits for the difference in the average strength of the populations of wires produced by the two companies.

Solution: Company A: x̄₁ = 4500, σ₁ = 200, n₁ = 50
Company B: x̄₂ = 4000, σ₂ = 300, n₂ = 100
Therefore $\bar{x}_1 - \bar{x}_2 = 4500 - 4000 = 500$ and $Z_{\alpha/2} = 2.576$.
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{40{,}000}{50} + \frac{90{,}000}{100}} = 41.23$$
The interval estimate of the difference between the two population means is
$$(\bar{x}_1 - \bar{x}_2) - Z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + Z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2}$$
$$500 - 2.576(41.23) < \mu_1 - \mu_2 < 500 + 2.576(41.23)$$
$$500 - 106.20 < \mu_1 - \mu_2 < 500 + 106.20$$
$$393.80 < \mu_1 - \mu_2 < 606.20$$
The 99 percent confidence limits on the difference in the average strength of wires produced by the two companies are given by the interval [393.80, 606.20].

If the variances of the two populations are unknown, the population standard deviations are estimated by the sample standard deviations. Under this condition, the confidence interval for the difference between the two means is obtained by using the sample standard deviations S₁ and S₂ in place of σ₁ and σ₂, respectively. Thus, we have
$$(\bar{x}_1 - \bar{x}_2) - Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
When the standard deviations of the two populations are unknown and the sample sizes n₁ and n₂ are both small, the desired confidence interval is obtained by using the t-distribution, provided that the two populations are approximately normal.
Under these conditions, the sample statistic T is defined as
$$T = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{S_P\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where $S_P$ is the pooled standard deviation, $S_P = \sqrt{\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$, an estimator of σ when it is assumed that σ₁ = σ₂ = σ.
The interval estimator of the difference between the means of the two populations in this situation is given by
$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\,S_P\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\,S_P\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Example: A study has been made to compare the nicotine contents of two brands of cigarettes. Ten cigarettes of Brand A had an average nicotine content of 3.1 milligrams with a standard deviation of 0.5 milligram, while eight cigarettes of Brand B had an average nicotine content of 2.7 milligrams with a standard deviation of 0.7 milligram. Assuming the two sets of data are independent random samples from normal populations with equal variances, construct a 95% confidence interval for the difference between the mean nicotine contents of the two brands of cigarettes.

Solution: Substituting n₁ = 10, n₂ = 8, S₁ = 0.5 and S₂ = 0.7 into the formula for $S_P$, we get
$$S_P = \sqrt{\frac{9(0.25) + 7(0.49)}{16}} = 0.596$$
With $\bar{x}_1 = 3.1$, $\bar{x}_2 = 2.7$ and $t_{0.025,\,16} = 2.120$, the 95% confidence interval can be constructed as
$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\,S_P\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\,S_P\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$$(3.1 - 2.7) - 2.120(0.596)\sqrt{\frac{1}{10} + \frac{1}{8}} \le \mu_1 - \mu_2 \le (3.1 - 2.7) + 2.120(0.596)\sqrt{\frac{1}{10} + \frac{1}{8}}$$
$$-0.20 \le \mu_1 - \mu_2 \le 1.00$$
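For readers who want to verify the pooled-variance interval numerically, a minimal sketch is shown below (the summary statistics are taken from the example; the variable names are mine).

```python
import math
from scipy import stats

n1, x1, s1 = 10, 3.1, 0.5     # Brand A summary statistics
n2, x2, s2 = 8, 2.7, 0.7      # Brand B summary statistics

sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))  # pooled SD
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)                            # about 2.120
margin = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)

diff = x1 - x2
print(f"95% CI for mu1 - mu2: ({diff - margin:.2f}, {diff + margin:.2f})")  # (-0.20, 1.00)
```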

Exercise:
A study of two types of photocopying equipment shows that 61 failures of the first kind of equipment took, on average, 80.7 minutes to repair with a standard deviation of 19.4 minutes, whereas 61 failures of the second kind of equipment took, on average, 88.1 minutes to repair with a standard deviation of 18.8 minutes. Find the 99 percent confidence interval for the difference between the true average times it takes to repair failures of the two kinds of photocopying equipment.

7.4.1.4 Interval estimation of population proportion P


As we have seen in the previous chapter, the sampling distribution of the sample proportion is approximately normal if the sample size is large. If p denotes the proportion of successes in n independent trials and π is the proportion of successes in the population, then the random variable


$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$
approximates a standard normal distribution.
This result can be used to construct confidence intervals for the population proportion:
$$P\left(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(p - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le \pi \le p + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right) \approx 1 - \alpha$$
Let p denote the observed proportion of successes in a random sample of n observations from a population with a proportion π of successes. Then, if n is large enough that n·π·(1-π) > 9, a 100(1-α)% confidence interval for the population proportion is given by
$$p - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le \pi \le p + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
Example: In a random sample, 136 of 400 persons given a flu vaccine experienced some discomfort. Construct a 95% confidence interval for the true proportion of persons who will experience some discomfort from the vaccine.
Solution
$$n = 400, \qquad p = \frac{136}{400} = 0.34, \qquad Z_{0.025} = 1.96$$
$$p - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le \pi \le p + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
$$0.34 - 1.96\sqrt{\frac{(0.34)(0.66)}{400}} \le \pi \le 0.34 + 1.96\sqrt{\frac{(0.34)(0.66)}{400}}$$
$$0.294 \le \pi \le 0.386$$
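A short, purely illustrative sketch confirms the vaccine-example interval using the same normal-approximation formula.

```python
import math

successes, n = 136, 400
p = successes / n                      # sample proportion, 0.34
z = 1.96                               # 95% critical value

margin = z * math.sqrt(p * (1 - p) / n)
print(f"95% CI for pi: ({p - margin:.3f}, {p + margin:.3f})")   # (0.294, 0.386)
```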

Exercise: Suppose we want to estimate the proportion of families in a town which have two or more children. A random sample of 144 families shows that 48 families have two or more children. Construct a 95% confidence interval estimate of the population proportion of families having two or more children.

7.4.1.5 Estimation of the difference between two population proportions


If there are two binomial populations with proportions P₁ and P₂, the most efficient estimator of the difference P₁ - P₂ is p₁ - p₂, the difference between the sample
proportions, $p_1 = \frac{x_1}{n_1}$ and $p_2 = \frac{x_2}{n_2}$, based on two independent random samples of sizes n₁ and n₂ drawn from populations with proportions P₁ and P₂, respectively. The desired confidence interval for P₁ - P₂ may be obtained by using the sampling distribution of p₁ - p₂, which is approximately normal with mean μ = P₁ - P₂ and standard deviation
$$\sigma = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}.$$
Defining the standard normal variable Z as
$$Z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\dfrac{P_1 q_1}{n_1} + \dfrac{P_2 q_2}{n_2}}},$$
we make the claim that
$$P\left(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}\right) = 1 - \alpha$$
Substituting for Z and solving for (P₁ - P₂) in exactly the same way as we did for the difference between means, we have
$$P\left((p_1 - p_2) - Z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \le (P_1 - P_2) \le (p_1 - p_2) + Z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}\right) = 1 - \alpha,$$
so
$$(p_1 - p_2) - Z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \le (P_1 - P_2) \le (p_1 - p_2) + Z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
is used to obtain a (1-α)100 percent confidence interval, by substituting
$$Z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\dfrac{P_1(1 - p_1)}{n_1} + \dfrac{P_2(1 - p_2)}{n_2}}},$$
a random variable having approximately the standard normal distribution, in the above expression.


Thus, if x₁ is a binomial random variable with parameters n₁ and π₁, x₂ is a binomial random variable with parameters n₂ and π₂, n₁ and n₂ are large, and
$$\hat{\pi}_1 = \frac{x_1}{n_1} \qquad\text{and}\qquad \hat{\pi}_2 = \frac{x_2}{n_2},$$
then
$$(\hat{\pi}_1 - \hat{\pi}_2) - Z_{\alpha/2}\sqrt{\frac{\hat{\pi}_1(1 - \hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1 - \hat{\pi}_2)}{n_2}} \le \pi_1 - \pi_2 \le (\hat{\pi}_1 - \hat{\pi}_2) + Z_{\alpha/2}\sqrt{\frac{\hat{\pi}_1(1 - \hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1 - \hat{\pi}_2)}{n_2}}$$
is an approximate (1-α)100% confidence interval for π₁ - π₂.

Example: During a presidential election year, many forecasts are made to determine how voters perceive a particular candidate. In a random sample of 120 registered voters in region A, 107 support the candidate in question. In an independent random sample of 141 registered voters in region B, only 73 support the same candidate. If the respective population proportions are denoted πA and πB, find a 95% confidence interval for the population proportion difference (πA - πB).
Solution:
$$n_A = 120, \quad \hat{\pi}_A = \frac{107}{120} = 0.892; \qquad n_B = 141, \quad \hat{\pi}_B = \frac{73}{141} = 0.518$$
For a 95% confidence interval, α = 0.05 and $Z_{\alpha/2} = Z_{0.025} = 1.96$.
The required interval is therefore
$$(0.892 - 0.518) \pm 1.96\sqrt{\frac{0.892(0.108)}{120} + \frac{0.518(0.482)}{141}}$$
$$0.275 < \pi_A - \pi_B < 0.473$$
Thus, we are 95% confident that the interval from 0.275 to 0.473 contains the difference between the actual proportions of region A and region B voters who favor the indicated candidate.

Exercise: In a study of the relationship between birth order and college success, an investigator found that 126 in a sample of 180 college graduates were first-born children. In a sample of 100 non-graduates of comparable age and socioeconomic background, the number of first-born children was 54. Estimate the difference between the proportions of
first-born children in the two populations from which these samples were drawn. Use a 90% confidence interval and interpret your results.

7.4.1.6 Interval estimation of variance


In this section, we consider estimating a population variance and the ratio of two population variances using confidence intervals. The estimation requires the χ² and F distributions that we discussed in the previous chapter. If a random sample of size n is drawn from a normal population with variance σ², the sample variance S² is an unbiased estimator of the population variance σ².

It follows that the estimator S² is used to define a random variable
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2},$$
which follows a chi-square distribution with n-1 degrees of freedom.
We use this distribution to estimate the population variance in interval form. Given a random sample of size n from a normal population, we can obtain a (1-α)100% confidence interval for σ² by using the chi-square distribution as follows:
$$P\left(\chi^2_{1-\alpha/2,\,n-1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2,\,n-1}\right) = 1 - \alpha$$
$$P\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}\right) = 1 - \alpha$$
where $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$ are obtained from the χ²-table.

If s² is the value of the variance of a random sample of size n from a normal population, then
$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}$$
is a (1-α)100% confidence interval for σ². That is, the value of the population variance lies between $\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}}$ and $\frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}$ with probability 1-α, or with (1-α)100% confidence.


In general, let x₁, x₂, ..., xₙ be a random sample of a normal random variable with mean μ and variance σ². Then
$$\chi^2 = \frac{\sum (x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2}$$
has a χ² probability distribution with n-1 degrees of freedom. It follows that we can find a confidence interval for the population variance:
$$P(L_1 < \sigma^2 < L_2) = 1 - \alpha, \qquad\text{where}\qquad L_1 = \frac{(n-1)S^2}{\chi^2_{\alpha/2}} \quad\text{and}\quad L_2 = \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}.$$

Example: Suppose the weight of a frozen food package produced by a given manufacturer is a normal random variable with mean μ and variance σ² (both unknown). A sample of 10 of these packages is selected at random and independently from those produced. Their mean is X̄ = 15.9 and the standard deviation of the observations is S = 0.57. Compute a 95% confidence interval for σ².
Solution:
n = 10, so $\sum (x_i - \bar{x})^2 = (n-1)S^2 = 2.90$ and df = 9. From the χ²-table, $\chi^2_{0.025,\,9} = 19.0$ and $\chi^2_{0.975,\,9} = 2.70$.
So the 95% confidence limits for σ² are
$$L_1 = \frac{(n-1)S^2}{\chi^2_{\alpha/2}} = \frac{2.90}{19.0} = 0.15$$
$$L_2 = \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}} = \frac{2.90}{2.70} = 1.07$$
$$(L_1, L_2) = (0.15,\ 1.07)$$


Based on the sample, we are 95 percent sure that the interval (0.15, 1.07) contains σ².
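The chi-square limits above can be cross-checked with scipy; the snippet below is only an illustration of the same calculation (it uses the exact (n-1)S² rather than the rounded 2.90, so the upper limit differs slightly from the 1.07 in the text).

```python
from scipy import stats

n, s2 = 10, 0.57 ** 2                 # sample size and sample variance
alpha = 0.05
ss = (n - 1) * s2                     # (n-1)S^2, about 2.92

chi2_hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)   # about 19.02
chi2_lo = stats.chi2.ppf(alpha / 2, df=n - 1)       # about 2.70

print(f"95% CI for sigma^2: ({ss / chi2_hi:.2f}, {ss / chi2_lo:.2f})")  # about (0.15, 1.08)
```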

Exercise: In 16 test runs, the gasoline consumption of an experimental engine had a standard deviation of 2.2 gallons. Construct a 99% confidence interval for σ², which measures the true variability of the gasoline consumption of the engine.

7.4.1.7 Estimation of the ratio of two variances
Given two normal, or nearly normal, populations with σ₁² and σ₂² as their variances, the point estimate of σ₁²/σ₂² is given by the ratio of the two sample variances, S₁²/S₂². So if S₁² and S₂² are the variances of independent random samples of sizes n₁ and n₂ from normal populations, then
$$F = \frac{\sigma_2^2\,S_1^2}{\sigma_1^2\,S_2^2}$$
is a random variable having an F-distribution with n₁-1 and n₂-1 degrees of freedom. Thus, we can write
$$P\left(f_{1-\alpha/2,\,n_1-1,\,n_2-1} \le \frac{\sigma_2^2\,S_1^2}{\sigma_1^2\,S_2^2} \le f_{\alpha/2,\,n_1-1,\,n_2-1}\right) = 1 - \alpha,$$
so that
$$\frac{S_1^2}{S_2^2}\cdot\frac{1}{f_{\alpha/2,\,n_1-1,\,n_2-1}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2}{S_2^2}\cdot f_{\alpha/2,\,n_2-1,\,n_1-1}$$
is a (1-α)100% confidence interval for σ₁²/σ₂², since $f_{1-\alpha/2,\,n_1-1,\,n_2-1} = \dfrac{1}{f_{\alpha/2,\,n_2-1,\,n_1-1}}$.

Example: Assume that a sample of 25 electric tubes manufactured by company X shows a mean life of 1,250 hours with a standard deviation of 15 hours. Likewise, assume that another sample of 16 tubes manufactured by company Y gives a mean life of 1,460 hours with a standard deviation of 20 hours. Given that the life of the tubes manufactured by each of the two companies is normally distributed, find a 98 percent confidence interval for σ₁²/σ₂², where σ₁² and σ₂² are the variances of the lives of the tubes manufactured by company X and company Y, respectively.
Solution:
$$n_1 = 25, \quad s_1^2 = (15)^2 = 225; \qquad n_2 = 16, \quad s_2^2 = (20)^2 = 400$$
$$\alpha = 0.02, \quad \frac{\alpha}{2} = 0.01, \qquad f_{0.01}(24, 15) = 3.29, \quad f_{0.01}(15, 24) = 2.89$$
The desired confidence interval is given by
$$\frac{S_1^2}{S_2^2}\cdot\frac{1}{f_{\alpha/2}(V_1, V_2)} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2}{S_2^2}\cdot f_{\alpha/2}(V_2, V_1), \qquad\text{where } V_1 = n_1 - 1,\ V_2 = n_2 - 1$$
$$\frac{225}{400}\cdot\frac{1}{3.29} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{225}{400}(2.89),$$
which gives the 98% confidence interval for σ₁²/σ₂², approximately 0.17 ≤ σ₁²/σ₂² ≤ 1.63.
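A numerical cross-check of this interval is sketched below (scipy supplies the F critical values; the variable names are mine).

```python
from scipy import stats

n1, s1_sq = 25, 15.0 ** 2    # company X: sample size and variance
n2, s2_sq = 16, 20.0 ** 2    # company Y: sample size and variance
alpha = 0.02

f_upper_12 = stats.f.ppf(1 - alpha / 2, dfn=n1 - 1, dfd=n2 - 1)  # f_{0.01}(24, 15)
f_upper_21 = stats.f.ppf(1 - alpha / 2, dfn=n2 - 1, dfd=n1 - 1)  # f_{0.01}(15, 24)

ratio = s1_sq / s2_sq
lower = ratio / f_upper_12
upper = ratio * f_upper_21
print(f"98% CI for sigma1^2/sigma2^2: ({lower:.2f}, {upper:.2f})")  # roughly (0.17, 1.6)
```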

Exercise: A study has been made to compare the nicotine contents of two brands of cigarettes. Ten cigarettes of Brand A had an average nicotine content of 3.1 milligrams with a standard deviation of 0.5 milligram, while eight cigarettes of Brand B had an average nicotine content of 2.7 milligrams with a standard deviation of 0.7 milligram. Assuming the two sets of data are independent random samples from normal populations, construct a 95% confidence interval for the ratio of the variances of the nicotine contents of the two brands of cigarettes.


Chapter Eight
Hypothesis testing

Introduction
So far, we have discussed the first aspect of inferential statistics: estimation of population parameters based upon information obtained from a random sample. Now we consider the second aspect of statistical inference, hypothesis testing. It is the process of making decisions about the value of a population parameter based on information obtained from sample results. That is, hypothesis testing involves assessing the validity of some conjecture or claim about a population parameter using statistics computed from random samples. Hypothesis testing enables us to make decisions about the estimated value of a population parameter based on observed sample data.
In this chapter, we describe testing hypotheses about different population parameters based on sample evidence, starting with a description of the basic concepts.

Objective of the chapter


After this chapter, the student will be able to:
- describe generally how statistical hypothesis testing is conducted
- state how hypothesis tests about different population parameters are done
- identify the sampling distributions that are used for testing hypotheses about different population parameters
- conduct hypothesis tests about a population mean, proportion, and variance

8.1. Definition of some concepts


Hypothesis
A statistical hypothesis is a statement that is made about an unknown population parameter. If θ denotes a population parameter of interest, such as a mean, variance or proportion, a claim made about this parameter is known as a hypothesis. For example: a job-training program effectively increases average worker productivity. Such a claim can be tested
based on data collected from a random sample. The claim made about the relationship between the training program and worker productivity represents a statistical hypothesis.

Null hypothesis
The null hypothesis is a statistical hypothesis about the parameter that will be maintained unless there is strong contrary evidence. Simply, it is the claim about the population parameter that is tested based on sample information; it represents the statement whose validity a researcher tests using the sample. It is denoted by H₀. Using a random sample of size n, a researcher determines a point estimate $\hat{\theta}$. Since the true value of θ is rarely known, we raise the question: is the estimated value $\hat{\theta}$ compatible with the hypothesized value of θ? The statement about the compatibility of the estimate with the true population value is, in the language of hypothesis testing, known as the null hypothesis. If the sample information is found not to be consistent with H₀, the null hypothesis is rejected and we conclude that it is false. On the other hand, if the sample information is found to be consistent with H₀, it is accepted, even though we do not conclude that it is true.

Alternative hypothesis
The alternative hypothesis is a hypothesis that comes to be accepted at the cost of H₀ if the sample data provide convincing evidence of its truth. In other words, it is a statement that contradicts the null hypothesis. Usually the alternative hypothesis is denoted by H₁. It may be stated in different ways depending on the nature of the problem. If the hypothesis to be tested relates to a parameter θ whose value is predetermined or otherwise specified as θ = θ₀, for example, the null hypothesis is stated as
H₀: parameter = value
H₀: θ = θ₀
The form in which H₁ is stated depends on what the present value of θ is expected to be. It may be stated in either of the following ways.
If our objective is to know whether the value of θ is the same as before or has changed, the alternative hypothesis is stated as
H₁: θ ≠ θ₀


If our interest is to know whether the value of the parameter θ has increased or decreased from the stated value θ₀, the alternative hypothesis against which we test the null hypothesis is stated as
H₁: θ > θ₀ or
H₁: θ < θ₀

Hypothesis testing
Hypothesis testing is the application of a set of rules for deciding whether to accept the null hypothesis or reject it in favor of the alternative hypothesis.
For example, a pharmaceutical company plans to test the efficacy of a medicine against a disease on the belief that 95 percent of all persons suffering from the disease on average get cured. To test this belief, the company draws a random sample of 100 patients who suffered from the disease and were treated with the medicine. Applying statistical rules to make a decision about the effectiveness of the medicine is known as hypothesis testing.

Type I error and type II error

While making decisions about the null hypothesis based on sample information, researchers may make a wrong decision. That is, they may commit an error in decision making, which takes either of the following forms.
(1) Type I error – a case where the null hypothesis is rejected when it is true. For example, rejecting H₀: μ = 95 while it is true.
(2) Type II error – a case where the null hypothesis fails to be rejected when it is in fact false.

Decision on H₀          | Null hypothesis is true              | Null hypothesis is false
Accept (fail to reject) | Correct decision, probability = 1-α  | Type II error, probability = β
Reject                  | Type I error, probability = α        | Correct decision, probability = 1-β
                        | (level of significance)              | (power of the test)

When making a conclusion about H₀ (to reject or fail to reject), researchers will never know for sure whether an error was committed or not. However, the probability of making either
to make the probability of committing type I error should be small for reliability of the
decision made. In most case the probability of committing type – I error is denoted by  and
it is known as level of significance. Symbolically it is expressed as  = P (Reject H0 / H0 is
true) read as the probability of rejecting H0 given that H0 is true.

Classical hypothesis testing requires that we initially specify a significance level for a test.
That is specifying the value of  (quantity of tolerance of type I error). Commonly used
value of  are 0.10, 0.05 and 0.01.
The complement (1 - ) of the probability of type I error measures the probability level of
not rejecting a true null hypothesis. It is also referred as confidence level.

The probability of committing a type II error is denoted by β. It is the probability of accepting a false null hypothesis. The value of β varies with the actual value of the population parameter being tested when H₀ is false. The complement (1-β) of the probability of a type II error measures the probability of rejecting a false null hypothesis. It is also called the power of the statistical test.
The quality of a statistical test is measured by the size of the two error probabilities, α and β. The test should be carried out with small values of α and β. Another way of evaluating a test is to look at the complement of a type II error, that is, rejecting H₀ when H₁ is true. It has probability 1-β = P(reject H₀ when H₁ is true). It measures the ability of the test to perform what is required.
The value of β can be computed using the following steps.
1. Find the critical values of x̄, or of the other sample statistic used, that separate the acceptance and rejection regions.
2. Using one or more values of μ consistent with the alternative hypothesis H₁, calculate the probability that the sample mean x̄ falls in the acceptance region. This produces the value β = P(accept H₀ when μ = μₐ), and then the power of the test is 1-β.


Example: The daily yield for a local chemical plant has averaged 880 tons for the last several years. The quality control manager would like to know whether this average has changed in recent months. She randomly selects 50 days from the computer database and computes the average and standard deviation of the n = 50 yields as x̄ = 871 tons and S = 21 tons. Find β and the power of the test when μ is actually 870, if you test the hypothesis at α = 0.05.
Solution:
The acceptance region for the test is the interval
$$\mu_0 \pm 1.96\frac{\sigma}{\sqrt{n}} = 880 \pm 1.96\frac{21}{\sqrt{50}} = (874.18,\ 885.82)$$
The probability of accepting H₀, given μ = 870, equals the area under the sampling distribution of the test statistic x̄ in the interval from 874.18 to 885.82. Since x̄ is normally distributed with mean 870 and SE = 21/√50 = 2.97, β equals the area under the normal curve with μ = 870 located between 874.18 and 885.82. Calculating the Z-values corresponding to 874.18 and 885.82:
$$Z_1 = \frac{\bar{x} - \mu}{S/\sqrt{n}} = \frac{874.18 - 870}{21/\sqrt{50}} = 1.41$$
$$Z_2 = \frac{\bar{x} - \mu}{S/\sqrt{n}} = \frac{885.82 - 870}{21/\sqrt{50}} = 5.33$$
β = P(accepting H₀ when μ = 870) = P(874.18 < x̄ < 885.82 when μ = 870) = P(1.41 < Z < 5.33).
Since the area under the normal curve above x̄ = 885.82 for μ = 870 (Z = 5.33) is negligible,
β = P(Z > 1.41) = 1 - 0.9207 = 0.0793.
Hence, the power of the test is 1 - β = 1 - 0.0793 = 0.9207.


The probability of correctly rejecting H₀, given that μ is equal to 870, is 0.9207.
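The β and power calculation above follows directly from normal probabilities; the sketch below (illustrative only, using scipy's normal CDF) repeats it in code.

```python
import math
from scipy.stats import norm

mu0, mu_a = 880, 870        # hypothesized and actual means
sigma, n = 21, 50
se = sigma / math.sqrt(n)

# Acceptance region for the two-tailed test at alpha = 0.05
lower, upper = mu0 - 1.96 * se, mu0 + 1.96 * se

# beta = P(accept H0 | mu = mu_a) = P(lower < x_bar < upper | mu = mu_a)
beta = norm.cdf(upper, loc=mu_a, scale=se) - norm.cdf(lower, loc=mu_a, scale=se)
print(f"beta  = {beta:.4f}")        # about 0.079
print(f"power = {1 - beta:.4f}")    # about 0.921
```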

P-value
The P-value, or observed significance level, of a statistical test is the smallest value of α for which H₀ can be rejected. It is the actual risk of committing a type I error if H₀ is rejected based on the observed value of the test statistic. The P-value measures the strength of the evidence against H₀. For a right-tailed test, the P-value is the area to the right of the calculated value of the test statistic.

A small P-value indicates that the observed value of the test statistic lies far away from the hypothesized value of μ. This presents strong evidence that H₀ is false and should be rejected. The P-value for a statistical test is the probability of observing a value of the test statistic that is at least as contradictory to the null hypothesis, and as supportive of the alternative hypothesis, as the one observed. As a cut-off point, if the P-value is less than a pre-assigned significance level α, then the null hypothesis is rejected and you can report that the results are statistically significant at that level. For example, H₀ is rejected at the 5% level of significance if the P-value is less than 0.05.
Steps for calculating the P-value for a test of hypothesis
1. Determine the value of the test statistic corresponding to the result of the sampling experiment.
2. a) If the test is one-tailed, the p-value is equal to the tail area beyond the test statistic value in the same direction as the alternative hypothesis. Thus, if the alternative hypothesis is of the form >, the p-value is the area to the right of (above) the observed test statistic value. Conversely, if the alternative is of the form <, the p-value is the area to the left of (below) the observed test statistic value.
   b) If the test is two-tailed, the p-value is equal to twice the tail area beyond the observed test statistic value in the direction of its sign. That is, if the test statistic value is positive, the p-value is twice the area to the right of (above) the observed value; conversely, if the test statistic value is negative, the p-value is twice the area to the left of (below) the observed value.


Example: A manufacturer of cereal wants to test the performance of one of its filling machines. The machine is designed to discharge a mean amount of μ = 12 ounces per box, and the manufacturer wants to detect any departure from this setting. The quality study calls for randomly sampling 100 boxes from today's production run and determining whether the mean fill for the run is 12 ounces per box. Determine the p-value using α = 0.01.
Solution:
The null and alternative hypotheses of the problem are stated as
H₀: μ = 12
H₁: μ ≠ 12
Test statistic, with s = 0.5, n = 100 and x̄ = 11.85:
$$Z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{11.85 - 12}{0.5/\sqrt{100}} = -3.0$$
$$\text{P-value} = P(Z < -3.0 \text{ or } Z > 3.0) = P(|Z| \ge 3.0)$$
Since the test is a two-tailed test, double P(Z < -3.0), which equals 0.5 - 0.4987 = 0.0013:
$$\text{P-value} = 2P(Z < -3.0) = 2(0.0013) = 0.0026$$
Based on the calculated value of p, we decide whether or not to reject H₀ as follows:
 Choose the maximum value of α that you are willing to tolerate.
 If the observed significance level (p-value) of the test is less than the chosen value of α, reject the null hypothesis. Otherwise, do not reject the null hypothesis.
For the above hypothesis, since the p-value of the test statistic is less than the chosen level of α, we can reject the null hypothesis.
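The two-tailed p-value in this example can be reproduced with the standard normal CDF; the snippet below is only a cross-check (the table-based value in the text, 0.0026, differs slightly because of rounding).

```python
import math
from scipy.stats import norm

x_bar, mu0, s, n = 11.85, 12.0, 0.5, 100
z = (x_bar - mu0) / (s / math.sqrt(n))          # -3.0

p_value = 2 * norm.cdf(-abs(z))                 # two-tailed p-value
print(f"z = {z:.2f}, p-value = {p_value:.4f}")  # about 0.0027

alpha = 0.01
print("reject H0" if p_value < alpha else "do not reject H0")
```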
Exercise: In a test of the hypothesis H₀: μ = 50 against H₁: μ ≠ 50, a sample of 100 observations yields a mean of x̄ = 49.4 and a standard deviation of s = 3.1. Find and interpret the p-value at α = 0.01.


8.2. General procedure for hypothesis testing


To test the validity of a claim or assumption about a population parameter, a sample is drawn from the population and analyzed. The results of the analysis are used to decide whether the claim is true or not. The following general steps can be used to reach a conclusion about the assumed value of the stated population parameter.
Step 1: State the null hypothesis and the alternative hypothesis
This means stating the assumed value of the population parameter that is to be tested. It is stated in such a way that there is no difference between the value of the parameter and the specific value hypothesized. This hypothesis is stated before any research is conducted or any evidence is obtained. For example, suppose that we want to test the hypothesis that the average IQ of our college students is 130. Thus, we can state the null hypothesis as
H₀: μ = 130
The alternative hypothesis H₁ states that the population parameter differs from the value given in the null hypothesis, for example
H₁: μ ≠ 130 (or, for a one-sided test, H₁: μ < 130 or H₁: μ > 130)

Note that the form in which H₁ is stated is important, as it determines the type of test to be used. We conduct a lower-tail test when H₁ is stated as H₁: θ < θ₀, an upper-tail test when stated as H₁: θ > θ₀, and a two-tail test when stated as H₁: θ ≠ θ₀. Statistical testing requires that H₀ is stated as precisely and clearly as possible. In most cases, testing a hypothesis requires the description of H₀ in affirmative terms, since it is easier to make use of the sample data for rejecting H₀. For example, if we intend to test whether a new technique of production is more or less efficient than the old one, H₀ is stated as: the two techniques are equally efficient.

Step 2: Decide the level of significance α

The second step in testing a hypothesis is to decide upon the level of significance α, the risk of a type I error. This decision has to be taken in advance, since it determines the rejection and acceptance regions for H₀. Since the choice of α is arbitrarily made, it is possible to set α as high as 1 or as low as zero. When α is set at 1, the acceptance region 1-α = 0, so that H₀ will always be rejected even when it is true. On the other hand, when α is set at zero, the
Therefore, the value of  should be set optimally. In most case, conventionally set at
=0.05 or 0.01.
The result of testing are said to be significant where H0 is rejected at  = 0.05, and highly
significant where it is rejected at =0.01. This means that the sample result is significantly
different from the hypothesized value, and that the difference between the two is not merely
because of sampling error.
Step 3: Select the test statistic
In this step, select the test statistic upon which we base the decision to reject or accept H₀. While selecting the test statistic, it is necessary to identify the sampling distribution of the sample statistic that is used to estimate the population parameter. Based on the sampling distribution of the statistic, the critical region that defines the regions of acceptance and rejection of the null hypothesis is determined. If the calculated value of the statistic falls within the acceptance region, the null hypothesis is not rejected. However, if it falls within the rejection region, we reject the null hypothesis.

8.3. Hypothesis testing for population parameter


As indicated above, hypothesis testing about a population parameter can be conducted using
the following steps.
1. Formulate H0 and H1 and specify α.
2. Using the sampling distribution of an appropriate test statistic, determine a critical
   region of size α. The sampling distribution of the test statistic may, for instance,
   be a t-distribution, a chi-square distribution or an F-distribution. For the decision, the
   standard statistical tables provide the necessary values of tα, tα/2, χ²α, χ²α/2, Fα or Fα/2
   for the chosen value of α.

3. Determine the value of the test statistic from the sample data:

   test statistic = (value of sample statistic - hypothesized value of the population parameter) / (standard error of the sample statistic)


4. Check whether the value of the test statistic falls into the critical region and, accordingly,
   reject or do not reject the null hypothesis.
Alternatively, the p-value can be used to decide whether to accept or reject the null hypothesis.
The p-value corresponding to the test statistic is the lowest level of significance at which the null
hypothesis could have been rejected. Hence, after determining the value of the test statistic and
the corresponding p-value from a sample, check whether the p-value is less than or equal to α:
reject the null hypothesis if the p-value is less than or equal to α, and do not reject it if the
p-value of the test statistic is greater than α.
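As a rough illustration of these steps (a sketch added for clarity, not part of the original module; Python with the scipy library is assumed to be available, and the function and variable names are illustrative only), the standardized test statistic and the p-value decision rule can be written as follows.

    from scipy import stats

    def z_test(estimate, hypothesized, std_error, alternative="two-sided", alpha=0.05):
        # Generic large-sample test: standardize the estimate, then compare the p-value with alpha.
        z = (estimate - hypothesized) / std_error        # value of the test statistic
        if alternative == "greater":                      # upper-tail test
            p_value = 1 - stats.norm.cdf(z)
        elif alternative == "less":                       # lower-tail test
            p_value = stats.norm.cdf(z)
        else:                                             # two-tailed test
            p_value = 2 * (1 - stats.norm.cdf(abs(z)))
        return z, p_value, p_value <= alpha               # True in the last slot means "reject H0"

The decision rule itself is the one described above: reject H0 whenever the p-value is at most α.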

8.3.1 Test concerning population mean


This type of testing involves deciding whether a reported population mean is reasonable when
compared with a sample mean computed from a random sample. A random sample is taken
from the population and its mean x̄ is computed. An assumption is made about the population
mean µ, and a test is conducted to see whether the difference (x̄ - µ) is significant or not.
Let µ0 be the hypothesized value of the population mean to be tested. The null and alternative
hypotheses for a one-tailed test are stated as indicated below.
For a right-tailed test: H0: µ ≤ µ0 against Ha: µ > µ0
For a left-tailed test: H0: µ ≥ µ0 against H1: µ < µ0
In hypothesis testing, the sampling distribution of the test statistic is very important for
determining the critical region for accepting or rejecting the null hypothesis. Therefore, after
stating the hypothesis to be tested, identify the sampling distribution of the statistic upon which
the critical region for the test is constructed.
When the sample size is large, the sampling distribution of the sample mean x̄, with known
population standard deviation σ, is approximately normal by the central limit theorem. This
implies that the standardized statistic

   Z = (x̄ - µ0) / (σ/√n)

is approximately standard normal.


If the population standard deviation σ is unknown, the sample standard deviation s is used to
estimate σ, and the test statistic becomes

   Z = (x̄ - µ0) / (s/√n)

We use this test statistic Z to construct the acceptance and rejection regions for the test.
After the test statistic has been computed from the sample data, the decision to accept or reject
the null hypothesis is made as indicated below, assuming large random samples (n > 30).
For a left-tailed test: reject H0 if Zcal < -Zα; otherwise accept H0, where
Zcal is the calculated value of the Z statistic and
Zα is the table value of Z for the α level of significance.
For a right-tailed test: reject H0 if Zcal > Zα; otherwise accept H0.
For a two-tailed test: reject H0 if Zcal < -Zα/2 or Zcal > Zα/2; otherwise accept H0.

Graphically, the critical regions (acceptance and rejection regions of H0) are shown below.

[Figure: left-tailed test, rejection region below -Zα; right-tailed test, rejection region above Zα;
two-tailed test, rejection regions below -Zα/2 and above Zα/2, with the acceptance region in between.]

Example: The average weekly earnings for women in managerial and professional positions
are $670. Do men in the same positions have average weekly earnings that are higher than


those for women? A random sample of n = 40 men in managerial and professional
positions showed x̄ = $725 and s = $102. Test the hypothesis using α = 0.01.
Solution: We would like to show that the average weekly earnings for men are higher than
$670, the figure for women. Hence, if µ is the average weekly earnings in managerial and
professional positions for men, the hypotheses to be tested are:
H0: µ = 670
H1: µ > 670

The rejection region for this one-tailed test consists of large values of x̄ or, equivalently,
values of the standardized test statistic Z in the right tail of the standard normal distribution,
with α = 0.01. The critical value is obtained from the standard normal table (Z-table) and
equals Zα = 2.33. The observed value of the test statistic, using s as an estimate of the population
standard deviation, is

   Z = (x̄ - µ0) / (s/√n) = (725 - 670) / (102/√40) = 3.41

Since the observed value of the test statistic falls in the rejection region (Zcal > Zα), H0 should be
rejected, and we conclude that the average weekly earnings for men in managerial
and professional positions are significantly higher than those of women. The probability
that we have made an incorrect decision is α = 0.01.
Alternatively, we can also use the p-value to test the hypothesis that men in the same positions
have average weekly earnings higher than women. In this right-tailed test with observed test
statistic Z = 3.41, the smallest critical value we could use and still reject H0 is Z = 3.41. For this
critical value the risk of an incorrect decision is
P(Z > 3.41) = 1 - 0.9997 (0.9997 is obtained from the Z-table for the calculated value of the test statistic)
            = 0.0003
This probability is the p-value of the test. It is the area to the right of the calculated value of the test
statistic. H0 is rejected if the p-value associated with the test statistic is less than the specified
significance level α. Thus, in this case H0 is rejected, since the p-value of 0.0003 is
less than α = 0.01.


 = 0.01
P-value =0.003

2.33 3.41
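A minimal sketch of this calculation in Python (added for illustration; the scipy library and the variable names are assumptions, not part of the original module) reproduces the numbers of the example:

    from math import sqrt
    from scipy import stats

    n, x_bar, s = 40, 725.0, 102.0          # sample size, sample mean, sample standard deviation
    mu_0, alpha = 670.0, 0.01               # hypothesized mean and level of significance

    z = (x_bar - mu_0) / (s / sqrt(n))      # observed test statistic, about 3.41
    z_crit = stats.norm.ppf(1 - alpha)      # right-tail critical value, about 2.33
    p_value = 1 - stats.norm.cdf(z)         # right-tail p-value, about 0.0003

    print(z > z_crit, p_value < alpha)      # both True, so H0 is rejected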

Example 2: An auto company decided to introduce a new six-cylinder car whose mean
petrol consumption is claimed to be lower than that of the existing auto engine. It was found
that the mean petrol consumption for 50 cars was 10 km per liter with a standard deviation of
3.5 km per liter. Test, for the company, at the 5 percent level of significance the claim that the
new car's petrol consumption is 9.5 km per liter on average.

Solution: Let us take as the null hypothesis H0 that there is no significant difference
between the company's claim and the sample average value, that is
H0: µ = 9.5
H1: µ ≠ 9.5

Given x̄ = 10, n = 50, s = 3.5, α = 5% and Zα/2 = 1.96.

The test statistic of the problem is

   Z = (x̄ - µ0) / (s/√n) = (10 - 9.5) / (3.5/√50) = 1.01

Using α = 0.05, the rejection region consists of values of Z > 1.96 and values of Z < -1.96.
Since the calculated value of Z, 1.01, falls in the acceptance region (Zcal < Zα/2), we do not
reject the null hypothesis µ = 9.5. Hence, we can conclude that the new car's petrol
consumption equals 9.5 km/liter.


The P-value approach
To accept or reject H0, find the p-value for the two-tailed test and compare it with the level of
significance to reach a conclusion about H0. The rejection region for a two-tailed test lies in
both tails of the normal distribution. Since the observed value of the test statistic is Z = 1.01,
the p-value associated with Z = 1.01 for the two-tailed test is found as:
P-value = P(Z > 1.01) + P(Z < -1.01)

        = (1 - 0.8438) + (1 - 0.8438)
        = 0.1562 + 0.1562
P-value = 0.3124
The null hypothesis is rejected only if the p-value is less than the specified level of
significance α = 0.05. Here the p-value of 0.3124 is greater than α = 0.05.
Therefore, H0 is not rejected and the results are not statistically significant. There is not
enough evidence to indicate that the new car's petrol consumption differs from 9.5
km/liter on average.

[Figure: two-tailed rejection regions beyond -Zα/2 = -1.96 and Zα/2 = 1.96; the observed value Z = 1.01 lies in the acceptance region.]
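For comparison with the one-tailed case, the two-tailed p-value of this example can be checked with a short Python sketch (illustrative only; scipy and the variable names are assumptions):

    from math import sqrt
    from scipy import stats

    n, x_bar, s = 50, 10.0, 3.5                      # sample size, sample mean, standard deviation
    mu_0, alpha = 9.5, 0.05                          # hypothesized mean and significance level

    z = (x_bar - mu_0) / (s / sqrt(n))               # observed test statistic, about 1.01
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))       # two-tailed p-value, about 0.31
    z_crit = stats.norm.ppf(1 - alpha / 2)           # two-tailed critical value, 1.96

    print(abs(z) > z_crit, p_value < alpha)          # both False: do not reject H0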

Exercise
1. The mean lifetime of a sample of 400 fluorescent light bulbs produced by a
company is found to be 1570 hours with a standard deviation of 150 hours. Test the
hypothesis that the mean lifetime of the bulbs produced by the company is 1600
hours against the alternative hypothesis that it is greater than 1600 hours at 1 percent
level of significance.
2. The daily yield for local chemical plant has averaged 880 tons for the last several
years. The quality control manager would like to know whether this average has
changed in recent months. She randomly selects 50 days from the computer database
and computes the average and standard deviation of the n = 50 yields as x̄ = 871 tons
and s = 21 tons, respectively. Test the hypothesis at the α = 0.05 level of significance,
using both the critical region based on the Z value and the p-value.

When the sample size is small (i.e. less than 30), the central limit theorem does not allow us to
assume that the sampling distribution of a statistic such as the mean x̄ or the proportion p̂ is normal.


Consequently, when the sample size n is small, the test statistic (x̄ - µ)/(s/√n) does not have a
normal distribution. Therefore, the critical value of Z that is used in the large-sample case
cannot be used for accepting or rejecting the null hypothesis. As we have discussed in chapter one,
if a random sample of size n less than 30 is taken from a normally distributed population, the
random variable

   t = (x̄ - µ) / (s/√n), where s is the sample standard deviation,

has a t-distribution with n - 1 degrees of freedom. Thus, the critical regions for testing the null
hypothesis µ = µ0 against an alternative hypothesis can be constructed using the t-distribution.
1. Null hypothesis H0: µ = µ0
2. Alternative hypothesis
   One-tailed test: H1: µ > µ0 or H1: µ < µ0
   Two-tailed test: H1: µ ≠ µ0
3. Define the test statistic

   t = (x̄ - µ0) / (s/√n)

4. Rejection region
   One-tailed test: reject H0 if t > tα (for H1: µ > µ0) or t < -tα (for H1: µ < µ0), or when the p-value < α.
   Two-tailed test: reject H0 if t > tα/2 or t < -tα/2.

Note that the critical values tα and tα/2 are found from the standard Student t-table for n - 1 degrees
of freedom.

Example 1: A new process for producing synthetic diamonds can be operated at a


profitable level only if the average weight of diamonds is greater than 0.5 karat. To evaluate
the profitability of the process, six diamonds are generated, with recorded weights 0.46,
0.61, 0.52, 0.48, 0.57, and 0.54 karat. Do the six measurements present sufficient evidence


to indicate that the average weight of the diamonds produced by the process is in excess of
0.5 karat?

Solution: The null hypothesis to be tested is

H0: µ = 0.5 against the alternative hypothesis
H1: µ > 0.5
The mean and standard deviation of the six diamond weights are 0.53 and 0.0559
respectively. Thus, the test statistic can be calculated as:

   t = (x̄ - µ0) / (s/√n) = (0.53 - 0.5) / (0.0559/√6) = 1.32

If the level of significance is α = 0.05, the right-tailed rejection region is found using the
critical value of t obtained from the t-table. With d.f. = n - 1 = 5, t0.05 = 2.015, so reject H0 if the
calculated value of t is greater than the table value of t (tcal > t0.05). Since the calculated value of
the test statistic, 1.32, is less than 2.015, it does not fall into the rejection region. This implies
that we do not reject H0.

P-value method
Unlike the Z-table, the table for t gives the values of t corresponding to upper-tail areas equal
to 0.100, 0.050, 0.025, and 0.005. Consequently, you can only approximate the upper-tail
area that corresponds to the probability that t > 1.32. Since the statistic for this test is based
on 5 d.f., we refer to the row corresponding to d.f. = 5. The value t = 1.32 falls below t0.10.
Therefore, the right-tail area corresponding to the probability that t > 1.32 is greater than 0.1,
but to reject the null hypothesis H0 the p-value must be less than the specified significance
level α. This implies we cannot reject H0, since the p-value is greater than the value of
α.

Example 2: Suppose that it is known from experience that the standard deviation of the
weight of 8-ounce packages of cookies made by a certain bakery is 0.16 ounce. To check
whether its production is under control on a given day, that is to check whether the true


average weight of the packages is 8 ounces, employees select a random sample of 25
packages and find that their mean weight is x̄ = 8.091 ounces. Test the hypothesis µ = 8
against the alternative hypothesis µ ≠ 8 at the 0.01 level of significance.
Solution:
1. H0: µ = 8, H1: µ ≠ 8, α = 0.01
2. Construct the t-statistic:

   t = (x̄ - µ0) / (s/√n) = (8.091 - 8) / (0.16/√25) = 2.84

   d.f. = n - 1 = 24, tα/2 = t0.005, and t0.005,24 = 2.797

Since the calculated t-value (2.84) is greater than tα/2 = 2.797, reject H0.

Exercise
1. A claim is made that Adama University students have an IQ of 120. To test this
claim, a random sample of 10 students was taken and their IQ scores are recorded as
follows: 105, 110, 120, 125, 100, 130, 120, 115, 125, 130. At 0.05 level of
significance, test the validity of this claim.
2. Test at the 0.05 level of significance whether the mean of a random sample of size
   n = 16 is significantly less than 10. Assume that the distribution from which the
   sample was taken is normal with x̄ = 8.4 and σ = 3.2.

8.3.2 Hypothesis testing for the difference between two population means
In much applied research, we are interested in testing hypotheses concerning the difference
between the means of two populations. For example, we might want to compare the output
of two different production processes for which we do not know either population mean;
similarly, we might want to know whether one marketing strategy results in higher sales than
another without having the population mean sales for either. Such questions can be handled
effectively by hypothesis testing.


Let us suppose that we are dealing with independent random samples of sizes n1 and n2 from
two normal populations having means µ1 and µ2 and variances σ1² and σ2². We can test the null
hypothesis µ1 - µ2 = D against one of the alternatives µ1 - µ2 ≠ D, µ1 - µ2 > D or µ1 - µ2 < D.

Let the sample means for the two populations be x̄1 and x̄2; then the estimator of µ1 - µ2 is
x̄1 - x̄2.

To test the null hypothesis µ1 - µ2 = D, we need to know the distribution of x̄1 - x̄2.
According to the central limit theorem, if the sample sizes are large, x̄1 and x̄2 are approximately
normally distributed with means µ1 and µ2 and standard deviations σ1/√n1 and σ2/√n2,
respectively. Thus, the standardized difference between the two sample means,

   Z = (x̄1 - x̄2 - D) / √(σ1²/n1 + σ2²/n2),

approximately follows the standard normal distribution.

The normal test procedure for the difference of two means with large sample sizes is carried out as
follows.
1. Null hypothesis H0: (µ1 - µ2) = D, where D is some specified difference that you wish
   to test. For many tests we hypothesize that there is no difference between µ1
   and µ2, that is, D = 0.
2. Alternative hypothesis:
   One-tailed test: H1: µ1 - µ2 > D or H1: µ1 - µ2 < D
   Two-tailed test: H1: µ1 - µ2 ≠ D
3. Test statistic:

   Z = ((x̄1 - x̄2) - D0) / SE = ((x̄1 - x̄2) - D0) / √(S1²/n1 + S2²/n2)

4. Rejection region:
   One-tailed test: reject H0 when Z > Zα (for H1: µ1 - µ2 > D) or Z < -Zα (for H1: µ1 - µ2 < D), or when the p-value < α.
   Two-tailed test: reject H0 when Z > Zα/2 or Z < -Zα/2.


When the sample sizes are small, we can no longer rely on the central limit theorem to ensure
that the sample means will be normal. If the original populations are normal, however, then
the sampling distribution of the difference in the sample means x̄1 - x̄2 will be normal
even for small sample sizes. If both populations have the same variance σ², the random variable

   t = (x̄1 - x̄2 - (µ1 - µ2)) / √(σ²(1/n1 + 1/n2))

has a Student t-distribution.
Thus, in the case of small samples, the critical region for accepting or rejecting the null hypothesis
about the difference between the means of two normally distributed populations is constructed using
the t-statistic. To compute the t-statistic, if the common population variance is not
known, it is estimated by the pooled sample variance S², where

   S² = ((n1 - 1)S1² + (n2 - 1)S2²) / (n1 + n2 - 2),

and the t-statistic becomes

   t = ((x̄1 - x̄2) - (µ1 - µ2)) / √(S²(1/n1 + 1/n2))

So reject H0 when t > tα for H1: µ1 - µ2 > D, when t < -tα for the alternative hypothesis
H1: µ1 - µ2 < D, or when the p-value is less than α. For a two-tailed test, reject H0 when t > tα/2 or
t < -tα/2. Note that the tabulated values of t (t-critical), tα and tα/2, are based on n1 + n2 - 2
degrees of freedom.

Example 1: An experiment is performed to determine whether the average nicotine content
of one kind of cigarette exceeds that of another kind by 0.20 milligram. If n1 = 50 cigarettes
of the first kind had an average nicotine content of x̄1 = 2.61 milligrams with a standard
deviation of S1 = 0.12 milligram, while n2 = 40 cigarettes of the other kind had an average
nicotine content of x̄2 = 2.38 milligrams with a standard deviation of S2 = 0.14 milligram, test


the hypothesis (µ1 - µ2) = 0.20 against the alternative hypothesis µ1 - µ2 ≠ 0.20 at α = 5%.
Use the p-value method also to make the decision to reject or accept the null hypothesis.

Solution: H0: µ1 - µ2 = 0.20

H1: µ1 - µ2 ≠ 0.20
i. Critical value approach

Using a two-tailed test with significance level α = 0.05, α/2 = 0.025, reject H0 if Z > 1.96
or Z < -1.96. To compare the table value of Z (1.96) with its calculated value, find the
calculated value of the Z-statistic:

   Z = (x̄1 - x̄2 - D) / √(S1²/n1 + S2²/n2)

with x̄1 = 2.61, n1 = 50, x̄2 = 2.38, n2 = 40, S1 = 0.12 and S2 = 0.14:

   Z = (2.61 - 2.38 - 0.20) / √((0.12)²/50 + (0.14)²/40) = 1.08

Since the calculated value of Z (1.08) is less than the table value Z0.025 = 1.96, the null
hypothesis cannot be rejected at the 5 percent level of significance.

ii. P-value approach: calculate the p-value, the probability that Z is greater than 1.08 plus
the probability that Z is less than -1.08:
P-value = P(Z > 1.08) + P(Z < -1.08)
or P-value = 2P(Z > 1.08) = 2(0.5000 - 0.3599), where 0.3599 is the entry of the
Z-table for Z = 1.08.
P-value = 0.2802. Since 0.2802 exceeds the value of α, 0.05, the null
hypothesis cannot be rejected. We can also say that the difference between the
two means is not statistically significant.
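A short Python sketch (illustrative only; scipy and the variable names are assumptions) of this large-sample two-mean test:

    from math import sqrt
    from scipy import stats

    x1_bar, s1, n1 = 2.61, 0.12, 50          # first kind of cigarette
    x2_bar, s2, n2 = 2.38, 0.14, 40          # second kind of cigarette
    d0, alpha = 0.20, 0.05                   # hypothesized difference and significance level

    se = sqrt(s1**2 / n1 + s2**2 / n2)       # estimated standard error of the difference
    z = (x1_bar - x2_bar - d0) / se          # about 1.08
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))   # two-tailed p-value, about 0.28

    print(abs(z) > stats.norm.ppf(1 - alpha / 2), p_value < alpha)   # both False: do not reject H0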


Example 2: A semester course in management accounting is being taught to a group of 10


students by the traditional lecture method. Another group of 15 students is being taught the
same course through the case method of teaching. At the end of the semester course, the
two groups were examined in the same question paper. It was found that, on average, the
group consisting of 10 students obtained 64 marks with a standard deviation of 6 marks and
the other consisting of 15 students obtained 67 marks with standard deviation of 3 marks.
Assuming the two populations to be approximately normal with the same variance, test the
hypothesis at 0.01 level of significance that the two methods of teaching are equally
effective.

Solution: Let µ1 be the average mark of students taught by the traditional lecture method, and
µ2 be the average mark of those taught by the case method. Thus, we can test the
hypothesis as follows:
H0: µ1 - µ2 = 0,  α = 0.01
H1: µ1 - µ2 ≠ 0
Critical region: t > tα/2 or t < -tα/2, with tα/2 based on n1 + n2 - 2 degrees of freedom, that is
t0.005,23 = 2.807.

Test statistic:

   S² = ((n1 - 1)S1² + (n2 - 1)S2²) / (n1 + n2 - 2); here, since the reported standard deviations
   are treated as computed with divisor n, the pooled variance is taken as
   S² = (n1S1² + n2S2²) / (n1 + n2 - 2) = (4.64)²

   t = (x̄1 - x̄2) / √(S²(1/n1 + 1/n2)) = (64 - 67) / (4.64 √(1/10 + 1/15)) = -1.59

Since t = -1.59 is greater than -tα/2 = -2.807, we cannot reject H0. It means that the
traditional lecture method and the case method are equally effective methods of teaching.
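A pooled two-sample t-test for this example can be sketched in Python as follows (illustration only; scipy and the variable names are assumptions; the n-weighted pooled variance is used to match the calculation above):

    from math import sqrt
    from scipy import stats

    n1, x1_bar, s1 = 10, 64.0, 6.0           # lecture method: size, mean mark, standard deviation
    n2, x2_bar, s2 = 15, 67.0, 3.0           # case method
    alpha = 0.01

    # Pooled variance; the n-weighted form matches the worked example, which treats the
    # reported standard deviations as computed with divisor n.
    s2_pooled = (n1 * s1**2 + n2 * s2**2) / (n1 + n2 - 2)
    t_stat = (x1_bar - x2_bar) / sqrt(s2_pooled * (1 / n1 + 1 / n2))   # about -1.59

    t_crit = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)                # t(0.005, 23) = 2.807
    print(abs(t_stat) > t_crit)              # False: do not reject H0, the methods are equally effective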

Exercise


1. Two random samples of size n1 = 9 and n2 = 11 drawn independently from two
   normal populations resulted in x̄1 = 22, x̄2 = 17, S1 = 5 and S2 = 6. Test the
   hypothesis at the α = 0.01 level of significance that µ1 ≠ µ2, assuming that σ1 = σ2.
2. A random sample of size n1 = 25 drawn from a normal population with standard
   deviation σ1 = 4.5 has a mean x̄1 = 75. A second sample of size n2 = 36 drawn from
   a second normal population with standard deviation σ2 = 4 has a mean x̄2 = 80. Test
   the hypothesis µ1 = µ2 at the 0.05 level against the alternative hypothesis µ1 ≠ µ2.


3. A potential buyer of electric bulbs bought 100 bulbs each of two famous brands A
   and B. Upon testing both samples, he found that brand A had a mean life of
   1500 hours with a standard deviation of 50 hours, whereas brand B had an average life
   of 1530 hours with a standard deviation of 60 hours. Can it be concluded at the 5% level
   of significance (α = 0.05) that the two brands differ significantly in quality?

8.3.3 Test concerning population proportion


We often deal with populations that can be divided into two categories depending on
whether or not they possess a certain attribute. The proportion of the
population which possesses the stated attribute is known as the proportion of successes,
P, whereas the proportion which does not possess the attribute is known as the proportion of
failures, q = 1 - P. Tests of hypotheses about P, the proportion of a population possessing a
certain attribute, follow the same general form as the hypothesis tests for other
population parameters.

To test a hypothesis of the form

H0: P = P0 against one of the alternatives
H1: P > P0, H1: P < P0 or H1: P ≠ P0,

the test statistic is constructed using the sample proportion p̂, the best estimator of the true
population proportion P. The sample proportion p̂ is standardized using the hypothesized
mean and standard error to form a test statistic Z, which has a standard normal distribution
if H0 is true.


Test statistic:

   Z = (p̂ - P0) / SE = (p̂ - P0) / √(P0 q0 / n), with p̂ = x/n,

where x is the number of successes in n binomial trials. Note that n should be large enough so
that the distribution of p̂ can be approximated by the normal distribution. Thus, the critical
region for the test can be stated as:
Reject H0 when Z > Zα (or Z < -Zα when H1: P < P0), or when the p-value is less than α, for a
one-tailed test.
Reject H0 when Z > Zα/2 or Z < -Zα/2 for a two-tailed test.

Example
A student leader seeking election to the office of president of the university student
union claims that 55 percent of the votes will be polled in his favor and that he will win the
election. A sample survey of 100 students before the election revealed that only 45
students expressed the desire to vote in his favor. Verify the claim at the 0.01 level of
significance.
Solution: H0: P = 0.55

H1: P < 0.55,  α = 0.01,  p̂ = 0.45

Test statistic: Z = (p̂ - P0) / √(P0 q0 / n) = (0.45 - 0.55) / √((0.55)(0.45)/100) = -2.00

Critical region: reject H0 if Z < -Zα
-Zα = -2.33 (from the Z-table)
Z = -2.00
Decision: we cannot reject H0 since Z > -Zα.
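The proportion test of this example can be sketched in Python as follows (illustration only; scipy and the variable names are assumptions):

    from math import sqrt
    from scipy import stats

    n, x = 100, 45                           # sample size and number of favourable responses
    p0, alpha = 0.55, 0.01                   # claimed proportion and significance level

    p_hat = x / n                            # 0.45
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)    # about -2.01
    z_crit = stats.norm.ppf(alpha)           # left-tail critical value, about -2.33
    p_value = stats.norm.cdf(z)              # left-tail p-value, about 0.02

    print(z < z_crit, p_value < alpha)       # both False: do not reject H0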
Exercise
1. Suppose that 10% of the fields in a given agricultural area are infested with the sweet
   potato whitefly. One hundred fields in this area are randomly selected, and 25 are found
   to be infested with whitefly. Assuming that the experiment satisfies the conditions of a
   binomial experiment,


   i. Calculate the test statistic and its p-value.
   ii. Test whether the data indicate that the proportion of infested fields is greater
       than expected, at the 5% level of significance.

8.3.4 Hypothesis testing for the difference between two proportions


Hypothesis testing for the difference between two proportions is made when we want to
know whether the difference between two population proportions is significant or not. In such a
case, we use the difference in the sample proportions p̂1 - p̂2 along with its standard error

   SE = √(p1q1/n1 + p2q2/n2)

to find the test statistic used to construct the critical region for the test.
For large samples of sizes n1 and n2, both greater than 30, the test statistic has a standard
normal distribution. Thus, we follow the formal procedure to conduct the test as follows.
1. Null hypothesis H0: P1 - P2 = 0, or equivalently H0: P1 = P2
   Alternative hypothesis:
   One-tailed test: H1: (P1 - P2) > 0 or H1: (P1 - P2) < 0
   Two-tailed test: H1: (P1 - P2) ≠ 0
2. Test statistic:

   Z = ((p̂1 - p̂2) - 0) / SE = (p̂1 - p̂2) / √(pq/n1 + pq/n2)

   where p̂1 = x1/n1 and p̂2 = x2/n2. Since the common value of P1 = P2 = P is unknown, it is estimated
   by p̂ = (x1 + x2)/(n1 + n2), and the test statistic becomes

   Z = (p̂1 - p̂2) / √(p̂q̂ (1/n1 + 1/n2))

Rejection region: reject H0 when


One-tailed test: Z > Zα, or Z < -Zα when the alternative hypothesis is H1: (P1 - P2) < 0, or when
the p-value < α.

Two-tailed test: Z > Zα/2 or Z < -Zα/2.

Assumption: the samples are selected in a random and independent manner from two binomial
populations, and n1 and n2 are large enough so that the sampling distribution of
(p̂1 - p̂2) can be approximated by the normal distribution.

Example: The records of a hospital show that 52 men in a sample of 1000 men versus 23
women in a sample of 1000 women were admitted because of heart disease. Do these data
present sufficient evidence to indicate a higher rate of heart disease among men admitted to
the hospital? Use α = 0.05 to test the claim, assuming that the number of patients admitted for
heart disease has approximately a binomial distribution for both men and women, with
parameters p1 and p2 respectively.

Solution:
Null hypothesis H0: P1 - P2 = 0
Alternative hypothesis H1: (P1 - P2) > 0
To determine the test statistic, let us find the pooled standard error using the pooled estimate
of P:

   p̂ = (x1 + x2) / (n1 + n2) = (52 + 23) / (1000 + 1000) = 0.0375

   Z = (p̂1 - p̂2) / √(p̂q̂ (1/n1 + 1/n2)) = (0.052 - 0.023) / √((0.0375)(0.9625)(1/1000 + 1/1000)) = 3.41


Z0.05 = 1.645

Reject H0: P1 - P2 = 0, since Z = 3.41 > Z0.05 = 1.645, and conclude that the data present sufficient
evidence to indicate that the percentage of men entering the hospital because of heart
disease is higher than that of women.
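The pooled two-proportion test of this example, sketched in Python (illustrative only; scipy and the variable names are assumptions):

    from math import sqrt
    from scipy import stats

    x1, n1 = 52, 1000                        # men admitted with heart disease
    x2, n2 = 23, 1000                        # women admitted with heart disease
    alpha = 0.05

    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)           # pooled estimate of the common proportion, 0.0375
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se               # about 3.41
    p_value = 1 - stats.norm.cdf(z)          # right-tail p-value, about 0.0003

    print(z > stats.norm.ppf(1 - alpha), p_value < alpha)   # both True: reject H0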

Exercise
1. Independent random samples of 280 and 350 observations were selected from
   binomial populations 1 and 2 respectively. Sample 1 had 132 successes and sample
   2 had 178 successes. Do the data present sufficient evidence to indicate that the
   proportion of successes in population 1 is smaller than the proportion in population 2?
   Test the claim at the 5% level of significance.
2. An experiment was conducted to test the effect of a new drug on a viral infection.
   The infection was induced in 100 mice and the mice were randomly split into two
   groups of 50. The first group, the control group, received no treatment for the infection.
   The second group received the drug. After a 30-day period, the proportions of
   survivors, p̂1 and p̂2, in the two groups were found to be 0.36 and 0.60 respectively.
   Is there sufficient evidence to indicate that the drug is effective in treating the viral
   infection? Use α = 0.05.

8.3.5 Testing hypothesis about population variance


There are several reasons why it is important to test hypotheses concerning the variance of
a population. It helps us to identify whether the distribution of a given population is uniform
or not, and it is used to compare the variability of one population with that of another. Let us
consider each of these in turn.

Case 1: Testing the uniformity of a given population. Similar to hypothesis tests for other
population parameters, a test about the population variance σ² is based on the sample variance
computed from random sample observations drawn from a normally distributed population. You
might recall from an earlier chapter that the sampling distribution of the ratio of the sample
variance to the population variance is a random variable having a chi-square distribution.


Therefore, when we test a claim about the population variance stated in the null hypothesis,
the χ² distribution is used to compute the test statistic and to determine whether the null
hypothesis is accepted or rejected.

Here the null hypothesis claims that the population variance σ² is equal to a specified
value σ0²:

   H0: σ² = σ0², tested against the alternative hypothesis
   H1: σ² ≠ σ0² (two-tailed test), or
   H1: σ² > σ0² or H1: σ² < σ0² (one-tailed test).

To reject or not reject the null hypothesis, a test statistic is computed from the observed
values as follows:

   χ² = (n - 1)s² / σ0²

where σ0² is the value stated in the null hypothesis, s² is the sample variance and n is the sample size.
Reject H0 for a one-tailed test when χ² > χ²α (for the alternative hypothesis H1: σ² > σ0²) or
χ² < χ²(1-α) (for H1: σ² < σ0²).
For a two-tailed test, reject H0 when χ² > χ²α/2 or χ² < χ²(1-α/2), or when the p-value is less than α.

Note that χ²α and χ²(1-α) are the upper- and lower-tail values of χ² for n - 1 degrees of freedom,
respectively, obtained from the chi-square probability table.

Example
A company that has been manufacturing radio tubes for the last 10 years finds that the life of its
tubes has a variance of 0.6 years. As a result of a qualitative improvement brought to the
product, the company claims that the variance of the life of its tubes has decreased. Using
the 0.05 level of significance, test the claim made by the company if the sample variance s²,
based on observations of 9 tubes, is found to be 0.45 years.


Solution
Null hypothesis, H0: σ² = 0.6
Alternative hypothesis, H1: σ² < 0.6,  α = 0.05
Test statistic:

   χ² = (n - 1)s² / σ0², with n = 9 and s² = 0.45

   χ² = 8(0.45) / 0.6 = 6.0

From the chi-square table, the lower-tail critical value is χ²(1-α) = χ²0.95 = 2.73 for 8 degrees of freedom.
This implies χ² > χ²0.95; as a result, we cannot reject H0. This means that the sample
information does not support the claim of the company.
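A Python sketch of this left-tailed chi-square test (illustration only; scipy and the variable names are assumptions):

    from scipy import stats

    n, s2 = 9, 0.45                          # sample size and sample variance
    sigma2_0, alpha = 0.6, 0.05              # hypothesized variance and significance level

    chi2_stat = (n - 1) * s2 / sigma2_0      # 6.0
    chi2_crit = stats.chi2.ppf(alpha, df=n - 1)    # lower-tail critical value, about 2.73
    p_value = stats.chi2.cdf(chi2_stat, df=n - 1)  # left-tail p-value, about 0.35

    print(chi2_stat < chi2_crit, p_value < alpha)  # both False: do not reject H0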


Exercise
1. A random sample of n = 25 observations from a normal population produced a sample
   variance equal to 21.4. Do these data provide sufficient evidence to indicate that
   σ² > 15? Test using α = 0.05.
2. According to an old estimate, the variability in the weights of the students of a certain
   university was reported as 10.25 kg. The variance based on a recent sample survey of
   10 students selected at random was found to be 15 kg. Assuming that the weights
   follow a normal distribution, test whether the recent estimate is significantly higher
   than the old estimate; use the 0.01 level of significance.
Case 2: Testing the equality of two population variances. One way to compare two population
variances, σ1² and σ2², is to use the ratio of the sample variances S1²/S2². If S1²/S2² is nearly equal to
1, you will find little evidence to indicate that σ1² and σ2² are unequal. On the other hand, a
very large or very small value of S1²/S2² provides evidence of a difference in the population
variances. Moreover, it is possible to compare the variability of two populations by testing the
equality of their variances.
That is, test:


   H0: σ1² = σ2² against
   H1: σ1² ≠ σ2², or
   H1: σ1² > σ2², or
   H1: σ1² < σ2².

The decision to accept or reject H0 is made using the F-distribution, since upon repeated random
sampling the ratio S1²/S2² has an F-distribution. When we write the ratio of the two sample variances, it
is customary to place the numerically larger sample variance in the numerator.

The computed value of F is compared with the value of the F-distribution with
V1 = n1 - 1 and V2 = n2 - 1 degrees of freedom from the standard F-distribution probability table.
Thus, at the α level of significance, the critical regions of H0 against the three alternative ways of
stating H1 are the following.
Reject H0 when:

   F > Fα(V1, V2)                              if H1: σ1² > σ2²
   F < F(1-α)(V1, V2)                          if H1: σ1² < σ2²
   F > Fα/2(V1, V2) or F < F(1-α/2)(V1, V2)    if H1: σ1² ≠ σ2²

where Fα/2(V1, V2) and F(1-α/2)(V1, V2) are the table values of F leaving areas of α/2 and
1 - α/2, respectively, to the right.
Example: In comparing the variability of the tensile strength of two kinds of structural steel,
an experiment yielded the following results: n1 = 13, S1² = 19.2, n2 = 16 and S2² = 3.5,
where the units of measurement are 1,000 pounds per square inch. Assuming that the
measurements constitute independent random samples from two normal populations, test
the null hypothesis σ1² = σ2² against the alternative σ1² ≠ σ2²
at the 0.02 level of significance.

Solution


H0: σ1² = σ2²,  α = 0.02
H1: σ1² ≠ σ2²

Test statistic:

   F = S1²/S2² = 19.2/3.5 = 5.49

Table value of F: Fα/2(V1, V2) = F0.01(12, 15) = 3.67

Since F > Fα/2(V1, V2), reject the null hypothesis. We can conclude that the variability of the
tensile strength of the two kinds of steel is not the same.
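Finally, the variance-ratio test of this example can be sketched in Python (illustration only; scipy and the variable names are assumptions):

    from scipy import stats

    n1, s1_sq = 13, 19.2                     # first kind of steel: sample size and variance
    n2, s2_sq = 16, 3.5                      # second kind of steel
    alpha = 0.02

    f_stat = s1_sq / s2_sq                   # larger variance in the numerator, about 5.49
    f_crit = stats.f.ppf(1 - alpha / 2, dfn=n1 - 1, dfd=n2 - 1)      # F(0.01; 12, 15), about 3.67
    p_value = 2 * (1 - stats.f.cdf(f_stat, dfn=n1 - 1, dfd=n2 - 1))  # two-tailed p-value, well below 0.02

    print(f_stat > f_crit, p_value < alpha)  # both True: reject H0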


Exercise
1. Independent random samples from two normal distributions produced the variances
   listed here:

   Sample size    Sample variance
   16             55.7
   20             31.4

   Do the data provide sufficient evidence to indicate that σ1² differs from σ2²? Test using a
   suitable level of significance.


Answers to Selected Exercises


Chapter II
1. a) 15 b) 6 c) 24
2. a) 0.88 b) 0.12
3. a) 0.04 b) 0.333
4. 38/63
5. 210
6. 35
7. a) 0.75 b) 0.20 c) 0.60
8. a) 0.029 b) 0.286 c) 0.686
9. C
10. 5/40

Chapter III
1. a) Yes, because the integral from -1 to 2 of (x²/3) dx = 1   b) 1/9
   c) F(x) = 0 for x ≤ -1;  F(x) = (x³ + 1)/9 for -1 < x < 2;  F(x) = 1 for x ≥ 2
2. a) 1/4  b) 1/2  c) 1/2
3. a) 1/2  b) 1/3  c) 1/6
4. yes
5. a) 5.66
6. a) 1/3
8. a) 7/16  b) 5/16-5/9  c) f(x) = 2 - 2x for 0 ≤ x ≤ 1, and 0 otherwise
9. f(x) is unbounded, yes
10. a) 5  b) 25

Chapter IV
1. a) 27 b) 80.3 c) 56.8
3. 0.59049


4. a) 0.75
5. 0.6288
6. 0.3011
7. a) 0.061 b) 0.471
8. a) 0.193 b) 0.463
9. a) 0.0228 b) 50% c) 5.41%
Chapter V

1. a) f1(x) = 4x³   b) f2(y) = 4y(1 - y²)   c) f1(x|y) = 2x/(1 - y²) for y ≤ x ≤ 1, and 0 for other x
2. a)   b) -1.2   c) -0.4   d) X and Y are dependent
3. a) 1/210
   b) F1(x) = 0 for x ≤ 2;  F1(x) = (2x² + 5x - 18)/84 for 2 < x < 6;  F1(x) = 1 for x ≥ 6
      F2(y) = 0 for y ≤ 0;  F2(y) = (y² + 16y)/105 for 0 < y < 5;  F2(y) = 1 for y ≥ 5
   c) f1(x) = (4x + 5)/84 for 2 ≤ x ≤ 6, 0 otherwise;  f2(y) = (2y + 16)/105 for 0 ≤ y ≤ 5, 0 otherwise
   d) 3/20   e) 23/28   f) 2/35   g) F(x, y) = (16y + y²)/105   h) they are dependent
4. a) 1/18   b) f1(x) = (4x + 3)/18 for x = 1, 2;  f2(y) = (2y + 6)/18 for y = 1, 2
5. a) (2x + y)/(4x + 3)   b) (2x + y)/(2y + 6)
6. a) f(y|x) = (3 + 4xy)/(3 + 2x) for 0 ≤ y ≤ 1, and 0 for other y   b) 9/16



