Introduction To Statistics - Doc1


INTRODUCTION TO STATISTICS STAT 281

INTRODUCTION TO THE COURSE

The module has ten chapters: the first two chapters have been designed to deal with general Introductions,
basically to define some basic terms, and Methods of Data Representations.

The next two chapters are about Descriptive Statistics, dealing with Measures of Central Tendency
(collectively known as averages), and Measures of Variation or Dispersion.

Chapters 5 and 6, Probability and Probability Distributions, deal with Elementary Probability
Theory and two common Discrete Probability Distributions, the Binomial and the Poisson, and with some Continuous
Probability Densities (the Normal, Chi-Square, t and F distributions), which play indispensable roles in statistical
theory and inference.

Chapters 7, 8 and 9 discuss Sampling Distributions, Estimation and Hypothesis Testing, and
Two-Sample Inference, covering inferences on one mean, two means, one proportion and two proportions. The last
chapter deals with Simple Linear Regression and Correlation.

Sufficient examples as well as activities are provided whenever necessary.

Objectives

By the end of the course, the student should be able to:

 Explain the basic concepts of statistics.


 Identify the different types of probability distributions.
 Know and apply the collection and organization of data.
 Identify the different types of sampling techniques.
 Analyze and conclude based on the data collected from a sample.

CHAPTER 1

INTRODUCTION
CONTENTS

1.1 DEFINITIONS AND CLASSIFICATION OF STATISTICS
1.2 STAGES IN STATISTICAL INVESTIGATION
1.3 DEFINITION OF SOME TERMS
1.4 APPLICATIONS, USES AND LIMITATIONS OF STATISTICS
1.5 SCALES OF MEASUREMENT
1.6 INTRODUCTION TO METHODS OF DATA COLLECTION

INTRODUCTION
What is Statistics? Why is there a need to study it? How is it employed? These are only some of the
basic questions one has to raise about the field of statistics. This chapter provides only partial answers to
these questions.

The chapter has six sub-sections. They define some important terms, starting with the word “Statistics” itself
(treated as both singular and plural), and cover its classifications, applications, uses and limitations; the stages in any
statistical study; scales of measurement; and an overview of methods of data collection.

Objectives:

After completing this chapter, students are expected to be able to:

 Explain the meaning and uses of statistics


 Differentiate between descriptive and inferential statistics.
 Differentiate between types of variables.
 Describe the four levels of measurement.

1.1 DEFINITION AND CLASSIFICATION OF STATISTICS


Definitions of Statistics

Statistics has been defined in two ways. Some writers define it as ‘statistical data’, i.e. numerical
statements of facts, while others define it as ‘statistical methods’, that is, the complete body of
principles and techniques used in collecting and analyzing such data.

 Statistics as a numerical data (plural meaning)

Prepared by Big Bang, August, 2017 GC


Statistics are measurements, enumerations or estimates, analyzed and presented so as to
exhibit important interrelationships among them (A. M. Tuttle).
 Statistics as a statistical method (singular meaning)
Statistics may be defined as ‘the methods and techniques of collecting, organizing,
presenting, analyzing and interpreting numerical data’.

Classifications of Statistics
 Descriptive statistics: refers to the procedures used to collect, organize and summarize
masses of data. The frequency distribution, measures of central tendency such as the mean
and median, and measures of dispersion such as the range and standard deviation belong to this
category of statistics.
 Inferential statistics: includes the methods used to find out something about a
population based on a sample. In this form of statistical analysis, descriptive statistics is
linked with probability theory so that an investigator can generalize the results of a study.

1.2 Stages in statistical investigation


There are five stages or steps in any statistical investigation:
1. Collection of data
This is the first step and the foundation of the entire investigation. It is the process of
measuring, gathering and assembling the raw data upon which the statistical investigation is to be
based. Careful planning is essential before collecting the data. There are different methods of
collecting data, such as census survey and sample survey, and the investigator should make
use of the correct method.

2. Organization of data
This is summarization of the data in some meaningful way, like in the form of a table.

3. Presentation of the data

This is the process of re-organization, classification, compilation and summarization of data to
present it in a meaningful form. The collected data may be presented in tabular,
diagrammatic or graphic form.

4. Analysis of data

This is the process of extracting relevant information from the summarized data, mainly through
the use of elementary mathematical operations.

5. Interpretation of data

The final step is drawing conclusions from the data collected. A valid conclusion must be
drawn on the basis of the analysis. A high degree of skill and experience is necessary for the
interpretation.

1.3 Definition of some terms in Statistics


Data are the raw materials of statistical investigations. They arise whenever measurements are
made or observations are recorded.

There are two groups of data:

i) Primary data: data collected from the units or individual respondents for the
purpose of a certain study.
ii) Secondary data: data that have already been collected and statistically treated by some agency,
and whose information is used again for another purpose.
Some more basic terms include:
 Population: the total collection of elements to be studied, which have one or more
specific characteristics in common.
Example: the population consisting of DDU summer students for some study.
 Sample: any subset of the population selected in order to draw conclusions about
the entire population.
 Parameter: a numerical measurement that describes some characteristic of a
population.
 Sample statistic: a numerical measurement that describes some characteristic of
a sample.
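
As a rough illustration of the difference between a parameter and a sample statistic, the Python sketch below compares the mean of a small population with the mean of one random sample drawn from it. The ages, the sample size of 5 and the random seed are all made-up assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: ages of 12 students (made-up values).
population = [19, 20, 20, 21, 21, 22, 22, 23, 24, 25, 26, 27]

# Parameter: a numerical measure computed from the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of the population, here drawn at random.
rng = random.Random(1)  # fixed seed so the draw is repeatable
sample = rng.sample(population, 5)

# Sample statistic: the same measure computed from the sample only.
sample_mean = statistics.mean(sample)

print("Parameter (population mean):", population_mean)
print("Sample:", sample)
print("Statistic (sample mean):", sample_mean)
```

The sample mean will usually differ somewhat from the population mean; how far it can be trusted as an estimate is the concern of inferential statistics.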

1.4 Applications, Uses and Limitations of statistics

Some of the applications of statistics:

• Statistics is applied in almost all fields of human endeavor.
• Almost all human beings encounter and use numerical facts in their daily life.
• It is applicable in processes such as the invention of certain drugs and the assessment of the extent of environmental
pollution.
• It is used in industry, especially in the area of quality control.

Uses of statistics
The main function of statistics is to enlarge our knowledge of complex phenomena. The following
are some uses of statistics:
1. Presenting facts in a definite and precise form.
2. Data reduction.
3. Measuring the magnitude of variations in data.
4. Furnishing a technique for comparing different sets of data.
5. Estimating unknown population characteristics.
6. Formulating and testing hypotheses.
7. Studying the relationship between two or more variables.
8. Forecasting future events.

Limitations of statistics

Statistics with all its wide application in every sphere of human activity has its own limitations.
Some of them are given below.


1. Statistics is not suitable for the study of qualitative phenomena

Since statistics is basically a science that deals with sets of numerical data, it is applicable only to
those subjects of enquiry that can be expressed in terms of quantitative
measurements. Qualitative phenomena like honesty, poverty, beauty and
intelligence cannot be expressed numerically, so statistical analysis cannot be directly
applied to them. Nevertheless, statistical techniques may be applied
indirectly by first reducing the qualitative expressions to suitable quantitative terms. For
example, the intelligence of a group of students can be studied on the basis of their marks in a
particular examination.

2. Statistics does not study individuals

Statistics does not give any specific importance to individual items; in fact, it deals with an
aggregate of objects. Individual items, when they are taken individually, do not constitute
statistical data and do not serve any purpose for any statistical enquiry.

3. Statistical laws are not exact

It is well known that the mathematical and physical sciences are exact. Statistical laws, however, are not
exact; they are only approximations. Statistical conclusions are not universally true; they are
true only on the average.

4. Statistics can be easily misused

Statistics must be used only by experts; in the hands of the inexpert, statistical methods are the most dangerous
of tools. The use of statistical tools by inexperienced and untrained
persons might lead to wrong conclusions. Statistics can also be easily misused by quoting wrong
figures.

5. Statistics is only one of the methods of studying a problem


Statistical methods do not provide a complete solution to a problem, because problems must be
studied taking the background of the country's culture, philosophy or religion into consideration.
Thus a statistical study should be supplemented by other evidence.

1.5 SCALES OF MEASUREMENT


CLASSIFICATION OF DATA

Data classification can be defined as a method of grouping data according to their similarities
and uses, in order to study the characteristics of the entire population on the basis of their classes.
The classification of data is generally done on a geographical, chronological, qualitative or
quantitative basis.
i) In geographical classification, data are arranged according to places, areas or regions.
ii) In chronological classification, data are arranged according to their time references.
iii) In qualitative classification, data are arranged according to attributes like sex, marital
status, educational standard, etc.
iv) In quantitative classification, data are arranged according to certain characteristics
that have been measured or counted.

Data can also be classified according to different aspects such as:


I. Depending on the type of variable
a) Qualitative data (categorical data)

In qualitative classification, data are arranged according to attributes.

Example 1.1

Data collected based on sex, marital status, educational standard, and so on give rise to qualitative
data.

Sex: male or female

Marital status: married, single, divorced, widowed.


Educational standard: Literate or Illiterate.

Rank of instructors: Graduate assistant, assistant lecturer, lecturer, and so on.

b) Quantitative data

In quantitative classification, data are arranged according to certain characteristic that has been
counted or measured.

Quantitative variables are further divided into two groups: discrete and continuous.

Discrete data: are described by integers only, and their values are obtained by counting. The
possible values for such variables are 0, 1, 2, …; that is, they assume only whole-number counts.

Example 1.2
The number of students in Dire Dawa University, the number of private cars in Dire Dawa and
the number of books are some examples that produce discrete data.

Continuous data: are quantitative figures that can take any numerical value, including fractions.
Their values are obtained by measurement.

Example 1.3
Weight of a person in kg, height, temperature and so on give rise to continuous data.

II) Depending on time reference


a) Time series data: are data collected over a long period of time.
b) Cross sectional data: are data collected at a particular period of time across a range of
places or units.

Definition: A characteristic which shows variability or takes on different values is called a variable.


 Quantitative variable – is the one which leads to quantitative data. Hence we can talk about a discrete
variable (yielding discrete data) and a continuous variable (yielding continuous data).
 Qualitative variable- similarly, leads to qualitative data.

III) Depending on scales/Level of measurement

Proper knowledge about the nature and type of data to be dealt with is essential in order to
specify and apply the proper statistical method for their analysis and inferences. Measurement
scale refers to the property of value assigned to the data based on the properties of order,
distance and fixed zero.
The scales of measurement also show what mathematical operations and what statistical
analyses are permissible to be done on the values of the variable.
Accordingly, there are four scales of measurement: nominal, ordinal, interval and ratio scales.

a) Nominal scale variables


These are qualitative variables that consist of names, labels or categories of individuals. In
nominal scales, numbers are assigned to the categories simply for coding purposes. It is not possible
to compare two individuals based on the numbers assigned to them. Such numbers don’t share any of the
properties of the numbers we deal with in ordinary arithmetic.

Example 1.4

Sex, Religion, Nationality, color, are nominal variables.

b) Ordinal scale
This refers to the variables whose values can be ordered or ranked but the difference between data
values either can’t be determined or is meaningless. Comparison is restricted. Ranking and
counting are the only mathematical operations to be done on the values given to these variables.

Example 1.5

i) Rank of instructors in a university as graduate assistant, lecturer, and professor is ordinal.


ii) Beauty classified as beautiful, more beautiful and most beautiful is ordinal.

c) Interval scale
These variables have the properties of the ordinal scale and, in addition, equal differences between values
represent equal amounts of the quantity measured. There is no true zero origin; that is, zero doesn’t show absence in this case.

Example 1.6

The temperature of a given area may be 0 °C. But this doesn’t mean that there is no heat at all; it
simply indicates that it is cold.

d) Ratio scale
Ratio scale variables have the properties of the interval scale but in this case there is true zero
origin. That is, zero shows absence of something in this case.
All mathematical operations like division, multiplication, logarithms, powers and others are
allowed to be operated on the values of such variables.

Example 1.7

Income of a person, amount of yield from a plot of land, expenditure and amount of consumption.
In all of these cases, a value of zero indicates the absence of the
quantity. For example, if the yield is zero, there is no yield at all.
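
To see why ratios are meaningful on a ratio scale but not on an interval scale, the following Python sketch compares the same two temperatures in Celsius and Fahrenheit, and the same two incomes in two units. The temperature and income values are hypothetical, chosen only for illustration.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit."""
    return c * 9 / 5 + 32

# Interval scale: temperature (the zero point is arbitrary).
t1_c, t2_c = 10.0, 20.0
t1_f = celsius_to_fahrenheit(t1_c)
t2_f = celsius_to_fahrenheit(t2_c)

ratio_c = t2_c / t1_c  # ratio of the two readings in Celsius
ratio_f = t2_f / t1_f  # ratio of the same two readings in Fahrenheit

# Ratio scale: income (zero means no income at all).
income_a, income_b = 1000.0, 2000.0  # hypothetical amounts
ratio_units = income_b / income_a
ratio_cents = (income_b * 100) / (income_a * 100)

print("Temperature ratio in Celsius:   ", ratio_c)
print("Temperature ratio in Fahrenheit:", ratio_f)
print("Income ratio in whole units:    ", ratio_units)
print("Income ratio in cents:          ", ratio_cents)
```

The temperature "ratio" changes when the unit changes, so saying that 20 °C is twice as hot as 10 °C has no real meaning. The income ratio stays the same in either unit because the scale has a true zero.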

1.6 INTRODUCTION TO METHODS OF DATA COLLECTION

Depending on the source of data, there are two methods of data collection:
a) Primary method
Data are measured or collected by the investigator or the user directly from the source.
• Two activities are involved: planning and measuring.
i) Planning:
 Identify the source and elements of the data.
 Decide whether to consider a sample or a census.
 If sampling is preferred, decide on the sample size, selection method, etc.
 Decide on the measurement procedure.
 Set up the necessary organizational structure.
ii) Measuring: there are different options. Some of the sources for collecting primary data are:
 Focus Group
 Telephone Interview
 Mailed Questionnaires
 Door-to-Door Survey
 Mall Intercept
 New Product Registration
 Personal Interview
 Experiments

b) Secondary method
These are data gathered or compiled from published and unpublished sources or files.
When our source is secondary data, we need to check:
 The type and objectives of the situation in which the data were collected.
 The purpose for which the data were collected and its compatibility with the present
problem.
 The appropriateness of the nature and classification of the data to our problem.
 Whether there are biases or misreporting in the published data.

Note that data which are primary for one user may be secondary for another.


SCOPE OR COVERAGE OF DATA COLLECTION


In terms of scope or coverage, there are two ways of collecting data: census survey and sample survey.
i) Census survey or complete enumeration
It is a process of investigating the characteristics of each and every member of the
population. It is a survey in which observations are made on the entire population.

Advantages of census survey

 It is more representative than sample survey


 It is more accurate
 Complete and exact when the domain is small

Disadvantages of census survey

 Completeness is impossible when the population is large


 It consumes time, money and human power

ii) Sample survey


 It is an investigation in which some part of a population is taken to infer about the
whole population.
 It is appropriate when money and time are limited.

Advantage of sample survey over census survey

a) It saves money
It is cheaper to assess a sample of size n than a population of size N (n<N).
b) It saves labor
A smaller number of staff (enumerators, supervisors, data editors) is required in a sample
survey than in a census.
c) It saves time
Since the size is small, it reduces data collection and processing time.
d) It minimizes disturbance


If the process of data collection affects the society, sampling is the only alternative
for data collection.

CHAPTER SUMMARY

 Statistics is the science that deals with the methods of data collection, organization,
presentation, analysis and interpretation of the results of the analysis.
 There are two classifications of statistics: descriptive and inferential.
 Descriptive statistics includes those procedures used to summarize complex data. These
include graphical methods, measures of central tendency and measures of dispersion.
 Inferential statistics deals with taking samples and reaching conclusions about a
population, which includes estimation and tests of hypotheses.
 Variables are classified into quantitative and qualitative. Quantitative variables are those
variables whose values can be expressed numerically. The values of qualitative
variables, however, cannot be expressed numerically.

 Planning and measurement are the two activities involved while working with primary
data.

 The two main kinds of data collection are census survey and sample survey. A census
represents complete enumeration, whereas a sample survey means taking part of the
population so as to infer about the general population from the results of the sample.


Exercises on Chapter 1

1. Broadly, define the term ‘Statistics’.


2. Mention some of the uses and limitations of Statistics.
3. Classify the following variables as qualitative, quantitative, discrete or continuous.
a) Number of courses that students take at DDU during this summer.
b) The amount of rainfall at Dire Dawa over the last ten years.
c) The attitudes of the parents towards bringing up their children.
d) Type of automobiles people drive.
e) Length of steel bars produced in a given production run.
f) Weight of a bar of soap.
4. An insurance company has insured 250,000 cars over the last five years. The company
would like to know the number of cars involved in one or more accidents over this time
period. It selects 500 clients at random from the files and makes a record of clients who were
involved in one or more accidents. Based on this information, Identify:
a) The population. b) The sample. c) The variable of interest to the insurance company.
d) The type of statistics used. e) The scope of data collection.
5. Classify the following statements into descriptive or inferential statistics:
a) The average age of students at our school is 22 years.
b) 20% of students in my summer Biology class are married.
c) It is expected that there will be 1200 car accidents in the next three months in Ethiopia.
d) Two thirds of all doctors interviewed smoke cigarettes.
e) A firm reported that the average life span of a product is estimated to be 8 months.
6. Are the following data nominal, ordinal, interval or ratio? Explain your answers.
a) Type of automobiles people drive.
b) Students' I.D. card numbers.
c) Number of errors made on a production line.
d) Time it took to run 42 km.
e) Phone numbers.
f) Temperature readings in Fahrenheit.
g) Military rank.
h) Birth place of students.
i) Number of customers of a bank.


CHAPTER 2

METHODS OF DATA PRESENTATION

CONTENTS
2.1 FREQUENCY DISTRIBUTIONS 16

2.2. DIAGRAMMATIC AND GRAPHICAL PRESENTATION OF DATA 25

INTRODUCTION

In this chapter we will deal with the classification and presentation of data by using
frequency distributions and different types of graphs. Having collected and edited the
data, the next important step is to organize it, that is, to present it in a readily
comprehensible, condensed form that helps in drawing inferences from it. It is also necessary
that like items be separated from unlike ones.

OBJECTIVES

At the end of this chapter, the student is expected to be able to:

 Explain the meaning of frequency and frequency distribution


 Construct both Grouped and Ungrouped frequency distributions
 Put raw data in to frequency distribution
 Compute class mark, class boundary, relative frequency and cumulative
frequencies
 Know the types of graphs and apply them in their appropriate places
 Draw histogram, frequency polygon, ogive, pie chart, bar chart and apply them
for appropriate data set.

2.1 FREQUENCY DISTRIBUTIONS
The presentation of data is broadly classified into the following two categories:

• Tabular presentation
• Diagrammatic and Graphic presentation.
The process of arranging data into classes or categories according to similarities is technically
called classification.

Classification is a preliminary step that prepares the ground for proper presentation of data.

Before seeing frequency distributions, we have to see some basic terms.

Raw data: recorded information in its originally collected form, whether it is a count or a
measurement, is referred to as raw data.

Array: is an arrangement of raw data in ascending or descending order of magnitude.

Frequency: is the number of times a value of the variable occurs in the
data.

Frequency array: is an array where the individual items or values of a variable are given
along with the corresponding frequencies.

Frequency distribution: is a tabular summary of a set of data showing the frequency of items
in each of several non-overlapping classes or categories.

Types of frequency distributions

There are three basic types of frequency distributions:


 Categorical frequency distribution
 Ungrouped frequency distribution
 Grouped frequency distribution
These fall into two groups: categorical and numerical. Numerical frequency distributions are either ungrouped or grouped.

1) Categorical frequency Distribution


It is used for data that can be placed in specific categories, such as nominal or ordinal data.

Example 2.1
A social worker collected the following data on marital status for 25 persons. (M=married,
S=single, W=widowed, D=divorced). Prepare a frequency distribution.

M S D W D
S S M M M
W D S M M
W D D S S
S W W D D

Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital
status (M, S, D, and W). These types will be used as class for the distribution. We follow the
following procedures to construct such a frequency distribution.
Step 1: Prepare a table as shown below.

Class    Tally    Frequency    Percent
(1)      (2)      (3)          (4)
M
S
D
W
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).


Step 4: Find the percentage of values in each class by using:

$\% = \dfrac{f}{n} \times 100$, where f = frequency of the class and n = total number of values.

Percentages are not necessarily part of frequency distribution but they can be added since
they are used in certain types of diagrammatic representations such as pie charts.
Step 5: Find the total for column (3) and (4).
Combining all the steps, one can construct the following frequency distribution.

Class    Tally      Frequency    Percent
(1)      (2)        (3)          (4)
M        //// /     6            24
S        //// //    7            28
D        //// //    7            28
W        ////       5            20
Total               25           100
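
The tallying and percentage steps can also be sketched in Python. This is only an illustrative sketch: it counts the 25 marital status codes exactly as they are listed in Example 2.1, using the standard library `Counter`, and applies the percentage formula from Step 4.

```python
from collections import Counter

# Marital status codes from Example 2.1, read row by row.
data = ("M S D W D "
        "S S M M M "
        "W D S M M "
        "W D D S S "
        "S W W D D").split()

counts = Counter(data)  # frequency of each class
n = len(data)           # total number of values

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for status in ["M", "S", "D", "W"]:
    f = counts[status]
    percent = f / n * 100  # percentage = f / n * 100
    print(f"{status:<6}{f:>10}{percent:>9.0f}")
print(f"{'Total':<6}{n:>10}{100:>9}")
```

Counting by program is a useful check on hand tallies, which are easy to miscount.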

2) Numerical Frequency Distribution

In such frequency distributions, the data are classified according to numerical size. This is used to
summarize interval and ratio data. Numerical frequency distributions may be discrete (ungrouped ) or
continuous (grouped), depending on whether the variable is discrete or continuous.

Discrete (Ungrouped) frequency Distribution


 It is a table of all the raw score values that could possibly occur in the data,
along with the number of times each value actually occurred.
 Such a distribution is often constructed for a small data set on a discrete variable.

To construct ungrouped frequency distribution, we need the following steps:


 First find the smallest and largest raw scores in the collected data.
 Arrange the data in order of magnitude and count the frequency.



 To facilitate counting, one may include a column of tallies as shown above.
Example 2.2
The following data represent the marks of 20 students. Construct an ungrouped frequency
distribution.

80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Solution:
Step 1: Find the range, Range=Max-Min=90-60=30.
Step 2: Make a table as shown below.
Step 3: Tally the data.
Step 4: Count the frequency and record in the last column.
Mark    Tally    Frequency
60      //       2
62      /        1
63      /        1
65      /        1
70      ////     4
74      /        1
75      /        1
76      //       2
80      ///      3
85      ///      3
90      /        1
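
As a cross-check on the hand tally, the same ungrouped frequency distribution can be sketched in Python. The sketch below simply counts how often each mark in Example 2.2 occurs; it is an illustration, not part of the original procedure.

```python
from collections import Counter

# Marks of the 20 students in Example 2.2, read row by row.
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

data_range = max(marks) - min(marks)  # Step 1: Range = Max - Min
freq = Counter(marks)                 # Steps 3-4: count each distinct mark

print("Range:", data_range)
print(f"{'Mark':<6}{'Frequency':>10}")
for mark in sorted(freq):             # arrange in order of magnitude
    print(f"{mark:<6}{freq[mark]:>10}")
```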



Grouped (Continuous) frequency Distribution
This is a frequency distribution in which several values are grouped into one class.
When the range of the data is large, the data must be grouped into classes that are more than
one unit in width.

Definition of some common terms

 Class limits: separate one class in a grouped frequency distribution from another.
The limits could actually appear in the data, and there are gaps between the upper limit
of one class and the lower limit of the next.
 Unit of measurement (U): the distance between two possible consecutive
measures. It is usually taken as 1, 0.1, 0.01, 0.001, and so on.
 Class boundaries: separate one class in a grouped frequency distribution from
another. The boundaries have one more decimal place than the raw data and
therefore do not appear in the data. There is no gap between the upper boundary of
one class and the lower boundary of the next class.
The lower class boundary is obtained by subtracting 0.5U from the corresponding
lower class limit, and the upper class boundary is obtained by adding 0.5U to the
corresponding upper class limit.
 Class width: the difference between the upper and lower class boundaries of any
class. It is also the difference between the lower limits of any two consecutive classes,
or the difference between any two consecutive class marks.
 Class mark (midpoint): the average of the lower and upper class limits, or the
average of the lower and upper class boundaries.
 Cumulative frequency: the number of observations less than or equal to, or greater than
or equal to, a specific value.
 Cumulative frequency above: the total frequency of all values greater than or
equal to the lower class boundary of a given class.
 Cumulative frequency below: the total frequency of all values less than or
equal to the upper class boundary of a given class.



 Cumulative Frequency Distribution (CFD): the tabular arrangement of class
intervals together with their corresponding cumulative frequencies. It can be of the more
than or less than type, depending on the type of cumulative frequency used.
 Relative frequency (rf): the frequency of a class divided by the total frequency. This
gives the proportion (or, multiplied by 100, the percent) of values falling in that class.
 Relative cumulative frequency (rcf): the cumulative frequency divided by the
total frequency. It gives the proportion of values that are less than the upper class
boundary (less than type) or more than the lower class boundary (more than type).

Guidelines for classes

1. There should be between 5 and 20 classes.


2. The class width should preferably be an odd number. When the class limits are whole
numbers, this makes the class midpoints whole numbers instead of decimals.

3. The classes must be mutually exclusive. This means that no data value can fall into
two different classes.

4. The classes must be all inclusive or exhaustive. This means that all data values must
be included.

5. The classes must be continuous. There are no gaps in a frequency distribution.
Classes that have no values in them must be included, unless they are the first or last
class, in which case they are dropped.

6. The classes must be equal in width. The exception here is the first or last class. It is
possible to have a "below ..." or "... and above" class. This is often used with ages.

Constructing a Grouped Frequency Distribution

1. Find the largest and smallest values.


2. Compute the Range (R) = Maximum - Minimum


3. Select the number of classes desired. This is usually between 5 and 20, or use
Sturges’ rule of thumb:

$k = 1 + 3.32 \log_{10} n$, where k is the number of classes desired and n is the total number of
observations.

4. Find the class width by dividing the range by the number of classes and rounding up:

$w = R / k$. There are two things to watch out for here. You must round up, not off.
Normally 3.2 would be rounded to 3, but in rounding up, it becomes 4. If the range
divided by the number of classes gives an integer value (no remainder), then you can
either add one to the number of classes or add one to the class width. Sometimes
you're locked into a certain number of classes because of the instructions.

5. Pick a suitable starting point less than or equal to the minimum value. The starting
point is called the lower limit of the first class. Continue to add the class width to this
lower limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower limit of the second
class. Then continue to add the class width to this upper limit to find the rest of the
upper limits.
7. Find the boundaries by subtracting 0.5U units from the lower limits and adding 0.5U
units on the upper limits. The boundaries are also half-way between the upper limit
of one class and the lower limit of the next class.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying to accomplish, it
may not be necessary to find out the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies

Example 2.3



Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solution:
Step 1: Find the highest and the lowest values: H = 39, L = 6.

Step 2: Find the range: R = H - L = 39 - 6 = 33.

Step 3: Select the number of classes desired using Sturges’ formula:

k = 1 + 3.32 log(20) = 5.32, which rounds up to 6.

Step 4: Find the class width: w = R/k = 33/6 = 5.5, which rounds up to 6.

Step 5: Select the starting point, let it be the minimum observation. Then,

6, 12, 18, 24, 30, 36 are the lower class limits.

Step 6: Find the upper class limits.

E.g. the first upper class limit = 12 - U = 12 - 1 = 11. Then,

11, 17, 23, 29, 35, 41 are the upper class limits.

So, combining steps 5 and 6, one can construct the following classes:

Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41

Step 7: Find the class boundaries.

E.g. for the first class, lower class boundary = 6 - U/2 = 5.5 and
upper class boundary = 11 + U/2 = 11.5.

Then continue adding w to both boundaries to obtain the remaining boundaries. Doing so, one
obtains the following class boundaries:

Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5

Step 8: Tally the data.

Step 9: Write the numeric values for the tallies in the frequency column.

Step 10: Find cumulative frequency.

Step 11: Find relative frequency or/and relative cumulative frequency.

The complete frequency distribution follows:

Class     Class          Class   Tally     Freq.   Cf (less     Cf (more     rf     rcf (less
limit     boundary       mark                      than type)   than type)          than type)
6 – 11    5.5 – 11.5     8.5     //        2       2            20           0.10   0.10
12 – 17   11.5 – 17.5    14.5    //        2       4            18           0.10   0.20
18 – 23   17.5 – 23.5    20.5    ///////   7       11           16           0.35   0.55
24 – 29   23.5 – 29.5    26.5    ////      4       15           9            0.20   0.75
30 – 35   29.5 – 35.5    32.5    ///       3       18           5            0.15   0.90
36 – 41   35.5 – 41.5    38.5    //        2       20           2            0.10   1.00


2.2 DIAGRAMMATIC AND GRAPHIC PRESENTATION OF DATA.
These are techniques for presenting data in visual displays using diagrams and pictures.
Importance: -
• They have greater attraction.
• They facilitate comparison.
• They are easily understandable.
Diagrammatic presentation of data
-Diagrams are appropriate for presenting discrete as well as qualitative data.
-The three most commonly used diagrammatic presentations for discrete as well as
qualitative data are:
• Pie charts
• Pictogram
• Bar chart
Pie chart
A pie chart is a circular chart divided into sectors, illustrating relative magnitudes or
frequencies of classes of a given variable. A pie chart usually represents categorical data, but it
can also be used for discrete quantitative data. The angle of each sector has to be
proportional to the relative frequency of the class it represents:

Percentage of a class = (class frequency / total frequency) * 100, and
Angle of sector = (class frequency / total frequency) * 360°.

Example 2.4
Draw a suitable diagram to represent the following population in a town.

Men     Women   Girls   Boys
2500    2000    4000    1500

Solution: Draw a pie-chart.


Step 1: Find the percentage.
Step 2: Find the number of degrees for each class.
Step 3: Using a protractor and compass, draw each sector and label it with its name and
corresponding percentage.

Class   Frequency   Percent   Degree
Men     2500        25        90
Women   2000        20        72
Girls   4000        40        144
Boys    1500        15        54
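
The percent and degree columns can be reproduced with a few lines of code. This is a minimal sketch using the figures of Example 2.4; the names `population` and `sector` are hypothetical and chosen only for illustration.

```python
# Frequencies from Example 2.4
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
total = sum(population.values())

def sector(freq, total):
    """Return (percentage, angle in degrees) for one class of a pie chart."""
    percent = 100 * freq / total   # share of the whole, in percent
    degrees = 360 * freq / total   # same share of a full circle
    return percent, degrees

table = {name: sector(freq, total) for name, freq in population.items()}
for name, (percent, degrees) in table.items():
    print(f"{name:6s} {percent:5.1f}% {degrees:6.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic.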

[Figure: pie chart of the town population. Men 25%, Women 20%, Girls 40%, Boys 15%.]

Bar Charts

- A set of bars (thick lines or narrow rectangles) representing some magnitude over
time or space.
- They are useful for comparing aggregates over time or space.
- Bars can be drawn either vertically or horizontally.
- There are different types of bar charts, the most common being:
• Simple bar chart
• Deviation or two-way bar chart
• Broken bar chart
• Component or subdivided bar chart
• Multiple bar chart
Simple Bar Chart
- Simple bar charts are used to display data on one variable.

- They are thick lines (narrow rectangles) of the same breadth. The magnitude of a
quantity is represented by the height or length of the bar.

Example 2.5

The following data represent sales by product, 1957-1959, of a given company for three
products A, B and C.

Product   Sales ($) in 1957   Sales ($) in 1958   Sales ($) in 1959
A         12                  14                  18
B         24                  21                  18
C         24                  35                  54

A simple bar chart for sales by product in the year 1957 is:

[Figure: simple bar chart, Sales ($) in 1957. A = 12, B = 24, C = 24.]

Component Bar chart


- When there is a desire to show how a total (or aggregate) is divided into its component
parts, we use a component bar chart.
- The bars represent the total value of a variable, with each total broken into its component parts;
different colours or designs are used to identify the parts.
Example 2.6
Draw a component bar chart to represent the sales by product from 1957 to 1959.
Solution:

[Figure: component bar chart, Sales by product in 1957-1959. X axis: years of production (1957, 1958, 1959); Y axis: sales in $; components: product A, product B, product C.]

Multiple Bar charts

- These are used to display data on more than one variable.


- They are used for comparing different variables at the same time.

Example 2.7
Draw a multiple bar chart to represent the sales by product from 1957 to 1959.
Solution:

[Figure: multiple bar chart, Sales by product in 1957-1959. X axis: years of production (1957, 1958, 1959); Y axis: sales in $; one bar each for product A, product B and product C in every year.]

Broken Bar diagram


This chart is used to present data involving a few extreme figures, where it would be difficult to
accommodate the bars corresponding to those figures within the graph paper. In this case, the
bars for the extreme figures are drawn broken, with each break marking a jump in the numerical scale.

Activity 2.1
Draw a diagram presenting sales by product in 1958, assuming that there was a product D whose
sales in 1958 were $100,000.

Graphical Presentation of data


The histogram, frequency polygon and cumulative frequency curve or Ogive are most
commonly applied graphical representations for continuous data.

Procedures for constructing statistical graphs


• Draw and label the X and Y axes.
• Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y
axis.
• Represent the class boundaries for the histogram or ogive, or the midpoints for the
frequency polygon, on the X axis.
• Plot the points.
• Draw the bars or lines to connect the points.


Histogram
This is a graph which displays the data by using vertical bars of various heights to represent
frequencies. Class boundaries are placed along the horizontal axis. Class marks and class
limits are sometimes used as the quantity on the X axis. Unlike in a bar graph, the bars of a
histogram must be adjacent.

Example 2.8

The following table summarizes the Biostatistics mid exam score of 38 students out of 35
marks.

If we want to draw Histogram for this data it would look like the following:

Histogram of Biostatistics marks in mid exam

Frequency Polygon
A frequency polygon depicts a frequency distribution for discrete or continuous numeric data.
Frequency polygons are a graphical device for understanding the shapes of distributions.

A Histogram can easily be changed to Frequency Polygon by joining the mid points of the
top of the adjacent rectangles of the Histogram with a line. It is also possible to draw
Frequency Polygon without drawing Histogram.

Example 2.9

The following frequency distribution represents the ages of 60 patients at Gambella hospital.
Represent the data by a frequency polygon.

Then we have to identify the mid points of each interval.

Finally we have to plot the frequency of each class (on the Y axis) against its midpoint
(on the X axis) and connect adjacent points with straight lines.

Note that two artificial class marks at both ends with frequencies of zero have been
added to “tie down” the graph on the X-Axis.

Ogive (cumulative frequency polygon)

This is a graph showing the cumulative frequency (the less than or more than type) plotted
against upper or lower class boundaries, respectively. That is, class boundaries are plotted
along the horizontal axis and the corresponding cumulative frequencies are plotted along the
vertical axis. The points are then joined by a free hand curve.

There are two types of ogive

1. Less than ogive: a line graph obtained by plotting the less than cumulative frequencies
against the upper boundaries of their respective class intervals.
2. More than ogive: a line graph obtained by plotting the more than cumulative frequencies
against the lower boundaries of their respective class intervals.

Example 2.10

Draw both cumulative frequency curves for the following data.

Class limit   F    Class boundary   LCB    UCB
3-7           3    2.5-7.5          2.5    7.5
8-12          4    7.5-12.5         7.5    12.5
13-17         6    12.5-17.5        12.5   17.5
18-22         13   17.5-22.5        17.5   22.5
23-27         17   22.5-27.5        22.5   27.5
28-32         6    27.5-32.5        27.5   32.5
33-37         1    32.5-37.5        32.5   37.5
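
The points to be plotted for each ogive can be computed from the frequency column. The sketch below uses the frequencies and boundaries of Example 2.10; the variable names are hypothetical, and the plotting itself is left out.

```python
from itertools import accumulate

# Frequencies and class boundaries from Example 2.10
freqs = [3, 4, 6, 13, 17, 6, 1]
lower_boundaries = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5]
upper_boundaries = [7.5, 12.5, 17.5, 22.5, 27.5, 32.5, 37.5]

# Less than type: running total from the first class downwards
less_than_cf = list(accumulate(freqs))
# More than type: running total from the last class upwards
more_than_cf = list(accumulate(reversed(freqs)))[::-1]

# Less than ogive uses upper boundaries; more than ogive uses lower boundaries
less_than_points = list(zip(upper_boundaries, less_than_cf))
more_than_points = list(zip(lower_boundaries, more_than_cf))

print(less_than_points)
print(more_than_points)
```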

The less than Ogive curve:

[Figure: less than ogive. X axis: upper class boundary (7.5 to 37.5); Y axis: less than cumulative frequency (0 to 50).]

The More than Ogive curve:

[Figure: more than ogive. X axis: lower class boundary; Y axis: more than cumulative frequency (0 to 50).]

The less than and More than Ogive curves together:

[Figure: less than and more than ogives drawn on the same axes. X axis: class boundaries (7.5 to 37.5); Y axis: less than and more than cumulative frequency (0 to 50).]

CHAPTER SUMMARY

• Frequency is the number of times a value appears in a data set.
• There are two types of frequency distribution: grouped and ungrouped frequency
distribution.
• Class mark, class boundary, cumulative frequency and relative frequency are some
of the most important quantities we compute for a given frequency distribution.
• Histogram, frequency polygon, and ogive are usually drawn for quantitative data.
• A pie chart is a circular chart that is used to display the percentage of the total number
of measurements falling into different categories.
• Bar charts are usually used for count data. The different types of bar charts include the
simple bar chart, deviation bar chart, component bar chart and multiple bar chart.
• We have to know the types of graphs and apply them in their appropriate places.

Exercises on Chapter 2

1. Identify the type/s of classification to summarize results from:

a) Type of cars sold by a company.


b) Amount of deposit made by customers of a bank in a given day.
c) Amount of coffee exported from 5 regions in a country.
d) Number of cars sold to a certain company for three consecutive years.
2. The following data show the high temperatures in °C for 50 randomly selected days:

32 38 30 24 24 37 39 34 35 31

23 35 29 34 21 35 35 24 23 26

30 38 25 37 25 39 25 30 27 32

33 30 29 32 33 35 29 33 19 39

22 33 31 20 29 27 31 22 23 36

a) Construct a grouped frequency distribution with suitable number of classes.


b) Convert the distribution obtained in (a) in to a cumulative
i) less than and ii) More than distribution.
c) Construct a histogram, frequency polygon, and both ogives.
3. The following data shows the average yearly consumption of meat in kilograms
for 40 families.
12.6 17.8 19.9 19.0 10.4 20.6 13.2 22.5

14.0 15.6 19.1 20.4 20.6 18.6 18.0 15.9

13.7 14.9 18.7 18.4 20.1 24.2 19.3 13.9

11.7 16.7 15.3 18.3 17.4 23.4 22.0 17.9

21.7 18.9 14.4 9.9 16.0 16.8 10.8 16.2

a) Construct a continuous frequency distribution with suitable number of classes.
b) Construct the less than ogive.
c) Construct the relative frequency distribution.
4. A frequency distribution with 6 classes of equal size is constructed to present data
which has been recorded in integers. If the class midpoint of the 3rd class interval is 20
and the class width is 5, write down all the classes.

5. A company has 25 vehicles. The table below shows the summary of yearly fuel
consumption of the vehicles.

Fuel consumption (in 000’s of liters)   1-1.9   2-2.9   3-3.9   4-4.9   5-5.9   6 and above
Number of vehicles                      2       5       6       7       4       1

Give:
a) The lower class limit of the 3rd class.
b) The class boundaries of the 2nd class.
c) The class midpoint of the 4th class.
d) The width of the 1st class.
e) How many of the vehicles consumed: i) at least 1950 liters but less than 2950 liters?
ii) Less than 3950 liters? iii) At most 4900 liters?
f) What percent of the vehicles consumed: i) At least 2950 liters?
ii) Less than 5950 liters? iii) More than 1950 liters?
6. The table below shows the weight distribution of 25 students in a basketball team.

Weight in kgs   Number of students
Below 50.5      3
Below 55.5      10
Below 60.5      16
Below 65.5      20
Below 70.5      22
Below 75.5      25

a) Form the continuous frequency distribution, where the unit of measurement is 1.
b) Determine the class limits and the class marks.
c) How many of the students weigh more than 65.5 kgs? Between 55.5 and 70.5 kgs?

7. The following table shows the number of cars of each type manufactured by a certain company during
1972-1975.

          Years
Cars      1972   1973   1974   1975
Toyota    400    300    380    450
Nissan    260    340    350    390
Isuzu     330    310    445    470

Construct

a) A simple bar chart for the total number of cars manufactured.


b) Multiple bar charts.
c) Component bar chart.
d) Percentage component bar chart.

8. A recent study showed that a typical Ethiopian car owner incurs the following expenses,
on average, when leasing a car for 3 years. Draw a pie chart to portray these data.
Expenditure item Amount ($)
Lease amount 4,500
Gasoline 1,350
Insurance 1,800
Maintenance 1,350

CHAPTER 3

MEASURES OF CENTRAL TENDENCY


CONTENTS

3.1. INTRODUCTION AND OBJECTIVES OF MEASURING CENTRAL TENDENCY
3.2. THE SUMMATION NOTATION
3.3. PROPERTIES OF MEASURES OF CENTRAL TENDENCY
3.4. TYPES OF MEASURES OF CENTRAL TENDENCY

INTRODUCTION

In the previous chapter, you were introduced to the classification and presentation of
data using graphical methods. Graphical methods are important for data analysis; however,
they are inappropriate for statistical inference, since it is difficult to judge the similarity of a
sample frequency histogram and the corresponding population histogram. The two most common
numerical descriptive measures are measures of central tendency and measures of variability.
That is, we seek to describe the center of the distribution and also how the measurements
vary about the center of the distribution. So, this chapter introduces you to the methods used
to find the average or representative values in a given data set.

Objectives:

At the end of this chapter the student is expected to be able to:

• Discuss the meanings and uses of the measures of central tendency
• Decide on the appropriate measure of central tendency
• Compute and interpret the arithmetic mean, harmonic mean, geometric mean,
median, mode, quartiles, deciles, percentiles and so on


3.1 INTRODUCTION AND OBJECTIVES OF MEASURING CENTRAL
TENDENCY

Measures of central tendency are measures of the location of the middle or the center of a
distribution. The definition of "middle" or "center" is purposely left somewhat vague so that
the term "central tendency" can refer to a wide variety of measures.
- The tendency of statistical data to get concentrated at a certain value is called central tendency,
and the various methods that determine the actual value at which the data tend to concentrate
are called measures of central tendency. One of the most important objectives of statistical
analysis is to get one single value that describes the characteristics of the entire data. Such a
value is called the central value or average.
- When we want to make comparisons between groups of numbers, it is good to have a single
value that is considered to be a good representative of each group. This single value is called
the average of the group.
- Averages are also called measures of central tendency.
- An average which is representative is called a typical average, and an average which is not
representative and has only a theoretical value is called a descriptive average.

Objectives:
• To comprehend the data easily, i.e. to condense the mass of data into one single
value.
• To facilitate comparison.
• To make further statistical analysis.


3.2 THE SUMMATION NOTATION
Let X1, X2, X3, …, XN be a set of measurements, where N is the total number of observations
and Xi is the ith observation.
Very often in statistics an algebraic expression of the form X1+X2+X3+...+XN is used in a
formula to compute a statistic. It is tedious to write an expression like this very often, so
mathematicians have developed a shorthand notation to represent a sum of scores, called the
summation notation.

The symbol $\sum_{i=1}^{N} X_i$ is mathematical shorthand for X1+X2+X3+...+XN.

The expression is read, "the sum of X sub i from i equals 1 to N." It means "add up all the
numbers."

Example 3.1

Suppose that the following were scores made on the first homework assignment for five
students in the class: 5, 7, 7, 6, and 8. In this example set of five numbers, where N=5, the
summation could be written:

$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 5 + 7 + 7 + 6 + 8 = 33$

The "i=1" in the bottom of the summation notation tells where to begin the sequence of
summation. If the expression were written with "i=2", the summation would start with the
second number in the set.

The "N" in the upper part of the summation notation tells where to end the sequence of
summation. If there were only three scores then the summation and example would be:

$\sum_{i=1}^{3} X_i = X_1 + X_2 + X_3 = 5 + 7 + 7 = 19$

Sometimes, if the summation notation is used in an expression that must be
written a number of times, as in a proof, a shorthand for the shorthand
is employed. When the summation sign $\sum$ is used without additional notation, then "i=1"
and "N" are assumed, i.e. $\sum X = \sum_{i=1}^{N} X_i$.

PROPERTIES OF SUMMATION

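1. $\sum_{i=1}^{N} k = Nk$, where k is any constant

2. $\sum_{i=1}^{N} kX_i = k\sum_{i=1}^{N} X_i$, where k is any constant

3. $\sum_{i=1}^{N} (a + bX_i) = Na + b\sum_{i=1}^{N} X_i$, where a and b are any constants

4. $\sum_{i=1}^{N} (X_i + Y_i) = \sum_{i=1}^{N} X_i + \sum_{i=1}^{N} Y_i$

5. $\sum_{i=1}^{N} X_iY_i \neq \sum_{i=1}^{N} X_i \sum_{i=1}^{N} Y_i$ (in general)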
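
The standard properties of summation can be checked numerically. The sketch below assumes the usual five properties (sum of a constant, constant factor, linear transformation, sum of two variables, and the fact that a sum of products is not the product of sums). The scores are those of Example 3.1; the second variable `y` and the constants `k`, `a`, `b` are hypothetical values chosen only for illustration.

```python
x = [5, 7, 7, 6, 8]   # scores from Example 3.1
y = [1, 2, 3, 4, 5]   # hypothetical second variable
n = len(x)
k, a, b = 3, 2, 4     # hypothetical constants

# 1. Summing a constant n times gives n*k
prop1 = sum(k for _ in range(n)) == n * k
# 2. A constant factor can be taken outside the sum
prop2 = sum(k * xi for xi in x) == k * sum(x)
# 3. Sum of a linear transformation a + b*X
prop3 = sum(a + b * xi for xi in x) == n * a + b * sum(x)
# 4. The sum of a sum is the sum of the sums
prop4 = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)
# 5. The sum of products is generally NOT the product of the sums
prop5 = sum(xi * yi for xi, yi in zip(x, y)) != sum(x) * sum(y)

print(prop1, prop2, prop3, prop4, prop5)
```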

Example 3.2

Activity 3.1
Considering the following data determine
X Y
5 6
7 7
7 8
6 7
8 8

a) b) c) d) e)

f) g) h) g)

3.3 PROPERTIES OF MEASURES OF CENTRAL TENDENCY


A good measure of central tendency (or a typical average) should
have the following properties:
• It should be rigidly defined, which means that it should have a definite value.
• It should be based on all observations under investigation.
• It should not be affected by extreme observations.
• It should be capable of further algebraic treatment.
• It should be affected as little as possible by fluctuations of sampling, i.e. it should be
stable with sampling.
• It should be easy to calculate and simple to understand.
• It should be unique and always exist.

Note: No measure satisfies all of the above conditions; we choose the one that satisfies
most of them.


3.4 TYPES OF MEASURES OF CENTRAL TENDENCY

There are several different measures of central tendency, each having its advantages and
disadvantages, including:
• The Mean
• The Median
• The Mode
The choice among these averages depends upon which one best fits the property under
discussion.

Mean: There are three types of mean, each suitable for a particular type of data. They
are:

a) Arithmetic mean or average
b) Geometric mean
c) Harmonic mean
3.4.1 The Arithmetic Mean
It is divided into two: the simple arithmetic mean and the weighted arithmetic mean.
1) Simple Arithmetic Mean:
Two methods exist for both grouped and ungrouped data: the direct method and the
indirect method.
1) Direct method
- The mean is defined as the sum of the magnitudes of the items divided by the number of
items.
The mean of X1, X2, X3, …, Xn is denoted by $\bar{X}$ (A.M) and is given by:

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

When the data are arranged or given in the form of a frequency distribution, i.e. there
are k variate values such that a value $X_i$ has a frequency $f_i$ (i = 1, 2, …, k), then the
arithmetic mean will be

$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$

where k is the number of classes and $\sum_{i=1}^{k} f_i = n$.

Arithmetic Mean for Grouped Data


If data are given in the shape of a continuous frequency distribution, then the arithmetic
mean is obtained as follows:

$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$, where $X_i$ = the class mark of the ith class and $f_i$ = the frequency of the ith class.

Example 3.3

Find the arithmetic mean for the following frequency distribution

Class interval   Frequency   Class mark
2-5              3           3.5
6-8              2           7
9-12             5           10.5
13-15            4           14
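
The grouped mean for this table can be worked out as below. This is a sketch that assumes the usual definition of the grouped mean (the sum of frequency times class mark, divided by the total frequency); the names `class_marks` and `grouped_mean` are hypothetical.

```python
# Class marks and frequencies from Example 3.3
class_marks = [3.5, 7, 10.5, 14]
freqs = [3, 2, 5, 4]

# Weighted total of the class marks, then divide by the number of observations
total_fx = sum(f * x for f, x in zip(freqs, class_marks))
n = sum(freqs)
grouped_mean = total_fx / n

print(total_fx, n, grouped_mean)
```

With these figures the weighted total is 133 over 14 observations, so the mean is 9.5.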

Activity 3.2

1) Daily cash earnings of 15 workers working in different industries are as follows:

11.63, 8.22, 12.56, 12.14, 29.23, 18.23, 11.49, 11.30, 17.00, 9.16, 8.64, 27.56, 8.23, 19.77, 12.81

Find the average daily earnings of a worker.

2) The distribution of age at first marriage of 130 males was as given below.

Age in years (X): 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29.

No. of males (f): 2, 1, 4, 8, 10, 12, 17, 19, 18, 14, 13, 12.

Compute the average age of males at first marriage.

3) Calculate the mean for the following age distribution.


Class frequency
6- 10 35
11- 15 23
16- 20 15
21- 25 12
26- 30 9
31- 35 6

2) Indirect Method

Coding of data: a linear transformation of data may be regarded as coding. In
coding we shift the origin and change the scale.
The effect of coding on the mean is given below.

1) If we subtract an arbitrary constant from each observation, the mean is also reduced
by that constant.
2) If we divide each observation of a set by an arbitrary constant, the mean is reduced by
the same factor, i.e. it is divided by that constant.
Note: In the case of addition or multiplication, the word ‘reduced’ should be replaced by
‘increased’ in the above statements.

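The original data are transformed using some assumed mean (working mean), denoted
by A. Let $X_i$ denote the original values and $d_i = X_i - A$ the deviations from A; then

$\bar{X} = A + \frac{\sum_{i=1}^{n} d_i}{n}$   Show!

When the data are arranged or given in the form of a frequency distribution:

$\bar{X} = A + \frac{\sum_{i=1}^{k} f_i d_i}{n}$

For grouped data, with class marks $X_i$, class width w and $d_i = (X_i - A)/w$:

$\bar{X} = A + \frac{\sum_{i=1}^{k} f_i d_i}{n}\, w$

Show!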
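
The indirect (coding) method can be illustrated with a short computation. This sketch assumes the usual coding d = (X - A)/w and the formula mean = A + w * (sum of f*d)/n. It reuses the class marks of Example 3.3; the assumed mean A = 7 and the scale w = 3.5 are hypothetical choices made only for illustration.

```python
import math

# Class marks and frequencies (as in Example 3.3)
class_marks = [3.5, 7, 10.5, 14]
freqs = [3, 2, 5, 4]
n = sum(freqs)

A = 7      # assumed (working) mean: hypothetical choice
w = 3.5    # scale factor: spacing between consecutive class marks

# Coded deviations d_i = (X_i - A) / w
coded = [(x - A) / w for x in class_marks]

# Indirect method: shift back by A and rescale by w
indirect_mean = A + w * sum(f * d for f, d in zip(freqs, coded)) / n

# Direct method, for comparison
direct_mean = sum(f * x for f, x in zip(freqs, class_marks)) / n

print(coded, indirect_mean, direct_mean)
```

Both methods give the same mean; coding only makes the hand arithmetic lighter.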

Activity 3.3

1) Suppose that the deviation of the observation from the assumed mean of 7 are

1, -1, -2, -2, 0, -3, -2, 2, 0,-3

a) Find the true mean.


b) Find the original observation

2) Find the mean of the marks obtained by 51 students with A=48.5 and w=10 of
xi 28.5, 38.5, 48.5, 58.5, 68.5

fi 4, 12, 15, 13, 7

Special properties of Arithmetic mean


1. The sum of the deviations of a set of items from their mean is always zero, i.e.
$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$.

2. The sum of the squared deviations of a set of items from their mean is the minimum, i.e.

$\sum_{i=1}^{n} (X_i - \bar{X})^2 \le \sum_{i=1}^{n} (X_i - A)^2$, for any constant A.

3. If $\bar{X}_1$ is the mean of $n_1$ observations, $\bar{X}_2$ is the mean of $n_2$ observations, etc., and
$\bar{X}_k$ is the mean of $n_k$ observations, then the mean of all the observations in all groups,
often called the combined mean, is given by:

$\bar{X}_c = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + \cdots + n_k}$

4. If a wrong figure has been used when calculating the mean, the correct mean can be
obtained without repeating the whole process using:

Correct mean = Wrong mean + (Correct value - Wrong value)/n,

where n is the total number of observations.

5. The effect of transforming the original series on the mean:

a) If a constant k is added to (subtracted from) every observation, then the new mean
will be the old mean ± k respectively, i.e. new mean = $\bar{X} \pm k$.
b) If every observation is multiplied by a constant k, then the new mean will be
k times the old mean, i.e. new mean = $k\bar{X}$.
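
Some of these properties can be verified numerically. The sketch below uses the five scores of Example 3.1 and assumes the standard forms of the properties (zero sum of deviations, combined mean as a size-weighted average of group means, and correction of a mean by the error divided by n). The split into two groups and the "wrongly recorded" value are hypothetical.

```python
import math
from statistics import mean

x = [5, 7, 7, 6, 8]
x_bar = mean(x)

# Property 1: deviations from the mean sum to zero (up to rounding error)
dev_sum = sum(xi - x_bar for xi in x)

# Property 3: combined mean of two groups (hypothetical split of the data)
g1, g2 = x[:2], x[2:]
combined = (len(g1) * mean(g1) + len(g2) * mean(g2)) / (len(g1) + len(g2))

# Property 4: suppose the last score 8 was wrongly recorded as 3 (hypothetical)
wrong = [5, 7, 7, 6, 3]
corrected = mean(wrong) + (8 - 3) / len(wrong)

# Property 5: adding k to, or multiplying by k, every observation
k = 10
shifted = mean(xi + k for xi in x)
scaled = mean(xi * k for xi in x)

print(x_bar, dev_sum, combined, corrected, shifted, scaled)
```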

2) Weighted Mean
When different data are to be given their proper importance, a weighted mean is
appropriate.
Weights are assigned to each item in proportion to its relative importance.
Let X1, X2, …, Xn be the values of the items of a series and W1, W2, …, Wn their corresponding
weights; then the weighted mean, denoted $\bar{X}_w$, is defined as:

$\bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$

Example 3.4

In 2002/03, the average salaries of elementary school teachers in three cities were Birr 24,000,
20,000, and 30,000. If there were 600, 400 and 800 elementary school teachers respectively, find the
weighted average salary of all the elementary school teachers in the three cities.

Solution.
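
A sketch of the computation, assuming the usual definition of the weighted mean (sum of weight times value, divided by the sum of the weights) with the number of teachers in each city as the weight. The variable names are hypothetical.

```python
# Average salaries (Birr) and number of teachers in the three cities
salaries = [24_000, 20_000, 30_000]
teachers = [600, 400, 800]   # used as weights

weighted_total = sum(w * x for w, x in zip(teachers, salaries))
weighted_mean = weighted_total / sum(teachers)

print(weighted_total, sum(teachers), round(weighted_mean, 2))
```

The weighted total is Birr 46,400,000 over 1,800 teachers, i.e. about Birr 25,777.78, which is higher than the unweighted average of the three city figures because the largest city also pays the most.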

Activity 3.4:
a) A student obtained the following percentage in an examination: English 60, Biology 75,
Mathematics 63, Physics 59, and chemistry 55. Find the student’s weighted arithmetic
mean if weights 1, 2, 1, 3, 3, respectively, are allotted to the subjects.

b) A teacher allots weights 2 to homework, 3 to mid exam and 5 to final exam. If a student
scores 90, 50, and 60 for HW, ME and FE, respectively, what is his/her average
academic performance?

Merits and Demerits of Arithmetic Mean

Merits:
• It is rigidly defined.
• It is based on all observations.
• It is suitable for further mathematical treatment.
• It is a stable average, i.e. it is not affected by fluctuations of sampling to some extent.
• It is easy to calculate and simple to understand.
Demerits:
• It is affected by extreme observations.
• It can not be used in the case of open end classes.
• It can not be determined by the method of inspection.
• It can not be used when dealing with qualitative characteristics, such as intelligence,
honesty, beauty.
• It can be a number which does not exist in a series of data.
• Some times it leads to wrong conclusion if the details of the data from which it is
obtained are not available.
• It gives high weight to high extreme values and less weight to low extreme values.

3.4.2 Geometric Mean (G.M)


The geometric mean is important for a particular type of data, for which it gives a good
mean value: if the variate values are measured as ratios, proportions or
percentages, the geometric mean gives a better measure of central tendency than the other means.

The G.M of N variate values is the Nth root of their product.

Like the arithmetic mean, it depends on all observations. It is affected by extreme values,
but not to the extent of the arithmetic mean. However, it has one great drawback: it
cannot be calculated if one or more values are zero or negative.
Suppose that X1, X2, …, XN are N variate values; then their G.M is given as

$G.M = \sqrt[N]{X_1 \cdot X_2 \cdots X_N}$

In case X1, X2, …, Xk have the corresponding frequencies f1, f2, …, fk, then

$G.M = \sqrt[N]{X_1^{f_1} \cdot X_2^{f_2} \cdots X_k^{f_k}}$, where $N = \sum_{i=1}^{k} f_i$.

Example 3.5
Calculate the geometric mean for the following.

2, 3, 4, 6
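
A sketch of the calculation, using the definition of the G.M as the Nth root of the product and, as a cross-check, the average of the logarithms. The variable names are hypothetical.

```python
import math

values = [2, 3, 4, 6]
n = len(values)

# Direct definition: Nth root of the product (2*3*4*6 = 144)
gm_root = math.prod(values) ** (1 / n)

# Logarithmic form: antilog of the mean of the log values
gm_log = 10 ** (sum(math.log10(v) for v in values) / n)

print(round(gm_root, 4), round(gm_log, 4))
```

Both routes give a geometric mean of about 3.46, the fourth root of 144.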

In the case of grouped data, the mid-values of the class intervals are taken as Xi.
Using logarithms, the logarithm of the G.M is the average of the log Xi values, and the formula for the
geometric mean becomes:

$\log G.M = \frac{1}{N}\sum_{i=1}^{N} \log X_i$, for i = 1, 2, …, N.

In the case of a frequency distribution where each Xi occurs fi times (i = 1, 2, …, k), we have:

$\log G.M = \frac{1}{N}\sum_{i=1}^{k} f_i \log X_i$, where $N = \sum_{i=1}^{k} f_i$ for i = 1, 2, …, k. Then, taking the antilog of both
sides, we obtain the G.M.

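When the geometric mean is used to find an average rate of change over a period, it is given by

$G.M = \sqrt[n]{\frac{\text{value at the end of the period}}{\text{value at the beginning of the period}}} - 1$

where n is the length of the period.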

Example 3.6

The population of a country in 1980 was 2 million and in 1990 it was 22 million. What was
the average annual increase during this period?

Here n = 11 years, and

Note: The geometric mean is less affected by extreme values than the arithmetic mean and is
useful as a measure of central tendency for some positively skewed distributions.

3.4.3 Harmonic Mean (H.M)

The H.M is the reciprocal of the arithmetic mean of the reciprocals of the observations of a
set. It is a suitable measure of central tendency when the data pertain to speeds, rates and
times.

Let X1, X2, …, XN be N variate values in a set; then the harmonic mean is given by:

$H.M = \frac{N}{\sum_{i=1}^{N} \frac{1}{X_i}}$, for i = 1, 2, …, N.

Example 3.7

Find the harmonic mean of the following data: 2, 1, 4, 3.
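
A sketch of the calculation, using the definition of the H.M as the number of values divided by the sum of their reciprocals. The variable names are hypothetical.

```python
values = [2, 1, 4, 3]

# Sum of reciprocals: 1/2 + 1/1 + 1/4 + 1/3 = 25/12
reciprocal_sum = sum(1 / v for v in values)

# Harmonic mean: N divided by the sum of reciprocals = 4 / (25/12) = 48/25
hm = len(values) / reciprocal_sum

print(round(hm, 2))
```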

Example 3.8

If a car driver covered the first 10 km at a speed of 40 km/h and the next 10 km at a speed of 60 km/h,
what is the average speed of the car driver over the 20 km?

Average Speed
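
The average speed is the total distance divided by the total time, which for equal distances coincides with the harmonic mean of the two speeds. A worked version, as a sketch using only the figures given in Example 3.8:

```latex
\text{Average speed}
  = \frac{\text{total distance}}{\text{total time}}
  = \frac{10 + 10}{\frac{10}{40} + \frac{10}{60}}
  = \frac{20}{\frac{5}{12}}
  = 48 \text{ km/h}
  = \frac{2}{\frac{1}{40} + \frac{1}{60}}
  \quad (\text{the H.M of } 40 \text{ and } 60)
```

Note that the arithmetic mean of the two speeds, 50 km/h, would overstate the average, because more time is spent travelling at the lower speed.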

If the data are arranged in the form of a frequency distribution in which an observation Xi has
frequency fi (i = 1, 2, …, k), the harmonic mean is given by

$H.M = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{X_i}}$, where $n = \sum_{i=1}^{k} f_i$ for i = 1, 2, …, k.

It fulfils almost all the properties of a good measure of central tendency, except that it
cannot be calculated when any observation is zero. Its main advantage is that it gives more
weight to small values and less weight to large values.

Relationship between AM, GM and HM


Given two positive values $x_1$ and $x_2$, there is a relationship between the HM, GM and AM.

This relationship exists in two cases:

Case 1: if $x_1 = x_2$, then G.M = HM = AM.

Case 2: if $x_1 \neq x_2$, then AM > GM > HM.

Combining the two relationships, we find that:

$AM \ge GM \ge HM$

Another relationship:

$GM = \sqrt{AM \times HM}$
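
The standard relationships for two positive values, AM ≥ GM ≥ HM and GM² = AM × HM, can be checked numerically. This is a sketch; the two values 4 and 9 are hypothetical and chosen only because their geometric mean is a whole number.

```python
import math

x1, x2 = 4, 9   # hypothetical positive values

am = (x1 + x2) / 2            # arithmetic mean
gm = math.sqrt(x1 * x2)       # geometric mean of two values
hm = 2 / (1 / x1 + 1 / x2)    # harmonic mean of two values

print(am, gm, round(hm, 4))
print(am >= gm >= hm, math.isclose(gm ** 2, am * hm))
```

When the two values are equal, all three means coincide.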

3.4.4 The Mode
The mode is the value which occurs most frequently in a set of values, provided it occurs more
than once.
- The mode may not exist, and even if it does exist, it may not be unique.
- In the case of a discrete distribution, the value having the maximum frequency is the modal
value.
- If, in a set of observed values, all values occur once or an equal number of times, then there is
no mode.

Example 3.9
a) Find the mode of 5, 3, 5, 8, 9.
Mode = 5
b) Find the mode of 8, 9, 9, 7, 8, 2, and 5.
The data are bimodal: the modes are 8 and 9.
c) Find the mode of 4, 12, 3, 6, and 7.
There is no mode for these data.
The mode of a set of numbers X1, X2, …, Xn is usually denoted by $\hat{X}$.

Mode for Grouped data

If data are given in the shape of a continuous frequency distribution, the mode is defined as:

$\hat{X} = L_{mod} + \left(\frac{f_{mo} - f_1}{(f_{mo} - f_1) + (f_{mo} - f_2)}\right) W$

Where: $\hat{X}$ = the mode of the distribution

Lmod = the lower class boundary of the modal class

fmo = frequency of the modal class

f1 = frequency of the class preceding the modal class

f2 = frequency of the class succeeding the modal class

W = the size of the modal class

Note: The modal class is the class with the highest frequency.

Example 3.10

Find the mode for the frequency distribution given below.

Class interval   Frequency
3-6              4
6-9              8
9-12             10
12-15            3
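
A sketch of the computation, assuming the usual interpolation formula for the grouped mode: the lower boundary of the modal class plus the fraction (fmo - f1) / ((fmo - f1) + (fmo - f2)) of the class size. The function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch. It also assumes a single modal class.

```python
def grouped_mode(boundaries, freqs):
    """Interpolated mode for a continuous frequency distribution."""
    i = freqs.index(max(freqs))                      # position of the modal class
    f_mo = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0                # preceding class (0 if none: assumption)
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0   # succeeding class (0 if none: assumption)
    lower, upper = boundaries[i]
    w = upper - lower                                # size of the modal class
    return lower + (f_mo - f1) / ((f_mo - f1) + (f_mo - f2)) * w

# Data of Example 3.10
classes = [(3, 6), (6, 9), (9, 12), (12, 15)]
freqs = [4, 8, 10, 3]
mode = grouped_mode(classes, freqs)

print(round(mode, 2))
```

Here the modal class is 9-12, so the mode is 9 + (2 / (2 + 7)) × 3, which is about 9.67.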

Activity 3.5
The following is the distribution of the size of certain farms selected at random from a
district. Calculate the mode of the distribution.

Size of farms No. of farms
5- 15 8
15- 25 12
25- 35 17
35- 45 29
45- 55 31
55- 65 5
65- 75 3

Merits and Demerits of Mode
Merits:
• It is not affected by extreme observations.
• Easy to calculate and simple to understand.
• It can be calculated for distributions with open-end classes.
• It can be used for qualitative data as well.
Demerits:
• It is not rigidly defined.
• It is not based on all observations
• It is not suitable for further mathematical treatment.
• It is not stable average, i.e. it is affected by fluctuations of sampling to some extent.
• Often its value is not unique.

3.4.5 The Median and other quantiles (quartiles, deciles, percentiles)

In a distribution, the median is the value of the variable which divides the data into two equal
halves.
In an ordered series of data, the median is the observation lying exactly in the middle of the
series. It is the middlemost value in the sense that the number of values less than the median
is equal to the number of values greater than it.
Let X1, X2, …, Xn be the observations; then the numbers arranged in ascending order will be
X[1], X[2], …, X[n], where X[i] is the ith smallest value.
Here, we find that X[1] ≤ X[2] ≤ … ≤ X[n].
The median is denoted by $\tilde{X}$.

Median for ungrouped data

Example 3.11
Find the median of the following data.

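a) 3, 8, 4, 7, 7, 5, 6, 8, 7, 4, 6, 8, 9, 7, 6

Arrange the given data in either increasing or decreasing order:

3, 4, 4, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9

There are n = 15 observations, so the median is the 8th ordered value: Median = 7

b) 3, 4, 4, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8

There are n = 14 observations, so the median is the average of the 7th and 8th ordered
values: Median = (6 + 7)/2 = 6.5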
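
The rule for ungrouped data (the middle value when n is odd, the average of the two middle values when n is even) can be sketched as follows. The function name `median` is written out here for illustration; Python's standard library offers `statistics.median`, which gives the same results.

```python
def median(values):
    """Median of ungrouped data: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values

# Data of Example 3.11
data_a = [3, 8, 4, 7, 7, 5, 6, 8, 7, 4, 6, 8, 9, 7, 6]
data_b = [3, 4, 4, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8]

median_a = median(data_a)
median_b = median(data_b)
print(median_a, median_b)
```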

Activity 3.6
a) The actual waiting times for the first job for a selected sample of nine people with different
fields of specialization are given below.
Waiting time (in months): 11.6, 11.3, 10.7, 18.0, 3.3, 9.2, 8.3, 3.8, 6.8
Calculate the median of the waiting times.
b) The exports of agricultural products, in million dollars, from a country during the eight quarters
of 1974 and 1975 were 29.7, 16.6, 2.3, 14.1, 36.6, 18.7, 3.5, 21.3.
Find the median of the given set of values.

Median for grouped data.


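If data are given in the form of a continuous frequency distribution, the median is defined as:

$\tilde{X} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right)$

Where: $L_{med}$ = lower class boundary of the median class.
$w$ = the size (width) of the median class.
$n$ = total number of observations.
$c$ = the cumulative frequency (less than type) of the class preceding the median class.
$f_{med}$ = the frequency of the median class.
Note: The median class is the class with the smallest cumulative frequency (less than type)
greater than or equal to n/2.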

Example 3.12

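Find the median wage of the following distribution.

Wages (in Rs)     2000-3000   3000-4000   4000-5000   5000-6000   6000-7000
No. of workers        3           5          20          10           5

Solution:

Wages (in Rs)    No. of workers    Cf
2000-3000              3            3
3000-4000              5            8
4000-5000             20           28
5000-6000             10           38
6000-7000              5           43

Here N/2 = 43/2 = 21.5.

The first cf ≥ 21.5 is 28 and the corresponding class is 4,000-5,000, so the median class is
4,000-5,000, with lower class boundary 4000, class width 1000, frequency 20 and preceding
cumulative frequency 8. Hence

Median $= 4000 + \frac{1000}{20}(21.5 - 8) = 4000 + 675 = 4675$

so the median wage is Rs 4,675.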
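The grouped-data calculation of Example 3.12 can be sketched in Python as follows. The function `grouped_median` and its argument names are illustrative assumptions, not notation from the module; equal class widths are assumed.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median of a continuous frequency distribution:
    lower boundary + (width / frequency) * (n/2 - preceding cumulative frequency)."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= half:      # first class whose cf reaches n/2
            return lower + (width / f) * (half - cumulative)
        cumulative += f
    raise ValueError("median class not found")

lowers = [2000, 3000, 4000, 5000, 6000]
workers = [3, 5, 20, 10, 5]
print(grouped_median(lowers, 1000, workers))  # 4675.0
```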



Activity 3.7
Find the median of the following distribution.

Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3

Merits and Demerits of Median


Merits:
• Median is a positional average and hence not influenced by extreme observations.
• Can be calculated in the case of open end intervals.
• Median can be located even if the data are incomplete.
Demerits:
• It is not a good representative of data if the number of items is small.
• It is not amenable to further algebraic treatment.
• It is susceptible to sampling fluctuation (likely to be affected by sampling fluctuation).

Empirical relationship between mean, median and mode

Mean = Median = Mode, for symmetrical distribution

Mean − Mode = 3(Mean − Median), for unimodal moderately skewed or asymmetrical frequency distribution

QUARTILES, DECILES AND PERCENTILES

Quartiles



Quartiles are values which divide the ordered data into 4 equal parts. Hence there are three
quartiles: Q1, Q2 and Q3.

• The first quartile Q1 is the value below which the first quarter of the ordered data lies.
• The second quartile Q2 is the value that divides the given ordered data into two
equal parts.
• The third quartile Q3 is the value below which three quarters of the ordered data lie.

The median is the 2nd quartile. The first quartile (Q1) is the value of the item which divides
the lower half of the distribution into two equal parts. The third quartile (Q3) is the value of
the item that divides the upper half of the distribution into two equal parts. That is, $Q_i$ is
the value of the $\left(\frac{i(n+1)}{4}\right)$th item in the series.

For raw (ungrouped) data, first arrange the n observations in increasing order of
magnitude. Then the ith quartile is given by

$Q_i$ = value of the $\left(\frac{i(n+1)}{4}\right)$th ordered data, i = 1, 2, 3

In dividing i(n+1) by 4 there may be a remainder. Let q be the quotient and r be the
remainder of the division; then

$Q_i = X_{[q]} + \frac{r}{4}\left(X_{[q+1]} - X_{[q]}\right)$

Example 3.13

Find the first, the second and third quartile for the following data. (exam result 10%) of
15 students 4,8,9,7,6,6,6,7,7,8,8,8,9,9,



Example 3.14

The following are yields of barley from 14 plots

30,32,35,38,40,42,48,49,52,55,58,60,62, and 65 . Find the 1st and 3rd quartiles.
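As a hedged illustration of the quotient and remainder rule for raw data, the sketch below applies it to the 14 barley yields of Example 3.14. The helper `quantile_raw` is a name we introduce here; `parts` is 4 for quartiles, 10 for deciles and 100 for percentiles.

```python
def quantile_raw(values, i, parts):
    """ith quantile of raw data using the i(n+1)/parts position rule:
    X[q] + (r/parts) * (X[q+1] - X[q]), with q the quotient and r the remainder."""
    ordered = sorted(values)
    n = len(ordered)
    q, r = divmod(i * (n + 1), parts)
    if q < 1:          # position falls before the first observation
        return ordered[0]
    if q >= n:         # position falls at or beyond the last observation
        return ordered[-1]
    lower, upper = ordered[q - 1], ordered[q]   # X[q] and X[q+1] (1-based in the text)
    return lower + (r / parts) * (upper - lower)

barley = [30, 32, 35, 38, 40, 42, 48, 49, 52, 55, 58, 60, 62, 65]
print(quantile_raw(barley, 1, 4))  # 37.25
print(quantile_raw(barley, 3, 4))  # 58.5
```

For Q1 the position is 15/4, so q = 3 and r = 3, giving 35 + (3/4)(38 − 35) = 37.25; for Q3 the position is 45/4, so q = 11 and r = 1, giving 58 + (1/4)(60 − 58) = 58.5.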

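The ith quartile for a grouped frequency distribution is given by

$Q_i = L_{Q_i} + \frac{w}{f_{Q_i}}\left(\frac{i(N+1)}{4} - F_{pQ_i}\right)$

Where $Q_i$ = the ith quartile

$L_{Q_i}$ = the lower class boundary of the class in which the ith quartile is located

$F_{pQ_i}$ = the cumulative frequency of the class immediately preceding the class containing $Q_i$

$f_{Q_i}$ = the frequency of the class containing $Q_i$

$w$ = width of the class containing $Q_i$

$N$ = sample size

The class containing $Q_i$ is the class with the smallest cumulative frequency (less than type)
greater than or equal to $i(N+1)/4$.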

Example 3.15

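Calculate the three quartiles for the following data.

Marks     No. of students (f)    Less than cf
0-10              6                   6
10-20             5                  11
20-30             8                  19
30-40            15                  34
40-50             7                  41
50-60             6                  47
60-70             3                  50
Total            50

Here N = 50.

For Q1, (N+1)/4 = 12.75. The first c.f ≥ 12.75 is 19, so the corresponding class containing Q1 is 20-30, and

$Q_1 = 20 + \frac{10}{8}(12.75 - 11) \approx 22.19$

For Q2 (the median), N/2 = 25. The first c.f ≥ 25 is 34, so the corresponding class containing Q2 is 30-40, and

$Q_2 = 30 + \frac{10}{15}(25 - 19) = 34$

For Q3, 3(N+1)/4 = 38.25. The first c.f ≥ 38.25 is 41, so the corresponding class containing Q3 is 40-50, and

$Q_3 = 40 + \frac{10}{7}(38.25 - 34) \approx 46.07$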

Example 3.16

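Find the 1st and 3rd quartiles for the following data.

Marks              0-10   10-20   20-30   30-40   40-50   50-60   60-70
No. of students      6      5       8      15       7       6       3
< c.f                6     11      19      34      41      47      50

Here N = 50 and (N+1)/4 = 12.75. The first c.f ≥ 12.75 is 19, so the corresponding class
containing Q1 is 20-30, and Q1 = 20 + (10/8)(12.75 − 11) ≈ 22.19.

Similarly 3(N+1)/4 = 38.25. The first c.f ≥ 38.25 is 41, so the corresponding class containing
Q3 is 40-50, and Q3 = 40 + (10/7)(38.25 − 34) ≈ 46.07.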
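The interpolation used for grouped quartiles can be sketched as below. Because texts differ on whether the position is taken as iN/4 or i(N+1)/4, this illustrative function (our own naming) takes the position as an explicit argument, so either convention can be tried on the marks data of Examples 3.15 and 3.16.

```python
def grouped_quantile(lower_boundaries, width, freqs, position):
    """Interpolate a quantile inside the first class whose cumulative frequency
    reaches `position`: L + (w / f) * (position - preceding cumulative frequency)."""
    cumulative = 0
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= position:
            return lower + (width / f) * (position - cumulative)
        cumulative += f
    raise ValueError("position exceeds the total frequency")

lowers = [0, 10, 20, 30, 40, 50, 60]
students = [6, 5, 8, 15, 7, 6, 3]
n = sum(students)  # 50

q1 = grouped_quantile(lowers, 10, students, (n + 1) / 4)      # position 12.75
q2 = grouped_quantile(lowers, 10, students, n / 2)            # median rule, position 25
q3 = grouped_quantile(lowers, 10, students, 3 * (n + 1) / 4)  # position 38.25
print(round(q1, 2), round(q2, 2), round(q3, 2))
```

Passing `n / 4` and `3 * n / 4` instead gives the values under the iN/4 convention, which differ only slightly for a sample of this size.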

DECILES

The values that divide the data set into ten equal parts are called deciles. They are denoted
by D1, D2, …, D9 respectively.

For raw (ungrouped) data, first arrange the n observations in increasing order of magnitude.
Then the ith decile is given by

$D_i$ = value of the $\left(\frac{i(n+1)}{10}\right)$th ordered data, i = 1, 2, …, 9

In dividing i(n+1) by 10 there may be a remainder. Let q be the quotient and r be the
remainder of the division; then

$D_i = X_{[q]} + \frac{r}{10}\left(X_{[q+1]} - X_{[q]}\right)$

The ith decile for a grouped frequency distribution is given by

$D_i = L_{D_i} + \frac{w}{f_{D_i}}\left(\frac{i(N+1)}{10} - F_{pD_i}\right)$

Where $D_i$ = the ith decile
- $L_{D_i}$ = the lower class boundary of the class in which the ith decile is located
- $F_{pD_i}$ = the cumulative frequency of the class immediately preceding the class containing $D_i$
- $f_{D_i}$ = the frequency of the class containing $D_i$
- $w$ = width of the class containing $D_i$
- $N$ = sample size

PERCENTILES
The values that divide our data set into a hundred equal parts are called percentiles. They are
denoted by P1, P2, …, P99.
For raw (ungrouped) data, first arrange the n observations in increasing order of magnitude.
Then the ith percentile is given by

$P_i$ = value of the $\left(\frac{i(n+1)}{100}\right)$th ordered data, i = 1, 2, …, 99

In dividing i(n+1) by 100 there may be a remainder. Let q be the quotient and r be the
remainder of the division; then

$P_i = X_{[q]} + \frac{r}{100}\left(X_{[q+1]} - X_{[q]}\right)$

The ith percentile for a grouped frequency distribution is given by

$P_i = L_{P_i} + \frac{w}{f_{P_i}}\left(\frac{i(N+1)}{100} - F_{pP_i}\right)$

Where $P_i$ = the ith percentile
- $L_{P_i}$ = the lower class boundary of the class in which the ith percentile is located
- $F_{pP_i}$ = the cumulative frequency of the class immediately preceding the class containing $P_i$
- $f_{P_i}$ = the frequency of the class containing $P_i$
- $w$ = width of the class containing $P_i$
- $N$ = sample size

Example 3.17

Calculate i) the 7th decile, and ii) the 90th percentile from the following table.

Monthly per capita
expenditure classes    No. of families    C.f
140-150                      17            17
150-160                      29            46
160-170                      42            88
170-180                      72           160
180-190                      84           244
190-200                     107           351
200-210                      49           400
210-220                      34           434
220-230                      31           465
230-240                      16           481
240-250                      12           493

Solution:

i) Here N = 493 and 7(N+1)/10 = 7(494)/10 = 345.8.

The number 345.8 is first contained in the cumulative frequency 351; hence the class
190-200 is the 7th decile class.

Then $D_7 = 190 + \frac{10}{107}(345.8 - 244) \approx 199.5$.

ii) 90(N+1)/100 = 90(494)/100 = 444.60.

The number 444.60 is first contained in the cumulative frequency 465; hence the 90th percentile class is
220-230. So that, we have:

$P_{90} = 220 + \frac{10}{31}(444.60 - 434) \approx 223.4$



CHAPTER SUMMARY

• Measures of central tendency are those statistical methods used to find the
values used to represent sets of values in a data set.
• Arithmetic mean is the sum of all the values in a data set divided by the total number of
observations.
• Median is the middle value after the observations are arranged in the order of their
magnitude.
• Mode is the value that occurs with the highest frequency in a data set.
• Harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the numbers.
• Geometric mean is the nth root of the product of n numbers.
• Quartiles are the values that divide a given data set into four equal parts.
• Deciles are the values that divide a given data set into ten equal parts.
• Percentiles are the values that divide a given data set into a hundred equal parts.
• Different measures of central tendency have different properties and applications; we
are, therefore, required to apply them in their appropriate places.



Exercises on Chapter 3

1. The arithmetic mean of two numbers is 13 and their geometric mean is 12.

Find a) the numbers b) the HM.

2. The following table shows the distribution of marks of 100 students in a certain exam
out of 50. The median and mode are given to be 25 and 24 respectively. Calculate the
missing frequencies and then arithmetic mean of the data..

Marks 0-10 10-20 20-30 30-40 40-50

Number of students 14 ? 27 ? 15

3. The mean weight of 150 students in a certain class is 60 kg. The mean weight of boys
is 70 kg and that of girls is 55 kg. Find the number of boys and girls in the class.

4. The ratios of teachers to students in four colleges are 1:8, 2: 15, 1:10 and 2:21. Find
the average ratio of teachers to students.

5. An entrance exam for a job consists of three subjects, English, Mathematics and Office
management, having weights of 20%, 30% and 50% respectively. Find the average score of a
candidate who got 70%, 60% and 50%, respectively, in the three exams.

6. In a class there are 30 females and 70 males. If the females averaged 60 in an examination
and the males averaged 72, find the mean for the entire class.

7. The average weight of 10 students was calculated to be 65 kg. Later it was discovered that
one weight was misread as 40 kg instead of 80 kg. Calculate the correct average weight.



8. The mean of n Tetracycline capsules X1, X2, …, Xn is known to be 12 gm. A new set
of capsules of another drug is obtained by the linear transformation Yi = 2Xi – 0.5
(i = 1, 2, …, n). What will be the mean of the new set of capsules?
9. The mean of a set of numbers is 500.
a. If 10 is added to each of the numbers in the set, then what will be
the mean of the new set?
b. If each of the numbers in the set are multiplied by -5, then what will
be the mean of the new set?
10. Ato Ayele spent Birr 100 on each of the following two items: A and B. If the prices
of the items are Birr 30 and Birr 10 per kg respectively, find the average price of the
items per kg.

11. In a surveying class there are 10 freshman, 6 second year and 12 third year students.
If the freshman averaged 70 in an examination, the second years averaged 75 and the
third years averaged 85. Find the mean grade for the entire class.

12. The profit of a company increased by 25% during the year 1992, increased by 40%
during the year 1993, decreased by 20% in the year 1994 and increased by 10%
during the year 1995. Find the average growth in the profit level over the four year
periods.

13. In a 400- meter athletic competition a participant covers the distance as given below.
Find the average speed.

Speed (Meters per second)

First 80 meters 10

Next 240 meters 7.5

Last 80 meters 10



14. Following is the distribution of marks obtained by 500 students in statistics.

Marks more than 0 10 20 30 40 50

Number of students 500 460 400 200 100 30

a) Calculate the most suitable average


b) Obtain Q 1, Q3, D2, D7, P28, P40 and P80 and interpret the results.
15. Suppose the price of an item in a certain shop is presented as shown below:

Price Number of items


10-19 27
21-29 A
31-39 28
41-49 B
51-59 19
Total N
If 75% of the items is less than or equal to 45 and most of the items have a price of 34,
and then find the missing frequencies.

16. The marks secured out of 100 by a group of students in a school are

given below.

Marks Number of students


Below10 15
Below 20 35
Below 30 60
Below 40 84
Below 50 106
Below 60 120
Below 70 125
Determine the median and modal marks and interpret the results.



CHAPTER 4
Measures of Dispersion (Variation)

CONTENTS
4.1 INTRODUCTION AND OBJECTIVES OF MEASURING VARIATION
4.2 ABSOLUTE AND RELATIVE MEASURES
4.3 TYPES OF MEASURES OF VARIATION
4.4 MOMENTS, SKEWNESS AND KURTOSIS

INTRODUCTION
In our society, people usually elect a representative who conveys the interests of most of
them. But sometimes the representative may convey interests that deviate from the
interests of some of the members. That is, the question is “how well does their representative
represent them?” Similarly, in statistics, we may seek to know how well an average
represents the whole set of data.

Objectives

At the end of this chapter the student is expected to be able to:

• Explain the meaning and uses of the measures of dispersion
• Decide the appropriate measure of dispersion for a purpose
• Compute the range, quartile deviation, interquartile range and mean deviation, and
decide which one is the best measure of dispersion
• Compute and interpret the variance and standard deviation
• Compute the coefficient of variation and standard score, and interpret the results of the
above and other measures of dispersion

4.1 INTRODUCTION AND OBJECTIVES OF MEASURING VARIATION

A measure of central tendency alone does not adequately describe a set of observations unless
all observations are the same. So we need some additional information, such as:
1) The extent to which the items in a particular distribution are scattered around the central
tendency, i.e. a measure of dispersion.
2) The direction of scatteredness, whether more items are attracted towards higher or lower
values, i.e. a measure of skewness.
3) The extent to which the distribution is more peaked or more flat-topped than the normal
distribution, i.e. a measure of kurtosis.

Measure of dispersion
The scatter or spread of items of a distribution is known as dispersion or variation. In other
words the degree to which numerical data tend to spread about an average value is called
dispersion or variation of the data.
Measures of dispersion are statistical measures which provide ways of measuring the extent
to which data are dispersed or spread out.

Objectives of measuring Variation:


• To judge the reliability of measures of central tendency
• To control variability itself.
• To compare two or more groups of numbers in terms of their variability.
• To make further statistical analysis.

Desirable properties of measure of dispersion



The desirable properties for statistical average also apply to a good measure of
dispersion.



4.2 ABSOLUTE AND RELATIVE MEASURES

The measures of dispersion which are expressed in terms of the original unit of a series are
termed as absolute measures. Such measures are not suitable for comparing the variability of
two distributions which are expressed in different units of measurement and different
average size.
Relative measures of dispersions are a ratio or percentage of a measure of absolute
dispersion to an appropriate measure of central tendency and are thus pure numbers
independent of the units of measurement. For comparing the variability of two distributions
(even if they are measured in the same unit), we compute the relative measure of dispersion
instead of absolute measures of dispersion.

4.3 TYPES OF MEASURES OF VARIATION

Various measures of dispersions are in use. The most commonly used measures of
dispersions are:

Absolute measures        Relative measures
Range                    Relative range
Quartile deviation       Coefficient of quartile deviation
Mean deviation           Coefficient of mean deviation
Variance                 Coefficient of variation
Standard deviation       Standard scores

4.3.1 The Range (R) and Relative Range (RR)

The range is the largest score minus the smallest score. It is a quick and dirty measure of
variability, although when a test is given back to students they very often wish to know the
range of scores. Because the range is greatly affected by extreme scores, it may give a distorted
picture of the scores. The following two distributions have the same range, 13, yet appear to
differ greatly in the amount of variability.
Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
For this reason, among others, the range is not the most important measure of variability.
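A quick numerical check makes the point concrete. The sketch below computes the range of both distributions and, purely as an illustration, the range that remains after the single smallest and largest scores are set aside; `trimmed_range` is a helper we introduce here, not a measure defined in this module.

```python
distribution_1 = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
distribution_2 = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]

def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

def trimmed_range(values):
    """Range after dropping one smallest and one largest value (illustrative only)."""
    ordered = sorted(values)
    return ordered[-2] - ordered[1]

print(value_range(distribution_1), value_range(distribution_2))      # 13 13
print(trimmed_range(distribution_1), trimmed_range(distribution_2))  # 8 3
```

Both ranges equal 13, but once the extreme scores are removed the second distribution is clearly far more tightly packed, which the range alone cannot show.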

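For ungrouped data: $R = X_{max} - X_{min}$, where $X_{max}$ is the maximum observation and $X_{min}$ is the
minimum observation.

For grouped data: $R = UCL_{last} - LCL_{first}$, where $UCL_{last}$ is the last upper class limit and
$LCL_{first}$ is the first lower class limit.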

Merits and Demerits of range

Merits:
• It is rigidly defined.
• It is easy to calculate and simple to understand.
Demerits:
• It is not based on all observation.
• It is highly affected by extreme observations.
• It is affected by fluctuation in sampling.
• It is not liable to further algebraic treatment.
• It can not be computed in the case of open end distribution.
• It is very sensitive to the size of the sample.

Relative Range (RR)


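It is also sometimes called the coefficient of range and is given by:

For ungrouped data: $RR = \frac{X_{max} - X_{min}}{X_{max} + X_{min}}$, where $X_{max}$ and $X_{min}$ are the maximum and minimum observations.

For grouped data: $RR = \frac{UCL_{last} - LCL_{first}}{UCL_{last} + LCL_{first}}$, where $UCL_{last}$ is the last upper class limit and $LCL_{first}$ is the first lower class limit.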

Activity 4.1

1) Find the R and RR and then identify which data set is more dispersed.

a) For the monthly income of 10 workers Xi: 347, 420, 500, 600, 696, 710, 835, 850, and
900.
b) For the following age distribution.
Class frequency
6- 10 35
11- 15 23
16- 20 15
21- 25 12
26- 30 9
31- 35 6

2. If the range and relative range of a series are 4 and 0.25 respectively. Then what is the value
of:
a) Smallest observation
b) Largest observation

4.3.2 The Quartile Deviation (QD)& Coefficient of Quartile Deviation (COQD)

The interquartile range (IQR) is the difference between the upper quartile (Q3) and the lower quartile (Q1) of a given
group, i.e. IQR = Q3 − Q1. It is a suitable measure of dispersion when the data contain extreme values. It is also a good
measure of dispersion for a distribution having open-ended classes.

Example 4.1

If Q1 = 12 and Q3 = 45, then IQR = 45 − 12 = 33.



Quartile deviation and coefficient of Q.D.

Quartile Deviation is half of the IQR, i.e.

Quartile Deviation $= \frac{Q_3 - Q_1}{2}$.
It is an absolute measure of dispersion.
To compare the variability of two series, a relative measure known as the coefficient of quartile
deviation is used, which is symbolically expressed as:

Coefficient of Q.D. $= \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$

4.3.3 The Mean (Average) Deviation and Coefficient of Mean Deviation

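If $x_i$ with frequency $f_i$, i = 1, 2, …, n, is the frequency distribution, then the mean deviation from the
mean is given by

Mean deviation from mean $= \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - \bar{X}|$, where $N = \sum f_i$

Where $|x_i - \bar{X}|$ represents the modulus or absolute value of the deviation $x_i - \bar{X}$, where the
negative sign is ignored.

Mean deviation from median $= \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - \tilde{X}|$, where $\tilde{X}$ is the median

Since the mean deviation is based on all the observations, it is a better measure of dispersion than the
range or quartile deviation.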

Example 4.2

Calculate i) the quartile deviation (Q.D), and ii) the mean deviation (M.D) from the mean and from the
median, for the following data:

Marks:   0-10   10-20   20-30   30-40   40-50   50-60   60-70
Freq.      6      5       8      15       7       6       3

Solution:

Marks    MV(x)    f    |x−md|   d=(x−35)/10   f|x−md|    fd    |x−mean|   f|x−mean|
0-10       5      6      29         -3          174      -18     28.4       170.4
10-20     15      5      19         -2           95      -10     18.4        92.0
20-30     25      8       9         -1           72       -8      8.4        67.2
30-40     35     15       1          0           15        0      1.6        24.0
40-50     45      7      11          1           77        7     11.6        81.2
50-60     55      6      21          2          126       12     21.6       129.6
60-70     65      3      31          3           93        9     31.6        94.8
Total            50                             652       -8                659.2

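i) Here N = 50, (N+1)/4 = 12.75 and 3(N+1)/4 = 38.25, so Q1 lies in the class 20-30 and Q3 lies in the class 40-50:

$Q_1 = 20 + \frac{10}{8}(12.75 - 11) \approx 22.19$, $Q_3 = 40 + \frac{10}{7}(38.25 - 34) \approx 46.07$

Q.D $= \frac{Q_3 - Q_1}{2} = \frac{46.07 - 22.19}{2} \approx 11.94$

ii) M.D (from mean) $= \frac{\sum f\,|x - \text{mean}|}{N}$

Mean $= 35 + 10 \times \frac{-8}{50} = 33.4$ marks

M.D (from mean) $= \frac{659.2}{50} = 13.184$

Median $= 30 + \frac{10}{15}(25 - 19) = 34$

M.D (from median) $= \frac{652}{50} = 13.04$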
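The column totals of Example 4.2 can be verified with a short script. This is a sketch only: `mean_deviation` is our own helper name, the class mid-values are used to represent each class, and the median of 34 is taken from the grouped median formula rather than recomputed.

```python
mid_values = [5, 15, 25, 35, 45, 55, 65]
freqs = [6, 5, 8, 15, 7, 6, 3]

def mean_deviation(midpoints, frequencies, centre):
    """Mean deviation of a frequency distribution about `centre`:
    sum of f * |x - centre| divided by the total frequency."""
    total = sum(frequencies)
    return sum(f * abs(x - centre) for x, f in zip(midpoints, frequencies)) / total

n = sum(freqs)
mean = sum(f * x for x, f in zip(mid_values, freqs)) / n   # 33.4
median = 34                                                # from the grouped median formula

md_from_mean = mean_deviation(mid_values, freqs, mean)
md_from_median = mean_deviation(mid_values, freqs, median)
print(round(mean, 1), round(md_from_mean, 3), round(md_from_median, 2))
```

The mean deviation about the median (13.04) comes out slightly smaller than the one about the mean (13.184), as expected, since the sum of absolute deviations is smallest when taken about the median.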



4.3.4 The Variance, the Standard deviation and the coefficient of Variation

Population Variance

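If we divide the variation (the sum of the squared deviations from the mean) by the number of
values in the population, we get something called the population variance. This variance is the
"average squared deviation from the mean".
For ungrouped data:

Population variance $= \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$

For grouped data:

Population variance $= \sigma^2 = \frac{1}{N}\sum_{i=1}^{k} f_i\,(X_i - \mu)^2$, where $X_i$ is the class mark of the ith class and $N = \sum f_i$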

Sample Variance
One would expect the sample variance to simply be the population variance with the
population mean replaced by the sample mean. However, one of the major uses of a statistic
is to estimate the corresponding parameter, and this formula has the problem that the estimated
value tends to be smaller than the parameter. To counteract this, the sum of the squares of the
deviations is divided by one less than the sample size.

For ungrouped data:

Sample variance $= S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$

For grouped data:

Sample variance $= S^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i\,(X_i - \bar{X})^2$, where $n = \sum f_i$

We usually use the following computational formula.

$S^2 = \frac{\sum X_i^2 - n\bar{X}^2}{n-1}$ for ungrouped data, and $S^2 = \frac{\sum f_i X_i^2 - n\bar{X}^2}{n-1}$ for grouped data.

Properties of the Variance

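1) The variance has mostly removed the shortcomings which are present in the measures of
dispersion given before it.
2) The main demerit of the variance is that its unit is the square of the unit of measurement of
the variate values. Generally this value is large and makes it difficult to decide about the
magnitude of the variation.
3) The variance gives more weight to the extreme values as compared to those which are
near the mean value, because the difference is squared in the variance.
4) If a wrong number has been used in calculating the variance and if n, $\bar{X}$ and $S^2$
are known, we can correct this. We can use the following formula:

Let $X_w$ be the wrong value and $X_c$ the correct value. Then the corrected sums are
$\sum X_i = n\bar{X} - X_w + X_c$ and $\sum X_i^2 = (n-1)S^2 + n\bar{X}^2 - X_w^2 + X_c^2$,
and the corrected mean and variance follow from these sums using the computational formula.

5) If a sample of $n_1$ elements has a variance $S_1^2$ and a sample of $n_2$ elements has a variance $S_2^2$,
then the combined variance (called the pooled variance) is given by:

$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$

6) If the variance of the observations $X_1, X_2, …, X_n$ is $S^2$, then the variance of

a) $X_1 + k, X_2 + k, …, X_n + k$ is also $S^2$, where k is a constant number.

b) $X_1 - k, X_2 - k, …, X_n - k$ is also $S^2$.

c) $kX_1, kX_2, …, kX_n$ is $k^2 S^2$.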
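The shift and scale properties of the variance, and the pooled variance, can be tried out numerically. The sketch below uses a small made-up sample and Python's standard `statistics.variance` (which divides by n − 1); the sample values, the constant k and the helper `pooled_variance` are all illustrative assumptions.

```python
import math
import statistics

def pooled_variance(n1, var1, n2, var2):
    """Combined variance of two samples, weighting each by its degrees of freedom."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

sample = [4, 8, 6, 10, 12]   # hypothetical data, sample variance 10
k = 3

var_original = statistics.variance(sample)
var_shifted = statistics.variance([x + k for x in sample])   # adding a constant
var_scaled = statistics.variance([k * x for x in sample])    # multiplying by a constant

print(var_original, var_shifted, var_scaled)
print(pooled_variance(5, var_original, 5, var_scaled))
```

Adding a constant leaves the variance unchanged, while multiplying by k multiplies it by k squared (here 9).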

Standard Deviation

There is a problem with variances. Recall that the deviations were squared. That means that
the units were also squared. To get the units back to the same as the original data values, the
square root must be taken.

Population standard deviation $= \sigma = \sqrt{\sigma^2}$

Sample standard deviation $= S = \sqrt{S^2}$

For a grouped frequency distribution with class marks $X_i$ and class frequencies $f_i$,
$S = \sqrt{\frac{1}{n-1}\sum f_i\,(X_i - \bar{X})^2}$.

The following steps are used to calculate the sample standard deviation:
1. Find the arithmetic mean.
2. Find the difference between each observation and the mean.
3. Square these differences.
4. Sum the squared differences.
5. Since the data is a sample, divide the number (from step 4 above) by the number of
observations minus one, i.e., n-1 (where n is equal to the number of observations in the data
set). This gives the sample variance.
6. Take the square root to get the standard deviation.
Remark: If the standard deviation of a set of data is small, the values are more concentrated
around the mean, and if the standard deviation is large, the values are scattered more widely
around the mean.

Properties of standard deviation

1) It is considered to be the best measure of dispersion and is used widely.
2) There is, however, one difficulty with it. If the unit of measurement of the variables of two
series is not the same, then the variability cannot be compared by comparing the
values of the standard deviation.
3) If the standard deviation of the observations $X_1, X_2, …, X_n$ is $S$, then the standard
deviation of
a) $X_1 + k, X_2 + k, …, X_n + k$ is also $S$, where k is a constant number.

b) $X_1 - k, X_2 - k, …, X_n - k$ is also $S$.

c) $kX_1, kX_2, …, kX_n$ is $|k|S$.



Example 4.3
Find the variance and standard deviation of the following sample data: 5, 17, 12, 10.
Solution:
$\bar{X} = \frac{5 + 10 + 12 + 17}{4} = 11$

Xi                     5    10    12    17    Total
$(X_i - \bar{X})^2$   36     1     1    36     74

$S^2 = 74/3 = 24.67$, so $S = \sqrt{24.67} = 4.97$
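The same calculation, following the steps listed for the sample standard deviation, can be sketched in Python. The function name `sample_variance` is our own.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(values) / n                                  # step 1
    squared_deviations = [(x - mean) ** 2 for x in values]  # steps 2 and 3
    return sum(squared_deviations) / (n - 1)                # steps 4 and 5

sample = [5, 17, 12, 10]
s2 = sample_variance(sample)
s = math.sqrt(s2)                                           # step 6
print(round(s2, 2), round(s, 2))  # 24.67 4.97
```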

Activity 4.2
i) Find the variance and standard deviation of the data given in the following frequency distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
ii) The mean and the standard deviation of a set of numbers are respectively 500 and 10.
a) If 10 is added to each of the numbers in the set, then what will be the variance and
standard deviation of the new set?
b) If each of the numbers in the set are multiplied by -5, then what will be the
variance and standard deviation of the new set?

The Coefficient of Variation (C.V)

• The coefficient of variation is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage:

$C.V = \frac{S}{\bar{X}} \times 100$

The standard deviation is an absolute measure, so it cannot be used to compare series measured in
different units. Hence in situations where either the two series have different units of measurement, or their
means differ sufficiently in size, the coefficient of variation should be used as a measure of
dispersion.

Properties of C.V
1) It is one of the most widely used measures of dispersion because of its virtues.
2) The smaller the value of C.V, the more consistent the data, and vice versa.
3) For fixed experiments, C.V is generally reported. A low C.V indicates more
reliability of the experimental findings.

Example 4.4

Consider the distribution of yields (per plot) of two paddy varieties. For the first variety, the
mean and S.d are 60 kg and 10 kg respectively. For the second variety the mean and S.d are
50 kg and 9 kg respectively.

$C.V_1 = \frac{10}{60} \times 100 = 16.7\%$ and $C.V_2 = \frac{9}{50} \times 100 = 18\%$

This shows that the variability in the first variety is less as compared to that in the second variety.
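The comparison in Example 4.4 amounts to two divisions, sketched here in Python. The function name `coefficient_of_variation` is an illustrative choice.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return sd / mean * 100

cv_first = coefficient_of_variation(10, 60)   # first paddy variety
cv_second = coefficient_of_variation(9, 50)   # second paddy variety
print(round(cv_first, 1), round(cv_second, 1))
print("first variety is more consistent" if cv_first < cv_second
      else "second variety is more consistent")
```

Note that the first variety has the larger standard deviation (10 kg against 9 kg) yet the smaller coefficient of variation, which is why the relative measure is the right one to compare.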

Activity 4.3
Two distributions A and B have means of 80 inches and 20 kg and standard deviations of 10 inches and 1.5 kg
respectively. Which distribution has greater variability?

Chebyshev's Theorem
• The theorem, developed by the Russian mathematician Chebyshev, specifies the proportions of the
spread in terms of the standard deviation.
• For any set of data (population or sample) and any constant k (greater than one), the
proportion of the data that must lie within k standard deviations on either side of the mean,
i.e. within $(\bar{X} - kS, \bar{X} + kS)$, is at least $1 - \frac{1}{k^2}$; that is, the proportion of items falling beyond k standard
deviations from the mean is at most $\frac{1}{k^2}$.
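Chebyshev's bound can be compared with what actually happens in a data set. The sketch below is illustrative only: the function names are ours, the data are Distribution 2 from section 4.3.1, and the population standard deviation (`statistics.pstdev`) is used for the spread.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than one")
    return 1 - 1 / k ** 2

def proportion_within(values, k):
    """Observed proportion of values strictly within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    inside = sum(1 for x in values if abs(x - mean) < k * sd)
    return inside / len(values)

scores = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]
for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 3), round(proportion_within(scores, k), 3))
```

For k = 2 the theorem guarantees at least 75% of the values; in this data set 11 of the 12 scores fall inside the interval, comfortably above the bound.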

The Empirical or Normal Rule:


Chebyshev’s theorem applies to any distribution regardless of its shape. However, when a
distribution is bell-shaped (or what is called normal), the following statements, which make up
the empirical rule, are true.
• Approximately 68% of the data values fall within one standard deviation of the mean, i.e.
within $(\bar{X} - S, \bar{X} + S)$.
• Approximately 95% of the data values fall within two standard deviations of the mean, i.e.
within $(\bar{X} - 2S, \bar{X} + 2S)$.
• Approximately 99.7% of the data values fall within three standard deviations of the mean,
i.e. within $(\bar{X} - 3S, \bar{X} + 3S)$.

4.3.5 The Standard Scores (Z-scores)

• If X is a measurement from a distribution with mean $\bar{X}$ and standard deviation S,
then its value in standard units is $Z = \frac{X - \bar{X}}{S}$.
• Z gives the deviation from the mean in units of standard deviation, and it tells us
how many standard deviations a given value lies above or below the mean.
• It also helps in hypothesis testing.
• It is used to compare two observations coming from different groups.



Example 4.5

Two groups of children were trained to perform a certain task for a month and then tested to
find out which group is faster to learn the task. The average time taken to perform the task
was 10.4 minutes with a s.d of 1.2 min for the first group, and 11.9 min with a s.d of 1.3 min
for the second group. Child A from group 1 took 9.2 min, while child B from group 2 took
9.3 min. Who was faster in performing the task relative to the other?

Group I            Group II
Mean = 10.4        Mean = 11.9
S.d = 1.2          S.d = 1.3
XA = 9.2           XB = 9.3

$Z_A = \frac{9.2 - 10.4}{1.2} = -1$ and $Z_B = \frac{9.3 - 11.9}{1.3} = -2$

These values indicate that the time taken by child A is one S.d below the average time taken
by his/her group. The time taken by child B is two S.d below the mean time taken by his/her
group. Child B is, therefore, faster in performing the task relative to the other.
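The standard scores of Example 4.5 can be reproduced with a one-line function. This is a minimal sketch; `z_score` is our own name for it.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

z_a = z_score(9.2, 10.4, 1.2)   # child A, group I
z_b = z_score(9.3, 11.9, 1.3)   # child B, group II
print(round(z_a, 2), round(z_b, 2))

# A shorter time is better here, so the more negative score marks the faster child.
faster = "B" if z_b < z_a else "A"
print("Child", faster, "was faster relative to his/her group")
```

Although child A's raw time (9.2 min) is shorter than child B's (9.3 min), the standard scores reverse the ranking once each child is compared with his or her own group.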

4.4 MOMENTS, SKEWNESS AND KURTOSIS

Moments are used to measure skewness and kurtosis.

If X is a variable that assumes the values X1, X2, …, Xn, then
1. The rth moment is defined as:

$\overline{X^r} = \frac{X_1^r + X_2^r + \dots + X_n^r}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i^r$, r = 1, 2, 3, …

- If r = 1, it is the simple arithmetic mean; this is called the first moment.
2. The rth moment about the mean (the rth central moment)
- Denoted by $M_r$ and defined as:

$M_r = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^r$, r = 1, 2, …

If r = 2, it is the population variance; this is called the second central moment. If we assume
n − 1 ≈ n, it is also the sample variance.
3. The rth moment about any number A
- Denoted by $M_r'$ and defined as:

$M_r' = \frac{1}{n}\sum_{i=1}^{n}(X_i - A)^r$, r = 1, 2, …

Remarks: 1) 2) 3)
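The three kinds of moment differ only in the point about which deviations are taken, which the following sketch makes explicit. The data set and the function names are hypothetical illustrations.

```python
def moment(values, r, about=0.0):
    """rth moment about the point `about`: the mean of (x - about)**r."""
    return sum((x - about) ** r for x in values) / len(values)

def central_moment(values, r):
    """rth moment about the arithmetic mean."""
    return moment(values, r, about=moment(values, 1))

data = [2, 4, 6, 8]   # hypothetical observations

print(moment(data, 1))            # first moment (the mean): 5.0
print(moment(data, 2))            # second moment about zero: 30.0
print(central_moment(data, 1))    # first central moment: 0.0
print(central_moment(data, 2))    # second central moment (population variance): 5.0
print(moment(data, 1, about=4))   # first moment about A = 4: 1.0
```

The first central moment is always zero, and the second central moment equals the second moment minus the square of the mean (30 − 25 = 5 here).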

Activity 4.4
1. Find the first two moments for the following set of numbers 2, 3, 7
2. Find the first three central moments of the numbers in problem 1
3. Find the third moment about the number 3 of the numbers in problem 1.

Skewness
- Skewness is the degree of asymmetry or departure from symmetry of a distribution.
- A skewed frequency distribution is one that is not symmetrical.
- Skewness is concerned with the shape of the curve not size.



- If the frequency curve (smoothed frequency polygon) of a distribution has a longer tail to
the right of the central maximum than to the left, the distribution is said to be skewed to
the right or said to have positive skewness. If it has a longer tail to the left of the central
maximum than to the right, it is said to be skewed to the left or said to have negative
skewness.
- For moderately skewed distribution, the following relation holds among the three
commonly used measures of central tendency.
Mean-Mode=3*(Mean-Median)

Measures of Skewness
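It is the measure of the direction and degree of asymmetry.
- Denoted by $\alpha_3$
- There are various measures of skewness.
1. The Pearsonian coefficient of skewness

$\alpha_3 = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$

2. The moment coefficient of skewness

$\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{M_3}{\sigma^3}$, where $M_2$ and $M_3$ are the second and third central moments.

Note: The shape of the curve is determined by the value of $\alpha_3$
• If $\alpha_3 = 0$ then the distribution is symmetric.
• If $\alpha_3 > 0$ then the distribution is positively skewed.
• If $\alpha_3 < 0$ then the distribution is negatively skewed.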
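Both coefficients can be computed in a few lines. The sketch below is an illustration under our own naming, with made-up data: one symmetric set and one with a long right tail.

```python
def central_moment(values, r):
    """rth moment about the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** r for x in values) / len(values)

def moment_skewness(values):
    """Moment coefficient of skewness: M3 / M2**(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def pearson_skewness(mean, mode, sd):
    """Pearsonian coefficient of skewness: (mean - mode) / standard deviation."""
    return (mean - mode) / sd

symmetric = [2, 4, 6, 8]          # hypothetical, symmetric about 5
right_tailed = [1, 2, 2, 3, 10]   # hypothetical, one large value

print(moment_skewness(symmetric))               # 0.0, symmetric
print(moment_skewness(right_tailed) > 0)        # True, positively skewed
print(pearson_skewness(mean=50, mode=44, sd=12))  # 0.5, positively skewed
```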
Remarks:



In a positively skewed distribution, smaller observations are more frequent than
larger observations, i.e. the majority of the observations have a value below the
average, and it has a long tail in the positive direction.

In a negatively skewed distribution, smaller observations are less frequent than larger
observations, i.e. the majority of the observations have a value above the average.

Activity 4.5
1. Suppose the mean, the mode, and the standard deviation of a certain distribution are
32, 30.5 and 10 respectively. What is the shape of the curve representing the distribution?
2. For a moderately skewed frequency distribution, the mean is 10 and the median is 8.5.
If the coefficient of variation is 20%, find the Pearsonian coefficient of skewness and
the probable mode of the distribution.
Kurtosis
Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal
distribution. The peakedness of a distribution can be classified into three:
a) Leptokurtic: - A distribution having a relatively high peak.
- A large number of observations have the same value.
b) Mesokurtic: - Normal peak.
- The curve is properly peaked.
c) Platykurtic: - Flat-topped.
- A large number of observations have low frequency and are spread over the
middle interval.



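Measures of kurtosis:
- It is a measure of peakedness.
- Denoted by $\alpha_4$ and given by

$\alpha_4 = \frac{M_4}{M_2^2} = \frac{M_4}{\sigma^4}$, where $M_2$ and $M_4$ are the second and fourth central moments.

Note: The peakedness depends on the value of $\alpha_4$
• If $\alpha_4 > 3$ then the curve is leptokurtic.
• If $\alpha_4 = 3$ then the curve is mesokurtic.
• If $\alpha_4 < 3$ then the curve is platykurtic.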
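The classification rule can be sketched as follows. The function names, the tolerance used to decide "equal to 3" and the data are illustrative assumptions of ours.

```python
def central_moment(values, r):
    """rth moment about the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** r for x in values) / len(values)

def moment_kurtosis(values):
    """Moment coefficient of kurtosis: M4 / M2**2."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

def classify_kurtosis(alpha4, tolerance=1e-9):
    """Compare the coefficient with 3, the value for a normal curve."""
    if abs(alpha4 - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if alpha4 > 3 else "platykurtic"

data = [2, 4, 6, 8]   # hypothetical, evenly spread observations
alpha4 = moment_kurtosis(data)
print(round(alpha4, 2), classify_kurtosis(alpha4))   # 1.64 platykurtic
```

Here M4 = 41 and M2 = 5, so the coefficient is 41/25 = 1.64, well below 3: evenly spread values give a flat-topped shape.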

Activity 4.6
1. If the first four central moments of a distribution are:

a) Compute a measure of skewness


b) Compute a measure of kurtosis and give your interpretation.
2. If the standard deviation of a symmetric distribution is 10, what should be the value of
the fourth moment so that the distribution is mesokurtic?



CHAPTER SUMMARY

• Variability or dispersion is concerned with the extent to which values in a data set
vary from the mean or from one another.
• There are different measures of dispersion; these include range, variance,
standard deviation, mean deviation and coefficient of variation.
• Range is the difference between the largest and the smallest value in a data set.
• Variance is the sum of the squares of the differences between the mean and the
individual observations, divided by the total number of observations for the case
of a population and by n-1 for the case of a sample.
• Standard deviation is the positive square root of the variance.
• Coefficient of variation is the ratio of the standard deviation to the arithmetic
mean, expressed as a percentage.
• Different measures of dispersion have different properties and different uses; we
have to apply them in their appropriate places.

Exercises on Chapter 4



1. In a moderately asymmetrical distribution the mode minus the mean is 2.4 and median is
24.8. The coefficient of variation is 20%. Find the mode, the mean and the standard
deviation.

2. The following data are given for 20 observations.

=306, =5490, =16, =10.

Find arithmetic mean and standard deviation for the 20 observations.

3. The standard deviation calculated from a set of 32 observations is 5. If the sum of the
observations is 80, what is the sum of square of the observations?

4. The mean of 5 observations is 3 and variance is 2. If three of them are 1, 3 and 5. Find the
remaining two.

5. The distribution of marks (out of 50) of 50 students in statistics is given in the table below.

Marks                 0-10   10-20   20-30   30-40   40-50
Number of students      5      8      15      16       6

Calculate
a) The range b) The quartile deviation
c) The standard deviation and interpret the result.
6. Two models of radio were subjected to a durability test, and the results were as follows.

Life (in years)    Number of sets examined
                   Model A    Model B
0-2                   5          2
2-4                  16          7
4-6                  13         12
6-8                   7         19
8-10                  5          9
10-12                 4          1

State which model has a longer average life and which model has more uniformity.

7. A meteorologist is interested in the consistency of temperatures in three cities. During a given week the following data were collected. The temperatures for the five days of the week in the three cities were

City 1: 25 24 23 26 17

City 2: 22 21 24 22 20

City 3: 32 27 35 24 28

Which city has the most consistent temperature, based on these data?

8. Suppose Bekele got 90 on a test in which the mean and S.D. for the class were 70 and 10 respectively. In another test, Almaz scored 60, where the mean and S.D. for her class were 56 and 40 respectively.
a) Who was better off relative to his/her class?
b) Which class has students with less similar results?
9. For a moderately skewed frequency distribution, the mean is 10 and the median is 8.5.
If the coefficient of variation is 20%, find the Pearsonian coefficient of skewness and
the probable mode of the distribution.
10. If the standard deviation of a symmetric distribution is 10, what should be the
value of the fourth moment so that the distribution is mesokurtic?



CHAPTER 5

ELEMENTARY PROBABILITY
CONTENTS

5.1. Introduction

5.2. Definition of some probability terms

5.3. Counting rules: addition and multiplication rules, permutation and combination

5.4. Probability of an event

5.5. Some probability rules

5.6. Conditional probability and independence

In this chapter, there are two main points to be discussed: Possibilities and Probabilities. After
presenting some basic concepts of probability, the next part is about techniques of counting or the
methods used to determine the number of possibilities, which are indispensable to compute
probabilities, then followed by different definitions of probability; and finally, some general rules
and derived theorems of probability will be presented.

Objectives:

After studying this chapter, you should be able to:

Define basic terms in probability such as: sample space, outcome, event, and so on.

Describe the addition and multiplication rules of counting.

Differentiate between permutations and combinations.

Evaluate the probability of an event through the various methods.

Describe the different rules of probability.

5.1 INTRODUCTION
In daily life, we often hear statements that express doubt or uncertainty about whether certain events will happen. To mention some instances,



“If by chance you meet her, please convey my heart-felt greeting";

“Probably, he might not take the class today";

“There is a 50-50 chance of survival of a cancer”, his physician said;

For a Mathematician, there is a fat chance of passing this course; etc

These statements show uncertainty about the happening of the event in question. In Statistics, however, we can make sensible numerical statements about uncertainty and apply different approaches to calculate probabilities.

A numerical measure of uncertainty is provided by a very important branch of Mathematics known as the Theory of Probability.

In general, there are three states of expectation: certainty, impossibility, and uncertainty. Probability Theory is concerned with the study of random (chance) phenomena; a probability is a numerical measure of the chance of occurrence of something (called an event), which shows the degree of uncertainty. Thus, we say that the probabilities of the above three expectations are, respectively, one, zero, and between zero and one. Probability Theory is the basis for all statistical applications in any field of study.

Since probability theory is closely related to set theory, one needs to revise that topic from mathematics. Probability is also defined in terms of relative frequency, presented in chapter two of this module. Thus, the following is a review of your knowledge of these topics and, of course, of the elementary probability you studied in high school.

• Probability theory is the foundation upon which the logic of inference is built.

• It helps us to cope with uncertainty.

• In general, probability is the chance of an outcome of an experiment. It is the measure of how likely
an outcome is to occur.

5.2 DEFINITIONS OF SOME PROBABILITY TERMS

1. Experiment: Any process of observation or measurement, or any process which generates well-defined outcomes.



2. Probability Experiment (Random Experiment): It is an experiment that can be repeated any number of times under similar conditions, and for which it is possible to enumerate all the possible outcomes without being able to predict an individual outcome.

Example 5.1
If a fair coin is tossed three times, it is possible to enumerate all possible eight sequences of
head (H) and tail (T). But it is not possible to predict which sequence will occur at any
occasion.
3. Outcome: The result of a single trial of a random experiment.
4. Sample Space (S): The set of all possible outcomes of a probability experiment.
Example: The sample space of an experiment consisting of three tosses of a coin is S =
{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
A sample space can be
• Countable (finite or infinite)
• Uncountable
5. Event: It is a subset of the sample space. It is a statement about one or more outcomes of a random experiment. It is denoted by a capital letter such as A.
For example, the event that there are exactly two heads in three tosses of a coin consists of the three points HTH, HHT and THH.
Remark: If S (sample space) has n members then there are exactly $2^n$ subsets or events.
6. Equally Likely Events: Events which have the same chance of occurring.
7. Complement of an event: The complement of event A (denoted by $A^c$) consists of all the sample points in the sample space that are not in A.
8. Elementary (simple) Event: an event having only a single element or sample point.
9. Mutually Exclusive (Disjoint) Events: Two events which cannot happen at the same
time.
10. Independent Events: Two events are said to be independent if the occurrence of one
does not affect the probability of the other occurring.



11. Dependent Events: Two events are dependent if the first event affects the outcome or
occurrence of the second event in a way the probability is changed.
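
The terms above can be made concrete with a short Python sketch. It is only an illustration: the variable names are our own, and it simply enumerates the three-toss sample space from Example 5.1, picks out the event "exactly two heads", and counts the possible events.

```python
from itertools import product

# Sample space for three tosses of a coin: all sequences of H and T.
sample_space = ["".join(toss) for toss in product("HT", repeat=3)]

# Event A: exactly two heads in the three tosses.
event_a = [outcome for outcome in sample_space if outcome.count("H") == 2]

# A sample space with n members has 2**n subsets (events).
number_of_events = 2 ** len(sample_space)

print(sample_space)
print(event_a)
print(number_of_events)
```

The list comprehension plays the role of the "statement about one or more outcomes" in the definition of an event: it keeps exactly those sample points for which the statement is true.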

5.3 COUNTING RULES: ADDITION, MULTIPLICATION RULES, PERMUTATION AND COMBINATION

In order to calculate probabilities, we have to know


• The number of elements of an event
• The number of elements of the sample space.
That is, in order to judge what is probable, we have to know what is possible.
• In order to determine the number of outcomes, one can use several rules of counting.
- The addition rule
- The multiplication rule
- Permutation rule
- Combination rule

Addition Rule
If event A can occur in m possible ways and event B can occur in n possible ways, there are m+n possible ways for either event A or event B to occur, but only if the two events have no outcomes in common. In general,
$n(A \text{ or } B) = n(A \cup B) = n(A) + n(B) - n(A \cap B)$

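Observe the following Venn diagram (figure not reproduced), which splits $A \cup B$ into three regions:

Only A = $A - B$

Only B = $B - A$

Both A & B = $A \cap B$

Notes: 1) An alternative expression is: $n(A \cup B) = n(\text{only } A) + n(\text{only } B) + n(\text{both } A \text{ \& } B)$.

2) If $A \cap B = \varnothing$, then $n(A \cup B) = n(A) + n(B)$.

3) $n(A) = n(\text{only } A) + n(A \cap B)$; and $n(B) = n(\text{only } B) + n(A \cap B)$.

4) Alternative expressions: $A - B = A \cap B^c$, and $B - A = B \cap A^c$.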
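
The addition rule can be checked with Python sets. This is a minimal sketch; the two events below are hypothetical (they describe one roll of a die and are not taken from the module).

```python
# Hypothetical events for one roll of a die.
A = {2, 4, 6}   # an even number turns up
B = {4, 5, 6}   # a number greater than 3 turns up

only_a = A - B  # outcomes in A but not in B
only_b = B - A  # outcomes in B but not in A
both = A & B    # outcomes common to A and B

# Addition rule: n(A or B) = n(A) + n(B) - n(A and B)
n_union = len(A) + len(B) - len(both)

print(n_union, len(A | B))
print(len(only_a) + len(only_b) + len(both))
```

Subtracting n(both) is what prevents the outcomes 4 and 6 from being counted twice.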

To list the outcomes of the sequence of events, a useful device called tree diagram is used.

Example 5.2:
A student goes to the nearest snack bar to have breakfast. He can take tea, coffee, or milk with bread, cake, or a sandwich. How many possibilities does he have?

Solution: The tree diagram gives the following 9 possibilities:

Tea: bread, cake, sandwich
Coffee: bread, cake, sandwich
Milk: bread, cake, sandwich
The Multiplication Rule



If a choice consists of k steps of which the first can be made in n1 ways, the second can be made in n2 ways, …, the kth can be made in nk ways, then the whole choice can be made in $n_1 \times n_2 \times \cdots \times n_k$ ways.
Example 5.3

If a man has 3 pairs of trousers, 5 shirts, 2 jackets and 3 pairs of shoes, in how many different ways
can he wear his clothes and shoes?

Solution: Using n1=3, n2=5, n3=2 and n4=3, the total number of ways of wearing is: $n_1 \times n_2 \times n_3 \times n_4 = 3 \times 5 \times 2 \times 3 = 90$.
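
As a rough check of the multiplication rule, the sketch below multiplies the step counts of Example 5.3 and also lists every outfit by brute force. The dictionary keys and variable names are our own labels; the breakfast lists come from Example 5.2.

```python
from itertools import product
from math import prod

# Example 5.3: number of choices available at each step.
choices = {"trousers": 3, "shirts": 5, "jackets": 2, "shoes": 3}
total_by_rule = prod(choices.values())

# Brute-force check: one tuple of item indices per complete outfit.
outfits = list(product(*(range(k) for k in choices.values())))

# Example 5.2: every drink can be paired with every food item.
drinks = ["tea", "coffee", "milk"]
foods = ["bread", "cake", "sandwich"]
breakfasts = list(product(drinks, foods))

print(total_by_rule, len(outfits), len(breakfasts))
```

`itertools.product` walks the same branches as a tree diagram, so the length of its output always equals the product of the step counts.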

Activity 5.1
The digits 0, 1, 2, 3, and 4 are to be used in 4 digit identification card. How many different
cards are possible if
a) Repetitions are permitted.
b) Repetitions are not permitted.
Factorial notation

The symbol "n!", read as " n factorial", denotes the product of all positive integers less than or equal
to n.

Let n be a non-negative integer. Then, n factorial, denoted by n!, is defined as

n!= n*(n-1)*(n-2)*…*2*1, where 0!=1.

Note the following relationships:

n!=n(n-1)!=n(n-1)(n-2)! =n(n-1)(n-2)(n-3)!, etc.

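$\frac{n!}{(n-1)!} = n$, $\frac{n!}{(n-2)!} = n(n-1)$, and so on. In general, we can have $\frac{n!}{(n-r)!} = n(n-1)(n-2)\cdots(n-r+1)$.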

Permutation
An arrangement of n objects in a specified order is called permutation of the objects.


Permutation Rules:
1. The number of permutations of n distinct objects taken all together is n!
Where n! = n*(n-1)*(n-2)*…*2*1.
2. The arrangement of n objects in a specified order using r objects at a time is called the permutation of n objects taken r objects at a time. It is written as nPr and the formula is

$nPr = \frac{n!}{(n-r)!}$

3. The number of permutations of n objects in which k1 are alike, k2 are alike, …, km are alike is

$\frac{n!}{k_1!\, k_2! \cdots k_m!}$

4. The number of arrangements of n objects around a circle is (n-1)!

5. The number of ways of partitioning a set of n things into k cells where there are n1 elements in the first cell, n2 elements in the second cell, …, nk elements in the kth cell is

$\frac{n!}{n_1!\, n_2! \cdots n_k!}$
Example 5.4

Find the permutations of two of the five vowels a, e, i, o, u; and list them.

Solution: There are n1*n2=5*4=20 different permutations, listed as follows:

ae,ai,ao,au, ea,ei,eo,eu, ia,ie,io,iu, oa,oe,oi,ou, ua,ue,ui,uo.
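
The listing in Example 5.4 can be reproduced with the standard library. This is only a sketch; the variable names are our own, and `math.perm` requires Python 3.8 or later.

```python
from itertools import permutations
from math import factorial, perm

vowels = "aeiou"

# Every ordered arrangement of two different vowels.
pairs = ["".join(p) for p in permutations(vowels, 2)]

# The same count from the formula n! / (n - r)! with n = 5, r = 2.
count_by_formula = factorial(5) // factorial(5 - 2)

print(pairs)
print(len(pairs), count_by_formula, perm(5, 2))
```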



Activity 5.2
1. Suppose we have the letters A, B, C, D.
a) How many permutations are there taking all four?
b) How many permutations are there taking two letters at a time?
2. How many different permutations can be made from the letters in the word
“MISSISSIPPI”?
3. In how many ways can people be assigned to 1 triple and 2 double rooms?
4. In how many ways can a party of 7 people arrange themselves
a) In a row of 7 chairs?
b) Around a circular table?

Combination
There are many problems in which we are interested in finding the number of ways in which r objects
can be selected from n distinct objects without regard to the order of selection. Such selections are
called combinations.

Definition: A selection of r objects from a set of n objects without regard to the order of selection is called a combination.

Example 5.5
Given the letters A, B, C, and D, list the permutations and combinations for selecting two letters.
Solutions:
Permutations (12): AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, DC
Combinations (6): AB, AC, AD, BC, BD, CD

Note that in permutation AB is different from BA. But in combination AB is the same as BA.
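
The contrast between the two kinds of selection in Example 5.5 can be seen directly in code. The sketch below uses our own variable names: it lists the ordered and the unordered selections of two letters, and each unordered pair corresponds to 2! = 2 ordered ones.

```python
from itertools import combinations, permutations

letters = "ABCD"

# Order matters: AB and BA are counted separately.
ordered = ["".join(p) for p in permutations(letters, 2)]

# Order does not matter: AB and BA are the same selection.
unordered = ["".join(c) for c in combinations(letters, 2)]

print(ordered)
print(unordered)
```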



The number of combinations of r objects selected from n objects is denoted by nCr or $\binom{n}{r}$ and is given by the formula:

$nCr = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}$

Example 5.6

In how many ways can 3 letters be selected from the four letters a, b, c & d?

Solution: Since we do not care about their order of selection, we have only the following four cases: abc, abd, acd, & bcd.

But recall that the number of permutations of 3 letters out of the 4 is 4P3 = 24, and we know that each selection of three letters can be arranged in 3! = 6 ways.

Thus, we have (3!)(4) = 4P3, from which we get 4 = 4P3/3! = 4C3, say.

Actually, "combination" means the same as "subset"; in the above case, the number of subsets of 3 elements that can be selected from a set of 4 distinct elements is 4C3 = 4.

This is called the total number of combinations of 3 objects selected from 4 distinct objects.

Notation: nCr or $\binom{n}{r}$ is used to denote the number of combinations of n objects taking r of them at a time.

The number of combinations of n distinct objects taking r of them at a time is given by:

$nCr = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}$, for $r = 0, 1, 2, \ldots, n$.

Example 5.7

In how many ways can a committee of 2 students be formed out of 6?

Solution: We substitute n = 6 and r =2, to get 6C2=15.

Example 5.8

If a committee of 5 candidates is to be formed out of 10, of which 4 are girls and 6 are boys, how
many committees can be formed if 2 girls are to be included?



Solution: It can be seen as a two-stage selection. Since 2 of the 4 girls can be selected in n1 = 4C2 = 6 ways, and 3 of the 6 boys in n2 = 6C3 = 20 ways, then, the total number of committees is $n_1 \times n_2 = 6 \times 20 = 120$.
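
The two-stage count in Example 5.8 can be verified by listing the committees. This sketch assumes "2 girls are to be included" means exactly 2 girls; the candidate labels G1..G4 and B1..B6 are hypothetical names used only to tell girls from boys.

```python
from itertools import combinations
from math import comb

# Two-stage selection: choose the girls, then the boys.
by_formula = comb(4, 2) * comb(6, 3)

# Hypothetical labels for the 4 girls and 6 boys.
candidates = [f"G{i}" for i in range(1, 5)] + [f"B{i}" for i in range(1, 7)]

# Brute force: keep the 5-person committees with exactly 2 girls.
committees = [
    c for c in combinations(candidates, 5)
    if sum(name.startswith("G") for name in c) == 2
]

print(by_formula, len(committees))
```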

5.4 PROBABILITY OF AN EVENT

There are four different conceptual approaches to the study of probability theory. These are:
• The classical approach.
• The frequentist approach.
• The axiomatic approach.
• The subjective approach.

The classical approach


This approach is used when:
- All outcomes are equally likely and mutually exclusive.
- The total number of outcomes is finite, say N.

Definition
If a random experiment with N equally likely outcomes is conducted and $N_A$ of these outcomes are favorable to the event A, then the probability that event A occurs, denoted P(A), is defined as:

$P(A) = \frac{N_A}{N}$

Limitation: the classical approach cannot be applied
• If it is not possible to enumerate all the possible outcomes of an experiment.
• If the sample points (outcomes) are not mutually independent.
• If the total number of outcomes is infinite.
• If the outcomes are not all equally likely.
Example 5.9

What is the probability that a 3 or 5 will turn up in rolling a fair die ?

Solution: S = {1, 2, 3, 4, 5, 6}; let E = {3, 5}. For a fair die, P(1) = P(2) = … = P(6) = 1/6; then, P(E) = m/n = 2/6 = 1/3.
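
The classical definition translates into a one-line computation. The helper below is a sketch with a name of our own choosing (`classical_probability`); it assumes a finite sample space of equally likely outcomes and uses exact fractions so that 2/6 is reduced to 1/3.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(E) = n(E) / n(S), assuming equally likely outcomes."""
    outcomes = set(sample_space)
    favourable = set(event) & outcomes
    return Fraction(len(favourable), len(outcomes))

# Example 5.9: a 3 or a 5 turns up when a fair die is rolled.
die = range(1, 7)
p_three_or_five = classical_probability({3, 5}, die)

print(p_three_or_five)
```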



Activity 5.3
1. A fair die is tossed once. What is the probability of getting
a) Number 4?
b) An odd number?
c) A number greater than 4?
d) Either 1 or 2 or … or 6?
2. A box of 80 candles consists of 30 defective and 50 non-defective candles. If 10 of these candles are selected at random, what is the probability that
a) All will be defective?
b) 6 will be non-defective?
c) All will be non-defective?
Limitations of the classical definition

At times, it is impossible to apply this concept. Two instances can be mentioned:

i) If the outcomes are not equally likely to occur.

If one sits for a quiz, the two options (pass/fail) are not equally likely.

If one jumps from a running train, is his survival/death equally likely?

ii) If the total number of outcomes is infinite.

The Frequentist or a posteriori Approach

This is based on the relative frequencies of outcomes belonging to an event.

Definition
The probability of an event A is the proportion of outcomes favorable to A in the long run, when the experiment is repeated under the same conditions.

Example 5.10



If the records of Ethiopian Air Lines show that 468 of 600 of its flights from B/Dar to Addis arrived on time, what is the probability that any one of similar flights will arrive on time?

Solution: If E = the event that the plane will arrive on time, then:

$P(E) = \frac{468}{600} = 0.78$

Note: The probability of not arriving on time is $P(E^c) = 1 - P(E) = 1 - 0.78 = 0.22$.

That is, the plane didn't arrive on time for 600 – 468 = 132 flights; thus, $P(E^c) = \frac{132}{600} = 0.22$.

In general, $P(E^c) = 1 - P(E)$ or $P(E) + P(E^c) = 1$.
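
The "long run" idea behind the relative frequency definition can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the original example: it simulates rolls of a fair die with Python's pseudo-random generator (the seed 2017 is arbitrary) and reports how often a 3 or a 5 appears, which should settle near the classical value 1/3 as the number of trials grows. It also recomputes the on-time proportion from Example 5.10.

```python
import random

def relative_frequency(event, trials, seed=2017):
    """Proportion of simulated fair-die rolls that fall in `event`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) in event)
    return hits / trials

# Relative frequency from the flight records in Example 5.10.
p_on_time = 468 / 600

for n in (100, 10_000, 100_000):
    print(n, relative_frequency({3, 5}, n))
print(p_on_time)
```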

Activity 5.4
If records show that 60 out of 100,000 bulbs produced are defective, what is the probability that a newly produced bulb is defective?

Axiomatic Approach:
This approach does not give a precise definition of probability but gives certain axioms, postulates or rules on which probability calculations are based. Then, any one of the preceding concepts can be used in applications as long as it is consistent with these rules.

Let E be a random experiment and S be a sample space associated with E. With each event A we associate a real number P(A), called the probability of A, which satisfies the following properties, called the axioms (or postulates) of probability.

1. $P(A) \geq 0$
2. $P(S) = 1$
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals the sum of the two probabilities, i.e. $P(A \cup B) = P(A) + P(B)$

Subjective Approach
It is always based on some prior body of knowledge. Hence subjective measures of uncertainty are always conditional on this prior knowledge. The subjective approach accepts unreservedly that different people (even experts) may have vastly different beliefs about the uncertainty of the same event.

Example 5.11
Abebe’s belief about the chances of Ethiopia Buna club winning the FA Cup this year may
be very different from Daniel's. Abebe, using only his knowledge of the current team and
past achievements may rate the chances at 30%. Daniel, on the other hand, may rate the
chances as 10% based on some inside knowledge he has about key players having to be sold
in the next two months.

5.5 SOME PROBABILITY RULES

There are also other rules, but all are derived from the above three postulates.
Some of them are:
a) $P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n)$, if A1, A2, …, An are pairwise mutually exclusive.
b) $P(A) \leq 1$, probability never exceeds unity.
c) .
d) $P(A^c) = 1 - P(A)$, where $A^c$ is the complement of event A.
e) For any two events A and B, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$; this is the general addition rule.

5.6 CONDITIONAL PROBABILITY AND INDEPENDENCE


Conditional Events: If the occurrence of one event affects the probability of occurrence of the other event, then the two events are conditional or dependent events.

Conditional probability of an event



The conditional probability of an event A given that B has already occurred is denoted by P(A/B). Since B is known to have occurred, it becomes the new sample space, replacing the original sample space.
From this we are led to the definition

$P(A/B) = \frac{P(A \cap B)}{P(B)}$, $P(B) \neq 0$, or $P(A \cap B) = P(A/B) \cdot P(B)$

Remark: 1)
2)
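3) For three events A, B, and C,
$P(A \cap B \cap C) = P(A) \cdot P(B/A) \cdot P(C/A \cap B)$.
4) If an event A must result in one of the mutually exclusive events A1, A2, …, An, then P(A) = P(A1).P(A/A1) + P(A2).P(A/A2) + … + P(An).P(A/An).
5) Suppose that A1, A2, …, An are mutually exclusive events whose union is the sample space.

Then $P(A_i/A) = \frac{P(A_i) \cdot P(A/A_i)}{\sum_{j=1}^{n} P(A_j) \cdot P(A/A_j)}$ is called Bayes’ rule.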
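
Remarks 4 and 5 can be sketched in a few lines of Python. Everything numeric here is hypothetical: we assume two mutually exclusive causes A1 and A2 with P(A1) = 3/5 and P(A2) = 2/5, and conditional probabilities P(A/A1) = 1/50 and P(A/A2) = 1/20. The function name `bayes` is our own.

```python
from fractions import Fraction

def bayes(priors, likelihoods):
    """Return P(A) and the list of posteriors P(A_i / A).

    priors[i] is P(A_i); likelihoods[i] is P(A / A_i).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # remark 4: P(A) as a sum over the A_i
    return total, [j / total for j in joint]  # remark 5: Bayes' rule

# Hypothetical numbers, chosen only for illustration.
priors = [Fraction(3, 5), Fraction(2, 5)]
likelihoods = [Fraction(1, 50), Fraction(1, 20)]

p_a, posteriors = bayes(priors, likelihoods)
print(p_a, posteriors)
```

Note that the posteriors always add up to 1, because the denominator in Bayes' rule is exactly the sum of the numerators.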

Activity 5.5
1. For a student enrolling as a freshman at a certain university, the probability is 0.25 that he/she will get a scholarship and 0.75 that he/she will graduate. If the probability is 0.2 that he/she will get a scholarship and will also graduate, what is the probability that a student who gets a scholarship will graduate?
2. A lot consists of 20 defective and 80 non-defective items from which two items are chosen without replacement. Find the probability that:
a) both items are defective, b) the second item is defective.
Probability of Independent Events



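If the probability of B occurring is not affected by the occurrence or nonoccurrence of A, then we say that A and B are independent events, i.e. P(B/A) = P(B). This is equivalent to

$P(A \cap B) = P(A) \cdot P(B)$

Remarks: If A1, A2, A3 are to be independent then they must be pairwise independent,
$P(A_j \cap A_k) = P(A_j) \cdot P(A_k)$, $j \neq k$, where j, k = 1, 2, 3, and we must also have
$P(A_1 \cap A_2 \cap A_3) = P(A_1) \cdot P(A_2) \cdot P(A_3)$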

Example 5.12

Given that P(A) = 0.4, P(B) = 0.2, $P(A \cap B)$ = 0.08,

P(C) = 0.5, P(D) = 0.3, $P(C \cap D)$ = 0.10.

a) Are A and B independent? b) Are C and D independent?

Solution: a) P(A) P(B) = (0.4)(0.2) = 0.08 = $P(A \cap B)$. Hence, A and B are independent.

b) P(C) P(D) = (0.5)(0.3) = 0.15 $\neq$ $P(C \cap D)$ = 0.10. Hence, C and D are dependent.
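
The check in Example 5.12 is easy to automate. The helper name `independent` is our own; because the inputs are decimal fractions stored as floating-point numbers, the sketch compares with `math.isclose` rather than `==`.

```python
from math import isclose

def independent(p_a, p_b, p_a_and_b):
    """True when P(A and B) equals P(A) * P(B), up to rounding error."""
    return isclose(p_a * p_b, p_a_and_b)

print(independent(0.4, 0.2, 0.08))   # events A and B
print(independent(0.5, 0.3, 0.10))   # events C and D
```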

The notion of independence can be extended to more than two events:

Example 5.13

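A problem in Statistics is given to three students X, Y, and Z, whose probabilities of solving it are 1/2, 3/4, and 1/4, respectively. What is the probability that

a) All of them will solve it; b) Any one of them will solve it, if they try independently?

Solution: We are given that P(X) = ½, P(Y) = ¾, and P(Z) = ¼.

a) Since the students try independently, $P(X \cap Y \cap Z) = P(X) \cdot P(Y) \cdot P(Z) = \frac{1}{2} \cdot \frac{3}{4} \cdot \frac{1}{4} = \frac{3}{32}$.

b) Reading "any one of them" as "at least one of them", $P(X \cup Y \cup Z) = 1 - P(X^c) \cdot P(Y^c) \cdot P(Z^c) = 1 - \frac{1}{2} \cdot \frac{1}{4} \cdot \frac{3}{4} = \frac{29}{32}$.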
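
The arithmetic for Example 5.13 can be carried out with exact fractions. This sketch assumes, as one reading of the question, that "any one of them will solve it" means "at least one of them solves it"; the variable names are our own.

```python
from fractions import Fraction

# Probabilities that students X, Y and Z solve the problem.
p_solve = {"X": Fraction(1, 2), "Y": Fraction(3, 4), "Z": Fraction(1, 4)}

p_all = Fraction(1)    # all three solve it
p_none = Fraction(1)   # nobody solves it
for p in p_solve.values():
    p_all *= p         # independence: multiply the probabilities
    p_none *= 1 - p    # complements of independent events are independent

p_at_least_one = 1 - p_none
print(p_all, p_at_least_one)
```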

Activity 5.6



1. A fair die is tossed twice. Find the probability of getting a 4, 5, or 6 on the first toss and a 1, 2, 3 or 5 on the second toss.

2. Three balls are drawn successively at random from a box containing 6 red balls, 4 white balls and 5 blue balls. Find the probability that they are drawn in the order red, white and blue if each ball is

a) replaced b) not replaced



CHAPTER SUMMARY

Mutually exclusive events are those which do not occur together.

Independent events are those which do not affect each other.

A & B are independent if and only if: $P(A \cap B) = P(A) \cdot P(B)$

The number of permutations of r objects selected out of n different objects is given by $nPr = \frac{n!}{(n-r)!}$

All of n different objects can be arranged in n! different ways.

The number of combinations of n distinct objects selecting r of them at a time is: $nCr = \frac{n!}{r!\,(n-r)!}$

Classical probability concept: The probability of an event is m/n if it can occur in m ways out of a
total of n equally likely ways.

The relative frequency concept of probability: The probability of the occurrence of an event equals
its relative frequency.

The three axioms of probability are:

Probability is non - negative.

Probability of a sample space is unity.

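$P(A \cup B) = P(A) + P(B)$ if A & B are mutually exclusive.

• $n(A \cup B) = n(A) + n(B) - n(A \cap B)$, number of elements in a union of sets.

• $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, for any two events.

• For any three events A, B, and C,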



Exercises on Chapter 5
1. If find

2. a) b) ; c)

3. A coin is loaded so that , and . If the coin is tossed three times,


what is the probability of getting?

a) all heads; b) two tails and a head in this order; c) two tails & a head in any order?

3. Among 15 clocks there are two defectives. In how many ways can an inspector choose 3 of the clocks for inspection, so that
a) None of the defectives is included

b) Only one defective is included

c) Two defectives are included

4. In how many ways can a committee of 5 people be chosen out of 9 people?

5. Out of 5 Mathematicians and 7 Statisticians a committee consisting of 2 Mathematicians and 3 Statisticians is to be formed. In how many ways can this be done if
a) There is no restriction
b) One particular Statistician should be included
c) Two particular Mathematicians cannot be included on the committee.
6. A committee of 5 people must be selected out of 5 men and 8 women. In how many ways can the selection be made if there are three women on the committee?
7. A recent survey asked 100 people if they thought women in the armed forces should be permitted to participate in combat. The results of the survey are shown in the table.

Gender    Yes   No   Total
Male       32   18     50
Female      8   42     50
Total      40   60    100

Find the probabilities



a) The respondent answered “yes”, given that the respondent was female

b) The respondent was a male, given that the respondent answered “no”

8. Consider the following 2 × 2 table that shows the incidence of myocardial infarction (denoted MI) for women who had used oral contraceptives and women who had never used oral contraceptives.

                                  MI Yes   MI No   Totals
Used oral contraceptives            55       65     120
Never used oral contraceptives      25      125     150
Totals                              80      190     270

Assume that the proportions in the table represent the “infinite population” of adult women. Let A = {woman used oral contraceptives} and let B = {woman had an MI episode}.
Find a) $P(A)$, $P(B)$, $P(A^c)$, and $P(B^c)$.
b) What is $P(A \cap B)$?
c) What is $P(A \cup B)$?
d) Are A and B mutually exclusive?
e) What are P(A|B) and P(B|A)?
f) Are A and B independent?



CHAPTER 6

PROBABILITY DISTRIBUTIONS

CONTENTS
6.1. DEFINITION OF RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
6.2. INTRODUCTION TO EXPECTATION – MEAN AND VARIANCE OF A RANDOM VARIABLE
6.3. COMMON DISCRETE PROBABILITY DISTRIBUTIONS – BINOMIAL AND POISSON
6.4. COMMON CONTINUOUS PROBABILITY DISTRIBUTIONS - NORMAL, CHI-SQUARE, T AND F

INTRODUCTION

In chapter 5, the techniques of computing the probability of an event have been introduced.
In this chapter, we shall study the most commonly used discrete probability distributions;
namely, the Binomial and Poisson distributions; and three continuous probability densities:
normal, chi-square and t distributions, which are very important in statistical inference. We
will also mention some of their properties, because we need the results. But before presenting the probability distributions specifically, we need to define a random variable, a probability distribution, and, in general, the mean and variance of continuous as well as discrete random variables.

Objectives:



After studying this chapter, you should be able to:

 Describe a random variable and its probability distribution.

 Evaluate the mathematical expectation of a random variable.

 Evaluate probabilities of a discrete and continuous random variables.

 Identify the appropriate distribution under a given situation.

6.1. DEFINITION OF RANDOM VARIABLES AND PROBABILITY DISTRIBUTION

Random variable: a numerical-valued function defined on the sample space. It assigns a real number to each element of the sample space. Generally, random variables are denoted by capital letters and the values of the random variables are denoted by small letters.

Random variables are of two types: Discrete and Continuous.

Discrete random variable: a variable which can assume only a specific number of values; its values can be counted.
Examples
• Toss a coin n times and count the number of heads.
• Number of children in a family.
• Number of car accidents per week.
• Number of defective items in a given company.
• Number of bacteria per two cubic centimeters of water.

Continuous random variable: a variable that can assume all values between any two given values.
Examples
• Height of students at a certain college.
• Mark of a student.
• Lifetime of light bulbs.
• Length of time required to complete a given training.
Probability distribution: consists of the values a random variable can assume and the corresponding probabilities of those values; that is, it is a function that assigns a probability to each value of the random variable.

Probability distribution can be discrete or continuous.

Discrete probability distribution: a formula, a table, a graph or other device used to specify all possible values of the discrete random variable (R.V) X along with their respective probabilities.

Example 6.1

In an experiment of "flipping a fair coin 3 times", list the elements of the sample space that
are assumed to be equally likely (as this is what is meant by a fair or balanced coin) and the
corresponding values x of the r-v X, the number of heads observed.

Solution: If H stands for heads and T for tails, then the sample space corresponding to this
experiments is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Since X= the number of heads observed, the results are shown in the following table:

Element of S    Probability    x

HHH             1/8            3
HHT             1/8            2
HTH             1/8            2
HTT             1/8            1
THH             1/8            2
THT             1/8            1
TTH             1/8            1
TTT             1/8            0

Thus, we can write X(HHH) = 3, X(HHT) = 2, …, X(TTT) = 0, and P(X = 3) = 1/8 = the probability that the r-v X is 3, P(X = 2) = 3/8, P(X = 1) = 3/8, and P(X = 0) = 1/8.
Note that the possible values of X are: 0, 1, 2, 3.
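
The table in Example 6.1 can be collapsed into the probability distribution of X with a few lines of Python. This is a sketch with our own variable names; it counts how many of the 8 equally likely outcomes give each value of X.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The 8 equally likely outcomes of flipping a fair coin 3 times.
sample_space = ["".join(t) for t in product("HT", repeat=3)]

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(outcome.count("H") for outcome in sample_space)

# Probability distribution of X: P(X = x) = (outcomes with x heads) / 8.
pmf = {x: Fraction(n, len(sample_space)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```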

Activity 6.1
1) Consider an experiment of tossing a coin three times. Let X be the number of heads. Construct the probability distribution of X.
2) A balanced die is tossed twice. Construct a probability distribution if:
A) X is the sum of the number of spots in the two trials.
B) X is the absolute difference of the number of spots in the two trials.

Properties of discrete probability distribution

1)

2)
3) If X is discrete random variable then

Example 6.2



Check whether the following function can serve as a pmf of a discrete r-v X:

Solution: Substituting the different values of x, we get

Since these values are all non-negative and the sum is , the given
function can serve as a pmf of some random variable whose domain is .

B) Continuous probability distribution

Definition: a non-negative function f(x) is called the probability distribution of a continuous R.V X if the total area bounded by the curve and the X-axis is 1, and if the area bounded by the curve, the X-axis and the perpendiculars erected at any two points a and b gives the probability that X is between a and b.

Example 6.3

Suppose that the r-v X is continuous with the following pdf:

a) Check that f(x) satisfies the two conditions of being a p.d.f.;
b) Evaluate P(X<0.5).
Solution: a) Obviously, for 0 < x < 1, f(x) > 0, and

Hence, f(x) is the pdf of some random variable X.



Note: since f(x) is zero in the other two intervals:

b)

Activity 6.2

Let X be a continuous r-v with the following pdf:

a) Check that satisfies the conditions of being a p.d.f.


b) Find .

Properties of continuous probability distribution

a) The total area under the curve is one, i.e. $\int_{-\infty}^{\infty} f(x)\,dx = 1$

b) $P(a < X < b) = \int_a^b f(x)\,dx$ = the area under the curve between the points a and b.


c)
d)
e)

6.2 INTRODUCTION TO EXPECTATION

Definition:
1. Let a discrete random variable X assume the values X1, X2, …, Xn with the probabilities P(X1), P(X2), …, P(Xn) respectively. Then the expected value of X, denoted as E(X), is defined as:

E(X) = X1.P(X1) + X2.P(X2) + … + Xn.P(Xn)

= $\sum_{i=1}^{n} X_i P(X_i)$
2. Let X be a continuous random variable assuming the values in the interval (a, b) such that $\int_a^b f(x)\,dx = 1$, then

$E(X) = \int_a^b x f(x)\,dx$

Mean and Variance of a random variable


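Let X be a given random variable.
1. The expected value of X is its mean:
Mean of X = E(X)
2. The variance of X is given by:
Variance of X = Var(X) = $E(X^2) - [E(X)]^2$

Where $E(X^2) = \sum_i x_i^2 P(X = x_i)$ if X is discrete, and $E(X^2) = \int x^2 f(x)\,dx$ if X is continuous.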

Rules of Expectation

1) Let X be a R.V and k be a real number, then

a) E (kX) =kE(X)

b) E(X+k) =E(X) + k

2) Let X and Y be R.V on the sample space, then

a) $E(X + Y) = E(X) + E(Y)$

b) $Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X, Y)$

Where Cov(X, Y) = the covariance between X and Y = E(XY) - E(X).E(Y)

3) Let X and Y be independent R.V, then

a) E(XY) = E(X).E(Y)

b) $Var(X + Y) = Var(X) + Var(Y)$

c) Cov(X, Y) = 0

Example 6.4

Let a fair die be rolled once. Find the mean number rolled, say X.

Solution: Since S = {1, 2, 3, 4, 5, 6} and all are equally likely with prob. of 1/6, we have

$E(X) = \sum x\,P(X = x) = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5$
Example 6.5

Find the expected value and the variance of the r-v given in

Solution:

= 1.



=

Activity 6.3
1. What are the expected value and variance of a random variable X obtained by tossing a coin three times, where X is the number of heads?
2. Let X be a continuous R.V with distribution

Then find a) P (1<x<1.5; b) E(x); c) Var(x); and d) .

6.3 COMMON DISCRETE PROBABILITY DISTRIBUTIONS

In this section, we shall study two common discrete probability distributions, namely, the
Binomial and Poisson distributions.

1. Binomial Distribution

A binomial experiment is a probability experiment that satisfies the following four requirements, called the assumptions of a binomial distribution.
1. The experiment consists of n identical trials.
2. Each trial has only one of two possible mutually exclusive outcomes, a success or a failure.
3. The probability of each outcome does not change from trial to trial, and
4. The trials are independent, thus we must sample with replacement.


Examples of binomial experiments
• Tossing a coin 20 times to see how many tails occur.
• Asking 200 people if they watch BBC news.
• Registering a newly produced product as defective or non defective.
• Asking 100 people if they favor the ruling party.
• Rolling a die to see if a 5 appears.

Definition: The outcomes of the binomial experiment and the corresponding probabilities of these outcomes are called the Binomial Distribution.
Let p = probability of success and q = 1-p = probability of failure on any given trial.
Then the probability of getting x successes in n trials becomes

$P(X = x) = \binom{n}{x} p^x q^{n-x}$, $x = 0, 1, 2, \ldots, n$

And this is sometimes written as $X \sim \text{Bin}(n, p)$.

When using the binomial formula to solve problems, we have to identify three things:
• The number of trials (n)
• The probability of a success on any one trial (P) and
• The number of successes desired (X).

Example 6.6
Find the probability of getting 5 heads and 7 tails in 12 flips of a fair coin.

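Solution: Given n = 12 trials. Let X be the number of heads.

Then p = probability of getting a head = 1/2, and q = probability of not getting a head = 1/2.

The probability of getting k heads in 12 tosses of the coin is

$P(X = k) = \binom{12}{k} \left(\frac{1}{2}\right)^{k} \left(\frac{1}{2}\right)^{12-k} = \binom{12}{k} \left(\frac{1}{2}\right)^{12}$

and for k = 5, $P(X = 5) = \binom{12}{5} \left(\frac{1}{2}\right)^{12} = \frac{792}{4096} \approx 0.1934$.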

Example 6.7

If the probability is 0.20 that a person traveling on an EAL flight will be a vegetarian, find the probability that 3 of 10 people on such a flight will be vegetarians.

Solution: Let X be the number of vegetarians. Given n = 10, p = 0.20, k = 3; then,

$P(X = 3) = \binom{10}{3} (0.2)^{3} (0.8)^{7} \approx 0.2013$

Remark: If X is a binomial random variable with parameters n and p, then

E(X) = np and Var(X) = npq
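
The binomial formula and the remark above can be sketched in Python as follows. The function names are illustrative only; `math.comb` supplies the binomial coefficient. The two printed values correspond to Examples 6.6 and 6.7.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_mean_variance(n, p):
    """Return (E(X), Var(X)) = (np, npq)."""
    return n * p, n * p * (1 - p)

print(round(binomial_pmf(5, 12, 0.5), 4))   # Example 6.6: 0.1934
print(round(binomial_pmf(3, 10, 0.2), 4))   # Example 6.7: 0.2013
print(binomial_mean_variance(12, 0.5))      # (6.0, 3.0)
```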

Activity 6.4
What is the probability of getting three heads by tossing a fair coin four times?

2. Poisson Distribution

A random variable X is said to have a Poisson distribution if its probability distribution is given by:

$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$

where λ is the average number of occurrences of an event per unit interval of time or distance, and x is the number of occurrences in a Poisson process.
The Poisson distribution depends only on the average number of occurrences per unit of time or space. It is used as a distribution of rare events, such as:
• Number of misprints per page.
• Natural disasters like earthquakes.
• Accidents.
• Hereditary conditions.
• Arrivals.
The process that gives rise to such events is called a Poisson process.
If X is a Poisson random variable with parameter λ, then
E(X) = λ, Var(X) = λ.

Note: The Poisson probability distribution provides a close approximation to the binomial probability distribution when n is large and p is quite small, with λ = np. (When p is quite large, the same approximation applies to the number of failures, with λ = n(1 − p).)

Usually we use this approximation if np ≤ 5. In other words, if n > 20 and np ≤ 5 or n(1 − p) ≤ 5, then we may use the Poisson distribution as an approximation to the binomial distribution.
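
To see how close the approximation in the note can be, the following sketch compares the two distributions side by side. The values n = 100 and p = 0.02 are hypothetical, chosen only so that n > 20 and np = 2; the function names are likewise illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Exact binomial probability P(X = x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """Poisson probability P(X = x) with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

n, p = 100, 0.02      # hypothetical values: n > 20 and np = 2 <= 5
lam = n * p           # Poisson parameter used for the approximation

# Print x, the exact binomial probability, and the Poisson approximation.
for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))
```

For these values the two columns agree to about two decimal places.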

Example 6.8

Suppose that customers enter a waiting line at random at a rate of 4 per minute. Assuming
that the number entering the line during a given time interval has a Poisson distribution, find
the probability that:

a) One customer enters during a given one-minute interval of time;

b) At least one customer enters during a given half-minute time interval.

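Solution: a) Given λ = 4 per minute, $P(X = 1) = \frac{4^{1} e^{-4}}{1!} \approx 0.0733$.

b) Per half-minute, the expected number of customers is 2, which is a new parameter.

$P(X \ge 1) = 1 - P(X = 0)$, but $P(X = 0) = \frac{2^{0} e^{-2}}{0!} = e^{-2} \approx 0.1353$, so $P(X \ge 1) \approx 1 - 0.1353 = 0.8647$.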
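
The two parts of Example 6.8 can be reproduced with a minimal Python sketch. The function name `poisson_pmf` is an illustrative choice; note how part (b) rescales the rate from 4 per minute to 2 per half-minute before applying the formula.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

# a) exactly one arrival in one minute (lambda = 4)
p_one = poisson_pmf(1, 4)

# b) at least one arrival in half a minute (lambda = 2)
p_at_least_one = 1 - poisson_pmf(0, 2)

print(round(p_one, 4), round(p_at_least_one, 4))  # 0.0733 0.8647
```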



Activity 6.5
1. If 1.6 accidents can be expected at an intersection on any given day, what is the probability that there will be 3 accidents on any given day?
2. If there are 200 typographical errors randomly distributed in a 500-page manuscript, find the probability that a given page contains exactly 3 errors.

3. A sales firm receives, on average, 3 calls per hour on its toll-free number. For any given hour, find the probability that it will receive the following.

a) At most 3 calls

b) At least 3 calls

c) Five or more calls

4. If approximately 2% of people are left-handed, find the probability that in a room of 200 people there are exactly 5 people who are left-handed.

6.4 COMMON CONTINUOUS PROBABILITY DISTRIBUTIONS

In this section, we will study four important continuous probability distributions that play a leading role in statistical inference: the normal, t, Chi-Square and F distributions.

6.4.1 Normal Distribution

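A random variable X is said to have a normal distribution if its probability density function is given by

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \quad -\infty < x < \infty$

where μ (the mean, −∞ < μ < ∞) and σ (the standard deviation, σ > 0) are parameters of the normal distribution.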


Properties of Normal Distribution:
1. It is bell shaped, symmetrical about its mean, and mesokurtic. The maximum ordinate is at x = μ and is given by $f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$.
2. It is asymptotic to the x-axis, i.e., it extends indefinitely in either direction from the mean.
3. It is a continuous distribution, i.e., there are no gaps or holes.
4. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a different normal distribution. Thus, the normal distribution is completely described by two parameters: mean and standard deviation.
5. The total area under the curve is 1, and the area of the distribution on each side of the mean is 0.5.
6. It is unimodal, i.e., values mound up only in the center of the curve.
7. Mean = Median = Mode = μ, located at the center of the distribution.
8. The probability that a random variable will have a value between any two points is equal to the area under the curve between those points.
Note: To facilitate the use of the normal distribution, the following distribution, known as the standard normal distribution, was derived by using the transformation

$Z = \frac{X - \mu}{\sigma}$

That is, $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}, \quad -\infty < z < \infty$.

Properties of the Standard Normal Distribution:

Same as a normal distribution, but
• Mean is zero
• Variance is one
• Standard deviation is one

- Areas under the standard normal distribution curve have been tabulated in various ways. The most common ones are the areas between Z = 0 and a positive value of Z.
- Given a normally distributed random variable X with mean μ and standard deviation σ,

$P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right)$

Example 6.9

Find the probabilities that a random variable having the standard normal distribution will take on a value

a) less than 1.72; b) less than -0.88; c) between 1.30 and 1.75; d) between -0.25 and 0.45.

Solution: Making use of the Z table, we find that

a) P(Z < 1.72) = P(Z < 0) + P(0 < Z < 1.72) = 0.5 + 0.4573 = 0.9573.

b) P(Z < -0.88) = P(Z > 0.88) = 0.5 - P(0 < Z < 0.88) = 0.5 - 0.3106 = 0.1894.

c) P(1.30 < Z < 1.75) = P(0 < Z < 1.75) – P(0 < Z < 1.30) = 0.4599 – 0.4032 = 0.0567.

d) P(-0.25 < Z < 0.45) = P(-0.25 < Z < 0) + P(0 < Z < 0.45) = 0.0987 + 0.1736 = 0.2723.
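
The table look-ups in Example 6.9 can also be checked numerically. The sketch below is one possible approach, assuming the standard relation between the normal cumulative distribution and the error function `math.erf`; the function names are illustrative. Because the Z table rounds each entry to four decimals, computed values may differ from the table-based answers in the last digit.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal random variable Z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_probability(a, b, mu=0.0, sigma=1.0):
    """P(a < X < b) for X ~ N(mu, sigma^2), using Z = (X - mu) / sigma."""
    return standard_normal_cdf((b - mu) / sigma) - standard_normal_cdf((a - mu) / sigma)

print(round(standard_normal_cdf(1.72), 4))         # part a
print(round(standard_normal_cdf(-0.88), 4))        # part b
print(round(normal_probability(1.30, 1.75), 4))    # part c
print(round(normal_probability(-0.25, 0.45), 4))   # part d
```

Passing `mu` and `sigma` to `normal_probability` handles any normal random variable, not only the standard one.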

Activity: Find the following: a) P(-0.45 < Z < -0.25); b)P(Z>1.75).

Activity 6.6
Of a large group of men, 5% are less than 60 inches in height and 40% are between 60 & 65
inches. Assuming a normal distribution, find the mean and standard deviation of heights.

6.4.2 Student's t Distribution

In statistics, as long as the sample size is large enough, the distribution of the sample mean can usually be described by the standard normal distribution. But when the sample size is small, statisticians rely on the distribution of the t statistic (also known as the t score), whose value is given by:

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

where $\bar{x}$ is the sample mean, μ is the population mean, s is the standard deviation of the sample, and n is the sample size.
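
As a small illustration, the t statistic can be computed from raw data as sketched below. The data values and the hypothesized mean are hypothetical, and `t_statistic` is an illustrative name; `statistics.stdev` is used because it returns the sample standard deviation (divisor n − 1).

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu):
    """t = (sample mean - mu) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / sqrt(n))

scores = [1, 2, 3, 4, 5]          # hypothetical sample
t = t_statistic(scores, mu=2)     # hypothetical population mean
df = len(scores) - 1              # degrees of freedom = n - 1

print(round(t, 4), df)            # 1.4142 4
```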

The distribution of the t statistic is called the t distribution or the Student t distribution. The particular form of the t distribution is determined by its degrees of freedom (df). The degrees of freedom refer to the number of independent observations in a set of data. When estimating a mean score or a proportion from a single sample, the number of independent observations is equal to the sample size minus one. The t distribution can be used with any statistic having a bell-shaped distribution (i.e., approximately normal).

The t distribution has the following properties:

• The mean of the distribution is equal to 0.
• The variance is equal to v / (v - 2), where v is the degrees of freedom and v > 2.
• With infinite degrees of freedom, the t distribution is the same as the standard normal distribution.
• The t distribution is similar to the standard normal distribution in the following ways:
   - It is bell-shaped.
   - It is symmetric about the mean.
   - The mean, median, and mode are equal to zero and located at the center of the distribution.
   - The curve never touches the x axis.
• The t distribution differs from the standard normal distribution in the following ways:
   - The variance is greater than one.
   - The t distribution is actually a family of curves based on the concept of degrees of freedom, which is related to sample size.
   - As the sample size increases, the t distribution approaches the standard normal distribution.
6.4.3 Chi-Square Distribution

The chi-square variable is similar to the t variable in that its distribution is a family of curves based on the number of degrees of freedom. The symbol for chi-square is $\chi^{2}$ (Greek letter chi, pronounced “ki”). The chi-square distribution is obtained from the values of $(n-1)s^{2}/\sigma^{2}$ when random samples are selected from a normally distributed population whose variance is $\sigma^{2}$. A chi-square variable cannot be negative, and the distributions are positively skewed. At about 100 degrees of freedom, the chi-square distribution becomes somewhat symmetrical. The area under each chi-square distribution is equal to 1.00 or 100%.

In order to find the area under the chi-square distribution, there are three cases to consider:

1) Find the chi-square critical value for a specific α when the hypothesis test is one-tailed right. In this case, find the α value at the top of the table and the corresponding degrees of freedom in the left column. The critical value is located where the column and the row meet.

Example 6.10
The critical chi-square value for 15 degrees of freedom when α = 0.05 and the test is one-tailed right is 24.996.

2) Find the chi-square critical value for a specific α when the hypothesis test is one-tailed left. In this case, the α value must be subtracted from one, and the resulting value 1 − α is looked up at the top of the table. The left side of the table is used because the table gives the area to the right of the critical value and the chi-square statistic cannot be negative.

Example 6.11
The critical value for 10 df when α = 0.05 and the test is one-tailed left is 3.940.

3) Find the chi-square critical value for a specific α when the hypothesis test is two-tailed. When a two-tailed test is conducted, the area must be split. For example, to find the critical chi-square values for 22 degrees of freedom when α = 0.05, we use an area of 0.025 (0.05/2) to the right of the larger value, and an area of 0.975 (1 − 0.05/2) to the right of the smaller value. Hence, one must use the table columns for 0.025 and 0.975; with 22 degrees of freedom, the critical values are 36.781 and 10.982 respectively.
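
The three table look-ups above can be cross-checked without a printed table. The following is only a sketch under stated assumptions: it integrates the chi-square density numerically (Simpson's rule) and then searches by bisection for the value with the required area to its right. It assumes at least 2 degrees of freedom, and all function names are illustrative rather than taken from any library.

```python
from math import exp, lgamma, log, sqrt

def chi2_pdf(x, df):
    """Chi-square density; assumes df >= 2 so the density is finite at 0."""
    if x <= 0:
        return 0.5 if (df == 2 and x == 0) else 0.0
    log_pdf = (df / 2 - 1) * log(x) - x / 2 - (df / 2) * log(2) - lgamma(df / 2)
    return exp(log_pdf)

def chi2_cdf(x, df, steps=2000):
    """Area to the left of x, by Simpson's rule with an even number of steps."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = chi2_pdf(0.0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return total * h / 3

def chi2_critical(right_tail_area, df):
    """Value with the given area to its right, found by bisection."""
    lo, hi = 0.0, df + 10 * sqrt(2 * df) + 50
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, df) > right_tail_area:
            lo = mid      # too much area to the right: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chi2_critical(0.05, 15), 3))    # one-tailed right, Example 6.10
print(round(chi2_critical(0.95, 10), 3))    # one-tailed left, Example 6.11
print(round(chi2_critical(0.025, 22), 3),   # two-tailed, upper value
      round(chi2_critical(0.975, 22), 3))   # two-tailed, lower value
```

Note that the left-tailed and two-tailed cases are handled exactly as in the text: the function always takes the area to the right of the critical value, so 1 − α or 1 − α/2 is passed in where required.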

Note that after the degrees of freedom reach 30, the chi-square table only gives values for multiples of 10 (40, 50, 60, etc.). When the exact degrees of freedom one is seeking are not specified in the table, the closest smaller value should be used.

The chi-square distribution has the following properties:

• The mean of the distribution is equal to the number of degrees of freedom (v): μ = v.
• The variance is equal to two times the number of degrees of freedom: σ² = 2v.
• When the degrees of freedom are greater than or equal to 2, the maximum value of the density Y occurs when χ² = v − 2.
• As the degrees of freedom increase, the chi-square curve approaches a normal distribution.

6.4.4 The F Distribution

We use the t distribution and t tests to examine the probability of a single estimator taking a particular value. We use the F distribution and F tests to carry out joint hypothesis tests on more than one estimator.

The motivation behind the F distribution is the case where we have independent samples of two variables, each drawn from a normal distribution.


Example 6.12
• $X_1, X_2, \ldots, X_m$: random sample of size m from a normal distribution with variance $\sigma_X^{2}$
• $Y_1, Y_2, \ldots, Y_n$: random sample of size n from a normal distribution with variance $\sigma_Y^{2}$

Suppose we want to find out whether the variances are the same, i.e., whether $\sigma_X^{2} = \sigma_Y^{2}$. We cannot observe the population variances; however, we have the sample estimators $S_X^{2}$ and $S_Y^{2}$:

$S_X^{2} = \frac{1}{m-1}\sum_{i=1}^{m}(X_i - \bar{X})^{2}$ and $S_Y^{2} = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^{2}$

If we take

$F = \frac{S_X^{2}}{S_Y^{2}}$

then, if the two sample variances are the same, F = 1. If they are different, F ≠ 1, and the greater the difference, the greater the value of F will be.

Statistical theory shows us that if the two population variances are equal ($\sigma_X^{2} = \sigma_Y^{2}$), the F ratio will follow the F distribution with (m − 1, n − 1) df (with the larger of the two variances on the top).

The F ratio is often designated $F_{k_1,k_2}$, where the subscripts denote the parameters of the distribution; here $k_1 = m - 1$ and $k_2 = n - 1$.
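
A minimal sketch of computing the F ratio from two samples is given below. The data are hypothetical and `f_ratio` is an illustrative name; `statistics.variance` returns the sample variance with divisor n − 1, matching the estimators in Example 6.12. Following the convention stated above, the larger sample variance is placed on top, and the degrees of freedom are ordered to match.

```python
from statistics import variance

def f_ratio(x, y):
    """Return (F, numerator df, denominator df) with the larger sample variance on top."""
    var_x, var_y = variance(x), variance(y)
    if var_x >= var_y:
        return var_x / var_y, len(x) - 1, len(y) - 1
    return var_y / var_x, len(y) - 1, len(x) - 1

x = [4, 7, 9, 12, 13]        # hypothetical sample of size m = 5
y = [6, 7, 8, 9, 10, 11]     # hypothetical sample of size n = 6

f_value, df_num, df_den = f_ratio(x, y)
print(round(f_value, 4), df_num, df_den)   # 3.8571 4 5
```

The computed ratio would then be compared with a tabulated F value for the same pair of degrees of freedom.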

Properties of the F distribution

1. The F distribution is skewed to the right and ranges between zero and infinity (i.e., it only takes non-negative values).
2. As the df → ∞, the F distribution approaches the normal distribution.
3. The square of a t-distributed random variable has an F distribution with 1 and k df in the numerator and denominator respectively, i.e., $t_k^{2} = F_{1,k}$.

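CHAPTER SUMMARY

• Any function P(X = x) can be a pmf if $P(X = x) \ge 0$ for all x, and $\sum_{x} P(X = x) = 1$.

• For a random variable X, $E(X) = \sum_{x} x\,P(X = x)$ and $Var(X) = E(X^{2}) - [E(X)]^{2}$, where $E(X^{2}) = \sum_{x} x^{2}\,P(X = x)$.

• $E(aX + b) = aE(X) + b$, and $Var(aX + b) = a^{2}\,Var(X)$, where a and b are constants.

• The binomial pmf is given by: $P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, \ldots, n$.

• For a binomial random variable, $E(X) = np$, and $Var(X) = npq$.

• For the Poisson distribution: $P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$; E(X) = λ = Var(X).

• For f(x) to be a pdf, we need $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$; then $P(a < X < b) = \int_{a}^{b} f(x)\,dx$.

• The pdf of a Normal Distribution is: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$; $-\infty < x < \infty$.

• For a Normal Distribution, the three averages coincide: mean = median = mode.

• $Z = \frac{X - \mu}{\sigma}$ has mean 0 and variance 1.

• If X is continuous, then $P(X = a) = 0$ for any value a, so $P(a \le X \le b) = P(a < X < b)$.
• The Chi-square distribution is used to test an association of attributes.

• If $\bar{X}$ and $S^{2}$ are the mean and the variance of a random sample of size n from a normal population with mean μ and variance σ², then $t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t distribution with (n − 1) degrees of freedom.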


Exercises on Chapter 6

1. Suppose that an examination consists of six true and false questions, and assume that a student
has no knowledge of the subject matter. The probability that the student will guess the
correct answer to the first question is 30%. Likewise, the probability of guessing each of the
remaining questions correctly is also 30%.
a) What is the probability of getting more than three correct answers?
b) What is the probability of getting at least two correct answers?
c) What is the probability of getting at most three correct answers?
d) What is the probability of getting less than five correct answers?
2. The probability that a patient contracting IB will recover from the disease under medical treatment is 0.6. Out of 15 patients contracting the disease:
a) What is the probability that exactly 10 will recover?

b) What is the expected number of patients who will recover?

c) What is the variance of the number of patients who will recover?

Assume that the patients are subjected to the same medical treatment.

3. Find the area under the standard normal distribution which lies
a) Between Z = 0 and Z = 0.96

b) Between Z = -1.45 and Z = 0

c) To the right of Z = -0.35

d) To the left of Z = 0.35

e) Between Z = -0.67 and Z = 0.75

f) Between Z = 0.25 and Z = 1.25

4. Find the value of Z if



a) The normal curve area between 0 and z(positive) is 0.4726
b) The area to the left of z is 0.9868
5. A random variable X has a normal distribution with mean 80 and standard deviation 4.8. What is the probability that it will take a value:
a) Less than 87.2
b) Greater than 76.4
c) Between 81.2 and 86.0
6. A normal distribution has mean 62.4. Find its standard deviation if 20.05% of the area under the normal curve lies to the right of 72.9.
7. A random variable has a normal distribution with σ = 5. Find its mean if the probability that the random variable will assume a value less than 52.5 is 0.6915.



CHAPTER 7

SAMPLING AND SAMPLING DISTRIBUTIONS

CONTENTS

7.1. METHODS OF SAMPLING (SIMPLE RANDOM SAMPLING)

7.2. SAMPLING DISTRIBUTION OF THE SAMPLE MEAN

7.3. SAMPLING DISTRIBUTION OF THE SAMPLE PROPORTION

7.4. STANDARD ERRORS

7.5. THE CENTRAL LIMIT THEOREM

INTRODUCTION

Before giving the notion of sampling, we will first define population. In a statistical investigation, the interest usually lies in the assessment of the general magnitude and the study of variation with respect to one or more characteristics relating to individuals belonging to a group. This group of individuals under study is called the population or universe. Thus, in statistics, a population is an aggregate of objects, animate or inanimate, under study. The population may be finite or infinite.

Objectives
At the end of this chapter, students will be able to:
• Explain the meaning of sampling theory, sampling unit and sampling frame
• Differentiate between different sampling methods and techniques
• Determine the sampling distribution of the mean and the sampling distribution of the proportion
7.1. METHODS OF SAMPLING
7.1.1. The concept of sampling

Sampling is that part of statistical practice concerned with the selection of individual observations intended to yield some knowledge about a population of concern, especially for the purposes of statistical inference. Before further discussion of the specific types of sampling methods, it is valuable to be acquainted with the following terms:

1. Sampling
Sampling is the process or method of selecting a sample from the population.
Sampling can be done either with replacement or without replacement.

1.1 Sampling with replacement (swr):

In this case, a unit is selected from a population with a known probability, and the unit is returned to the population (after its characteristics are recorded) before the next selection is made. Thus, in this method, the population size remains constant at each selection, the probability at each selection or draw remains the same, and a unit has a chance of being selected more than once. There are $N^{n}$ possible samples of size n from a population of N units.

1.2 Sampling without replacement (swor):

In this selection procedure, if a unit from a population of size N is selected, it is not returned to the population. Thus, for any subsequent selection, the population size is reduced by one. There are $\binom{N}{n}$ possible samples of size n from a population of N units.
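
The two counts of possible samples can be illustrated by listing the samples explicitly. The sketch below uses the small population 2, 4, 6, 8 and samples of size 2; it assumes that samples drawn with replacement are counted as ordered pairs, while samples drawn without replacement are counted without regard to order. The variable names are illustrative.

```python
from itertools import combinations, product
from math import comb

population = [2, 4, 6, 8]   # N = 4
n = 2                       # sample size

# With replacement: ordered selections, a unit may repeat.
with_replacement = list(product(population, repeat=n))

# Without replacement: unordered selections of distinct units.
without_replacement = list(combinations(population, n))

print(len(with_replacement), len(population) ** n)          # 16 16
print(len(without_replacement), comb(len(population), n))   # 6 6
```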



2. Sampling unit
The ultimate unit to be sampled, or the element of the population to be sampled, is called the sampling unit.
Examples
• If somebody studies the socio-economic status of households, the household is the sampling unit.
• If one studies the performance of freshman students in some college, the student is the sampling unit.

3. Sample size

The number of sampling units selected from a population is called the sample size. The sample size depends on a number of considerations, which are as follows.

a) The purpose for which the sample is drawn.

b) The type of population from which the sample is to be drawn.

c) Availability of technical people or equipment needed.

d) Resources allotted for the study in terms of time and money.

e) Precision required.

4. Study Unit

The unit on which information is collected is called study unit.

5. Sampling Fraction

The ratio of the number of units in the sample to the number of units in the source population. Its reciprocal, N/n, is the sampling interval.

6. Sampling frame

The sampling frame is the list of all the units in the source population from which a sample is to be taken.

Examples
• List of households.
• List of students in the registrar's office.

7. Errors in sample survey

There are two types of errors:
a) Sampling error:
- It is the discrepancy between the population value and the sample value due to the fact that the sample is not a perfect representation of the population.
- It may arise due to inappropriate sampling techniques.
b) Non-sampling errors: errors due to procedural bias, such as:
• Incorrect responses (called response or observational errors)
• Measurement errors or imprecise definitions
• Errors at different stages in processing the data, such as editing and tabulating

Reasons for Sampling

• Reduced cost: The finances required to cover the whole population can hardly be made available.
• Greater speed: Studying the whole population takes too much time, and often the study becomes outdated by the time it is complete.
• Greater accuracy: Complete enumeration (a census study) introduces many errors, which are reduced or eliminated by sampling.
• The only option when the population is infinite: If the population is infinite or consists of an uncountable number of units, studying all of it is impossible.

Because of the above considerations, in practice we take a sample and draw conclusions about population values such as the population mean and population variance, known as parameters of the population.
Sometimes taking a census makes more sense than using a sample. Some of the reasons include:
• Universality
• Qualitativeness
• Detailedness
• Non-representativeness

7.1.2. Sampling Techniques

There are, broadly speaking, two types of sampling techniques.

A) Random Sampling or Probability Sampling

A probability sampling scheme is one in which every unit in the population has a known nonzero probability of being sampled and the process involves random selection.
Probability sampling includes: Simple Random Sampling, Systematic Sampling, Stratified Sampling, and Cluster Sampling or Multistage Sampling.

1. Simple Random Sampling:

• It is a method of selecting items from a population such that every possible sample of a specific size has an equal chance of being selected. In this case, sampling may be with or without replacement. Equivalently, all elements in the population have the same pre-assigned nonzero probability of being included in the sample.
• This could be accomplished by writing each study unit's name on a slip of paper and selecting an adequate number of them using the lottery method. It can also be done by assigning a number to each sampling unit and then selecting the sample using a table of random numbers or a computer application.


Table of Random Numbers
Tables of random numbers are tables of the digits 0, 1, 2, …, 9, each digit having an equal chance of selection at any draw. For convenience, the numbers are put in blocks of five. In using these tables to select a simple random sample, the steps are:
i) Number the units in the population from 1 to N (prepare a frame of the population).
ii) Then proceed in the following way:
If the first digit of N is a number between 5 and 9 inclusive, the following method of selection is adequate. Suppose N = 528 and we want n = 10.
Select three columns from the table of random numbers, say columns 25 to 27. Go down the three columns, selecting the first 10 distinct numbers between 001 and 528. These are 36, 509, 364, 417, 348, 127, 149, 186, 439, and 329. Then the units with these roll numbers form our sample.
Note: If sampling is without replacement, reject any number that has already been selected.
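
Where a computer application is used instead of a printed table, the same selection can be sketched in Python as below. This is one possible approach, not a prescribed procedure: `simple_random_sample` is an illustrative name, and the seed value is arbitrary and serves only to make the draw repeatable. The sample drawn will of course differ from the ten numbers read from the table above.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw distinct unit numbers from 1..N, i.e. sampling without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)

# N = 528 units in the frame, n = 10 units required.
chosen = simple_random_sample(528, 10, seed=2017)
print(sorted(chosen))
```

Because `sample` never returns the same unit twice, the rule of rejecting repeated numbers is applied automatically.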

2. Stratified Random Sampling

• The population is divided into non-overlapping and exhaustive groups called strata.
• A separate sample is taken from each stratum using simple or systematic random sampling techniques.
• Elements in the same stratum should be more or less homogeneous, while elements in different strata should differ.
• It is applied if the population is heterogeneous.
• The main advantage is that it improves the representativeness of the sample and allows reasonable comparison among strata. The major limitation is that it requires a separate sampling frame for each stratum.
• Some of the criteria for stratification are characteristics of the population (sex, age, ethnic origin, occupation, etc.) and geography.

3. Cluster sampling

• The population is divided into separate groups of elements called clusters. Each element of the population belongs to one and only one cluster.
• A simple random sample of the clusters is then taken. All elements within each sampled cluster form the sample.
• Cluster sampling tends to provide the best results when the elements within the clusters are heterogeneous.
• It is used in large geographic samples where no list of all the units in the population is available but the population boundaries can be well defined.
• Cluster sampling must use a random sampling method at each stage. This may result in a somewhat larger sample than using a simple random sampling method, but it saves time and money.
• Cluster sampling is useful when it is difficult or costly to generate a simple random sample.

For example, to obtain information about the drug habits of all high school students in a state, you could obtain a list of all the school districts in the state and select a simple random sample of school districts. Then, within each selected school district, list all the high schools and select a simple random sample of high schools. Within each selected high school, list all high school classes, and select a simple random sample of classes. Then use the high school students in those classes as your sample.

4. Systematic Random Sampling

This method selects units at a fixed interval throughout the sampling frame after a random start.

• It is obtained by numbering each subject of the population and then selecting every k-th number.
• Here are the steps you need to follow in order to achieve a systematic random sample:
   - Number the units in the population from 1 to N,
   - Decide on the n (sample size) that you need,
   - Calculate the sampling interval k (k = N/n),
   - Randomly select an integer between 1 and k; suppose it is j,
   - The j-th unit is selected first, and then the (j + k)-th, (j + 2k)-th, ... units, until the required sample size is reached.
The general advantage of Systematic Random Sampling is that it is easier and less time consuming to perform. In some situations it can also be conducted without a sampling frame. However, this method can be biased when there is a cyclic pattern in the order of the subjects.

For example, to select a sample of 25 dorm rooms in your college dorm, make a list of all the room numbers in the dorm. Say there are 100 rooms. Divide the total number of rooms (100) by the number of rooms you want in the sample (25). The answer is 4. This means that you are going to select every fourth dorm room from the list. But you must first consult a table of random numbers. Pick any point on the table, and read across or down until you come to a number between 1 and 4. This is your random starting point. Say your random starting point is "3". This means you select dorm room 3 as your first room, and then every fourth room down the list (3, 7, 11, 15, 19, etc.) until you have 25 rooms selected.
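
The steps of systematic random sampling can be sketched in Python as follows. The function name and its parameters are illustrative; the sketch assumes that N is a multiple of n, so that the interval k = N/n is a whole number. The dorm-room example is reproduced by fixing the random start at 3.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select units start, start + k, start + 2k, ... with k = N // n."""
    k = population_size // sample_size              # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)   # random start j in 1..k
    return [start + i * k for i in range(sample_size)]

# Dorm example: N = 100 rooms, n = 25 rooms, random start taken as 3.
rooms = systematic_sample(100, 25, start=3)
print(rooms[:5], rooms[-1])   # [3, 7, 11, 15, 19] 99
```

Leaving `start` unspecified lets the function pick the random start itself.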

B) Non-Random Sampling or Non-Probability Sampling

• It is a sampling technique in which the choice of individuals for a sample depends on convenience, personal choice or interest.
• It is any sampling method where some elements of the population have no chance of selection or where the probability of selection can't be accurately determined.
• In non-probability sampling, the sample is less likely to be representative of the population; thus information about the relationship between sample and population is limited, making it difficult to extrapolate from the sample to the population.
• Non-probability sampling is used when there is no sampling frame with which to conduct probability sampling, or when probability sampling is not feasible for economic or practical reasons.
• Non-probability sampling is divided into Purposive, Convenience, Quota and Snowball Sampling.

A) Judgmental or Purposive Sampling

The researcher chooses the sample based on who he/she thinks would be appropriate for the study. Samples are taken based on previous knowledge of the population (from which the samples are taken) and the specific purpose of the study or investigation. Researchers use their personal judgment in selecting the sample(s).

B) Convenience Sampling

The selection of units from the population is based on easy availability and/or accessibility.

C) Quota Sampling

It starts with systematically setting a “quota” to represent subgroups of a population. Then data are collected to meet the predefined quota.

D) Snowball Sampling

The researcher begins by identifying someone who meets the inclusion criteria of the study. The study subject is then asked to recommend others he/she may know who also meet the criteria.

Sampling Distribution
Because statistics such as $\bar{x}$ vary from sample to sample, they are random variables. As such, statistics have probability distributions associated with them. In order to make probability statements regarding a sample statistic, we need to know the probability distribution of the sample statistic. That is to say, we need to know the shape, center and spread of the sample statistic's distribution.

The sampling distribution of a statistic is a probability distribution for all possible values of the statistic computed from a sample of size n.

• There are commonly three properties of interest of a given sampling distribution:
   - Its mean
   - Its variance
   - Its functional form

7.2 SAMPLING DISTRIBUTION OF THE SAMPLE MEAN

The sampling distribution of the sample mean is a theoretical probability distribution that shows the functional relationship between the possible values of a given sample mean based on samples of size n and the probability associated with each value, for all possible samples of size n drawn from that particular population.

Steps for the construction of the sampling distribution of the mean

1. From a finite population of size N, randomly draw all possible samples of size n.
2. Calculate the mean for each sample.
3. Summarize the means obtained in step 2 in terms of a frequency distribution or relative frequency distribution.

Example 7.1

Consider a population consisting of the values 2, 4, 6, 8.

Let us take all possible samples of size 2 with replacement from this population. The number of such samples is $N^{n} = 4^{2} = 16$.

Sample   Mean     Sample   Mean
(2,2)    2        (2,6)    4
(4,2)    3        (4,6)    5
(6,2)    4        (6,6)    6
(8,2)    5        (8,6)    7
(2,4)    3        (2,8)    5
(4,4)    4        (4,8)    6
(6,4)    5        (6,8)    7
(8,4)    6        (8,8)    8

The sampling distribution of the sample mean is:

Mean   Frequency   Probability
2      1           1/16
3      2           2/16
4      3           3/16
5      4           4/16
6      3           3/16
7      2           2/16
8      1           1/16

$\mu_{\bar{x}} = \sum \bar{x}\,P(\bar{x}) = \frac{2(1) + 3(2) + 4(3) + 5(4) + 6(3) + 7(2) + 8(1)}{16} = \frac{80}{16} = 5$, where $\mu_{\bar{x}}$ = the mean of the sample mean.

$\mu = \frac{2 + 4 + 6 + 8}{4} = 5$; the mean of the sample mean is the same as the population mean.
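
The construction in Example 7.1 can be reproduced by enumerating all 16 samples in Python. The sketch below follows the three construction steps (draw all samples, compute each mean, tabulate); the variable names are illustrative. As an addition not worked out in the example, it also compares the variance of the sample mean with the population variance divided by n, which is the relation that holds for sampling with replacement.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]
n = 2

# Steps 1 and 2: all samples of size n with replacement, and their means.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Step 3: relative frequency (probability) of each possible sample mean.
counts = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

mean_of_means = sum(m * p for m, p in distribution.items())
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in distribution.items())

pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

print(mean_of_means, pop_mean)        # 5 5
print(var_of_means, pop_var / n)      # 5/2 5/2
```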

Activity 7.1

Suppose we have a population of size N = 5, consisting of the ages of five children: 6, 8, 10, 12, and 14.

Take samples of size 2 with replacement and construct the sampling distribution of the sample mean.

Remark:
1. In general, if sampling is with replacement or from an infinite population,

$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}}^{2} = \frac{\sigma^{2}}{n}$

2. If sampling is without replacement from a finite population,

$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}}^{2} = \frac{\sigma^{2}}{n} \cdot \frac{N-n}{N-1}$, where the term $\frac{N-n}{N-1}$ is called the finite population correction factor (fpc).

3. In any case, the sample mean is an unbiased estimator of the population mean. That is, $E(\bar{x}) = \mu$ (show this).
• Sampling may be from a normally distributed population or from a non-normally distributed population.
• When sampling is from a normally distributed population, the distribution of $\bar{x}$ will possess the following properties:
1. The distribution of $\bar{x}$ will be normal.
2. The mean of $\bar{x}$ is equal to the population mean, i.e., $\mu_{\bar{x}} = \mu$.
3. The variance of $\bar{x}$ is equal to the population variance divided by the sample size, i.e., $\sigma_{\bar{x}}^{2} = \frac{\sigma^{2}}{n}$.

Activity 7.2

If the uric acid values in normal adult males are approximately normally distributed with mean 5.7 mg and standard deviation 1 mg, find the probability that a sample of size 9 will yield a mean:
i. greater than 6
ii. between 5 and 6
iii. less than 5.2


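7.3 SAMPLING DISTRIBUTION OF THE SAMPLE PROPORTION
As $n \to \infty$, the sample proportion $\hat{p} = X/n$ (where X is the number of successes in the sample) is approximately normally distributed:

$\hat{p} \sim N\left(p, \sigma_{\hat{p}}^{2}\right)$, where $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$.

Thus, for some constants a < b,

$P(a \le \hat{p} \le b) \approx P\left(\frac{a-p}{\sigma_{\hat{p}}} \le Z \le \frac{b-p}{\sigma_{\hat{p}}}\right)$, where Z is the standard normal random variable.

Note: Since $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$, increasing the sample size will decrease the standard error.

Thus, the larger the sample size is, the larger is $P(|\hat{p} - p| \le c)$ for a fixed distance c (since the corresponding interval for Z is wider than the one with a smaller sample size).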

Example 7.2

What is the probability of the difference between the sample proportion and the population
proportion will be less or equal to 0.05 as the sample size What is the probability as
we increase the sample size to 100?

Solution

. Thus,

There is 42.46% chance that the difference between the sample proportion and the
population proportion is not more than 0.05 as . .

As sample size is increased to 100, then

Thus,

There is 69.22% chance that the difference between the sample proportion and the
population proportion is not more than 0.05 . That is, the larger sample size will
provide a higher probability that the value of the sample proportion will be within a specific
distance of the population proportion.

Example 7.3
A new soft drink is being market tested. It is estimated that 60% of consumers will like the
new drink. A sample of 96 taste-tested the new drink.
(a) Determine the standard error of the proportion
(b)What is the probability that equal to or more than 70.4% of consumers will indicate they
like the drink?
(c) What is the probability that equal to or more than 30% of consumers will indicate they do
not like the drink?

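Solution:

(a) $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.6 \times 0.4}{96}} = 0.05$

(b) $P(\hat{p} \ge 0.704) = P\left(Z \ge \frac{0.704 - 0.60}{0.05}\right) = P(Z \ge 2.08) = 1 - 0.9812 = 0.0188$

(c) We need to compute the probability that no more than 70% of consumers will indicate they like the drink:

$P(\hat{p} \le 0.70) = P\left(Z \le \frac{0.70 - 0.60}{0.05}\right) = P(Z \le 2.00) = 0.9772$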
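
The three parts of Example 7.3 can be computed with the normal approximation to the sample proportion. This is a sketch using Python's standard library `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.60, 96  # values given in Example 7.3

# (a) standard error of the sample proportion
se = sqrt(p * (1 - p) / n)

# Normal approximation for the sample proportion
p_hat = NormalDist(mu=p, sigma=se)

# (b) P(p_hat >= 0.704)
prob_b = 1 - p_hat.cdf(0.704)

# (c) at least 30% dislike the drink  <=>  at most 70% like it
prob_c = p_hat.cdf(0.70)

print(f"standard error = {se:.4f}")
print(f"P(p_hat >= 0.704) = {prob_b:.4f}")
print(f"P(p_hat <= 0.70)  = {prob_c:.4f}")
```

Small differences from table-based answers can arise because the script does not round Z to two decimals.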

Example 7.4

What is the most important factor for business travelers when they are staying in a hotel?
According to USA Today, 74% of business travelers state that having a smoke-free room is
the most important factor. Assume that the population proportion is $p = 0.74$ and that a
sample of 200 business travelers will be selected.

(a) What is the probability that the sample proportion will be within of the population
proportion?
(b) Suppose the probability that a sample proportion will be within of the population
mean is 0.9. What is the sample size n?
Solution:

(a)

(b)

7.4 STANDARD ERRORS
7.4.1 Standard Error of a Sample Mean

Rarely would one construct a sampling distribution of means and derive the standard error of this distribution in order to determine the error in generalizing to the population. Instead, the standard error of a sampling distribution of means ($\sigma_{\bar{X}}$) can be estimated from the standard error of the mean of a single sample:

$S_{\bar{X}} = \frac{S}{\sqrt{N}}$

Example 7.5
Consider the following summarized data for case processing time.

$\bar{X}$ = 72 days, S = 3 days, N = 80.

$S_{\bar{X}} = \frac{3}{\sqrt{80}} \approx 0.34$ days

7.4.2 Standard Error of a Proportion

The Central Limit Theorem applies to the sampling distribution of a proportion. The standard error of the sampling distribution of a proportion can be estimated from a single sample, in a manner similar to that used with the mean:

$S_P = \sqrt{\frac{PQ}{N}}$

Example 7.6
Survey of attitudes towards the death penalty (N = 800)

P = proportion favorable = 0.60
Q = proportion unfavorable = 0.40
Standard error of P: $S_P = \sqrt{\frac{0.60 \times 0.40}{800}} = 0.017$

95% confidence interval:

$P \pm 1.96\,(S_P) = 0.60 \pm 1.96\,(0.017)$

95% interval: (0.5667 to 0.6333)
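
The calculation in Example 7.6 can be reproduced as follows. This is a sketch; the function name `proportion_interval` is hypothetical. Note that the script keeps the unrounded standard error, so its limits differ slightly in the fourth decimal from the interval above, which uses the rounded value 0.017.

```python
from math import sqrt


def proportion_interval(p, n, z=1.96):
    """Standard error of a sample proportion and the interval p +/- z * SE."""
    se = sqrt(p * (1 - p) / n)
    return se, (p - z * se, p + z * se)


se, (lower, upper) = proportion_interval(0.60, 800)
print(f"standard error = {se:.4f}")
print(f"95% interval   = ({lower:.4f}, {upper:.4f})")
```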

7.5. THE CENTRAL LIMIT THEOREM

Suppose a random variable X has population mean μ and standard deviation σ and that a random sample of size n is taken from this population. Then the sampling distribution of $\bar{X}$ becomes approximately normal with mean $\mu$ and variance $\frac{\sigma^2}{n}$ as the sample size n increases ($n \to \infty$).

Simply stated: For any population, regardless of its shape, as the sample size increases, the shape of the sampling distribution of the sample mean, $\bar{X}$, becomes more normal.
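
A small simulation can illustrate the theorem. The sketch below assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), a sample size of 36 and 5,000 repetitions; all of these choices, and the function name, are illustrative and not taken from the module.

```python
import random
from statistics import mean, pstdev


def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from an exponential(1) population
    and return the list of sample means."""
    rng = random.Random(seed)
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


n, reps = 36, 5000
means = simulate_sample_means(n, reps)

center = mean(means)        # should be close to mu = 1
spread = pstdev(means)      # should be close to sigma / sqrt(n) = 1/6
se = 1 / n ** 0.5
# Share of sample means within 1.96 standard errors of mu (about 95% if normal)
inside = sum(abs(m - 1.0) <= 1.96 * se for m in means) / reps

print(f"mean of sample means = {center:.3f}")
print(f"sd of sample means   = {spread:.3f} (theory {se:.3f})")
print(f"share within 1.96 SE = {inside:.3f}")
```

Even though the population is far from normal, the share of sample means within 1.96 standard errors of μ comes out near the 95% expected under normality.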

Example 7.7

For a population of 2,000 students living in hostels, the mean monthly expenditure on three meals is 500 birr with a variance of 144. If sampling is with replacement, find the probability that a random sample of 36 students from this population yields a mean expenditure of less than 495 birr per month.

Solution: Given $\mu = 500$, $\sigma = 12$, $n = 36$,

$P(\bar{X} < 495) = P\left(Z < \frac{495 - 500}{12/\sqrt{36}}\right) = P(Z < -2.5) = 0.0062$

Activity 7.3

1) Suppose that for all students who sat an examination in a particular year the mean score was 450 with a standard deviation of 120, and that 400 of the students who took the test during that year were selected at random.

a) Determine the standard error of the mean.

b) What is the probability that their scores have a mean
i) greater than 456
ii) between 440 and 460
2) In 2000, as reported by ACT Research Service, the mean ACT Math score was $\mu = 20.7$. If ACT Math scores are normally distributed with $\sigma = 5$, answer the following questions.

(a) Describe the sampling distribution of the sample mean,

(b) What is the probability that a randomly selected student has an ACT Math score less than
18?

(c) What is the probability that a random sample of 10 ACT test takers had a mean math
score of 18 or less?

CHAPTER SUMMARY

We emphasized in this chapter that data can be collected either by considering each and every member of the population or by taking some members of the population as representatives. A complete enumeration of all elements of a universe is known as a census survey. A census survey is important because no element of chance enters the data collected using this method, and hence the highest accuracy is obtained. A census survey is most essential when the population size is small and the elements of the population are heterogeneous. The problem with this method is that it requires more money, time and energy. Thus, this method is beyond the reach of individual researchers.

Another method of getting information about the population is to take a small portion of the population, technically called a sample. Sampling is used extensively in all facets of business and government.

The purpose of selecting a sample from a population is to generalize about a certain phenomenon of the population. Most business and government decisions are based on incomplete data because a study of the total population would be either too costly or too time consuming. If every item in the population has a definable chance of being selected, the sample is called a probability sample. Simple random, systematic, stratified and cluster sampling techniques are methods of assuring random selection of items from the population. If the selection of every item in a sample is not governed by the laws of chance, the sampling is considered non-probability sampling.

Most non-probability types of sampling (judgment, quota, convenience and referral) have a common weakness: the choice of items selected in the sample is left to the discretion of the researcher. Some users of non-probability sampling recognize the disadvantages of this type of sampling but consider that the cost saving and convenience outweigh the disadvantages. The main disadvantage is that the reliability or accuracy of the sample results cannot be accurately measured. Therefore, the subsequent discussion involving the reliability of sample results concerns only probability sampling techniques.

Exercises on Chapter 7

1. a) Consider a population consisting of 12 items, numbered 1 through 12. List all possible systematic samples of size 3.

b) List all possible systematic samples of size 4 from a population consisting of 14 items, identified as the numbers 01 through 14.

2. A stratified sample of size 80 is to be taken from a population of size 2000, which consists of four strata of size 500, 1200, 200, and 100. How large a sample must be taken from each stratum if the allocation is to be proportional?

3. A population consists of the four numbers 3, 7, 11, and 15. Consider all possible samples of size 2 drawn from this population without replacement. Find

a) the population mean $\mu$; b) the population standard deviation $\sigma$;

c) the mean of the sampling distribution of means;

d) the standard deviation of the sampling distribution of means.

Verify (c) and (d) from (a) and (b) using suitable formulae.

4. What is the value of the fpc when

a) n = 10 and N = 200;

b) n= 20 and N = 200;

c) n = 40 and N = 400;

d) n = 400 and N = 4,000?

CHAPTER 8

ESTIMATION AND HYPOTHESIS TESTING

CONTENTS

8.1. POINT AND INTERVAL ESTIMATION OF THE MEAN

8.2. POINT AND INTERVAL ESTIMATION OF THE PROPORTION

8.3. SAMPLE SIZE DETERMINATION

8.4. HYPOTHESIS TESTING ABOUT THE MEAN

8.5. HYPOTHESIS TESTING ABOUT THE PROPORTION

8.6. TESTS OF ASSOCIATION

INTRODUCTION

Statistical inference involves the procedures of reaching conclusions about a population or populations based on sample results. Our inference may be estimation of a population parameter or testing an idea (hypothesis) about a population or populations. Thus we generally divide statistical inference into two parts: estimation and test of hypothesis. A hypothesis is an idea about something around us. The procedure we follow to accept or reject a hypothesis is called a test of the hypothesis. To accept or reject a hypothesis, we base ourselves on sample evidence. If the sample evidence does not agree with what is hypothesized about a population, we reject the hypothesis.

The concepts of estimation and hypothesis testing are used in different aspects of human life and in different fields of study.

Objectives

At the end of this chapter students are expected to be able to:

 Explain the meaning of estimation and hypothesis

 State the desirable properties of a point estimator

 Discriminate between point estimation and interval estimation

 Compute and interpret the confidence interval for the population mean and proportion

 Discriminate between one-tailed and two-tailed tests

 Discriminate between Type I and Type II errors

 State the steps of testing a hypothesis

 Explain how to apply the normal distribution and when to use the z-distribution in testing a hypothesis

Concepts of Statistical Estimation and Hypothesis Testing

Inference is the process of making interpretations or conclusions from sample data for the totality of the population.
Inferential statistics uses the sample results to make decisions and draw conclusions about the population from which the sample is drawn.
In statistics there are two ways through which inference can be made.
 Statistical estimation
 Statistical hypothesis testing
Both involve using sample statistics to make inferences about the population parameter.

[Figure: the cycle of statistical inference: Population, Sample, Numerical data, Analyzed data, Inference]

Statistical Estimation
This is one way of making inference about the population parameter where the investigator does not have any prior notion about the values or characteristics of the population parameter.
There are two types of estimation:
i. Point estimation: A single value computed from sample information that is used to estimate a parameter. The best point estimate of the population mean is the sample mean.
ii. Interval estimation: The procedure that results in an interval of values as an estimate for a parameter, that is, an interval that contains the likely values of the parameter. It deals with identifying the upper and lower limits of a parameter.

Estimator and Estimate

An estimator is the rule or random variable that helps us to approximate a population parameter, whereas an estimate is one of the different possible values which an estimator can assume. For example, the sample mean $\bar{X} = \frac{\sum X_i}{n}$ is an estimator for the population mean, and a particular observed value $\bar{x}$ is an estimate, which is one of the possible values of $\bar{X}$.

Properties of best estimator

The following are some qualities of a good estimator:
o It should be unbiased.
o It should be consistent.
o It should be relatively efficient.

To explain these properties let $\hat{\theta}$ be an estimator of θ.

1. Unbiased Estimator: An estimator whose expected value is the value of the parameter being estimated, i.e. $E(\hat{\theta}) = \theta$.

2. Consistent Estimator: An estimator which gets closer to the value of the parameter as the sample size increases, i.e. $\hat{\theta}$ gets closer to θ as the sample size increases.
3. Relatively Efficient Estimator: The estimator for a parameter with the smallest variance. This actually compares two or more estimators for one parameter.
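
Unbiasedness can be checked exactly on a small example by averaging an estimator over every possible sample. The sketch below assumes a hypothetical population 2, 4, 6, 8 and samples of size 2 drawn with replacement; it compares the sample mean and two versions of the sample variance (dividing by n-1 and by n). All names are illustrative.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]  # hypothetical population
n = 2

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# Every ordered sample of size n with replacement is equally likely
samples = list(product(population, repeat=n))


def expected(estimator):
    """Expected value of an estimator over all equally likely samples."""
    return sum(Fraction(estimator(s)) for s in samples) / len(samples)


def xbar(s):
    return Fraction(sum(s), len(s))


def s2_unbiased(s):
    m = xbar(s)
    return sum((x - m) ** 2 for x in s) / (len(s) - 1)


def s2_biased(s):
    m = xbar(s)
    return sum((x - m) ** 2 for x in s) / len(s)


print("mu =", mu, " E[xbar] =", expected(xbar))
print("sigma^2 =", sigma2, " E[S^2, n-1] =", expected(s2_unbiased))
print("E[S^2, n] =", expected(s2_biased))
```

In this setting the sample mean and the n-1 version of the variance reproduce the parameters on average, while dividing by n underestimates the variance.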

8.1. POINT AND INTERVAL ESTIMATION OF POPULATION MEAN

8.1.1. Point estimation of the population mean: μ


Another term for statistic is point estimate, since we are estimating the parameter value. A point estimator is the mathematical rule we use to compute the point estimate. For instance, the sum of the $X_i$ divided by n is the point estimator used to compute the estimate of the population mean. That is, $\bar{X} = \frac{\sum X_i}{n}$ is a point estimator of the population mean.

8.1.2. Confidence interval estimation of the population mean


Although $\bar{X}$ possesses nearly all the qualities of a good estimator, because of sampling error we know that our sample statistic is not likely to be equal to the population parameter, but will instead fall into an interval of values. We will have to be satisfied knowing that the statistic is "close to" the parameter. That leads to the obvious question: what is "close"?
We can phrase the latter question differently: How confident can we be that the value of the statistic falls within a certain "distance" of the parameter? Or, what is the probability that the parameter's value is within a certain range of the statistic's value? This range is the confidence interval. A confidence interval is a specific interval estimate of a parameter determined by using data obtained from a sample and the specific confidence level of the estimate.
The confidence level is the probability that the value of the parameter falls within the range specified by the confidence interval surrounding the statistic. There are different conditions to be considered when constructing confidence intervals for the population mean:

Condition-1: If the population variance is known and the population is normal, whatever the sample size

Recall the Central Limit Theorem, which applies to the sampling distribution of the mean of a sample. Consider samples of size n drawn from a population whose mean is $\mu$ and standard deviation is $\sigma$, with replacement and order important. The population can have any frequency distribution. The sampling distribution of $\bar{X}$ will have a mean $\mu_{\bar{X}} = \mu$ and a standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$, and approaches a normal distribution as n gets large.

This allows us to use the normal distribution curve for computing confidence intervals.

$\mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = \bar{X} \pm \varepsilon$, where $\varepsilon = Z\frac{\sigma}{\sqrt{n}}$ is a measure of error.

- For the interval estimator to be good, the error should be small. How can it be made small?
• By making n large
• By having small variability
• By taking Z small
- To obtain the value of Z, we have to attach this to a theory of chance. That is, there is an area of size $1-\alpha$ such that:

$P\left(-Z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1-\alpha$

Where: $\alpha$ is the probability that the parameter lies outside the interval, and $Z_{\alpha/2}$ is the value of the standard normal variable to the right of which probability $\alpha/2$ lies, i.e. $P(Z > Z_{\alpha/2}) = \alpha/2$.

If the population has a normal distribution and $\sigma$ is known, then a $100(1-\alpha)\%$ confidence interval for $\mu$ is given by:

$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

Note: When (as is often the case) we don't know the population standard deviation and n is large ($n \ge 30$), we can approximate it by the sample standard deviation $S$ and obtain the following (good) approximation of the confidence interval for $\mu$:

$\bar{X} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}}$

where $Z_{\alpha/2}$ is the Z-value with an area of $\alpha/2$ to its right (obtained from a table).


Condition-2: If the population variance is not known, n is small (n < 30) and the population is normal:

In most practical research, the standard deviation for the population of interest is not known. In this case, the standard deviation $\sigma$ is replaced by the estimated standard deviation S, and $S/\sqrt{n}$ is the estimated standard error. Since the standard error is an estimate for the true value of the standard deviation, the distribution of the sample mean is no longer normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. Instead, the sample mean follows the $t$-distribution with mean $\mu$ and standard deviation $S/\sqrt{n}$. The $t$-distribution is also described by its degrees of freedom. For a sample of size n, the $t$-distribution will have n-1 degrees of freedom. The notation for a $t$-distribution with n-1 degrees of freedom is $t_{n-1}$. As the sample size n increases, the $t$-distribution becomes closer to the normal distribution, since the standard error approaches the true standard deviation for large n.

$t = \frac{\bar{X}-\mu}{S/\sqrt{n}}$ has a $t$ distribution with n-1 degrees of freedom.

- The value of $t_{\alpha/2}$ can be obtained from a table with an area of $\alpha/2$ to the right with $n-1$ degrees of freedom.

Therefore, the $100(1-\alpha)\%$ confidence interval for $\mu$ when the population is normally distributed and $\sigma$ is not known is given by:

$\bar{X} \pm t_{\alpha/2}\frac{S}{\sqrt{n}}$

Example 8.1:

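A random sample of 900 workers showed an average height of 67 inches with a standard deviation of 5 inches.

a) Find a 95% confidence interval of the mean height of all workers
b) Find a 99% confidence interval of the mean height of all workers
Solution:

a) $\bar{X} = 67$, S = 5, n = 900

$\alpha = 0.05$, $Z_{\alpha/2} = Z_{0.025} = 1.96$ from the table.

The required interval will be:

$\bar{X} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}} = 67 \pm 1.96\left(\frac{5}{\sqrt{900}}\right) = 67 \pm 0.33 = (66.67, 67.33)$

b) $\alpha = 0.01$, $Z_{\alpha/2} = Z_{0.005} = 2.58$ from the table.

The required interval will be:

$\bar{X} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}} = 67 \pm 2.58\left(\frac{5}{\sqrt{900}}\right) = 67 \pm 0.43 = (66.57, 67.43)$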

Example 8.2

A drug company is testing a new drug which is supposed to reduce blood pressure. From the six people who are used as subjects, it is found that the average drop in blood pressure is 2.28 points, with a standard deviation of 0.95 points. What is the 95% confidence interval for the mean change in pressure?

Solution:

$\bar{X} = 2.28$, $S = 0.95$, $n = 6$

$\alpha = 0.05$, $t_{\alpha/2} = t_{0.025} = 2.571$ from the table, with $n - 1 = 5$ degrees of freedom.

The required interval will be:

$\bar{X} \pm t_{\alpha/2}\frac{S}{\sqrt{n}} = 2.28 \pm 2.571\left(\frac{0.95}{\sqrt{6}}\right) = 2.28 \pm 1.00 = (1.28, 3.28)$

Example 8.3

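Suppose we want to estimate a 95% confidence interval for the average quarterly returns of all fixed-income funds in Ethiopia. We draw a sample of 100 observations and calculate the sample mean to be 0.05 and the standard deviation 0.03. We assume that those returns are normally distributed with known variance.

Solution:

$\bar{X} = 0.05$, $\sigma = 0.03$, n = 100

$\alpha = 0.05$, $Z_{\alpha/2} = Z_{0.025} = 1.96$ from the table.

The confidence interval is:

$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 0.05 \pm 1.96\left(\frac{0.03}{\sqrt{100}}\right) = 0.05 \pm 0.00588 = (0.04412, 0.05588)$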
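
The intervals in Examples 8.1 to 8.3 all have the form "sample mean plus or minus critical value times standard error". The sketch below computes them with Python's standard library; the helper names are illustrative, the z critical values come from `statistics.NormalDist`, and the t critical value 2.571 (5 degrees of freedom, area 0.025 to the right) is entered by hand from a t-table because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Z value with area (1 - confidence) / 2 to its right."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_interval(xbar, sd, n, critical):
    """Interval xbar +/- critical * sd / sqrt(n)."""
    margin = critical * sd / sqrt(n)
    return xbar - margin, xbar + margin


# Example 8.1: large sample, S used in place of sigma
ex81_95 = mean_interval(67, 5, 900, z_critical(0.95))
ex81_99 = mean_interval(67, 5, 900, z_critical(0.99))

# Example 8.2: small sample, t critical value taken from a table
T_0025_DF5 = 2.571
ex82 = mean_interval(2.28, 0.95, 6, T_0025_DF5)

# Example 8.3: known variance
ex83 = mean_interval(0.05, 0.03, 100, z_critical(0.95))

for name, (lo, hi) in [("8.1 (95%)", ex81_95), ("8.1 (99%)", ex81_99),
                       ("8.2", ex82), ("8.3", ex83)]:
    print(f"Example {name}: ({lo:.4f}, {hi:.4f})")
```

Because the script uses unrounded critical values (for instance 2.576 rather than 2.58), its limits may differ from hand calculations in the last decimal place.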

8.2. POINT AND INTERVAL ESTIMATION OF THE POPULATION
PROPORTION

If P represents the population proportion, then the sample proportion $\hat{p} = \frac{X}{n}$ provides a good estimate of P. Therefore, the sample proportion is the point estimator of the population proportion. To construct the confidence interval for the proportion we require the following conditions:

Conditions: The population proportion is not too close to zero or one, and the sample size is large (at least 30).

 Under these conditions, the sampling distribution of $\hat{p}$ can be approximated by a normal distribution that has mean P and standard deviation $\sqrt{\frac{P(1-P)}{n}}$.

To construct a confidence interval for P, we can now adopt the same argument that was used in finding a confidence interval for $\mu$ and write:

$P\left(-Z_{\alpha/2} < \frac{\hat{p} - P}{\sqrt{P(1-P)/n}} < Z_{\alpha/2}\right) = 1 - \alpha$

Hence a $(1-\alpha)100\%$ confidence interval for the population proportion P is given by:

$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}}$

Since P is unknown, an approximate $(1-\alpha)100\%$ confidence interval for the population proportion P is given by:

$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

if the sample size is large (usually n > 30).

Example 8.4

In a sample of 400 people who were questioned regarding their participation in sports, 160 said that they did participate. Construct a 98% confidence interval for P, the proportion of people in the population who participate in sports.

Solution:

Let X be the number of people who participate in sports.

X = 160, n = 400, $\alpha = 0.02$. Hence $\hat{p} = \frac{160}{400} = 0.40$ and $Z_{\alpha/2} = Z_{0.01} = 2.33$.

As a result, an approximate 98% confidence interval for P is given by:

$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.40 \pm 2.33\sqrt{\frac{0.40 \times 0.60}{400}} = 0.40 \pm 0.057 = (0.343, 0.457)$

Hence, we can be about 98% confident that the true proportion of people in the population who participate in sports is between 34.3% and 45.7%.
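
The interval in Example 8.4 can be reproduced with a few lines. This is a sketch; the function name `proportion_ci` is hypothetical, and the critical value is taken from `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Approximate large-sample confidence interval for a proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(160, 400, 0.98)
print(f"98% interval for P: ({low:.3f}, {high:.3f})")
```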

8.3 SAMPLE SIZE DETERMINATION

Before a sample is actually collected, the required sample size for testing a hypothesis concerning the population proportion can be determined by specifying (1) the hypothesized value of the proportion, (2) a specific alternative value of the proportion such that the difference from the null-hypothesized value is considered important, (3) the level of significance to be used in the test, and (4) the probability of Type II error that is to be permitted. The formula for determining the minimum sample size required for testing a hypothesized value of the proportion is

$n = \left[\frac{Z_0\sqrt{P_0(1-P_0)} - Z_1\sqrt{P_1(1-P_1)}}{P_1 - P_0}\right]^2$

In the above equation, $Z_0$ is the critical value of z used in conjunction with the specified level of significance (α level), while $Z_1$ is the value of z with respect to the designated probability of Type II error (β level). As when determining sample size for testing the mean, $Z_0$ and $Z_1$ always have opposite algebraic signs.
The result is that the two products in the numerator will always be accumulated. Also, the above equation can be used in conjunction with either one-tail or two-tail tests, and any fractional sample size is rounded up. Finally, the sample size should be large enough to warrant use of the normal probability distribution in conjunction with $P_0$ and $P_1$.

Hypothesis Testing

A statistical hypothesis test is a method of making statistical decisions using experimental data.

Hypothesis testing is a common method of drawing inferences about a population based on statistical evidence from a sample.

Definitions
Statistical hypothesis: An assertion, statement, or claim about the population whose plausibility is to be evaluated on the basis of the sample data.
Test statistic: A statistic whose value serves to determine whether to reject or accept the hypothesis to be tested. It is a random variable.
Statistical test: A test or procedure used to evaluate a statistical hypothesis; its value depends on sample data.
There are two types of hypothesis:

Null hypothesis: A claim or statement about a population parameter that is usually assumed to be true from the very beginning until it is declared false. It is a statistical hypothesis that states a hypothesis of equality, or of no difference, between a parameter and a specific value. It is usually denoted by H0.
Alternative hypothesis: A claim or statement about a population parameter that will be true if the null hypothesis is false. It is a statistical hypothesis that states a hypothesis of difference between a parameter and a specific value. It is usually denoted by H1 or HA.
Types and size of errors:
 Testing a hypothesis is based on sample data, which may involve sampling and non-sampling errors.
 Type I error: Rejecting the null hypothesis when it is actually true. The significance level ($\alpha$) can be interpreted as the probability of rejecting the null hypothesis when it is actually true. The distribution of the test statistic under the null hypothesis determines the probability of a type I error.
$\alpha$ = P(type I error) = level of significance
 Type II error: Occurs when a false null hypothesis is not rejected. The null hypothesis is actually false but we wrongly fail to reject it. $\beta$ represents the probability that H0 is not rejected when actually H0 is false. The distribution of the test statistic under the alternative hypothesis determines the probability of a type II error.
$\beta$ = P(type II error)
 The power of a test ($1-\beta$) is the probability of correctly rejecting a false null hypothesis.
$1-\beta$ = Power of test

Note: The two types of errors that occur in tests of hypothesis depend on each other. We cannot lower the values of $\alpha$ and $\beta$ simultaneously for a test of hypothesis with a fixed sample size. Lowering the value of $\alpha$ will raise the value of $\beta$, and lowering the value of $\beta$ will raise the value of $\alpha$. However, we can decrease both $\alpha$ and $\beta$ simultaneously by increasing the sample size.
The following table gives a summary of possible results of any hypothesis test:

                        Actual situation (condition)
Decision                H0 is true (H1 is false)    H0 is false (H1 is true)
Do not reject H0        Correct decision            Type II error
Reject H0               Type I error                Correct decision
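
The trade-off between the two error probabilities can be made concrete for one simple case. The sketch below assumes an upper-tailed z-test of H0: μ = μ0 with known σ, evaluated at one specific alternative mean μ1 > μ0; the numbers (100, 105, 40) and the function name `type_ii_error` are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type_ii_error(mu0, mu1, sigma, n, alpha):
    """Beta for the upper-tailed z-test of H0: mu = mu0
    when the true mean is mu1 > mu0 and sigma is known."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)      # critical value
    shift = (mu1 - mu0) * sqrt(n) / sigma        # distance between the two means in SE units
    return STD_NORMAL.cdf(z_alpha - shift)       # P(do not reject H0 | mu = mu1)


mu0, mu1, sigma = 100, 105, 40  # hypothetical values
for n, alpha in [(100, 0.05), (100, 0.01), (400, 0.01)]:
    beta = type_ii_error(mu0, mu1, sigma, n, alpha)
    print(f"n={n:3d}  alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

With n fixed, lowering α from 0.05 to 0.01 raises β; increasing n from 100 to 400 allows both α and β to be smaller than in the first row.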

General steps in hypothesis testing:

1. State the appropriate hypotheses.
2. Select the level of significance, $\alpha$.
3. Select an appropriate test statistic.
4. Identify the critical region.
5. Compute the test value.
6. Make the decision.
7. Summarize the results.

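8.4. HYPOTHESIS TESTS ABOUT THE MEAN:

1. $H_0: \mu = \mu_0$ VS $H_1: \mu \ne \mu_0$

2. $H_0: \mu = \mu_0$ VS $H_1: \mu > \mu_0$

3. $H_0: \mu = \mu_0$ VS $H_1: \mu < \mu_0$

where $\mu_0$ is the hypothesized value of the population mean.

Condition-1

If the population standard deviation, $\sigma$, is known, whatever the sample size, and sampling is from a normal distribution:

The formula for the test statistic is:

$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$

After specifying α we have the following test criteria corresponding to the above three hypotheses.

Null hypothesis        Alternative hypothesis    Decision rule is to reject H0 if:
$\mu = \mu_0$          $\mu \ne \mu_0$           $|Z_{cal}| > Z_{\alpha/2}$
$\mu = \mu_0$          $\mu > \mu_0$             $Z_{cal} > Z_{\alpha}$
$\mu = \mu_0$          $\mu < \mu_0$             $Z_{cal} < -Z_{\alpha}$

Note: When we don't know the population standard deviation and n is large ($n \ge 30$), we can approximate it by the sample standard deviation $S$ and obtain the following test statistic:

$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$

- The decision rule is the same as in condition-1.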

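Condition-2

When the population standard deviation, $\sigma$, is unknown, the population is normally or approximately normally distributed, and the sample size is small (n < 30):

The formula for the test statistic is:

$t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$, with $n - 1$ degrees of freedom.

After specifying α we have the following test criteria corresponding to the above three hypotheses.

Null hypothesis        Alternative hypothesis    Decision rule is to reject H0 if:
$\mu = \mu_0$          $\mu \ne \mu_0$           $|t_{cal}| > t_{\alpha/2}(n-1)$
$\mu = \mu_0$          $\mu > \mu_0$             $t_{cal} > t_{\alpha}(n-1)$
$\mu = \mu_0$          $\mu < \mu_0$             $t_{cal} < -t_{\alpha}(n-1)$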

Example 8.5

Ethio Telecom provides telephone service in an area. According to the company's records, the average length of all calls placed was 12.5 minutes. A sample of 150 such calls placed through this company produced a mean length of 13 minutes with a standard deviation of 2.6 minutes. Can you conclude that the mean length of all current calls is different from 12.5 minutes? Use the 0.05 level of significance and assume that the distribution of all calls is normal.

Solution:

Let $\mu$ = population mean.

1. State the null and alternative hypotheses:

$H_0: \mu = 12.5$ (The mean length of all current calls is 12.5 minutes)

$H_1: \mu \ne 12.5$ (The mean length of all current calls is different from 12.5 minutes)

2. Select the level of significance: $\alpha$ = 0.05 (given)

3. Select an appropriate test statistic:
The Z-statistic is appropriate because the sample size is large.
4. Identify the critical region:
Here we have two critical regions since we have a two-tailed hypothesis. The critical region is $|Z_{cal}| > Z_{0.025} = 1.96$, so $(-1.96, 1.96)$ is the acceptance region.
5. Compute the test value:
$\bar{X} = 13$, $S = 2.6$, n = 150

$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{13 - 12.5}{2.6/\sqrt{150}} = 2.36$

6. Decision:

Reject H0, since $Z_{cal} = 2.36$ is not in the acceptance region.

7. Conclusion: At the 5% level of significance, we have evidence to say that the average length of all such calls is not equal to 12.5 minutes.

Example 8.6
Ten individuals are chosen at random from a population and their heights are found to be, in inches, 63, 63, 66, 67, 68, 69, 70, 71 and 71. In the light of these data, consider the claim that the average height of the population is 66 inches. Can we conclude that the average height is decreasing?
(Use $\alpha = 0.05$ and assume the normality of the population)

Solution:

Let $\mu$ = population mean.

1. State the null and alternative hypotheses:

$H_0: \mu = 66$ VS $H_1: \mu < 66$

2. Select the level of significance: $\alpha$ = 0.05 (given)

3. Select an appropriate test statistic:
The t-statistic is appropriate because the population standard deviation is unknown and the sample size is small.
4. Critical region:

$t_{cal} < -t_{0.05}(9) = -1.833$ is the critical region, so $t \ge -1.833$ is the acceptance region.


5. Compute the test value

, , n=10

6. Decision:

Reject H0, since is not in the acceptance region


7. Conclusion: At 5% level of significance, we have evidence to say that the average
height of an individual is less than 66 inches.
Example 8.7
A national magazine claims that the average college student watches less television than the national average. The national average is 29.4 hours per week with a standard deviation of 2 hours. A sample of 25 college students has a mean of 27 hours. Test the claim at $\alpha = 0.01$ and assume normality of the population.
Solution:
1. State the null and alternative hypotheses:

$H_0: \mu = 29.4$ VS $H_1: \mu < 29.4$

2. Select the level of significance: $\alpha$ = 0.01 (given)

3. Select an appropriate test statistic:
The Z-statistic is appropriate because the population standard deviation is known.
4. Critical region:

$Z_{cal} < -Z_{0.01} = -2.33$ is the critical region, so $Z \ge -2.33$ is the acceptance region for the null hypothesis.

5. Compute the test value:
$\bar{X} = 27$, $\sigma = 2$, n = 25

$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 29.4}{2/\sqrt{25}} = -6$

6. Decision:

Reject H0, since $Z_{cal} = -6$ is not in the acceptance region.

7. Conclusion: At the 1% level of significance, we have evidence to say that the average college student watches less television.

Example 8.8

An authority from a district power station of the town told reporters recently that the average monthly electric bill of households in AA is not more than Birr 100. A random sample of 400 households from the city produces a mean bill of Birr 105 with a standard deviation of Birr 40. Test the claim of the authority at the 5% level of significance.
Solution:
1. State the null and alternative hypotheses:

$H_0: \mu \le 100$ VS $H_1: \mu > 100$

2. Select the level of significance: $\alpha$ = 0.05 (given)

3. Select an appropriate test statistic:
The Z-statistic is appropriate because the sample size is large, even if the population is non-normal.
4. Critical region:

$Z_{cal} > Z_{0.05} = 1.645$ is the critical region, so $Z \le 1.645$ is the acceptance region for the null hypothesis.

5. Compute the test value:

$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{105 - 100}{40/\sqrt{400}} = 2.5$

6. Decision:

Reject H0, since $Z_{cal} = 2.5$ is not in the acceptance region.

7. Conclusion: At the 5% level of significance, the claim of the authority is not correct.
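
The z-tests in Examples 8.5, 8.7 and 8.8 follow the same steps, so they can be expressed with one helper. This is a sketch: the function `z_test_mean` and its `tail` argument are illustrative names, and critical values are taken from `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sd, n, alpha, tail):
    """Z-test for a mean. `tail` is 'two', 'upper' or 'lower'.
    Returns the computed Z value and whether H0 is rejected."""
    z = (xbar - mu0) / (sd / sqrt(n))
    std = NormalDist()
    if tail == "two":
        reject = abs(z) > std.inv_cdf(1 - alpha / 2)
    elif tail == "upper":
        reject = z > std.inv_cdf(1 - alpha)
    elif tail == "lower":
        reject = z < -std.inv_cdf(1 - alpha)
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return z, reject


# Example 8.5: H1: mu != 12.5
z85, r85 = z_test_mean(13, 12.5, 2.6, 150, 0.05, "two")
# Example 8.7: H1: mu < 29.4
z87, r87 = z_test_mean(27, 29.4, 2, 25, 0.01, "lower")
# Example 8.8: H1: mu > 100
z88, r88 = z_test_mean(105, 100, 40, 400, 0.05, "upper")

for name, z, r in [("8.5", z85, r85), ("8.7", z87, r87), ("8.8", z88, r88)]:
    print(f"Example {name}: Z = {z:.2f}, reject H0: {r}")
```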

8.5. TESTS ABOUT A POPULATION PROPORTION: P

The procedure for testing hypotheses about the population proportion for large samples is similar in many respects to that for the population mean. The procedure includes the same seven steps. Similarly, the test can be two-tailed or one-tailed. When the sample size is large, the sample proportion $\hat{p}$ is approximately normally distributed with its mean equal to $P$ and standard deviation equal to $\sqrt{\frac{PQ}{n}}$, where $Q = 1 - P$. Hence, we use the normal distribution to perform a test of hypothesis about the population proportion for a large sample. The sample size is considered to be large when $nP$ and $nQ$ are both greater than 5.

Suppose the assumed or hypothesized value of $P$ (parameter of the binomial distribution) is denoted by $P_0$; then one can formulate two-sided (1) and one-sided (2 and 3) hypotheses as follows:

1. $H_0: P = P_0$ VS $H_1: P \ne P_0$

2. $H_0: P = P_0$ VS $H_1: P > P_0$

3. $H_0: P = P_0$ VS $H_1: P < P_0$

The choice of $H_1$ depends on the prior information we have on the values of $P$.

Decision Rule:

With the test statistic $Z_{cal} = \frac{\hat{p} - P_0}{\sqrt{P_0(1-P_0)/n}}$:

Null hypothesis      Alternative hypothesis    Decision rule is to reject H0 if:
$P = P_0$            $P \ne P_0$               $|Z_{cal}| > Z_{\alpha/2}$
$P = P_0$            $P > P_0$                 $Z_{cal} > Z_{\alpha}$
$P = P_0$            $P < P_0$                 $Z_{cal} < -Z_{\alpha}$

Example 8.9

A manufacturing company has submitted a claim that 90% of items produced by a certain process are non-defective. An improvement in the process is being considered that they feel will lower the proportion of defectives below the current 10%. In an experiment 100 items are produced with the new process and 5 are defective. Is this evidence sufficient to conclude that the method has been improved? Use a 0.05 level of significance.

Solution: As usual, we follow the steps:

1. $H_0: P = 0.10$ (actually $P \ge 0.10$) VS $H_1: P < 0.10$
2. $\alpha = 0.05$
3. Critical Region: $Z_{cal} < -Z_{0.05} = -1.645$
4. Computation: $\hat{p} = \frac{5}{100} = 0.05$

$Z_{cal} = \frac{\hat{p} - P_0}{\sqrt{P_0(1-P_0)/n}} = \frac{0.05 - 0.10}{\sqrt{0.10 \times 0.90/100}} = \frac{-0.05}{0.03} = -1.67$

5. Decision: Reject H0, since $-1.67 < -1.645$
6. Conclusion: At the 0.05 level of significance we have evidence to say that the improvement has reduced the proportion of defectives.

Example 8.10

The unemployment rate in a given country at a given period is believed to be 10%. The
government embarked on a series of projects to reduce unemployment. It was of interest to
determine whether unemployment decreases as a result of the projects. A random sample of
500 people was chosen, and 48 of them were found to be unemployed. Test at 1% level of
significance if the government projects reduced the unemployment rate

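Solution: As usual, we follow the steps:

1. $H_0: p = 0.10$ VS $H_1: p < 0.10$
2. $\alpha = 0.01$
3. Critical Region: $Z_{cal} < -Z_{0.01} = -2.33$
4. Computation: $\hat{p} = 48/500 = 0.096$, so
   $Z_{cal} = \dfrac{0.096 - 0.10}{\sqrt{(0.10)(0.90)/500}} = \dfrac{-0.004}{0.0134} = -0.298$
5. Decision: Do not reject H0, since $Z_{cal} = -0.298 > -2.33$
6. Conclusion: At the 1% level of significance there is not sufficient evidence that the
government projects reduced unemployment.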
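
The arithmetic in Examples 8.9 and 8.10 can be checked with a few lines of Python. This is only a sketch: the function name `one_proportion_z` is an illustrative choice, and the lower-tail critical values -1.645 and -2.33 are the usual standard normal table values for 0.05 and 0.01.

```python
import math

def one_proportion_z(x, n, p0):
    """z statistic for H0: p = p0, given x successes in a sample of size n."""
    p_hat = x / n                        # sample proportion
    se = math.sqrt(p0 * (1 - p0) / n)    # standard error computed under H0
    return (p_hat - p0) / se

# Example 8.9: 5 defectives in 100 items, p0 = 0.10, lower-tail test at 0.05
z_89 = one_proportion_z(5, 100, 0.10)
# Example 8.10: 48 unemployed in 500 people, p0 = 0.10, lower-tail test at 0.01
z_810 = one_proportion_z(48, 500, 0.10)

print(round(z_89, 3), "reject H0" if z_89 < -1.645 else "do not reject H0")
print(round(z_810, 3), "reject H0" if z_810 < -2.33 else "do not reject H0")
```

The same function covers upper-tail and two-sided tests; only the comparison with the critical value changes.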

Activity 8.1
A large sample of 200 students from a certain high school is interviewed, and
85 of them are found to use the city bus. Can you conclude that at least 40% of the students
use the city bus? Use a 0.05 level of significance.

8.6. Test of Association

In the previous sections we saw how to test hypotheses for numeric data given in
the form of a mean or a proportion. It is also possible to apply hypothesis testing to categorical
data.
Suppose that we have a population consisting of observations having two attributes or
qualitative characteristics, say A and B.
If the attributes are independent, then the probability of possessing both A and B is $P_A \cdot P_B$,
where $P_A$ is the probability that a member has attribute A and
$P_B$ is the probability that a member has attribute B.
Suppose A has r mutually exclusive and exhaustive classes and
B has c mutually exclusive and exhaustive classes.
The entire set of data can then be represented using an r × c contingency table.

A \ B    B1    B2    ...   Bj    ...   Bc    Total
A1       O11   O12   ...   O1j   ...   O1c   R1
A2       O21   O22   ...   O2j   ...   O2c   R2
...
Ai       Oi1   Oi2   ...   Oij   ...   Oic   Ri
...
Ar       Or1   Or2   ...   Orj   ...   Orc   Rr
Total    C1    C2    ...   Cj    ...   Cc    n

The chi-square test is used to test the hypothesis of independence of two attributes.

The statistic is given by:

$\chi^2_{cal} = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - e_{ij})^2}{e_{ij}}$

where $O_{ij}$ = the number of units that belong to category i of A and j of B,

$e_{ij}$ = the expected frequency of units that belong to category i of A and j of B, given by

$e_{ij} = \dfrac{R_i \times C_j}{n}$

where $R_i$ = the i-th row total,
$C_j$ = the j-th column total,
n = the total number of observations.

Remarks:

- The null and alternative hypotheses may be stated as:

H0: There is no association between A and B.
H1: not H0 (There is association between A and B).
Decision Rule:

- Reject H0 (independence) at the α level of significance if the calculated value of $\chi^2$ exceeds
the tabulated value $\chi^2_{\alpha}$ with degrees of freedom equal to (r-1)(c-1).

Example 8.12

In an experiment to study the dependence of hypertension on smoking habits, the following
data are taken on 180 individuals (expected frequencies are shown in parentheses):

                   Non smokers   Moderate smokers   Heavy smokers   Total
Hypertension       21 (33.35)    36 (29.97)         30 (23.68)      87
No Hypertension    48 (35.65)    26 (32.03)         19 (25.32)      93
Total              69            62                 49              180

At $\alpha = 0.05$, test whether the presence or absence of hypertension depends on smoking habit.

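Solution

H0: Presence or absence of hypertension is independent of smoking habit.

H1: H0 is not true.

$\chi^2_{cal} = \dfrac{(21-33.35)^2}{33.35} + \dfrac{(36-29.97)^2}{29.97} + \dfrac{(30-23.68)^2}{23.68} + \dfrac{(48-35.65)^2}{35.65} + \dfrac{(26-32.03)^2}{32.03} + \dfrac{(19-25.32)^2}{25.32} = 14.46$

The tabulated value is $\chi^2_{0.05}$ with (2-1)(3-1) = 2 degrees of freedom, which is 5.99.

Decision: Since 14.46 > 5.99, we reject the null hypothesis.

Conclusion: Smoking habit and the presence or absence of hypertension are related.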
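
As a rough check of Example 8.12, the expected frequencies and the chi-square statistic can be computed directly from the observed table. The sketch below uses plain Python; the function name `chi_square_independence` is illustrative, and the critical value 5.99 is the one quoted in the example.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n   # expected frequency
            chi2 += (o_ij - e_ij) ** 2 / e_ij
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Rows: hypertension / no hypertension; columns: non, moderate, heavy smokers
observed = [[21, 36, 30],
            [48, 26, 19]]
chi2, df = chi_square_independence(observed)
print(round(chi2, 2), df, "reject H0" if chi2 > 5.99 else "do not reject H0")
```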

Activity 8.2

A researcher is interested in assessing the effect of literacy on family planning use. Accordingly
he collected data and tabulated the findings in the following manner. Can we say there is an
association between educational status and family planning use?

FP Use    Educational Status        Total
          Illiterate   Literate
Yes       a  63        b  49        112
No        c  15        d  33        48
Total     78           82           160



CHAPTER SUMMARY

In this chapter we have seen some important points such as:

- Statistical inference involves the procedures of reaching conclusions about a
population based on a sample.

- There are two types of inference. These are estimation and tests of
hypothesis.

- There are two types of estimation. These are point estimation and interval
estimation.

- In point estimation a single sample result is used to approximate the
population parameter value, while in interval estimation a range of values is used to
estimate the population parameter value.

- The formula that we use for a particular confidence interval estimation
depends on the availability of the population variance and the size of the sample under
consideration.

- The degree of confidence, the maximum allowable error and the population standard
deviation are the three important factors needed in the determination of the sample size
for a particular problem.

- A hypothesis is an idea about a given population parameter.

- A test of hypothesis is the procedure we follow either to accept or reject the hypothesis.

- The type of distribution we use for a particular problem
depends on the sampling distribution of the sample statistic under consideration. For
testing about the mean, the sample size and the availability of the population variance are the
two most important factors to determine the distribution to be used for a test.

Exercises for Chapter 8



1. A travel agent estimates that the average cost of a three-day trip to a park is 915.60. People
who scheduled the trip paid an average of 927. The population s.d. is 35.
At $\alpha$ = 0.05, test the travel agent's estimate.

2. The mean lifetime of a sample of 16 light bulbs is 1570 hrs with a standard deviation of 110
hours. Test the hypothesis that there is some improvement in the mean lifetime of light
bulbs at $\alpha$ = 0.05.

3. A sociologist claims that the average age of murder victims in a small city is less than or
equal to 23.2 yrs. A sample of 18 recent victims had a mean age of 22.6. At $\alpha$ = 0.05, test the
sociologist's claim. The population s.d. is 2 years.

4. A sample of 50 days showed that a fast food restaurant served on average 182 customers
during lunch time. The standard deviation of the sample was 8. Find the 95% CI for the
population mean.

5. The president of a large university wants to estimate the average age of the students
presently enrolled. From past studies the standard deviation is known to be 2 years. A
sample of 50 students is selected and the mean is found to be 23.2 years. Construct a 95% CI
for the population mean.

6. A sample of 16 private-duty nurses showed an average salary of 480 birr. The standard
deviation of the sample was 64. Construct the 95% CI for the mean salary of all private-duty
nurses.

7. A theory predicts that the proportions of beans in the 4 groups A, B, C, D should be in the
ratio 9:3:3:1. In an experiment among 1600 beans, the numbers in the four groups are
882, 313, 287 and 118. Do the observed results support the theory?



8. A geneticist took a random sample of 300 men to study whether there is an association
between father and son regarding boldness. He obtained the following results.

              Son
Father        Bold    Not
Bold          85      59
Not           65      91

Using α = 5%, test whether there is an association between father and son regarding boldness.

9. A random sample of 200 men, all retired, was classified according to education and
number of children as shown below.

                        Number of children
Education level         0-1     2-3     Over 3
Elementary              14      37      32
Secondary and above     31      59      27



CHAPTER 9

TWO SAMPLE INFERENCES

CONTENTS

9.1. INFERENCES ABOUT DIFFERENCES BETWEEN MEANS

9.2. INFERENCES ABOUT DIFFERENCES BETWEEN PROPORTIONS

9.3. INFERENCES CONCERNING VARIANCES

INTRODUCTION

Dear learner, in the previous chapter, you have been introduced to the two problems of
statistical inference; namely, statistical estimation and tests of hypothesis, though restricted
to one mean and one proportion. This chapter is a natural continuation of the previous.

The general focus of this chapter is on testing hypotheses and constructing confidence
intervals about parameters (means and proportions) from two populations, thereby enabling
you to meet the following objectives:

- Test hypotheses and construct confidence intervals about the difference between two
population means and proportions using data from large samples.

- Test hypotheses and establish confidence intervals about the difference between two
population means and proportions using data from small samples when the
population variances are unknown and the populations are normally distributed.

- Test hypotheses and construct confidence intervals about two population variances
when the two populations are normally distributed.

9.1. INFERENCES ABOUT DIFFERENCES BETWEEN MEANS

In single-sample inference (Chapters 7 & 8) the process is always the same:


(1) Obtain a random sample and
(2) Conduct the appropriate analysis (hypothesis test or interval estimate)
In two-sample inference, you now get to be involved more directly in deciding how to obtain
the sample data:

Example 9.1
Suppose you want to compare two different methods of production, A and B, to see which,
on average, requires less time. You could decide to use either of the two following
sampling plans:

“Independent-samples approach”

(1) Have a random sample of 20 people use method A and measure the time each takes to
complete the production task.
(2) Do the same thing for a different random sample of, say, 30 people using method B.
(3) Compare the average completion times for the two groups.

“Paired-samples approach”

(1) Obtain one sample of 25 people, and
(2) have each person use method A, then
(3) have each person use method B, then
(4) compare the method A results to the method B results for each person.



Deciding on how to obtain the data for comparing two (or more) averages is called
Experimental Design (also called Design of Experiments and abbreviated DOE).

Although two-sample inference is the simplest kind of Experimental Design, most of the
important concepts of Experimental Design are illustrated in the two-sample case:

(1) The independent-samples method (for comparing 2 averages) described above
generalizes to what is called the completely randomized design.
(2) Similarly, the paired-samples method generalizes to something called the
randomized block design.

9.1.1. Comparing two means using independent samples

Goal: Compare two population means $\mu_A$ and $\mu_B$ by comparing the sample means $\bar{x}$ and
$\bar{y}$ from two random samples, one taken from population A and the other from population
B.

Data layout:

Sample A    Sample B
x1          y1
x2          y2
x3          y3
:           :
xnA         ynB

Note: The two sample sizes nA and nB don’t have to be equal.

Statistics (sample means & standard deviations) calculated from data:

Sample A               Sample B
$\bar{x}$ and $s_A$    $\bar{y}$ and $s_B$

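(3) The analysis then proceeds slightly differently depending on whether the
population standard deviations are known/given or not:

If both $\sigma_A$ and $\sigma_B$ are known:

Statistic: $\bar{x} - \bar{y}$
Standard Error: $\sqrt{\dfrac{\sigma_A^2}{n_A} + \dfrac{\sigma_B^2}{n_B}}$
Distribution: z

If both $\sigma_A$ and $\sigma_B$ are unknown:

Statistic: $\bar{x} - \bar{y}$
Standard Error: $\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}$
Distribution: t
Degrees of Freedom, $\nu$:

$\nu = \dfrac{\left(\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}\right)^2}{\dfrac{(s_A^2/n_A)^2}{n_A - 1} + \dfrac{(s_B^2/n_B)^2}{n_B - 1}}$

where $s_A^2/n_A$ and $s_B^2/n_B$ are the squared standard errors of the two sample means.

Note: $\nu$ must always lie between min(nA-1, nB-1) and nA+nB-2.

Furthermore, the formula will usually not give an integer value, and it is recommended that
you round your result down to the nearest integer.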



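(4) Confidence Interval estimate of $\mu_A - \mu_B$

If both $\sigma_A$ and $\sigma_B$ are known:

$(\bar{x} - \bar{y}) \pm z_{\alpha/2} \sqrt{\dfrac{\sigma_A^2}{n_A} + \dfrac{\sigma_B^2}{n_B}}$

If both $\sigma_A$ and $\sigma_B$ are unknown:

$(\bar{x} - \bar{y}) \pm t_{\alpha/2} \sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}$, d.f. = $\nu$ (from formula in step 3)

(5) Hypothesis Test of $H_0: \mu_A - \mu_B = D_0$

Note: In the vast majority of applications, $D_0$ is 0 because we are usually interested
in simply testing whether the two means are equal or not (i.e., whether or not $\mu_A - \mu_B$ =
0, or < 0, or > 0, or $\neq$ 0).

If both $\sigma_A$ and $\sigma_B$ are known:

$z = \dfrac{(\bar{x} - \bar{y}) - D_0}{\sqrt{\dfrac{\sigma_A^2}{n_A} + \dfrac{\sigma_B^2}{n_B}}}$

If both $\sigma_A$ and $\sigma_B$ are unknown:

$t = \dfrac{(\bar{x} - \bar{y}) - D_0}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}$, d.f. = $\nu$ (from formula in step 3)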

Note: In the case where $\sigma_A$ and $\sigma_B$ are unknown, the text gives an additional method for
comparing the population means. This method “pools” the values of $s_A$ and $s_B$
together.

The really good news is that you can ignore the method based on pooling because it has
recently been shown in the statistics literature that this method is unnecessary and doesn’t
lead to any better results than the method described above (for the $\sigma_A$, $\sigma_B$ unknown
case).

So, simply use one of the two methods ($\sigma_A$, $\sigma_B$ known or $\sigma_A$, $\sigma_B$ unknown) described in these
notes when using independent samples.

Example 9.2

The problem explicitly states that independent samples are used, but you could have seen
that by just noticing that the sample sizes nA = 17 and nB = 12 are different (i.e., the samples
couldn’t possibly have been paired).

Since the population standard deviations are not given/known, we must use the t distribution
for conducting hypothesis tests and constructing confidence intervals:

9.1.2. Comparing two means using paired samples

Goal: Compare two population means $\mu_A$ and $\mu_B$ by taking one random sample of items and
measuring them under two different conditions, A and B. The basic idea behind this is that
many extraneous sources of variation in the population can be filtered out by pairing, which
then leaves a clearer picture of the true difference between the means.
For example, think of testing a new drug by measuring people's responses before (A) and
after (B) they take the drug. By comparing the i-th person’s individual responses, $x_i$ versus $y_i$
(before & after), all of the extraneous factors related to this individual’s life style are
automatically “filtered out” and the difference $x_i - y_i$ only measures the actual response of that
person to the drug.

In general, pairing is a better thing to do (than independent samples) if pairing is physically
possible for the situation you are studying.

Data layout:

item #    sample A    sample B    difference (A - B)
1         x1          y1          d1
2         x2          y2          d2
3         x3          y3          d3
:         :           :           :
n         xn          yn          dn

Note: The two sample sizes must be equal since the same n items in the random sample are
being measured twice.

Statistics (sample mean & standard deviation) calculated from the differences, $d_i = x_i - y_i$:
Mean of the differences: $\bar{d} = \dfrac{\sum d_i}{n}$

Standard deviation of the differences: $s_d = \sqrt{\dfrac{\sum (d_i - \bar{d})^2}{n - 1}}$

The analysis then proceeds exactly as if you were doing single-sample inference for a mean
using the t distribution:
Statistic: $\bar{d}$

Standard error: $s_d / \sqrt{n}$ (n = the number of pairs)

Distribution: t (d.f. = n-1)

$(1-\alpha)100\%$ Confidence Interval estimate of $\mu_A - \mu_B$

$\bar{d} \pm t_{\alpha/2} \dfrac{s_d}{\sqrt{n}}$ (d.f. = n-1)

Hypothesis Tests of $H_0: \mu_A - \mu_B = D_0$

Test statistic: $t = \dfrac{\bar{d} - D_0}{s_d / \sqrt{n}}$ (d.f. = n-1)



Example 9.3

(a) These samples are definitely “paired” because each car is measured twice, once for
shock absorber A and once for B.

Car #    Brand A           Brand B         difference (A-B)
         (manufacturer)    (competitor)
1        8.8               8.4             0.4
2        10.5              10.1            0.4
3        12.5              12.0            0.5
4        9.7               9.3             0.4
5        9.6               9.0             0.6
6        13.2              13.0            0.2

$\bar{d}$ = .416666, $s_d$ = .132916

Test of $H_0: \mu_A - \mu_B = 0$ versus a 2-sided alternative using $\alpha$ = 0.05:

$t = \dfrac{\bar{d} - 0}{s_d/\sqrt{n}} = \dfrac{0.416666}{0.132916/\sqrt{6}} = 7.6787$. The critical t value ($\alpha$ = .05) is
±t.025(d.f. = 6-1 = 5) = ± 2.571. Since t = 7.6787 exceeds t.025 = 2.571,
we can conclude that these data do show a difference between the mean
strengths of the two brands of shock absorbers.

What if you make a mistake in the beginning and think that these samples are independent?

$t = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}} = \dfrac{10.7167 - 10.3000}{\sqrt{0.5116 + 0.5507}} = 0.4043.$

Next,

$\dfrac{s_A^2}{n_A} = \dfrac{3.0697}{6} = 0.5116$ and $\dfrac{s_B^2}{n_B} = \dfrac{3.3040}{6} = 0.5507$

$\nu = \dfrac{(0.5116 + 0.5507)^2}{\dfrac{0.5116^2}{5} + \dfrac{0.5507^2}{5}} = 9.986$, which rounds down to
$\nu$ = 9. Therefore, the critical values are ±t.025(d.f. = 9) = ± 2.262.

As you can quickly see, t = 0.4043 doesn’t fall in either tail of the rejection region, so the
(false) conclusion would be that there is no difference between the two population means.

The moral of this story: Mistakenly using the independent samples test (in those cases
when the paired samples test should be used) can lead to incorrect conclusions (so be careful
to correctly identify when to use the independent versus paired samples approach).
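
The contrast in Example 9.3 can be reproduced numerically. The sketch below computes both test statistics for the shock-absorber data using only the Python standard library; the function names `paired_t` and `welch_t` are illustrative, and the critical values still have to be looked up in a t table as in the example.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """t statistic and d.f. for H0: mu_A - mu_B = 0 using paired differences."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1

def welch_t(x, y):
    """t statistic and rounded-down d.f. treating the samples as independent."""
    w_a = stdev(x) ** 2 / len(x)     # s_A^2 / n_A
    w_b = stdev(y) ** 2 / len(y)     # s_B^2 / n_B
    t = (mean(x) - mean(y)) / math.sqrt(w_a + w_b)
    nu = (w_a + w_b) ** 2 / (w_a ** 2 / (len(x) - 1) + w_b ** 2 / (len(y) - 1))
    return t, math.floor(nu)

brand_a = [8.8, 10.5, 12.5, 9.7, 9.6, 13.2]
brand_b = [8.4, 10.1, 12.0, 9.3, 9.0, 13.0]

t_paired, df_paired = paired_t(brand_a, brand_b)
t_indep, df_indep = welch_t(brand_a, brand_b)
print(round(t_paired, 4), df_paired)   # compare with the critical value 2.571
print(round(t_indep, 4), df_indep)     # compare with the critical value 2.262
```

The numerator is the same in both tests; only the standard error changes, which is why pairing gives a much larger t value here.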

9.2 INFERENCES ABOUT THE DIFFERENCES BETWEEN PROPORTIONS

Goal: Compare two population proportions $p_A$ and $p_B$ by comparing the sample proportions
$\hat{p}_A$ and $\hat{p}_B$ from two random samples, one taken from population A and the other from
population B.

Data layout:

Sample A                   Sample B
XA = # of “successes”      YB = # of “successes”
nA = sample size           nB = sample size

Note: The two sample sizes nA and nB do not have to be equal.

Statistics (sample proportions) calculated from data:

Sample A                      Sample B
$\hat{p}_A = X_A / n_A$       $\hat{p}_B = Y_B / n_B$

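The analysis proceeds a little differently depending on whether you are doing a confidence
interval or a hypothesis test:

Confidence Interval:

Statistic: $\hat{p}_A - \hat{p}_B$
Standard error: $\sqrt{\dfrac{\hat{p}_A \hat{q}_A}{n_A} + \dfrac{\hat{p}_B \hat{q}_B}{n_B}}$, where $\hat{q}_A = 1 - \hat{p}_A$ and $\hat{q}_B = 1 - \hat{p}_B$

Hypothesis Test:

Statistic: $\hat{p}_A - \hat{p}_B$
Standard error: $\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}$

Distribution: Z, in both cases.

where $\bar{p} = \dfrac{X_A + Y_B}{n_A + n_B}$ and $\bar{q} = 1 - \bar{p}$

Note: The text limits its hypothesis tests for proportions to the most common case, where $D_0$
is 0. The standard error above is based on the assumption that $D_0$ = 0.

$(1-\alpha)100\%$ Confidence Interval estimate of $p_A - p_B$:

$(\hat{p}_A - \hat{p}_B) \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}_A \hat{q}_A}{n_A} + \dfrac{\hat{p}_B \hat{q}_B}{n_B}}$

Hypothesis Test of $H_0: p_A - p_B = 0$

Test statistic: $z = \dfrac{\hat{p}_A - \hat{p}_B}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}}$ where $\bar{p} = \dfrac{X_A + Y_B}{n_A + n_B}$ and $\bar{q} = 1 - \bar{p}$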
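
The two standard errors (unpooled for the confidence interval, pooled for the test of no difference) can be illustrated with a short sketch. The counts below (60 successes out of 200 and 45 out of 250) are hypothetical numbers chosen only for illustration, and the function names are not from the text.

```python
import math

def two_proportion_z(x_a, n_a, y_b, n_b):
    """z statistic for H0: pA - pB = 0, using the pooled proportion."""
    p_a, p_b = x_a / n_a, y_b / n_b
    p_bar = (x_a + y_b) / (n_a + n_b)                      # pooled estimate
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

def two_proportion_ci(x_a, n_a, y_b, n_b, z_crit):
    """Confidence interval for pA - pB, using the unpooled standard error."""
    p_a, p_b = x_a / n_a, y_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_a - p_b
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical data: 60 of 200 successes in sample A, 45 of 250 in sample B
z = two_proportion_z(60, 200, 45, 250)
low, high = two_proportion_ci(60, 200, 45, 250, 1.96)      # 95% interval
print(round(z, 3), round(low, 4), round(high, 4))
```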

Sample size formulas for estimating $p_A - p_B$ or $\mu_A - \mu_B$

The method for finding the minimum necessary sample sizes, nA and nB, for estimating either $p_A - p_B$ or
$\mu_A - \mu_B$ is the same: set the desired margin of error, ME, that you are willing to accept equal to the
half-width of the confidence interval and solve for the sample sizes.

Since this will result in one equation with two unknowns (nA and nB), we usually have to
impose some other condition on the two sample sizes. One of the most frequently used
conditions is that one sample be a fixed (specified) multiple of the other. So, let us assume
that:

$n_A = r \cdot n_B$

where r is a constant that you specify in advance. For example, samples from population A
might be cheaper to obtain than samples from population B, so you might want to specify
that twice as many sampled items are taken from A as from B. In that case, you would use a
value of r = 2.

For estimating $\mu_A - \mu_B$:

Set: $ME = z_{\alpha/2} \sqrt{\dfrac{\sigma_A^2}{n_A} + \dfrac{\sigma_B^2}{n_B}}$. Then use the fact that $n_A = r \cdot n_B$ to write

$ME = z_{\alpha/2} \sqrt{\dfrac{\sigma_A^2}{r\,n_B} + \dfrac{\sigma_B^2}{n_B}}$. Solve to find $n_B = \dfrac{z_{\alpha/2}^2 \left(\sigma_A^2 / r + \sigma_B^2\right)}{ME^2}$. Therefore, we would use
samples of size:

$n_B = \dfrac{z_{\alpha/2}^2 \left(\sigma_A^2 / r + \sigma_B^2\right)}{ME^2}$ and $n_A = r\,n_B$



Note: the text only discusses the case of equal sample sizes (when r = 1)

For estimating $p_A - p_B$:

Set $ME = z_{\alpha/2} \sqrt{\dfrac{p_A q_A}{n_A} + \dfrac{p_B q_B}{n_B}}$.

Then use the fact that $n_A = r \cdot n_B$ to get $ME = z_{\alpha/2} \sqrt{\dfrac{p_A q_A}{r\,n_B} + \dfrac{p_B q_B}{n_B}}$.

Solve to find $n_B = \dfrac{z_{\alpha/2}^2 \left(p_A q_A / r + p_B q_B\right)}{ME^2}$.

Therefore, we would use samples of size:

$n_B = \dfrac{z_{\alpha/2}^2 \left(p_A q_A / r + p_B q_B\right)}{ME^2}$ and $n_A = r\,n_B$

Notes: (a) The text only discusses the case where r = 1 (i.e., equal sample sizes).
(b) Also, to use this formula you have to first come up with reasonable guesses
(estimates or bounds) for $p_A$ and $p_B$.
(c) The most conservative (i.e., largest sample size) thing to do is use $p_A = p_B = .5$.
Otherwise, use upper (or lower) bounds on $p_A$ and $p_B$ if you know of some.
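
The two sample-size formulas can be sketched in code as follows. The function names and the numerical inputs are illustrative assumptions, and rounding up to the next whole number is a choice made here so that the margin of error is not exceeded.

```python
import math

def sizes_for_means(me, z, sigma_a, sigma_b, r=1.0):
    """Sample sizes (nA, nB) for estimating muA - muB, with nA = r * nB."""
    n_b = math.ceil(z ** 2 * (sigma_a ** 2 / r + sigma_b ** 2) / me ** 2)
    return math.ceil(r * n_b), n_b

def sizes_for_proportions(me, z, p_a=0.5, p_b=0.5, r=1.0):
    """Sample sizes (nA, nB) for estimating pA - pB, with nA = r * nB.

    The defaults pA = pB = 0.5 give the most conservative (largest) sizes.
    """
    n_b = math.ceil(z ** 2 * (p_a * (1 - p_a) / r + p_b * (1 - p_b)) / me ** 2)
    return math.ceil(r * n_b), n_b

# Hypothetical inputs: 95% confidence (z = 1.96)
equal_sizes = sizes_for_proportions(me=0.05, z=1.96)         # r = 1
twice_as_many = sizes_for_proportions(me=0.05, z=1.96, r=2)  # nA = 2 nB
mean_sizes = sizes_for_means(me=2, z=1.96, sigma_a=10, sigma_b=10)
print(equal_sizes, twice_as_many, mean_sizes)
```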



9.3 INFERENCES CONCERNING VARIANCES

The F distribution can be shown to be the appropriate probability model for the ratio of the
variances of two samples taken independently from the same normally distributed
population, with there being a different F distribution for every combination of the degrees
of freedom (df) associated with each sample. For each sample, df = n - 1. The statistic that is
used to test the null hypothesis that two population variances are equal is

$F_{df_1, df_2} = \dfrac{s_1^2}{s_2^2}$

Since each sample variance is an unbiased estimator of the same population variance, the
long-run expected value of the above ratio is about 1.0. (Note: The expected value is not
exactly 1.0, but rather is $df_2 / (df_2 - 2)$, for mathematical reasons that are outside of the scope
of this outline.) However, for any given pair of samples the sample variances are not likely
to be identical in value, even though the null hypothesis is true. Since this ratio is known to
follow an F distribution, this probability distribution can be used in conjunction with testing
the difference between two variances. Although a necessary mathematical assumption is that
the two populations are normally distributed, the F test has been demonstrated to be
relatively robust, and insensitive to departures from normality when each population is
unimodal and the sample sizes are about equal.

Example 9.4
For a random sample of n1 = 10 light bulbs the mean bulb life is $\bar{x}_1$ = 4000 hrs, with s1 = 200.
For another brand of bulb whose useful life is also assumed to be normally distributed, a random
sample of n2 = 8 has a sample mean of $\bar{x}_2$ = 4300 hours and a sample standard deviation of
s2 = 250. Test the null hypothesis that the samples were obtained from populations with equal
variances, using the 10 percent level of significance for the test, by use of the F distribution:

$H_0: \sigma_1^2 = \sigma_2^2$ VS $H_1: \sigma_1^2 \neq \sigma_2^2$

$F_{9,7} = \dfrac{s_1^2}{s_2^2} = \dfrac{200^2}{250^2} = \dfrac{40000}{62500} = 0.64$

For the test at the 10 percent level of significance, the upper 5 percent point for F and the
lower 5 percent point for F are the critical values:

$F_{9,7}$ (upper 5%) = 3.68

$F_{9,7}$ (lower 5%) = $\dfrac{1}{F_{7,9} \text{ (upper 5\%)}} = \dfrac{1}{3.29} = 0.304$

Since the computed F ratio is neither smaller than 0.304 nor larger than 3.68, it is in the
region of acceptance of the null hypothesis. Thus, the assumption that the variances of the
two populations are equal cannot be rejected at the 10 percent level of significance.
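
The decision in Example 9.4 amounts to comparing the variance ratio with the two critical values. A minimal sketch, taking the critical values 0.304 and 3.68 from the example rather than computing them (the function name `variance_ratio` is illustrative):

```python
def variance_ratio(s1, s2):
    """F ratio of two sample variances, given the sample standard deviations."""
    return s1 ** 2 / s2 ** 2

f_ratio = variance_ratio(200, 250)     # s1 = 200 (n1 = 10), s2 = 250 (n2 = 8)
lower_crit, upper_crit = 0.304, 3.68   # 5% points quoted in the example
reject_h0 = f_ratio < lower_crit or f_ratio > upper_crit
print(f_ratio, "reject H0" if reject_h0 else "do not reject H0")
```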



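CHAPTER SUMMARY

- Confidence Interval estimate of $\mu_A - \mu_B$

If both $\sigma_A$ and $\sigma_B$ are known: $(\bar{x} - \bar{y}) \pm z_{\alpha/2} \sqrt{\dfrac{\sigma_A^2}{n_A} + \dfrac{\sigma_B^2}{n_B}}$

If both $\sigma_A$ and $\sigma_B$ are unknown: $(\bar{x} - \bar{y}) \pm t_{\alpha/2} \sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}$

- $(1-\alpha)100\%$ Confidence Interval estimate of $p_A - p_B$:

$(\hat{p}_A - \hat{p}_B) \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}_A \hat{q}_A}{n_A} + \dfrac{\hat{p}_B \hat{q}_B}{n_B}}$

- Hypothesis Test of $H_0: p_A - p_B = 0$

Test statistic: $z = \dfrac{\hat{p}_A - \hat{p}_B}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}}$ where $\bar{p} = \dfrac{X_A + Y_B}{n_A + n_B}$ and $\bar{q} = 1 - \bar{p}$

- The method for finding the minimum necessary sample sizes, nA and nB, for estimating either $p_A - p_B$ or
$\mu_A - \mu_B$ is the same: set the desired margin of error, ME, that you are willing to accept equal to the
half-width of the confidence interval and solve for the sample sizes.

- The statistic that is used to test the null hypothesis that two population variances are
equal is $F = \dfrac{s_1^2}{s_2^2}$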

Exercises on Chapter 9



1. There are two populations. A sample of size 120 from one of the populations gave a mean
of 15 and a standard deviation of 1.3. A sample of size 88 from the other population gave a
mean of 13.5 and a standard deviation of 1.5. Find a 98% confidence interval for the
difference between the population means.

2. The average length of twenty trout caught in a lake was 10.8 inches with a standard
deviation of 2.3 inches, and the average length of fifteen trout caught in another lake was
9.7 inches with a standard deviation of 1.5 inches. Construct a 90 percent confidence
interval for the difference in the true mean lengths of trout in the two lakes.

$\bar{x}_1$ = 10.8, $s_1$ = 2.3, $n_1$ = 20
$\bar{x}_2$ = 9.7, $s_2$ = 1.5, $n_2$ = 15
Confidence level: 0.90, Pooled: Yes

3. A farmer tried Feed A on 256 cattle and Feed B on 144 cattle. The mean weight of cattle
given Feed A was found to be 1350 pounds with a standard deviation of 180 pounds. On
the other hand, the mean weight of the cattle given Feed B was found to be 1430 pounds
with a standard deviation of 210 pounds. At the 5 percent level of significance, is Feed B
significantly better than Feed A? Find the p-value.

4. At a certain university twelve voters were picked at random from those who are in favor
of impeachment of the president, and ten were selected at random from those who are
against. The following table gives their ages.

In favor   27 34 28 30 29 50 30 44 29 32 41 35
Against    31 36 43 40 32 48 30 29 42 49

$H_0: \mu_1 - \mu_2 = 0$   $H_1: \mu_1 - \mu_2 \neq 0$

At a 10% level of significance, is it true that the age of those in favor of impeachment
significantly differs from the age of those against? Use a two-sample t-test with pooled
variance.

5. Paired t-test.

Dr. Williams claims that the special diet that he recommends significantly reduces weight. A
sample of eight persons was selected and they were put on the diet for a period of 6 weeks.
The table below shows the weights (in pounds) of those eight persons before and after dieting.

Before   182 180 195 178 177 221 198 208

After    168 183 187 169 161 204 194 196

a) Construct a 99% confidence interval for the mean difference $\mu_d$ in weight before
and after using the diet recommended by Dr. Williams. Use a paired difference
interval $\bar{d} \pm t_{\alpha/2} \dfrac{s_d}{\sqrt{n}}$.
b) Using a 1% level of significance, can you conclude that the mean weight loss for
all persons due to this special diet is greater than zero?

6. In a study to estimate the proportion of residences in a certain city and its suburbs that
subscribe to a certain magazine, it is found that 63 of 120 urban residences subscribe
while only 34 of 125 suburban residences subscribe. Find a 90% confidence interval for
the difference in the proportion of urban and suburban residences that subscribe to Writer's
Digest.

7. A jar containing 130 mosquitoes was sprayed with an insecticide of Brand A and it was
found that 98 of them were killed. When another jar containing 150 mosquitoes of the same
type was sprayed with Brand B, 120 of them were killed. At the 2 percent level of
significance, do the two brands differ in their effectiveness?

$H_0: p_1 = p_2$   $H_a: p_1 \neq p_2$ (two-tailed test)



CHAPTER 10

SIMPLE LINEAR REGRESSION AND CORRELATION

CONTENTS

10.1. SIMPLE LINEAR REGRESSION (REGRESSION OF Y ON X)

10.2. THE COVARIANCE AND THE CORRELATION COEFFICIENT

10.3. THE RANK CORRELATION COEFFICIENT

INTRODUCTION

Most of the analyses discussed in the previous chapters deal with the one-variable case.
Sometimes, however, we are interested in determining the degree of relationship between two or
more variables, and we may even try to estimate by how much one variable changes when a
related variable changes by one unit. Regression and correlation analysis are used to study
relationships among variables. This chapter introduces you to such and related issues.

Objectives

After completion of this chapter students will be able to:

- Explain the meaning of regression

- Explain the meaning of correlation

- Draw a scatter diagram to identify the type of relationship that exists between
variables

- Differentiate between dependent and independent variables

- Compute and interpret the regression coefficients

- Compute and interpret the coefficient of linear correlation

10.1 SIMPLE LINEAR REGRESSION

Linear regression and correlation deal with studying and measuring the linear relationship among
two or more variables. When only two variables are involved, the analysis is referred to as
simple correlation and simple linear regression analysis, and when there are more than two
variables the terms multiple regression and partial correlation are used.

10.1.1 DEFINITION

Regression Analysis: is a statistical technique that can be used to develop a mathematical
equation showing how variables are related.

Correlation Analysis: deals with the measurement of the closeness of the relationship
which is described in the regression equation.

We say there is correlation when the two series of items vary together directly or inversely.
In simple linear regression analysis, two variables are under study: one independent and one
dependent.

i) The independent (explanatory) variable

A variable whose value is used to estimate the value of the dependent variable. It is
denoted by X.

ii) The dependent (response) variable

A variable whose value is estimated from the independent variable. It is denoted by Y.

10.1.2 FITTING LINEAR REGRESSION BY LEAST SQUARES METHOD

Regression equation of Y on X and X on Y

The simple linear regression model of Y on X is given by

$Y_i = B_0 + B_1 X_i + \varepsilon_i$

where $B_0$ and $B_1$ represent the intercept and the slope (they are called parameters or
regression coefficients) and

$\varepsilon_i$ is the random error term.

The random error term, $\varepsilon_i$, is included in the model to represent the following two
phenomena.

1. Missing or omitted variables

2. Random variation

Assumptions

1. The random error term, $\varepsilon_i$, has a mean equal to zero.

2. The errors associated with different observations are independent.

3. For any given $X_i$, the distribution of errors is normal.

4. The distribution of population errors for each X has the same (constant) standard deviation,
which is denoted by $\sigma$.

Note: Taken together, these assumptions say that $\varepsilon_i \sim N(0, \sigma^2)$.

Estimation of the regression coefficients

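One of the methods that help us to find the estimates of $B_0$ and $B_1$ is the least squares method
(also called the ordinary least squares method, OLS).

The resulting estimates of $B_0$ and $B_1$, denoted by $\hat{B}_0$ and $\hat{B}_1$ respectively, are called the least
squares estimates.

Note: This method gives the values $\hat{B}_0$ and $\hat{B}_1$ such that the sum of squared errors is
minimum, i.e. we minimize

$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \hat{B}_0 - \hat{B}_1 X_i)^2$ (SS Residual)

where $Y_i$ = the actual value and

$\hat{Y}_i = \hat{B}_0 + \hat{B}_1 X_i$ = the estimated value.

To minimize SS Residual we take partial derivatives with respect to $\hat{B}_0$ and $\hat{B}_1$ and get the
values of $\hat{B}_0$ and $\hat{B}_1$ by equating the derivatives to zero, i.e.

$\dfrac{\partial}{\partial \hat{B}_0} \sum (Y_i - \hat{B}_0 - \hat{B}_1 X_i)^2 = -2 \sum (Y_i - \hat{B}_0 - \hat{B}_1 X_i) = 0$

$\dfrac{\partial}{\partial \hat{B}_1} \sum (Y_i - \hat{B}_0 - \hat{B}_1 X_i)^2 = -2 \sum X_i (Y_i - \hat{B}_0 - \hat{B}_1 X_i) = 0$

Those will lead to the following normal equations:

$\sum Y_i = n \hat{B}_0 + \hat{B}_1 \sum X_i$

$\sum X_i Y_i = \hat{B}_0 \sum X_i + \hat{B}_1 \sum X_i^2$

When we solve the two equations, we get the least squares values of $B_0$ and $B_1$:

$\hat{B}_1 = \dfrac{n \sum X_i Y_i - (\sum X_i)(\sum Y_i)}{n \sum X_i^2 - (\sum X_i)^2}$ and $\hat{B}_0 = \bar{Y} - \hat{B}_1 \bar{X}$

Then the estimated regression line (the least squares regression line) will be given by

$\hat{Y} = \hat{B}_0 + \hat{B}_1 X$

This is sometimes called the regression of Y on X.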
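
The least squares computation can be sketched in a few lines of Python by solving the two normal equations for the intercept and slope. The data below are hypothetical values chosen only to illustrate the arithmetic, and the function name `least_squares_line` is an illustrative choice.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares regression of Y on X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n      # Y-bar minus slope * X-bar
    return intercept, slope

# Hypothetical data (not taken from the examples in this chapter)
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x_values, y_values)
print(round(b0, 2), round(b1, 2))
```

For these values the sums are ΣX = 15, ΣY = 20, ΣXY = 66 and ΣX² = 55, so the slope is (330 - 300)/(275 - 225) = 0.6 and the intercept is 4 - 0.6(3) = 2.2.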


Interpretation of $\hat{B}_0$ and $\hat{B}_1$

The value of $\hat{B}_0$ gives the predicted or the mean value of Y for X = 0. The value of $\hat{B}_1$ gives
the average change in Y (dependent variable) due to a change of one unit in X (independent
variable).

Example 10.1

Find the least squares regression line for the data on the final marks (Y) and the number of
hours spent on studying (X).

Xi    8    5    13   10   6    18   15   2    9    11
Yi    65   94   72   70   54   90   85   33   56   29

Solution
The least squares regression line is

$\hat{Y} = 29.88 + 3.59X$

- The slope 3.59 indicates that, on average, the final mark of a student increases by 3.59 for a
one hour increase in the number of hours spent on studying.

- The intercept 29.88 indicates the expected mark of a student who spent zero hours
on studying.

The least squares regression model is given by

$Y_i = 29.88 + 3.59X_i + e_i$

10.2 THE COVARIANCE AND CORRELATION COEFFICIENT

CORRELATION

Correlation is a statistical method used to determine whether a relationship between
variables exists.

MEASURING SIMPLE LINEAR CORRELATION

SCATTER DIAGRAM

- is a graph that portrays the relationship between two variables.

[Figure: four scatter diagrams showing a strong positive linear relationship, a weak positive
linear relationship, a strong negative linear relationship and a weak negative linear relationship]

COVARIANCE

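If (X1, Y1), (X2, Y2), …, (Xn, Yn) are n pairs of observations of the variables X and Y in a
bivariate distribution, then

$s_{xy} = Cov(X, Y) = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$

and, equivalently,

$s_{xy} = \dfrac{\sum X_i Y_i - n \bar{X} \bar{Y}}{n - 1}$

Pearson correlation Coefficient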

COEFFICIENT OF CORRELATION

This is a numerical measure of the strength and direction of the linear relationship between two
variables. The symbol for the sample correlation coefficient is r and that of the population
correlation coefficient is $\rho$ (rho).

The sample correlation r is calculated as

$r = \dfrac{n \sum X_i Y_i - (\sum X_i)(\sum Y_i)}{\sqrt{\left[n \sum X_i^2 - (\sum X_i)^2\right]\left[n \sum Y_i^2 - (\sum Y_i)^2\right]}}$

where n is the number of pairs of observations.

Calculate the correlation coefficient for Example 10.1 above.

Solution:

Since r is positive & close to 1, it indicates there is a strong positive linear relationship
between the number of hours spent on studying and the final marks.

(i.e. r lies in the range $-1 \leq r \leq 1$: if r = -1, there is a perfect negative linear relationship; if r = 1,
there is a perfect positive linear relationship; and if it is 0, there is no linear relationship between
the dependent variable Y and the independent variable X.)
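
A minimal sketch of the sample correlation computation follows, using the sums of X, Y, XY, X² and Y². The data are hypothetical and the function name `pearson_r` is an illustrative choice.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data (not taken from the examples in this chapter)
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))
```

Here the numerator is 30 and the denominator is the square root of 50 × 30 = 1500, so r is roughly 0.77, a fairly strong positive linear relationship.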


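RELATIONSHIPS AMONG REGRESSION SLOPES, CORRELATION COEFFICIENT, COVARIANCE & VARIANCE

Regression coefficient of X on Y ($b_{XY}$):

i) $b_{XY} = \dfrac{n \sum X_i Y_i - (\sum X_i)(\sum Y_i)}{n \sum Y_i^2 - (\sum Y_i)^2}$

ii) $b_{XY} = \dfrac{Cov(X, Y)}{Var(Y)}$

iii) $b_{XY} = r \dfrac{\sigma_X}{\sigma_Y}$

Regression coefficient of Y on X ($b_{YX}$):

i) $b_{YX} = \dfrac{n \sum X_i Y_i - (\sum X_i)(\sum Y_i)}{n \sum X_i^2 - (\sum X_i)^2}$

ii) $b_{YX} = \dfrac{Cov(X, Y)}{Var(X)}$

iii) $b_{YX} = r \dfrac{\sigma_Y}{\sigma_X}$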

Example 10.3

Two variables have the regression lines with equations:

3X + 2Y = 26 and 6X + Y = 31

Calculate i) correlation coefficient between X and Y

ii) Standard deviation of Y if variance of X is 25

Solution:

Let us suppose that

3X + 2Y = 26 ….( *)

6X + Y = 31 ……(**) are the lines of regression of Y on X and X on Y respectively

222

Prepared by Big Bang, August, 2017 GC


INTRODUCTION TO STATISTICS: Stat 281

r2 is called the coefficient of determination, which is a better indicator of the strength of a


relationship than the correlation coefficient. It is better because it identifies the percentage of
variation of the dependent variable that is directly attributable to the variation of the
independent variable.

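From (*), $Y = 13 - \tfrac{3}{2}X$, so $b_{YX} = -\tfrac{3}{2}$. From (**), $X = \tfrac{31}{6} - \tfrac{1}{6}Y$, so
$b_{XY} = -\tfrac{1}{6}$. Hence

$$r^2 = b_{YX}\, b_{XY} = \left(-\tfrac{3}{2}\right)\left(-\tfrac{1}{6}\right) = \tfrac{1}{4}, \qquad r = \pm 0.5$$

But since both regression coefficients are negative, r is negative:

i) r = −0.5

(Since r² ≤ 1, our assumption that (*) and (**) are the regression lines of Y on X and X on Y
respectively is admissible. With the roles reversed, the product of the slopes would be
(−6)(−2/3) = 4 > 1, which is impossible.)

ii) Since $b_{YX} = r\, s_Y / s_X$ and $s_X = \sqrt{25} = 5$, we get $s_Y = b_{YX}\, s_X / r = (-1.5)(5)/(-0.5) = 15$.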
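
The arithmetic of Example 10.3 can be checked with a short Python sketch. It assumes, as in the solution, that (*) is the regression of Y on X and (**) the regression of X on Y, and it uses the standard identities r² = b_YX · b_XY and b_YX = r · s_Y / s_X; the helper names are hypothetical.

```python
import math
from fractions import Fraction

def slope_y_on_x(a, b):
    """Slope of Y on X for the line aX + bY = c (solve the line for Y)."""
    return Fraction(-a, b)

def slope_x_on_y(a, b):
    """Slope of X on Y for the line aX + bY = c (solve the line for X)."""
    return Fraction(-b, a)

# Assumption from the example: (*) 3X + 2Y = 26 is Y on X, (**) 6X + Y = 31 is X on Y.
b_yx = slope_y_on_x(3, 2)   # -3/2
b_xy = slope_x_on_y(6, 1)   # -1/6

r_squared = b_yx * b_xy     # 1/4
if r_squared > 1:
    raise ValueError("inconsistent assignment of the two lines: r^2 > 1")

# r carries the common sign of the two regression coefficients.
r = math.copysign(math.sqrt(r_squared), b_yx)   # -0.5

s_x = math.sqrt(25)            # standard deviation of X from Var(X) = 25
s_y = float(b_yx) * s_x / r    # from b_yx = r * s_y / s_x

print(r, s_y)  # -0.5 15.0
```

Swapping the roles of the two lines in this sketch gives r² = 4, which triggers the consistency check and confirms the assignment used in the solution.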

• B1: the slope of the regression of Y on X


Similarly, the coefficient of regression of X on Y indicates the change in the value of
variable X corresponding to a unit change in the value of variable Y, and is given by
$b_{XY} = r\,\dfrac{s_X}{s_Y}$.

The correlation coefficient is the geometric mean of the two regression coefficients:
$r = \pm\sqrt{b_{YX}\, b_{XY}}$, where r takes the common sign of the two coefficients.
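
The geometric-mean property can be verified numerically. The sketch below computes both regression slopes from the computational formulas and recovers r from them; the function name `regression_slopes` and the data are hypothetical and serve only as an illustration.

```python
import math

def regression_slopes(xs, ys):
    """Return (b_yx, b_xy) from the computational least-squares formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    cross = n * sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y
    spread_x = n * sum(x * x for x in xs) - sum_x ** 2
    spread_y = n * sum(y * y for y in ys) - sum_y ** 2
    return cross / spread_x, cross / spread_y

# Hypothetical data with a negative trend (illustration only)
xs = [1, 2, 3, 4, 5]
ys = [9, 7, 8, 4, 2]

b_yx, b_xy = regression_slopes(xs, ys)

# r is the geometric mean of the two slopes, with their common sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(b_yx, b_xy, round(r, 4))
```

Both slopes always share the sign of the covariance, so their product is never negative and the square root is well defined.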

To show the relationship:

Y on X:

$$b_{YX} = \frac{n\sum X_i Y_i - \left(\sum X_i\right)\left(\sum Y_i\right)}{n\sum X_i^2 - \left(\sum X_i\right)^2} \qquad \ldots (1)$$


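Using equations 7 and 8, we get:

$$b_{YX} = \frac{s_{XY}}{s_X^2}$$

Substituting equations 8 and 9 into equation 2, we get $b_{XY}$:

$$b_{XY} = \frac{s_{XY}}{s_Y^2}$$

Substituting equations 7, 8 and 9 into equation 3, we get:

$$r = \frac{s_{XY}}{s_X\, s_Y}$$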
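
The intermediate steps of this argument can be sketched as follows. This is a reconstruction under the assumption that $s_{XY}$, $s_X^2$ and $s_Y^2$ denote the sample covariance and variances with divisor n − 1; the factor n(n − 1) cancels, so a divisor of n would lead to the same conclusions.

```latex
\begin{align*}
n\sum X_i Y_i - \sum X_i \sum Y_i
  &= n\sum (X_i - \bar{X})(Y_i - \bar{Y}) = n(n-1)\, s_{XY}, \\
n\sum X_i^2 - \Bigl(\sum X_i\Bigr)^2
  &= n\sum (X_i - \bar{X})^2 = n(n-1)\, s_X^2, \\
n\sum Y_i^2 - \Bigl(\sum Y_i\Bigr)^2
  &= n\sum (Y_i - \bar{Y})^2 = n(n-1)\, s_Y^2, \\[4pt]
b_{YX} &= \frac{n(n-1)\, s_{XY}}{n(n-1)\, s_X^2} = \frac{s_{XY}}{s_X^2},
\qquad
b_{XY} = \frac{n(n-1)\, s_{XY}}{n(n-1)\, s_Y^2} = \frac{s_{XY}}{s_Y^2}, \\[4pt]
b_{YX}\, b_{XY} &= \frac{s_{XY}^2}{s_X^2\, s_Y^2}
  = \left(\frac{s_{XY}}{s_X\, s_Y}\right)^2 = r^2 .
\end{align*}
```

The last line is the statement that r is the geometric mean of the two regression coefficients.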


10.3. THE RANK CORRELATION COEFFICIENT

We calculate what is called Spearman’s rank correlation coefficient as follows:


Steps
i. Rank the items in X and in Y separately.
ii. Find the difference between the ranks in each pair and denote it by D_i.
iii. Use the following formula:

$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}$$

where

D_i = the difference between the paired ranks

n = the number of pairs

Example 10.4
Aster and Almaz were asked to rank 7 different types of lipsticks. Determine whether there is
a correlation between the tastes of the two ladies.

Lipstick   A   B   C   D   E   F   G
Aster      2   1   4   3   5   7   6
Almaz      1   3   2   4   5   6   7

Solution:

R_X              2    1    4    3    5    7    6    Total
R_Y              1    3    2    4    5    6    7
D = R_X − R_Y    1   −2    2   −1    0    1   −1
D²               1    4    4    1    0    1    1     12

$$r_s = 1 - \frac{6(12)}{7(7^2 - 1)} = 1 - \frac{72}{336} \approx 0.786$$

Yes, there is a strong positive correlation between the two rankings. (In general,
−1 ≤ r_s ≤ 1: r_s = −1 indicates a perfect negative correlation, r_s = 1 a perfect positive
correlation, and r_s = 0 no correlation.)
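
The three steps of the rank correlation procedure can be sketched in Python using the rankings of Example 10.4. The function name `spearman_rho` is hypothetical, and the formula used is the standard Spearman formula, which assumes that there are no tied ranks.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman rank correlation from two lists of ranks (no ties assumed)."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two equal-length rankings with at least 2 items")
    # Step ii: squared differences between paired ranks
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    # Step iii: apply the formula
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Rankings of lipsticks A to G from Example 10.4
aster = [2, 1, 4, 3, 5, 7, 6]
almaz = [1, 3, 2, 4, 5, 6, 7]

rho = spearman_rho(aster, almaz)
print(round(rho, 3))  # 0.786
```

Identical rankings give 1 and exactly reversed rankings give −1, which matches the interpretation of the coefficient.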

CHAPTER SUMMARY


Many relationships among variables exist in the real world. One way to determine whether a
relationship exists is to use the statistical techniques known as correlation and regression.
The strength and direction of the relationship are measured by the value of the correlation
coefficient. It can assume values between and including -1 and +1.

The closer the coefficient is to +1 or -1, the stronger the relationship between the variables.
A value of +1 or -1 indicates a perfect relationship. A positive relationship between two
variables means that for small values of the independent variable, the values of the
dependent variable will be small, and for large values of the independent variable, the
values of the dependent variable will be large.

A negative relationship between two variables means that for small values of the
independent variable, the values of the dependent variable will be large, and for large
values of the independent variable, the values of the dependent variable will be small.

A relationship can be linear or curvilinear. To determine its shape, one draws a scatter plot of
the variables. If the relationship is linear, the data can be approximated by a straight line
called the regression line or the line of best fit.
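
As a brief illustration of the line of best fit, the following Python sketch computes the least-squares intercept and slope from the usual computational formulas. The function name `fit_line`, the form Y = a + bX and the data are assumptions made for this example only.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b of the line Y = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("slope is undefined when all X values are equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator   # slope of Y on X
    a = sum_y / n - b * sum_x / n                    # line passes through the means
    return a, b

# Hypothetical data lying exactly on Y = 1 + 2X
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

With real data the points scatter around the fitted line, and the correlation coefficient indicates how closely they follow it.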


Exercises on Chapter 10

1. A study reported in a medical journal suggested that the peak heart rate an
individual can reach during intensive exercise decreases with age. A cardiologist
wanted to do his own study: subjects exercised on a treadmill at 6 miles per hour, and
their ages and heart rates were recorded as follows.

Age (X)           30   30   40   20   20   45   30   45   50
Heart rate (Y)   190  180  180  200  195  170  180  175  165

a) Find the least squares regression line of Y on X.

b) For an 80-year-old man, what peak heart rate do you predict?

c) Calculate the Pearsonian coefficient of correlation.

2. Given the following data:

, , , , 2 =775, n=100.

Based on the above data find

a) The two lines of regression. b) The sample variances of X and Y.

c) The sample covariance

d) Karl Pearson’s coefficient of correlation, and interpret your result.

3. The equations of two regression lines between two variables are expressed as:

6x + y = 31 and 3x + 2y = 26.

a) Identify which of the two can be called regression of Y on X and which of X on Y.

b) Find: the most probable value of Y when x = 5, , , and r.

c) If , find and .

APPENDICES

APPENDIX A
CUMULATIVE AREA OF THE STANDARD NORMAL CURVE from 0 to z

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09


0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0754
0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879
0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
0.6 .2257 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2517 .2549
0.7 .2580 .2611 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852
0.8 .2881 .2910 .2939 .2967 .2995 .3023 .3051 .3078 .3106 .3133
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389
1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319
1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767
2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936
2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974


2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986
3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990
3.1 .4990 .4991 .4991 .4991 .4992 .4992 .4992 .4992 .4993 .4993
3.2 .4993 .4993 .4994 .4994 .4994 .4994 .4994 .4995 .4995 .4995
3.3 .4995 .4995 .4995 .4996 .4996 .4996 .4996 .4996 .4996 .4997
3.4 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4998
3.5 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998
3.6 .4998 .4998 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.7 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.8 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.9 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000
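
The entries of this table can be reproduced with the error function from Python's standard library, using the identity that the area between 0 and z equals ½·erf(z/√2). The function name `area_0_to_z` is hypothetical; the sketch is meant only as a way to check or extend the table.

```python
import math

def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * math.erf(z / math.sqrt(2))

print(round(area_0_to_z(1.00), 4))  # 0.3413 (row 1.0, column .00)
print(round(area_0_to_z(1.96), 4))  # 0.475  (row 1.9, column .06 shows .4750)
```

Adding 0.5 to the result gives the full cumulative probability P(Z ≤ z) for z ≥ 0.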


APPENDIX B
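CUMULATIVE AREA OF THE Student-t CURVE WITH DEGREES OF FREEDOM n-1

The t-Distribution

Each entry is the value of t that leaves the stated area α in the right tail.

df α=0.10 α=0.05 α=0.025 α=0.01 α=0.005 df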

1 3.078 6.314 12.706 31.821 63.657 1

2 1.886 2.920 4.303 6.965 9.925 2

3 1.638 2.353 3.182 4.541 5.841 3

4 1.533 2.132 2.776 3.747 4.604 4

5 1.476 2.015 2.571 3.365 4.032 5

6 1.440 1.943 2.447 3.143 3.707 6

7 1.415 1.895 2.365 2.998 3.499 7

8 1.397 1.860 2.306 2.896 3.355 8

9 1.383 1.833 2.262 2.821 3.250 9

10 1.372 1.812 2.228 2.764 3.169 10

11 1.363 1.796 2.201 2.718 3.106 11

12 1.356 1.782 2.179 2.681 3.055 12

13 1.350 1.771 2.160 2.650 3.012 13

14 1.345 1.761 2.145 2.624 2.977 14

15 1.341 1.753 2.131 2.602 2.947 15

16 1.337 1.746 2.120 2.583 2.921 16



17 1.333 1.740 2.110 2.567 2.898 17

18 1.330 1.734 2.101 2.552 2.878 18

19 1.328 1.729 2.093 2.539 2.861 19

20 1.325 1.725 2.086 2.528 2.845 20

21 1.323 1.721 2.080 2.518 2.831 21


22 1.321 1.717 2.074 2.508 2.819 22
23 1.319 1.714 2.069 2.500 2.807 23

24 1.318 1.711 2.064 2.492 2.797 24

25 1.316 1.708 2.060 2.485 2.787 25

26 1.315 1.706 2.056 2.479 2.779 26

27 1.314 1.703 2.052 2.473 2.771 27

28 1.313 1.701 2.048 2.467 2.763 28

29 1.311 1.699 2.045 2.462 2.756 29

∞ 1.282 1.645 1.960 2.326 2.576 ∞


APPENDIX C
CUMULATIVE AREA OF RIGHT TAIL AREAS FOR THE CHI-SQUARE
DISTRIBUTION WITH N-1 DEGREES OF FREEDOM

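Right tail areas for the Chi-Square Distribution

Each entry is the chi-square value that leaves the stated area α in the right tail.

df α=0.05 α=0.025 α=0.01 α=0.005 df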

1 3.841 5.024 6.635 7.879 1

2 5.991 7.378 9.210 10.597 2

3 7.815 9.348 11.345 12.838 3

4 9.488 11.143 13.277 14.860 4

5 11.070 12.832 15.086 16.750 5

6 12.592 14.449 16.812 18.548 6

7 14.067 16.013 18.475 20.278 7

8 15.507 17.535 20.090 21.955 8

9 16.919 19.023 21.666 23.589 9

10 18.307 20.483 23.209 25.188 10

11 19.675 21.920 24.725 26.757 11


12 21.026 23.337 26.217 28.300 12

13 22.362 24.736 27.688 29.819 13

14 23.685 26.119 29.141 31.319 14

15 24.996 27.488 30.578 32.801 15

16 26.296 28.845 32.000 34.267 16

17 27.587 30.191 33.409 35.718 17

18 28.869 31.526 34.805 37.156 18

19 30.144 32.852 36.191 38.582 19

20 31.410 34.170 37.566 39.997 20

21 32.671 35.479 38.932 41.401 21

22 33.924 36.781 40.289 42.796 22

23 35.172 38.076 41.638 44.181 23

24 36.415 39.364 42.980 45.558 24

25 37.652 40.646 44.314 46.928 25

26 38.885 41.923 45.642 48.290 26

27 40.113 43.194 46.963 49.645 27

28 41.337 44.461 48.278 50.993 28

29 42.557 45.722 49.588 52.336 29

30 43.773 46.979 50.892 53.672 30


