Introduction To Statistics: Haramaya University College of Computing and Informatics Department of Statistics

Haramaya University
College of Computing and Informatics

Department of Statistics
Introduction to Statistics
Writer: Teshome Kebede (MSc)
Editor: Awol Seid (MSc)
© September 2015
Contents
1 Introduction
1.1 History and Definition of Statistics
1.2 Classification of Statistics
1.3 Application of Statistics
1.4 Uses of Statistics
1.5 Limitation of Statistics
1.6 Variable
1.7 Measurement Scales
2 Methods of Data Collection and Presentation
2.1 Types of Data
2.2 Methods of Data Collection
2.2.1 Questionnaire
2.2.2 Secondary Data
2.3 Data Organization
2.3.1 Editing Data
2.3.2 Classification of Data
2.3.3 Tabulation of Data
2.3.4 Frequency Distributions
2.4 Methods of Data Presentation
2.5 Exercises
3 Measures of Central Tendency
3.1 Introduction
3.2 Objectives of MCT
3.3 Desirable Properties of Good MCT
3.4 Summation Notation
3.5 Types of Measures of Central Tendency
3.6 Mean
3.6.1 Arithmetic Mean (AM)
3.6.2 Geometric Mean (GM)
3.6.3 Harmonic Mean (HM)
3.7 Mode
3.8 Median
3.9 Quantiles
3.10 Exercises
4 Measures of Variation
4.1 Introduction
4.2 Objectives of Measures of Variation
4.3 Types of Measures of Variation
4.3.1 Range and Relative Range
4.3.2 Quartile Deviation and Coefficient of Quartile Deviation
4.3.3 Mean Deviation and Coefficient of Mean Deviation
4.3.4 Variance and Standard Deviation
4.3.5 Coefficient of Variation
4.4 Exercises
5 Elementary Probability
5.1 What is Probability?
5.2 Concept of Set
5.2.1 Set Operation
5.3 Definition and Some Basic Concepts
5.4 Counting Rules
5.5 Approaches in Probability Definition
5.6 Some Probability Rules/Axioms
5.7 Conditional Probability
5.8 Independence
5.9 Exercises
6 Probability Distributions
6.1 Type of Random Variables
6.2 Introduction to Expectation of Random Variables
6.2.1 Expectation of Random Variables and Its Properties
6.2.2 Variance of Random Variables and Its Properties
6.3 Common Discrete Probability Distributions
6.3.1 Binomial Probability Distribution
6.3.2 Poisson Distribution
6.4 Common Continuous Distributions
6.4.1 Normal Distribution
6.5 Exercises
7 One Sample Statistical Inference
7.1 Point Estimation
7.2 Interval Estimation
7.2.1 Interval Estimation for the Population Mean
7.2.2 Interpretation of the Confidence Interval
7.3 Hypothesis Testing
7.3.1 Introduction
7.3.2 Errors in Hypothesis Testing
7.3.3 Hypothesis Testing About the Population Mean
7.4 Exercises
8 Simple Correlation and Linear Regression Analysis
8.1 Correlation Analysis
8.2 Simple Linear Regression
8.3 Coefficient of Determination (R²)
8.4 Exercises
1 Introduction
Studying statistics is great!!
1.1. History and Definition of Statistics
All of us are familiar with statistics in everyday life. As a discipline of study and research it
has a short history, but as numerical information it has a long one. Various documents from
ancient times contain numerical information about countries (states), their resources and the
composition of their people. This explains the origin of the word statistics as a factual
description of a state. The term ‘statistics’ is derived from the Latin word status, meaning
state, and historically statistics referred to the display of facts and figures relating to the
demography of states or countries. Generally, it can be defined in two senses: plural (as
statistical data) and singular (as statistical methods).
Plural sense: Statistics are collections of facts (figures). This meaning of the word is widely
used when reference is made to facts and figures on sales, employment or unemployment,
accidents, weather, deaths, education, etc. In this sense the word statistics simply means
data. But not all numerical data are statistics. For numerical data to be identified as
statistics, they must possess certain identifiable characteristics. Some of these
characteristics are described as follows:
1. Statistics are aggregates of facts. Single or isolated facts or figures cannot be
called statistics, as these cannot be compared or related to other figures within
the same framework. Accordingly, there must be an aggregate of these figures.
For example, if a person says “I earn Birr 30,000 per year”, it would not be
considered statistics. On the other hand, if we say that the average salary of a
professor at our university is Birr 30,000 per year, then this would be considered
statistics, since the average has been computed from many related figures such
as the yearly salaries of many professors.
2. Statistics, generally, are not the outcome of a single cause but are affected
by multiple causes. There are a number of forces working together that affect
the facts and figures. For example, when we say the crime rate in a certain city
has increased by 15% over the last year, a number of factors might explain this
change. These factors may include the general state of the economy, such as an economic recession, the
unemployment rate, extent of use of drugs, extent of legal effectiveness and so
on. While these factors can be isolated by themselves, the effect of these factors
cannot be isolated and measured individually. Similarly, a marked increase in food
grain production in a certain country may have been due to combined effect of
many factors such as better seeds, more extensive use of fertilizers, governmental
and banking support, adequate rainfall and so on. It is generally not possible to
segregate and study the effect of each of these forces individually.
3. Statistics are numerically expressed. All statistics are stated in numerical
figures only. Qualitative statements cannot be called statistics. For example, such
qualitative statements as ‘Ethiopia is a developing country’ or ‘Jack is very tall’
would not be considered as statistical statements. On the other hand, comparing
per capita income of Ethiopia with that of Kenya would be considered statistical in
nature. Similarly, Jack’s height in numbers compared to average height in Ethiopia
would also be considered as statistics.
4. Statistical data are collected in a systematic manner for a predetermined
purpose. The purpose and objective of collecting the data must be clearly
defined and decided upon prior to data collection. The procedures for collecting
the data should also be predetermined and well planned. This facilitates
the collection of proper and relevant data.
5. Statistics are enumerated or estimated according to a reasonable standard
of accuracy. There are basically two ways of collecting data. One is
actual counting or measuring, which is the most accurate way. The second is
estimation, which is used in situations where actual counting or
measuring is not feasible or involves prohibitive costs. Estimates based
on samples cannot be as precise and accurate as actual counts or measurements,
but they should be consistent with the degree of accuracy desired.
Singular sense: Statistics is the science that deals with the methods of collection,
organization, presentation, analysis and interpretation of data. It refers to the subject
area that is concerned with extracting relevant information from available data with the
aim of making sound decisions. According to this meaning, statistics is concerned with
the development and application of methods and techniques for collecting, organizing,
presenting, analyzing and interpreting statistical data.
According to the singular sense definition of statistics, a statistical study (statistical inves-
tigation) involves five stages: collection of data, organization of data, presentation of data,
analysis of data and interpretation of data.
1. Collection of Data: This is the first stage in any statistical investigation and involves
the process of obtaining (gathering) a set of related measurements or counts to meet
predetermined objectives. The data collected may be primary data (data collected di-
rectly by the investigator) or it may be secondary data (data obtained from intermediate
sources such as newspapers, journals, official records, etc.).
2. Organization of Data: It is usually not possible to derive any conclusion about the
main features of the data from direct inspection of the observations. The second pur-
pose of statistics is describing the properties of the data in a summary form. This stage
of statistical investigation helps to have a clear understanding of the information gath-
ered and includes editing (correcting), classifying and tabulating the collected data in a
systematic manner. Thus, the first step in the organization of data is editing. It means
correcting (adjusting) omissions, inconsistencies, irrelevant answers and wrong computations
in the collected data. The second step in the organization of data is classification,
that is, arranging the collected data according to some common characteristics. The last
step in the organization of data is tabulation, that is, presenting the classified data in
tabular form, using rows and columns.
3. Presentation of Data: The purpose of data presentation is to give an overview of what
the data actually look like, and to facilitate statistical analysis. Data can be presented
using graphs and diagrams, which are easy to remember and facilitate
comparison.
4. Analysis of Data: The analysis of data is the extraction of summarized and com-
prehensive numerical description in order to reach conclusions or provide answers to a
problem. The problem may require simple or sophisticated mathematical expressions.
5. Interpretation of Data: This is the last stage of statistical investigation. Interpretation
involves drawing valid conclusions from the data collected and analyzed in order
to make rational decisions.
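The five stages above can be sketched on a toy data set. This is only an illustration: the raw answers are hypothetical, and the editing rule (discard anything that is not a whole number) is an assumption made for the example, not a rule from the text.

```python
from collections import Counter
from statistics import mean

# Stage 1 - collection: hypothetical raw answers to "What is your family size?"
raw = ["4", "6", "", "5", "4", "four", "7", "4"]

# Stage 2 - organization: edit (drop blank or non-numeric answers), then tabulate.
edited = [int(answer) for answer in raw if answer.isdigit()]
table = Counter(edited)

# Stage 3 - presentation: a crude text bar chart of the frequency table.
for size in sorted(table):
    print(f"{size}: {'*' * table[size]}")

# Stage 4 - analysis: condense the data into one summary figure.
average = mean(edited)

# Stage 5 - interpretation: state the conclusion in words.
print(f"Households in this sample have about {average:.1f} members on average.")
```

In a real investigation the invalid answers would be followed up with the respondents rather than simply dropped.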
1.2. Classification of Statistics
Based on the scope of the decision making, statistics can be classified into two: Descriptive
and Inferential Statistics.
Descriptive Statistics: refers to the procedures used to organize and summarize masses of
data. It is concerned with describing or summarizing the most important features of
the data. It deals only with the characteristics of the collected data, without
attempting to infer (conclude) anything that goes beyond the data themselves.
The methodology of descriptive statistics includes the methods of organizing (classification,
tabulation, frequency distributions) and presenting (graphical and diagrammatic
presentation) data and calculations of certain indicators of data like measures of central
tendency and measures of variation which summarize some important features of the
data.
Inferential Statistics: includes the methods used to find out something about a population,
based on a sample. It is concerned with drawing statistically valid conclusions about
the characteristics of the population based on information obtained from a sample. In
this form of statistical analysis, descriptive statistics is linked with probability theory in
order to generalize the results of the sample to the population. Performing hypothesis
tests, determining relationships between variables and making predictions are also
part of inferential statistics.
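The contrast between the two branches can be illustrated with a short sketch. The ages below are hypothetical, and the interval uses the normal value 1.96 as a rough approximation; for a sample this small a t value would give a somewhat wider interval.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical ages of a random sample of 10 students.
ages = [19, 20, 21, 21, 22, 20, 23, 21, 22, 21]

# Descriptive: summarize only the data at hand.
sample_mean = mean(ages)
sample_sd = stdev(ages)

# Inferential: make a statement about the whole population, with uncertainty.
# Approximate 95% interval for the population mean (normal approximation).
margin = 1.96 * sample_sd / sqrt(len(ages))
interval = (sample_mean - margin, sample_mean + margin)

print(f"Descriptive: the sample mean age is {sample_mean}")
print(f"Inferential: the population mean is roughly between "
      f"{interval[0]:.2f} and {interval[1]:.2f}")
```

The first print statement describes the ten students only; the second generalizes beyond them, which is why it has to carry a margin of error.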
Examples: Classify each of the following statements as descriptive or inferential statistics.
(a) The average age of the students in this class is 21 years.
(b) At least 5% of the killings reported last year in city X were due to tourists.
(c) Of the students enrolled in Haramaya University in this year 74% are male and 26% are
female.
(d) The chance of winning the Ethiopian National Lottery in any day is 1 out of 167000.
(e) The demand for automobiles may decline next year in Europe.
(f) It has been raining continuously in Harar from Monday to Friday. It will continue to
rain over the weekend.
1.3. Application of Statistics
In modern times, statistical information plays a very important role in a wide range of
fields. Today, statistics is applied in almost all fields of human endeavor.
In Scientific Research: Statistics plays an important role in the collection of data through
efficiently designed experiments, in testing hypotheses and estimation of unknown pa-
rameters, and in interpretation of results.
In Industry: Statistical techniques are used to improve and maintain the quality of manu-
factured goods at a desired level. Statistical methods help to check whether a product
satisfies a given standard.
In Business: Statistical methods are employed to forecast future demand for goods, to plan
for production, and to evolve efficient management techniques to maximize profit.
In Medicine: Principles of design of experiments are used in screening of drugs and in
clinical trials. The information supplied by a large number of biochemical and other
tests is statistically assessed for diagnosis and prognosis of disease. The application of
statistical techniques has made medical diagnosis more objective by combining the collective
wisdom of the best possible experts with the knowledge on distinctions between
diseases indicated by tests. Besides, statistical methods are used for computation and
interpretation of birth and death rates.
In Literature: Statistical methods are used in quantifying an author’s style, which is useful
in settling cases of disputed authorship.
In Archeology: Quantitative assessment of similarity between objects has provided a method
of placing ancient artifacts in a chronological order.
In Courts of Law: Statistical evidence in the form of probability of occurrence of certain
events is used to supplement the traditional oral and circumstantial evidence in judging
cases.
In Detective Work: Statistics helps in analyzing bits and pieces of information, which indi-
vidually may appear to be unrelated or even inconsistent, to see an underlying pattern.
There seems to be no human activity whose value cannot be enhanced by injecting statistical
ideas in planning and by using statistical methods for efficient analysis of data and assessment
of results for feedback and control.
1.4. Uses of Statistics
To reduce and summarize masses of data and to present facts in numerical
and definite form. Statistics condenses and summarizes a large mass of data and
presents facts in a few presentable, understandable and precise numerical figures.
Raw data, as usually available, are voluminous and haphazard. It is generally not
possible to draw any conclusions from the raw data as collected. Hence it is necessary
and desirable to express these data in a few numerical values.
To facilitate comparison: Statistical devices such as averages, percentages, ratios,
etc. are used for this purpose.
For formulating and testing hypotheses: For instance, hypotheses such as whether a
new medicine is effective in curing a disease, or whether there is an association between
variables, can be tested using statistical tools.
For forecasting: Statistical methods help in studying past data and predicting future
trends.
1.5. Limitation of Statistics
• It does not deal with a single observation; rather, as discussed earlier, it deals only with
aggregates of facts. For example, the marks obtained by one student in a class do not
carry any meaning in themselves, unless they are compared with a set standard, with other
students in the same class or with the student's own marks obtained earlier.
• Statistical methods are not applicable to qualitative characteristics that cannot be coded as
numerical values.
• Statistical results are true on average, i.e. for the majority of cases. Since statistics is
not an exact science, statistical conclusions are not universally true. That is, statistical
laws are not universally true like the laws of physics, chemistry and mathematics.
• Statistics are liable to be misused or misinterpreted. This may be due to incomplete
information, inadequate and faulty procedures during data collection and sample selection,
and mainly due to ignorance (lack of knowledge).
1.6. Variable
A variable is any phenomenon or attribute that can assume different values. The most important
single distinguishing feature of a variable is that it varies; that is, it can take on different
values. Based on the values that they assume, variables can be classified as
1. Qualitative variables: A qualitative variable has values that are intrinsically nonnumerical
(categorical).
Example: Gender, Religion, Color of automobile, etc.
2. Quantitative variables: A quantitative variable has values that are intrinsically nu-
merical.
Example: Height, Family size, Weight, etc.
• Discrete variable: takes whole number values and consists of distinct, recognizable
individual elements that can be counted. It is a variable that assumes a finite
or countable number of possible values. These values are obtained by counting
(0, 1, 2, ...).
Example: Family size, number of children in a family, number of cars at a
traffic light.
• Continuous variable: takes any value, including decimals. Such a variable can
theoretically assume infinitely many possible values within any interval. These values
are obtained by measuring.
Example: Height, Weight, Time, Temperature, etc.
Generally, the values of a variable are obtained by counting for discrete
variables, by measuring for continuous variables, or by assigning categories for
qualitative variables.
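The classification can be mimicked in a short sketch. The helper `classify` is hypothetical and uses a deliberately crude rule based on how the values were recorded (text, whole numbers, or decimals); it would misclassify a qualitative variable that has been coded with numbers, so it is an illustration rather than a general method.

```python
def classify(values):
    """Rough guess at a variable's type from its recorded values (illustrative rule only)."""
    if all(isinstance(v, str) for v in values):
        return "qualitative"                 # categories, e.g. names of groups
    if all(isinstance(v, int) for v in values):
        return "quantitative, discrete"      # obtained by counting
    return "quantitative, continuous"        # obtained by measuring

# Hypothetical recorded values for three variables named in the text.
print(classify(["Male", "Female", "Female"]))   # gender
print(classify([3, 5, 4, 6]))                   # family size
print(classify([1.62, 1.75, 1.80]))             # height in metres
```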
Example: Classify each of the following as qualitative or quantitative and, if it is quantitative,
classify it as discrete or continuous.
1. Color of automobiles in a dealer’s showroom.
2. Number of seats in a movie theater.
3. Classification of patients based on nursing care needed (complete, partial or safer).
4. Number of tomatoes on each plant on a field.
5. Weight of newly born babies.
1.7. Measurement Scales
The level of measurement is one way in which variables can be classified. Broadly, this relates
to the level of information content implicit in the set of values and how each value may be
interpreted (mathematically) relative to other values on the variable - an issue which dictates
how the variable can be used and interpreted in statistical analysis. Consider the following
illustrations.
• Mr A wears shirt number 5 when he plays football and Mr B wears shirt number 6.
Who plays better?
What is the average shirt number?
• Mr A scored 5 in a Statistics quiz and Mr B scored 6 in the same quiz.
Who did better?
What is the average score?
Based on the numbers on the shirts, it is not possible to judge whether Mr B plays better.
But using the quiz scores, it is possible to judge that Mr B did better in the quiz. Also,
it is not possible to find a meaningful average shirt number, because the numbers on the
shirts are simply codes, but it is possible to obtain the average quiz score. Therefore,
the scale of measurement
• shows the information contained in the value of a variable.
• shows what mathematical operations and what statistical analyses are permissible
on the values of the variable.
Different measurement scales allow for different levels of exactness, depending upon the char-
acteristics of the variables being measured. The four types of scales available in statistical
analysis are
1. Nominal Scales apply to qualitative variables whose values show the category to which
an individual belongs. They reflect classification into categories (names of groups) where
there is no particular order or ranking among the labels. Numbers may be assigned
to the categories simply for coding purposes. It is not possible to compare individuals
on the basis of the numbers assigned to them. The only mathematical operation permissible
on these variables is counting. These variables
• have mutually exclusive (non-overlapping) and exhaustive categories.
• have no ranking or order between (among) their values.
Example: Gender (Male, Female), Political Affiliation (Labour, Conservative, Liberal),
Ethnicity (White, Black, Asian, Other), etc.
2. Ordinal Scales apply to qualitative variables whose values can be
ordered and ranked. Ranking and counting are the only mathematical operations that can
be performed on the values of these variables. The differences between the
values (categories) are not precisely defined.
Example: Academic Rank (BSc, MSc, PhD), Grade Scores (A, B, C, D, F), Strength
(Very Weak, Weak, Strong, Very Strong), Health Status (Very Sick, Sick, Cured), Economic
Status (Lower Class, Middle Class, Higher Class), etc.
3. Interval Scales apply to quantitative variables for which a value of zero does not
indicate absence of the characteristic, i.e. there is no true zero; zero is only an
arbitrary reference point. Differences between values are meaningful, but ratios are not.
For example, for temperature measured in degrees
Celsius, the difference between 5℃ and 10℃ is treated the same as the difference
between 10℃ and 15℃. However, we cannot say that 20℃ is twice as hot as 10℃,
i.e. the ratio between two different values has no quantitative meaning. This is because
there is no absolute zero on the Celsius scale; 0℃ does not imply ‘no heat’.
4. Ratio Scales apply to quantitative variables for which a value of zero
indicates absence of the characteristic, i.e. there is a true zero.
All mathematical operations can be performed on the values
of these variables.
For instance, a zero unemployment rate implies zero unemployment. Thus, we can also
legitimately say an unemployment rate of 20 percent is twice a rate of 10 percent or one
person is twice as old as another. In the case of temperature, we can use the Kelvin
scale instead of the Celsius scale: the Kelvin scale is a ratio scale because 0 Kelvin is
‘absolute zero’ (-273℃) and this does imply no heat.
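The four scales can be illustrated with a short sketch showing which summaries are meaningful at each level. All data are hypothetical, and the Celsius-to-Kelvin offset of 273.15 is the standard value (the text rounds it to 273).

```python
from statistics import mode

# Nominal: shirt numbers are labels, so only counting (e.g. the mode) is meaningful.
shirts = [5, 6, 5, 10]
most_common_shirt = mode(shirts)

# Ordinal: grades can be ranked, so a middle grade can be found,
# but the distance between two grades is not defined.
order = ["F", "D", "C", "B", "A"]            # lowest to highest
grades = ["B", "A", "C", "B", "D"]
ranks = sorted(order.index(g) for g in grades)
median_grade = order[ranks[len(ranks) // 2]]

# Interval: differences in degrees Celsius are meaningful, ratios are not.
difference = 20 - 10      # a 10-degree difference is a valid statement
naive_ratio = 20 / 10     # equals 2.0, but "twice as hot" is not a valid conclusion

# Ratio: the Kelvin scale has a true zero, so ratios are meaningful.
kelvin_ratio = (20 + 273.15) / (10 + 273.15)

print(most_common_shirt, median_grade, difference, round(kelvin_ratio, 3))
```

On the Kelvin scale, 20℃ turns out to be only about 3.5% hotter than 10℃, which shows why the naive ratio of 2 on the Celsius scale is misleading.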
2 Methods of Data Collection and Presentation
2.1. Types of Data
Research results or findings are the output of relevant data that have been properly
and carefully collected and then analyzed with legitimate data
analysis tools. So, data are always the base (or input) for research. This implies
that the quality of a study depends heavily on the quality of its data. Data can
be collected from different sources, which are generally grouped under two major categories,
namely, primary and secondary sources of data. Thus, regardless of their nature (i.e., qualitative
or quantitative, discrete or continuous, etc.), data are necessarily either:
1. Primary Data: Primary data are data collected by the investigator himself
for the purpose of a specific inquiry or study. They are collected for
the first time, either through direct observation or by questioning individuals under the
direct supervision and instruction of the researcher. Such data are original in character
and are generated in surveys conducted by individuals or research institutions.
2. Secondary Data: When an investigator uses the data which has already been collected
by others, such data is called secondary data. This data is primary data for the agency
that collected it and becomes secondary data for someone else who uses this data for
his own purposes. The secondary data can be obtained from journals, official reports,
government publications, publications of professional and research organizations and so
on.
Based on the role of time, data can be classified as cross-sectional or time series.
1. Cross-sectional data: a set of observations taken at a single point in time.
2. Time series data: a set of observations collected over a sequence of time points, usually at
equal intervals.
2.2. Methods of Data Collection
The first and foremost task in a statistical investigation is data collection. Before the actual
data collection, four important points should be considered. These are the purpose of data
collection (why do we need to collect data?), the data to be collected (what kind of data should
be collected?), the source of data (where can we get the data?) and the method of data
collection (how can we collect the data?).
Once it is decided what type of study is to be made, it becomes necessary to collect information
about the concerned body. This information has to be collected from certain individuals
directly or indirectly. Such a technique is known as the survey method. Survey methods
are commonly used in the social sciences, i.e., for problems related to sociology, political science,
psychology and various economic studies.
Another way of collecting data is experimentation, i.e., an actual experiment is conducted and
the observations (measurements and counts) are recorded. Such experimental studies
are common in the natural sciences, agriculture, biology, medical science, industry, etc.
2.2.1. Questionnaire
The most common methods of data collection for survey are personal interview and self-
administered questionnaire. In these and other methods of data collection, it is necessary
to prepare a document, called questionnaire, which contains a number of questions to be
answered and is used to record the responses.
A questionnaire is a form containing a cover letter that introduces the person conducting
the survey and explains the objectives of the survey, and a set of related questions which will be
answered by the respondents. One of the most important points in preparing it is that all
questions in it must have relevance to the objectives of the survey. In short, the following
points should be kept in mind while designing a questionnaire:
• Questions should be simple, short and easy to understand, and they should convey one
and only one idea. Technical terms should be avoided.
• Sensitive questions (questions of a personal or financial nature) should be avoided. Where
such information is needed, it should be obtained indirectly by offering a set of ranges,
and these questions must be placed at the end of the questionnaire.
Examples: age (0−25, 26−50, 51−75, > 75), salary (below 200, 200−500, 500−1000, >
1000).
• Leading questions should be completely avoided. If you ask a person “Do you not
smoke?”, the person will automatically answer that he or she does not.
• Answers to the questions should not require any calculation.
• Questions should be capable of objective answers.
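The idea of recording a sensitive quantity as a range rather than an exact value can be sketched as follows. The function `age_range` is a hypothetical helper that uses the age ranges given in the example above; the list of ages is made up for illustration.

```python
from collections import Counter

def age_range(age):
    """Map an exact age to one of the questionnaire ranges from the text."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age <= 25:
        return "0-25"
    if age <= 50:
        return "26-50"
    if age <= 75:
        return "51-75"
    return "> 75"

# Hypothetical exact ages; a respondent would only ever tick the matching range.
ages = [19, 25, 26, 43, 51, 76, 80]
tally = Counter(age_range(a) for a in ages)
print(tally)
```

Because the ranges do not overlap and together cover every possible age, each respondent fits exactly one box.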
2.2.2. Secondary Data
Secondary data should be used with utmost care. The investigator, before using these data,
must observe that they possess the following characteristics.
1. Reliability of Data: The data collected from another source should be reliable enough
to be used by the investigator. Determining and testing the reliability of secondary data
is the most important as well as the most difficult task. Reliability can be tested by answering
questions like:
• Who collected them?
• What were the sources of data?
• What methods were used to collect them?
• At what time were they collected?
2. Suitability of Data: Before secondary data are used, it must be checked whether
they can serve a purpose other than the one for which they were collected.
The suitability of data can be evaluated from the point of view of the nature and scope
of the investigation.
3. Adequacy of Data: Adequacy can be tested by evaluating the data in terms of area
coverage, level of accuracy, number of respondents who participated and so on.
Once the above points are observed in the secondary data, they are ready to be used for further
analysis.
2.3. Data Organization
It is almost impossible for management to deal with all the collected data in raw form,
as they are haphazard and unsystematic. In order to describe situations and make
inferences about the population, or even to describe the sample, the data must be organized in
some meaningful way.
2.3.1. Editing Data
Before further analysis, the collected data should be edited for completeness, consistency,
accuracy and homogeneity.
Completeness: If the answer to some questions is missing, it becomes necessary to contact

the person again and complete the missing information.
Consistency: Some information given by the respondent may be incompatible, in the
sense that one piece of information furnished by the individual either does not agree with
some other information or contradicts an earlier answer.
Accuracy: It is of vital importance. If the data are inaccurate, the conclusions drawn from
them have no relevance. If the investigator has made a false report or the respondent
has deliberately supplied wrong information, editing will be of no use. In recent
times, checks have been developed to attain accuracy, for example by sending supervisors to
check the work of investigators or by re-interviewing a few respondents after a certain gap
of time.
Homogeneity: To maintain homogeneity, the information sheets are checked to see whether
the unit of information or measurement is the same in all the questionnaires. If there are
differences, the values have to be converted to the same unit during editing.
2.3.2. Classification of Data
The next important step towards organizing data is classification. Classification is the sep-
aration of items according to similar characteristics and grouping them into various groups.
Data may be classified into four broad classes:
1. Geographical classification. This classification groups the data according to location

differences; places, areas or regions among the items. The geographical areas are usually
listed in alphabetical order for easy reference.
2. Chronological classification. This classification groups the data according to
the time period (weekly, monthly, quarterly, annually, ...) in which the items under
consideration occurred.
3. Qualitative classification. In this type of classification, the data is grouped together

according to some distinguished characteristic or attribute such as religion, sex, nation
and so on. This classification simply identifies whether a given attribute is present or
absent in a given population.
4. Quantitative classification. It refers to the classification of data according to some
characteristic that has been measured, such as weight, height,
income and so on.
2.3.3. Tabulation of Data
A table is a systematic arrangement of data in rows and columns, which is easy to understand
and makes data fit for further analysis and drawing conclusions. Tabulation should not be
confused with classification, as the two differ in many ways. Mainly the purpose of classifica-
tion is to divide the data into homogenous groups whereas the data are presented into rows
and columns in tabulation. Hence, classification is a preliminary step prior to tabulation.
A statistical table, in general, should have the following parts.
1. Table Number: Every table should be identified by a number. It facilitates easy

reference. Whenever you refer to the table in the text, you can give the number of the
table only.
2. Title: There should be a title at the top of every statistical table. The title should be
clear, concise and adequate. It should answer the questions: What are the data?
Where were the data collected? How are the data classified? What time period do the data cover?
3. Stub: It is a title given to each row.
4. Caption: The caption labels the data presented in a column of the table. There may
be sub-captions in each caption.
5. Body: The body of the table is the most important part. The information given in the
rows and columns forms the body of the table. It contains the quantitative information
to be presented.
6. Footnote: Any explanatory note concerning the table itself, placed directly beneath
the table, is called a ‘footnote’. The main purpose of a footnote is to clarify some of the
specific items given in the table or to explain any ambiguities or omissions in
the data shown in the table.
7. Source Note: If the data is collected from secondary sources, a source note is given
to disclose the sources from which the data is collected.
Example: Consider the following format.
Table 2.1: Title of your table.

             Caption 1    Caption 2    Caption 3
Stub 1       15           65           3.5
Stub 2       22           88           6.3
Stub 3       a∗           78           5.3
∗ No caption is found for stub 3.
Though the format of a table has already been discussed, some guidelines for preparing a
table are as follows:
The table should contain the required number of rows and columns with stubs and cap-
tions and the whole data should be accommodated within the cells formed corresponding
to these rows and columns.
If the quantity is zero, it should be entered as zero. Leaving blank space or putting
dash in place of zero is confusing and undesirable.
The unit of measurement should either be given in parentheses just below the column’s
caption or in parentheses along with the stub in the row.
If any figure in the table has to be singled out for a particular purpose, it should be marked
with an asterisk or another symbol. The meaning of the marked figure should be
explained beneath the table with the same mark.
2.3.4. Frequency Distributions
The most convenient way of organizing numerical data is to construct a frequency distribu-
tion. Frequency distribution is the organization of raw data in table form, using classes and
frequencies. Here the term ‘class’ is a description of a group of similar numbers in a data set
while ‘frequency’ is the number of times a variable value is repeated. Hence, ‘class frequency’
is the number of observations belonging to a certain class.
There are three types of frequency distributions: categorical, ungrouped and grouped fre-
quency distributions.
1. Categorical Frequency Distribution: used when the data are qualitative, i.e. either nominal or
ordinal. Each category of the variable represents a single class and the number of times
each category repeats represents the frequency of that class (category).
Example: The blood type of 22 students is given below. Construct categorical fre-
quency distribution.
A B B AB O A O O B AB B
A B B O A O AB A O O AB
Class (Blood type) Frequency (no of students)

A 5
B 6
AB 4
O 7
Total 22
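The tally can also be produced programmatically. The following is a minimal Python sketch, assuming the 22 blood types are typed in as a whitespace-separated string; the names `blood_types` and `frequency` are illustrative and not part of the example itself.

```python
from collections import Counter

# Blood types of the 22 students, in the order listed in the example.
blood_types = (
    "A B B AB O A O O B AB B "
    "A B B O A O AB A O O AB"
).split()

# Each distinct category is a class; its count is the class frequency.
frequency = Counter(blood_types)

for category in ["A", "B", "AB", "O"]:
    print(f"{category:<4}{frequency[category]}")
print(f"Total {sum(frequency.values())}")
```

The same approach works for an ungrouped frequency distribution, since each distinct numerical value is then treated as its own class.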
2. Ungrouped Frequency Distribution: A frequency distribution of numerical (quantitative)
data in which each value of a variable represents a single class (the values of
the variable are not grouped) and the number of times each value repeats represents
the frequency of that class.
Example: Number of children for 21 families is:
2 3 5 4 3 3 2 3 1 0 4 3 2 2 1 1 1 4 2 2 2
Construct ungrouped frequency distribution.
Class (no of children) Frequency (No of families)

0 1
1 4
2 7
3 5
4 3
5 1
Total 21
3. Grouped (Continuous) Frequency Distribution: A frequency distribution of numerical data in
which several values of a variable are grouped into one class. The number of observations
belonging to the class is the frequency of the class.
Example: Consider age group and number of persons:
Class Limits Class Boundaries Frequency

1-25 0.5-25.5 20
26-50 25.5-50.5 15
51-75 50.5-75.5 25
76-100 75.5-100.5 10
Total 70
Basic Terms
Class Limits: the lowest and highest values that can be included in a class are called
class limits. The lowest values are called lower class limits and the highest values are
called upper class limits. For example: Class limit for the first class is 1-25, where 1 is
the lower class limit and 25 is the upper class limit of the first class.
Class Boundaries: are the class limits adjusted so that there is no gap between the upper
class limit (UCL) of one class and the lower class limit (LCL) of the next class. The lowest
values are called lower class boundaries (LCB) and the highest values are called upper class
boundaries (UCB). The class boundaries of the first class are 0.5-25.5, where the lower class
boundary is 0.5 and the upper class boundary is 25.5. Note that the UCB of one class is the
LCB of the next class.
Class Width: the difference between the UCB and the LCB of a class. It is also the difference
between the lower limits of two consecutive classes, between the upper limits of two
consecutive classes, or between the class marks (CM) of two consecutive classes.
\[
w = UCB_i - LCB_i = LCL_i - LCL_{i-1} = UCL_i - UCL_{i-1} = CM_i - CM_{i-1}
\]
For the above example, w = 25.5 − 0.5 = 26 − 1 = 50 − 25 = 25.
Class Mark: is the halfway point between the class limits or the class boundaries.
\[
cm_i = \frac{LCL_i + UCL_i}{2} = \frac{LCB_i + UCB_i}{2}
\]
Relative Frequency
An absolute frequency distribution is a summary table in which the original data are
condensed into groups together with their frequencies. If a researcher would like to know
the proportion or percentage of cases in each group, instead of simply the number of cases,
s/he can do so by constructing a relative frequency distribution table. The relative
frequency distribution is formed by dividing the frequency of each class by the total number
of observations. It can be converted into a percentage frequency distribution by
multiplying each relative frequency by 100.
The relative frequencies are particularly helpful when comparing two or more frequency
distributions in which the number of cases under investigation are not equal. The
percentage distributions make such a comparison more meaningful, since percentages
are relative frequencies and hence the total number in the sample or population under
consideration becomes irrelevant.
Class Limits Class Boundaries Relative Frequency Percentage Frequency

1-25 0.5-25.5 20/70 = 0.2857 28.57
26-50 25.5-50.5 15/70 = 0.2143 21.43
51-75 50.5-75.5 25/70 = 0.3571 35.71
76-100 75.5-100.5 10/70 = 0.1429 14.29
Total 1 100
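As an illustration of the division step, here is a small Python sketch that derives relative and percentage frequencies from the class frequencies of the age-group example. The variable names are assumptions made for this sketch.

```python
# Class limits and frequencies of the age-group example (illustrative names).
class_limits = ["1-25", "26-50", "51-75", "76-100"]
frequencies = [20, 15, 25, 10]

total = sum(frequencies)
relative = [f / total for f in frequencies]   # proportions that sum to 1
percentage = [100 * r for r in relative]      # percentages that sum to 100

for limits, r, p in zip(class_limits, relative, percentage):
    print(f"{limits:<8}{r:.4f}  {p:.2f}")
```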
Cumulative Frequency
The frequency distributions above tell us the actual number (or percentage) of units in each
class, but they do not tell us directly the total number (or percentage) of units that lie
below or above specified values of the classes. This can be determined from a cumulative
frequency distribution, which displays the total number of observations below (or above) a
certain value. When the interest of the investigator focuses on the number of items below a
specified value, this value is taken as the upper boundary of a class, and the result is
known as a less than cumulative frequency distribution (LCF). Similarly, when the interest
lies in finding the number of cases above a specified value, this value is taken as the lower
boundary of a class, and the result is known as a more than cumulative frequency
distribution (MCF).
Class Limits Class Boundaries Frequency LCF MCF

1-25 0.5-25.5 20 20 20+15+25+10=70
26-50 25.5-50.5 15 20+15=35 15+25+10=50
51-75 50.5-75.5 25 20+15+25=60 25+10=35
76-100 75.5-100.5 10 20+15+25+10=70 10
Total 70
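The two cumulative columns are running totals taken in opposite directions. A minimal Python sketch, assuming the class frequencies are held in a list named `frequencies` (an illustrative name):

```python
from itertools import accumulate

# Class frequencies of the age-group example, first class to last class.
frequencies = [20, 15, 25, 10]

# Less than cumulative frequency: running total starting from the first class.
lcf = list(accumulate(frequencies))

# More than cumulative frequency: running total starting from the last class,
# then reversed so that it lines up with the original class order.
mcf = list(accumulate(reversed(frequencies)))[::-1]

print(lcf)
print(mcf)
```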
Steps for the Construction of Grouped Frequency Distribution
(a) Arrange the data in an array form (increasing or decreasing order).
(b) Find the unit of measurement (u). u is the smallest difference between any two
distinct values of the data.
(c) Find the Range(R). R is the difference between the largest and the smallest values
of the variable.
R = max − min
(d) Determine the number of classes (k) using Sturges' rule.
\[
k = 1 + 3.322 \log N
\]
where N is the total number of observations and log is the base-10 logarithm.
(e) Specify the class width (w).
\[
w = \frac{R}{k} = \frac{R}{1 + 3.322 \log N}
\]
(f) Put the smallest value of the data set as the LCL of the first class. To obtain
the LCL of the second class add the class width w to the LCL of the first class.
Continue adding until you get k classes.
Let x be the smallest observation.
\[
LCL_1 = x, \qquad LCL_i = LCL_{i-1} + w \quad \text{for } i = 2, 3, \ldots, k.
\]
Obtain the UCLs of the frequency distribution by adding w − u to the corresponding
LCLs.
\[
UCL_i = LCL_i + (w - u) \quad \text{for } i = 1, 2, \ldots, k.
\]
(g) Generate the class boundaries.
\[
LCB_i = LCL_i - \frac{u}{2} \quad \text{and} \quad UCB_i = UCL_i + \frac{u}{2} \quad \text{for } i = 1, 2, \ldots, k.
\]
Example: Mark of 50 students out of 40.
16 21 26 24 11 17 25 26 13 27 24 26 3 27 23 24 15 22 22 12 22 29 18 22 28
25 7 17 22 28 19 23 23 22 3 19 13 31 23 28 24 9 20 33 30 23 20 8 21 24
Construct grouped frequency distribution for the given data set.
Solution:
The array form of the data (increasing order).
3 3 7 8 9 11 12 13 13 15 16 17 17 18 19 19 20 20 21 21 22 22 22 22 22 22
23 23 23 23 23 24 24 24 24 24 25 25 26 26 26 27 27 28 28 28 29 30 31 33
u = 9 − 8 = 1, R = max − min = 33 − 3 = 30
k = 1 + 3.322 log N = 1 + 3.322 log 50 = 6.64 ≈ 7
w = R/k = 30/6.64 = 4.5 ≈ 5
w−u=5−1=4
Hence, the grouped frequency distribution for the scores of the 50 students is:
Class Limits Class Boundaries Class Mark Frequency

3-7 2.5-7.5 5 3
8-12 7.5-12.5 10 4
13-17 12.5-17.5 15 6
18-22 17.5-22.5 20 13
23-27 22.5-27.5 25 17
28-32 27.5-32.5 30 6
33-37 32.5-37.5 35 1
Total 50
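Steps (a) to (g) can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive procedure: the unit of measurement u is set to 1 by hand because the marks are whole numbers, and both k and w are rounded up with `math.ceil`, which is one common convention for the "≈" step. All variable names are hypothetical.

```python
import math

# Marks of the 50 students (out of 40), as listed in the example.
marks = [
    16, 21, 26, 24, 11, 17, 25, 26, 13, 27, 24, 26, 3, 27, 23, 24, 15, 22, 22, 12, 22, 29, 18, 22, 28,
    25, 7, 17, 22, 28, 19, 23, 23, 22, 3, 19, 13, 31, 23, 28, 24, 9, 20, 33, 30, 23, 20, 8, 21, 24,
]

u = 1                                          # unit of measurement (assumed)
data_range = max(marks) - min(marks)           # R = max - min
sturges = 1 + 3.322 * math.log10(len(marks))   # Sturges' rule before rounding
k = math.ceil(sturges)                         # number of classes, rounded up
w = math.ceil(data_range / sturges)            # class width, rounded up

rows = []
lcl = min(marks)                               # LCL of the first class
for _ in range(k):
    ucl = lcl + (w - u)                        # upper class limit
    frequency = sum(1 for x in marks if lcl <= x <= ucl)
    # (LCL, UCL, LCB, UCB, class mark, frequency)
    rows.append((lcl, ucl, lcl - u / 2, ucl + u / 2, (lcl + ucl) / 2, frequency))
    lcl += w                                   # LCL of the next class

for low, high, lcb, ucb, mark, frequency in rows:
    print(f"{low}-{high}  {lcb}-{ucb}  {mark}  {frequency}")
```

Choosing a different rounding rule or a different starting limit would give a different, equally valid table, which is the arbitrariness noted under the disadvantages of grouped frequency distributions.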
Advantages and disadvantages of grouped frequency distributions:
Advantages:
– It condenses a large mass of data into a comparatively small table.
– It attracts the attention of even a layman and gives him an insight into the nature
of the distribution.
– It helps for further statistical analysis, like central tendency, scatter, symmetry, of
the data.
Disadvantages:
– In the grouped frequency distributions, the identity of the observations is lost. We

know only the number of observations in a class and do not know what the values
are.
– Because the selection of the class width and the lower class limit of the first class are
to a certain extent arbitrary, different frequency distributions may be constructed
for the same data and hence may give contradictory impressions.
2.4. Methods of Data Presentation
This section covers methods for organizing and displaying data. Such methods provide sum-
mary information about a data set and may be used to conduct exploratory data analyses.
The methods for providing summary information are essential to the development of hypothe-
ses and to establishing the groundwork for more complex statistical analyses.
Though data presented in the form of a table give good information, tables are not
always suitable for every audience. Showing data in the form of a graph can make complex and
confusing information appear simpler and more straightforward.
Graphic Display of Data
Bar Chart
It is the simplest and most commonly used diagrammatic representation of a frequency
distribution, and the most common presentation for nominal, categorical or discrete data. It
uses a series of separated and equally spaced bars. The heights of the bars represent the
frequency or relative frequency of the classes. The width of the bars has no meaning;
however, all the bars should have the same width to avoid distortion, and they should be
separated by a constant distance.
B Simple Bar Chart: is a diagram in which categories of a variable are marked on the X
axis and the frequencies of the categories are marked on the Y axis. It is applicable for
discrete variables, that is, for data given according to some period, places and timings.
These periods and timings are represented on the base line (X axis) at regular interval
and the corresponding frequencies are represented on the Y-axis.
– The width of the bar represents nothing (it is meaningless), but it should be equal
for all bars.
– Each bar is separated by an equal space.
– It can also represent some magnitude (on the Y axis) over time, space, groups, etc
(on the X axis).
Example:
Construct simple bar chart for the following data.
Marital Status Number of Individuals

Single 10
Married 7
Divorced 3
Others 1
Total 21
B Component Bar Chart: is used when there is a desire to show how a total or aggregate is
divided into its component parts. The bars represent the total value of a variable, with each
total broken into its component parts, and different colors are used for identification.
In such diagrams, a bar is subdivided into parts in proportion to the size of each
component. These subdivided rectangles are shaded differently by lines, dots and colors
so that the components are easy to compare. Sometimes the volumes of
different attributes may be greatly different.
For making meaningful comparisons, the components of the attributes are reduced to
percentages. In that case each attribute will have 100 as its maximum volume. This
sort of component bar chart is known as percentage bar chart.
Example:
Consider the following table and the corresponding chart.
Marital Male Female Total

Single 90 10 100
Married 30 40 70
Others 5 25 30
B Multiple Bar Chart: is used to display data on more than one variable. In a multiple
bar diagram, two or more sets of inter-related data are represented.
Example: Consider the following table, which shows the export of some items for a given
country, and the corresponding chart.
Year Coffee Butter Sugar

1997 120 127 75
1998 25 98 87
1999 100 120 75
2000 198 98 60
Pie Chart
A pie chart is popularly used in practice to show the percentage breakdown of data. It is a
circle representing the total, divided into sectors (slices) whose sizes are proportional to
the number of items in the categories that make up the total.
Example: Construct pie chart for the following data.
Marital Status   Number of individuals   Percentage            Degree

Single           10                      10×100/21 = 47.62     47.62×360/100 = 171.43
Married          7                       7×100/21 = 33.33      33.33×360/100 = 119.99
Divorced         3                       3×100/21 = 14.29      14.29×360/100 = 51.44
Others           1                       1×100/21 = 4.76       4.76×360/100 = 17.14
Total            21                      100                   360
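The percentage and degree columns can be computed directly from the counts. The sketch below is an illustration with hypothetical variable names; it works from the unrounded proportion, so an angle may differ in the second decimal place from a hand calculation that first rounds the percentage to two decimals.

```python
# Number of individuals in each marital-status category.
counts = {"Single": 10, "Married": 7, "Divorced": 3, "Others": 1}
total = sum(counts.values())

sectors = {}
for status, count in counts.items():
    percentage = 100 * count / total   # share of the total, in percent
    angle = 360 * count / total        # central angle of the sector, in degrees
    sectors[status] = (percentage, angle)
    print(f"{status:<10}{count:>3}  {percentage:6.2f}  {angle:7.2f}")
```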
Histogram
Histogram is the most common graphical presentation of a frequency distribution for numer-
ical data. It uses a series of adjacent bars in which the width of each bar represents the class
width and the heights represent the frequency or relative frequency of the class. It is used for
grouped data in which the class boundaries are marked on the X axis and the frequencies are
marked along the Y axis.
Example:
In the following, the heights of 45 female students at Haramaya University are recorded to the
nearest inch. Construct a histogram by hand first. Check your result by using any statistical
package.
67 67 64 64 74 61 68 71 69 61 65 64 62 63 59
70 66 66 63 59 64 67 70 65 66 66 56 65 67 69
64 67 68 67 67 65 74 64 62 68 65 65 65 66 67
Frequency Polygon
It is a graph that consists of line segments connecting the intersection of the class marks
and the frequencies of a continuous frequency distribution. It can also be constructed from
histogram by joining the mid-points of each bar.
Cumulative Frequency Curves (Ogive)
As there are two cumulative frequency distributions, there are two ogive (pronounced
“oh-jive”) curves. The less than ogive is a line graph joining the points whose coordinates
are the upper class boundaries and their corresponding less than cumulative frequencies.
The more than ogive is a line graph joining the points whose coordinates are the lower class
boundaries and their corresponding more than cumulative frequencies.
Example: Consider the following ogive curves for the marks of 50 students.
2.5. Exercises
1. A car salesman takes inventory and finds that he has a total of 125 cars to sell. Of
these, 97 are the 2001 model, 11 are the 2000 model, 12 are the 1999 model, and 5 are
the 1998 model. Which two types of charts are most appropriate to display the data?
Construct one of the plots.
2. Define the following graphical methods and describe how they are used.
a) Bar chart
b) Histogram
c) Relative frequency histogram
d) Frequency polygon
e) Ogive
3. The following are the ages of 30 patients in the emergency room of a hospital on a
Friday night. Construct a histogram display from these data.
35 32 21 43 39 60
36 12 54 45 37 53
45 23 64 10 34 22
36 45 55 44 55 46
22 38 35 56 45 57
4. The final grades in Basic Statistics of 80 students at Haramaya University are recorded
in the accompanying table.
68 84 75 82 68 90 62 88 76 93
73 79 88 73 60 93 71 59 85 75
61 65 75 87 74 62 95 78 63 72
66 78 82 75 94 77 69 74 68 60
96 78 89 61 75 95 60 79 83 71
79 62 67 97 78 85 76 65 71 75
65 80 73 57 88 78 62 76 53 74
86 67 73 81 72 63 76 75 85 77
Use these data to prepare:
(a) a frequency distribution.
(b) a relative frequency distribution.
(c) a cumulative frequency distribution.
(d) a histogram.
(e) a frequency polygon.
3 Measures of Central Tendency
3.1. Introduction
Usually the collected data are not in a form suitable for drawing conclusions about the mass
from which they have been taken. Even though the data are somewhat summarized once they are
organized into frequency distributions and presented using graphs and diagrams, we still
cannot easily make inferences about the data since there are many groups. Hence, organizing
data into a frequency distribution is not sufficient and further condensation is needed. In
particular, when we want to compare two or more distributions, we may reduce each entire
distribution to one number that represents it. A single value which can be considered as
typical or representative of a set of observations, and around which the observations can be
considered as centered, is called an average (or average value or center of location). Since
such typical values tend to lie centrally within a set of observations arranged according to
magnitude, averages are called measures of central tendency (MCT).
3.2. Objectives of MCT
To condense a mass of data into one single value, that is, to get a single value which
best represents the data (describes the characteristics of the entire data).
Measures of central tendency, by condensing masses of data into one single value, enable us
to get an idea of the entire data. Thus one value can represent thousands of observations or
even more.
To facilitate comparison. Statistical devices like averages, percentages and ratios are used
for this purpose. Measures of central tendency, by condensing masses of data into one single
value, facilitate comparison. For instance, to compare two classes A and B, instead
of comparing each student's result, which is practically infeasible, we can compare the
average marks of the two classes.
3.3. Desirable Properties of Good MCT
A measure of central tendency is good or satisfactory if it possesses the following
characteristics.
1. It should be calculated based on all observations.
2. It should not be affected by extreme values.
3. It should be defined rigidly which means it should have a definite value.
4. It should always exist.
5. It should be easy to understand and calculate. It should not require complicated
and tedious calculations, though the advent of electronic calculators and computers has
made such calculations feasible.
6. It should be capable of further algebraic treatment. By algebraic treatment, we mean

that the measures should be used further in the formulation of other formulae or it
should be used for further statistical analysis.
3.4. Summation Notation
Suppose we have a variable x with successive values $x_1, x_2, \ldots, x_n$. The sum of these
values, $x_1 + x_2 + \cdots + x_n$, can be written using the Greek letter $\sum$ (capital sigma) as
\[
x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i
\]
With this notation we can write
. $x_1^2 + x_2^2 + \cdots + x_n^2 = \sum_{i=1}^{n} x_i^2$
. $x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = \sum_{i=1}^{n} x_i y_i$
. $(x_1 + x_2 + \cdots + x_n)^2 = \left( \sum_{i=1}^{n} x_i \right)^2$
. $\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \frac{1}{x_4} = \sum_{i=1}^{4} \frac{1}{x_i}$
Rules of Summation
1. For two variables x and y we have
\[
\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i
\]
2. If k is a constant number, we have
\[
\sum_{i=1}^{n} k x_i = k \sum_{i=1}^{n} x_i
\]
3. For a constant number k, we have
\[
\sum_{i=1}^{n} k = nk
\]
4. Combining the rules above,
\[
\sum_{i=1}^{n} (x_i - k)^2 = \sum_{i=1}^{n} x_i^2 - 2k \sum_{i=1}^{n} x_i + nk^2
\]
From now onwards we will use $\sum x_i$ in place of $\sum_{i=1}^{n} x_i$ just for simplicity.
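The four rules can be checked numerically on any small data set. The Python sketch below uses made-up values for x, y and k (they are assumptions for illustration only) and compares the two sides of each rule.

```python
import math

# Arbitrary illustrative data; any numbers of equal length would do.
x = [2.0, 5.0, 7.0, 10.0]
y = [1.0, 3.0, 4.0, 6.0]
k = 3.0
n = len(x)

# Rule 1: the sum of (x_i + y_i) equals the sum of x_i plus the sum of y_i.
rule1 = math.isclose(sum(a + b for a, b in zip(x, y)), sum(x) + sum(y))
# Rule 2: a constant factor can be taken outside the summation.
rule2 = math.isclose(sum(k * a for a in x), k * sum(x))
# Rule 3: summing a constant n times gives n * k.
rule3 = math.isclose(sum(k for _ in range(n)), n * k)
# Rule 4: expansion of the sum of squared deviations from k.
rule4 = math.isclose(
    sum((a - k) ** 2 for a in x),
    sum(a ** 2 for a in x) - 2 * k * sum(x) + n * k ** 2,
)

print(rule1, rule2, rule3, rule4)
```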
3.5. Types of Measures of Central Tendency
There are many types of measures of central tendency, each possessing particular properties
and each being typical in some unique way. The most frequently encountered ones are
. Mean (computed average)
– Arithmetic mean (simple arithmetic mean, weighted arithmetic mean and com-
bined mean)
– Geometric mean
– Harmonic mean
. Mode (the most frequent value)
. Positional averages
– Median
– Quantiles (quartiles, deciles and percentiles)
3.6. Mean
3.6.1. Arithmetic Mean (AM)
Simple Arithmetic Mean
1. Suppose a variable x has observed values $x_1, x_2, \ldots, x_n$. The simple arithmetic mean,
denoted by $\bar{x}$ (for a sample) and $\mu$ (for a population), is the sum of these
observations divided by the total number of observations. Symbolically,
\[
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}
\]
\[
\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}
\]
Simple AM is the most commonly used average.
2. Suppose the values $x_1, x_2, \ldots, x_n$ are accompanied by frequencies $f_1, f_2, \ldots, f_n$
respectively, then the simple AM is given by
\[
\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum f_i x_i}{\sum f_i}
\]
3. For data in a grouped frequency distribution we use the class mark instead of each
observed value, and the simple AM is given by
\[
\bar{x} = \frac{f_1 m_1 + f_2 m_2 + \cdots + f_n m_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum f_i m_i}{\sum f_i}
\]
where $m_i$ is the class mark of the ith class.
Example 1: The heights of 7 students selected from a class are given below in centimeter.
165, 160, 172, 168, 159, 170, 173. Calculate the simple AM of heights.
\[
\bar{x} = \frac{\sum_{i=1}^{7} x_i}{7} = \frac{x_1 + x_2 + \cdots + x_7}{7} = \frac{1167}{7} \approx 166.71 \text{ cm}
\]
Example 2: The following is the frequency distribution of marks in Stat 1011 of 46 students
(out of 20). Find the mean mark of this class.
Mark (x_i)              9   10   11   12   13   14   15   16   17   18   Total
No of students (f_i)    1    2    3    6   10   11    7    3    2    1      46
f_i x_i                 9   20   33   72  130  154  105   48   34   18     623
\[
\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum f_i x_i}{\sum f_i} = \frac{623}{46} = 13.54
\]
Example 3: Calculate the mean amount of yield of maize from the grouped frequency
distribution given below.
Yield (in kg) No of plots (fi ) Class mark (mi ) fi mi

171-179 3 175 525
180-188 7 184 1288
189-197 12 193 2316
198-206 9 202 1818
207-215 4 211 844
216-224 4 220 880
225-233 1 229 229
Total 40 7900
\[
\bar{x} = \frac{f_1 m_1 + f_2 m_2 + \cdots + f_n m_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum f_i m_i}{\sum f_i} = \frac{7900}{40} = 197.5 \text{ kg per plot}
\]
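The ungrouped and grouped formulas share the same structure, so one helper covers both. The following Python sketch is illustrative: `frequency_mean` is a hypothetical helper name, and the grouped case builds the class marks from the class limits before averaging.

```python
def frequency_mean(values, frequencies):
    """Mean of values weighted by their frequencies: sum(f * x) / sum(f)."""
    return sum(f * v for v, f in zip(values, frequencies)) / sum(frequencies)

# Ungrouped data: marks (out of 20) and the number of students with each mark.
marks = [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
students = [1, 2, 3, 6, 10, 11, 7, 3, 2, 1]
mean_mark = frequency_mean(marks, students)

# Grouped data: class limits of maize yield (kg) and the number of plots.
class_limits = [(171, 179), (180, 188), (189, 197), (198, 206),
                (207, 215), (216, 224), (225, 233)]
plots = [3, 7, 12, 9, 4, 4, 1]
class_marks = [(low + high) / 2 for low, high in class_limits]
mean_yield = frequency_mean(class_marks, plots)

print(round(mean_mark, 2), mean_yield)
```

Because the grouped mean replaces every observation by its class mark, it is an approximation to the mean of the raw data.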
Weighted Arithmetic Mean
It is an arithmetic mean used when the observations do not all have equal relative importance
(the relative importance is technically termed weight). Suppose $x_1, x_2, \ldots, x_n$ have
weights $w_1, w_2, \ldots, w_n$ respectively, then the weighted arithmetic mean ($\bar{x}_w$) is given by
\[
\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i}
\]
Example: Semester grade point average (GPA) of a student is a good example of weighted
arithmetic mean.
Course Weights (Credit hours) Grade (x)
Stat 281 4 B=3
Math 261 4 B=3
Math 224 3 C=2
Phil 201 3 B=3
Comp 201 3 C=2
Calculate the GPA of this student.
\[
GPA = \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5}{w_1 + w_2 + w_3 + w_4 + w_5} = \frac{\sum w_i x_i}{\sum w_i} = \frac{45}{17} \approx 2.65
\]
Combined Mean
If there are k different groups (having the same unit of measurement) with means
$\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ and numbers of observations $n_1, n_2, \ldots, n_k$
respectively, then the mean of all the groups, i.e. the combined mean, is given by
\[
\bar{\bar{x}} = \bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum n_i \bar{x}_i}{\sum n_i}
\]
Example: There are 49 students in a certain department. Among these, 7 are seniors with an
average weight of 165 lbs, 9 are juniors with an average weight of 160 lbs, 13 are sophomores
with an average weight of 152 lbs and 20 are freshmen with an average weight of 150 lbs. Find
the average weight of students in the department.
\[
\bar{\bar{x}} = \frac{n_s \bar{x}_s + n_j \bar{x}_j + n_{so} \bar{x}_{so} + n_f \bar{x}_f}{n_s + n_j + n_{so} + n_f}
= \frac{7 \times 165 + 9 \times 160 + 13 \times 152 + 20 \times 150}{7 + 9 + 13 + 20}
= \frac{7571}{49} \approx 154.51 \text{ lbs}
\]
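The combined mean is a weighted mean of the group means with the group sizes as weights. A small Python sketch of the department example, using the group sizes and mean weights given in the problem statement (the dictionary names are illustrative):

```python
# Group sizes and group mean weights (lbs) from the problem statement.
group_sizes = {"seniors": 7, "juniors": 9, "sophomores": 13, "freshmen": 20}
group_means = {"seniors": 165, "juniors": 160, "sophomores": 152, "freshmen": 150}

# Numerator: total weight of all students, recovered as n_i * mean_i per group.
total_weight = sum(group_sizes[g] * group_means[g] for g in group_sizes)
# Denominator: total number of students.
total_students = sum(group_sizes.values())

combined_mean = total_weight / total_students
print(total_weight, total_students, round(combined_mean, 2))
```

A quick sanity check on any combined mean is that it must lie between the smallest and the largest group mean, here between 150 and 165 lbs.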
Properties of Arithmetic Mean
B If a constant k is added or subtracted from each value in a distribution, then the new
mean will be
x̄new = x̄old ± k
B If each value of a distribution is multiplied by a constant k, the new mean will be the
original mean multiplied by k. That is,
x̄new = kx̄old
B The arithmetic mean can be calculated for any set of quantitative data, and it is
unique. However, it cannot be calculated for an open-ended grouped frequency distribution.
B It is highly affected by extreme values.
B It lends itself for further statistical analysis. For example, as combined mean.
B The algebraic sum of the deviations of each value from the arithmetic mean is zero.
That is,
\[
\sum (x_i - \bar{x}) = 0
\]
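Three of these properties are easy to confirm numerically. The sketch below reuses the seven heights from Example 1 of the simple arithmetic mean and an arbitrary constant k = 10, which is an assumption made only for illustration.

```python
import math
from statistics import mean

x = [165, 160, 172, 168, 159, 170, 173]   # heights in cm
k = 10                                    # arbitrary constant (assumed)
x_bar = mean(x)

# Adding k to every value shifts the mean by k.
added = mean([v + k for v in x])
# Multiplying every value by k multiplies the mean by k.
scaled = mean([k * v for v in x])
# Deviations from the mean cancel out, up to floating-point rounding.
deviation_sum = sum(v - x_bar for v in x)

print(math.isclose(added, x_bar + k),
      math.isclose(scaled, k * x_bar),
      abs(deviation_sum) < 1e-9)
```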
Example 1: The mean age of a group of 100 students was found to be 32.02 years. Later it
was discovered that age of 57 was misread as 27. Find the correct mean.
Solution:
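Let $\bar{x}_{cor}$ and $\bar{x}_{wr}$ be the correct and wrong means respectively. From the given
problem, $\bar{x}_{wr} = 32.02$, $n = 100$, $x_{wr} = 27$ and $x_{cor} = 57$.
\[
\begin{aligned}
\bar{x}_{wr} &= \frac{\left(\sum x_i\right)_{wr}}{n} \\
\left(\sum x_i\right)_{wr} &= \bar{x}_{wr} \times n = 32.02 \times 100 = 3202 \\
\left(\sum x_i\right)_{cor} &= \left(\sum x_i\right)_{wr} + x_{cor} - x_{wr} = 3202 + 57 - 27 = 3232 \\
\bar{x}_{cor} &= \frac{\left(\sum x_i\right)_{cor}}{n} = \frac{3232}{100} = 32.32 \text{ years}
\end{aligned}
\]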
Example 2: The mean weight of 150 students in certain class is 60 kg. The mean weight of
boys in the class is 70 kg and that of the girls is 55 kg. Find the number of boys and girls in
the class.
Solution:
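Let $n_b$ and $n_g$ be the numbers of boys and girls in the class respectively. Further, let
$\bar{\bar{x}} = 60$ kg, $\bar{x}_b = 70$ kg and $\bar{x}_g = 55$ kg be the mean weights of the
whole class, the boys and the girls respectively.
\[
n_b + n_g = 150 \tag{3.1}
\]
Using the combined mean formula,
\[
\bar{\bar{x}} = \frac{n_b \bar{x}_b + n_g \bar{x}_g}{n_b + n_g} \quad \Rightarrow \quad 60 = \frac{70 n_b + 55 n_g}{n_b + n_g}
\]
so that $60 n_b + 60 n_g = 70 n_b + 55 n_g$, that is $5 n_g = 10 n_b$, and
\[
n_g = 2 n_b \tag{3.2}
\]
Inserting equation (3.2) in equation (3.1) we obtain $n_b = 50$ and $n_g = 100$.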
3.6.2. Geometric Mean (GM)
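The geometric mean of n positive numbers is the nth root of their product. The geometric
mean of $x_1, x_2, \ldots, x_n$ is given by the following for raw data, ungrouped frequency
distributions and grouped frequency distributions respectively. In the last two cases the
root is of order $\sum f_i$, the total number of observations.
\[
GM = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} = \sqrt[n]{\prod_{i=1}^{n} x_i}
\]
\[
GM = \left( x_1^{f_1} \times x_2^{f_2} \times \cdots \times x_n^{f_n} \right)^{1/\sum f_i} = \left( \prod_{i=1}^{n} x_i^{f_i} \right)^{1/\sum f_i}
\]
\[
GM = \left( m_1^{f_1} \times m_2^{f_2} \times \cdots \times m_n^{f_n} \right)^{1/\sum f_i} = \left( \prod_{i=1}^{n} m_i^{f_i} \right)^{1/\sum f_i}
\]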
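We can also use logarithms to calculate GM.
\[
\begin{aligned}
GM &= \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} = (x_1 \times x_2 \times \cdots \times x_n)^{1/n} \\
\log GM &= \frac{1}{n} \log(x_1 \times x_2 \times \cdots \times x_n) \\
\log GM &= \frac{1}{n} (\log x_1 + \log x_2 + \cdots + \log x_n)
\end{aligned}
\]
Taking the antilog of both sides we get
\[
GM = \text{antilog}\left\{ \frac{1}{n} (\log x_1 + \log x_2 + \cdots + \log x_n) \right\} = \text{antilog}\left( \frac{1}{n} \sum \log x_i \right)
\]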
If the variable values are measured as ratios, proportions or percentages, and some values are
much larger in magnitude than others, then the geometric mean is a better representative
of the data than the simple arithmetic mean. In a “geometric series”, the most meaningful
average is the geometric mean, since the arithmetic mean is pulled strongly toward the large
numbers in the series. The main disadvantage of the geometric mean is that it cannot be
calculated if one or more observations are zero or negative. It is also affected by extreme
values, but not to the extent of the AM.
Examples
1. A given epidemic was spreading at the rate of 1.5 and 2.67 in two successive days. What
is its average spread rate?
Solution:
\[
GM = \sqrt{x_1 \times x_2} = \sqrt{1.5 \times 2.67} = \sqrt{4.005} = 2.001
\]
2. The price of a commodity increased by 5% from 1989 to 1990, 8% from 1990 to 1991
and by 77% from 1991 to 1992. Find the average price increase.
Solution:
For increment, take the base line value as 100% and then add the % increase so as to
get the values in successive years.
Year         % increase   Value (x_i)   log x_i

1989-1990    5            105           2.02
1990-1991    8            108           2.03
1991-1992    77           177           2.25
Total                                   ∑ log x_i = 6.30
Then,
\[
GM = \text{antilog}\left( \frac{1}{n} \sum \log x_i \right) = \text{antilog}\left( \frac{1}{3} \times 6.30 \right) = \text{antilog}(2.1) = 125.89
\]
Therefore, the average price increase is 25.89% per year.
3. A machine depreciated by 10% in each of the first two years and by 40% in the third year.
Find the average rate of depreciation.
Solution:
As in the previous example, take the baseline value of the machine as 100% and then deduct
the % depreciation to get the value retained in each successive year.
Year   % depreciation   Value (x_i)   log x_i

1      10               90            1.9542
2      10               90            1.9542
3      40               60            1.7782
Total                                 ∑ log x_i = 5.6866
Then,
\[
GM = \text{antilog}\left( \frac{1}{n} \sum \log x_i \right) = \text{antilog}\left( \frac{1}{3} \times 5.6866 \right) = \text{antilog}(1.8955) \approx 78.6
\]
Therefore, the machine depreciated by about 100 − 78.6 = 21.4% per year on average.
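The logarithm route to the geometric mean can be sketched in Python as below. `geometric_mean` is a hypothetical helper written for this illustration (the standard library also offers `statistics.geometric_mean`); it uses full-precision base-10 logarithms, so its results can differ slightly from hand calculations that round each logarithm to two decimals.

```python
import math

def geometric_mean(values):
    """Geometric mean via logarithms: antilog of the mean of log10(x_i)."""
    if any(v <= 0 for v in values):
        # The GM is undefined when an observation is zero or negative.
        raise ValueError("geometric mean needs strictly positive values")
    return 10 ** (sum(math.log10(v) for v in values) / len(values))

spread_rate = geometric_mean([1.5, 2.67])       # epidemic spread rates
price_factor = geometric_mean([105, 108, 177])  # yearly price relatives (%)
retained = geometric_mean([90, 90, 60])         # value retained each year (%)

print(round(spread_rate, 3))
print(round(price_factor - 100, 2))   # average % increase per year
print(round(100 - retained, 2))       # average % depreciation per year
```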
3.6.3. Harmonic Mean (HM)
Harmonic mean is another specialized average, useful in averaging variables expressed
as a rate per unit of time, such as speed or number of units produced per day. The simple
harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the numbers.
\[
HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x_i}}
\]
The simple HM is preferably used to calculate average speed for fixed distance, average
price for fixed total cost, average time for fixed total distance.
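For ungrouped frequency distribution,
\[ HM = \frac{f_1 + f_2 + \dots + f_n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_n}{x_n}} = \frac{\sum f_i}{\sum \frac{f_i}{x_i}} \]
For grouped frequency distribution, with $m_i$ the class mark (midpoint) of the $i$th class,
\[ HM = \frac{f_1 + f_2 + \dots + f_n}{\frac{f_1}{m_1} + \frac{f_2}{m_2} + \dots + \frac{f_n}{m_n}} = \frac{\sum f_i}{\sum \frac{f_i}{m_i}} \]
The weighted HM of n non-zero observations $x_1, x_2, \dots, x_n$ having weights
$w_1, w_2, \dots, w_n$ respectively is given by
\[ HM_w = \frac{w_1 + w_2 + \dots + w_n}{\frac{w_1}{x_1} + \frac{w_2}{x_2} + \dots + \frac{w_n}{x_n}} = \frac{\sum w_i}{\sum \frac{w_i}{x_i}} \]
The weighted HM is used to compute mean speed to cover differing distances, mean
prices when the total cost is not fixed, etc.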
Examples
1. A driver travels for 3 days at speeds of 48 km/hr for about 10 hrs, 40 km/hr for 12 hrs and
32 km/hr for 15 hrs, respectively. What is the average speed of the driver over the 3 days?
Solution:
Using $d_i = s_i \times t_i$; i = 1, 2, 3, the distance covered on each of the three days is the
same, 480 km. So the simple HM is appropriate to compute the average speed.
\[ HM = \frac{3}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3}} = \frac{3}{\frac{1}{48} + \frac{1}{40} + \frac{1}{32}} = \frac{3}{0.0771} = 38.91 \text{ km/hr} \]
2. A driver travelled for 3 days. On the first day he drove for 10 hrs at a speed of 48 km/hr,
on the second day for 12 hrs at 45 km/hr, and on the third day for 15 hrs at 40 km/hr. What
is the average speed?
Solution:
Using $d_i = s_i \times t_i$; i = 1, 2, 3, the distance covered each day is not fixed: it is
480 km, 540 km and 600 km respectively. So the weighted HM, with the distances as
weights, is appropriate to compute the average speed.
\[ HM_w = \frac{w_1 + w_2 + w_3}{\frac{w_1}{x_1} + \frac{w_2}{x_2} + \frac{w_3}{x_3}} = \frac{480 + 540 + 600}{\frac{480}{48} + \frac{540}{45} + \frac{600}{40}} = \frac{1620}{37} = 43.78 \text{ km/hr} \]
This equals the total distance (1620 km) divided by the total time (37 hrs).
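As a rough check on the two speed examples, the simple and weighted harmonic means can be sketched as follows. The function name `harmonic_mean` is hypothetical, and the weights are assumed here to be the distances covered, so that the result agrees with total distance divided by total time.

```python
def harmonic_mean(values, weights=None):
    """Simple HM when weights is None, otherwise the weighted HM."""
    if any(v == 0 for v in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    if weights is None:
        weights = [1] * len(values)
    return sum(weights) / sum(w / v for w, v in zip(weights, values))

# Example 1: equal distances each day, so the simple HM applies
speeds_1, hours_1 = [48, 40, 32], [10, 12, 15]
hm_simple = harmonic_mean(speeds_1)

# Example 2: unequal distances, so weight each speed by its distance (assumption)
speeds_2, hours_2 = [48, 45, 40], [10, 12, 15]
distances_2 = [s * t for s, t in zip(speeds_2, hours_2)]
hm_weighted = harmonic_mean(speeds_2, distances_2)

# Cross-check against total distance / total time
overall_2 = sum(distances_2) / sum(hours_2)
print(round(hm_simple, 2), round(hm_weighted, 2), round(overall_2, 2))
```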
Some Relationships among AM, GM and HM
• The GM of two positive numbers $x_1$ and $x_2$ is equal to the GM of their AM and HM. That is,
\[ GM = \sqrt{x_1 \times x_2} = \sqrt{AM \times HM} \]
• For n positive numbers, $HM \le GM \le AM$.
3.7. Mode
The mode (modal value) of data set is the value that occurs most frequently. When two values
occur with the same greatest frequency, each one is a mode and the data set is bimodal. When
more than two values occur with the greatest frequency, each is a mode and the data set is
said to be multimodal. When no value is repeated or values are equally repeated, we say that
there is no mode.
Example 1: Find the modes of the following data sets.
• 5 5 5 3 1 5 1 4 3 5
• 1 2 2 2 3 4 5 6 6 6 7 9
• 1 2 3 6 7 8 9 10
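The definition above can be sketched in Python as follows. The helper name `modes` is an assumption made for illustration, the first two data sets are read as single-digit values, and the function returns an empty list when every distinct value occurs equally often (the "no mode" case).

```python
from collections import Counter

def modes(data):
    """Return the list of modes of a non-empty data set, or [] if there is no mode."""
    counts = Counter(data)
    top = max(counts.values())
    # No mode: several distinct values, all equally frequent
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

set_1 = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5]
set_2 = [1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]
set_3 = [1, 2, 3, 6, 7, 8, 9, 10]

for data in (set_1, set_2, set_3):
    print(modes(data))
```

A result with one element indicates a unimodal set, two elements a bimodal set, and more than two a multimodal set.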
In a frequency distribution, the mode is located in the class with the highest frequency, and
that class is the modal class. The formula for the mode is then
\[ \hat{x} = L_{\hat{x}} + \left[\frac{f_{\hat{x}} - f_{\hat{x}-1}}{(f_{\hat{x}} - f_{\hat{x}-1}) + (f_{\hat{x}} - f_{\hat{x}+1})}\right] w \]
where
$L_{\hat{x}}$ is the lower class boundary of the modal class,
$f_{\hat{x}}$ is the frequency of the modal class,
$f_{\hat{x}-1}$ is the frequency of the class which precedes the modal class,
$f_{\hat{x}+1}$ is the frequency of the class which follows the modal class and
$w$ is the class width of the modal class.
Example: Use the frequency distribution of heights in the following table to find the mode
of height of the 100 male students at XYZ university and interpret the result.
Height (in)    Frequency (f_i)
59.5-62.5            5
62.5-65.5           18
65.5-68.5           42
68.5-71.5           27
71.5-74.5            8
Solution:
The class having the highest frequency is the modal class. Thus the 3rd class
(65.5-68.5) is the modal class.
\[ \hat{x} = L_{\hat{x}} + \left[\frac{f_{\hat{x}} - f_{\hat{x}-1}}{(f_{\hat{x}} - f_{\hat{x}-1}) + (f_{\hat{x}} - f_{\hat{x}+1})}\right] w \]
\[ = 65.5 + \left[\frac{42 - 18}{(42 - 18) + (42 - 27)}\right] \times 3 = 65.5 + \frac{24}{39} \times 3 = 65.5 + 1.846 = 67.346 \]
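The grouped-data calculation above can be reproduced with a short sketch. The name `grouped_mode` is hypothetical, and treating a missing neighbouring class (when the modal class is the first or last one) as having frequency 0 is an assumption not stated in the text.

```python
def grouped_mode(boundaries, freqs):
    """Mode of a grouped frequency distribution.

    boundaries: list of (lower, upper) class boundaries
    freqs: list of class frequencies, in the same order
    """
    k = max(range(len(freqs)), key=lambda i: freqs[i])   # index of the modal class
    f_mod = freqs[k]
    f_prev = freqs[k - 1] if k > 0 else 0                 # assumption: 0 if no class before
    f_next = freqs[k + 1] if k < len(freqs) - 1 else 0    # assumption: 0 if no class after
    lower, upper = boundaries[k]
    width = upper - lower
    return lower + (f_mod - f_prev) / ((f_mod - f_prev) + (f_mod - f_next)) * width

height_classes = [(59.5, 62.5), (62.5, 65.5), (65.5, 68.5), (68.5, 71.5), (71.5, 74.5)]
height_freqs = [5, 18, 42, 27, 8]
print(round(grouped_mode(height_classes, height_freqs), 3))
```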
Mode is not affected by extreme values and can be calculated for open-ended classes. But it
often does not exist and its value may not be unique. In such cases the mode is ill-defined.
Properties of Mode
1. It is simple to calculate and easy to determine.
2. It is not based on all observations.
3. The mode can be used for both qualitative (such as religious preference, gender, political
affiliation, etc.) and quantitative data types.
3.8. Median
A median is a value which divides a set of data into two equal parts such that the number of
observations below it is the same as the number of observations above it. It is the middle
value when the values are arranged in order of increasing (or decreasing) magnitude. To
find the median, first sort the values (arrange them in order), then use one of the following
procedures.
1. If the number of values is odd, the median is the number that is located in the exact
middle of the list.
\[ \tilde{x} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value} \]
Example: What is the median of 180, 201, 220, 191, 219, 209 and 220?
Solution:
First sort the data: 180, 191, 201, 209, 219, 220, 220. Since n = 7 is odd,
\[ \tilde{x} = \left(\frac{7+1}{2}\right)^{\text{th}} \text{ value} = 4^{\text{th}} \text{ value} = 209 \]
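2. If the number of values is even, the median is found by computing the mean of the two
middle numbers.
\[ \tilde{x} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ value} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ value}}{2} \]
Example: What is the median of 62, 63, 64, 65, 66, 66, 68 and 78?
Solution:
First sort the data: 62, 63, 64, 65, 66, 66, 68, 78. Since n = 8 is even,
\[ \tilde{x} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ value} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ value}}{2} = \frac{4^{\text{th}} \text{ value} + 5^{\text{th}} \text{ value}}{2} = \frac{65 + 66}{2} = 65.5 \]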
3. For grouped frequency distributions, the median is given by the formula
\[ \tilde{x} = L_{\tilde{x}} + \left(\frac{\frac{n}{2} - F_{\tilde{x}-1}}{f_{\tilde{x}}}\right) w \]
where
$L_{\tilde{x}}$ is the lower class boundary of the median class,
$F_{\tilde{x}-1}$ is the less than cumulative frequency (LCF) just before the median class,
$w$ is the class width of the median class,
$f_{\tilde{x}}$ is the frequency of the median class and $n = \sum f_i$.
The median class is the class which includes the $\left(\frac{n}{2}\right)^{\text{th}}$ value.
Example: The following table shows a frequency distribution of grades on a final examination
in college algebra for 120 students. Then, obtain median and interpret the results.
Grade No of students
30-39 1
40-49 3
50-59 11
60-69 21
70-79 43
80-89 32
90-99 9
Solution:
First we construct the class boundaries and the less than cumulative frequencies (LCF).
Class limits   Class boundaries   Frequency   LCF
30-39            29.5-39.5            1         1
40-49            39.5-49.5            3         4
50-59            49.5-59.5           11        15
60-69            59.5-69.5           21        36
70-79            69.5-79.5           43        79
80-89            79.5-89.5           32       111
90-99            89.5-99.5            9       120
The class which includes the $\left(\frac{n}{2}\right)^{\text{th}}$ = 60th value is the median class. Hence,
the 5th class (69.5-79.5) is the median class.
\[ \tilde{x} = L_{\tilde{x}} + \left(\frac{\frac{n}{2} - F_{\tilde{x}-1}}{f_{\tilde{x}}}\right) w = 69.5 + \left(\frac{\frac{120}{2} - 36}{43}\right) \times 10 = 69.5 + 5.581 = 75.081 \]
Therefore, out of 120 students about 60 scored less than 75.081 and about 60 scored
greater than 75.081 on the college algebra examination.
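The three median procedures can be sketched in Python as below. The function names `median_raw` and `median_grouped` are assumptions made for illustration; the grouped version accumulates the less than cumulative frequencies itself, so the table does not have to be built by hand.

```python
def median_raw(data):
    """Median of raw data: middle value (odd n) or mean of the two middle values (even n)."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def median_grouped(boundaries, freqs):
    """Median of a grouped distribution given (lower, upper) class boundaries."""
    n = sum(freqs)
    cum = 0  # less than cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cum + f >= n / 2:          # this is the median class
            return lower + (n / 2 - cum) / f * (upper - lower)
        cum += f
    raise ValueError("empty frequency distribution")

grade_classes = [(29.5, 39.5), (39.5, 49.5), (49.5, 59.5), (59.5, 69.5),
                 (69.5, 79.5), (79.5, 89.5), (89.5, 99.5)]
grade_freqs = [1, 3, 11, 21, 43, 32, 9]

print(median_raw([180, 201, 220, 191, 219, 209, 220]))
print(median_raw([62, 63, 64, 65, 66, 66, 68, 78]))
print(round(median_grouped(grade_classes, grade_freqs), 3))
```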
Properties of the Median
1. It is an average of location, not the average of the values in the data set.
2. It is more affected by the number of observations than the extreme values.
3. Median can be calculated even in the case of open-ended intervals.
3.9. Quantiles
The median gives us a value which divides the data set into two equal parts. There are
also other positional measures that divide a given data set into more than two equal parts.
These measures are collectively known as quantiles. Quantiles include quartiles, deciles and
percentiles.
Quartiles are three points that divide the array into four parts in such a way that each portion
contains an equal number of observations. The first, second and third points are called the
first ($Q_1$), second ($Q_2$) and third ($Q_3$) quartiles respectively. 25% of the data fall below
$Q_1$, 50% below $Q_2$ and 75% below $Q_3$, and
\[ Q_1 \le Q_2 \le Q_3 \]
Deciles are nine points that divide the array into ten equal parts. The first, second, ...,
ninth deciles are denoted by $D_1, D_2, \dots, D_9$ respectively. 10% of the data fall below $D_1$,
20% below $D_2$, ..., 90% below $D_9$, and
\[ D_1 \le D_2 \le \dots \le D_9 \]
Percentiles are ninety-nine points that divide the array into 100 equal parts. They are
denoted by $P_1, P_2, \dots, P_{99}$. Always
\[ P_1 \le P_2 \le \dots \le P_{99} \]
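Methods of Finding Quantiles
1. For raw data and data in ungrouped frequency distribution. After arranging the data in
ascending order, we apply the following formulas.
\[ Q_i = \left[\frac{i(n+1)}{4}\right]^{\text{th}} \text{ value}, \quad i = 1, 2, 3 \]
\[ D_i = \left[\frac{i(n+1)}{10}\right]^{\text{th}} \text{ value}, \quad i = 1, 2, \dots, 9 \]
\[ P_i = \left[\frac{i(n+1)}{100}\right]^{\text{th}} \text{ value}, \quad i = 1, \dots, 99 \]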
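Example: Given the data 420, 430, 435, 438, 441, 449, 490, 500, 510 and 515, find
(a) all quartiles.
\[ Q_1 = \left[\frac{1 \times (10+1)}{4}\right]^{\text{th}} \text{ value} = 2.75^{\text{th}} \text{ value} \]
= 2nd value + 0.75(3rd value − 2nd value)
= 430 + 0.75(435 − 430)
= 433.75
\[ Q_2 = \left[\frac{2 \times (10+1)}{4}\right]^{\text{th}} \text{ value} = 5.5^{\text{th}} \text{ value} \]
= 5th value + 0.5(6th value − 5th value)
= 441 + 0.5(449 − 441)
= 445
\[ Q_3 = \left[\frac{3 \times (10+1)}{4}\right]^{\text{th}} \text{ value} = 8.25^{\text{th}} \text{ value} \]
= 500 + 0.25(510 − 500)
= 502.5
(b) the 1st and 7th deciles.
\[ D_1 = \left[\frac{1 \times (10+1)}{10}\right]^{\text{th}} \text{ value} = 1.1^{\text{th}} \text{ value} \]
= 1st value + 0.1(2nd value − 1st value)
= 420 + 0.1(430 − 420)
= 421
\[ D_7 = \left[\frac{7 \times (10+1)}{10}\right]^{\text{th}} \text{ value} = 7.7^{\text{th}} \text{ value} \]
= 490 + 0.7(500 − 490)
= 497
(c) the 40th and 75th percentiles.
\[ P_{40} = \left[\frac{40 \times (10+1)}{100}\right]^{\text{th}} \text{ value} = 4.4^{\text{th}} \text{ value} \]
= 438 + 0.4(441 − 438)
= 439.2
\[ P_{75} = \left[\frac{75 \times (10+1)}{100}\right]^{\text{th}} \text{ value} = 8.25^{\text{th}} \text{ value} \]
= 500 + 0.25(510 − 500)
= 502.5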
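The position-and-interpolation rule used throughout this example can be sketched with one function. The name `quantile` and its `parts` argument are hypothetical; clamping the result to the smallest or largest value when the computed position falls outside 1..n is an assumption not covered by the text.

```python
def quantile(data, i, parts):
    """i-th quantile for a split into `parts` equal parts
    (parts=4: quartiles, 10: deciles, 100: percentiles),
    located at position i(n + 1)/parts in the sorted data."""
    xs = sorted(data)
    n = len(xs)
    pos = i * (n + 1) / parts
    k = int(pos)          # whole part: the k-th ordered value
    frac = pos - k        # fractional part: how far toward the (k+1)-th value
    if k < 1:
        return xs[0]      # assumption: clamp below the first value
    if k >= n:
        return xs[-1]     # assumption: clamp above the last value
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

sample = [420, 430, 435, 438, 441, 449, 490, 500, 510, 515]
quartiles = [quantile(sample, i, 4) for i in (1, 2, 3)]
print(quartiles)
print(quantile(sample, 1, 10), quantile(sample, 7, 10))
print(quantile(sample, 40, 100), quantile(sample, 75, 100))
```

Statistical software often uses slightly different position rules, so results from a library function may differ from this hand method.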
2. For data in grouped frequency distribution.
\[ Q_i = L_{q_i} + \left(\frac{\frac{in}{4} - F_{q_{i-1}}}{f_{q_i}}\right) w \]
\[ D_i = L_{d_i} + \left(\frac{\frac{in}{10} - F_{d_{i-1}}}{f_{d_i}}\right) w \]
\[ P_i = L_{p_i} + \left(\frac{\frac{in}{100} - F_{p_{i-1}}}{f_{p_i}}\right) w \]
where
$L_{q_i}, L_{d_i}, L_{p_i}$ are the lower class boundaries of the classes containing the
concerned quantile points,
$F_{q_{i-1}}, F_{d_{i-1}}, F_{p_{i-1}}$ are the LCF of the class which precedes the class containing
the concerned quantile points,
$f_{q_i}, f_{d_i}, f_{p_i}$ are the frequencies of the classes containing the concerned quantile points
and
$w$ is the class width of the class containing the concerned quantile point.
Note
• $Q_i$ is found in the class containing the $\left(\frac{in}{4}\right)^{\text{th}}$ observation.
• $D_i$ is found in the class containing the $\left(\frac{in}{10}\right)^{\text{th}}$ observation.
• $P_i$ is found in the class containing the $\left(\frac{in}{100}\right)^{\text{th}}$ observation.
Example: Calculate all quartiles, the 5th and 8th deciles, and the 30th and 90th percentiles
for the students' score data and interpret the results.
Class boundaries Frequency (fi ) LCF

10.5-14.5 4 4
14.5-18.5 7 11
18.5-22.5 8 19
22.5-26.5 10 29
26.5-30.5 12 41
30.5-34.5 7 48
34.5-38.5 8 56
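Solution:
$Q_1$ is found in the 3rd class (18.5-22.5) because this class includes the $\frac{1 \times 56}{4}$ = 14th value.
\[ Q_1 = L_{q_1} + \left(\frac{\frac{1 \times 56}{4} - F_{q_0}}{f_{q_1}}\right) \times 4 = 18.5 + \left(\frac{14 - 11}{8}\right) \times 4 = 18.5 + 1.5 = 20 \]
$Q_2$ is found in the 4th class (22.5-26.5) because this class includes the $\frac{2 \times 56}{4}$ = 28th value.
\[ Q_2 = L_{q_2} + \left(\frac{\frac{2 \times 56}{4} - F_{q_1}}{f_{q_2}}\right) \times 4 = 22.5 + \left(\frac{28 - 19}{10}\right) \times 4 = 22.5 + 3.6 = 26.1 \]
$Q_3$ is found in the 6th class (30.5-34.5) because this class includes the $\frac{3 \times 56}{4}$ = 42nd value.
\[ Q_3 = L_{q_3} + \left(\frac{\frac{3 \times 56}{4} - F_{q_2}}{f_{q_3}}\right) \times 4 = 30.5 + \left(\frac{42 - 41}{7}\right) \times 4 = 30.5 + 0.57 = 31.07 \]
$D_5$ is found in the 4th class (22.5-26.5) because this class includes the $\frac{5 \times 56}{10}$ = 28th value.
\[ D_5 = L_{d_5} + \left(\frac{\frac{5 \times 56}{10} - F_{d_4}}{f_{d_5}}\right) \times 4 = 22.5 + \left(\frac{28 - 19}{10}\right) \times 4 = 22.5 + 3.6 = 26.1 \]
$D_8$ is found in the 6th class (30.5-34.5) because this class includes the $\frac{8 \times 56}{10}$ = 44.8th value.
\[ D_8 = L_{d_8} + \left(\frac{\frac{8 \times 56}{10} - F_{d_7}}{f_{d_8}}\right) \times 4 = 30.5 + \left(\frac{44.8 - 41}{7}\right) \times 4 = 30.5 + 2.17 = 32.67 \]
$P_{30}$ is found in the 3rd class (18.5-22.5) because this class includes the $\frac{30 \times 56}{100}$ = 16.8th value.
\[ P_{30} = L_{p_{30}} + \left(\frac{\frac{30 \times 56}{100} - F_{p_{29}}}{f_{p_{30}}}\right) \times 4 = 18.5 + \left(\frac{16.8 - 11}{8}\right) \times 4 = 18.5 + 2.9 = 21.4 \]
$P_{90}$ is found in the 7th class (34.5-38.5) because this class includes the $\frac{90 \times 56}{100}$ = 50.4th value.
\[ P_{90} = L_{p_{90}} + \left(\frac{\frac{90 \times 56}{100} - F_{p_{89}}}{f_{p_{90}}}\right) \times 4 = 34.5 + \left(\frac{50.4 - 48}{8}\right) \times 4 = 34.5 + 1.2 = 35.7 \]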
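The grouped-data quantile formulas can be checked with a short sketch applied to the students' score data. The function name `grouped_quantile` and its arguments are assumptions introduced here; the loop builds the less than cumulative frequency as it goes and stops at the first class that contains the target position.

```python
def grouped_quantile(boundaries, freqs, i, parts):
    """i-th quantile of a grouped distribution for a split into `parts` parts."""
    n = sum(freqs)
    target = i * n / parts      # position of the quantile, e.g. 3n/4 for Q3
    cum = 0                     # LCF of the classes before the current one
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cum + f >= target:
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("quantile position lies outside the distribution")

score_classes = [(10.5, 14.5), (14.5, 18.5), (18.5, 22.5), (22.5, 26.5),
                 (26.5, 30.5), (30.5, 34.5), (34.5, 38.5)]
score_freqs = [4, 7, 8, 10, 12, 7, 8]

for label, i, parts in [("Q1", 1, 4), ("Q2", 2, 4), ("Q3", 3, 4),
                        ("D5", 5, 10), ("D8", 8, 10),
                        ("P30", 30, 100), ("P90", 90, 100)]:
    print(label, round(grouped_quantile(score_classes, score_freqs, i, parts), 2))
```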
3.10. Exercises
1. Define and compare the characteristics of the mean, the median and the mode.
2. Your statistics instructor tells you on the first day of class that there will be five tests
during the term. From the scores on these tests for each student he will compute a
measure of central tendency that will serve as the student’s final course grade. Before
taking the first test you must choose whether you want your final grade to be the mean
or the median of the five test scores. Which would you choose? Why? Justify your
answer.
3. A student’s final grades in mathematics, physics, chemistry and sport are, respectively,
82, 86, 90, and 70. If the respective credits received for these courses are 3, 5, 3, and 2,
determine an appropriate average grade.
4. A large department store collects data on sales made by each of its sales people. The
number of sales made on a given day by each of 20 sales people is shown below.
9 6 12 10 13 15 16 14 14 16 17 16 24 21 22 18 19 18 20 17
Then, find Q3 , D8 , P80 and P90 and interpret all results.
5. In a certain investigation, 460 persons were involved in the study, and based on an
enquiry on their age, it was known that 75% of them were 22 or more. The following
frequency distribution shows the age composition of the persons under study.
Mid age in years     13   18   23   28    33   38   43   48
Number of persons    24   f1   90   122   f2   56   20   33
(a) Find the median and modal age of the persons and interpret them.
(b) Find the values of all quartiles.
(c) Compute the 5th decile, 25th percentile, 50th percentile and the 75th percentile and
interpret the results.
6. Given the following frequency distribution,
Mid price of a commodity   15   25   35   45   55
Number of items sold       27   A    28   B    19
(a) If 75% of the items were sold at birr 45 or less and most items were sold at birr 34,
find the missing frequencies.
(b) If 25% of the items were sold at birr 45 or more and most items
were sold at birr 34, find the missing frequencies.
4
Measures of Variation
4.1. Introduction
In the third chapter, we concentrated on a central value (measures of central tendency), which
gives an idea of the whole mass, that is, the complete set of values. However, the information
so obtained is neither exhaustive nor comprehensive, as the mean does not tell us
whether the observations are close to each other or far apart. The median is a positional average
and has nothing to do with the variability of the observations in a data set. The mode is the
most frequently occurring value, independent of the other values in the set. This leads us to conclude
that a measure of central tendency is not enough to give a clear idea about the data unless all
observations are the same. Moreover, two or more data sets may have the same mean and/or
median and yet be quite different. The table below displays the price of a certain commodity in
four cities. Find the mean and median prices of the four cities and interpret them.
City A   30   30   30
City B   29   30   31
City C   15   30   45
City D    5   30   55
All four data sets have mean 30 and median 30. But by inspection it is apparent
that the four data sets differ remarkably from one another. So measures of central tendency
alone do not provide enough information about the nature of the data. Thus, to have a
clear picture of the data, one needs a measure of dispersion or variability among the
observations in the data set.
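The four-city comparison can be verified with a few lines of Python. This is a sketch only; the spread shown is simply the largest price minus the smallest, used here as an informal indication of how much the prices vary.

```python
import statistics

prices = {
    "City A": [30, 30, 30],
    "City B": [29, 30, 31],
    "City C": [15, 30, 45],
    "City D": [5, 30, 55],
}

# For each city: (mean, median, largest - smallest)
summary = {
    city: (statistics.mean(p), statistics.median(p), max(p) - min(p))
    for city, p in prices.items()
}

for city, (mean, median, spread) in summary.items():
    print(city, mean, median, spread)
```

The means and medians coincide across the four cities while the spreads differ widely, which is the point the table is making.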
Variation or dispersion may be defined as the extent to which the values are scattered around a
measure of central tendency. Thus, a measure of dispersion tells us the extent to which the
values of a variable vary about the measure of central tendency.
4.2. Objectives of Measures of Variation
1. To have an idea about the reliability of the measures of central tendency. If
the degree of scatter is large, an average is less reliable. If the value of the variation
is small, it indicates that a central value is a good representative of all the values in the
data set.
2. To compare two or more sets of data with regard to their variability. Two or
more data sets can be compared by calculating the same measure of variation having
the same units of measurement. A set with a smaller value possesses less variability, or is
more uniform (or more consistent).
3. To provide information about the structure of the data. A value of a measure of
variation gives an idea about the spread of the observations. Further, one can draw
conclusions about the limits within which the values in the data set spread.
4. To pave the way to the use of other statistical measures. Measures of variation,
especially variance and standard deviation, lead to many statistical techniques like
correlation, regression, analysis of variance, etc.
4.3. Types of Measures of Variation
Absolute Measures of Variation: A measure of variation is said to be in absolute
form when it shows the actual amount of variation of an item from a measure of central
tendency and is expressed in the concrete units in which the data have been expressed.
Relative Measures of Variation: A relative measure of variation is the quotient
obtained by dividing the absolute measure by a quantity in respect to which the absolute
deviation has been computed. It is a pure number and is used for making comparisons
between different distributions.
Absolute Measures       Relative Measures
Range                   Coefficient of Range
Quartile Deviation      Coefficient of Quartile Deviation
Mean Deviation          Coefficient of Mean Deviation
Variance                Coefficient of Variation
Standard Deviation      Standard Scores
Before giving the details of these measures of dispersion, it is worthwhile to point out that a
measure of dispersion (variation) is to be judged on the basis of all those properties of good
measures of central tendency. Hence, their repetition is superfluous.
4.3.1. Range and Relative Range
Range is the simplest and crudest measure of dispersion. It is defined as the difference
between the largest (L) and the smallest (S) values in the data.
• For raw data: $R = L - S$
• For grouped data: $R = UCL_{\text{last}} - LCL_{\text{first}}$, the upper class limit of the last class minus
the lower class limit of the first class
Coefficient of Range:
• For raw data: $CR = \frac{L - S}{L + S}$
• For grouped data: $CR = \frac{UCL_{\text{last}} - LCL_{\text{first}}}{UCL_{\text{last}} + LCL_{\text{first}}}$
Range hardly satisfies any property of a good measure of dispersion, as it is based on the two extreme
values only, ignoring the others. It is also not amenable to further algebraic treatment. The main
advantage of using range is the simplicity of its computation.
4.3.2. Quartile Deviation and Coefficient of Quartile Deviation
Quartile deviation is sometimes known as Semi-Interquartile Range (SIR). The interquartile
range is $Q_3 - Q_1$. Thus,
\[ QD = \frac{Q_3 - Q_1}{2} \]
The corresponding relative measure of variation, the coefficient of quartile deviation, is:
\[ CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} \]
QD involves only the middle 50% of the observations by excluding the observations below
the lower quartile and the observations above the upper quartile. Note also that it does not
take into account all the individual values occurring between $Q_1$ and $Q_3$. It means that no
idea about the variation of even the middle 50% of values is available from this measure. However,
it provides some idea if the values are uniformly distributed between $Q_1$ and $Q_3$.
4.3.3. Mean Deviation and Coefficient of Mean Deviation
The measures of variation discussed so far are not satisfactory in the sense that they lack
most of the requirements of a good measure. Mean deviation is a better measure than range
and quartile deviation. Mean deviation is the arithmetic mean of the absolute values of the
deviations from some measure of central tendency, usually the mean or the median of a
distribution. Hence we have mean deviation about the mean $MD(\bar{x})$ and mean deviation
about the median $MD(\tilde{x})$.
Methods of Obtaining Mean Deviation
• For raw data: $MD(\bar{x}) = \frac{\sum |x_i - \bar{x}|}{n}$ and $MD(\tilde{x}) = \frac{\sum |x_i - \tilde{x}|}{n}$
• For grouped data: $MD(\bar{x}) = \frac{\sum f_i |m_i - \bar{x}|}{\sum f_i}$ and $MD(\tilde{x}) = \frac{\sum f_i |m_i - \tilde{x}|}{\sum f_i}$
MD is not much affected by extreme values. Its main drawback is that the algebraic negative
signs of the deviations are ignored. MD is minimum when the deviations are taken from the median.
The coefficients of mean deviation are:
\[ CMD(\bar{x}) = \frac{MD(\bar{x})}{\bar{x}} \]
\[ CMD(\tilde{x}) = \frac{MD(\tilde{x})}{\tilde{x}} \]
Examples
1. Consider a sample with data values of 27, 25, 20, 15, 30, 34, 28, and 25. Compute
the range, coefficient of range, quartile deviation, coefficient of quartile deviation, mean
deviation about mean, mean deviation about median, coefficient of mean deviation
about mean and coefficient of mean deviation about median.
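Solution:
Data: 15, 20, 25, 25, 27, 28, 30, 34
\[ R = \max - \min = 34 - 15 = 19, \quad CR = \frac{\max - \min}{\max + \min} = \frac{34 - 15}{34 + 15} = 0.388 \]
To find QD and CQD, we have to calculate $Q_1$ and $Q_3$ first.
\[ Q_1 = \left[\frac{1 \times (8+1)}{4}\right]^{\text{th}} \text{ value} = 2.25^{\text{th}} \text{ value} \]
= 2nd value + 0.25(3rd value − 2nd value)
= 20 + 0.25 × (25 − 20)
= 21.25
\[ Q_3 = \left[\frac{3 \times (8+1)}{4}\right]^{\text{th}} \text{ value} = 6.75^{\text{th}} \text{ value} \]
= 28 + 0.75 × (30 − 28)
= 29.5
\[ QD = \frac{Q_3 - Q_1}{2} = \frac{29.5 - 21.25}{2} = 4.125 \]
\[ CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{29.5 - 21.25}{29.5 + 21.25} = \frac{8.25}{50.75} = 0.163 \]
In addition, to compute $MD(\bar{x})$, $MD(\tilde{x})$, $CMD(\bar{x})$ and $CMD(\tilde{x})$ we need $\bar{x}$
and $\tilde{x}$.
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{204}{8} = 25.5; \quad \tilde{x} = \frac{25 + 27}{2} = 26 \]
\[ MD(\bar{x}) = \frac{\sum |x_i - \bar{x}|}{n} = \frac{|15 - 25.5| + |20 - 25.5| + \dots + |34 - 25.5|}{8} = \frac{34}{8} = 4.25 \]
\[ MD(\tilde{x}) = \frac{\sum |x_i - \tilde{x}|}{n} = \frac{|15 - 26| + |20 - 26| + \dots + |34 - 26|}{8} = \frac{11 + 6 + 1 + 1 + 1 + 2 + 4 + 8}{8} = \frac{34}{8} = 4.25 \]
Thus,
\[ CMD(\bar{x}) = \frac{MD(\bar{x})}{\bar{x}} = \frac{4.25}{25.5} = 0.1667 \]
\[ CMD(\tilde{x}) = \frac{MD(\tilde{x})}{\tilde{x}} = \frac{4.25}{26} = 0.163 \]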
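The range and mean deviation calculations for this sample can be sketched in Python. The helper name `mean_deviation` is an assumption made for illustration; it takes the centre (mean or median) as an argument so that the same function serves both versions of MD.

```python
import statistics

def mean_deviation(data, center):
    """Arithmetic mean of the absolute deviations of the data from `center`."""
    return sum(abs(x - center) for x in data) / len(data)

data = [27, 25, 20, 15, 30, 34, 28, 25]

data_range = max(data) - min(data)
coef_range = data_range / (max(data) + min(data))

mean = statistics.mean(data)
median = statistics.median(data)
md_mean = mean_deviation(data, mean)
md_median = mean_deviation(data, median)

print(data_range, round(coef_range, 3))
print(md_mean, round(md_mean / mean, 4))        # MD and CMD about the mean
print(md_median, round(md_median / median, 4))  # MD and CMD about the median
```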
2. Calculate R, CR, QD, CQD, MD and CMD for the students' score data (the grouped
frequency distribution with class boundaries 10.5-14.5, ..., 34.5-38.5).
Solution:
Previously, we have obtained the following quantities for the students' score data:
$\bar{x} = 25.64$, $\tilde{x} = 26.1$, $Q_1 = 20$, $Q_3 = 31.07$
Class       m_i    f_i   |m_i − x̄|   f_i|m_i − x̄|   |m_i − x̃|   f_i|m_i − x̃|
10.5-14.5   12.5    4     13.14        52.56          13.6         54.4
14.5-18.5   16.5    7      9.14        63.98           9.6         67.2
18.5-22.5   20.5    8      5.14        41.12           5.6         44.8
22.5-26.5   24.5   10      1.14        11.40           1.6         16.0
26.5-30.5   28.5   12      2.86        34.32           2.4         28.8
30.5-34.5   32.5    7      6.86        48.02           6.4         44.8
34.5-38.5   36.5    8     10.86        86.88          10.4         83.2
Total              56                 338.28                      339.2
\[ R = UCL_{\text{last}} - LCL_{\text{first}} = 38 - 11 = 27 \]
\[ CR = \frac{UCL_{\text{last}} - LCL_{\text{first}}}{UCL_{\text{last}} + LCL_{\text{first}}} = \frac{38 - 11}{38 + 11} = \frac{27}{49} = 0.551 \]
\[ QD = \frac{Q_3 - Q_1}{2} = \frac{31.07 - 20}{2} = \frac{11.07}{2} = 5.54 \]
\[ CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{31.07 - 20}{31.07 + 20} = \frac{11.07}{51.07} = 0.22 \]
\[ MD(\bar{x}) = \frac{\sum f_i |m_i - \bar{x}|}{\sum f_i} = \frac{338.28}{56} = 6.04 \]
\[ MD(\tilde{x}) = \frac{\sum f_i |m_i - \tilde{x}|}{\sum f_i} = \frac{339.2}{56} = 6.06 \]
\[ CMD(\bar{x}) = \frac{MD(\bar{x})}{\bar{x}} = \frac{6.04}{25.64} = 0.24 \]
\[ CMD(\tilde{x}) = \frac{MD(\tilde{x})}{\tilde{x}} = \frac{6.06}{26.1} = 0.23 \]
4.3.4. Variance and Standard Deviation
Variance and standard deviation are the best and most widely used measures of dispersion,
and both measure the average dispersion of the observations around the mean. The
variance of a data set is the sum of the squares of the deviations of each observation
from the mean divided by the total number of observations in the data set. The positive square
root of the variance is called the standard deviation.
For a population containing N elements, the population standard deviation is denoted by the
Greek letter σ (sigma) and hence the population variance is denoted by $\sigma^2$.
• For raw data: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ and $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
• For grouped data: $\sigma^2 = \frac{\sum f_i (m_i - \mu)^2}{N}$ and $\sigma = \sqrt{\frac{\sum f_i (m_i - \mu)^2}{N}}$
For a sample of n elements, the sample variance and standard deviation, denoted by $s^2$ and
$s$ respectively, are calculated using the formulae:
• For raw data: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ and $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
• For grouped data: $s^2 = \frac{\sum f_i (m_i - \bar{x})^2}{\sum f_i - 1}$ and $s = \sqrt{\frac{\sum f_i (m_i - \bar{x})^2}{\sum f_i - 1}}$
Examples
1. Consider a sample with data values of 10, 20, 12, 17, and 16. Compute the variance
and standard deviation.
Solution:
We need to compute the sample mean $\bar{x}$ first, since the sample variance is a
function of the sample mean.
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{10 + 20 + 12 + 17 + 16}{5} = \frac{75}{5} = 15 \]
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(10 - 15)^2 + (20 - 15)^2 + (12 - 15)^2 + (17 - 15)^2 + (16 - 15)^2}{5 - 1} = \frac{64}{4} = 16 \]
Hence, $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{16} = 4$.
2. Calculate the variance and standard deviation for the students' score data (the grouped
frequency distribution with class boundaries 10.5-14.5, ..., 34.5-38.5 and $\bar{x} = 25.64$).
Solution:
The necessary calculations for the variance are as follows.
Class       m_i    f_i   (m_i − x̄)   (m_i − x̄)²   f_i(m_i − x̄)²
10.5-14.5   12.5    4    -13.14      172.6596      690.6384
14.5-18.5   16.5    7     -9.14       83.5396      584.7772
18.5-22.5   20.5    8     -5.14       26.4196      211.3568
22.5-26.5   24.5   10     -1.14        1.2996       12.9960
26.5-30.5   28.5   12      2.86        8.1796       98.1552
30.5-34.5   32.5    7      6.86       47.0596      329.4172
34.5-38.5   36.5    8     10.86      117.9396      943.5168
Total              56                             2870.8576
\[ s^2 = \frac{\sum f_i (m_i - \bar{x})^2}{\sum f_i - 1} = \frac{2870.8576}{55} = 52.20 \]
Therefore $s = \sqrt{52.20} = 7.22$.
The main objection to mean deviation, the removal of the negative signs, is overcome by
taking the squares of the deviations from the mean. The first main demerit of variance
is that its unit is the square of the unit of measurement of the variable values. For
example, the sample variance of 2m, 6m and 4m is 4m². Interpreting this as "on
average each value differs from the mean by 4m²" is completely wrong, because
the unit of measurement of the variance is not the same as that of the data set.
The other disadvantage of variance is that the variation of the data is exaggerated because
the deviation of each value from the mean is squared. For the given example, the
variation of the data is exaggerated from two to four, since the deviations are
squared. Variance also gives more weight to the extreme values as compared to those
which are near the mean value.
Standard deviation is considered to be the best measure of dispersion because its unit
of measurement is the same as that of the data set, and the exaggeration made by the variance
is eliminated by taking its square root. In simple words, it explains the average
amount of variation on either side of the mean. If the standard deviation of the data is
small, the values are concentrated near the mean, and if it is large, the values are scattered
away from the mean.
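The sample variance formulas for raw and grouped data can be sketched with a single function. The name `sample_variance` and the optional `freqs` argument are assumptions introduced for illustration: with no frequencies each value counts once (raw data), otherwise the values are treated as class marks.

```python
import math

def sample_variance(values, freqs=None):
    """Sample variance with divisor (n - 1); `values` are raw data or class marks."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / (n - 1)

# Raw data (Example 1)
raw_var = sample_variance([10, 20, 12, 17, 16])
raw_sd = math.sqrt(raw_var)

# Grouped data: class marks and frequencies of the students' score data
marks = [12.5, 16.5, 20.5, 24.5, 28.5, 32.5, 36.5]
freqs = [4, 7, 8, 10, 12, 7, 8]
grouped_var = sample_variance(marks, freqs)
grouped_sd = math.sqrt(grouped_var)

print(raw_var, raw_sd)
print(round(grouped_var, 2), round(grouped_sd, 2))
```

The standard deviation is reported in the same unit as the data, which is why it is usually quoted instead of the variance.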
Properties of Variance and Standard Deviation
1. If a constant is added (subtracted) to (from) each and every observation, the standard
deviation as well as the variance remains the same.
2. If each and every value is multiplied by a nonzero constant k, the standard deviation is
multiplied by |k| and the variance is multiplied by $k^2$.
3. If there are k different groups having the same units of measurement with sample
means $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$, numbers of sample observations $n_1, n_2, \dots, n_k$ and sample variances
$s_1^2, s_2^2, \dots, s_k^2$ respectively, then the variance of all the groups, called the pooled variance
and denoted by $s_p^2$, is given by:
\[ s_p^2 = \frac{(n_1 - 1)[s_1^2 + (\bar{x}_1 - \bar{x}_c)^2] + \dots + (n_k - 1)[s_k^2 + (\bar{x}_k - \bar{x}_c)^2]}{n_1 + n_2 + \dots + n_k - k} = \frac{\sum (n_i - 1)[s_i^2 + (\bar{x}_i - \bar{x}_c)^2]}{\sum n_i - k} \]
where $\bar{x}_c = \frac{\sum n_i \bar{x}_i}{\sum n_i}$ is the combined mean of all the groups.
If $\bar{x}_1 = \bar{x}_2 = \dots = \bar{x}_k$,
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2}{n_1 + n_2 + \dots + n_k - k} = \frac{\sum (n_i - 1)s_i^2}{\sum n_i - k} \]
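The first two properties can be checked numerically with Python's standard `statistics` module. This is only a sketch on one small data set; the constants `c` and `k` are arbitrary choices, and a negative `k` is used to show that the standard deviation scales by the absolute value of the constant.

```python
import math
import statistics

data = [10, 20, 12, 17, 16]
c, k = 7, -3                      # arbitrary shift and scale constants

var = statistics.variance(data)   # sample variance, divisor n - 1
sd = statistics.stdev(data)

shifted = [x + c for x in data]
scaled = [k * x for x in data]

# Property 1: adding a constant changes neither the variance nor the SD
same_after_shift = math.isclose(statistics.variance(shifted), var)

# Property 2: multiplying by k scales the variance by k^2 and the SD by |k|
var_scaled_ok = math.isclose(statistics.variance(scaled), k ** 2 * var)
sd_scaled_ok = math.isclose(statistics.stdev(scaled), abs(k) * sd)

print(same_after_shift, var_scaled_ok, sd_scaled_ok)
```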
Examples
1. The mean weight of 150 students is 60 kilograms. The mean weight of boys is 70 kg
with a standard deviation of 10 kg. For the girls, the mean weight is 55 kg and the
standard deviation 15 kg. Then,
(a) Find the number of boys and girls.
(b) Find the combined standard deviation.
2. A distribution consists of four parts characterized as follows. Find the mean and stan-
dard deviation of the distribution.
Part   No of items   Mean   S.D.
1           50        61      8
2          100        70      9
3          120        50     10
4           30        83     11
3. The arithmetic mean and standard deviation of a series of 20 items were computed as
20 and 5 respectively. While calculating these, an item 13 was misread as 30. Find the
correct mean and standard deviation.
4. The following data are some of the particulars of the distribution of weights of boys and
girls in a class.
            Boys   Girls
Number      100     50
Mean         60     45
Variance      9      4
a) Find the mean and variance of the combined series.
b) If one of the values was misread as 60 instead of 40, what is the correct standard
deviation?
4.3.5. Coefficient of Variation
All absolute measures of dispersion have units. If two or more distributions differ in their
units of measurement, their variability cannot be compared by any of the absolute measures
of variation. Also, the size of the absolute measures of dispersion depends upon the size of
the values. That is, if the size of the values is larger, the value of the absolute measures will
also be larger. Generally, absolute measures of variation fail to be appropriate for comparing
two or more groups if:
• The groups have different units of measurement.
• The size of the data between the groups is not the same.
Coefficient of variation is a relative measure of standard deviation. It is the ratio of the
standard deviation to the mean, expressed as a percent. Hence, it is a unitless measure of
variation and also takes into account the size of the means of the distributions.
• For population: $cv = \frac{\sigma}{\mu} \times 100\%$
• For sample: $cv = \frac{s}{\bar{x}} \times 100\%$
The distribution having the smaller cv is said to be less variable, or more consistent, or more
uniform. For field experiments, cv is generally reported. If it is small, it indicates
more reliability of the experimental findings.
Examples
1. Compare the variability of the following two sample data sets using standard deviation
and coefficient of variation.
A : 2 Meters, 4 Meters, 6 Meters
B : 1000 Liters, 800 Liters, 900 Liters
2. The average IQ of statistics students is 110 with standard deviation 5 and the average
IQ of mathematics students is 106 with standard deviation 4. Which class is less variable
in terms of IQ?
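A comparison such as the IQ example can be sketched with a one-line function. The name `coefficient_of_variation` is an assumption made for illustration; it implements the sample formula, the ratio of the standard deviation to the mean expressed as a percent.

```python
def coefficient_of_variation(sd, mean):
    """Coefficient of variation in percent: (standard deviation / mean) x 100."""
    return sd / mean * 100

# Summary figures from the IQ example: (mean, standard deviation)
cv_statistics = coefficient_of_variation(sd=5, mean=110)
cv_mathematics = coefficient_of_variation(sd=4, mean=106)

print(round(cv_statistics, 2), round(cv_mathematics, 2))
less_variable = "mathematics" if cv_mathematics < cv_statistics else "statistics"
print(less_variable)
```

Because the result is unitless, the same function can compare data sets measured in different units, such as meters and liters.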
4.4. Exercises
1. Find the range, quartile deviation, mean deviation about the mean, mean deviation
about the median, mean deviation about the mode, variance, standard deviation and
coefficient of variation for the following distribution.
Class       2-4   4-6   6-8   8-10
Frequency    2     5     4     7
2. Explain the rationale for using n − 1 to compute the sample variance.
3. What is the purpose of coefficient variation?
4. Two persons participated in five shooting competition and were able to hit the target
correctly out of fifteen shots as given below.
Competitor A 6 12 12 10 7
Competitor B 12 15 7 7 4
Which competitor is more uniform in shooting performance?
5
Elementary Probability
5.1. What is Probability?
Probability is a numerical description of the chance of occurrence of a given phenomenon under
certain conditions.
Probability theory plays a central role in statistics. After all, statistical analysis is applied to
a collection of data in order to discover something about the underlying events. These events
may be connected to one another. However, the individual choices involved are assumed to
be random. Alternatively, we may sample a population at random and make inferences about
the population as a whole from the sample by using statistical analysis. Therefore, a solid
understanding of probability theory - the study of random events - is necessary to understand
how the statistical analysis works and also to correctly interpret the results.
5.2. Concept of Set
In order to discuss the theory of probability, it is essential to be familiar with some ideas and
concepts of mathematical theory of set. A set is a collection of well-defined objects which is
denoted by capital letters like A, B, C, etc.
In describing which objects are contained in set A, two common methods are available. These
methods are:
1. Listing all objects of A. For example, A = {1, 2, 3, 4} describes the set consisting of the
positive integers 1, 2, 3 and 4.
2. Describing a set in words, for example, set A consists of all real numbers between 0 and
1, inclusive. It can be written as A = {x : 0 ≤ x ≤ 1}, that is, A is the set of all x's
where x is a real number between 0 and 1, inclusive.
If A = {a1, a2, ..., an}, then each object ai; i = 1, 2, ..., n belonging to set A is called a member
or an element of set A, i.e., ai ∈ A. A set consisting of all possible elements under consideration
is called a universal set (denoted by U). On the other hand, a set containing no element is
called an empty set (denoted by ∅ or {}).
If every element of set A is also an element of set B, A is said to be a subset of B, written
as A ⊂ B. Every set is a subset of itself, i.e., A ⊂ A. The empty set is a subset of every set. If
A ⊂ B and B ⊂ C, then A ⊂ C. If A ⊂ B and B ⊂ A, then A and B are said to be equal.
5.2.1. Set Operation
1. Union (Or): A set consisting of all elements in A or B or both is called the union
of A and B, written as A ∪ B. That is, A ∪ B = {x : x ∈ A or x ∈ B}. The
set A ∪ B is also called the sum of A and B.
2. Intersection (And): A set consisting of all elements in both A and B is called the
intersection of A and B, written as A ∩ B. That is, A ∩ B = {x : x ∈ A and
x ∈ B}. The intersection of A and B is also called the product of A and B.
3. Complement (Not): The complement of a set A, denoted by Aᶜ, is the set consisting
of all elements of U that are not in A; i.e., Aᶜ = {x : x ∉ A}.
4. Disjoint Sets: Sets A and B are disjoint if A ∩ B = ∅.
5. Relative Complement: The relative complement of B in A, denoted by A\B, is the
set of all elements of A which are not in B. It is written as A\B = {x : x ∈ A and
x ∉ B} = A ∩ Bᶜ.
Important Laws
• Commutative laws:
– A∪B =B∪A
– A∩B =B∩A
• Associative laws:
– A ∪ (B ∪ C) = (A ∪ B) ∪ C
– A ∩ (B ∩ C) = (A ∩ B) ∩ C
• Distributive laws:
– A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
– A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
• Identity laws:
– A ∪ A = A, A ∩ A = A
– A ∪ U = U, A ∩ U = A
– A ∪ ∅ = A, A ∩ ∅ = ∅
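The set operations and laws above can be checked on small examples. The following is a minimal sketch using Python's built-in set type; the universal set U and the sets A, B and C are hypothetical values chosen only for illustration.

```python
# Hypothetical universal set and subsets, chosen only for illustration.
U = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 8}

union = A | B            # A ∪ B
intersection = A & B     # A ∩ B
complement_A = U - A     # complement of A, taken relative to U
relative = A - B         # relative complement A \ B

# The relative complement equals A intersected with the complement of B.
assert relative == A & (U - B)

# Distributive laws.
assert A | (B & C) == (A | B) & (A | C)
assert A & (B | C) == (A & B) | (A & C)

print(union, intersection, complement_A, relative)
```

A check on one example does not prove a law; it only illustrates it for these particular sets.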
5.3. Definition and Some Basic Concepts
1. Experiment (ξ): is any statistical process that can be repeated several times and in
any trial of which the outcome is unpredictable.
• Tossing a coin only once, S = {Head (H), Tail (T)}
• Tossing a coin two times, S = {HH, HT, TH, TT}
• Rolling a die, S = {1, 2, 3, 4, 5, 6}
• Selecting an item from a production lot, S = {Defective, Non-defective}
• Introducing a new product, S = {Success, Failure}
2. Sample Space (S): is a set consisting all possible outcomes of a given experiment, ξ.
3. Event: is an outcome or a set of outcomes (having some common characteristics) of an
experiment.
Simple Event (Elementary Event): is an event consisting of a single outcome. The
elementary events are the building blocks (or atoms) of a probability model. They are
the events that cannot be decomposed further into smaller sets of events.
Compound Event: is an event consisting of two or more outcomes.
4. Independent Events: two or more events are independent if the occurrence of one
event has no effect on the probability of occurrence of the others.
5. Mutually Exclusive Events: two or more events are mutually exclusive, if they have
no outcome in common. They cannot occur together simultaneously.
6. Complementary Events: two events are complementary if they have no outcomes in
common and together they contain all possible outcomes of the experiment. To be
complementary, they must first be mutually exclusive.
5.4. Counting Rules
Counting techniques are mathematical methods used to determine the number of
possible ways of arranging or ordering objects. They are needed to find the size of a
sample space that is too large to list outcome by outcome. To count the possible outcomes
of a sample space and/or an event, we use the following counting techniques.
Addition Rule: states that if a task can be done (accomplished) by any one of k procedures,
where the ith procedure has ni alternatives, then the total number of ways of doing the
task is
\[ n_1 + n_2 + \cdots + n_k = \sum_{i=1}^{k} n_i \]
Example: Suppose a lady wants to travel from Harar to Dire Dawa. She can go by
plane, bus, cycle or horse, and there are 3 flights, 4 buses, 2 cycles and 3 horses
available. In how many different ways can she make her journey?
Solution:
From the given problem nf = 3, nb = 4, nc = 2 and nh = 3. So she has
nf + nb + nc + nh = 3 + 4 + 2 + 3 = 12
different ways to make her trip from Harar to Dire Dawa.
Multiplication Rule: states that if a choice consists of k steps, where the first step can be
done in n1 ways, for each of these the second step can be done in n2 ways, ..., and for each
of those the kth step can be done in nk ways, then the total number of distinct ways to
accomplish the task/choice is equal to
\[ n_1 \times n_2 \times \cdots \times n_k = \prod_{i=1}^{k} n_i \]
Example 1: Suppose a cafeteria provides 5 kinds of cake which it serves with tea, coffee,
milk and coca cola. Then, in how many different ways can you order your breakfast of
cake with a drink?
Solution:
The task has two steps. First, we order a type of cake (n1 = 5), and then we order a kind
of drink (n2 = 4). Thus, one can have
n1 × n2 = 5 × 4 = 20
different ways to order his/her breakfast.
Example 2: There are 2 bus routes from city X to city Y and 3 train routes from city
Y to city Z. In how many ways can a person go from city X to city Z?
Solution:
n1 × n2 = 2 × 3 = 6
So the person can go from city X to city Z in 6 ways.
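The addition and multiplication rules can be illustrated by counting directly. The sketch below is only an illustration: the cake names are hypothetical placeholders, and the counts are the ones given in the journey and cafeteria examples.

```python
from itertools import product

# Addition rule: one task, several alternative procedures.
journey_options = {"plane": 3, "bus": 4, "cycle": 2, "horse": 3}
n_journeys = sum(journey_options.values())

# Multiplication rule: a choice made in successive steps.
cakes = ["cake1", "cake2", "cake3", "cake4", "cake5"]  # hypothetical names
drinks = ["tea", "coffee", "milk", "coca cola"]
orders = list(product(cakes, drinks))  # every (cake, drink) pair
n_orders = len(orders)

print(n_journeys, n_orders)
```

Listing the pairs with itertools.product is practical only for small cases; for large sample spaces the rules give the count without enumeration.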
Permutation: is arrangement of objects with attention to order of appearance.
Rule 1: The number of permutations of n distinct objects taking all together is
n! = n × (n − 1) × (n − 2) × ... × (1)
By definition 1! = 0! = 1.
Example 1: In how many different ways can 3 persons sleep in a bed?
Solution:
n! = 3! = 3 × 2 × 1 = 6 ways.
Example 2: Suppose a photographer must arrange 4 persons in a row for a photograph.
In how many different ways can the arrangement be done?
Solution:
n! = 4! = 4 × 3 × 2 × 1 = 24 ways.
Rule 2: Given n distinct objects, the number of permutations of r objects taken from
the n objects is denoted by nPr and given by
\[ {}_{n}P_{r} = \frac{n!}{(n-r)!}, \quad r \le n \]
Example 1: In how many ways can 10 people be seated on a bench if only 4 seats are
available?
Solution:
\[ {}_{n}P_{r} = {}_{10}P_{4} = \frac{10!}{(10-4)!} = \frac{10 \times 9 \times 8 \times 7 \times 6!}{6!} = 5040 \text{ ways.} \]
Example 2: How many 5 letter permutations can be formed from the letters in the
word DISCOVER?
Solution:
\[ {}_{n}P_{r} = {}_{8}P_{5} = \frac{8!}{(8-5)!} = \frac{8 \times 7 \times 6 \times 5 \times 4 \times 3!}{3!} = 6720 \]
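As a quick check of Rule 2, the permutation counts can be computed in Python. This is a minimal sketch that assumes Python 3.8 or later (for math.perm); the helper name n_p_r is introduced here only for illustration.

```python
from itertools import permutations
from math import factorial, perm

def n_p_r(n, r):
    """Number of permutations of r objects taken from n distinct objects."""
    return factorial(n) // factorial(n - r)

bench = n_p_r(10, 4)   # 10 people, 4 seats
words = n_p_r(8, 5)    # 5-letter arrangements of the 8 distinct letters of DISCOVER

# The formula agrees with the standard-library function ...
assert bench == perm(10, 4)
assert words == perm(8, 5)

# ... and with brute-force enumeration of the arrangements.
brute_force = sum(1 for _ in permutations("DISCOVER", 5))
assert brute_force == words

print(bench, words)
```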
Rule 3: Given n objects of which n1 are alike, n2 are alike, ..., nr are alike, the number
of distinct permutations of the n objects is given by
\[ \frac{n!}{n_1! \times n_2! \times \cdots \times n_r!} \]
Example: How many different permutations can be made from the letters in the word:
• STATISTICS
Solution:
n = 10, n1 = n(s) = 3, n2 = n(t) = 3, n3 = n(a) = 1, n4 = n(i) = 2 and n5 = n(c) = 1.
Thus,
\[ \frac{n!}{n_1! \times n_2! \times n_3! \times n_4! \times n_5!} = \frac{10!}{3! \times 3! \times 1! \times 2! \times 1!} = 50400 \]
• MISSISSIPPI
Solution:
n = 11, n1 = n(m) = 1, n2 = n(i) = 4, n3 = n(s) = 4 and n4 = n(p) = 2. Thus,
\[ \frac{n!}{n_1! \times n_2! \times n_3! \times n_4!} = \frac{11!}{1! \times 4! \times 4! \times 2!} = 34650 \]
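Rule 3 can be sketched as a short function that counts the repeated letters of a word and divides n! by the factorial of each count. The function name distinct_arrangements is an assumption made for this illustration.

```python
from collections import Counter
from math import factorial

def distinct_arrangements(word):
    """Number of distinct permutations of the letters of `word` (Rule 3)."""
    counts = Counter(word)            # how many times each letter occurs
    total = factorial(len(word))      # n!
    for c in counts.values():
        total //= factorial(c)        # divide by n1!, n2!, ..., nr!
    return total

statistics_count = distinct_arrangements("STATISTICS")
mississippi_count = distinct_arrangements("MISSISSIPPI")
print(statistics_count, mississippi_count)
```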
Combination: A selection of objects considered without regard to the order of
appearance is called a combination. For example, abc, acb, bac, bca, cab and cba are six
different permutations, but they are the same combination.
Rule 1: The number of ways of selecting r objects from n distinct objects is called the
combination of r objects from n objects, denoted by nCr or $\binom{n}{r}$, and given by
\[ {}_{n}C_{r} = \binom{n}{r} = \frac{n!}{(n-r)! \times r!}, \quad r \le n \]
Example: In how many ways can a student choose 3 books from a list of 12 different
books?
Solution:
\[ \binom{n}{r} = \binom{12}{3} = \frac{n!}{(n-r)! \times r!} = \frac{12!}{(12-3)! \times 3!} = \frac{12 \times 11 \times 10 \times 9!}{9! \times 3!} = 220 \]
Rule 2: If the selection has k steps, selecting r1 of n1 objects, r2 of n2, ..., rk of nk
objects, then the total number of ways of doing this selection is equal to
\[ \binom{n_1}{r_1} \times \binom{n_2}{r_2} \times \cdots \times \binom{n_k}{r_k} \]
Example: Out of 5 male workers and 7 female workers of some factory, a committee
consisting of 2 male and 3 female workers is to be formed. In how many ways can this be
done if
(a) all workers are eligible.
\[ \binom{n_1}{r_1} \times \binom{n_2}{r_2} = \binom{5}{2} \times \binom{7}{3} = 10 \times 35 = 350 \]
(b) one particular female must be a member.
\[ \binom{n_1}{r_1} \times \binom{n_2}{r_2} = \binom{5}{2} \times \binom{6}{2} = 10 \times 15 = 150 \]
(c) two particular male workers cannot be members for some reason.
\[ \binom{n_1}{r_1} \times \binom{n_2}{r_2} = \binom{3}{2} \times \binom{7}{3} = 3 \times 35 = 105 \]
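The combination counts in the examples above can be reproduced with math.comb. This is a small sketch that assumes Python 3.8 or later; the variable names are chosen here for illustration.

```python
from math import comb

# Rule 1: choose 3 books out of 12.
books = comb(12, 3)

# Rule 2: a committee of 2 males (out of 5) and 3 females (out of 7).
all_eligible = comb(5, 2) * comb(7, 3)
# One particular female is already on the committee: choose 2 more from 6.
one_female_fixed = comb(5, 2) * comb(6, 2)
# Two particular males are excluded: choose 2 males from the remaining 3.
two_males_excluded = comb(3, 2) * comb(7, 3)

print(books, all_eligible, one_female_fixed, two_males_excluded)
```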
The difference between permutation and combination is that in combination the order
of objects being selected (arranged) is not important, but order matters in permutation.
5.5. Approaches in Probability Definition
1. The Classical Approach (also called Mathematical Approach): Suppose there are
N equally likely possible outcomes in the sample space S of an experiment. If, out of
these N outcomes, only n are favourable to the event E, then the probability that the
event E will occur is:
\[ P(E) = \frac{\text{number of outcomes favourable to } E}{\text{total number of outcomes}} = \frac{n(E)}{n(S)} = \frac{n}{N} \]
Example 1: Consider an experiment of tossing a die. What is the probability that
(a) an odd number occurs.
Solution:
The sample space of the given experiment is S = {1, 2, 3, 4, 5, 6}. Let A be
the event of getting an odd number in rolling a die only once.
\[ P(A) = \frac{n(A)}{n(S)} = \frac{3}{6} = 0.5 \]
(b) number 4 occurs.
Solution:
Let B be the event of getting number 4 in rolling a die only once.
\[ P(B) = \frac{n(B)}{n(S)} = \frac{1}{6} = 0.167 \]
(c) number 8 occurs.
Solution:
Let C be the event of getting number 8 in rolling a die only once.
\[ P(C) = \frac{n(C)}{n(S)} = \frac{0}{6} = 0 \]
(d) a number between 1 and 6 inclusive occurs.
Solution:
Let D be the event of getting a number between 1 and 6 inclusive.
\[ P(D) = \frac{n(D)}{n(S)} = \frac{6}{6} = 1 \]
• Events with zero probability of occurrence are known as null or impossible events.
• Events with probability equal to unity are known as sure events.
Example 2: What is the probability of getting exactly one head in tossing two coins?
Solution:
S = {HH, HT, TH, TT}. Let E be the event of getting exactly one head in an experiment
of tossing two coins, so E = {HT, TH}.
\[ P(E) = \frac{n(E)}{n(S)} = \frac{2}{4} = 0.5 \]
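The classical definition can be sketched as a function that counts favourable outcomes in an explicitly listed sample space. The function name classical_probability is an assumption of this illustration, and the approach presumes that all listed outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

def classical_probability(sample_space, event):
    """P(E) = n(E) / n(S), assuming equally likely outcomes."""
    favourable = [o for o in sample_space if event(o)]
    return Fraction(len(favourable), len(sample_space))

die = [1, 2, 3, 4, 5, 6]
p_odd = classical_probability(die, lambda x: x % 2 == 1)
p_four = classical_probability(die, lambda x: x == 4)
p_eight = classical_probability(die, lambda x: x == 8)        # impossible event
p_any = classical_probability(die, lambda x: 1 <= x <= 6)     # sure event

two_coins = ["".join(t) for t in product("HT", repeat=2)]     # HH, HT, TH, TT
p_one_head = classical_probability(two_coins, lambda o: o.count("H") == 1)

print(p_odd, p_four, p_eight, p_any, p_one_head)
```

Using Fraction keeps the results exact, so 1/6 is not rounded to 0.167 until it is displayed.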
2. The Empirical Approach (also called Frequentist Approach): It is based on
relative frequency. Given a frequency distribution, the probability of an event being in
a given class is
\[ P(E) = \frac{f_i}{\sum f_i} \]
where fi is the frequency of that class.
The difference between classical and empirical probability is that the former uses the sample
space to determine the numerical probability while the latter is based on a frequency
distribution.
3. Subjective Approach: calculates probability based on an educated guess, experience
or evaluation of a problem. For example, a physician might say that on the basis
of his/her diagnosis, there is a 30% chance the patient will need an operation.
5.6. Some Probability Rules/Axioms
Let S be a sample space associated with a random experiment. Then with any event E, in
this sample space, we associate a real number called probability of E satisfying the following
properties (axioms).
• 0 ≤ P(E) ≤ 1
• P(S) = 1
• If A and B are mutually exclusive events, then
P(A or B) = P(A ∪ B) = P(A) + P(B)
• If A1, A2, ..., An are pairwise mutually exclusive events, then
\[ P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) \]
• P(A ∪ A^c) = P(A) + P(A^c) = 1
• P(∅) = 0
Using the above axioms, it can be shown that for any two events A and B,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Example 1: A box of 20 candles consists of 5 defective and 15 non-defective candles. If 4 of
these candles are selected at random, what is the probability that
(a) all will be defective.
Solution:
Let A be the event that all 4 candles are defective.
\[ P(A) = \frac{n(A)}{n(S)} = \frac{\binom{5}{4} \times \binom{15}{0}}{\binom{20}{4}} = 0.001032 \]
(b) 3 will be non-defective.
Let B be the event that exactly 3 candles are non-defective.
\[ P(B) = \frac{n(B)}{n(S)} = \frac{\binom{5}{1} \times \binom{15}{3}}{\binom{20}{4}} = 0.4696 \]
(c) all will be non-defective.
Let C be the event that all 4 candles are non-defective.
\[ P(C) = \frac{n(C)}{n(S)} = \frac{\binom{5}{0} \times \binom{15}{4}}{\binom{20}{4}} = 0.2817 \]
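Example 2: An urn contains 6 white, 4 red and 9 black balls. If 3 balls are drawn at random,
find the probability that
(a) two of the balls drawn are white.
Let E1 be the event that exactly two of the balls drawn are white.
\[ P(E_1) = \frac{n(E_1)}{n(S)} = \frac{\binom{6}{2} \times \binom{13}{1}}{\binom{19}{3}} = 0.2012 \]
(b) one is from each colour.
Let E2 be the event that one ball is drawn from each colour.
\[ P(E_2) = \frac{n(E_2)}{n(S)} = \frac{\binom{6}{1} \times \binom{4}{1} \times \binom{9}{1}}{\binom{19}{3}} = 0.2229 \]
(c) none is red.
Let E3 be the event that none of the balls drawn is red.
\[ P(E_3) = \frac{n(E_3)}{n(S)} = \frac{\binom{15}{3} \times \binom{4}{0}}{\binom{19}{3}} = 0.4696 \]
(d) at least one is white.
Let E4 be the event that at least one of the balls drawn is white.
\[ P(E_4) = \frac{n(E_4)}{n(S)} = \frac{\binom{6}{1}\binom{13}{2}}{\binom{19}{3}} + \frac{\binom{6}{2}\binom{13}{1}}{\binom{19}{3}} + \frac{\binom{6}{3}\binom{13}{0}}{\binom{19}{3}} = \frac{683}{969} = 0.7049 \]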
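The urn probabilities can be recomputed exactly by counting combinations. The following is a minimal sketch assuming Python 3.8 or later; part (d) is computed through the complement "no white ball", which is an equivalent route to summing the three cases.

```python
from fractions import Fraction
from math import comb

# Urn with 6 white, 4 red and 9 black balls; 3 balls drawn without replacement.
total = comb(19, 3)                                   # n(S)

p_two_white = Fraction(comb(6, 2) * comb(13, 1), total)
p_one_each = Fraction(comb(6, 1) * comb(4, 1) * comb(9, 1), total)
p_no_red = Fraction(comb(15, 3) * comb(4, 0), total)
# At least one white = 1 - P(no white); the 13 non-white balls supply all 3 draws.
p_at_least_one_white = 1 - Fraction(comb(13, 3), total)

for p in (p_two_white, p_one_each, p_no_red, p_at_least_one_white):
    print(p, float(p))
```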
5.7. Conditional Probability
When the outcome or occurrence of an event affects the outcome or occurrence of another
event, the two events are said to be dependent (conditional). If two events, A and B, are
dependent on each other, the probability of event A occurring knowing that event B has
already occurred is said to be the conditional probability of A given that event B has already
occurred,
\[ P(A/B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) \neq 0 \]
The probability of event B occurring knowing that event A has already occurred is said to be
the conditional probability of B given that event A has already occurred,
\[ P(B/A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) \neq 0 \]
Remarks
(i) 0 ≤ P (A/B) ≤ 1
(ii) P (S/B) = 1
(iii) For mutually exclusive events A1 and A2 ,
P (A1 ∪ A2 /B) = P (A1 /B) + P (A2 /B)
(iv) For pairwise mutually exclusive events A1, A2, ..., An
\[ P\left(\bigcup_{i=1}^{n} A_i / B\right) = \sum_{i=1}^{n} P(A_i / B) \]
Example: Suppose the probability that a research project will be well planned is 0.6, and the
probability that it will be well planned and well executed is 0.54. What is the probability
that it will be
(a) well executed given that it is well planned.
Solution:
Let D and E be the events that the research project is well planned and well executed,
respectively. Then P(D) = 0.6 and P(D ∩ E) = 0.54.
\[ P(E/D) = \frac{P(D \cap E)}{P(D)} = \frac{0.54}{0.6} = 0.9 \]
(b) not well executed given that it is well planned.
Solution:
\[ P(E^c/D) = \frac{P(D \cap E^c)}{P(D)} = \frac{P(D) - P(D \cap E)}{P(D)} = 1 - \frac{P(D \cap E)}{P(D)} \]
\[ P(E^c/D) = 1 - P(E/D) = 1 - 0.9 = 0.1 \]
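The research-project example can be sketched in a few lines. The helper name conditional is hypothetical; exact fractions are used so that 0.54/0.6 does not pick up floating-point noise.

```python
from fractions import Fraction

def conditional(p_joint, p_given):
    """P(A/B) = P(A ∩ B) / P(B), defined only when P(B) != 0."""
    if p_given == 0:
        raise ValueError("conditioning event must have non-zero probability")
    return p_joint / p_given

p_D = Fraction("0.6")      # project is well planned
p_DE = Fraction("0.54")    # well planned and well executed

p_E_given_D = conditional(p_DE, p_D)
p_notE_given_D = 1 - p_E_given_D   # complement rule under the same condition

print(float(p_E_given_D), float(p_notE_given_D))
```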
5.8. Independence
Recall that for mutually exclusive events A and B, A ∩ B = ∅, which implies that
P(A ∩ B) = 0. Hence, provided P(B) ≠ 0,
\[ P(A/B) = \frac{P(A \cap B)}{P(B)} = 0 \]
If B occurs, A cannot occur at the same time. That means they are dependent. Again,
recall that if A ⊂ B, then A ∩ B = A and
\[ P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A)}{P(A)} = 1 \]
Definition: Two events, A and B are said to be statistically independent if
P (A ∩ B) = P (A) × P (B)
Example: Consider an experiment of tossing two dice. Let
A - the first die shows an even number.
B - the second die shows an odd number.
C - both dice show even numbers.
Check whether A and B, A and C, and B and C are independent events.
Solution:
Use the following sample space, S, in which the row gives the outcome of the first die and
the column gives the outcome of the second die.
      1       2       3       4       5       6
1   (1, 1)  (1, 2)  (1, 3)  (1, 4)  (1, 5)  (1, 6)
2   (2, 1)  (2, 2)  (2, 3)  (2, 4)  (2, 5)  (2, 6)
3   (3, 1)  (3, 2)  (3, 3)  (3, 4)  (3, 5)  (3, 6)
4   (4, 1)  (4, 2)  (4, 3)  (4, 4)  (4, 5)  (4, 6)
5   (5, 1)  (5, 2)  (5, 3)  (5, 4)  (5, 5)  (5, 6)
6   (6, 1)  (6, 2)  (6, 3)  (6, 4)  (6, 5)  (6, 6)
\[ P(A) = \frac{n(A)}{n(S)} = \frac{18}{36}, \quad P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{9}{36} \]
\[ P(B) = \frac{n(B)}{n(S)} = \frac{18}{36}, \quad P(A \cap C) = \frac{n(A \cap C)}{n(S)} = \frac{9}{36} \]
\[ P(C) = \frac{n(C)}{n(S)} = \frac{9}{36}, \quad P(B \cap C) = \frac{n(B \cap C)}{n(S)} = \frac{0}{36} \]
\[ P(A \cap B) = \frac{9}{36} = \frac{18}{36} \times \frac{18}{36} = P(A) \times P(B) \]
\[ P(A \cap C) = \frac{9}{36} \neq \frac{18}{36} \times \frac{9}{36} = P(A) \times P(C) \]
\[ P(B \cap C) = \frac{0}{36} \neq \frac{18}{36} \times \frac{9}{36} = P(B) \times P(C) \]
Therefore, based on the above results, A and B are statistically independent events. However,
A and C are not statistically independent, and neither are B and C.
If A and B are independent, then the following hold true.
(i) P(A/B) = P(A)
(ii) P(B/A) = P(B)
(iii) A^c and B^c are independent.
(iv) A^c and B, and B^c and A, are independent.
5.9. Exercises
1. Provide definitions for each of these terms:
(a) Random experiment
(b) Events
(c) Mutually exclusive events
(d) Equally likely events
(e) Independent events
2. A package contains 12 resistors, 3 of which are defective. If 3 are selected, find the
probability of getting
(a) no defective resistor.
(b) one defective resistor.
(c) all defective resistors.
3. Let A and B be two events associated with an experiment and suppose that P (A) = 0.4
while P (A ∪ B) = 0.7. Let P (B) = p. For what choice of p
(a) A and B are mutually exclusive?
(b) A and B are independent?
4. The personnel department of a company has records which show the following analysis
of its 200 accountants.
Age         Bachelor’s   Master’s
Under 30    90           10
30 to 40    20           30
Over 40     40           10
If one accountant is selected at random from the company, find the probability that
(a) he has only a bachelor’s degree.
(b) he has a master’s degree given that he is over 40.
(c) he is under 30 given that he has a bachelor’s degree.
6
Probability Distributions
Consider the following illustrations.
Example: Consider the experiment, ξ, of tossing a coin twice.
S = {HH, HT, T H, T T }
Let X be number of heads. Thus, another sample space with respect to X (also called the
range space of X) is
Rx = {0, 1, 2}
Definition: A function X which assigns a real number to each outcome in a sample
space is called a random variable. A random variable is a variable that has a single numerical
value (determined by chance) for each outcome of a procedure.
6.1. Type of Random Variables
A random variable can be classified as being either discrete or continuous depending on the
numerical values it assumes.
A discrete random variable has either a finite number of values or a countable number of
values; that is, its values result from a counting process. The possible values of X may be
x1, x2, ..., xn, ....
For any discrete random variable X the following will be true.
(i) 0 ≤ p(xi) ≤ 1
(ii) $\sum_{i=1}^{n} p(x_i) = 1$ for a finite number of values and $\sum_{i=1}^{\infty} p(x_i) = 1$ for a countably infinite number of values.
p(xi) is called the probability function, point probability function or mass function. The
collection of pairs (xi, p(xi)) is called the probability distribution. It gives the probability for each
value or range of values of the random variable.
Example 1: Construct a probability distribution for the number of heads in an experiment of
tossing a coin two times.
Solution:
S = {HH, HT, TH, TT}. Let X be the number of heads obtained in tossing a coin two
times. Then Rx = {0, 1, 2}.
\[ P(X = 0) = P(TT) = \frac{1}{4}, \quad P(X = 1) = P(HT, TH) = \frac{2}{4}, \quad P(X = 2) = P(HH) = \frac{1}{4} \]
Hence the probability distribution for X is given by:
X            0     1     2
P(X = xi)    1/4   2/4   1/4
Example 2: The probability distribution of a discrete random variable Y is given by
\[ P(Y = y) = c y^2, \quad y = 0, 1, 2, 3, 4 \]
Then find the value of c.
Solution:
First we compute the point probabilities, and then we use the fact that they sum to 1 to
find the value of c.
P(Y = 0) = 0, P(Y = 1) = c, P(Y = 2) = 4c, P(Y = 3) = 9c, P(Y = 4) = 16c
\[ \sum_{i=0}^{4} P(Y = y_i) = 1 \]
\[ 0 + c + 4c + 9c + 16c = 30c = 1 \]
\[ c = \frac{1}{30} \]
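The normalising constant can be found mechanically from the requirement that the point probabilities sum to 1. The sketch below uses exact fractions; the names support and pmf are chosen here for illustration.

```python
from fractions import Fraction

support = range(0, 5)                       # y = 0, 1, 2, 3, 4

# P(Y = y) = c * y**2 must sum to 1, so c = 1 / sum(y**2).
c = Fraction(1, sum(y**2 for y in support))
pmf = {y: c * y**2 for y in support}

# Both conditions of a probability function hold.
assert all(0 <= p <= 1 for p in pmf.values())
assert sum(pmf.values()) == 1

print(c, pmf)
```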
A continuous random variable has infinitely many values, and those values can be
associated with measurements on a continuous scale in such a way that there are no gaps or
interruptions. That is, X is continuous if it assumes all possible values in an interval (a, b),
where a, b ∈ ℝ, and there exists a function f(x), called the probability density function (pdf),
satisfying the following conditions.
(i) f(x) ≥ 0, ∀x
(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
For any two real numbers a and b such that −∞ < a < b < ∞,
\[ P(a < X < b) = \int_{a}^{b} f(x)\,dx \]
If X is a continuous random variable, then:
• $P(X = a) = P(a \le X \le a) = \int_{a}^{a} f(x)\,dx = 0$
• $P(a < X < b) = P(a \le X < b) = P(a < X \le b) = P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$
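Example: Let X be a continuous random variable and its pdf is given by:
\[ f(x) = \begin{cases} 2x, & \text{for } 0 < x < 1 \\ 0, & \text{otherwise} \end{cases} \]
(a) Verify whether f(x) is a pdf or not.
• f(x) ≥ 0 for all x, since 2x > 0 for 0 < x < 1 and f(x) = 0 otherwise.
• $\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{1} 2x\,dx = \left[x^2\right]_{0}^{1} = 1$
Hence f(x) is a pdf.
(b) Find P(0.5 < X < 0.75).
\[ P(0.5 < X < 0.75) = \int_{0.5}^{0.75} 2x\,dx = 2\left[\frac{x^2}{2}\right]_{0.5}^{0.75} = 0.5625 - 0.25 = 0.3125 \]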
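The two integrals in this example can be checked numerically. The sketch below uses a simple midpoint rule written out by hand rather than a library integrator; the function names pdf and integrate and the number of steps are assumptions of this illustration.

```python
def pdf(x):
    """f(x) = 2x on (0, 1) and 0 elsewhere."""
    return 2 * x if 0 < x < 1 else 0.0

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

total_area = integrate(pdf, 0, 1)            # should be close to 1
p_interval = integrate(pdf, 0.5, 0.75)       # P(0.5 < X < 0.75)

print(round(total_area, 6), round(p_interval, 6))
```

For a linear integrand such as 2x the midpoint rule is exact up to rounding error, so the numerical values agree closely with the hand calculation.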
6.2. Introduction to Expectation of Random Variables
6.2.1. Expectation of Random Variables and Its Properties
Definition: If X is a discrete random variable with possible values x1, x2, ... having
probabilities p(x1), p(x2), ..., then the mean value of X, denoted by E(X) or µ, is
defined as:
\[ E(X) = \mu = \sum_{i=1}^{\infty} x_i\, p(x_i) \]
if the series converges. When X has only finitely many values x1, x2, ..., xn, the sum runs
from i = 1 to n.
Definition: If X is a continuous random variable with pdf f(x), its mean is given by
\[ E(X) = \mu = \int_{-\infty}^{\infty} x f(x)\,dx \]
Properties of Expectation
Assume that the expected value of a random variable exists.
1. For any constant “a” we have
E(aX) = aE(X) = aµ
2. If X = a, then E(X) = a.
6.2.2. Variance of Random Variables and Its Properties
Definition: Let X be a random variable. Then the variance of X, denoted by var(X) or
$\sigma_x^2$, is defined as
\[ \mathrm{var}(X) = \sigma_x^2 = E[X - E(X)]^2 = E[X - \mu]^2 = E(X^2) - \mu^2 \]
Thus, the standard deviation of X is given by $\sigma_x = \sqrt{\sigma_x^2}$.
Properties of Variance
1. If “a” is constant, then
B var(X + a) = var(X)
B var(aX) = a2 var(X)
Example 1: A coin is tossed two times. Let X be the number of heads. Find the mean
value and the standard deviation of X.
Solution:
We already constructed a probability distribution for the number of heads in a previous example.
X            0     1     2
P(X = xi)    1/4   2/4   1/4
\[ E(X) = \mu = \sum_{i=0}^{2} x_i\, p(x_i) = 0 \times \frac{1}{4} + 1 \times \frac{2}{4} + 2 \times \frac{1}{4} = 1 \]
To get the standard deviation of X, first we compute $\sigma_x^2$.
\[ E(X^2) = 0^2 \times \frac{1}{4} + 1^2 \times \frac{2}{4} + 2^2 \times \frac{1}{4} = \frac{6}{4} = 1.5 \]
\[ \sigma_x^2 = E[X - E(X)]^2 = E[X - \mu]^2 = E(X^2) - \mu^2 = 1.5 - 1^2 = 0.5 \]
\[ \sigma_x = \sqrt{\sigma_x^2} = \sqrt{0.5} = 0.707 \]
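The mean, variance and standard deviation of a discrete random variable can be computed directly from its probability distribution. This is a minimal sketch for the number of heads in two tosses; the dictionary dist is simply the table above written in Python.

```python
from fractions import Fraction
from math import sqrt

# Probability distribution of X = number of heads in two tosses.
dist = {0: Fraction(1, 4), 1: Fraction(2, 4), 2: Fraction(1, 4)}

mean = sum(x * p for x, p in dist.items())               # E(X)
second_moment = sum(x**2 * p for x, p in dist.items())   # E(X^2)
variance = second_moment - mean**2                       # E(X^2) - mu^2
sd = sqrt(variance)

print(mean, second_moment, variance, round(sd, 3))
```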
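Example 2: Suppose that X is a continuous random variable with pdf
\[ f(x) = \begin{cases} 1 + x, & -1 \le x < 0 \\ 1 - x, & 0 \le x \le 1 \end{cases} \]
then find the mean value and variance of X.
Solution:
\[ E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{-1}^{0} x(1 + x)\,dx + \int_{0}^{1} x(1 - x)\,dx \]
\[ E(X) = \left[\frac{x^2}{2} + \frac{x^3}{3}\right]_{-1}^{0} + \left[\frac{x^2}{2} - \frac{x^3}{3}\right]_{0}^{1} = -\frac{1}{6} + \frac{1}{6} = 0 \]
\[ E(X^2) = \int_{-1}^{0} x^2(1 + x)\,dx + \int_{0}^{1} x^2(1 - x)\,dx \]
\[ E(X^2) = \left[\frac{x^3}{3} + \frac{x^4}{4}\right]_{-1}^{0} + \left[\frac{x^3}{3} - \frac{x^4}{4}\right]_{0}^{1} = \frac{1}{12} + \frac{1}{12} = \frac{1}{6} = 0.167 \]
\[ \sigma_x^2 = E[X - E(X)]^2 = E[X - \mu]^2 = E(X^2) - \mu^2 = 0.167 - 0^2 = 0.167 \]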
6.3. Common Discrete Probability Distributions
6.3.1. Binomial Probability Distribution
The binomial probability distribution is a discrete probability distribution that provides many
applications. It is associated with a multiple-step experiment that we call the binomial
experiment. A binomial experiment exhibits the following four properties.
1. The procedure has a fixed number of trials.
2. The trials are independent. (The outcome of any individual trial does not affect the
probabilities in the other trials.)
3. The outcome of each trial must be classifiable into one of two possible categories (success
or failure).
4. The probability of a success, denoted by p, does not change from trial to trial.
If a procedure satisfies these four requirements, the distribution of the random variable (X) is
called a binomial probability distribution (or binomial distribution). To calculate probabilities
we use the following formula.
\[ P(X = x) = \binom{n}{x} p^x q^{n-x} \quad \text{for } x = 0, 1, 2, ..., n \]
where
x = the number of successes
p = the probability of a success on one trial
q = the probability of failure on one trial (q = 1 − p)
n = the number of trials
p(x) = the probability of x successes in n trials.
The expected value and variance of a binomially distributed random variable [X ∼ Bin(n, p)] can
be obtained using the following.
\[ E(X) = \mu = np \]
\[ \mathrm{var}(X) = \sigma^2 = np(1 - p) = npq \]
\[ SD(X) = \sigma = \sqrt{np(1 - p)} = \sqrt{npq} \]
Example: A university found that 20% of its students withdraw without completing the
introductory statistics course. Assume that 20 students registered for the course. Compute
the probability that
(a) exactly four will withdraw.
Let X be the number of students who will withdraw without completing the introductory
statistics course. From the given problem p = 0.2 = 20%, n = 20 and X ∼ Bin(20, 0.2).
\[ P(X = 4) = \binom{20}{4} 0.2^4\, 0.8^{16} = \frac{20!}{4!(20 - 4)!} 0.2^4\, 0.8^{16} = 0.2182 \]
(b) at most two will withdraw.
\[ P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = \sum_{x=0}^{2} P(X = x) \]
\[ = \binom{20}{0} 0.2^0\, 0.8^{20} + \binom{20}{1} 0.2^1\, 0.8^{19} + \binom{20}{2} 0.2^2\, 0.8^{18} \]
\[ = \frac{20!}{0!(20 - 0)!} 0.2^0\, 0.8^{20} + \frac{20!}{1!(20 - 1)!} 0.2^1\, 0.8^{19} + \frac{20!}{2!(20 - 2)!} 0.2^2\, 0.8^{18} \]
\[ = 0.2061 \]
(c) more than three will withdraw.
\[ P(X > 3) = P(X = 4) + P(X = 5) + \cdots + P(X = 20) = \sum_{x=4}^{20} P(X = x) \]
\[ = 1 - P(X \le 3) = 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)] \]
\[ = 1 - \left[0.20608 + \frac{20!}{3!(20 - 3)!} 0.2^3\, 0.8^{17}\right] \]
\[ = 1 - (0.20608 + 0.20536) = 1 - 0.41144 \]
\[ = 0.5886 \]
(d) the expected value and standard deviation of withdrawals.
\[ E(X) = np = 20 \times 0.2 = 4 \]
\[ \mathrm{var}(X) = \sigma^2 = np(1 - p) = npq = 20 \times 0.2 \times 0.8 = 3.2 \]
\[ SD(X) = \sqrt{np(1 - p)} = \sqrt{npq} = \sqrt{20 \times 0.2 \times 0.8} = 1.789 \]
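The binomial probabilities in this example can be recomputed from the formula. The following is a minimal sketch assuming Python 3.8 or later; binom_pmf is a helper name introduced here, not a library function.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.2

p_exactly_4 = binom_pmf(4, n, p)
p_at_most_2 = sum(binom_pmf(x, n, p) for x in range(0, 3))
# "More than three" is the complement of "at most three".
p_more_than_3 = 1 - sum(binom_pmf(x, n, p) for x in range(0, 4))

mean = n * p
sd = sqrt(n * p * (1 - p))

print(round(p_exactly_4, 4), round(p_at_most_2, 4), round(p_more_than_3, 4))
print(mean, round(sd, 3))
```

Working through the complement avoids summing the seventeen terms from x = 4 to x = 20.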
6.3.2. Poisson Distribution
In this section we consider a discrete random variable that is often useful in estimating the
number of occurrences over a specified interval of time or space. For example, the random
variable of interest might be the number of arrivals at a car wash in one hour, the number
of repairs needed in 10 miles of highway, or the number of leaks in 100 miles of pipeline. If
the following two properties are satisfied, the number of occurrences is a random variable
described by the Poisson probability distribution.
Properties of a Poisson Experiment
1. The probability of an occurrence is the same for any two intervals of equal length.
2. The occurrence or nonoccurrence in any interval is independent of the occurrence or
nonoccurrence in any other interval.
The Poisson probability function is defined by the following equation.
\[ P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!} \]
where
p(x) = the probability of x occurrences in an interval
λ = expected value or mean number of occurrences in an interval.
For the Poisson probability distribution, X is a discrete random variable indicating the num-
ber of occurrences in the interval. Since there is no stated upper limit for the number of
occurrences, the probability function p(x) is applicable for values x = 0, 1, 2, ... without limit.
In practical applications, x will eventually become large enough so that p(x) is approximately
zero and the probability of any larger values of x becomes negligible.
A property of the Poisson distribution is that the mean and variance are equal. That is,
E(X) = var(X) = λ
Example: A student finds that the average number of amoeba in 10 ml of pond water is 4.
Find the probability that in 10 ml of water from that pond there are
(a) exactly 5 amoeba.
Let X be the number of amoeba found in 10 ml of pond water. From the given question
λ = 4, which implies that X ∼ Poisson(4).
\[ P(X = 5) = \frac{e^{-4}\, 4^5}{5!} = 0.156 \]
(b) no amoeba.
\[ P(X = 0) = \frac{e^{-4}\, 4^0}{0!} = e^{-4} = 0.0183 \]
(c) fewer than 3 amoeba.
\[ P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) = \sum_{x=0}^{2} P(X = x) \]
\[ = \frac{e^{-4}\, 4^0}{0!} + \frac{e^{-4}\, 4^1}{1!} + \frac{e^{-4}\, 4^2}{2!} \]
\[ = e^{-4} + 4e^{-4} + 8e^{-4} = 13e^{-4} \]
\[ = 0.238 \]
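The Poisson probabilities for the pond-water example can be evaluated directly from the probability function. This is a small sketch; poisson_pmf is a helper name introduced for this illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 4  # mean number of amoeba per 10 ml

p_exactly_5 = poisson_pmf(5, lam)
p_none = poisson_pmf(0, lam)
p_fewer_than_3 = sum(poisson_pmf(x, lam) for x in range(0, 3))

print(round(p_exactly_5, 3), round(p_none, 4), round(p_fewer_than_3, 3))
```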
Exercise 1: If X ∼ P oisson(λ) with standard deviation of 2, then find P (X = 3).
Exercise 2: If X ∼ P oisson(λ) and P (X = 1) = P (X = 3), then find λ.
6.4. Common Continuous Distributions
6.4.1. Normal Distribution
The most important probability distribution for describing a continuous random variable is
the normal probability distribution. The normal distribution has been used in a wide variety
of practical applications in which the random variables are heights and weights of people,
test scores, scientific measurements, amounts of rainfall, and other similar values. It is also
widely used in statistical inference. In such applications, the normal distribution provides a
description of the likely results obtained through sampling.
Normal Curve
The form or shape of the normal distribution is illustrated by the bell-shaped normal curve
in the following figure. The probability density function (pdf) that defines the bell-shaped
curve of the normal distribution follows.
If a random variable X ∼ N(µ, σ²), its probability density function (pdf) is given by:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty \]
where µ = mean, σ = standard deviation.
The normal curve has two parameters, µ and σ. They determine the location and shape of
the normal distribution.
Properties of Normal Distribution
1. The entire family of normal distributions is differentiated by two parameters: the mean
µ and the standard deviation σ.
2. The highest point on the normal curve is at the mean, which is also the median and
mode of the distribution.
3. The mean of the distribution can be any numerical value: negative, zero, or positive.
Three normal distributions with the same standard deviation but three different means
(-10, 0, and 20) are shown here.
4. The normal distribution is symmetric, with the shape of the normal curve to the left
of the mean a mirror image of the shape of the normal curve to the right of the mean.
The tails of the normal curve extend to infinity in both directions and theoretically
never touch the horizontal axis. Because it is symmetric, the normal distribution is not
skewed; its skewness measure is zero.
5. The standard deviation determines how flat and wide the normal curve is. Larger
values of the standard deviation result in wider, flatter curves showing more variability
in the data. Two normal distributions with the same mean but with different standard
deviations are shown here.
6. Probabilities for the normal random variable are given by areas under the normal curve.
The total area under the curve for the normal distribution is 1. Because the distribution
is symmetric, the area under the curve to the left of the mean is 0.50 and the area under
the curve to the right of the mean is 0.50.
7. The percentage of values in some commonly used intervals are
a) 68.3% of the values of a normal random variable are within plus or minus one
standard deviation of its mean.
b) 95.4% of the values of a normal random variable are within plus or minus two
standard deviations of its mean.
c) 99.7% of the values of a normal random variable are within plus or minus three
standard deviations of its mean.
Standard Normal Probability Distribution
A random variable that has a normal distribution with a mean of zero and a standard deviation
of one is said to have a standard normal probability distribution. The letter z is commonly
used to designate this particular normal random variable, that is z ∼ N (0, 1). The reason for
discussing the standard normal distribution so extensively is that probabilities for all normal
distributions are computed by using the standard normal distribution. That is, when we have
a normal distribution with any mean µ and any standard deviation σ, we answer probability
questions about the distribution by first converting to the standard normal distribution. Then
we can use the standard normal probability table and the appropriate z values to find the
desired probabilities. Thus, we can convert using the following formula.
\[ z = \frac{x - \mu}{\sigma} \]
Consequently, the standard normal density is given by:
\[ f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2 / 2}, \quad -\infty < z < \infty \]
which is graphically shown below.
Example 1: Given that z is a standard normal random variable, compute the following
probabilities.
a) P (0 ≤ z ≤ 2.5) = 0.4938
b) P (z ≥ 1) = P (z > 0) − P (0 < z < 1) = 0.5 − 0.3413 = 0.1587
c) P (z ≤ 1) = P (z < 0) + P (0 < z < 1) = 0.5 + 0.3413 = 0.8413
d) P (1 ≤ z ≤ 1.5) = P (0 < z ≤ 1.5) − P (0 < z ≤ 1) = 0.4332 − 0.3413 = 0.0919
e) P(−1 < z < 1.5)
P(−1 < z < 1.5) = P(−1 < z < 0) + P(0 < z < 1.5)
= P(0 < z < 1) + P(0 < z < 1.5)
= 0.3413 + 0.4332
= 0.7745
Example 2: The college boards, which are administered each year to many thousands of
high school students, are scored so as to yield a mean of 500 and a standard deviation of 100.
These scores are close to being normally distributed. What percentage of the scores can be
expected to satisfy each condition?
a) Greater than 600.
Let X be the score of students with mean µ = 500, σ = 100 and X ∼ N(500, 100²).
\[ P(X > 600) = P\left(\frac{X - \mu}{\sigma} > \frac{600 - \mu}{\sigma}\right) = P\left(z > \frac{600 - 500}{100}\right) = P(z > 1) = 0.1587 \]
b) Less than 450.
\[ P(X < 450) = P\left(\frac{X - \mu}{\sigma} < \frac{450 - \mu}{\sigma}\right) = P\left(z < \frac{450 - 500}{100}\right) = P(z < -0.5) \]
\[ = P(z < 0) - P(-0.5 < z < 0) = P(z < 0) - P(0 < z < 0.5) = 0.5 - 0.1915 = 0.3085 \]
c) Between 450 and 600.
\[ P(450 < X < 600) = P\left(\frac{450 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{600 - \mu}{\sigma}\right) = P\left(\frac{450 - 500}{100} < z < \frac{600 - 500}{100}\right) \]
\[ = P(-0.5 < z < 1) = P(-0.5 < z < 0) + P(0 < z < 1) = P(0 < z < 0.5) + P(0 < z < 1) \]
\[ = 0.1915 + 0.3413 = 0.5328 \]
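The table look-ups in this example can be cross-checked numerically. The sketch below is one possible approach, not the method of these notes: it writes the normal cumulative distribution function in terms of the error function math.erf, and normal_cdf is a helper name introduced here.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    z = (x - mu) / sigma                 # standardise first
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 500, 100

p_gt_600 = 1 - normal_cdf(600, mu, sigma)
p_lt_450 = normal_cdf(450, mu, sigma)
p_between = normal_cdf(600, mu, sigma) - normal_cdf(450, mu, sigma)

print(round(p_gt_600, 4), round(p_lt_450, 4), round(p_between, 4))
```

Small differences from table values in the last digit can occur, because tables are rounded to four decimal places.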
6.5. Exercises
1. State the conditions (assumptions) under which random variable can have a binomial
distribution.
2. The probability that a freshman entering Haramaya University will survive the first semester
is 0.92. Assuming this pattern remains unchanged over the subsequent years, what is
the probability that among 100 randomly selected freshmen in the first semester,
a) None will survive?
b) Exactly 97 will survive?
c) At least three will survive?
3. A normal distribution has mean µ = 62.4. Find its standard deviation if 20% of the
area under the curve lies to the right of 79.2.
4. Show that 65.24% of the observations in a normally distributed population lie between
µ − 1.1σ and µ + 0.8σ.
5. If a set of marks on a statistics examination is approximately normally distributed
with a mean of 74 and a standard deviation of 7.9, then find
(a) the lowest mark if the lowest 10% of the students are given F’s.
(b) the lowest mark to get grade A if the top 5% of the students are given A’s.
(c) the lowest mark to get grade B if the top 10% of the students are given A’s and
the next 25% are given B’s.
7
One Sample Statistical Inference
One of the principal objectives of statistical analysis is to draw inference about the population
on the basis of data collected by sampling from population. In other words, one is required to
draw inference (or to generalize) about the population from the sample data. The inference to
be drawn relates to some parameters of the population, such as the mean, standard deviation
or some other feature like the proportion of an attribute occurring in the population. The
two most important types of problems of inference in statistics are:
• Estimation of a parameter or parameters, and
• Testing of a statistical hypothesis or hypotheses
In the absence of the complete data or information about the population, it would not be
possible to determine the exact or true value of a parameter. It would be worthwhile to obtain
from the sample data an estimate of the unknown true or exact value of the parameter or an
interval of values in which the parameter lies, and also determine a procedure for determining
the accuracy of the estimate. This type of inference is known as estimation of parameters.
There are two types of estimation.
1. Point estimation: here the objective is to find a single value for the unknown param-
eter.
2. Interval estimation: here the objective is to find an interval or range of plausible

values in which the unknown parameters lies.
7.1. Point Estimation
Suppose that we wish to estimate a parameter of a population from a
given sample of that population. The procedure of point estimation consists of determining a
single quantity from the sample values such that this single number is fairly close to the
unknown value of the parameter of the population. Suppose that the sample (of size n) drawn
from the population is denoted by $x_1, x_2, \ldots, x_n$, and that the unknown parameter is denoted
by θ. The point estimation of θ is based on a function of the sample observations
$x_1, x_2, \ldots, x_n$, that is, a statistic. The statistic
used for point estimation of θ is called a point estimator and is denoted by θ̂. When
an actual set of sample values is given, we can compute a numerical value, which is called
a point estimate of θ. Because the estimator θ̂ is a function of the sample
observations, it assumes different values for different sets
of sample observations; the point estimate obtained from a given sample
is one of the possible values of θ̂.
The point estimator of µ can assume infinitely many values, corresponding to the infinitely many
sets of numerical values that $x_1, x_2, \ldots, x_n$ can take. From one given set of sample
values, one can compute one particular value of the
estimator µ̂, and this value is a point estimate of µ. Besides the sample mean
$$
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n},
$$
there may be other estimators of µ, based on some other function of the same set of
sample observations $x_1, x_2, \ldots, x_n$; in fact the sample median is also an estimator of µ. The
question then arises: which of the estimators is to be preferred, and why? This raises
another question: on what basis should an estimator be selected, or in other words, what
are the criteria of a good estimator? Without going into details, we state
below four desirable properties that an estimator should possess.
1. Unbiasedness: an estimator θ̂ of θ is said to be unbiased if E(θ̂) = θ, i.e. its mean equals
the parameter value; otherwise it is said to be biased. Unbiasedness
ensures that the estimator does not systematically overestimate or underestimate the parameter.
2. Minimum Variance (Efficiency): an estimator θ̂ is a function of the sample observations
$x_1, x_2, \ldots, x_n$, so its value varies from sample to sample. An estimator with a smaller
variance is more concentrated near the parameter to be estimated. It is therefore appropriate
to select the estimator with the smallest variance, which gives greater precision.
3. Consistency: It refers to the effect of sample size on the accuracy of the estimator. A
statistic is said to be a consistent estimator of the population parameter if it approaches
the parameter as the sample size increases, i.e. θ̂ → θ as n → ∞ (for a finite population, as n
approaches the population size N).
4. Sufficiency: An estimator is said to be sufficient if it uses all the information about the
population parameter contained in the sample. For example, the sample mean uses all
the sample values in its computation while the median and mode do not. Hence, the mean
is a better estimator in this sense.
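The first two properties can be illustrated by simulation. The sketch below is only an illustration under assumed conditions: a normal population with hypothetical values µ = 50 and σ = 10, samples of size 25, and an arbitrary random seed. It compares the sample mean and the sample median as estimators of µ.

```python
import random
import statistics

random.seed(2015)  # arbitrary seed, used only to make the run reproducible

# Hypothetical population and design values (not taken from the text).
MU, SIGMA, N, REPS = 50.0, 10.0, 25, 2000

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Unbiasedness: the average of the estimates should be close to MU.
avg_of_means = statistics.fmean(means)

# Efficiency: the estimator with the smaller sampling variance is preferred.
var_of_means = statistics.variance(means)
var_of_medians = statistics.variance(medians)

print(avg_of_means, var_of_means, var_of_medians)
```

Under these assumptions both estimators centre on µ, but the sample mean varies less from sample to sample than the sample median, which is why it is regarded as the more efficient estimator for normal data.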
7.2. Interval Estimation
In the previous section, we stated that a point estimator is a sample statistic used to estimate a
population parameter. For instance, the sample mean x̄ is a point estimator of the population
mean µ and the sample proportion p̂ is a point estimator of the population proportion p.
Because a point estimator cannot be expected to provide the exact value of the population
parameter, an interval estimate is often computed by adding and subtracting a value, called
the margin of error, to the point estimate. The general form of an interval estimate is as
follows:
point estimate ± margin of error
Thus, a confidence interval (or interval estimate) is a range (or an interval) of values that is
likely to contain the true value of the population parameter. A confidence interval is associated
with a degree of confidence, which is a measure of how certain we are that the interval contains
the population parameter.
The degree of confidence is the probability 1 − α (often expressed as the equivalent percentage
value) that the confidence interval contains the true value of population parameter. The
degree of confidence is also called the level of confidence or the confidence coefficient.
The purpose of an interval estimate is to provide information about how close the point
estimate, provided by the sample, is to the value of the population parameter. Thus, the
general form of an interval estimate of a population mean is
x̄ ± margin of error
7.2.1. Interval Estimation for the Population Mean
1. Consider when σ is known
$$
\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}
$$
where 1 − α is the confidence coefficient and $z_{\alpha/2}$ is the z value providing an area of
α/2 in the upper tail of the standard normal probability distribution.
2. Consider when σ is unknown
When developing an interval estimate of a population mean, we usually do not know
the population standard deviation either. In this case, σ has
to be estimated from the sample values. When s is used to estimate σ, the margin of error and
the interval estimate for the population mean are based on a probability distribution
known as the t distribution. Thus, the interval estimate of µ becomes
$$
\bar{x} \pm t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}}
$$
where s is the sample standard deviation, 1 − α is the confidence coefficient, and $t_{\alpha/2}(n-1)$
is the t value providing an area of α/2 in the upper tail of the t distribution with n − 1
degrees of freedom. The number of degrees of freedom is n − 1 because s is used as an
estimate of the population standard deviation σ. The expression for the sample standard deviation is
$$
s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}
$$
7.2.2. Interpretation of the Confidence Interval
1. If all possible samples of size n were drawn, then on average 100(1 − α)% of these
samples would produce intervals, bounded by L and U around their sample
means, that include the population mean.
2. If we take a random sample of size n from a given population, the probability is 1 − α
that the interval bounded by L and U around the sample mean will contain the population
mean.
3. If a random sample of size n is taken from the population, we can be 100(1 − α)%
confident in our assertion that the population mean lies
in the interval bounded by the values L and U around the sample mean.
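The first interpretation can be checked by simulation. The following sketch assumes a normal population with hypothetical values µ = 24 and σ = 5 (σ treated as known), samples of size 30, and an arbitrary seed; it counts how often the 95% interval built around each sample mean contains µ.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(7)  # arbitrary seed for reproducibility

# Hypothetical population and design values (assumptions for illustration).
MU, SIGMA, N, REPS = 24.0, 5.0, 30, 2000
ALPHA = 0.05

z = NormalDist().inv_cdf(1 - ALPHA / 2)   # upper alpha/2 point, about 1.96
margin = z * SIGMA / math.sqrt(N)         # margin of error when sigma is known

covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = fmean(sample)
    # L = xbar - margin and U = xbar + margin are the interval limits.
    if xbar - margin <= MU <= xbar + margin:
        covered += 1

coverage = covered / REPS
print(coverage)
```

The printed proportion should be close to 0.95: the confidence level describes the long-run behaviour of the procedure over repeated samples, not a property of any single interval.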
Example 1: Haramaya University wishes to estimate the average age of students who gradu-
ate with B.Sc. degree. A random sample of 625 graduating students showed that the average
age was 24 with a standard deviation of 5 years. Construct the 95% confidence interval for
the true average age of all such graduating students at the University and interpret it.
Solution:
Let µ be the true average age of all students graduating with a B.Sc. degree from the University.
From the sample data we have n = 625, x̄ = 24 and s = 5. Because the sample is large, s can be
used in place of σ together with the z value.
$$
100(1-\alpha)\% = 95\% \;\Rightarrow\; \alpha = 0.05,\quad \alpha/2 = 0.025,\quad z_{\alpha/2} = z_{0.025} = 1.96
$$
Thus the 100(1 − α)% confidence interval for µ is
$$
\left[\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right] = \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}
$$
$$
\left[24 - 1.96 \times \frac{5}{\sqrt{625}},\; 24 + 1.96 \times \frac{5}{\sqrt{625}}\right] = [24 - 0.392,\; 24 + 0.392] = [23.608,\; 24.392]
$$
Interpretation:
We are 95% confident that the true average age of all students graduating with a B.Sc. degree
from the University lies between 23.608 and 24.392 years.
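The arithmetic of Example 1 can be reproduced with a few lines of Python. This is a minimal sketch using only the figures given in the example (n = 625, x̄ = 24, s = 5); the z value is obtained from the standard library rather than from a table.

```python
import math
from statistics import NormalDist

n, xbar, s = 625, 24.0, 5.0   # sample size, sample mean, sample standard deviation
conf = 0.95                   # confidence level 1 - alpha

# z value leaving an area of alpha/2 in the upper tail (about 1.96).
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)

margin = z * s / math.sqrt(n)             # margin of error
lower, upper = xbar - margin, xbar + margin

print(f"[{lower:.3f}, {upper:.3f}]")      # [23.608, 24.392]
```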
Example 2: An airline wants to evaluate the depth perception of its pilots over the age of
50. A random sample of 14 airline pilots over the age of 50 are asked to judge the distance
between two markers placed 20 feet apart at the opposite end of the laboratory. The sample
data listed here are the pilots’ error (recorded in feet) in judging the distance.
2.7 2.4 1.9 2.6 2.4 1.9 2.3
2.2 2.5 2.3 1.8 2.5 2.0 2.2
Use the sample data to place a 95% confidence interval on µ, the average error in depth
perception for the company’s pilots over the age of 50.
Solution:
Since the sample size is small, it is appropriate to construct the confidence interval based on
the t distribution. Let y denote a pilot's error in depth perception and µ the average error for
the company's pilots over the age of 50. We can verify that ȳ = 2.26 and s = 0.28.
Referring to the t table, the t value corresponding to α/2 = 0.025 and n − 1 = 13 degrees of
freedom is 2.160. Hence, the 95% confidence interval for µ is
$$
\left[\bar{y} - t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}},\; \bar{y} + t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}}\right] = \bar{y} \pm t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}}
$$
$$
\left[2.26 - 2.16 \times \frac{0.28}{\sqrt{14}},\; 2.26 + 2.16 \times \frac{0.28}{\sqrt{14}}\right] = [2.10,\; 2.42]
$$
Interpretation:
Therefore, we are 95% confident that the average error in the pilots' judgment of the distance
is between 2.10 and 2.42 feet.
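The summary statistics and the interval in Example 2 can be verified from the raw data. In the sketch below the critical value 2.160 is copied from the t table as quoted in the solution, because Python's standard library has no t distribution; everything else is computed from the 14 observations.

```python
import math
import statistics

errors = [2.7, 2.4, 1.9, 2.6, 2.4, 1.9, 2.3,
          2.2, 2.5, 2.3, 1.8, 2.5, 2.0, 2.2]

n = len(errors)
ybar = statistics.mean(errors)   # sample mean
s = statistics.stdev(errors)     # sample standard deviation (divisor n - 1)

T_CRIT = 2.160                   # t table value for 13 df, upper-tail area 0.025

margin = T_CRIT * s / math.sqrt(n)
lower, upper = ybar - margin, ybar + margin

print(round(ybar, 2), round(s, 2))   # 2.26 0.28
print(lower, upper)
```

Because the code carries the unrounded ȳ and s through the calculation, the upper limit comes out near 2.425 and may round slightly differently from the hand calculation, which uses the rounded values 2.26 and 0.28.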
7.3. Hypothesis Testing
7.3.1. Introduction
A statistical hypothesis is a conjecture (an assumption) about a population parameter which
may or may not be true. Hypothesis testing is a statistical procedure that leads to
a decision on whether such an assumption about the population parameter is tenable, by
using data obtained from a sample. In hypothesis testing, the researcher must define the
population under study, state the particular hypothesis to be checked, give the significance
level, select a sample from the population, perform the calculations required for the statistical
test and reach a conclusion. As already stated, a statistical hypothesis may or may
not be true. For each situation, there are two types of statistical hypotheses.
1. Null Hypothesis ($H_0$): a statistical hypothesis that states there is no difference
between a parameter and a specific (hypothesized) value. The statement is not
rejected unless there is convincing sample evidence that it is false. It often represents the
status quo or an existing belief.
2. Alternative Hypothesis ($H_a$ or $H_1$): a statistical hypothesis that states there exists
a difference between a parameter and a specific (hypothesized) value. It is the
assertion of all situations not covered by the null hypothesis.
7.3.2. Errors in Hypothesis Testing
There are two types of error in hypothesis testing.
1. Type-I Error: the mistake made when one rejects a null hypothesis that is actually
true.
2. Type-II Error: the mistake made when one fails to reject a null hypothesis that
is actually false.
There are four possible outcomes of any hypothesis test, two of which are correct and two of
which are incorrect. The incorrect ones are the Type I and Type II errors.

                        State of Nature
Decision                $H_0$ True                   $H_0$ False
Do not reject $H_0$     Correct decision (1 − α)     Type II error (β)
Reject $H_0$            Type I error (α)             Correct decision (1 − β)
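The two error rates in the table can be estimated by simulation. The sketch below is an illustration under assumed conditions: a two-tailed z test of a hypothesized mean of 70 with known σ = 10, n = 25 and α = 0.05. These values, the alternative mean of 75 and the seed are all hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(11)  # arbitrary seed for reproducibility

# Hypothetical test set-up (assumptions for illustration only).
MU0, SIGMA, N, REPS, ALPHA = 70.0, 10.0, 25, 4000, 0.05
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)

def rejects(true_mu):
    """Draw one sample with mean true_mu and report whether H0: mu = MU0 is rejected."""
    sample = [random.gauss(true_mu, SIGMA) for _ in range(N)]
    z = (fmean(sample) - MU0) / (SIGMA / math.sqrt(N))
    return abs(z) > z_crit

# H0 true: the proportion of rejections estimates alpha (Type I error rate).
type1_rate = sum(rejects(MU0) for _ in range(REPS)) / REPS

# H0 false (true mean 75): the proportion of non-rejections estimates beta (Type II error rate).
type2_rate = sum(not rejects(MU0 + 5) for _ in range(REPS)) / REPS

print(type1_rate, type2_rate)
```

The first proportion should be close to the chosen α, while the second depends on how far the true mean is from the hypothesized value, on σ and on n.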
7.3.3. Hypothesis Testing About the Population Mean
1. State (formulate) the null and alternative hypotheses. The hypotheses may be any one of
the following.
- $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$ (two-tailed test)
- $H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$ (right-tailed test)
- $H_0: \mu = \mu_0$ versus $H_a: \mu < \mu_0$ (left-tailed test)
2. Choose the level of significance α, the probability of making a Type I error when $H_0$ is
true.
3. Calculate the appropriate test statistic. The following general formula for a test
statistic applies in many hypothesis tests.
$$
\text{test statistic} = \frac{\text{statistic} - \text{hypothesized parameter}}{\text{standard error of statistic}}
$$
- Use the t statistic if n is small: $t = \dfrac{\bar{x}-\mu_0}{s/\sqrt{n}} \sim t(n-1)$.
- Use the z statistic if n is large enough: $z = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$.
4. Obtain the tabulated (critical) value. For a two-tailed test the critical value is $z_{\alpha/2}$ ($t_{\alpha/2}$),
for a right-tailed test $z_{\alpha}$ ($t_{\alpha}$) and for a left-tailed test $-z_{\alpha}$ ($-t_{\alpha}$).
5. Define the critical (rejection) region. If the value of the test statistic falls in the critical
region (rejection region), reject the null hypothesis; otherwise do not reject it.
6. State the conclusion.
Examples
1. A professor wants to know if the introductory statistics class has a good grasp of basic
math. Six students are chosen at random from the class and given a math proficiency
test. The professor wants the class to be able to score at least 70 on the test. The six
students get scores of 62, 92, 75, 68, 83, and 95. Can the professor be certain that the
mean score for the class on the test would be at least 70 at the 0.05 level of significance?
Solution:
First, compute the sample mean and standard deviation.
$$
\bar{x} = \frac{\sum x_i}{n} = 79.17, \qquad s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = 13.17
$$
- Formulate the null and alternative hypotheses.
$$
H_0: \mu = 70 \qquad H_a: \mu > 70
$$
- Specify the level of significance (α = 0.05).
- Compute the appropriate test statistic. Since the sample size is small, t is appropriate
in this case.
$$
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{79.17 - 70}{13.17/\sqrt{6}} = 1.71
$$
- Obtain the tabulated value from the t table, which is $t_{\alpha}(n-1)$.
$$
t_{\alpha}(n-1) = t_{0.05}(6-1) = t_{0.05}(5) = 2.015
$$
- Define the rejection region. Reject $H_0$ if t > 2.015; otherwise do not reject
$H_0$. Since the computed t value 1.71 is not greater than the critical value 2.015, we
fail to reject the null hypothesis.
- Interpretation:
At the 5% level of significance, the sample does not provide sufficient evidence that the
class mean exceeds 70, so the professor cannot be certain that the mean score would be at least 70.
2. A merchant believes that the average age of customers who purchase a certain brand
of wear is 13 years. A random sample of 35 customers had an average age of
15.6 years. At the 1% level of significance, should this conjecture be rejected? The standard
deviation of the population is 1 year.
Solution:
Let x be the age of a customer who purchases this brand of wear. Given
µ0 = 13, n = 35, x̄ = 15.6 and σ = 1.
- Formulate the null and alternative hypotheses.
$$
H_0: \mu = 13 \qquad H_a: \mu \neq 13
$$
- Specify the level of significance (α = 0.01). This is a two-tailed test, so
divide the alpha level by two: α/2 = 0.005.
- Compute the appropriate test statistic. Since the sample size is large (n > 30), z is
appropriate in this case.
$$
z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{15.6 - 13}{1/\sqrt{35}} = \frac{2.6}{0.169} = 15.38
$$
- Obtain the tabulated value from the standard normal table, which is $z_{\alpha/2}$.
$$
z_{\alpha/2} = z_{0.005} = 2.575
$$
- Define the rejection region. Reject $H_0$ if |z| > $z_{\alpha/2}$; otherwise do not reject $H_0$.
Since the calculated z value 15.38 is much greater than the tabulated value 2.575, the
null hypothesis is rejected.
- Interpretation:
Therefore, the merchant's belief is rejected at the 1% level of significance; the data indicate
that the average age of customers differs from 13 years.
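Both worked examples can be reproduced in a short script. This is a sketch rather than a general-purpose routine: the t critical value 2.015 is copied from the t table quoted in Example 1 (the standard library has no t distribution), while the z critical value for Example 2 is computed directly.

```python
import math
import statistics
from statistics import NormalDist

# Example 1: right-tailed t test of H0: mu = 70 against Ha: mu > 70.
scores = [62, 92, 75, 68, 83, 95]
mu0 = 70
n = len(scores)
xbar = statistics.mean(scores)
s = statistics.stdev(scores)                 # divisor n - 1
t_stat = (xbar - mu0) / (s / math.sqrt(n))
T_CRIT = 2.015                               # t table value: 5 df, upper-tail area 0.05
reject_ex1 = t_stat > T_CRIT

# Example 2: two-tailed z test of H0: mu = 13 against Ha: mu != 13.
z_stat = (15.6 - 13) / (1 / math.sqrt(35))
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)  # about 2.576
reject_ex2 = abs(z_stat) > z_crit

print(round(t_stat, 2), reject_ex1)          # 1.71 False
print(round(z_stat, 2), reject_ex2)          # 15.38 True
```

The decisions match the hand calculations: the null hypothesis is not rejected in Example 1 and is rejected in Example 2.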
7.4. Exercises
1. In order to ensure efficient usage of a server, it is necessary to estimate the mean number
of concurrent users. According to records, the sample mean and sample standard
deviation of the number of concurrent users at 100 randomly selected times are 37.7 and 9.2,
respectively.
(a) Construct a 90% confidence interval for the mean number of concurrent users.
(b) Do these data provide significant evidence, at 1% significance level, that the mean
number of concurrent users is greater than 35?
2. To assess the accuracy of a laboratory scale, a standard weight that is known to weigh 1
gram is repeatedly weighed 4 times. The resulting measurements (in grams) are: 0.95,
1.02, 1.01, 0.98. Assume that the weighings by the scale when the true weight is 1 gram
are normally distributed with mean µ.
(a) Use these data to compute a 95% confidence interval for µ.
(b) Do these data give evidence at 5% significance level that the scale is not accurate?
3. A local juice manufacturer distributes juice in bottles labeled 32 ounces. A government
agency thinks that the company is cheating its customers. The agency selects 35 of
these bottles, measures their contents, and obtains a sample mean of 31.7 ounces with
a standard deviation of 0.70 ounce. Use a 0.01 significance level to test the agency’s
claim that the company is cheating its customers.
8 Simple Correlation and Linear Regression Analysis
8.1. Correlation Analysis
Correlation is a mathematical tool designed to measure the degree of linear relationship
(degree of association) between variables. Correlation that involves only two variables
is called simple correlation, and correlation that involves more than two variables is called
multiple correlation.
Covariance is a measure of the joint variation in two variables, i.e. it measures the way in
which the values of the two variables vary together.
1. If the covariance is zero, there is no linear relationship between the two variables.
2. If it is negative, there is an indirect (inverse) linear relationship between them.
3. If the covariance is positive, there is a direct linear relationship between the variables.
Pearson's Coefficient of Correlation
Pearson's coefficient of correlation (r) is used to measure the strength of the linear relationship
between two variables.
$$
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
$$
The value of r is always between -1 and +1 inclusive.
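The computational formula for r translates directly into code. The sketch below is a minimal illustration; the function name `pearson_r` and the two small data sets are hypothetical and chosen so that the points lie exactly on a line.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the sums used in the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: y rises (or falls) exactly linearly with x.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])

print(r_up, r_down)
```

For these two data sets r is +1 and −1 (up to floating-point rounding), the extreme values of the coefficient.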
Interpretation of r
1. If the value of r is -1 or +1, there is a perfect negative or perfect positive linear relationship
between the variables.
2. If the value of r is approximately -1 or +1, there is a strong negative or strong positive
linear relationship between the variables.
3. If r is -0.5 (or approximately -0.5) or +0.5 (or approximately +0.5), there is a moderate
negative or moderate positive linear relationship between the variables.
4. If r is 0, there is no linear relationship.
8.2. Simple Linear Regression
Regression is defined as the estimation or prediction of the unknown value of one variable
from the known values of one or more other variables. It also describes the functional
relationship between two or more variables. The variable whose values are to be estimated or
predicted is known as the dependent or explained variable, while the variables used in
determining the value of the dependent variable are called independent or predictor variables.
A regression study that involves only two variables is called simple regression, and a
regression analysis that studies more than two variables is called multiple regression.
Regression Equation: a mathematical equation that defines the relationship between
two variables. The regression of y on x is given by
$$
y = \alpha + \beta x + \varepsilon
$$
where
y is the dependent variable,
x is the independent variable,
α is the constant term (intercept),
β is the slope (change in y for a unit change in x) and
ε is the error term.
To estimate the regression coefficients α and β, the estimates α̂ and β̂ are chosen to minimize
the sum of the squares of the errors (the method of least squares).
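Let the estimated model be
$$
\hat{y} = \hat{\alpha} + \hat{\beta}x
$$
Then, from sample data the values of α̂ and β̂ can be obtained as follows:
$$
\hat{\beta} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}; \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}
$$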
Interpretation of the slope
1. If β̂ is positive, there is a direct linear relationship between the two variables.
2. If β̂ is zero, there is no linear relationship between the two variables.
3. If β̂ is negative, there is an indirect (inverse) linear relationship between the two variables.
8.3. Coefficient of Determination ($R^2$)
The coefficient of determination tells how well the estimated model fits the data. For simple
linear regression (two variables case), it is defined as the square of the sample correlation
coefficient, and denoted by $r^2$. Hence $r^2$ measures the proportion or percentage of the variation
in the dependent variable explained by the independent variable. $r^2$ is a nonnegative quantity
which lies between 0 and 1. A value close to 1 indicates a good fit, while a value close to
0 indicates little or no linear relationship between the variables.
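Example
A researcher wants to find out if there is a relationship between the heights of sons and the
heights of their fathers. In other words, do taller fathers have taller sons? The researcher
took a random sample of 6 fathers and their 6 sons. Their heights in inches are given below in
an ordered array.
Father (X)   63   65   66   67   67   68
Son (Y)      66   68   65   67   69   70
(a) Find the correlation coefficient and interpret.
$$
\begin{aligned}
r &= \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} \\
&= \frac{6 \times 26740 - 396 \times 405}{\sqrt{6 \times 26152 - (396)^2}\,\sqrt{6 \times 27355 - (405)^2}} \\
&= \frac{60}{\sqrt{96 \times 105}} \approx 0.5976
\end{aligned}
$$
Since r is close to +0.5, there is a moderate positive linear relationship between the heights
of fathers and the heights of their sons.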
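(b) Estimate the regression model of height of sons on height of fathers and interpret the
estimated parameters.
$$
\begin{aligned}
\hat{\beta} &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \\
&= \frac{6 \times 26740 - 396 \times 405}{6 \times 26152 - (396)^2} \\
&= 0.625
\end{aligned}
$$
$$
\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 67.5 - 0.625 \times 66 = 26.25
$$
Hence, the estimated regression model is:
$$
\hat{y} = 26.25 + 0.625x
$$
For a one-inch increase in the father's height, the predicted height of the son increases by 0.625
inches. The intercept 26.25 is the value of ŷ at x = 0, which lies far outside the observed range
of heights and so has no practical interpretation here.
(c) Compute coefficient of determination and interpret the result.
$$
R^2 = r^2 = (0.5976)^2 \approx 0.357
$$
Thus 35.7% of variation in the dependent variable (son height) is accounted for by the
variation of the independent variable (father height).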
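The whole example can be recomputed from the six pairs of heights. The sketch below follows the computational formulas of this chapter; as an additional check it also evaluates the coefficient of determination as 1 − SSE/SST, a standard identity for simple linear regression with an intercept that is not stated in the text. The variable names are illustrative.

```python
import math

fathers = [63, 65, 66, 67, 67, 68]   # X, inches
sons = [66, 68, 65, 67, 69, 70]      # Y, inches

n = len(fathers)
sum_x, sum_y = sum(fathers), sum(sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))
sum_x2 = sum(x * x for x in fathers)
sum_y2 = sum(y * y for y in sons)

numerator = n * sum_xy - sum_x * sum_y   # 60
sxx = n * sum_x2 - sum_x ** 2            # 96
syy = n * sum_y2 - sum_y ** 2            # 105

r = numerator / math.sqrt(sxx * syy)     # correlation coefficient
beta = numerator / sxx                   # slope
alpha = sum_y / n - beta * sum_x / n     # intercept

# Coefficient of determination as explained variation: 1 - SSE/SST.
fitted = [alpha + beta * x for x in fathers]
sse = sum((y - f) ** 2 for y, f in zip(sons, fitted))
sst = sum((y - sum_y / n) ** 2 for y in sons)
r_squared = 1 - sse / sst

print(round(r, 4), beta, alpha, round(r_squared, 3))   # 0.5976 0.625 26.25 0.357
```

The two routes to the coefficient of determination agree: squaring r and computing the explained share of the variation in sons' heights both give about 0.357.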
8.4. Exercises
1. Given the following data:
AGE   SBP
15    116
20    120
25    130
30    132
40    150
50    148
a) Compute the regression line (systolic blood pressure (SBP) on AGE) and interpret
the results.
b) Compute the correlation coefficient between SBP and AGE and interpret the
result.
c) How much of the variance in SBP can be explained by the variability
in AGE?
2. An experiment was conducted to study the effect on sleeping time of increasing the
dosage of a certain barbiturate. Three readings were made at each of three dose levels:
Sleeping time (Hrs)   Dosage
4                     3
6                     3
5                     3
9                     10
8                     10
7                     10
13                    15
11                    15
9                     15
a) Plot the scatter diagram.
b) Determine the regression line relating dosage (X) to sleeping time (Y).
c) What is the predicted sleeping time for a dose of 12?
THE END!!