Chapter One


Introduction
1.1. Definition of Statistics
How do we define Statistics?
It has two meanings. In the more common usage (layman definition), statistics refers to a
collection of numerically expressed facts or data.
Examples:
The number of colleges in a city;
The number of students in a college;
Per capita income statistics;
Statistics of imports, exports, consumption, etc;
But the subject statistics has a much broader meaning than just collecting and publishing numerical
information.
Therefore, we define statistics as the science of collecting, organizing, presenting, analyzing, and
interpreting numerical data to assist in making more effective decisions.
According to Dominick Salvatore and Derrick Reagle “statistics refers to collection, presentation,
analysis and utilization of numerical data to make inferences and reach decisions in the face of
uncertainty in economics, business and other social and physical sciences.”
As the definition suggests:
• The first step in investigating a problem is to collect data.
• The data must then be organized in some way and perhaps presented in a chart.
• Only after the data have been organized and presented can we analyze and interpret them.
Example: If economics students at a university would like to know the monthly household income of
200 residents in a town, then they
a) have to collect the data, that is, the income of the households under study,
b) should organize the data (say, by arranging them in ascending or descending order),
c) should present the data by using charts, tables, etc.,
d) should analyze the data (say, find the average, median, mode, variance, standard
deviation, etc.) and interpret the results.
1.2. Types of Statistics
The study of statistics is usually divided into two categories:
a) Descriptive Statistics
• It is a statistical method that deals with describing (summarizing) a given set of data without
drawing conclusions about the larger population.
• It consists of the collection, organization and presentation of data in an informative way.
• Tables, graphs and numerical summary measures may be used to describe data.
• In descriptive statistics, the statistician tries to describe a situation.
Example: Consider the national census conducted by the Ethiopian government in 1999 E.C. The results
of this census give the average age, average household income, and other characteristics of the
Ethiopian population; these are descriptive statistics.
b) Inferential Statistics
• It is also called statistical inference or inductive statistics.
• It is a statistical method that involves taking a sample from a population, computing a statistic
based on the sample, and inferring from that statistic the value of the corresponding population
parameter.
• It is the branch of statistics used to determine something about a population on the basis of a
sample taken from that population.
• Its result is a decision, estimate, prediction, or generalization about a population, based on a sample.
Example: The accounting department of a large firm selects a sample of invoices and checks them in order to judge the accuracy
of all the invoices of the company.
Note the words “population” and “sample” in the definition of inferential statistics.
• A population is the collection of all possible individuals, objects or measurements of interest. When a
researcher gathers data from the whole population for a given measure of interest, it is called a census
(complete enumeration).
• A sample is a portion or part of the population of interest.
When we discuss inferential statistics, we have to differentiate between a parameter and a statistic.
• A parameter is a value calculated from a population (say, the population mean, population standard
deviation, etc.), and a statistic is a value calculated from a sample (say, the sample mean, sample standard
deviation, etc.).
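The distinction between a parameter and a statistic can be illustrated with a minimal Python sketch. The population below is hypothetical (randomly generated household incomes in Birr), and the names population, sample, parameter and statistic are assumptions made only for this illustration.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: monthly incomes (Birr) of 1,000 households
population = [random.randint(1000, 10000) for _ in range(1000)]

# Parameter: a value calculated from the whole population (a census)
parameter = mean(population)

# Statistic: the same measure calculated from a sample of 50 households
sample = random.sample(population, 50)
statistic = mean(sample)

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

The sample mean will usually be close to, but not equal to, the population mean; inferential statistics is concerned with how much can be said about the parameter from the statistic alone.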
1.3. Why do we study Statistics?
Statistics is required for many college programs, such as business, economics, engineering,
psychology, and medicine. The course content is basically the same across them; the biggest differences are the
examples used and the level of mathematics required. Colleges of business and
economics usually teach the course at a more applied level.
Thus, in business and economics, we are interested in such things as:
• Profits (revenue minus cost),
• Gross Domestic Product (GDP),
• Demand,
• Supply,
• Consumption,
• Cost,
• Wages, etc.
We are studying statistics for the following reasons:
1) The first reason is that numerical information is everywhere.
If you look in the magazines in Ethiopia, you are going to find a lot of numerical
information, such as exchange rates (say $1 = 21 birr), unemployment rates (say 5% in
Mettu, where unemployment rate = unemployed / labor force), per capita income
(= Gross National Income / population), consumption
rate of cement, export of coffee, import of cars, inflation rate, demand for kerosene,
enrollment rates of high schools, etc.
Therefore, to be an educated consumer of this information, an understanding of the
concepts of basic statistics will be useful.
2) Students and/or professionals may be called on to conduct research in their fields, since statistical
procedures are basic to research.
To accomplish this, they must be able to design experiments; collect, organize, analyze
and summarize data; and possibly make reliable predictions or forecasts for future use.
They must also be able to communicate the results of the study in their own words.

3) Students, like professionals, must be able to read and understand the various statistical studies
performed in their field. To have such understanding, they must be knowledgeable about the
vocabulary, concepts and statistical procedures used in these studies.
4) Data are everywhere. No matter what your future line of work is, you will make decisions that
involve data, and an understanding of statistical methods will help you make these decisions more
effectively.
1.4. Importance of Statistics
The main function of statistics is to enlarge our knowledge of complex phenomena. That is:
i. It presents facts in a definite and precise form. Example: Instead of saying that the per capita income
of Ethiopia is low, it is better and clearer to say that it is 110.
ii. It reduces data: i.e. it simplifies a complex mass of data and presents it in a few, clear, and useful
summaries. The bulky data may be summarized in totals, averages, percentages, etc.
iii. It measures the magnitude of variation in data.
iv. It helps to estimate the unknown population parameter from a sample.
v. It helps to formulate and test hypotheses.
vi. It helps to study the relationship between two or more variables.
vii. It helps to forecast future events.
1.5. Steps of Statistical Investigation
A statistical study involves the following stages:
i. Determining the objective of the study;
ii. Collecting the data;
iii. Organizing the collected data;
iv. Presenting the data;
v. Analyzing the data; and
vi. Interpreting the results of the study and making recommendations.
1.6. Types of Measures of Central Tendency

1) Arithmetic Mean: The arithmetic mean is the sum of the data set values divided by the number of
observations. The arithmetic mean, or average value of a variable, is the most important numerical
measure of central tendency. For ungrouped data, the population mean (usually denoted by "μ") is
the sum of all the population values divided by the total number of population values:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
where: $N$ = number of elements in the population
$\mu$ = population mean
The population mean applies when the data represent all of the items within the population. For
ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample
values:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
where: $\bar{X}$ = sample mean
$n$ = number of elements in the sample (sample size)
Example: A sample of five executives received the following salaries (Birr in thousands): 14.0, 15.0, 17.0, 16.0, and
15.0. Find the mean salary.

$$\bar{X} = \frac{\sum X_i}{n} = \frac{14.0 + 15.0 + 17.0 + 16.0 + 15.0}{5} = \frac{77}{5} = 15.4$$
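The salary calculation above can be checked with a short Python sketch. The variable names are chosen only for illustration; `statistics.mean` is part of the Python standard library.

```python
from statistics import mean

# Salaries (Birr in thousands) of the five executives in the example
salaries = [14.0, 15.0, 17.0, 16.0, 15.0]

# Sample mean: the sum of the values divided by the number of values
manual_mean = sum(salaries) / len(salaries)

# The standard library gives the same result
library_mean = mean(salaries)

print(round(manual_mean, 2), round(library_mean, 2))
```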
Arithmetic mean for grouped data
The mean of a sample of data organized in a frequency distribution is computed by the following formula:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
where: $f_i$ = frequency of the $i$-th class
$X_i$ = midpoint of the $i$-th class
$k$ = number of classes
Example: Compute the arithmetic mean for the following grouped data:
Class Boundaries   Class mark (Xi)   fi   fiXi
5.5-10.5           8                 1    8
10.5-15.5          13                2    26
15.5-20.5          18                3    54
20.5-25.5          23                5    115
25.5-30.5          28                4    112
30.5-35.5          33                3    99
35.5-40.5          38                2    76
Total                                20   490

$$\sum_{i=1}^{7} f_i = 20, \quad \sum_{i=1}^{7} f_i X_i = 490 \;\Rightarrow\; \bar{X} = \frac{490}{20} = 24.5$$
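As a check on the grouped-data example, the following minimal sketch recomputes the two column totals and the mean from the class marks and frequencies in the table. The list name classes is an assumption for illustration.

```python
# (class mark X_i, frequency f_i) pairs taken from the table above
classes = [(8, 1), (13, 2), (18, 3), (23, 5), (28, 4), (33, 3), (38, 2)]

total_f = sum(f for _, f in classes)        # sum of f_i
total_fx = sum(x * f for x, f in classes)   # sum of f_i * X_i

grouped_mean = total_fx / total_f

print(total_f, total_fx, grouped_mean)
```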

2) Weighted mean: It is a special case of the arithmetic mean. It arises when there are several
observations of the same value, which may happen if the data have been grouped into a frequency
distribution. It is the mean of data values that have been weighted according to their relative
importance. The formula for the weighted mean for a population or a sample is as follows:

$$\mu_\omega = \frac{\sum \omega_i X_i}{\sum \omega_i} \quad \text{or} \quad \bar{X}_\omega = \frac{\sum \omega_i X_i}{\sum \omega_i}$$
Where: $\mu_\omega$ = population weighted mean
$\bar{X}_\omega$ = sample weighted mean
$\omega_i$ = weight assigned to the $i$-th data value

$X_i$ = the $i$-th data value

Example:
During a one-hour period on Saturday afternoon, a waiter served fifty soft drinks. She sold 5 drinks for
birr 0.50, 15 for birr 0.75, 15 for birr 0.90, and 15 for birr 1.10. Compute the weighted mean price
of the soft drinks.
$$\bar{X}_\omega = \frac{5(0.50) + 15(0.75 + 0.90 + 1.10)}{50} = \frac{43.75}{50} = 0.875$$
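The weighted mean can be sketched as a small Python function, here applied to the soft drink example with the number of drinks sold as the weights. The function name weighted_mean and its argument names are assumptions for illustration, not part of any library.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


prices = [0.50, 0.75, 0.90, 1.10]   # price per drink (birr)
drinks_sold = [5, 15, 15, 15]       # weights: number of drinks at each price

mean_price = weighted_mean(prices, drinks_sold)
print(round(mean_price, 3))
```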

3) Median (MD)
The median of a set of values arranged in the order of their magnitudes, i.e., in an array, is the middle
value or the arithmetic mean of two middle values. Median is that value of a variable which divides an
array of items in such a manner that the number of items below it is equal to the number of items above it.
a) Median for Ungrouped Data

If the number of observations $n$ is odd, then
$$MD = \text{value of the } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation}$$
Example: Find the median of the following data set: 1, 5, 3, 9, 10, 12, 6
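Solution: First array the data: 1, 3, 5, 6, 9, 10, 12. Here n = 7, which is odd.
$$MD = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation} = \left(\frac{7+1}{2}\right)^{\text{th}} \text{ observation} = 4^{\text{th}} \text{ observation} = 6$$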

If the number of observations is even, then
$$MD = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ observation}}{2}$$
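Example: Find the median of the following data set: 1, 5, 2, 9, 7, 10, 12, 13
Solution: First array the data: 1, 2, 5, 7, 9, 10, 12, 13. Here n = 8, which is even.
$$MD = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ obsn} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ obsn}}{2} = \frac{\left(\frac{8}{2}\right)^{\text{th}} \text{ obsn} + \left(\frac{8}{2}+1\right)^{\text{th}} \text{ obsn}}{2} = \frac{4^{\text{th}} \text{ obsn} + 5^{\text{th}} \text{ obsn}}{2}$$
The 4th observation is 7 and the 5th observation is 9, so $MD = \frac{7+9}{2} = 8$.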
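Both rules for ungrouped data can be combined in one short function. The sketch below is a plain-Python illustration (the function name median is an assumption; the standard library also offers `statistics.median`), applied to the two data sets used above.

```python
def median(values):
    """Median of ungrouped data: sort first, then take the middle value(s)."""
    ordered = sorted(values)          # the data must be arrayed first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)-th observation (index mid, counting from 0)
        return ordered[mid]
    # even n: mean of the (n / 2)-th and (n / 2 + 1)-th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median([1, 5, 3, 9, 10, 12, 6])
even_median = median([1, 5, 2, 9, 7, 10, 12, 13])
print(odd_median, even_median)
```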
b) Median for Grouped data
For grouped data, median is calculated by using the following formula:

$$MD = \ell_{md} + \left(\frac{\frac{n}{2} - cf}{f}\right) \times i$$
Where
ℓ md is the lower class boundary/class limit of the median class
n is total number of observations
cf is the cumulative frequency preceding the median class
i is the class interval/width
f is frequency of the median class

Example: Find the median from the following frequency distribution.

Class Limit   Frequency   Cumulative Frequency
30-40         2           2
40-50         18          20
50-60         24          44
60-70         20          64
70-80         8           72
80-90         3           75
Total         75
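One way to work through this example is sketched below in Python. It applies the grouped-median formula given above, treating the stated class limits as class boundaries (an assumption, since the classes are contiguous). The function name grouped_median and the tuple layout are chosen only for illustration.

```python
def grouped_median(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency), ascending."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the frequency table is empty")
    half = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cf + f >= half:
            # median class found: MD = l + ((n/2 - cf) / f) * i
            return lower + (half - cf) / f * (upper - lower)
        cf += f
    raise ValueError("median class not found")


table = [(30, 40, 2), (40, 50, 18), (50, 60, 24),
         (60, 70, 20), (70, 80, 8), (80, 90, 3)]

md = grouped_median(table)
print(round(md, 2))
```

Here n/2 = 37.5, so the median class is 50-60 (its cumulative frequency, 44, is the first to reach 37.5), with cf = 20, f = 24 and i = 10.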

Properties of Median
1. The data must be arranged in an array (in order of magnitude) before the median is calculated.
2. There is a unique median for each data set.
3. Median remains unaffected by the magnitude of the extreme values.
4) Mode (MO)
• Mode is the most frequent value in a data set.
• The mode is the value of the observation that appears most frequently.
• The mode of the distribution is the value around which the observations are most heavily concentrated, i.e., the
value that occurs the greatest number of times in a distribution.
• The data value that occurs with the greatest frequency is a mode.
Example: The examination scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81 and 87. Because
the score of 81 occurs three times, it is the mode.
A data set may have:
A. No mode at all, e.g. 1, 3, 9, 0, 7, 8
B. One mode (unimodal), e.g. 1, 3, 1, 7, 1, 9; the mode is 1
C. Two modes (bimodal), e.g. 7, 2, 4, 4, 7; the modes are 7 and 4
D. Many modes (multimodal), e.g. 1, 0, 0, 1, 3, 2, 2, 3, 7, 7, 4, 9; the modes are 1, 0, 3, 2 and 7
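The four cases above can be reproduced with a small counting sketch. The function name modes is an assumption for illustration, and the rule that a data set has no mode when every value occurs exactly once is the convention used in case A.

```python
from collections import Counter


def modes(values):
    """Return the list of modes; an empty list means there is no mode."""
    values = list(values)
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode at all
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 3, 9, 0, 7, 8]))                     # no mode
print(modes([1, 3, 1, 7, 1, 9]))                     # unimodal
print(modes([7, 2, 4, 4, 7]))                        # bimodal
print(modes([1, 0, 0, 1, 3, 2, 2, 3, 7, 7, 4, 9]))   # multimodal
```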

Mode of grouped data

The approximate modal value of grouped data is calculated by the following formula:
$$Mode = L_0 + \frac{f - f_1}{(f - f_1) + (f - f_2)} \times i = L_0 + \frac{f - f_1}{2f - f_1 - f_2} \times i$$
Where:
$L_0$ = lower class boundary of the modal class (i.e., the class with the highest frequency)
$f$ = frequency of the modal class
$f_1$ = frequency of the class immediately preceding the modal class
$f_2$ = frequency of the class immediately following the modal class
$i$ = class interval/width
Note: The data are to be arranged in an array.
Example: Find the mode of the following distribution:

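Class Limit   Frequency
90-100        10
100-110       37
110-120       65
120-130       80
130-140       51
140-150       35
150-160       18
160-170       4
Solution: The modal class is 120-130, since it has the highest frequency (f = 80); the neighbouring frequencies are f1 = 65 and f2 = 51, and i = 10.
$$Mode = 120 + \frac{80 - 65}{2(80) - 65 - 51} \times 10 = 120 + \frac{150}{44} = 123.41$$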
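The grouped-mode calculation can be sketched in Python as follows. The function name grouped_mode is an assumption for illustration, and so is the choice to treat a missing neighbouring class (when the modal class is the first or last one) as having frequency 0.

```python
def grouped_mode(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency), ascending."""
    # index of the modal class, i.e. the class with the highest frequency
    k = max(range(len(classes)), key=lambda j: classes[j][2])
    lower, upper, f = classes[k]
    # assumption: a missing neighbour class counts as frequency 0
    f1 = classes[k - 1][2] if k > 0 else 0
    f2 = classes[k + 1][2] if k < len(classes) - 1 else 0
    # Mode = L0 + (f - f1) / (2f - f1 - f2) * i
    return lower + (f - f1) / (2 * f - f1 - f2) * (upper - lower)


table = [(90, 100, 10), (100, 110, 37), (110, 120, 65), (120, 130, 80),
         (130, 140, 51), (140, 150, 35), (150, 160, 18), (160, 170, 4)]

mode_value = grouped_mode(table)
print(round(mode_value, 2))
```

The formula is undefined when the modal class and both neighbours share the same frequency, because the denominator becomes zero; the sketch does not handle that case.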
Properties of mode
• It is the easiest average to compute.
• It is not affected by extreme values.
• The mode may not exist for a data set.
• It is not unique. A data set can have more than one mode.
• The mode is not based on all observations.

1.7. Types of Measures of Dispersion/Variation


Dispersion is the scatter or variation of items from a measure of central tendency. It measures the extent
to which the values vary among themselves.
Example 5.1. Consider the following data on the expenditures of two groups of workers:
• Group A: Br 6200 2200 17000 17000 12000 (the mean is Br 2400)
• Group B: Br 1600 1700 13000 4200 32000 (the mean is Br 2400)
If we were given only the average expenditure of the two groups, without knowing the actual expenditures, we would
simply conclude that the two groups spend identical amounts. But the actual observations
indicate that more variation is observed in group A.
To be specific, it is often difficult to assert which set of data is better represented by its mean value unless
we refer to dispersion. Two or more sets of sample data with the
same mean (as in the previous example) may differ considerably in their degree of dispersion. For
instance, the average income in a community is not an adequate indicator of the well-being of the
community, since it does not show the inequality among the residents; a measure of dispersion
can show this inequality. Therefore, it is useful to have a measure of dispersion to observe the variability
of data.
A measure of dispersion may be in an absolute form or a relative form.
A measure is said to be in absolute form when it shows the actual amount of variation of an
item from a measure of central tendency, while a relative measure is a quotient obtained by dividing the
absolute measure by the quantity with respect to which the absolute deviation has been computed. Relative
measures are unitless and are used to compare variability between different sets of data.

The following are some of the qualities of a good measure of dispersion.
• It should be based on all observations.
• It should be easily calculated.
• It should be easily understandable.
• It should be affected as little as possible by sampling fluctuations.
There are many types of measures of dispersion, as listed below:
1. Range
2. Mean deviation
3. Variance and standard deviation
4. Coefficient of variation
As stated above, when these measures express the magnitude of dispersion in the same unit of
measurement in which the data are recorded, they are known as measures of absolute dispersion.
However, when dispersion is expressed in percentages or ratios, these measures are called measures of
relative dispersion.
1. Range
Range is defined as the difference between the largest and the smallest observations in a given set of raw
data. Obtaining the range from raw data thus requires identifying only these two extreme values and taking
the difference between them.
Properties of range
• Only two values are used in its calculation.
• It is influenced by extreme values.
• It is easy to compute and understand.
• It is the crudest measure of dispersion.
• It cannot be determined for open-ended data.
• The greater the range, the higher the variability of the data, and vice versa.
Example: Compute the range of the following data.
Table 5.1. Results (out of 35%) of 20 students in an Econometrics test.
Xi   6    24   18   22   30   15
Fi   3    2    5    1    4    5

Maximum value = 30 marks
Minimum value = 6 marks
Range = Highest value – lowest value = 30 – 6 = 24 marks

Example: Compute the range of the data given below in Table 5.2.
Table 5.2. Results (out of 35%) of 40 students in an Econometrics test
Score (35%)   Number of Students (Fi)
6 – 10        5
11 – 15       10
16 – 20       15
21 – 25       7
26 – 30       3

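Example: Compute the coefficient of range for the following raw data.
2, 4, 6, 8, 16, 18, 20
Solution: The coefficient of range is the range divided by the sum of the largest and smallest values, expressed as a percentage.
$$\text{Coefficient of range} = \frac{20 - 2}{20 + 2} \times 100\% = \frac{18}{22} \times 100\% = 81.8\%$$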
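The range and the coefficient of range can both be sketched in a few lines of Python. The function names are assumptions for illustration; the data are the distinct scores of Table 5.1 and the raw data of the coefficient-of-range example.

```python
def value_range(values):
    """Absolute measure: largest value minus smallest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative measure: (largest - smallest) / (largest + smallest) * 100%."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest) * 100


scores = [6, 24, 18, 22, 30, 15]    # distinct scores from Table 5.1
raw = [2, 4, 6, 8, 16, 18, 20]      # raw data from the example above

print(value_range(scores))
print(round(coefficient_of_range(raw), 1))
```

The frequencies in Table 5.1 do not affect the range, since only the two extreme values enter the calculation. The coefficient is undefined when the largest and smallest values sum to zero.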
2. Mean Deviation
The mean deviation, also called the average deviation, measures the average deviation (scatter) of a set of
observations about a central value, usually the mean or the median of the distribution. It is computed by
subtracting the mean/median from each individual observation, summing all the deviations while ignoring the
negative signs, and dividing the sum by the total number of observations. The negative signs are ignored because
otherwise the sum of the deviations from the mean, i.e. $\sum (X_i - \bar{X})$, will be zero. The mean
absolute deviation from the mean for a set of sample data consisting of n observations is computed as
$$\text{MD from the mean} = \frac{\sum |X_i - \bar{X}|}{n}$$
Similarly, the MD from the median is obtained as
$$\text{MD from the median} = \frac{\sum |X_i - M_d|}{n}$$
in the case of ungrouped data. In the case of grouped data, they are obtained as
$$\text{MD from the mean} = \frac{\sum f_i |X_i - \bar{X}|}{\sum f_i}$$
$$\text{MD from the median} = \frac{\sum f_i |X_i - M_d|}{\sum f_i}$$
where the $X_i$'s are the class mid-points and $\sum f_i = n$.
Example: The ages of a sample of 10 students from a class are given below.
18, 19, 19, 19, 20, 21, 21, 22, 23, 24
Find the mean deviation (i) from the mean and (ii) from the median.
Solution:
$$\text{Arithmetic mean} = \frac{\sum X_i}{n} = \frac{206}{10} = 20.6$$
$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ value} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ value}}{2} = \frac{20 + 21}{2} = 20.5$$

Age     Absolute deviation from the mean   Absolute deviation from the median
18      |18 – 20.6| = 2.6                  |18 – 20.5| = 2.5
19      |19 – 20.6| = 1.6                  |19 – 20.5| = 1.5
19      |19 – 20.6| = 1.6                  |19 – 20.5| = 1.5
19      |19 – 20.6| = 1.6                  |19 – 20.5| = 1.5
20      |20 – 20.6| = 0.6                  |20 – 20.5| = 0.5
21      |21 – 20.6| = 0.4                  |21 – 20.5| = 0.5
21      |21 – 20.6| = 0.4                  |21 – 20.5| = 0.5
22      |22 – 20.6| = 1.4                  |22 – 20.5| = 1.5
23      |23 – 20.6| = 2.4                  |23 – 20.5| = 2.5
24      |24 – 20.6| = 3.4                  |24 – 20.5| = 3.5
Total   16                                 16

Therefore,
$$\text{MD from the mean} = \frac{\sum |X_i - \bar{X}|}{n} = \frac{16}{10} = 1.6$$
$$\text{MD from the median} = \frac{\sum |X_i - M_d|}{n} = \frac{16}{10} = 1.6$$
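The two mean deviations for the ages example can be verified with a minimal sketch. The helper name mean_deviation is an assumption for illustration; `statistics.mean` and `statistics.median` are standard-library functions.

```python
from statistics import mean, median


def mean_deviation(values, center):
    """Average of the absolute deviations of the values from a central value."""
    return sum(abs(x - center) for x in values) / len(values)


ages = [18, 19, 19, 19, 20, 21, 21, 22, 23, 24]

md_from_mean = mean_deviation(ages, mean(ages))        # centre = 20.6
md_from_median = mean_deviation(ages, median(ages))    # centre = 20.5

print(round(md_from_mean, 2), round(md_from_median, 2))
```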
Example Find mean absolute deviation from the mean and from the median for the data given in the
above table
3. Variance and Standard Deviation
Like other measures, variance and standard deviation also quantify the dispersion of the observations
around the mean value.
The population variance is defined as the arithmetic mean of the squared deviations from the population
mean.
The formula for the population variance for raw data is:
$$\delta^2 = \frac{\sum (X_i - \mu)^2}{N}$$
where:
$\mu$ = population mean
$N$ = total number of observations
The formula for the sample variance for raw data is:
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$
where:
$n$ = sample size
$\bar{X}$ = sample mean
Example: The ages of the members of a family (in years) are:
2, 18, 34, 42. What is the population variance?

Solution:
$$\mu = \frac{\sum X_i}{N} = \frac{96}{4} = 24$$
$$\delta^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{(2-24)^2 + (18-24)^2 + (34-24)^2 + (42-24)^2}{4} = \frac{944}{4} = 236$$
The population standard deviation is the square root of the population variance:
$$\delta = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$$
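The family ages example can be checked with the short sketch below, which computes the population variance and standard deviation both by hand and with the standard-library functions `statistics.pvariance` and `statistics.pstdev`. The variable names are chosen only for illustration.

```python
from statistics import pstdev, pvariance

ages = [2, 18, 34, 42]   # the whole family is treated as the population

N = len(ages)
mu = sum(ages) / N                                   # population mean
pop_var = sum((x - mu) ** 2 for x in ages) / N       # mean squared deviation
pop_sd = pop_var ** 0.5                              # square root of variance

print(mu, pop_var, round(pop_sd, 2))
print(pvariance(ages), round(pstdev(ages), 2))       # library equivalents
```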
Example: From the sample data given below, compute the variance and standard deviation.
10, 15, 30, 22, 41, 32
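One way to work through this example is sketched below. Because the data are a sample, the sum of squared deviations is divided by n − 1; `statistics.variance` and `statistics.stdev` from the standard library use the same divisor. The variable names are assumptions for illustration.

```python
from statistics import stdev, variance

data = [10, 15, 30, 22, 41, 32]

n = len(data)
x_bar = sum(data) / n                                       # sample mean
sample_var = sum((x - x_bar) ** 2 for x in data) / (n - 1)  # divide by n - 1
sample_sd = sample_var ** 0.5

print(x_bar, round(sample_var, 2), round(sample_sd, 2))
print(round(variance(data), 2), round(stdev(data), 2))      # library check
```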
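• Variance and standard deviation for grouped data
For grouped data, the population and sample variances, denoted by $\delta^2$ and $S^2$ respectively, are given by:
$$\delta^2 = \frac{\sum f_i (X_i - \mu)^2}{\sum f_i}$$
$$S^2 = \frac{\sum f_i (X_i - \bar{X})^2}{\sum f_i - 1}$$
in which the $X_i$'s are the class mid-points, $\sum f_i = N$ for the population and $\sum f_i = n$
for the sample, so that the sample variance again divides by $n - 1$, as in the formula for raw data.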
Example: From the following frequency distribution, compute the sample variance and standard
deviation.
Class limits (scores)   fi
6 – 10                  5
11 – 15                 10
16 – 20                 15
21 – 25                 7
26 – 30                 3
Total                   40
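A possible way to carry out this computation is sketched below. It uses the class mid-points as the $X_i$ values and, as an assumption consistent with the sample variance formula for raw data, divides by n − 1. The variable names are chosen only for illustration.

```python
# (lower limit, upper limit, frequency) for each class in the table above
classes = [(6, 10, 5), (11, 15, 10), (16, 20, 15), (21, 25, 7), (26, 30, 3)]

midpoints = [(low + high) / 2 for low, high, _ in classes]   # class marks X_i
freqs = [f for _, _, f in classes]

n = sum(freqs)
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Sample variance for grouped data: sum of f_i (X_i - mean)^2 over (n - 1)
sample_var = sum(f * (x - grouped_mean) ** 2
                 for f, x in zip(freqs, midpoints)) / (n - 1)
sample_sd = sample_var ** 0.5

print(n, grouped_mean, sample_var, round(sample_sd, 3))
```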

4. Coefficient of Variation
The coefficient of variation, developed by Karl Pearson (1857 – 1936), is a relative measure of dispersion
that is very useful when either the data are in different units
or the data are in the same units but the means are far apart. It is defined as the ratio of the
standard deviation to the arithmetic mean (where the mean is different from zero), expressed as a
percentage:
$$CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\%$$
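As an illustration, the sketch below computes the coefficient of variation for the sample 10, 15, 30, 22, 41, 32 used in the earlier variance example. Reusing that sample, using the sample (n − 1) standard deviation, and the function name are all assumptions made for this illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = (standard deviation / mean) * 100, for a sample with non-zero mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("the coefficient of variation needs a non-zero mean")
    return stdev(values) / m * 100


data = [10, 15, 30, 22, 41, 32]
cv = coefficient_of_variation(data)
print(f"CV = {cv:.1f}%")
```

Because the CV is a unitless percentage, it can be compared directly across data sets measured in different units.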
Exercise: In a sample, 100 students in a master's program in management were tested on a general
knowledge paper carrying 100 marks. At the end of the exercise, they were distributed according to
marks obtained as follows:

Marks obtained       30-34   35-39   40-44   45-49   50-54   55-59   60-64
Number of students   5       8       12      20      27      20      8
Find
a) The range of the distribution,
b) Mean absolute deviation from the mean,
c) Variance and standard deviation, and
d) Coefficient of variation.
