CHAPTER - I
1. Introduction: -
1.1 Definition and Classification of Statistics
Definition: Statistics is the science that deals with the collection, organization, presentation, analysis, and interpretation of numerical data.
Data: the measurements or observations (values) recorded for a variable (factor).
■A collection of data values forms a data set.
■Each value in the data set is called a data value or datum.
o Why do we need statistics?
Classification of Statistics
There are two branches of statistics:-
1. Descriptive statistics: statistical methods that deal with describing or summarizing a given set of data.
o Here there is no generalization or conclusion about the population.
o It consists of the collection, organization, and presentation of data.
o E.g. frequency distributions, measures of central tendency (such as the mean and median), and measures of dispersion (such as the range, variance, etc.).
2. Inferential statistics: the process of drawing conclusions (inferences) about a population based on the information obtained from a sample.
o It involves performing hypothesis tests, determining relationships among variables, and making predictions.
o It is used to describe, infer, estimate, or approximate the characteristics of the target population.
Inferential statistics is used to model patterns in the data, accounting for randomness and drawing inferences about the larger
population.
These inferences may take the form of answers to yes/no questions (hypothesis testing), estimates of numerical characteristics
(estimation), forecasting of future observations, descriptions of association (correlation), or modeling of relationships (regression).
Introduction to statistics
If the sample is representative of the population, then inferences and conclusions made from the sample can be extended to the
population as a whole.
Statistics offers methods to estimate and correct for randomness in the sample and in the data collection procedure.
The fundamental mathematical concept employed in understanding such randomness is probability.
o Presentation of data displays what is contained in the data in the form of pictures, e.g. diagrams and graphs.
Stage 4. Analysis of data
o It is the mathematical operation performed on the collected and organized data, e.g. calculating the mean, variance, etc.
Stage 5. Interpretation
o It is giving meaning to the results obtained in the analysis stage.
1.3 Definitions of Some Basic Terms
A variable:-
o Is a factor or characteristic that can take on different possible values or outcomes.
o A variable can be qualitative or quantitative.
o A qualitative (categorical) variable is a variable that is expressed in categories, i.e. it cannot be expressed in terms of numbers.
o E.g. sex, marital status, religion, region, etc.
o A quantitative variable is a variable that can be measured numerically.
o E.g. height, income, weight, age, etc.
o A sample is any non-empty subset of a population, e.g. 25 staff members of JUCAVM out of 500 staff members.
Parameter:
o It is a measurable characteristic of the population.
o It is a numerical result obtained by measuring the population.
Statistic: (not to be confused with Statistics)
o It is a measurable characteristic of the sample.
o It is used to estimate a parameter.
Sampling:
o It is the method of obtaining a sample from the population.
Survey experiment:
o It is the means of obtaining the desired data, e.g. collecting observations on the weight of students in the Agro-Economics department.
Statistical design:
o It is the process that involves a decision problem and choosing an approach to solve the problem.
o It is a guide that indicates how an investigation is going to be carried out.
Frame:
o It is the listing of all elementary units in the population under consideration (Study).
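The relationship between a population, a sample, a parameter, and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of 500 weights is randomly generated (hypothetical data, not taken from any survey), the list itself plays the role of the frame, and the sample size of 25 simply mirrors the example above.

```python
import random
from statistics import mean

# Hypothetical population: weights (kg) of 500 units, generated for illustration only.
random.seed(1)
population = [round(random.gauss(65, 10), 1) for _ in range(500)]

# Sampling: draw 25 units from the frame without replacement.
sample = random.sample(population, 25)

parameter = mean(population)  # population mean: a parameter
statistic = mean(sample)      # sample mean: a statistic used to estimate the parameter

print(f"parameter = {parameter:.2f}, statistic = {statistic:.2f}")
```

The two means will generally differ, and the difference reflects sampling variability.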
CHAPTER - TWO
o Collection of data means a systematic and meaningful assembly of information to accomplish the objective of a statistical investigation.
o It refers to the methods used to gather the required information from the units under investigation. The quality of the data greatly affects the final output of an investigation. Hence, the utmost care should be given to the data collection process, and every possible precaution should be taken to ensure accuracy while collecting data. Otherwise, with inaccurate and inadequate data, the whole analysis is likely to be faulty and the decisions based on it will be misleading.
The following are the major points to take into account while preparing a questionnaire.
o The number of questions should be small. Respondents are naturally uncomfortable with lengthy questionnaires, which usually bore them. Hence, fifteen to twenty-five questions per questionnaire is optimal. If a lengthy questionnaire is unavoidable, it should preferably be divided into two or more parts.
o Questions should be short, clear, simple, and unambiguous. Moreover, they must be arranged in a logical order so that a natural and spontaneous reply to each is induced. For instance, it is not appropriate to ask a person how many packets of cigarettes he/she smokes before asking whether he/she smokes at all.
o Questions of a sensitive nature should be avoided. Sensitive questions are those that are too personal or pecuniary, such as source of income, drinking habits, etc. The logic here is that respondents do not willingly answer sensitive questions. Such information, if necessary, may be gathered through interviews or through other indirect questions.
o Questions should be capable of objective answers. As much as possible, avoid subjective questions and keep to questions of fact. To this end, multiple-answer questions can be used.
o Mail questionnaires should be accompanied by a covering letter, which should state the purpose of the questionnaire, a promise of confidentiality of responses, etc.
B) Method of Secondary Data Collection
o In most cases, secondary data are obtained from sources such as census and survey reports, books, official records, reported experimental results, previous research papers, bulletins, magazines, newspapers, websites, and other publications. Different organizations and government agencies publish information (data) in the form of reports, periodicals, journals, etc. In the case of Ethiopia, the Central Statistical Authority (CSA) is the first to be mentioned as a publisher of such secondary data.
Advantage of Primary Data
o Primary data give more reliable, accurate, and adequate information that suits the objective and purpose of an investigation.
o A primary source usually shows data in greater detail.
o Primary data are free from errors that may arise from copying figures from publications, as can happen with secondary data.
o Here the classification criterion is quantitative. Such distributions are grouped into two types: simple (ungrouped) frequency distributions and grouped frequency distributions.
I. Simple (Ungrouped) Frequency Distribution: a distribution that lists individual data values along with their frequencies. It is usually used when the data range is small.
E.g. Raw data on the number of children per family:
0, 2, 3, 1, 1, 3, 4, 2, 0, 3, 4, 2, 2, 1, 0, 4, 1, 2, 2, 3
Required: Construct the ungrouped frequency distribution, with the relative frequency (Rf) and percentage frequency (Pf).
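One way to build the required table is sketched below in Python, using the raw data above. The variable names are illustrative choices. Rf is taken as f/n and Pf as 100 x f/n.

```python
from collections import Counter

# Raw data: number of children per family (from the example above).
children = [0, 2, 3, 1, 1, 3, 4, 2, 0, 3, 4, 2, 2, 1, 0, 4, 1, 2, 2, 3]
n = len(children)
freq = Counter(children)  # value -> frequency

# One row per distinct value: (value, f, Rf, Pf).
table = []
for value in sorted(freq):
    f = freq[value]
    table.append((value, f, f / n, 100 * f / n))

print("Value   f    Rf     Pf")
for value, f, rf, pf in table:
    print(f"{value:<7} {f:<4} {rf:<6.2f} {pf:.0f}%")
```

The frequencies should sum to n = 20 and the relative frequencies to 1, which is a quick check on the tally.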
o Cumulative Frequency Distribution: a frequency distribution that displays the sum of the frequencies of consecutive classes above or below a given class.
There are two types of cumulative frequency:
a) Less than cumulative frequency (Lcf): used when our interest focuses on the total number of observations that lie below a specified value.
b) More than cumulative frequency (Mcf): used when our interest focuses on the total number of observations that lie above a specified value.
E.g.
Class   Frequency   Lcf   Mcf
0       3           3     20
1       4           7     17
2       6           13    13
3       4           17    7
4       3           20    3
Total   20
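The Lcf and Mcf columns are running totals taken from the top and from the bottom of the frequency column, respectively. A minimal Python sketch, using the frequencies from the table above:

```python
from itertools import accumulate

classes = [0, 1, 2, 3, 4]
freqs = [3, 4, 6, 4, 3]

# Less than cumulative frequency: running total from the first class downward.
lcf = list(accumulate(freqs))

# More than cumulative frequency: running total from the last class upward.
mcf = list(accumulate(reversed(freqs)))[::-1]

for c, f, l, m in zip(classes, freqs, lcf, mcf):
    print(c, f, l, m)
```

The last Lcf value and the first Mcf value both equal the total number of observations.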
E.g.
Class Limit   Class Boundary
1-5           0.5-5.5
6-10          5.5-10.5
11-15         10.5-15.5
16-20         15.5-20.5
3. Class Width (W): the difference between the lower and upper class boundaries of any class. It can be found in any of the following ways:
W = Ucb_i - Lcb_i
W = Lcb_(i+1) - Lcb_i
W = Ucb_(i+1) - Ucb_i
W = UCL_(i+1) - UCL_i
W = LCL_(i+1) - LCL_i
where Lcb_i and Ucb_i are the lower and upper class boundaries of class i, and LCL_i and UCL_i are its lower and upper class limits.
4. Class Mark (Midpoint, M_i)
M_i is the midpoint of a class interval, i.e. the average of the lower and upper class limits (or, equivalently, of the class boundaries):
M_i = (LCL_i + UCL_i)/2 = (Lcb_i + Ucb_i)/2
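The class boundaries, class width, and class marks for the class limits 1-5, 6-10, 11-15, 16-20 can be checked with a short Python sketch. It assumes the boundaries are obtained by moving each limit outward by half the gap between consecutive classes. That rule is not stated in these notes, but it reproduces the boundary table above.

```python
# Class limits from the table above.
limits = [(1, 5), (6, 10), (11, 15), (16, 20)]

# Assumed rule: half the gap between one class's upper limit and the next lower limit.
half_gap = (limits[1][0] - limits[0][1]) / 2
boundaries = [(lcl - half_gap, ucl + half_gap) for lcl, ucl in limits]

# Class width computed three of the equivalent ways.
w_boundaries = boundaries[0][1] - boundaries[0][0]  # Ucb_i - Lcb_i
w_lower_limits = limits[1][0] - limits[0][0]        # LCL_(i+1) - LCL_i
w_upper_limits = limits[1][1] - limits[0][1]        # UCL_(i+1) - UCL_i

# Class marks from the limits and from the boundaries.
marks_from_limits = [(lcl + ucl) / 2 for lcl, ucl in limits]
marks_from_boundaries = [(lcb + ucb) / 2 for lcb, ucb in boundaries]

print(boundaries, w_boundaries, marks_from_limits)
```

All three width calculations agree, and the class marks are the same whether computed from limits or from boundaries.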
Steps Needed to Construct Grouped Frequency Distribution
1. Calculate the range (R):
R = Xmax - Xmin
2. Calculate the number of classes using Sturges' formula:
k = 1 + 3.322 log n, where k is the number of classes, n is the number of observations (n = Σf_i), and log is the base-10 logarithm.
Always round k up, e.g. k = 4.5 ~ 5.
3. Calculate the class width:
W = R/k. R and k must be rounded up to the next whole number, and so is W.
4. Identify the starting point: LCL_1 = Xmin, then LCL_2 = Xmin + W, and so on.
E.g. Construct a grouped frequency distribution for the following raw data.
11, 29, 6, 33, 14, 31, 22, 27, 19, 20, 21, 18, 17, 22, 38, 23, 26, 34, 39, 27
1. R = Xmax - Xmin = 39 - 6 = 33
2. k = 1 + 3.322 log 20 = 5.32 ~ 6
3. W = R/k = 33/6 = 5.5 ~ 6
4. LCL_1 = Xmin = 6
Class Limit   fi   Mi     Class Boundary   Lcf   Mcf
6-11          2    8.5    5.5-11.5         2     20
12-17         2    14.5   11.5-17.5        4     18
18-23         7    20.5   17.5-23.5        11    16
24-29         4    26.5   23.5-29.5        15    9
30-35         3    32.5   29.5-35.5        18    5
36-41         2    38.5   35.5-41.5        20    2
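The four steps and the resulting table can be reproduced with the following Python sketch. It assumes integer data, so each upper class limit is the next lower limit minus 1 and each boundary lies 0.5 beyond its limit. The variable names are illustrative.

```python
import math

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        21, 18, 17, 22, 38, 23, 26, 34, 39, 27]
n = len(data)

r = max(data) - min(data)                 # Step 1: range
k = math.ceil(1 + 3.322 * math.log10(n))  # Step 2: Sturges' formula, rounded up
w = math.ceil(r / k)                      # Step 3: class width, rounded up

rows = []
lcf = 0
for i in range(k):
    lcl = min(data) + i * w               # Step 4: LCL_1 = Xmin, LCL_2 = Xmin + W, ...
    ucl = lcl + w - 1                     # assumes integer data
    f = sum(lcl <= x <= ucl for x in data)
    lcf += f
    mcf = n - lcf + f                     # observations in this class and above
    rows.append((lcl, ucl, f, (lcl + ucl) / 2, lcl - 0.5, ucl + 0.5, lcf, mcf))

for lcl, ucl, f, mark, lcb, ucb, l, m in rows:
    print(f"{lcl}-{ucl}  {f}  {mark}  {lcb}-{ucb}  {l}  {m}")
```

The frequencies produced this way should sum to n = 20, matching the table.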
ii) Pie Chart: a circle divided into sectors according to the percentage frequency of each category of the distribution, with the angle of each sector a share of 360° proportional to the amount associated with that category.
Class   Frequency   Rf     Pf    360 x Pf (in degrees)
1st     5           5/25   20%   72°
2nd     7           7/25   28%   100.8°
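The sector angles can be checked with a short Python sketch. Only the two classes shown above are used. The total of 25 is taken from the denominators in the Rf column, and the remaining classes are not listed in these notes.

```python
total = 25                      # total frequency, from the Rf denominators above
freqs = {"1st": 5, "2nd": 7}    # only the classes shown in the table

angles = {}
for name, f in freqs.items():
    pf = 100 * f / total         # percentage frequency
    angles[name] = 360 * f / total  # sector angle in degrees
    print(f"{name}: Pf = {pf:.0f}%, angle = {angles[name]:.1f} degrees")
```

Over all classes of a complete table, the angles would sum to 360°.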
o A histogram is a graph consisting of a series of rectangles whose bases are equal to the class widths of the corresponding classes and whose heights are proportional to the class frequencies.
o It is constructed from a grouped frequency distribution.
o In a histogram, class boundaries are used on the X-axis.
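As a rough illustration, a text version of a histogram can be printed in Python from the class boundaries and frequencies of the grouped distribution constructed earlier (classes 6-11 through 36-41). The bars are drawn sideways with one `#` per observation. This is a sketch for intuition, not a substitute for a properly drawn chart.

```python
# Class boundaries and frequencies from the grouped frequency distribution example.
boundaries = [(5.5, 11.5), (11.5, 17.5), (17.5, 23.5),
              (23.5, 29.5), (29.5, 35.5), (35.5, 41.5)]
freqs = [2, 2, 7, 4, 3, 2]

# One bar per class: the bar length is proportional to the class frequency.
bars = [f"{lo:>4}-{hi:<4} | {'#' * f} ({f})"
        for (lo, hi), f in zip(boundaries, freqs)]
print("\n".join(bars))
```

Because all classes have the same width, bar length alone reflects the class frequency.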
Exercise. The following table is a grouped frequency distribution of the money spent per visit by a random sample of 100 customers at a department store.
Amount spent   No. of customers
3-7            10
8-12           30
13-17          35
18-22          20
23-27          5
Total          100