(Document Title) (Document Subtitle) : (Company Name) (Company Address)
Statistics:
Statistics is the discipline concerned with the collection, organization, analysis,
interpretation and presentation of data. In the singular sense, it is the science that
deals with the collection, classification, organization, analysis and interpretation of
numerical data. In the plural sense, it is an aggregate of facts collected in a
systematic manner for a predetermined purpose and placed in relation to each
other.
Characteristics of Statistics:
a. It is the aggregate of facts.
b. It is numerically expressed.
c. It is enumerated or estimated according to a reasonable standard of
accuracy.
d. It is collected in a systematic manner.
e. It is collected for a predetermined purpose.
f. It is capable of being placed in relation to each other.
Importance of Statistics:
a. It simplifies the complexity of mass figures.
The function of statistics is to present a huge mass of figures in a simple,
presentable and understandable form. Statistical methods extract meaningful
information from the mass of data. No one can remember a whole set of data,
and the mass of data is of little use unless it is simplified
through statistical techniques.
b. It facilitates comparison.
Comparison of similar facts is expected in many studies. Statistical
techniques such as averages, ratios and the coefficient of variation facilitate
comparison among two or more groups. For eg: The performance of
one college can be presented clearly only by comparing it with the performance
of other similar colleges.
c. Statistics presents the facts in a definite form.
Numerical expressions are more convincing than qualitative expressions. One
of the most important functions of statistics is to present the facts in a
quantitative form. For eg: The statement “the literacy rate of Nepal has increased
in 1999 from last year” makes little sense unless it is expressed as “the literacy
rate of Nepal is 35% in 1999 whereas it was 31% last year.”
d. It helps in forecasting.
Knowledge of future trends is important in framing plans and policies.
Statistical methods are useful for forecasting future events based
upon the historical record and other factors. For eg: Forecasting the demand
for a certain product helps the company to set its pricing policy, personnel
policy and so on.
e. It helps in formulating and testing hypotheses.
Statistical methods are extremely helpful in formulating hypotheses and testing
them against statistical evidence.
f. It helps in formulating policies.
Statistics provides the basic material for framing suitable policies in any firm,
organization or nation. Sound policy depends upon statistical evidence and analysis of
the situation.
g. It helps in planning.
Statistics is a crucial part of planning; without statistics, sound planning
is not possible. Statistics supports planning in business, in economics and
at the government level. Almost every governmental organization, and many
private ones, use statistics to formulate policy and to plan adequately.
To plan well, organizations use data related to production, consumption,
births, deaths, income, and so on. The COVID-19 pandemic offers an example:
the Government of New Zealand drew up a plan to fight COVID-19 and used it
to handle the situation. The pandemic shows the importance of statistics
in our daily life.
Importance of Statistics in the Various fields:
The income and spending problems of various sections of the people, national wealth,
production, demand and supply adjustment, and the effect of economic policies all
indicate the importance of statistics in the field of economics and its various
branches. For example, governments use statistics in economics to calculate
GDP and per capita income.
In social science, observations vary from time to time, object to object,
and place to place because many factors act on them.
The statistical tools of regression and correlation analysis are used in social science to
isolate the effect of these factors on the observations. Statistics is also
used to conduct social surveys, which rely on sampling techniques
and estimation theory. These are among the most powerful tools for conducting a
social survey.
Sociology is a crucial part of social studies, and statistics plays a critical role
in it. It helps in studying mortality, fertility, population growth, and so on.
Research work relies heavily on statistics. The research worker’s job is to present
data to the community, and the research worker uses statistical methods to solve
particular problems under differing conditions.
Statistics is the basis of almost every research activity. A researcher needs
good statistical skills to be effective, and those skills also help to sustain the
researcher’s interest in the research work.
It also helps to improve programming logic. Statistical work is widely done in
popular programming languages, e.g., Java, Python, Swift, C, and C++.
Statistics is widely used in data analytics and data science, and it is
also the foundation of Big Data technologies. Big Data is nothing without data,
and data is of little use without statistics.
Thus Big Data technology depends heavily on statistics. Statistics is used at the
initial stage of Big Data work, when the raw data is sorted.
Statistics plays its part in the health industry. It helps doctors to record and
manage the data of their patients. Apart from that, the WHO uses statistics to
produce its annual reports on the health of the world’s populations. Statistics has also
helped medical scientists to develop and test vaccines and antidotes against
major diseases.
In the COVID-19 pandemic, statistics plays a crucial role in analyzing
how many patients there are around the world, which region has the most
cases, and much more.
Apart from that, the R & D department of any business relies on statistics. It
uses recent statistics to develop new products and services for the business, and
statistics helps it to make decisions about them. The
company can then take a calculated risk on the launch of a product or service.
Besides, businesses also use statistics to calculate their profit, their employees’ salaries, and
so on. The COVID-19 pandemic has also increased the use of statistics in business, as
companies work to minimize their losses during the pandemic.
Large organizations use business statistics and various data analysis
tools, for instance to estimate probabilities and see where sales may be
headed in the future. Several tools are used for business statistics, built on
the basis of the mean, median and mode, the bell curve, bar graphs, and basic
probability. These can be employed for research problems related to employees,
products, customer service, and much more. A business can then rely on
evidence of what is working and what is not.
Besides this, statistics is widely used for consumer goods, because
consumer goods are products in daily use. Businesses use statistics to track
which consumer goods are available in a store and which are not.
They also use statistics to find out which store needs which consumer goods and when to
ship the products. Sound statistical decisions help a business to earn
substantial revenue on consumer goods.
(X) Government
(XII) Education
These measures are important and useful because they allow scientists to see
patterns among data, and thus to make sense of that data. Descriptive statistics can
only be used to describe the population or data set under study: The results cannot
be generalized to any other group or population.
b. Inferential Statistics:
6. Statistics is used to generate and subsequently analyze this type of data. | They are gained mostly through observation.
The data can be collected through various methods like surveys, observations,
physical testing, mailed questionnaires, questionnaire filled and sent by
enumerators, personal interviews, telephonic interviews, focus groups, case
studies, etc.
1. The investigator has to contend with all the hassles of data collection-
b. Secondary Data
1. The investigator cannot decide what is collected (if specific data about
something is required, for instance).
2. One can only hope that the data is of good quality.
3. Obtaining additional data (or even clarification) about something is most often not
possible.
Primary data: First-hand data gathered by the researcher himself. Collecting it is slow and difficult.
Secondary data: Data collected by someone else earlier. Obtaining it is quick and easy.
In statistics, there are four data measurement scales. These are simple ways
to sub-categorize different types of data.
i.Nominal
ii. Ordinal
The order of the values is important, but the differences between them are not
really known. It measures non-numeric concepts like satisfaction, happiness,
discomfort, etc. It ranks responses. It has the properties of identity and magnitude.
The distances between scale points are not equal; the scale shows only the relative
positions of the responses. For eg:
a. Very unhappy
b. Unhappy
c. Okay
d. Happy
e. Very happy
iii.Interval
The order of the values and the exact differences between them are known. It is called an interval
scale because it assumes equal distances between the scale
points. It has the properties of identity, magnitude and equal distance. The
equal distance between scale points shows how many units greater or
less one case is than another. For eg: The distance between 25
and 35 means the same as the distance between 65 and 75.
iv.Ratio
It allows a wide range of both descriptive and inferential statistics to be applied.
The factor that clearly defines a ratio scale is that it has a true zero point. The
properties of a ratio scale are identity, magnitude, equal distance and absolute zero.
These properties allow all the basic mathematical operations to be applied:
addition, subtraction, multiplication and division. The absolute zero shows
how many times greater one case is than another. For eg: height, weight,
duration.
Sampling
It is the method of data collection in which data is collected from a
representative part of the whole population. A sample is selected from the
whole population in order to estimate the characteristics of the population.
For eg:
Advantages of Sampling:
a. The cost of sampling is low.
b. It takes less time to collect, edit, classify, analyze and
interpret the data.
c. Better trained and more skilled manpower can be used to collect accurate
information.
d. It is applicable to large populations.
e. It is applicable when the elements are destroyed by testing.
Disadvantages of Sampling:
A. Probability Sampling
Simple random sampling is a probability sampling technique that helps
save time and resources. It is the simplest
and most common method of sampling. It is a reliable method of obtaining
information in which every member of the sample is chosen randomly,
merely by chance. Each individual has the same probability of being chosen
to be a part of the sample.
For example, in a school of 500 students, if the teacher decides to
conduct team building activities, they would most likely
pick chits out of a bowl. In this case, each of the 500 students has an
equal opportunity of being selected.
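The chit-drawing example can be sketched in Python. This is a minimal illustration, not part of the original notes: the student numbering 1 to 500, the team size of 10 and the fixed seed are all assumptions made only for the example.

```python
import random

# Hypothetical roster: students numbered 1..500 (numbering is an assumption).
students = list(range(1, 501))

# A fixed seed is used only so that the draw can be repeated.
rng = random.Random(42)

# Draw 10 students without replacement; every student is equally likely.
# The sample size of 10 is illustrative.
team = rng.sample(students, k=10)

print(sorted(team))
```

Because `random.sample` draws without replacement, no student can be picked twice, just as a chit leaves the bowl once it is drawn.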
Systematic sampling requires the selection of a starting point for the sample and a
sampling interval that is repeated regularly. This type of sampling has a
predefined range, and hence it is among the least time-consuming
techniques.
For example, a researcher intends to collect a systematic sample of 500
people from a population of 5000. He/she numbers each element of the
population from 1 to 5000 and chooses every 10th individual to be a part of
the sample (Total population / Sample size = 5000/500 = 10).
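The 5000/500 example can be sketched as follows. The random starting point within the first interval is a common convention and an assumption here; the notes only say that a starting point is selected.

```python
import random

# Population elements numbered 1..5000, as in the example.
population = list(range(1, 5001))
sample_size = 500

# Sampling interval: total population / sample size = 10.
k = len(population) // sample_size

# Assumed convention: pick a random start within the first interval.
rng = random.Random(0)
start = rng.randrange(k)

# Take every k-th element from the starting point onward.
sample = population[start::k]

print(k, len(sample), sample[:5])
```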
B. Non-Probability Sampling
a. Convenience Sampling
Example 1: Draw a box-and-whisker plot for the data set {3, 7, 8, 5, 12, 14, 21,
13, 18}.
Example 2: Draw a box-and-whisker plot for the data set {3, 7, 8, 5, 12, 14, 21,
15, 18, 14}.
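A box-and-whisker plot is drawn from the five-number summary (minimum, Q1, median, Q3, maximum). The sketch below computes it for both examples. It assumes the common textbook rule that the quartiles are the medians of the lower and upper halves, leaving out the middle value when the number of observations is odd; other quartile conventions give slightly different values.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # For odd n the middle value is excluded from both halves.
    upper = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])

example_1 = five_number_summary([3, 7, 8, 5, 12, 14, 21, 13, 18])
example_2 = five_number_summary([3, 7, 8, 5, 12, 14, 21, 15, 18, 14])

print(example_1)  # (3, 6.0, 12, 16.0, 21)
print(example_2)  # (3, 7, 13.0, 15, 21)
```

The box runs from Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and the maximum.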
For Example:
56, 78, 82, 82, 90, 94, 93, 67, 67, 69, 74, 77, 92, 88, 81, 83, 84, 77, 72
Arranging the data in ascending order:
56, 67, 67, 69, 72, 74, 77, 77, 78, 81, 82, 82, 83, 84, 88, 90, 92, 93, 94
c. Telephone interview
Advantages of Interview:
Disadvantages of Interview:
It is expensive.
iii. Questionnaire
It is a formal list of questions designed to gather responses from respondents
on a given topic. It is an efficient data collection mechanism when the researcher
knows exactly what is required and how to measure the variable of interest. It
involves several steps, including writing the question items, organizing them
on the questionnaire, administering the questionnaire and so on.
-The physical appearance (quality of paper, colour, etc.) must be good to attract the
attention of the respondent.
Types of Questions in Questionnaire:
a. Open Questions
Open questions allow people to express what they think in their own words.
Open-ended questions enable the respondent to answer in as much detail as they
like in their own words. For example: “can you tell me how happy you feel
right now?” Open questions are often used for complex questions that cannot be
answered in a few simple categories but require more detail and discussion.
Rich qualitative data is obtained as open questions allow the respondent to
elaborate on their answer.
b. Closed Questions
Closed questions structure the answer by only allowing responses which fit into
pre-decided categories. Data that can be placed into a category is called nominal
data. The category can be restricted to as few as two options, i.e., dichotomous
(e.g., 'yes' or 'no,' 'male' or 'female'), or include quite complex lists of
alternatives from which the respondent can choose (i.e., polytomous). Closed
questions can also provide ordinal data (which can be ranked). This often
involves using a continuous rating scale to measure the strength of attitudes or
emotions. For example, strongly agree / agree / neutral / disagree / strongly
disagree / unable to answer. Closed questions are also inexpensive to administer.
Types of Questionnaire:
a. Self Administered
Advantages –
Disadvantages–
a. https://bbamantra.com/methods-of-data-collection-primary-and-secondary-data/
b. https://www.formpl.us/blog/primary-data
c. https://www.simplypsychology.org/questionnaires.html
For Symmetry:
Correlation
a. Karl Pearson
Calculate Karl Pearson’s correlation coefficient for the following data
on the sales and expenses (in thousands of rupees) of five firms and interpret the
result.
Sales     43  41  36  34  50
Expenses  12  24  15  21  19
Solution:
Let sales be X and expenses be Y
Here;
X     Y     XY     X²     Y²
43    12    516    1849   144
41    24    984    1681   576
36    15    540    1296   225
34    21    714    1156   441
50    19    950    2500   361
ΣX = 204   ΣY = 91   ΣXY = 3704   ΣX² = 8482   ΣY² = 1747
n = 5
We know:
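Karl Pearson’s coefficient of correlation:
\[
r = \frac{n\Sigma XY - \Sigma X \cdot \Sigma Y}{\sqrt{n\Sigma X^2 - (\Sigma X)^2}\,\sqrt{n\Sigma Y^2 - (\Sigma Y)^2}}
\]
\[
= \frac{5 \times 3704 - 204 \times 91}{\sqrt{5 \times 8482 - (204)^2}\,\sqrt{5 \times 1747 - (91)^2}}
\]
\[
= \frac{18520 - 18564}{\sqrt{42410 - 41616} \times \sqrt{8735 - 8281}}
\]
\[
= \frac{-44}{\sqrt{794} \times \sqrt{454}} = \frac{-44}{28.178 \times 21.307} = \frac{-44}{600.39} \approx -0.0733
\]
∴ There is a very weak negative correlation between sales and expenses (in thousands of rupees).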
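The Pearson calculation can be checked with a short Python sketch that applies the same sums-based formula to the sales and expenses data. The function name `pearson_r` is illustrative only.

```python
from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's r from the raw sums, as in the worked solution."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

sales = [43, 41, 36, 34, 50]
expenses = [12, 24, 15, 21, 19]

r = pearson_r(sales, expenses)
print(round(r, 4))  # -0.0733
```

The numerator is 18520 − 18564 = −44, so the coefficient is negative but very close to zero.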
b. Spearman
1. When Rank is Given
Find out Spearman rank correlation coefficient and interpret the result.
Candidates    A  B  C  D  E
Ranking by X  1  3  2  4  5
Ranking by Y  3  1  5  2  4
Solution:
R1   R2   d = R1 − R2   d²
1    3    −2            4
3    1    2             4
2    5    −3            9
4    2    2             4
5    4    1             1
Σd² = 22
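We know,
\[
r = 1 - \frac{6\Sigma d^2}{n^3 - n}
\]
\[
= 1 - \frac{6 \times 22}{5^3 - 5} = 1 - \frac{132}{125 - 5} = 1 - \frac{132}{120}
\]
= 1 − 1.1 = −0.1
∴ There is a very weak negative correlation between the rankings by X and Y.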
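When the ranks are already given, the formula needs only the rank differences. A minimal Python sketch of this case, with an illustrative function name:

```python
def spearman_from_ranks(rank_1, rank_2):
    """Spearman's coefficient 1 - 6*sum(d^2)/(n^3 - n); assumes no tied ranks."""
    n = len(rank_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_1, rank_2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

rank_x = [1, 3, 2, 4, 5]  # ranking by X for candidates A..E
rank_y = [3, 1, 5, 2, 4]  # ranking by Y for candidates A..E

rho = spearman_from_ranks(rank_x, rank_y)
print(round(rho, 2))  # -0.1
```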
Find out Spearman’s rank correlation coefficient between supply and profit.
Also interpret the result.
Supply 75 88 95 70 60 80 81 50
Profit 120 134 150 115 110 140 142 100
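Solution:
Here, ranking the largest value as 1;
Supply  Profit  R1  R2  d = R1 − R2  d²
75      120     5   5   0            0
88      134     2   4   −2           4
95      150     1   1   0            0
70      115     6   6   0            0
60      110     7   7   0            0
80      140     4   3   1            1
81      142     3   2   1            1
50      100     8   8   0            0
Σd² = 6, n = 8
We know,
\[
r = 1 - \frac{6\Sigma d^2}{n^3 - n}
\]
\[
= 1 - \frac{6 \times 6}{8^3 - 8} = 1 - \frac{36}{512 - 8} = 1 - \frac{36}{504}
\]
= 1 − 0.0714 = 0.9286
∴ There is a high degree of positive correlation between supply and profit.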
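When ranks are not given, they must first be derived from the values. The sketch below does this for the supply and profit data. It assumes that the largest value receives rank 1 and that there are no tied values, which holds for this data set; the helper names are illustrative.

```python
def ranks_descending(values):
    """Rank values so the largest gets rank 1; valid only without ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

supply = [75, 88, 95, 70, 60, 80, 81, 50]
profit = [120, 134, 150, 115, 110, 140, 142, 100]

rank_supply = ranks_descending(supply)
rank_profit = ranks_descending(profit)

n = len(supply)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_supply, rank_profit))
rho = 1 - 6 * sum_d2 / (n ** 3 - n)

print(rank_supply)    # [5, 2, 1, 6, 7, 4, 3, 8]
print(rank_profit)    # [5, 4, 1, 6, 7, 3, 2, 8]
print(sum_d2)         # 6
print(round(rho, 4))  # 0.9286
```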
Find Spearman’s rank correlation coefficient for the following heights of fathers and
their sons and interpret the result.
Height of father (in inches)  67  65  68  67  66
Height of son (in inches)     69  65  69  65  67
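Solution:
Here, some heights are repeated, so tied values share the average of their ranks:
Father  Son  R1   R2   d = R1 − R2  d²
67      69   2.5  1.5  1            1
65      65   5    4.5  0.5          0.25
68      69   1    1.5  −0.5         0.25
67      65   2.5  4.5  −2           4
66      67   4    3    1            1
Σd² = 6.5, n = 5
m1 = 2 (67 occurs twice among the fathers)
m2 = 2 (69 occurs twice among the sons)
m3 = 2 (65 occurs twice among the sons)
We know:
\[
r = 1 - \frac{6\left[\Sigma d^2 + \frac{m_1(m_1^2 - 1)}{12} + \frac{m_2(m_2^2 - 1)}{12} + \frac{m_3(m_3^2 - 1)}{12}\right]}{n^3 - n}
\]
\[
= 1 - \frac{6\left[6.5 + \frac{2(2^2 - 1)}{12} + \frac{2(2^2 - 1)}{12} + \frac{2(2^2 - 1)}{12}\right]}{5^3 - 5}
\]
\[
= 1 - \frac{6[6.5 + 0.5 + 0.5 + 0.5]}{120} = 1 - \frac{6 \times 8}{120} = 1 - \frac{48}{120}
\]
= 1 − 0.4 = 0.6
∴ There is a moderate positive correlation between the heights of fathers and sons.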
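The tied-rank case can be sketched in Python as below. Tied values receive the average of the positions they occupy, and each group of m tied values adds m(m² − 1)/12 to Σd², as in the formula above. The helper names are illustrative, and ranking the largest value as 1 is an assumption.

```python
from collections import Counter

def average_ranks(values):
    """Rank with the largest value as 1; tied values share the mean rank."""
    order = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(order, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of m tied values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

fathers = [67, 65, 68, 67, 66]
sons = [69, 65, 69, 65, 67]

rank_f = average_ranks(fathers)
rank_s = average_ranks(sons)
n = len(fathers)

sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_f, rank_s))
correction = tie_correction(fathers) + tie_correction(sons)
rho = 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

print(rank_f, rank_s)
print(sum_d2, correction, round(rho, 2))  # 6.5 1.5 0.6
```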
Regression:
The following data gives the experience of the machine operators in years and their
performance as given by the number of the good parts turned out per 100 pieces.
Operator 1 2 3 4 5
Experience 16 12 4 3 4
Performance 87 88 68 78 68
X     Y     XY     X²
16    87    1392   256
12    88    1056   144
4     68    272    16
3     78    234    9
4     68    272    16
ΣX = 39   ΣY = 389   ΣXY = 3226   ΣX² = 441
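The regression line of y on x is y = a + bx, where
\[
b = \frac{n\Sigma XY - \Sigma X \times \Sigma Y}{n\Sigma X^2 - (\Sigma X)^2} = \frac{5 \times 3226 - 39 \times 389}{5 \times 441 - (39)^2}
\]
\[
= \frac{16130 - 15171}{2205 - 1521} = \frac{959}{684} \approx 1.4
\]
Also;
\[
\bar{Y} = \frac{\Sigma Y}{n} = \frac{389}{5} = 77.8
\]
\[
\bar{X} = \frac{\Sigma X}{n} = \frac{39}{5} = 7.8
\]
Now,
\[
a = \bar{Y} - b\bar{X} = 77.8 - 1.4 \times 7.8 = 77.8 - 10.92 = 66.88
\]
y = 66.88 + 1.4x
When x = 30:
y = 66.88 + 1.4 × 30
= 66.88 + 42 = 108.88 Ans.
Since x = 30 lies far outside the observed range of 3 to 16 years, this value is an extrapolation and exceeds the 100 pieces possible.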
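The slope and intercept can be checked with a small least-squares sketch in Python. The function name `fit_line` is illustrative. Because the code keeps the unrounded slope 959/684, its intercept differs slightly from the hand value obtained with b rounded to 1.4.

```python
def fit_line(x, y):
    """Least-squares line y = a + b*x from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

experience = [16, 12, 4, 3, 4]
performance = [87, 88, 68, 78, 68]

a, b = fit_line(experience, performance)
print(round(b, 3), round(a, 2))  # 1.402 66.86
```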
The heights of fathers and sons are given in the following table. Find the two
lines of regression and estimate the average height of the son when the height of
the father is 67.5 inches.
Height of the father (in inches)  65  66  67  68  69  71  73
Height of the son (in inches)     67  64  68  72  70  69  70
Solution:
Here;
X     Y     XY     X²     Y²
65    67    4355   4225   4489
66    64    4224   4356   4096
67    68    4556   4489   4624
68    72    4896   4624   5184
69    70    4830   4761   4900
71    69    4899   5041   4761
73    70    5110   5329   4900
ΣX = 479   ΣY = 480   ΣXY = 32870   ΣX² = 32825   ΣY² = 32954
n = 7
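First, the regression line of y on x:
\[
b_{yx} = \frac{n\Sigma XY - \Sigma X \times \Sigma Y}{n\Sigma X^2 - (\Sigma X)^2} = \frac{7 \times 32870 - 479 \times 480}{7 \times 32825 - (479)^2}
\]
\[
= \frac{230090 - 229920}{229775 - 229441} = \frac{170}{334} \approx 0.509
\]
Also;
\[
\bar{Y} = \frac{\Sigma Y}{n} = \frac{480}{7} = 68.571
\]
\[
\bar{X} = \frac{\Sigma X}{n} = \frac{479}{7} = 68.429
\]
Now,
\[
a = \bar{Y} - b_{yx}\bar{X} = 68.571 - 0.509 \times 68.429 = 68.571 - 34.830 = 33.741
\]
y = 33.741 + 0.509x
Then, the regression line of x on y:
\[
b_{xy} = \frac{n\Sigma XY - \Sigma X \times \Sigma Y}{n\Sigma Y^2 - (\Sigma Y)^2} = \frac{7 \times 32870 - 479 \times 480}{7 \times 32954 - (480)^2}
\]
\[
= \frac{230090 - 229920}{230678 - 230400} = \frac{170}{278} \approx 0.6115
\]
Now,
\[
a = \bar{X} - b_{xy}\bar{Y} = 68.429 - 0.6115 \times 68.571 = 68.429 - 41.931 = 26.498
\]
x = 26.498 + 0.6115y
When the height of the father is 67.5 inches, the estimated height of the son is
y = 33.741 + 0.509x
y = 33.741 + 0.509 × 67.5
y = 33.741 + 34.358
y = 68.10 inches (approximately)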
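Both regression lines come from the same formula with the roles of the two variables swapped. The Python sketch below shows this for the father and son heights; the function name `fit_line` is illustrative, and the code keeps unrounded coefficients, so its intercepts can differ in the second decimal place from values worked by hand with rounded slopes.

```python
def fit_line(x, y):
    """Least-squares line of y on x: returns (intercept, slope)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope

father = [65, 66, 67, 68, 69, 71, 73]
son = [67, 64, 68, 72, 70, 69, 70]

a_yx, b_yx = fit_line(father, son)  # line of y (son) on x (father)
a_xy, b_xy = fit_line(son, father)  # line of x (father) on y (son)

# Estimated height of the son when the father is 67.5 inches tall.
estimate = a_yx + b_yx * 67.5

print(round(b_yx, 3), round(b_xy, 4))  # 0.509 0.6115
print(round(estimate, 2))              # 68.1
```

The estimate uses the line of y on x because the son's height is the quantity being predicted.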
Index Number:
Calculate the price index number for the following data by using the simple
aggregative method.
Items               A   B   C   D   E
Price in 1983 (Rs)  5   7   9   7   6
Price in 1984 (Rs)  10  11  10  11  9
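Solution:
Here, Σp0 = 5 + 7 + 9 + 7 + 6 = 34 and Σp1 = 10 + 11 + 10 + 11 + 9 = 51.
We know
\[
P_{01} = \frac{\Sigma p_1}{\Sigma p_0} \times 100
\]
\[
P_{01} = \frac{51}{34} \times 100 = 150
\]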
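The simple aggregative index is the ratio of the two price totals. A minimal Python sketch with the prices from the table (the variable names are illustrative):

```python
# Prices of items A..E in the base year (1983) and the current year (1984).
price_1983 = [5, 7, 9, 7, 6]
price_1984 = [10, 11, 10, 11, 9]

# Simple aggregative index: sum of current prices over sum of base prices.
index = sum(price_1984) / sum(price_1983) * 100

print(index)  # 150.0
```

An index of 150 means that the aggregate price of these items in 1984 was 50% higher than in 1983.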
Determine the price index number from the following data by using the simple average
of price relatives, taking the AM and the GM as the average.