

Statistics:
Statistics is the discipline concerned with the collection, organization, analysis,
interpretation and presentation of data. In the singular sense, it is the science that
deals with the collection, classification, organization, analysis and interpretation of
numerical data. In the plural sense, it is an aggregate of facts collected in a
systematic manner for a predetermined purpose and placed in relation to each
other.
Characteristics of Statistics:
a. It is the aggregate of facts.
b. It is numerically expressed.
c. It is enumerated or estimated according to the reasonable standard of
accuracy.
d. It is collected in systematic manner.
e. It is pre-determined in purpose.
f. It is capable of being placed in relation to each other.
Importance of Statistics:
a. It simplifies the complexity of mass figures.
The function of statistics is to present the huge mass of figures into a simple,
presentable and understandable form. Statistical methods extract meaningful
information from the mass of data. It is impossible for one to remember the
whole set of data. The whole mass of data is useless if it is not simplified
through statistical techniques.
b. It facilitates comparison.
Comparison of similar facts is expected in most studies. Statistical
techniques such as averages, ratios and the coefficient of variation facilitate
comparison between two or more groups. For example, the performance of
one college can be presented clearly only by comparing it with the performance
of other similar colleges.
c. Statistics present the facts in a definite form.
Numerical expressions are more convincing than qualitative expressions. One
of the most important functions of statistics is to present facts in a
quantitative form. For example, the statement “the literacy rate of Nepal increased
in 1999 compared with the previous year” makes little sense unless it is expressed as
“the literacy rate of Nepal was 35% in 1999, whereas it was 31% in the previous year.”
d. It helps in forecasting.
The knowledge of future trends is important in framing plans and policies.
Statistical methods are most useful in the forecasting of future events based
upon the historical record and other factors. For example, a forecast of the demand
for a certain product helps a company to set its pricing policy, personnel
policy and so on.
e. It helps in formulating and testing hypotheses.
Statistical methods are extremely helpful in formulating hypotheses and testing
them against statistical evidence.
f. It helps in formulating policies.
It provides basic material for framing suitable policies in any firm or
organization or nation. It depends upon the statistical evidence and analysis of
the situation.
g. It helps in planning.
Statistics is a crucial part of planning; without reliable data, sound planning
is not possible. Statistics supports planning in business, economics and
government. Almost every organization, governmental or private, uses
statistics to formulate its policies and plans. To plan effectively,
organizations use data on production, consumption, births, deaths, income
and so on. The COVID-19 pandemic offers a recent example: the Government
of New Zealand drew up a plan to fight COVID-19 and handled the
situation successfully. The pandemic shows the importance of statistics in
daily life.
Importance of Statistics in the Various fields:

(i) Statistics in Mathematics


Statistics is closely related to mathematics and depends heavily upon it. Modern
statistical theory draws on advanced mathematics such as the theory of integration
and measure. The increasing use of mathematics in statistics has given rise to the
field known as mathematical statistics. Statistics is often regarded as a branch of
applied mathematics that specializes in data.

Computational mathematics also makes use of advanced statistics.

(ii) Statistics in Economics

Statistics and economics are closely interrelated, and it is difficult to
separate them. The development of advanced statistics has opened new ways for the
extensive use of statistics in economics.

Almost every branch of economics, such as consumption, production,
distribution and public finance, uses statistics for
comparison, presentation, interpretation and so on.

Statistics is used to study problems such as the income and spending of various
sections of the people, the production of national wealth, the adjustment of demand
and supply, and the effect of economic policies. All
these indicate the importance of statistics in the field of economics and its various
branches. Governments, for example, use statistics to calculate
GDP and per capita income.

(iii) Statistics in Social Sciences

In social science, observations vary from time to time, object to object
and place to place.

Statistical tools such as regression and correlation analysis are used in social science to
isolate the effect of these factors on the observations. Statistics is also
used to conduct social surveys, which rely on sampling techniques
and estimation theory. These are among the most powerful tools for conducting a
social survey.

Sociology is an important part of the social sciences, and statistics plays a critical role
in it. It helps in studying mortality, fertility, population growth and so on.

(iv) Statistics in Trade


Trade without statistics is difficult and can be overwhelming for traders. Statistics
helps traders to make wise decisions in uncertain situations. Business is full of
risks and uncertainties; anything can happen at any stage. That is why it is crucial
to forecast every step in business. Statistical records help traders to forecast
efficiently and effectively.

(v) Statistics in Research Work

Research work relies heavily on statistics. The research worker’s job is to present
data to the community, and the research worker uses statistical methods to solve
particular problems under differing conditions.

Statistics is the basis of almost every research activity, and most empirical research is not
possible without it. A good researcher therefore needs sound statistical skills. Such
skills also help to sustain the researcher’s interest in the research
work.

(vi) Statistics in Programming

Statistics plays an important role in programming. Much of
advanced programming today is based on statistics, and many Python developers
learn statistics to advance their careers. Statistics is the
foundation of artificial intelligence and machine learning; one cannot gain a good
command of these technologies without it.

It also helps to improve programming logic. Statistical methods are widely used in
popular programming languages such as Java, Python, Swift, C and C++.

(vii) Statistics in Big Data and Data Science

Statistics is widely used in data analytics and data science, and it is
also a foundation of Big Data technologies. Big Data is nothing without data,
and data is of little use without statistics.

Big Data technology therefore depends heavily on statistics. Statistics is used from the
initial stage of Big Data work, when the raw data is sorted.

(viii) Statistics in the Health Industry

Statistics plays its part in the health industry. It helps doctors to record and
manage the data of their patients. The WHO also uses statistics to
produce its annual reports on the health of the world’s populations. Statistical
analysis has also helped medical scientists to develop vaccines and antidotes against
major diseases.

In the COVID-19 pandemic, statistics has played a crucial role in analyzing
how many patients there are around the world, which regions have the most
cases, and much more.

(ix) Statistics in Business or Business Statistics

Business is dependent on statistics. Almost every business uses statistics to
perform its day-to-day operations. Statistics is crucial for businesses when making future
decisions. They collect and process their customers’ data with the help of
statistics and then use that data to decide on changes to their
strategies and policies.

Apart from that, the R & D department of a business relies on statistics. It
uses recent statistics to develop new products and services, which
helps the business to make decisions about them. The
company can then take a calculated risk on the launch of a product or service.

Besides, businesses also use statistics to calculate their profits, their employees’ salaries and
so on. The COVID-19 pandemic has also increased the use of statistics in business, as
companies work to minimize their losses during the pandemic.

Large organizations use business statistics and various data analysis
tools, for instance to estimate probabilities and see where sales may be
headed in the future. Several tools are used for business statistics, built on
the basis of the mean, median and mode, the bell curve, bar graphs and basic
probability. These can be employed for research problems related to employees,
products, customer service and much more. A business can thus identify
what is working and what is not.

Statistics is also widely used for consumer goods, which are
products in daily use. Businesses use statistics to track
which consumer goods are available in a store and which are not.

They also use statistics to find out which stores need which consumer goods and when to
ship them. Sound statistical decisions help businesses to earn
substantial revenue from consumer goods.
(x) Government

Governments use statistics to make judgments
about health, population, education and much more. Statistics may help the government
to check which education schedule is beneficial for students, or to assess the
progress of high school students following a particular curriculum. The
government can also assemble specific data about the population of the country using a
census.

(xi) Education

Statistics benefits education because teachers can act as
researchers in their own classrooms, identifying
which teaching techniques work for which pupils and why. They
also need to evaluate test results to determine whether students are performing
as expected. There are statistical studies of student
achievement at all levels of testing and education, from kindergarten to the GRE or
SAT.

Some parts are adapted from:

a. Handouts provided by Dr Ananta Raj Dhungana, Lecturer, SDSE, PoU
b. https://medium.com/@john_marsh7/10-awesome-reasons-why-statistics-are-important-96b87e283640

Descriptive and Inferential Statistics


a. Descriptive Statistics:

A descriptive statistic is a summary statistic that quantitatively describes or
summarizes features of a collection of information. Descriptive statistics describe
what is going on in a population or data set. Numerical measures are used to tell
about features of a set of data. There are a number of items that belong in this
portion of statistics, such as:

• The average, or measure of the center of a data set, consisting of the mean,
median, mode, or midrange
• The spread of a data set, which can be measured with the range or standard
deviation
• Overall descriptions of data such as the five number summary
• Measurements such as skewness and kurtosis
• The exploration of relationships and correlation between paired data
• The presentation of statistical results in graphical form

These measures are important and useful because they allow scientists to see
patterns among data, and thus to make sense of that data. Descriptive statistics can
only be used to describe the population or data set under study: The results cannot
be generalized to any other group or population.
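
The measures listed above can be illustrated with a short Python sketch. The exam scores are hypothetical values chosen only for illustration, and the quartiles use the default method of the standard library's statistics.quantiles; other quartile conventions give slightly different values.

```python
import statistics

# Hypothetical data set: exam scores of ten students (illustrative values only).
scores = [52, 61, 61, 67, 70, 73, 75, 80, 88, 93]

# Measures of center
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
midrange = (min(scores) + max(scores)) / 2

# Measures of spread
data_range = max(scores) - min(scores)
std_dev = statistics.stdev(scores)  # sample standard deviation

# Five-number summary: minimum, Q1, median, Q3, maximum
q1, q2, q3 = statistics.quantiles(scores, n=4)
five_number = (min(scores), q1, q2, q3, max(scores))

print("mean:", mean, "median:", median, "mode:", mode, "midrange:", midrange)
print("range:", data_range, "standard deviation:", round(std_dev, 2))
print("five-number summary:", five_number)
```

These figures describe only the ten scores in the list; nothing here says anything about students outside the data set.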

b. Inferential Statistics:

Inferential statistics are produced through complex mathematical calculations that
allow scientists to infer trends about a larger population based on a study of a
sample taken from it. Scientists use inferential statistics to examine the
relationships between variables within a sample and then make generalizations or
predictions about how those variables will relate to a larger population.

It is usually impossible to examine each member of the population individually. So
scientists choose a representative subset of the population, called a statistical
sample, and from this analysis, they are able to say something about the population
from which the sample came. There are two major divisions of inferential
statistics:

• A confidence interval gives a range of values for an unknown parameter of
the population by measuring a statistical sample. This is expressed in terms
of an interval and the degree of confidence that the parameter is within the
interval.
• Tests of significance or hypothesis testing, where scientists make a claim
about the population by analyzing a statistical sample. By design, there is
some uncertainty in this process. This can be expressed in terms of a level of
significance.

Techniques that social scientists use to examine the relationships between
variables, and thereby to create inferential statistics, include linear regression
analyses, logistic regression analyses, ANOVA, correlation analyses, structural
equation modeling, and survival analysis. When conducting research using
inferential statistics, scientists conduct a test of significance to determine whether
they can generalize their results to a larger population. Common tests of
significance include the chi-square test and the t-test. These tell scientists how likely
it is that results like those in the sample would arise by chance alone, and hence
whether the results can reasonably be taken to hold for the population
as a whole.
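
The two divisions of inferential statistics described above, the confidence interval and the test of significance, can be sketched in Python as follows. The 30 weights and the hypothesized mean of 54 kg are made-up values, and the sketch uses the normal (z) approximation from the standard library; for small samples a t-distribution would normally be preferred.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample: weights (kg) of 30 people, for illustration only.
sample = [48, 52, 55, 60, 51, 49, 57, 62, 53, 50,
          58, 54, 47, 56, 59, 61, 52, 55, 50, 53,
          57, 49, 54, 56, 51, 58, 60, 52, 55, 54]

n = len(sample)
x_bar = statistics.mean(sample)       # sample mean
s = statistics.stdev(sample)          # sample standard deviation
std_error = s / math.sqrt(n)          # standard error of the mean

# 95% confidence interval for the population mean (normal approximation).
z_crit = NormalDist().inv_cdf(0.975)
lower = x_bar - z_crit * std_error
upper = x_bar + z_crit * std_error

# Two-sided test of the claim "the population mean is 54 kg" (assumed value).
mu_0 = 54
z_stat = (x_bar - mu_0) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
print(f"z = {z_stat:.3f}, p-value = {p_value:.3f}")
```

The claim is rejected at the 5% level of significance only if the p-value falls below 0.05.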
Descriptive Statistics | Inferential Statistics
a. It is the branch of statistics concerned with describing the population under study. | i. It is the branch of statistics that focuses on drawing conclusions about the population on the basis of a sample.
b. It organizes, analyses and presents the data in a meaningful way. | ii. It compares, tests and predicts data.
c. It presents the data in graphs and tables. | iii. It presents the results in terms of probability.
d. It is used to describe a situation. | iv. It is used to explain the chance of occurrence of an event.
e. It explains data that is already known, in order to summarize the sample. | v. It attempts to reach conclusions about the population that extend beyond the available data.
f. It uses methods like mean, mode, median, standard deviation, variance. | vi. It uses methods like regression analysis, testing of hypotheses, estimation of parameters.

Some parts are adapted from: https://www.thoughtco.com/differences-in-descriptive-and-inferential-statistics-3126224
Data
Data can be defined as a systematic record of a particular quantity. It is the
set of different values of that quantity represented together. It is a collection of
facts and figures to be used for a specific purpose such as a survey or analysis.
When arranged in an organized form, it can be called information. Data also covers the
qualitative and quantitative attributes of a variable or a set of variables.

S.N. | Quantitative Data | Qualitative Data
1. | These are data that deal with quantities, values, or numbers. | These data, on the other hand, deal with quality.
2. | They are measurable. | They are generally not measurable.
3. | They are expressed in numerical form. | They are descriptive rather than numerical in nature.
4. | The research methodology is conclusive. | The research methodology is exploratory.
5. | They measure quantities such as length, size, amount, price, and even duration. | Narratives often make use of adjectives and other descriptive words to refer to data on appearance, color, texture, and other qualities.
6. | Statistics is used to generate and subsequently analyze this type of data. | They are mostly gained through observation.
7. | They are objective. | They are subjective.
8. | The data is structured. | The data is unstructured.
9. | They determine the level of occurrence. | They determine the depth of understanding.
10. | The use of statistics adds credence or credibility, so quantitative data is overall seen as more reliable and objective. | They are seen as less reliable and objective.
11. | The data collection techniques are quantitative surveys, interviews, experiments, etc. | The data collection techniques are qualitative surveys, focus group methods, documental revision, etc.
12. | A large number of representative samples is used. | A small number of non-representative samples is used.
13. | They recommend the final course of action. | They develop an initial understanding.
Some parts are adapted from: https://microbenotes.com/difference-between-quantitative-and-qualitative-data/
Primary and Secondary Data:
a. Primary Data:

Primary data is data that is collected by a researcher from first-hand sources,
using methods like surveys, interviews, or experiments. It is collected with the
research project in mind, directly from primary sources. It is also known as
first-hand or raw data.

The data can be collected through various methods like surveys, observations,
physical testing, mailed questionnaires, questionnaires filled in and sent by
enumerators, personal interviews, telephone interviews, focus groups, case
studies, etc.

Some Advantages of using Primary data:

1. The investigator collects data specific to the problem under study.
2. There is no doubt about the quality of the data collected (for the
investigator).
3. If required, it may be possible to obtain additional data during the study
period.

Some Disadvantages of using Primary data (for reluctant/ uninterested
investigators):

1. The investigator has to contend with all the hassles of data collection:
• deciding why, what, how, when to collect
• getting the data collected (personally or through others)
• getting funding and dealing with funding agencies
• ethical considerations (consent, permissions, etc.)
2. Ensuring the data collected is of a high standard:
• all desired data is obtained accurately, and in the format it is required in
• there is no fake/ cooked up data
• unnecessary/ useless data has not been included
3. Cost of obtaining the data is often the major expense in studies

b. Secondary Data

Secondary data is data gathered from studies, surveys, or experiments
that have been run by other people or for other research. Secondary data
implies second-hand information which has already been collected and recorded
by a person other than the user, for a purpose not relating to the current
research problem. It is the readily available form of data collected from
various sources like censuses, government publications, internal records
of the organisation, reports, books, journal articles, websites and so on.

Some Advantages of using Secondary data:

1. The data is already there; there are no hassles of data collection
2. It is less expensive
3. The investigator is not personally responsible for the quality of data (“I
didn’t do it”)

Some disadvantages of using Secondary data:

1. The investigator cannot decide what is collected (if specific data about
something is required, for instance).
2. One can only hope that the data is of good quality
3. Obtaining additional data (or even clarification) about something is not
possible (most often)

PRIMARY DATA | SECONDARY DATA
Primary data refers to first-hand data gathered by the researcher himself or herself. | Secondary data means data collected by someone else earlier.
It is real-time data. | It is past data.
Its process is slow and difficult. | Its process is quick and easy.
Its sources are surveys, observations, experiments, questionnaires, personal interviews, etc. | Its sources are government publications, websites, books, journal articles, internal records, etc.
It is expensive. | It is economical.
The collection time is long. | The collection time is short.
It is always specific to the researcher's needs. | It may or may not be specific to the researcher's needs.
It is available in crude form. | It is available in refined form.
It is more accurate and reliable. | It is relatively less accurate and reliable.

Some parts are adapted from:
https://communitymedicine4all.com/2013/01/07/types-of-data-primary-and-secondary-data/

Parameter and Statistic

Parameter:
A fixed characteristic of a population, based on all the elements of the
population, is termed a parameter. Here population refers to the
aggregate of all units under consideration, which share common
characteristics. A parameter is a numerical value that remains unchanged, because every
member of the population is surveyed to determine it. It indicates the
true value, which is obtained after a census is conducted.
Statistic:
A statistic is defined as a numerical value obtained from a
sample of data. It is a descriptive statistical measure and a function of the
sample observations. A sample is a fraction of the population that is
intended to represent the entire population in all its characteristics. The
common use of a statistic is to estimate a particular population parameter.
Differences between Parameter and Statistic:
STATISTIC | PARAMETER
a. A statistic is a measure which describes a fraction of the population. | i. A parameter is a measure which describes the whole population.
b. Its numerical value is variable and known. | ii. Its numerical value is fixed and unknown.
c. Statistical notation of statistics: | iii. Statistical notation of parameters:
x̄ = Sample mean | μ = Population mean
s = Sample standard deviation | σ = Population standard deviation
p̂ = Sample proportion | P = Population proportion
x = Data elements | X = Data elements
n = Size of sample | N = Size of population
r = Correlation coefficient | ρ = Correlation coefficient
d. It shows the characteristics of the sample. | iv. It shows the characteristics of the population.
e. Roman letters are used to denote statistics. | v. Greek letters are used to denote parameters.
For example:

1. A researcher wants to know the average weight of females aged 22 years or
older in Nepal. The researcher obtains an average weight of 54 kg from a
random sample of 40 females.
Solution: In the given situation, the statistic is the average weight of 54
kg, calculated from a simple random sample of 40 females in Nepal, while
the parameter is the mean weight of all females aged 22 years or older in Nepal.
2. A researcher wants to estimate the average amount of water consumed by
male teenagers in a day. From a simple random sample of 55 male teens the
researcher obtains an average of 1.5 litres of water.
Solution: In this question, the parameter is the average amount of water
consumed by all male teenagers in a day, whereas the statistic is the average of
1.5 litres of water consumed in a day by male teens, obtained from a simple
random sample of 55 male teens.
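
The distinction in these examples can be seen in a small simulation. The following is only a sketch: the population of 10,000 weights is generated artificially (the mean of 54 kg and spread of 8 kg are assumptions, not survey figures), and the seed is fixed so the run is repeatable.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: weights (kg) of 10,000 people, simulated for illustration.
population = [rng.gauss(54, 8) for _ in range(10_000)]

# Parameter: computed from every unit in the population, so it is a fixed value.
mu = statistics.mean(population)

# Statistic: computed from a simple random sample of 40 units, so it
# changes from one sample to the next.
sample_a = rng.sample(population, 40)
sample_b = rng.sample(population, 40)
x_bar_a = statistics.mean(sample_a)
x_bar_b = statistics.mean(sample_b)

print(f"parameter (population mean): {mu:.2f}")
print(f"statistic from sample A:     {x_bar_a:.2f}")
print(f"statistic from sample B:     {x_bar_b:.2f}")
```

In practice the population mean is usually unknown, and a sample mean such as x_bar_a serves as its estimate.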

Adapted from: https://keydifferences.com/difference-between-statistic-and-parameter.html

Types of Measurement Scales:

In statistics, there are four data measurement scales. These are simple ways
to sub-categorize different types of data.

i. Nominal

It is used for labeling variables, and its values are simply called “labels”. It is the
lowest measurement level. It is assigned to items that are divided into
categories without any order or structure. It is used only for the purpose
of identification; the categories cannot be arranged in ascending or descending order. The only
mathematical operation we can perform with nominal data is to count.
For example:

1. What is your gender?
a. Male
b. Female
2. Where do you live?
a. Northern Hemisphere
b. Southern Hemisphere

ii. Ordinal

The order of the values is important, but the differences between them are not
really known. It measures non-numeric concepts like satisfaction, happiness,
discomfort, etc. It ranks responses. It has the properties of identity and magnitude.
The distances between scale points are not necessarily equal; only the relative
positions of the points are meaningful.

For example: How do you feel today?

a. Very unhappy
b. Unhappy
c. Okay
d. Happy
e. Very happy

iii.Interval

The order of the values and the exact differences between them are known. An interval
scale assumes an equal distance between each of the scale
elements. It has the properties of identity, magnitude and equal distance. The
equal distance between scale points makes it possible to know how many units greater than or
less than another a given case is. For example, the distance between 25
and 35 is the same as the distance between 65 and 75.

iv.Ratio

It allows a wide range of both descriptive and inferential statistics to be applied.
The factor which clearly defines a ratio scale is that it has a true zero point. The
properties of the ratio scale are identity, magnitude, equal distance and absolute zero.
These properties allow all the basic mathematical operations to be applied, including
addition, subtraction, multiplication and division. The absolute zero makes it possible to
know how many times greater one case is than another. For example: height, weight,
duration.
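
The operations that each of the four scales supports can be sketched in Python. All data values below are hypothetical, and treating the interval example as temperatures in degrees Celsius is an assumption made only for illustration.

```python
from collections import Counter
import statistics

# Nominal: labels only, so counting is the meaningful operation.
hemisphere = ["Northern", "Southern", "Northern", "Northern"]
counts = Counter(hemisphere)

# Ordinal: categories have an order, so ranks and the median are meaningful,
# but the distances between categories are not.
levels = ["Very unhappy", "Unhappy", "Okay", "Happy", "Very happy"]
rank = {label: i for i, label in enumerate(levels)}
responses = ["Happy", "Okay", "Very happy", "Unhappy", "Happy"]
median_rank = statistics.median_low([rank[r] for r in responses])
median_response = levels[median_rank]

# Interval: differences are meaningful (25 to 35 equals 65 to 75),
# but ratios are not, because zero is arbitrary.
temps_celsius = [25, 35, 65, 75]
same_distance = (temps_celsius[1] - temps_celsius[0]) == (temps_celsius[3] - temps_celsius[2])

# Ratio: a true zero exists, so "how many times greater" is meaningful.
weight_a_kg, weight_b_kg = 80, 40
times_heavier = weight_a_kg / weight_b_kg

print(counts, median_response, same_distance, times_heavier)
```

Each scale also supports every operation of the scales before it, so ratio data can be counted, ranked and subtracted as well as divided.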

Sampling

It is the method or process of data collection in which data is collected from a
representative part of the whole population. It is the selection of a sample from the
whole population in order to estimate the characteristics of the population.

For example:

a. A cook tastes a spoonful of rice or vegetables to judge whether the dish is properly
cooked or not.
b. A pathologist or doctor examines a few drops of blood to draw a
conclusion about the whole body.
c. A businessman places an order for a commodity after examining only a
small sample of it.

Advantages of Sampling:
a. The cost of sampling is lower than that of studying the whole population.
b. It takes less time to collect, edit, classify, analyse and
interpret the data.
c. More trained and skilled manpower can be used to collect accurate
information.
d. It is applicable in the case of a large population.
e. It is applicable when the elements have to be destroyed during testing.

Disadvantages of Sampling:

a. Wrong and unreliable conclusions may be obtained.
b. It cannot give accurate results if the sample survey is conducted by
unskilled, untrained or illiterate persons.
c. If the population is too heterogeneous, it may be impossible to use the
sampling technique.
d. It may give wrong conclusions if the sample selected from the population is
not representative.
Methods of Sampling:
The important methods of sampling are given below:
a. Probability Sampling
b. Non-probability Sampling

A. Probability Sampling

Probability sampling is a sampling technique in which the researcher sets
a few selection criteria and chooses members of a population randomly. Every member
has a known, non-zero chance of being part of the sample (an equal chance in the
simplest designs). It is mainly used
in quantitative research. If you want to produce results that are representative of
the whole population, you need to use a probability sampling technique. In this
method, units of the population are selected under the law of probability.

There are four main types of probability sampling.

a. Simple Random Sampling

One of the best probability sampling techniques for saving time
and resources is the simple random sampling method. It is the simplest
and most common method of sampling. It is a reliable method of obtaining
information in which every single member of the sample is chosen randomly,
merely by chance. Each individual has the same probability of being chosen
to be a part of the sample.
For example, in a school of 500 students, if the teacher decides to
conduct team building activities, it is highly likely that they would prefer
picking chits out of a bowl. In this case, each of the 500 students has an
equal opportunity of being selected.
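
Picking chits out of a bowl can be sketched with Python's random.sample, which draws without replacement so that every student is equally likely to be chosen. The student labels and the team size of 25 are hypothetical; the text does not state how many students are picked.

```python
import random

rng = random.Random(2024)  # fixed seed so the draw is repeatable

# Hypothetical sampling frame: the 500 students of the school.
students = [f"student_{i:03d}" for i in range(1, 501)]

# Draw 25 students without replacement; each student has the same chance.
team = rng.sample(students, k=25)

print(team[:5])
```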

b. Systematic Random Sampling

Researchers use the systematic sampling method to choose the sample
members of a population at regular intervals. This method is used when:

i. A complete list of the population from which the sample is drawn is available.
ii. The population is large, scattered and non-homogeneous.

It requires the selection of a starting point for the sample and a sample size,
from which the regular sampling interval is determined. Because the interval is
predefined, this sampling technique is among the least time-consuming.
For example, a researcher intends to collect a systematic sample of 500
people from a population of 5000. He/she numbers each element of the
population from 1 to 5000 and chooses every 10th individual to be a part of
the sample (Total population / Sample size = 5000/500 = 10).
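
The example above can be sketched as follows. The function name systematic_sample is hypothetical, and choosing the starting point at random within the first interval is a common convention that is assumed here; the text only says that a starting point is selected.

```python
import random

def systematic_sample(population, sample_size, rng):
    """Select every k-th unit after a random start, where k = N // n."""
    k = len(population) // sample_size   # sampling interval (5000 // 500 = 10)
    start = rng.randrange(k)             # random start within the first interval
    return population[start::k][:sample_size]

rng = random.Random(7)                   # fixed seed so the run is repeatable
population = list(range(1, 5001))        # elements numbered 1 to 5000
sample = systematic_sample(population, 500, rng)

print(sample[:5])
```

Only one random number is needed, which is why the method is quick once a complete list of the population exists.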

c. Stratified Random Sampling

Stratified random sampling is a method in which the researcher divides the
population into smaller groups that do not overlap but together represent the entire
population. While sampling, the groups are organized and a
sample is drawn from each group separately. It is used in heterogeneous populations.
In this method, the population is first divided into subgroups (or strata) whose
members all share a similar characteristic. It is used when we might reasonably expect
the measurement of interest to vary between the different subgroups, and we
want to ensure representation from all the subgroups. It improves the
accuracy and representativeness of the results by reducing sampling bias.
However, it requires knowledge of the appropriate characteristics of the
sampling frame (the details of which are not always available), and it can be
difficult to decide which characteristic(s) to stratify by.
For example, a researcher looking to analyze the characteristics of people
belonging to different annual income divisions will create strata (groups)
according to annual family income, e.g. less than $20,000, $21,000 to
$30,000, $31,000 to $40,000, $41,000 to $50,000, etc. By doing this, the
researcher can draw conclusions about the characteristics of people belonging to different
income groups. Marketers can analyze which income groups to target and
which ones to eliminate to create a roadmap that would bear fruitful results.
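
A minimal sketch of stratified random sampling is given below. The function stratified_sample, the three income brackets and their sizes are all hypothetical, and proportional allocation (the same sampling fraction in every stratum) is an assumption; the text does not say how many units to take from each stratum.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(units, stratum_of, fraction, rng):
    """Draw a simple random sample of the same fraction from every stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # group units into strata
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))   # random draw within the stratum
    return sample

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical households tagged with an income bracket (illustrative sizes).
households = (
    [(f"low_{i}", "low") for i in range(600)]
    + [(f"middle_{i}", "middle") for i in range(300)]
    + [(f"high_{i}", "high") for i in range(100)]
)

sample = stratified_sample(households, lambda h: h[1], 0.10, rng)
per_stratum = Counter(bracket for _, bracket in sample)
print(per_stratum)
```

Every bracket appears in the sample in proportion to its size, which is the representation that stratification is meant to guarantee.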
d. Cluster Sampling
Cluster sampling is a method in which the researchers divide the entire
population into sections or clusters that represent the population. Clusters are
identified and included in a sample based on demographic parameters like
age, sex, location, etc. This makes it very simple for a survey creator to
derive effective inferences from the feedback. Cluster sampling can be more
efficient than simple random sampling, especially where a study takes place
over a wide geographical region.
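
As a rough sketch, cluster sampling can be written as a random draw of whole clusters. The 20 villages of 50 households each are hypothetical, and including every household of a chosen village (one-stage cluster sampling) is an assumption; a second random draw within each chosen cluster is also common.

```python
import random

rng = random.Random(11)  # fixed seed so the run is repeatable

# Hypothetical population: 20 villages (clusters) of 50 households each.
clusters = {
    f"village_{v:02d}": [f"v{v:02d}_h{h:02d}" for h in range(1, 51)]
    for v in range(1, 21)
}

# Randomly choose 4 whole clusters, then include every household in them.
chosen = rng.sample(sorted(clusters), k=4)
sample = [household for name in chosen for household in clusters[name]]

print(chosen, len(sample))
```

Fieldwork is confined to four villages instead of being spread over all twenty, which is where the saving over a wide geographical region comes from.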

B. Non-Probability Sampling

In non-probability sampling, the researcher chooses members for research on the
basis of convenience or judgement rather than by random selection. This sampling
method is not a fixed or predefined selection process, so
not all elements of a population have an equal opportunity to be
included in a sample. The units of the population are not selected under the rule of
probability. Non-probability sampling techniques are often appropriate for
exploratory and qualitative research. In these types of research, the aim is not to
test a hypothesis about a broad population, but to develop an initial understanding
of a small or under-researched population.

a. Convenience Sampling

A convenience sample simply includes the individuals who happen to be
most accessible to the researcher. The investigator selects the sample
elements on the basis of his or her convenience. It is also known as
accidental sampling because the sample is chosen accidentally. The
investigator chooses the closest persons as respondents. It does not follow
a scientific or definite plan, so the selection is highly prone to
bias.

b. Purposive or Judgement Sampling

Also known as selective or subjective sampling, this technique relies on the
judgement of the researcher when choosing who to ask to participate.
Researchers may thus implicitly choose a “representative” sample to suit
their needs, or specifically approach individuals with certain characteristics.
This approach is often used by the media when canvassing the public for
opinions and in qualitative research,
where the researcher wants to gain detailed knowledge about a specific
phenomenon rather than make statistical inferences. It is useful for situations
where we need to reach a targeted sample quickly and proportional sampling
is not a primary concern.
Judgement sampling has the advantage of being time- and cost-effective to
perform whilst resulting in a range of responses (particularly useful in
qualitative research). However, in addition to volunteer bias, it is also prone
to errors of judgement by the researcher and the findings, whilst being
potentially broad, will not necessarily be representative.
c. Quota Sampling
In this method, the sample is selected according to some fixed quota. It is
similar to stratified random sampling, but the sample items are chosen
accidentally, not randomly. This method of sampling is often used by market
researchers. Interviewers are given a quota of subjects of a specified type to
attempt to recruit. For example, an interviewer might be told to go out and
select 20 adult men, 20 adult women, 10 teenage girls and 10 teenage boys
so that they could interview them about their television viewing. Ideally the
quotas chosen would proportionally represent the characteristics of the
underlying population.
d. Snowball Sampling
This method is commonly used in social sciences when investigating hard-to-reach
groups. Existing subjects are asked to nominate further subjects known to them, so
the sample increases in size like a rolling snowball. For example, when carrying
out a survey of risk behaviours amongst intravenous drug users, participants may
be asked to nominate other users to be interviewed.
Snowball sampling can be effective when a sampling frame is difficult to identify.
However, by selecting friends and acquaintances of subjects already investigated,
there is a significant risk of selection bias (choosing a large number of people with
similar characteristics or views to the initial individual identified).
 
Adapted from:
a. https://www.scribbr.com/methodology/sampling-methods/
b. https://www.questionpro.com/blog/types-of-sampling-for-social-research/
c. https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1a-epidemiology/methods-of-sampling-population
Box and Whisker Plot
It is a visual way to show the five-number summary. It is a graphical
representation of the data based on
the minimum value, lower quartile, median, upper quartile and the
maximum value. The left vertical line of the box is the first quartile (Q1) and
the right vertical line is the third quartile (Q3). The box contains the middle
50% of the values. The lower 25% of the data is represented by the line,
i.e. the whisker, connecting the minimum value and the first quartile (Q1). The
upper 25% of the data is represented by the whisker connecting the
third quartile (Q3) and the largest value. If the middle vertical line of the
box (the median) is nearer to the left vertical line of the box then the data is right
skewed, and if it is nearer to the right vertical line then the data is left
skewed. If it is in the middle of the box then the data is symmetric.

Example 1: Draw a box-and-whisker plot for the data set {3, 7, 8, 5, 12, 14, 21,
13, 18}.

Minimum: 3, Q1 : 6, Median: 12, Q3 : 16, and Maximum: 21.

Example 2: Draw a box-and-whisker plot for the data set {3, 7, 8, 5, 12, 14, 21,
15, 18, 14}.

Minimum: 3, Q1: 7, Median: 13, Q3: 15, and Maximum: 21.
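The values in both examples can be checked with a short Python sketch. It assumes the quartile convention that matches the answers above: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the middle value left out of both halves when the number of observations is odd. Other quartile conventions can give slightly different values. The function names are illustrative.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the two halves."""
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]        # values below the median position
    upper = values[(n + 1) // 2:]   # values above the median position
    return (values[0], median(lower), median(values), median(upper), values[-1])

example_1 = five_number_summary([3, 7, 8, 5, 12, 14, 21, 13, 18])
example_2 = five_number_summary([3, 7, 8, 5, 12, 14, 21, 15, 18, 14])
print(example_1)  # (3, 6.0, 12, 16.0, 21)
print(example_2)  # (3, 7, 13.0, 15, 21)
```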

Stem and Leaf Display


It is a valuable tool for organizing a set of data and understanding the
distribution of the data. It separates each data value into two
parts: a leading part (the stem) and a trailing part (the leaf). A single digit is used to define each
leaf, and if the leaf unit is not shown, it is assumed to be one.

A good stem and leaf plot:

 shows the first digits of the number (thousands, hundreds or tens) as
the stem and shows the last digit (ones) as the leaf.
 usually uses whole numbers. Anything that has a decimal point is rounded to
the nearest whole number. For example, test results, speeds, heights,
weights, etc.
 looks like a bar graph when it is turned on its side.
 shows how the data are spread—that is, highest number, lowest number,
most common number and outliers (a number that lies outside the main
group of numbers).

For Example:
56, 78, 82, 82, 90, 94, 93, 67, 67, 69, 74, 77, 92, 88, 81, 83, 84, 77, 72
Arranging the data in ascending order:
56, 67, 67, 69, 72, 74, 77, 77, 78, 81, 82, 82, 83, 84, 88, 90, 92, 93, 94
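The display for these 19 values can be built with a few lines of Python. This is a sketch that assumes non-negative whole numbers, with the tens digit as the stem and the ones digit as the leaf; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return the rows of a stem and leaf display (stem = tens, leaf = ones)."""
    groups = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    rows = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

scores = [56, 78, 82, 82, 90, 94, 93, 67, 67, 69,
          74, 77, 92, 88, 81, 83, 84, 77, 72]
for row in stem_and_leaf(scores):
    print(row)
# 5 | 6
# 6 | 7 7 9
# 7 | 2 4 7 7 8
# 8 | 1 2 2 3 4 8
# 9 | 0 2 3 4
```

Turned on its side, the rows show the same shape as a bar graph: most values fall in the 70s and 80s.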

Methods of Data Collection


a. Methods of Primary Data Collection
i.Observation
It is the process of recognizing and noting people, objects and occurrences rather
than asking for information. Communication with people is absent in this method.
It allows the researcher to study people in their natural setting without influencing
their behavior. Observational data consists of detailed information about groups or
situations.
Methods of Observation:
1.Covert and Overt Observation
Covert Observation: The researcher is not identified, so that the
subjects' behavior is not influenced by his or her presence. The researcher observes
the situation from a distance.
Overt Observation: The researcher is identified and
explains the purpose of the observation. The problem with this method is that
the subjects tend to modify their behavior when they know they are being watched.
2.Structured and Unstructured Observation
Structured Observation: It is a systematic and highly predetermined method of
data collection. The main purpose of this observation is to quantify behavior. It does
not give the complete picture of the situation or behavior under study.
Unstructured Observation: It is a holistic way to observe and record behavior
without the use of a pre-determined guide. It attempts to provide as complete and
non-selective a description as possible.
Advantages of Observation:
 It is free from subjective bias.
 Data is not affected by past behavior or future intentions.
 Natural behavior of the group can be observed.
Disadvantages of Observation:
 It is expensive.
 Obtained information is limited.
 Unforeseen events may interfere with the observational task.
ii.Interview
It is a scientific investigation technique based on verbal
communication between two parties in order to collect information. An interview
involves two parties: the interviewer (the researcher(s) asking questions and
collecting data) and the interviewee (the subject or respondent who is being
asked questions).
Interviews can be carried out in the following ways:
a. Direct Personal Interview:

Direct Personal Interview requires an interviewer or a group of interviewers to ask
questions of the interviewee face to face.

It can be direct or indirect, structured or unstructured, focused or unfocused, etc. Some
of the tools used in carrying out in-person interviews include a notepad or
recording device to take note of the conversation, which is very important because
people forget details. Non-verbal communication like gestures and facial expressions
gives meaning to the respondent's answers.

b. Indirect Oral interview

In this method, the information is collected by the interviewer from a third
person, known as a witness, who is directly or indirectly concerned with the
events. This method is used when the informants hesitate to give
the information directly. The information obtained from this method
cannot be fully relied upon because of the absence of direct contact.

c. Telephone interview

The interviewer contacts respondents by telephone. This method uses a
structured interview schedule.

d. Focus Group Interview

It generally involves 6-10 persons. The involved persons are brought
together at one place to discuss the topic of interest. The inner feelings
and emotional attitudes of the interviewees with respect to a given
problem or situation are studied. The interviewer does not interfere
during the discussion, but brings the discussion back to the main issues
when it goes outside the theme of the discussion.

Advantages of Interview:

 More information can be obtained.
 Sample can be controlled.
 It has greater flexibility.
 Personal information can also be obtained.
 Misinterpretation can be avoided by using an unstructured format.

Disadvantages of Interview:

 It is expensive.
 There is a chance of bias on the part of the interviewer or the respondent.
 It is more time consuming.
 There is a greater possibility of imaginary information and less frank
responses.
 It needs a highly skilled interviewer.

iii. Questionnaire

It is a formal list of questions designed to gather responses from respondents
on a given topic. It is an efficient data collecting mechanism when the researcher
knows exactly what is required and how to measure the variable of interest. It
involves several steps, including writing question items, organizing the question
items on a questionnaire, administering the questionnaire and so on.

Characteristics of the Good Questionnaire

-It should be short and simple

-Questions should proceed in a logical sequence

-Technical terms and vague expressions must be avoided.

-Control questions to check the reliability of the respondent must be present.

-Brief directions with regard to filling up the questionnaire must be provided.

-The physical appearance (quality of paper, colour, etc.) must be good to attract the
attention of the respondent.
Types of Questions in Questionnaire:

a. Open Questions

Open questions allow people to express what they think in their own words.
Open-ended questions enable the respondent to answer in as much detail as they
like in their own words. For example: “can you tell me how happy you feel
right now?” Open questions are often used for complex questions that cannot be
answered in a few simple categories but require more detail and discussion.
Rich qualitative data is obtained as open questions allow the respondent to
elaborate on their answer.

b. Closed Questions

Closed questions structure the answer by only allowing responses which fit into
pre-decided categories. Data that can be placed into a category is called nominal
data. The category can be restricted to as few as two options, i.e., dichotomous
(e.g., 'yes' or 'no,' 'male' or 'female'), or include quite complex lists of
alternatives from which the respondent can choose (i.e., polytomous). Closed
questions can also provide ordinal data (which can be ranked). This often
involves using a rating scale to measure the strength of attitudes or
emotions. For example, strongly agree / agree / neutral / disagree / strongly
disagree / unable to answer. Closed questions are inexpensive to administer.

Types of Questionnaire:

a. Self Administered

In this method, the respondents complete the questionnaires themselves.

 Online Questionnaire: It is administered by using email, the internet or a
website.

 Mail Questionnaire: It is administered by posting the questionnaires to
respondents, who return them by post.

 Delivery and Collection Questionnaire: It is delivered by hand to each
respondent and collected later.
b. Interviewer Administered Questionnaire

It is generally administered by the researcher himself or herself, or by any other
interviewer.

 Telephone Questionnaire: The researcher contacts the respondents and
administers the questionnaire by telephone. Accurate information
and responses are essential conditions for a good telephone questionnaire.
The respondents selected for the telephone questionnaire need to be
informed about the study beforehand, by email or telephone, or by fixing an
appointment.

 Interview Schedule: It is administered by the interviewer by physically
meeting the respondent and asking the questions face to face. It uses a
schedule, which is a set of questions. It gives the
researcher an opportunity to build rapport with the respondents.

Advantages –

⦁ Free from bias of interviewer

⦁ Respondents have adequate time to give answers

⦁ Respondents are easily and conveniently approachable

⦁ Large samples can be used, which makes the results more reliable

Disadvantages –

⦁ Low rate of return of duly filled questionnaires

⦁ Control over the questionnaire is lost once it is sent

⦁ It is inflexible once sent

⦁ Possibility of ambiguous replies or omission of replies

⦁ Time-consuming and slow process


b.Methods of Secondary Data Collection
A researcher can obtain secondary data from various sources. Secondary data may
either be published data or unpublished data. Published data are available in :
a. Publications of government
b. technical and trade journals
c. reports of various businesses, banks etc.
d. public records
e. statistical or historical documents.
Unpublished data may be found in letters, diaries, unpublished biographies or
work.
Before using secondary data, it must be checked for the following characteristics:
1. Reliability of data – Who collected the data? From what source? By which
methods? At what time? Is there a possibility of bias? What level of accuracy was achieved?
2. Suitability of data – The object, scope and nature of the original enquiry must be
studied, and the data must then be carefully scrutinized for suitability.
3. Adequacy – The data is considered inadequate if the level of accuracy achieved
in the data is found inadequate or if it relates to an area which is either
narrower or wider than the area of the present enquiry.

Some parts are adapted from:

a. https://bbamantra.com/methods-of-data-collection-primary-and-secondary-data/
b. https://www.formpl.us/blog/primary-data
c. https://www.simplypsychology.org/questionnaires.html

Five Number Summary

It consists of the minimum value, the first quartile (Q1), the median (Q2), the
third quartile (Q3) and the maximum value. The following quantities are used with it:

 The minimum value of a data set is the least value in the set.
 The maximum value of a data set is the greatest value in the set.
 The range of a data set is the distance between the maximum and minimum
value. To compute the range of a data set, we subtract the minimum from the
maximum:
range = maximum – minimum.
 The interquartile range of a data set is the distance between the third and
first quartiles.
Interquartile range = Q3 – Q1

It provides a way of determining the shape of the distribution, i.e. to see whether
the data are symmetric or not.

For Symmetry:

a. Difference between Second Quartile and Minimum Value is equal to
difference between Maximum Value and Second Quartile. (Q2 − Xmin = Xmax − Q2)
b. Difference between First Quartile and Minimum Value is equal to
difference between Maximum Value and Third Quartile. (Q1 − Xmin = Xmax − Q3)

For Right Skewed:

a. Difference between Maximum Value and Second Quartile is greater than
difference between Second Quartile and Minimum Value. (Xmax − Q2 > Q2 − Xmin)
b. Difference between Maximum Value and Third Quartile is greater than
difference between First Quartile and Minimum Value. (Xmax − Q3 > Q1 − Xmin)

For Left Skewed:

a. Difference between Second Quartile and Minimum Value is greater than
difference between Maximum Value and Second Quartile. (Q2 − Xmin > Xmax − Q2)
b. Difference between First Quartile and Minimum Value is greater than
difference between Maximum Value and Third Quartile. (Q1 − Xmin > Xmax − Q3)
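The three pairs of conditions can be written as a small Python check. This is a sketch: the function name and the "inconclusive" label for summaries where conditions a and b disagree are assumptions added for illustration, and the three summaries used to try it out are hypothetical.

```python
def shape_from_summary(minimum, q1, median, q3, maximum):
    """Classify the shape of a distribution from its five-number summary."""
    left_half = median - minimum     # Q2 - Xmin
    right_half = maximum - median    # Xmax - Q2
    left_tail = q1 - minimum         # Q1 - Xmin
    right_tail = maximum - q3        # Xmax - Q3
    if left_half == right_half and left_tail == right_tail:
        return "symmetric"
    if right_half > left_half and right_tail > left_tail:
        return "right skewed"
    if left_half > right_half and left_tail > right_tail:
        return "left skewed"
    return "inconclusive"  # conditions a and b point in different directions

print(shape_from_summary(1, 2, 3, 4, 5))    # symmetric
print(shape_from_summary(1, 2, 3, 6, 10))   # right skewed
print(shape_from_summary(0, 6, 8, 9, 10))   # left skewed
```

For the summary 3, 6, 12, 16, 21 the first condition holds with equality (12 − 3 = 21 − 12) but the second does not (6 − 3 < 21 − 16), so this sketch reports it as inconclusive rather than forcing a label.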
Example 1: Find the five-number summary for the data set {3, 7, 8, 5, 12,
14, 21, 13, 18}.
From Example 1 of the Box and Whisker Plot section, we see that the five-number
summary is:
Minimum: 3, Q1: 6, Median: 12, Q3: 16, Maximum: 21
Example 2: Find the five-number summary for the data set {3, 7, 8, 5, 12,
14, 21, 15, 18, 14}.
From Example 2 of the Box and Whisker Plot section, we see that the five-number
summary is:
Minimum: 3, Q1: 7, Median: 13, Q3: 15, Maximum: 21

Correlation
a. Karl Pearson
Calculate Karl Pearson’s correlation coefficient for the following data
on sales and expenses (in thousand rupees) of five firms and interpret the
result.

Sales 43 41 36 34 50
Expenses 12 24 15 21 19

Solution:
Let sales be X and expenses be Y
Here;

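X     Y     XY     X²     Y²
43    12    516    1849   144
41    24    984    1681   576
36    15    540    1296   225
34    21    714    1156   441
50    19    950    2500   361
ΣX = 204   ΣY = 91   ΣXY = 3704   ΣX² = 8482   ΣY² = 1747

n = 5
We know:
Coefficient of Correlation of Karl Pearson:
\[ r = \frac{n\Sigma XY - \Sigma X \cdot \Sigma Y}{\sqrt{n\Sigma X^2 - (\Sigma X)^2}\,\sqrt{n\Sigma Y^2 - (\Sigma Y)^2}} \]
\[ = \frac{5 \times 3704 - 204 \times 91}{\sqrt{5 \times 8482 - (204)^2}\,\sqrt{5 \times 1747 - (91)^2}} \]
\[ = \frac{18520 - 18564}{\sqrt{42410 - 41616}\,\sqrt{8735 - 8281}} \]
\[ = \frac{-44}{\sqrt{794}\,\sqrt{454}} \]
\[ = \frac{-44}{28.178 \times 21.307} \]
\[ = \frac{-44}{600.39} \]
= −0.0733
∴ There is a very weak negative correlation between sales and expenses (‘000).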
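The hand calculation can be cross-checked with a short Python sketch of the same formula. The function name `pearson_r` is illustrative; the data are the sales and expenses figures from the table.

```python
from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's r from the raw sums ΣX, ΣY, ΣXY, ΣX² and ΣY²."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

sales = [43, 41, 36, 34, 50]
expenses = [12, 24, 15, 21, 19]
r = pearson_r(sales, expenses)
print(round(r, 4))  # -0.0733
```

The numerator is 18520 − 18564 = −44, so the sign of r is negative, and its size shows that the linear relationship is very weak.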

b. Spearman
1. When Rank is Given

Find out Spearman rank correlation coefficient and interpret the result.

Candidates     A   B   C   D   E
Ranking by X   1   3   2   4   5
Ranking by Y   3   1   5   2   4

Solution:

Let R1 be ranking by X and R2 be ranking by Y.

R1    R2    d = R1 − R2    d²
1     3     -2             4
3     1     2              4
2     5     -3             9
4     2     2              4
5     4     1              1
                           Σd² = 22

We know,

\[ r = 1 - \frac{6\Sigma d^2}{n^3 - n} \]
\[ = 1 - \frac{6 \times 22}{5^3 - 5} \]
\[ = 1 - \frac{132}{125 - 5} \]
\[ = 1 - \frac{132}{120} \]
= 1 − 1.1 = −0.1

Therefore, there is a weak negative rank correlation between ranking by X and
ranking by Y.
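When the ranks are already given, the formula reduces to a few lines of Python. This sketch assumes there are no tied ranks; the function name is illustrative.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rank correlation, r = 1 - 6Σd² / (n³ - n), for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

ranking_by_x = [1, 3, 2, 4, 5]
ranking_by_y = [3, 1, 5, 2, 4]
r_rank = spearman_from_ranks(ranking_by_x, ranking_by_y)
print(round(r_rank, 4))  # -0.1
```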

2. When rank is not given ( Number is given)

Find out Spearman’s rank correlation coefficient between supply and profit.
Also interpret the result.

Supply 75 88 95 70 60 80 81 50
Profit 120 134 150 115 110 140 142 100

Solution:

Let R1 be rank of supply and R2 be rank of profit.

Here;

Supply   R1   Profit   R2   d = R1 − R2   d²
75       5    120      5    0             0
88       2    134      4    -2            4
95       1    150      1    0             0
70       6    115      6    0             0
60       7    110      7    0             0
80       4    140      3    1             1
81       3    142      2    1             1
50       8    100      8    0             0
                                          Σd² = 6

We know,

\[ r = 1 - \frac{6\Sigma d^2}{n^3 - n} \]
\[ = 1 - \frac{6 \times 6}{8^3 - 8} \]
\[ = 1 - \frac{36}{512 - 8} \]
\[ = 1 - \frac{36}{504} = 1 - 0.0714 = 0.9286 \]

Therefore, there is a strong positive rank correlation between supply and profit.

3. When Number is repeated.

Find the Spearman Correlation Coefficient for the following heights of fathers and
their sons and interpret the result.

Height of Father (in inches)   67   65   68   67   66
Height of Sons (in inches)     69   65   69   65   67

Solution

Let R1 be rank of height of fathers and R2 be rank of height of sons.

Here:

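Height of Father   R1    Height of Son   R2    d = R1 − R2   d²
67                 2.5   69              1.5   1             1
65                 5     65              4.5   0.5           0.25
68                 1     69              1.5   -0.5          0.25
67                 2.5   65              4.5   -2            4
66                 4     67              3     1             1
                                                             Σd² = 6.5

The height 67 is repeated twice among the fathers, and the heights 69 and 65 are
each repeated twice among the sons, so:

m1 = 2

m2 = 2

m3 = 2

We know:

\[ r = 1 - \frac{6\left[\Sigma d^2 + \frac{m_1(m_1^2 - 1)}{12} + \frac{m_2(m_2^2 - 1)}{12} + \frac{m_3(m_3^2 - 1)}{12}\right]}{n^3 - n} \]
\[ = 1 - \frac{6\left[6.5 + \frac{2(2^2 - 1)}{12} + \frac{2(2^2 - 1)}{12} + \frac{2(2^2 - 1)}{12}\right]}{5^3 - 5} \]
\[ = 1 - \frac{6[6.5 + 0.5 + 0.5 + 0.5]}{120} \]
\[ = 1 - \frac{6 \times 8}{120} = 1 - \frac{48}{120} = 1 - 0.4 = 0.6 \]

Therefore, there is a positive rank correlation between height of father
and height of son.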
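The ranking step and the tie correction can also be sketched in Python, starting from the raw values. The helper names are illustrative. Ranks run from the largest value (rank 1) downwards, tied values share the average of the positions they occupy, and each group of m tied values adds m(m² − 1)/12 to Σd², as in the formula used above.

```python
from collections import Counter

def average_ranks(values):
    """Rank from largest (1) to smallest; tied values share their average position."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of m(m² - 1)/12 over every value that occurs m > 1 times."""
    counts = Counter(values)
    return sum(m * (m * m - 1) / 12 for m in counts.values() if m > 1)

def spearman_with_ties(x, y):
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    adjusted = sum_d2 + tie_correction(x) + tie_correction(y)
    return 1 - 6 * adjusted / (n ** 3 - n)

fathers = [67, 65, 68, 67, 66]
sons = [69, 65, 69, 65, 67]
print(average_ranks(fathers))                      # [2.5, 5.0, 1.0, 2.5, 4.0]
print(round(spearman_with_ties(fathers, sons), 4)) # 0.6

# With no repeated values the correction is zero, as in the supply and profit data.
supply = [75, 88, 95, 70, 60, 80, 81, 50]
profit = [120, 134, 150, 115, 110, 140, 142, 100]
print(round(spearman_with_ties(supply, profit), 4))  # 0.9286
```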

Regression:

a. One Linear Regression Equation

The following data gives the experience of the machine operators in years and their
performance as given by the number of the good parts turned out per 100 pieces.

Operator 1 2 3 4 5
Experience 16 12 4 3 4
Performance 87 88 68 78 68

Obtain the regression equation of performance ratings on experience. Also estimate
performance when experience is 30.

Let, performance ratings be Y and experience be X.

X     Y     XY     X²
16    87    1392   256
12    88    1056   144
4     68    272    16
3     78    234    9
4     68    272    16
ΣX = 39   ΣY = 389   ΣXY = 3226   ΣX² = 441

Let the regression equation be y = a + bx

First;

\[ b = \frac{n\Sigma XY - \Sigma X \times \Sigma Y}{n\Sigma X^2 - (\Sigma X)^2} \]
\[ b = \frac{5 \times 3226 - 39 \times 389}{5 \times 441 - (39)^2} \]
\[ b = \frac{16130 - 15171}{2205 - 1521} \]
\[ b = \frac{959}{684} \approx 1.4 \]

Also;

\[ \bar{Y} = \frac{\Sigma Y}{n} = \frac{389}{5} = 77.8 \]
\[ \bar{X} = \frac{\Sigma X}{n} = \frac{39}{5} = 7.8 \]

Now,

\[ a = \bar{Y} - b\bar{X} \]
= 77.8 − 1.4 × 7.8 = 77.8 − 10.92 = 66.88

Therefore, the required regression equation is

y = 66.88 + 1.4x

When experience is 30, the performance rating will be:

y = 66.88 + 1.4 × 30

= 66.88 + 42 = 108.88 Ans.
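The same least-squares calculation can be sketched in Python. The function name `fit_line` is illustrative. Because the code does not round b to one decimal place before computing a, its results differ slightly from the hand calculation.

```python
def fit_line(x, y):
    """Return (a, b) for the regression line of y on x, y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n   # a = mean(y) - b * mean(x)
    return a, b

experience = [16, 12, 4, 3, 4]
performance = [87, 88, 68, 78, 68]
a, b = fit_line(experience, performance)
estimate = a + b * 30
print(round(a, 3), round(b, 3), round(estimate, 2))  # 66.864 1.402 108.93
```

An experience of 30 years lies well outside the observed range of 3 to 16 years, so the estimate is an extrapolation of the fitted line.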

b. Two linear regression equation:

The height of the fathers and sons is given in the following table. Find the two
lines of regression and estimate the average height of the son when the height of
the father is 67.5 inches.
Height of the Father (in inches)   65   66   67   68   69   71   73
Height of the son (in inches)      67   64   68   72   70   69   70

Solution:

Let height of the father be X and height of the son be Y.

Here;

X     Y     XY     X²     Y²
65    67    4355   4225   4489
66    64    4224   4356   4096
67    68    4556   4489   4624
68    72    4896   4624   5184
69    70    4830   4761   4900
71    69    4899   5041   4761
73    70    5110   5329   4900
ΣX = 479   ΣY = 480   ΣXY = 32870   ΣX² = 32825   ΣY² = 32954

First: y on x

Let the regression equation be y = a + bx

\[ b = \frac{n\Sigma XY - \Sigma X \times \Sigma Y}{n\Sigma X^2 - (\Sigma X)^2} \]
\[ b = \frac{7 \times 32870 - 479 \times 480}{7 \times 32825 - (479)^2} \]
\[ b = \frac{230090 - 229920}{229775 - 229441} \]
\[ b = \frac{170}{334} = 0.509 \]

Also;

\[ \bar{Y} = \frac{\Sigma Y}{n} = \frac{480}{7} = 68.571 \]
\[ \bar{X} = \frac{\Sigma X}{n} = \frac{479}{7} = 68.429 \]

Now,

\[ a = \bar{Y} - b\bar{X} \]
= 68.571 − 0.509 × 68.429 = 68.571 − 34.830 = 33.741

Therefore, the required regression equation of y on x is

y = 33.741 + 0.509x

Then: x on y

Let the regression equation be x = a + by

\[ b = \frac{n\Sigma XY - \Sigma X \times \Sigma Y}{n\Sigma Y^2 - (\Sigma Y)^2} \]
\[ b = \frac{7 \times 32870 - 479 \times 480}{7 \times 32954 - (480)^2} \]
\[ b = \frac{230090 - 229920}{230678 - 230400} \]
\[ b = \frac{170}{278} = 0.612 \]

Now,

\[ a = \bar{X} - b\bar{Y} \]
= 68.429 − 0.612 × 68.571 = 68.429 − 41.965 = 26.464

Therefore, the required regression equation of x on y is

x = 26.464 + 0.612y

When the height of the father is 67.5 inches, the height of the son is estimated
from the regression equation of y on x:

y = 33.741 + 0.509x

y = 33.741 + 0.509 × 67.5

y = 33.741 + 34.3575

y = 68.0985 ≈ 68.10 inches
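Both regression lines come from the same formula with the roles of the two variables swapped, which a short Python sketch makes explicit. The names are illustrative, and the code carries full precision instead of rounding b to three decimals, so the later decimals of the coefficients can differ slightly from a hand calculation.

```python
def fit_line(u, v):
    """Return (a, b) for the regression line of v on u, v = a + b*u."""
    n = len(u)
    sum_u, sum_v = sum(u), sum(v)
    sum_uv = sum(p * q for p, q in zip(u, v))
    sum_u2 = sum(p * p for p in u)
    b = (n * sum_uv - sum_u * sum_v) / (n * sum_u2 - sum_u ** 2)
    a = sum_v / n - b * sum_u / n
    return a, b

father = [65, 66, 67, 68, 69, 71, 73]   # X
son = [67, 64, 68, 72, 70, 69, 70]      # Y

a_yx, b_yx = fit_line(father, son)      # y on x: y = a_yx + b_yx * x
a_xy, b_xy = fit_line(son, father)      # x on y: x = a_xy + b_xy * y

son_estimate = a_yx + b_yx * 67.5
print(round(b_yx, 3), round(b_xy, 3), round(son_estimate, 2))  # 0.509 0.612 68.1
```

The line of y on x is the one to use here because the son's height (Y) is being estimated from the father's height (X).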

Index Number:

a. Simple Aggregative Method

Calculate the price index number for the following data by using simple
aggregative method.

Items                 A    B    C    D    E
Price in 1983 (Rs)    5    7    9    7    6
Price in 1984 (Rs)    10   11   10   11   9

Solution:

Let price in 1983 be p0 and price in 1984 be p1.

p0    p1
5     10
7     11
9     10
7     11
6     9
Σp0 = 34   Σp1 = 51

We know

\[ P_{01} = \frac{\Sigma p_1}{\Sigma p_0} \times 100 \]
\[ P_{01} = \frac{51}{34} \times 100 \]

= 150
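As a quick check, the simple aggregative formula is a one-line computation in Python. The function name is illustrative.

```python
def simple_aggregative_index(base_prices, current_prices):
    """P01 = (Σp1 / Σp0) × 100."""
    return sum(current_prices) / sum(base_prices) * 100

prices_1983 = [5, 7, 9, 7, 6]      # p0, base year
prices_1984 = [10, 11, 10, 11, 9]  # p1, current year
index_1984 = simple_aggregative_index(prices_1983, prices_1984)
print(index_1984)  # 150.0
```

An index of 150 means that the total price of the five items in 1984 is 50% higher than in the base year 1983.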

b. Simple Average of Price Relative

Determine the price index number from the following data by using simple average
of price relative taking AM and GM as the average.
