INSTITUTE - University School of Business DEPARTMENT - Management


M.B.A
QUANTITATIVE TECHNIQUES
FOR MANAGERS
By : AMAN JINDAL
(Associate Professor)
UNIT-1 DISCOVER . LEARN .
EMPOWER

Course Outcomes
After undergoing this course, the students will be able:
CO1: To learn the use of descriptive statistics in taking business decisions.
CO2: To understand the concept of regression lines and probability distribution of data for making futuristic predictions.
CO3: To gain an understanding of inferential statistics for making business decisions.
Definition (Statistics)
 Statistics is an art of learning from data. It is a Science
of collection, presentation, analysis, and reasonable
interpretation of data

A Brief History of Statistics
 A systematic collection of data on the population and the economy was
begun in the Italian city-states of Venice and Florence during the
Renaissance.
 The term statistics, derived from the word state, was used to refer to a
collection of facts of interest to the state.
 In 1662 the English tradesman John Graunt published a book entitled
Natural and Political Observations Made upon the Bills of Mortality .

Definition (Descriptive statistics)
The part of statistics concerned with the
description and summarization of data is
called descriptive statistics

For example:
Tables or graphs are used to organize data, and
descriptive values such as the average score are used
to summarize data.
A descriptive value for a population is called a
parameter and a descriptive value for a sample is
called a statistic.

Inferential Statistics
Definition (inferential statistics)
 The part of statistics concerned with the drawing of
conclusions from data is called inferential statistics

• For example, when a drug trial is completed and the data are described and summarized, we hope to be able to draw a conclusion about the efficacy of the drug.
Application Areas of Statistics
 In the early 20th century, two of the most important areas of applied statistics were
population biology and agriculture .
 Nowadays the ideas of statistics are everywhere.
 Descriptive statistics are featured in every newspaper and magazine.
 Statistical inference has become indispensable.
 to public health and medical research,
 to marketing and quality control
 to education,
 to accounting
 to economics
 to meteorological forecasting
 to polling and surveys
 to sports,
 to insurance
 to gambling and
 to all research that makes any claim to being scientific.

Scope and importance of Statistics
• Statistics and planning: Statistics is indispensable to planning in the modern age, which is termed “the age of planning”. Almost all over the world, governments are resorting to planning for economic development.
• Statistics and economics: Statistical data and techniques of statistical analysis have proved immensely useful in solving economic problems such as wages, prices, time series analysis and demand analysis.
• Statistics and business: Statistics is an indispensable tool of production control. Business executives are relying more and more on statistical techniques for studying the needs and desires of their valued customers.
• Statistics and industry: In industry, statistics is widely used in quality control. In production engineering, statistical tools such as inspection plans and control charts are used to find out whether the product is conforming to the specifications or not.
• Statistics and mathematics: Statistics and mathematics are intimately related. Recent advancements in statistical techniques are the outcome of wide applications of mathematics.
• Statistics and modern science: In medical science, the statistical tools for collection, presentation and analysis of observed facts relating to the causes and incidence of disease, and to the results of applying various drugs and medicines, are of great importance.
• Statistics, psychology and education: In education and psychology, statistics has found wide application, such as determining the reliability and validity of a test, factor analysis etc.
• Statistics and war: In war, the theory of decision functions can be of great assistance to military personnel in planning “maximum destruction with minimum effort.”
Statistics in business and management
1. Marketing: Statistical analysis is frequently used in providing information for making decisions in the field of marketing. It is necessary first to find out what can be sold and then to evolve a suitable strategy so that the goods reach the ultimate consumer. A skillful analysis of data on production, purchasing power, manpower, habits of competitors, habits of consumers and transportation costs should be considered before any attempt to establish a new market.
2. Production: In the field of production, statistical data and methods play a very important role. Decisions about what to produce, how to produce, when to produce and for whom to produce are based largely on statistical analysis.
3. Finance: Financial organizations, in discharging their finance function effectively, depend very heavily on statistical analysis of facts and figures.

Statistics in business and management
4. Banking: Banking institutions have found it increasingly necessary to establish research departments within their organizations for the purpose of gathering and analyzing information, not only regarding their own business but also regarding the general economic situation and every segment of business in which they may have an interest.
5. Investment: Statistics greatly assists investors in making clear and valid judgments in their investment decisions, in selecting securities which are safe and have the best prospects of yielding a good income.
6. Purchase: The purchase department, in discharging its function, makes use of statistical data to frame suitable purchase policies, such as what to buy, what quantity to buy, when to buy, where to buy and from whom to buy.
7. Accounting: Statistical data are also employed in accounting. Particularly in the auditing function, the technique of sampling and estimation is frequently used.
8. Control: The management control process combines statistical and accounting methods in making the overall budget for the coming year, including sales, materials, labor and other costs, net profits and capital requirements.

Limitations of Statistics
• Statistical laws are true only on average. Statistics are aggregates of facts, so a single observation is not a statistic; statistics deals with groups and aggregates only.
• Statistical methods are best applicable to quantitative data.
• Statistical methods cannot be applied to heterogeneous data.
• If sufficient care is not exercised in collecting, analyzing and interpreting the data, statistical results might be misleading.
• Only a person who has expert knowledge of statistics can handle statistical data efficiently.
• Some errors are possible in statistical decisions. In particular, inferential statistics involves certain errors, and we do not know whether an error has been committed or not.
CLASSIFICATION
AND
TABULATION OF DATA
Concept of Variable
Variable
• A characteristic which takes on different values in different persons, places or things.
• Example: heart rate, the heights of adult males, the weights of preschool children.

Quantitative Variable:
• One that can be measured and expressed numerically. The measurements convey information regarding amount.
• Example: the heights of adult males, the weights of preschool children.

Qualitative Variable:
• A characteristic that can’t be measured quantitatively but can be categorized. The measurement conveys information regarding the attribute. Measurement in the real sense can’t be achieved, but persons, places or things belonging to different categories can be counted.
• Example: sex of a person, colour of skin.

Random Variable
• Values obtained arise as a result of chance events/factors, so they can’t be exactly predicted in advance.
• Example: heights of a group of randomly selected students.

Discrete Random Variable:
• Characterized by gaps or interruptions in the values that it can assume.
• It assumes values with definite jumps.
• It can’t take all possible values within a range.
• It is observed through counting only.
• Example: the no. of daily admissions to a general hospital, the no. of decayed, missing or filled teeth per child in an elementary school.

Continuous Random Variable:
• It can take all possible values (positive, negative, integral and fractional) within a specified relevant interval.
• It doesn’t possess gaps or interruptions within the specified relevant interval of values assumed by the variable.
• It is derived through measurement.
• Example: height, weight and skull circumference.
• Because of the limitations of available measuring instruments, however, observations on variables that are inherently continuous are recorded as if they were discrete.
The ordered array
A first step in organizing data is preparation of an
ordered array.
It is a listing of values of a data series from the
smallest to the largest values.
It enables one to quickly determine the smallest and
largest value in the data set and other facts about the
arrayed data that might be needed in a hurried
manner.
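The idea of an ordered array can be sketched in a few lines of Python. This is only an illustration: the observations and the helper name `ordered_array` are hypothetical and do not come from the slides.

```python
def ordered_array(values):
    """Return the values listed from the smallest to the largest."""
    return sorted(values)


# Hypothetical observations, used only for illustration.
ages = [34, 21, 45, 29, 21, 52, 38]

array = ordered_array(ages)
smallest, largest = array[0], array[-1]

print("Ordered array:", array)
print("Smallest value:", smallest)
print("Largest value:", largest)
```

Once the data are ordered, the smallest and largest values sit at the two ends of the list, which is what makes the array quick to read.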
• DATA CLASSIFICATION: The grouping of related facts/data into different classes according to certain common characteristics.

• Basis of data classification: broadly, there are four bases.
1. Geographical, i.e. area-wise
• Total population of Punjab by districts
• No. of deaths due to malaria by districts
• Infant deaths in Punjab by districts
2. Chronological or temporal, i.e. on the basis of time
Table 2: Deaths by lightning

Year Number
1990 10
1991 5
1992 12
1993 6
1994 9
1995 3
1996 3
1997 5
1998 12
1999 12
2000 8
2001 7
2002 8
Total 100
3. Qualitative, i.e. on the basis of some attributes
Example: People by place of residence, sex and literacy

Place of residence:  Rural                                           Urban
Sex:                 Male                   Female                   Male                   Female
Literacy:            Literate  Illiterate   Literate  Illiterate     Literate  Illiterate   Literate  Illiterate
4. Quantitative: on the basis of quantitative class intervals

For example, students of a college may be classified according to weight as follows.
Table 3: Weight of students of a college
Weight (lbs)   No. of students
90-100 50
100-110 200
110-120 260
120-130 360
130-140 90
140-150 40
Total 1000
Classification of age of 600 persons in the social survey
Class interval   Frequency   Relative frequency (%)
15-24            56          9.3
25-34            153         25.5
35-44            149         24.8
45-54            75          12.5
55-64            61          10.2
65-74            70          11.7
75-84            28          4.7
85-94            8           1.3
Total            600         100.0
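The relative frequency column can be reproduced with a short sketch. It assumes that each relative frequency is the class frequency divided by the total, expressed as a percentage and rounded to one decimal; the function name `relative_frequencies` is hypothetical.

```python
def relative_frequencies(freqs):
    """Map each class to its share of the total, in percent (1 decimal)."""
    total = sum(freqs.values())
    return {cls: round(100 * f / total, 1) for cls, f in freqs.items()}


# Frequencies from the age classification of 600 persons.
age_freqs = {
    "15-24": 56, "25-34": 153, "35-44": 149, "45-54": 75,
    "55-64": 61, "65-74": 70, "75-84": 28, "85-94": 8,
}

rel = relative_frequencies(age_freqs)
for cls, pct in rel.items():
    print(f"{cls:>6}  {age_freqs[cls]:>4}  {pct:>5}")
```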
OBJECTIVES OF CLASSIFICATION
Helps in condensing the mass of data such that
similarities and dissimilarities can be readily
distinguished.
No. of children   No. of families (frequency)   Cum. freq. (less than)   Cum. freq. (greater than)
0-2               7                             7                        35
3-5               16                            23                       28
6 and above       12                            35                       12
Total             35
 Facilitate comparison
No. of children   No. of families (frequency)   Cum. freq. (less than)   Cum. freq. (greater than)
0-2               7                             7 (20%)                  35 (100%)
3-5               16                            23 (65.7%)               28 (80%)
6 and above       12                            35 (100%)                12 (34%)
Total             35
 Most significant features of the data
can be pin pointed at a glance
 Enables statistical treatment of the
collected data
 Averages can be computed
 Variations can be revealed
 Association can be studied
 Model for prediction / forecasting can be
built
 Hypothesis can be formulated and tested
etc.
Principles of Classification:
There are no hard and fast rules for deciding the class interval; however, it depends upon:
• Knowledge of the data
• Lowest and highest value of the set of observations
• Utility of the class intervals for meaningful comparison and interpretation

• The classes should be collectively exhaustive and non-overlapping, i.e. mutually exclusive.
• The number of classes should not be too large, otherwise the purpose of classification, i.e. summarization of data, will not be served.
• The number of classes should not be too small either, for this also may obscure the true nature of the distribution.
• The classes should preferably be of equal width. Otherwise the class frequencies would not be comparable, and the computation of statistical measures will be laborious.
Classification will be called exclusive (Continuous),
when the class intervals are so fixed that the upper
limit of one class is the lower limit of the next class
and the upper limit is not included in the class.
An example

Income (Rs.)                         No. of families
1000 – 1100 (1000 but under 1100)    15
1100 – 1200 (1100 but under 1200)    25
1200 – 1300 (1200 but under 1300)    10
Total                                50
• Classification will be inclusive (discontinuous) when both the upper and the lower limit of a class are included in that class itself.

Income (Rs.)                                   No. of persons
1000 – 1099 (1000 up to and including 1099)    50
1100 – 1199 (1100 up to and including 1199)    100
1200 – 1299 (1200 up to and including 1299)    200
Total                                          350
• Discontinuous class intervals can be made continuous by applying the correction factor (CF):

$$\mathrm{CF} = \frac{\text{Lower limit of 2nd class} - \text{Upper limit of 1st class}}{2}$$

• The correction factor is subtracted from the lower limit and added to the upper limit of each class to make the class intervals continuous.
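As a minimal sketch of the correction factor, the snippet below converts the inclusive income classes 1000 – 1099, 1100 – 1199 and 1200 – 1299 into continuous ones. The function name `make_continuous` is hypothetical, and the sketch assumes the gap between consecutive classes is the same throughout.

```python
def make_continuous(classes):
    """classes: list of (lower, upper) inclusive limits in ascending order.

    Returns the correction factor and the adjusted class boundaries.
    """
    # CF = (lower limit of 2nd class - upper limit of 1st class) / 2
    cf = (classes[1][0] - classes[0][1]) / 2
    # Subtract CF from each lower limit and add it to each upper limit.
    boundaries = [(lower - cf, upper + cf) for lower, upper in classes]
    return cf, boundaries


income_classes = [(1000, 1099), (1100, 1199), (1200, 1299)]
cf, boundaries = make_continuous(income_classes)

print("Correction factor:", cf)
for (lo, hi), (new_lo, new_hi) in zip(income_classes, boundaries):
    print(f"{lo} - {hi}  ->  {new_lo} - {new_hi}")
```

After the adjustment the upper boundary of each class coincides with the lower boundary of the next, so no gaps remain between classes.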
Frequency distributions
Quantitative Variables:
Discrete variable
Continuous variable
Qualitative variable (attributes)

• The manner in which the total number of observations is distributed over different classes is called a frequency distribution.
Frequency distribution of an attribute
In 1993, 1674 inhabitants of Calcutta, Bombay and Madras were surveyed. Each was asked, among other questions, whether he/she knew about HIV/AIDS. The results are tabulated.

Table 4: Results of survey on awareness of HIV/AIDS
State of knowledge   Number of people
Aware                620
Unaware              1054
Total                1674

Table 5: Proportion of people aware of HIV/AIDS
State of knowledge   Relative frequency
Aware                0.370
Unaware              0.630
Total                1.000
Frequency distribution of a discrete variable
• Data are grouped into classes and the number of cases which fall in each class is recorded.
• Example: In a survey of 35 families in a village, the number of children per family was recorded and the following data were obtained.

1 0 2 3 4 5 6
7 2 3 4 0 2 5
8 4 5 9 6 3 2
7 6 5 3 3 7 8
9 7 9 4 5 4 3
Steps for frequency distribution
• Find the largest and smallest values; those are 9 and 0 respectively.
• Form a table with 10 classes for the 10 values 0, 1, 2, ..., 9.
• Look at the given values of the variable one by one and, for each value, put a tally mark in the table against the appropriate class.
• To facilitate counting, the tally marks are arranged in blocks of five, every fifth stroke being drawn across the preceding four. This is done below.
Table 6: Frequency Table

No. of     Tallies    Frequency   Cumulative frequency   Cumulative frequency
children                          (less than type)       (more than type)
0          ||         2           2                      35
1          |          1           3                      33
2          ||||       4           7                      32
3          ||||/ |    6           13                     28
4          ||||/      5           18                     22
5          ||||/      5           23                     17
6          |||        3           26                     12
7          ||||       4           30                     9
8          ||         2           32                     5
9          |||        3           35                     3
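The tallying steps can be sketched in Python using the 35 family sizes listed in the example. The function name `frequency_table` is hypothetical; `collections.Counter` does the tallying and `itertools.accumulate` builds the two cumulative columns.

```python
from collections import Counter
from itertools import accumulate

# Number of children per family for the 35 surveyed families.
children = [
    1, 0, 2, 3, 4, 5, 6,
    7, 2, 3, 4, 0, 2, 5,
    8, 4, 5, 9, 6, 3, 2,
    7, 6, 5, 3, 3, 7, 8,
    9, 7, 9, 4, 5, 4, 3,
]


def frequency_table(values):
    """Return classes, frequencies and both cumulative frequency columns."""
    counts = Counter(values)  # one tally per observed value
    classes = list(range(min(values), max(values) + 1))
    freqs = [counts[c] for c in classes]
    less_than = list(accumulate(freqs))                  # running total from the top
    more_than = list(accumulate(reversed(freqs)))[::-1]  # running total from the bottom
    return classes, freqs, less_than, more_than


classes, freqs, less_than, more_than = frequency_table(children)
for row in zip(classes, freqs, less_than, more_than):
    print("{:>2} {:>3} {:>3} {:>3}".format(*row))
```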
TABULATION OF DATA

• Tabulation compresses the data into rows and columns so that relations can be understood.
• Tabulation simplifies complex data, facilitates comparison, gives identity to the data and reveals patterns.
Different parts of a table
• Table number
• Title of the table
• Caption: column heading
• Stub: row heading
• Body: contains data
• Head notes: something that is not explained in the title, caption or stubs can be explained in the head notes at the top of the table, below the title.
• Foot notes: the source of data and exceptions in the data can be given in the foot notes.
Tables can be classified in 3 ways
Type of table: characteristic feature
1. Simple table: only one characteristic is shown.
2. Complex table
a. Two-way table: shows two characteristics and is formed when either the stub or the caption is divided into two co-ordinate parts.
b. Higher order table: when three or more characteristics are represented in the same table, such a table is called a higher order table.
3. General and special purpose table: tables published by the Govt., such as in the Statistical Abstract of India or census reports, are general purpose tables.
Complex table
Average number of OPD patients in a PHC in a rural area in different age groups according to sex

Age in yrs   OPD patients
             Male   Female   Total
Below 25     25     5        30
25-35        30     4        34
35-45        25     5        30
45-55        22     3        25
Above 55     15     1        16
Total        117    18       135
Number of patients in OPDs of public sector hospital by Religion, Age, Rank and Sex

Religion   Age (in yr.)   Rank
                          Supervisor   Assistant   Clerks    Total
                          F  M  T      F  M  T     F  M  T   F  M  T
Hindu      Below 25
           25 – 35
           35 – 45
           45 – 55
           55 & above
Muslim     Below 25
           25 – 35
           35 – 45
           45 – 55
           55 & above
Total
Presentation of data-
Tables , Charts, Graphs
Introduction
• The collection and classification of data lead to the problem of presentation of data.
• Presentation is the exhibition of data in a clear and attractive manner.

Presentation of data: Table, Diagram, Graphs
Tabular presentation of data
“A statistical table is a systematic organization of data in
columns and rows.”
- Neiswanger
“Tabulation involves the orderly and systematic presentation of numerical data in a form designed to elucidate the problem under consideration.”
- Prof. L.R. Connor
Objectives of Tabulation
Simple
Brief
Facilitates comparison
Helpful in presentation
Helpful in analysis
Clarifies the chief characteristics of data
Difference between Classification and
Tabulation
• Classification and tabulation are done in sequence. First data are classified and then they are presented in tables.
• Classification forms the basis of tabulation.
• In classification, data are classified into different classes according to their similarities and dissimilarities. In tabulation, on the other hand, the classified data are placed in rows and columns.
• Classification is a process of statistical analysis, whereas tabulation is a process of presentation.
• Classification divides the data into classes and sub-classes, while tabulation presents the data under headings and sub-headings.
Components of a Table
Table number
Title of the table
Head note
Stubs- Title of the rows of a table
Captions- Title of the columns of a table
Body or Field
Footnotes
Source
Types of Tables
Types: according to purpose, according to construction
Diagrammatic presentation of data
• Data may be presented in a simple and attractive manner in the form of diagrams.
• It is the technique of presenting data in the form of bar diagrams, rectangles, pie diagrams, pictographs and cartographs.
Utility of diagrammatic presentation
• Makes data simple and understandable
• Remembered for a long period
• No need of training or special knowledge
• Attractive and effective means of presentation
• Facilitates comparisons
• Informative and entertaining
• Helpful in predictions
Rules for constructing diagrams
Diagrams must be attractive and effective in
communicating the information
It should be neither too big nor too small
Diagram must bear proper headings
Colors may be used to indicate different aspects of a
diagram
One should make use of the minimum number of words or numerals
Must be bordered with bold lines
Limitations
Shows only estimate of the actual behavior
Only a limited set of data can be presented in
the form of a diagram
It is a time consuming process
Not very easy to arrive at final conclusions after
seeing the diagrams.
Types of diagrams
Bar diagram
Pie diagrams
Pictograms
Cartograms
1. Bar diagram
• Only the length of the bars is taken into account
The width of the bar is adjusted in accordance with
the space available
The gap between one bar and other bar is also kept
constant
Either vertical or horizontal bars are used
The table shown here displays the number of
crimes investigated by law enforcement officers
in U.S. national parks during 1995. Construct a
Bar chart for the data.

Type        Number
Homicide    13
Rape        34
Robbery     29
Assault     164

[Bar chart: number of crimes by type (Homicide 13, Rape 34, Robbery 29, Assault 164)]

Total number of crimes: 240


2. Pie Graph
A pie graph is a circle that is divided into sections
or wedges according to the percentage of
frequencies in each category of the distribution.
Example
This frequency distribution shows the number of pounds
of each snack food eaten during the 1998 Super Bowl.
Construct a pie graph for the data.

Million
Snack pounds
Potato Chips 11.2
Tortilla Chips 8.2
Pretzels 4.3
Popcorn 3.8
Snack nuts 2.5
We need to find percentages for each category and then
compute the corresponding sectors so that we divide the
circle proportionally.

Snack            Million pounds   Percentage   Degrees
Potato Chips     11.2             37.33%       ≈134º
Tortilla Chips   8.2              27.33%       ≈98º
Pretzels         4.3              14.33%       ≈52º
Popcorn          3.8              12.67%       ≈46º
Snack nuts       2.5              8.33%        ≈30º
[Pie graph: Potato Chips 37%, Tortilla Chips 27%, Pretzels 14%, Popcorn 13%, Snack nuts 8%]
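The percentage and sector angle for each snack can be checked with a small sketch. It assumes each angle is the category's share of the total multiplied by 360 degrees and rounded to the nearest whole degree; the function name `pie_sectors` is hypothetical.

```python
def pie_sectors(amounts):
    """Return {category: (percentage, degrees)} for a pie graph."""
    total = sum(amounts.values())
    return {
        name: (round(100 * value / total, 2), round(360 * value / total))
        for name, value in amounts.items()
    }


# Million pounds of each snack food, from the example.
snacks = {
    "Potato Chips": 11.2,
    "Tortilla Chips": 8.2,
    "Pretzels": 4.3,
    "Popcorn": 3.8,
    "Snack nuts": 2.5,
}

sectors = pie_sectors(snacks)
for name, (pct, deg) in sectors.items():
    print(f"{name:<15} {pct:>6}%  {deg:>4} degrees")
```

A quick sanity check on any pie graph is that the sector angles add up to 360 degrees.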
3. PICTOGRAMS-
Used in presenting statistical data
Shows the data in the form of pictures
For eg: Data on cars would be represented by
pictures of cars etc

4. CARTOGRAMS-
Used to give quantitative information on
geographical basis
Show data in the form of maps
For eg: rainfall in different parts of the country,
size of population in diff regions etc.
DATA COLLECTION
PRIMARY & SECONDARY
INTRODUCTION
Data collection is a term used to describe
a process of preparing and collecting data
Systematic gathering of data for
a particular purpose from various sources,
that has been systematically observed,
recorded, organized.
Data are the basic inputs to any decision
making process in business
PURPOSE OF DATA COLLECTION
The purpose of data collection is:
• to obtain information
• to keep on record
• to make decisions about important issues
• to pass information on to others
CLASSIFICATION OF DATA
Types: PRIMARY DATA, SECONDARY DATA
PRIMARY DATA
• Primary data are collected from the field under the control and supervision of an investigator.
• Primary data means original data that has been collected specially for the purpose in mind.
• This type of data is generally fresh and collected for the first time.
• It is useful for current studies as well as for future studies.
• For example: your own questionnaire.
Primary Research Methods & Techniques
Primary Research
• Quantitative Data
  • Surveys: personal interview (intercepts); mail; in-house, self-administered; telephone, fax, e-mail, Web
  • Experiments: mechanical observation; simulation
• Qualitative Data
  • Focus groups
  • Individual depth interviews
  • Human observation
  • Case studies
Primary Research Methods & Techniques
Quantitative and Qualitative Information:
• Quantitative: based on numbers, e.g. 56% of 18 year olds drink alcohol at least four times a week. It doesn’t tell you why, when or how.
• Qualitative: more detail. It tells you why, when and how.
Primary Research Categories
Quantitative Research
Numerical
Statistically reliable
Projectable to a broader population
METHODS
 OBSERVATION METHOD
Through personal observation
 PERSONAL INTERVIEW
Through Questionnaire
 TELEPHONE INTERVIEW
Through Call outcomes, Call
timings
 MAIL SURVEY
Through Mailed Questionnaire
SECONDARY DATA
Data gathered and recorded by someone else prior to and
for a purpose other than the current project
Secondary data is data that has been collected for another
purpose.
It involves less cost, time and effort
Secondary data is data that is being reused. Usually in a
different context.
For example: data from a book.
SOURCES
INTERNAL SOURCES
Internal sources of secondary data are usually for
marketing application-
 Sales Records
Marketing Activity
Cost Information
Distributor reports and feedback
Customer feedback
SOURCES
EXTERNAL SOURCES
External sources of secondary data are usually for
Financial application-
Journals
Books
Magazines
Newspaper
Libraries
The Internet
Advantages & Disadvantages of Primary
Data
Advantages
Targeted Issues are addressed
Data interpretation is better
Efficient Spending for Information
Recency of Data
Proprietary Issues
Addresses Specific Research Issues
Greater Control
Advantages & Disadvantages of Primary
Data
Disadvantages
High Cost
Time Consuming
Inaccurate Feed-backs
More number of resources is required
Advantages & Disadvantages of
Secondary Data
Advantages
Ease of Access
Low Cost to Acquire
Clarification of Research Question
May Answer Research Question
Advantages & Disadvantages of
Secondary Data
Disadvantages
Quality of Research
Not Specific to Researcher’s Needs
Incomplete Information
Not Timely
Data Collection Flow
Measures of Central
Tendency: Mean, Mode,
Median
Introduction:
• Measures of central tendency are statistical measures which describe the position of a distribution.
• They are also called statistics of location, and are the complement of statistics of dispersion, which provide information concerning the variance or spread of observations.
• In the univariate context, the mean, median and mode are the most commonly used measures of central tendency.
• They are computable values that describe the behavior of the center of a distribution.
Measures of Central Tendency
The value or the figure which represents the whole series is
neither the lowest value in the series nor the highest it lies
somewhere between these two extremes.
1. The average represents all the measurements made on a
group, and gives a concise description of the group as a
whole.
2. When two or more groups are measured, the central
tendency provides the basis of comparison between
them.
Definition
• Simpson and Kafka defined it as “A measure of central tendency is a typical value around which other figures congregate”.
• Waugh has expressed it as “An average stands for the whole group of which it forms a part yet represents the whole”.
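1. Arithmetic Mean
Arithmetic mean is a mathematical average and it is the most popular measure of central tendency. It is frequently referred to as the ‘mean’. It is obtained by dividing the sum of the values of all observations in a series (ƩX) by the number of items (N) constituting the series.
Thus, the mean of a set of numbers X1, X2, X3, ..., XN is denoted by x̅ and is defined as

$$\bar{x} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\sum X}{N}$$

Arithmetic Mean Calculation Methods:

Direct method: $\bar{x} = \dfrac{\sum X}{N}$

Short cut method: $\bar{x} = A + \dfrac{\sum d}{N}$, where A is an assumed mean and $d = X - A$

Step deviation method: $\bar{x} = A + \dfrac{\sum d'}{N} \times i$, where $d' = (X - A)/i$ and i is the common class width
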

Example: Calculate the arithmetic mean of the DIRC monthly users statistics in the University Library

Month      No. of working days   Total users   Average users per working day
Sep-2011   24                    11618         484.08
Oct-2011   21                    8857          421.76
Nov-2011   23                    11459         498.22
Dec-2011   25                    8841          353.64
Jan-2012   24                    5478          228.25
Feb-2012   23                    10811         470.04
Total      140                   57064

Mean = 57064 / 140 = 407.6 users per working day
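The library example can be recomputed with a few lines of Python. This is a sketch of the direct method only (sum of values divided by the number of items); the variable names are hypothetical.

```python
# Working days and total users per month, Sep-2011 to Feb-2012.
months = ["Sep-2011", "Oct-2011", "Nov-2011", "Dec-2011", "Jan-2012", "Feb-2012"]
working_days = [24, 21, 23, 25, 24, 23]
total_users = [11618, 8857, 11459, 8841, 5478, 10811]

# Average users per working day within each month.
monthly_average = [round(u / d, 2) for u, d in zip(total_users, working_days)]

# Direct method over the whole period: total users / total working days.
overall_mean = sum(total_users) / sum(working_days)

for month, avg in zip(months, monthly_average):
    print(month, avg)
print("Overall mean:", round(overall_mean, 1))
```

Note that the overall mean divides the grand total by the total number of working days; it is not the simple average of the six monthly averages, because the months have different numbers of working days.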
Advantages of Mean:
• It is easy to understand and simple to calculate.
 It is based on all the values.
 It is rigidly defined .
 It is easy to understand the arithmetic
average even if some of the details of the
data are lacking.
 It is not based on the position in the series.
Disadvantages of Mean:

 It is affected by extreme values.


 It cannot be calculated for open end
classes.
 It cannot be located graphically
 It gives misleading conclusions.
 It has upward bias.
Median
• Median is the central value of the distribution, or the value which divides the distribution into two equal parts, each part containing an equal number of items. Thus it is the central value of the variable when the values are arranged in order of magnitude.
• Connor has defined it as “The median is that value of the variable which divides the group into two equal parts, one part comprising all values greater, and the other, all values less than the median”.
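Calculation of Median – Discrete series:

i. Arrange the data in ascending or descending order.
ii. Calculate the cumulative frequencies.
iii. Apply the formula: Median = size of the $\left(\frac{N+1}{2}\right)$th item.

Calculation of Median – Continuous series

For calculation of the median in a continuous frequency distribution the following formula will be employed. Algebraically,

$$M = L + \frac{\frac{N}{2} - cf}{f} \times i$$

where L is the lower limit of the median class, N is the total frequency, cf is the cumulative frequency of the class preceding the median class, f is the frequency of the median class and i is the class width.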
Example: Median of a set of grouped data in a distribution of respondents by age

Age group               Frequency (f)   Cumulative frequency (cf)
0-20                    15              15
20-40                   32              47
40-60 (median class)    54              101
60-80                   30              131
80-100                  19              150
Total                   150

$$M = 40 + \frac{\frac{150}{2} - 47}{54} \times 20 = 40 + \frac{28}{54} \times 20 = 40 + 0.52 \times 20 = 40 + 10.37 = 50.37$$
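The grouped-data median can be sketched as below. The function name `grouped_median` is hypothetical; it assumes continuous (exclusive) classes listed in ascending order and locates the median class as the first class whose cumulative frequency reaches N/2.

```python
def grouped_median(classes, freqs):
    """Median of a continuous frequency distribution.

    classes: list of (lower, upper) class limits in ascending order.
    freqs:   list of class frequencies.
    """
    n = sum(freqs)
    cum = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= n / 2:  # this is the median class
            return lower + (n / 2 - cum) / f * (upper - lower)
        cum += f


age_classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
age_freqs = [15, 32, 54, 30, 19]

median = grouped_median(age_classes, age_freqs)
print("Median:", round(median, 2))
```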
Advantages of Median:

 Median can be calculated in all distributions.


 Median can be understood even by common people.
 Median can be ascertained even with the extreme items.
 It can be located graphically
 It is most useful dealing with qualitative data
Disadvantages of Median:
• It is not based on all the values.
• It is not capable of further mathematical treatment.
• It is affected by fluctuations of sampling.
• In the case of an even number of values, it may not be a value from the data.
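Mode
• Mode is the most frequent value or score in the distribution.
• It is defined as that value of the item in a series which occurs with the greatest frequency.
• It is denoted by the capital letter Z.
• It corresponds to the highest point of the frequency distribution curve.
Croxton and Cowden defined it as “the mode of a distribution is the value at the point around which the items tend to be most heavily concentrated. It may be regarded as the most typical of a series of values”.

The exact value of the mode can be obtained by the following formula:

$$Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$$

where $L_1$ is the lower limit of the modal class, $f_1$ is the frequency of the modal class, $f_0$ and $f_2$ are the frequencies of the classes preceding and following it, and i is the class width.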
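Example: Calculate the mode for the distribution of monthly rent paid by libraries in Karnataka

Monthly rent (Rs)   Number of libraries (f)
500-1000            5
1000-1500           10
1500-2000           8
2000-2500           16
2500-3000           14
3000 & above        12
Total               65

The modal class is 2000-2500, so

$$Z = 2000 + \frac{16 - 8}{2 \times 16 - 8 - 14} \times 500 = 2000 + \frac{8}{10} \times 500 = 2000 + 0.8 \times 500 = 2000 + 400 = 2400$$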
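The monthly rent example can be verified with a short sketch. The function name `grouped_mode` is hypothetical; it assumes equal class widths, takes the class with the highest frequency as the modal class, and treats a missing neighbouring class as having frequency 0.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode of a grouped distribution with equal class widths."""
    m = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    f1 = freqs[m]
    f0 = freqs[m - 1] if m > 0 else 0               # class before the modal class
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0  # class after the modal class
    return lower_limits[m] + (f1 - f0) / (2 * f1 - f0 - f2) * width


rent_lower_limits = [500, 1000, 1500, 2000, 2500, 3000]
rent_freqs = [5, 10, 8, 16, 14, 12]

mode = grouped_mode(rent_lower_limits, 500, rent_freqs)
print("Mode (Rs):", round(mode, 2))
```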
Advantages of Mode :
• Mode is readily comprehensible and easily
calculated
• It is the best representative of data
• It is not at all affected by extreme value.
• The value of mode can also be determined
graphically.
• It is usually an actual value of an important
part of the series.
Disadvantages of Mode :
 It is not based on all observations.
 It is not capable of further mathematical
manipulation.
 Mode is affected to a great extent by
sampling fluctuations.
 Choice of grouping has great influence
on the value of mode.
Conclusion
• A measure of central tendency is a measure that tells us where the middle of a bunch of data lies.
• Mean is the most common measure of central tendency. It is simply the sum of the numbers divided by the number of numbers in a set of data. This is also known as the average.
Conclusion
• Median is the number present in the middle when the
numbers in a set of data are arranged in ascending or
descending order. If the number of numbers in a data
set is even, then the median is the mean of the two
middle numbers.
• Mode is the value that occurs most frequently in a set
of data.
Measures of Dispersion
In This Presentation
Measures of dispersion.
You will learn
Basic Concepts
How to compute and interpret the Range (R) and the
standard deviation (s)
The Concept of Dispersion
Dispersion = variety, diversity, amount of variation
between scores.
The greater the dispersion of a variable, the greater the
range of scores and the greater the differences between
scores.
The Concept of Dispersion: Examples
Typically, a large city will have more diversity than a
small town.
Some states (California, New York) are more racially
diverse than others (Maine, Iowa).
Some students are more consistent than others.
The Concept of Dispersion: Interval/ratio
variables
The taller curve has less dispersion.
The flatter curve has more dispersion.
The Range
Range (R) = High Score – Low Score
Quick and easy indication of variability.
Can be used with ordinal or interval-ratio variables.
Why can’t the range be used with variables measured
at the nominal level?
Standard Deviation
The most important and widely used measure of
dispersion.
Should be used with interval-ratio variables but is
often used with ordinal-level variables.
Standard Deviation
Formulas for variance and standard deviation:

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{N} \qquad s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$$
Standard Deviation
To solve:
Subtract mean from each score in a distribution of
scores
Square the deviations (this eliminates negative
numbers).
Sum the squared deviations.
Divide the sum of the squared deviations by N: this is
the Variance
Find the square root of the result.
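The steps above can be sketched as follows. The scores are hypothetical and the function name `variance_and_sd` is an assumption; the sketch divides by N, as in the steps listed, rather than by N − 1.

```python
from math import sqrt


def variance_and_sd(scores):
    """Follow the listed steps: deviations, squares, sum, divide by N, root."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]  # subtract the mean from each score
    squared = [d ** 2 for d in deviations]   # square the deviations
    variance = sum(squared) / n              # sum, then divide by N
    return variance, sqrt(variance)          # the square root gives s


# Hypothetical scores, used only for illustration.
scores = [10, 12, 14, 16, 18]

score_range = max(scores) - min(scores)  # R = high score - low score
variance, sd = variance_and_sd(scores)

print("Range:", score_range)
print("Variance:", variance)
print("Standard deviation:", round(sd, 2))
```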
Interpreting Dispersion
Low score=0, Mode=12, High score=20
Measures of dispersion: R=20–0=20, s=2.9

[Figure: frequency chart, Years of Education (Full Sample)]
Interpreting Dispersion
What would happen to the dispersion of this variable
if we focused only on people with college-educated
parents?
We would expect people with highly educated parents
to average more education and show less dispersion.
Interpreting Dispersion
Low score=10, Mode=16, High score=20
Measures of dispersion: R=20–10=10, s=2.2
[Figure: frequency chart, Years of Education (Both Parents w BA); horizontal axis: years of education, 0 to 20]
Interpreting Dispersion
Entire sample:
Mean = 13.3
Range = 20
s = 2.9
Respondents with college-educated parents:
Mean = 16.0
R = 10
s =2.2
Interpreting Dispersion
As expected, the smaller, more homogeneous and
privileged group:
Averaged more years of education (16.0 vs. 13.3)
And was less variable (s = 2.2 vs. 2.9; R = 10 vs. 20)
Measures of Dispersion
Higher for more diverse groups (e.g., large samples,
populations).
Decrease as diversity or variety decreases (are lower
for more homogeneous groups and smaller samples).
The lowest value possible for R and s is 0 (no
dispersion).
CORRELATION
Correlation
key concepts:
Types of correlation
Methods of studying correlation
a) Scatter diagram
b) Karl Pearson’s coefficient of correlation
c) Spearman’s Rank correlation coefficient
d) Method of least squares
Correlation
Correlation: The degree of relationship between the
variables under consideration is measured through
correlation analysis.
The measure of correlation is called the correlation
coefficient.
The degree of relationship is expressed by a coefficient
that ranges from -1 to +1 (-1 ≤ r ≤ +1).
The direction of change is indicated by its sign.
Correlation analysis enables us to form an idea about
the degree and direction of the relationship between the two
variables under study.
Correlation
Correlation is a statistical tool that
helps to measure and analyze the
degree of relationship between two
variables.
Correlation analysis deals with the
association between two or more
variables.
Correlation & Causation
Causation means a cause-and-effect relation.
Correlation denotes interdependency among variables. For a
correlation between two phenomena to be meaningful, there
should be a cause-and-effect relationship between them; if no
such relationship exists, an observed correlation may be spurious.
If two variables vary in such a way that movements in
one are accompanied by movements in the other, the
variables are said to be correlated.
Causation implies correlation, but correlation
does not necessarily imply causation.
Types of Correlation
Type I: Positive Correlation and Negative Correlation
Types of Correlation Type I
Positive Correlation: The correlation is said to be
positive if the values of the two variables
change in the same direction.
Ex. Pub. Exp. & sales, Height & weight.
Negative Correlation: The correlation is said to be
negative when the values of the variables change
in opposite directions.
Ex. Price & qty. demanded.
Direction of the Correlation
The direction is indicated by the sign, (+) or (-).
Positive relationship – Variables change in the same
direction.
• As X is increasing, Y is increasing
• As X is decreasing, Y is decreasing
E.g., As height increases, so does weight.
Negative relationship – Variables change in opposite
directions.
• As X is increasing, Y is decreasing
• As X is decreasing, Y is increasing
E.g., As TV time increases, grades decrease.
More examples
Positive relationships:
• water consumption and temperature
• study time and grades
Negative relationships:
• alcohol consumption and driving ability
• price and quantity demanded
Types of Correlation
Type II: Simple, Multiple, Partial, and Total Correlation
Types of Correlation Type II
Simple correlation: Only two variables are studied.
Multiple correlation: Three or more variables are
studied. Ex. Qd = f (P, PC, PS, t, y)
Partial correlation: The analysis recognizes more than two
variables but considers only two of them, keeping the
others constant.
Total correlation: Based on all the relevant variables,
which is normally not feasible.
Types of Correlation
Type III: Linear and Non-Linear Correlation
Types of Correlation Type III
Linear correlation: Correlation is said to be linear
when the amount of change in one variable tends to
bear a constant ratio to the amount of change in the
other. The graph of the variables having a linear
relationship will form a straight line.
Ex. X = 1, 2, 3, 4, 5, 6, 7, 8
Y = 5, 7, 9, 11, 13, 15, 17, 19
Y = 3 + 2X
Non Linear correlation: The correlation would be
non linear if the amount of change in one variable
does not bear a constant ratio to the amount of
change in the other variable.
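The "constant ratio" idea can be checked directly. The sketch below is illustrative only: `is_linear` is a hypothetical helper, it tests for an exactly constant ratio (real data would only tend toward one), and it assumes consecutive X values are distinct. The first series is the Y = 3 + 2X example from the slide; the squared series is made up to show a non-linear case.

```python
def is_linear(xs, ys, tol=1e-9):
    """True when each change in y bears the same ratio to the change in x."""
    ratios = [
        (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])  # change in y per unit change in x
        for i in range(len(xs) - 1)
    ]
    return all(abs(r - ratios[0]) <= tol for r in ratios)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 7, 9, 11, 13, 15, 17, 19]   # Y = 3 + 2X, from the example
squares = [x ** 2 for x in xs]       # hypothetical non-linear series

print(is_linear(xs, ys))       # True
print(is_linear(xs, squares))  # False
```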
Methods of Studying Correlation
Scatter Diagram Method
Graphic Method
Karl Pearson’s Coefficient of
Correlation
Method of Least Squares
Scatter Diagram Method

A scatter diagram is a graph of observed plotted points,
where each point represents the values of X and Y as a
coordinate pair. It portrays the relationship
between these two variables graphically.
A perfect positive correlation

[Figure: scatter plot of Weight against Height for A and B; the points fall on a straight line (a linear relationship)]
High Degree of positive correlation
[Figure: scatter plot of Weight against Height; positive relationship, r = +.80]
Degree of correlation
Moderate Positive Correlation

[Figure: scatter plot of Shoe Size against Weight; r = +0.4]
Degree of correlation
Perfect Negative Correlation

[Figure: scatter plot of TV watching per week against Exam score; r = -1.0]
Degree of correlation
Moderate Negative Correlation

[Figure: scatter plot of TV watching per week against Exam score; r = -.80]
Degree of correlation
Weak negative Correlation

[Figure: scatter plot of Shoe Size against Weight; r = -0.2]
Degree of correlation
No Correlation (horizontal line)

[Figure: scatter plot of IQ against Height; r = 0.0]
Degree of correlation (r)
[Figure: four scatter plots with r = +.80, r = +.60, r = +.40, and r = +.20]
Advantages of Scatter Diagram
Simple and non-mathematical method
Not influenced by the size of extreme items
First step in investigating the
relationship between two variables
Disadvantage of scatter diagram

Cannot establish the exact degree of correlation
Karl Pearson's
Coefficient of Correlation
Pearson’s ‘r’ is the most common
correlation coefficient.
Karl Pearson’s Coefficient of
Correlation is denoted by ‘r’. The
coefficient of correlation ‘r’ measures the
degree of linear relationship between
two variables, say x and y.
Karl Pearson's
Coefficient of Correlation
Karl Pearson’s Coefficient of Correlation is denoted by r
-1 ≤ r ≤ +1
Degree of correlation is expressed by the value of the coefficient
Direction of change is indicated by the sign
Karl Pearson's
Coefficient of Correlation
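When deviations are taken from the actual mean:
$r(x, y) = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$
When deviations are taken from an assumed mean:
$r = \frac{N \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{N \sum d_x^2 - (\sum d_x)^2} \, \sqrt{N \sum d_y^2 - (\sum d_y)^2}}$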
Procedure for computing the
correlation coefficient
Calculate the means of the two series ‘x’ and ‘y’.
Calculate the deviations ‘x’ and ‘y’ of the two series from their
respective means.
Square each deviation of ‘x’ and ‘y’, then obtain the sums of
the squared deviations, i.e. ∑x² and ∑y².
Multiply each deviation under x by the corresponding deviation
under y to obtain the products ‘xy’. Then obtain the
sum of these products, i.e. ∑xy.
Substitute the values in the formula.
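The procedure above can be sketched as follows, using deviations from the actual means. The function name `pearson_r` and the data series are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n                         # means of the two series
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]                # deviations from the means
    dy = [y - mean_y for y in ys]
    sum_x2 = sum(d * d for d in dx)              # sum of squared x deviations
    sum_y2 = sum(d * d for d in dy)              # sum of squared y deviations
    sum_xy = sum(a * b for a, b in zip(dx, dy))  # sum of paired products
    return sum_xy / math.sqrt(sum_x2 * sum_y2)   # substitute in the formula

X = [1, 2, 3, 4, 5]  # hypothetical data
Y = [2, 4, 5, 4, 5]  # hypothetical data
print(round(pearson_r(X, Y), 4))  # 0.7746
```

Here ∑xy = 6, ∑x² = 10 and ∑y² = 6, so r = 6 / √60 ≈ 0.7746.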
Interpretation of Correlation
Coefficient (r)
The value of correlation coefficient ‘r’ ranges from -1 to +1
If r = +1, then the correlation between the two variables is
said to be perfect and positive
If r = -1, then the correlation between the two variables is
said to be perfect and negative
If r = 0, then there exists no correlation between the
variables
Properties of Correlation coefficient
The correlation coefficient lies between -1 and +1;
symbolically, -1 ≤ r ≤ +1.
The correlation coefficient is independent of
change of origin and scale.
The coefficient of correlation is the geometric mean
of the two regression coefficients:
$r = \pm\sqrt{b_{xy} \cdot b_{yx}}$
If one regression coefficient is positive, the other is
also positive, and so is the correlation coefficient.
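Two of these properties can be checked numerically. The sketch below is illustrative: the helper names and data are hypothetical, the shift and scale constants are arbitrary, and the scale factors are assumed positive (a negative factor on one variable would flip the sign of r).

```python
import math

def deviation_sums(xs, ys):
    """Return (sum xy, sum x^2, sum y^2) for deviations about the actual means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy, sum(a * a for a in dx), sum(b * b for b in dy)

def pearson_r(xs, ys):
    sum_xy, sum_x2, sum_y2 = deviation_sums(xs, ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

def regression_coefficients(xs, ys):
    """Return (byx, bxy): slope of y on x and slope of x on y."""
    sum_xy, sum_x2, sum_y2 = deviation_sums(xs, ys)
    return sum_xy / sum_x2, sum_xy / sum_y2

X = [1, 2, 3, 4, 5]  # hypothetical data
Y = [2, 4, 5, 4, 5]  # hypothetical data
r = pearson_r(X, Y)

# Change of origin (subtract a constant) and scale (divide by a positive constant).
U = [(x - 3) / 2 for x in X]
V = [(y - 10) / 5 for y in Y]
r_shifted = pearson_r(U, V)

# Geometric mean of the regression coefficients, carrying their common sign.
byx, bxy = regression_coefficients(X, Y)
r_from_b = math.copysign(math.sqrt(byx * bxy), byx)

print(round(r, 4), round(r_shifted, 4), round(r_from_b, 4))  # 0.7746 0.7746 0.7746
```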
Assumptions of Pearson’s
Correlation Coefficient
There is linear relationship between
two variables, i.e. when the two
variables are plotted on a scatter
diagram a straight line will be formed by
the points.
Cause and effect relation exists
between different forces operating on
the item of the two variable series.
Advantages of Pearson’s Coefficient

It summarizes in one value both the degree
and the direction of correlation.
Limitations of Pearson’s Coefficient
Always assumes a linear relationship.
Interpreting the value of r is difficult.
The value of the correlation coefficient is
affected by extreme values.
The method is time-consuming.
Coefficient of Determination
A convenient way of interpreting the value of the
correlation coefficient is to use its square,
which is called the Coefficient of Determination.
The Coefficient of Determination = r².
Suppose r = 0.9; then r² = 0.81, which means that 81% of
the variation in the dependent variable is
explained by the independent variable.
Coefficient of Determination
The maximum value of r² is 1, because it is possible to
explain all of the variation in y but not possible to
explain more than all of it.
Coefficient of Determination =
Explained variation / Total variation
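The ratio of explained to total variation can be computed directly. This sketch assumes that "explained variation" means the variation of the values predicted by a least-squares line of y on x; that assumption, the function name `coefficient_of_determination`, and the data are not from the slides.

```python
def coefficient_of_determination(xs, ys):
    """Explained variation / total variation for a least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_x2                 # regression coefficient of y on x
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * x for x in xs]
    explained = sum((p - mean_y) ** 2 for p in predicted)  # explained variation
    total = sum((y - mean_y) ** 2 for y in ys)             # total variation
    return explained / total

X = [1, 2, 3, 4, 5]  # hypothetical data
Y = [2, 4, 5, 4, 5]  # hypothetical data
print(round(coefficient_of_determination(X, Y), 2))  # 0.6
```

For this data r ≈ 0.7746, and 0.7746² ≈ 0.60, so the ratio agrees with squaring the correlation coefficient.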
Coefficient of Determination: An example
Suppose r = 0.60 in one case and r = 0.30 in another. This does
not mean that the first correlation is twice as strong as the
second; the strength of ‘r’ can be understood by computing the
value of r².
When r = 0.60, r² = 0.36 -----(1)
When r = 0.30, r² = 0.09 -----(2)
This implies that in the first case 36% of the total variation
is explained, whereas in the second case 9% of the total
variation is explained.
Spearman’s Rank Coefficient of
Correlation
When the variables under study in a statistical series
are not capable of quantitative measurement but can be
arranged in serial order, Pearson’s correlation
coefficient cannot be used. In such cases
Spearman’s Rank correlation can be used.
$R = 1 - \frac{6 \sum D^2}{N (N^2 - 1)}$
R = Rank correlation coefficient
D = Difference of rank between paired items in the two series
N = Total number of observations
Interpretation of Rank
Correlation Coefficient (R)
The value of rank correlation coefficient, R ranges from
-1 to +1
If R = +1, then there is complete agreement in the order
of the ranks and the ranks are in the same direction
If R = -1, then there is complete disagreement in the order
of the ranks and the ranks are in opposite directions
If R = 0, then there is no correlation
Rank Correlation Coefficient (R)
a) Problems where actual rank are given.
1) Calculate the difference ‘D’ of the two ranks, i.e. (R1 –
R2).
2) Square the differences and calculate their sum,
i.e. ∑D².
3) Substitute the values obtained in the formula.
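A minimal sketch of these three steps, for the case where the ranks are already given and there are no ties. The function name `spearman_from_ranks` and the two judges' rankings are hypothetical.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's R = 1 - 6*sum(D^2) / (N*(N^2 - 1)) for given, untied ranks."""
    n = len(ranks_1)
    differences = [a - b for a, b in zip(ranks_1, ranks_2)]  # D = R1 - R2
    sum_d2 = sum(d ** 2 for d in differences)                # sum of D squared
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))             # substitute

judge_a = [1, 2, 3, 4, 5]  # hypothetical ranks from one judge
judge_b = [2, 1, 4, 3, 5]  # hypothetical ranks from another judge
print(spearman_from_ranks(judge_a, judge_b))          # 0.8
print(spearman_from_ranks(judge_a, [5, 4, 3, 2, 1]))  # -1.0
```

Identical rankings give R = +1 and exactly reversed rankings give R = -1.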
Rank Correlation Coefficient
b) Problems where ranks are not given: If the ranks
are not given, then we need to assign ranks to the data
series. The lowest value in the series can be assigned
rank 1 or the highest value in the series can be assigned
rank 1. We need to follow the same scheme of ranking
for the other series.
Then calculate the rank correlation coefficient in similar
way as we do when the ranks are given.
Rank Correlation Coefficient (R)
Equal ranks or ties in ranks: In such cases the average rank
should be assigned to each tied item.
$R = 1 - \frac{6 \left( \sum D^2 + AF \right)}{N (N^2 - 1)}$
$AF = \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \dots$
m = the number of times an item is repeated
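The tie handling can be sketched as below. This assumes the usual form of the correction, in which the adjustment factor AF is added to ∑D² before multiplying by 6, and it assigns rank 1 to the highest value. All function names and the data are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank values with 1 = highest; tied values share the average of their positions."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_adjustment(values):
    """Sum of (m^3 - m) / 12 over every value repeated m > 1 times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    af = tie_adjustment(xs) + tie_adjustment(ys)  # adjustment from both series
    return 1 - 6 * (sum_d2 + af) / (n * (n ** 2 - 1))

xs = [50, 60, 60, 70, 80]  # hypothetical scores, 60 repeated twice
ys = [30, 40, 50, 50, 60]  # hypothetical scores, 50 repeated twice
print(average_ranks(xs))           # [5.0, 3.5, 3.5, 2.0, 1.0]
print(spearman_with_ties(xs, ys))  # 0.875
```

Here ∑D² = 1.5 and AF = 0.5 + 0.5 = 1.0, so R = 1 - 6(2.5) / 120 = 0.875.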
Merits of Spearman’s Rank Correlation
This method is simpler to understand and easier to
apply than Karl Pearson’s correlation method.
This method is useful where we can give only the ranks and
not the actual data (qualitative terms).
This method is the one to use where the initial data are in
the form of ranks.
Limitations of Spearman’s Correlation
Cannot be used for finding correlation in a grouped
frequency distribution.
This method should not be applied where N exceeds 30.
Advantages of Correlation studies
Show the amount (strength) of relationship present
Can be used to make predictions about the variables
under study.
Can be used in many places, including natural settings,
libraries, etc.
Easier to collect correlational data
Assessment Pattern
THANK YOU