Chapter 2756


COMPUTER ORIENTED

STATISTICAL TECHNIQUES
(As per the New Syllabus of Mumbai University for B.Sc. (Information Technology),
Semester IV, 2017-18)

Dr. Dinesh Gabhane


Ph.D. (Mgmt.), M.Phil. (Commerce), MBA (Mktg. & HR), UGC-NET (Mgmt.), B.E. (Production)
Associate Professor, Rajeev Gandhi College of Management Studies,
Navi Mumbai.

Ms. Madhuri S. Bankar


M.Sc. (C/S), MCA, PGDCS&A
Head, Department of Information Technology,
K.B.P College,
Vashi, Navi Mumbai.

ISO 9001:2008 CERTIFIED


© Authors
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any
means, electronic, mechanical, photocopying, recording and/or otherwise without the prior written permission of the
authors and the publisher.

First Edition : 2018

Published by : Mrs. Meena Pandey for Himalaya Publishing House Pvt. Ltd.,
“Ramdoot”, Dr. Bhalerao Marg, Girgaon, Mumbai - 400 004.
Phone: 022-23860170, 23863863; Fax: 022-23877178
E-mail: [email protected]; Website: www.himpub.com

Branch Offices :

New Delhi : “Pooja Apartments”, 4-B, Murari Lal Street, Ansari Road, Darya Ganj, New Delhi - 110 002.
Phone: 011-23270392, 23278631; Fax: 011-23256286

Nagpur : Kundanlal Chandak Industrial Estate, Ghat Road, Nagpur - 440 018.
Phone: 0712-2738731, 3296733; Telefax: 0712-2721216

Bengaluru : Plot No. 91-33, 2nd Main Road, Seshadripuram, Behind Nataraja Theatre,
Bengaluru - 560 020. Phone: 080-41138821; Mobile: 09379847017, 09379847005

Hyderabad : No. 3-4-184, Lingampally, Besides Raghavendra Swamy Matham, Kachiguda,
Hyderabad - 500 027. Phone: 040-27560041, 27550139

Chennai : New No. 48/2, Old No. 28/2, Ground Floor, Sarangapani Street, T. Nagar,
Chennai - 600 012. Mobile: 09380460419

Pune : “Laksha” Apartment, First Floor, No. 527, Mehunpura, Shaniwarpeth (Near Prabhat Theatre),
Pune - 411 030. Phone: 020-24496323, 24496333; Mobile: 09370579333

Lucknow : House No. 731, Shekhupura Colony, Near B.D. Convent School, Aliganj,
Lucknow - 226 022. Phone: 0522-4012353; Mobile: 09307501549

Ahmedabad : 114, “SHAIL”, 1st Floor, Opp. Madhu Sudan House, C.G. Road, Navrang Pura,
Ahmedabad - 380 009. Phone: 079-26560126; Mobile: 09377088847

Ernakulam : 39/176 (New No. 60/251), 1st Floor, Karikkamuri Road, Ernakulam, Kochi - 682 011.
Phone: 0484-2378012, 2378016; Mobile: 09387122121

Bhubaneswar : Plot No. 214/1342, Budheswari Colony, Behind Durga Mandap, Bhubaneswar - 751 006.
Phone: 0674-2575129; Mobile: 09338746007

Kolkata : 108/4, Beliaghata Main Road, Near ID Hospital, Opp. SBI Bank, Kolkata - 700 010.
Phone: 033-32449649; Mobile: 07439040301

DTP by : Bhakti S. Gaonkar

Printed at : M/s. Aditya Offset Process (I) Pvt. Ltd., Hyderabad. On behalf of HPH.
Dedication

I would like to dedicate this book to my mother and father.


I would like to thank my wife and son for continuous support.
I would like to extend my gratitude to all my friends and colleagues for
encouraging me in writing this book.
Dr. Dinesh Gabhane

I would like to dedicate this book to my family for allowing me the time to
write it.
I would like to thank my husband Dr. Dinesh Gabhane for standing beside
me throughout my work. He is a source of inspiration and motivation for
continuing to improve my knowledge and move my career forward.
Special thanks go to my dear son Vedant who gave me energy.
Last but not least my college principal Dr. V.S. Shivankar, all my friends and
colleagues who encouraged me all time.
Ms. Madhuri S. Bankar
Preface

It gives us immense pleasure to present the first edition of the book “Computer Oriented
Statistical Techniques” to the teachers and students of Semester IV of S.Y.B.Sc. (Information
Technology). This book has been written as per the syllabus prescribed by the University of
Mumbai with effect from the academic year 2017-18.
The whole syllabus is divided into five units and thirteen chapters. In each chapter, the concepts
and theory are followed by a sufficient number of solved examples. We have tried our best
to present the subject matter in simple language for better understanding by the readers. We
hope that this edition will meet the requirements of the students of S.Y.B.Sc. (IT) in their
examination preparation.
Constructive suggestions and comments from the readers will be sincerely appreciated.
We would be glad to hear from you, if you would like to suggest improvements or to
contribute in any way. Kindly send your correspondence to [email protected] or
[email protected].
Finally, we would like to acknowledge our sincere respect and gratitude to Kiran Gurbani
(Head of Computer Science and IT Department, R.K. Talreja College, Ulhasnagar) for reviewing
this book thoroughly and providing an environment which stimulates new thinking and
innovations and her support which helped us for bringing out this book in time.
We are thankful to Mr. S.K. Srivastava for giving us an opportunity and encouragement to
write this book. We also extend our thanks to the staff of Himalaya Publishing House Pvt. Ltd.
for assisting us in proof reading and compilation of this book.

Dr. Dinesh Gabhane

Ms. Madhuri Bankar


Syllabus

Computer Oriented Statistical Techniques

Sr. No. Modules/Units Lectures

Unit I (12 lectures)
The Mean, Median, Mode and Other Measures of Central Tendency:
Index, or Subscript, Notation, Summation Notation, Averages, or Measures
of Central Tendency, The Arithmetic Mean, The Weighted Arithmetic
Mean, Properties of the Arithmetic Mean, The Arithmetic Mean Computed
from Grouped Data, The Median, The Mode, The Empirical Relation
between the Mean, Median, and Mode, The Geometric Mean G, The
Harmonic Mean H, The Relation between the Arithmetic, Geometric and
Harmonic Means, The Root Mean Square, Quartiles, Deciles, and
Percentiles, Software and Measures of Central Tendency.
The Standard Deviation and Other Measures of Dispersion: Dispersion
or Variation, The Range, The Mean Deviation, The Semi-Interquartile
Range, The 10-90 Percentile Range, The Standard Deviation, The
Variance, Short Methods for Computing the Standard Deviation, Properties
of the Standard Deviation, Charlier’s Check, Sheppard’s Correction for
Variance, Empirical Relations between Measures of Dispersion, Absolute
and Relative Dispersion; Coefficient of Variation, Standardized Variable;
Standard Scores, Software and Measures of Dispersion.
Introduction to R: Basic Syntax, Data Types, Variables, Operators,
Control Statements, R-functions, R-vectors, R-lists, R-arrays.
Unit II (12 lectures)
Moments, Skewness and Kurtosis: Moments, Moments for Grouped
Data, Relations Between Moments, Computation of Moments for Grouped
Data, Charlier’s Check and Sheppard’s Corrections, Moments in
Dimensionless Form, Skewness, Kurtosis, Population Moments, Skewness,
and Kurtosis, Software Computation of Skewness and Kurtosis.
Elementary Probability Theory: Definitions of Probability, Conditional
Probability; Independent and Dependent Events, Mutually Exclusive
Events, Probability Distributions, Mathematical Expectation, Relation
between Population, Sample Mean, and Variance, Combinatorial Analysis,
Combinations, Stirling’s Approximation to n!, Relation of Probability to
Point Set Theory, Euler or Venn Diagrams and Probability.
Elementary Sampling Theory: Sampling Theory, Random Samples and
Random Numbers, Sampling With and Without Replacement, Sampling
Distributions, Sampling Distribution of Means, Sampling Distribution of
Proportions, Sampling Distributions of Differences and Sums, Standard
Errors, Software Demonstration of Elementary Sampling Theory.
Unit III (12 lectures)
Statistical Estimation Theory: Estimation of Parameters, Unbiased
Estimates, Efficient Estimates, Point Estimates and Interval Estimates;
Their Reliability, Confidence-Interval Estimates of Population Parameters,
Probable Error.
Statistical Decision Theory: Statistical Decisions, Statistical Hypotheses,
Tests of Hypotheses and Significance, or Decision Rules, Type I and Type
II Errors, Level of Significance, Tests Involving Normal Distributions,
Two-tailed and One-tailed Tests, Special Tests, Operating-Characteristic
Curves; the Power of a Test, p-Values for Hypotheses Tests, Control
Charts, Tests Involving Sample Differences, Tests Involving Binomial
Distributions.
Statistics in R: Mean, Median, Mode, Normal Distribution, Binomial
Distribution, Frequency Distribution in R.
Unit IV (12 lectures)
Small Sampling Theory: Small Samples, Student’s t Distribution,
Confidence Intervals, Tests of Hypotheses and Significance, The Chi-Square
Distribution, Confidence Intervals for Sigma, Degrees of Freedom,
The F Distribution.
The Chi-Square Test: Observed and Theoretical Frequencies, Definition
of Chi-Square, Significance Tests, The Chi-Square Test for Goodness of
Fit, Contingency Tables, Yates’ Correction for Continuity, Simple
Formulas for Computing Chi-Square, Coefficient of Contingency,
Correlation of Attributes, Additive Property of Chi-Square.
Unit V (12 lectures)
Curve Fitting and the Method of Least Squares: Relationship between
Variables, Curve Fitting, Equations of Approximating Curves, Freehand
Method of Curve Fitting, The Straight Line, The Method of Least Squares,
The Least Squares Line, Non-linear Relationships, The Least Squares
Parabola, Regression, Applications to Time Series, Problems Involving
More than Two Variables.
Correlation Theory: Correlation and Regression, Linear Correlation,
Measures of Correlation, The Least Squares Regression Lines, Standard
Error of Estimate, Explained and Unexplained Variation, Coefficient of
Correlation, Remarks Concerning the Correlation Coefficient, Product
Moment Formula for the Linear Correlation Coefficient, Short
Computational Formulas, Regression Lines and the Linear Correlation
Coefficient, Correlation of Time Series, Correlation of Attributes,
Sampling Theory of Correlation, Sampling Theory of Regression.
List of Practicals

1. Using R, execute the basic commands, array, list and frames.
2. Create a matrix using R and perform the operations: addition, inverse, transpose and
multiplication.
3. Using R, execute the statistical functions: mean, median, mode, quartiles, range,
interquartile range and histogram.
4. Using R, import the data from Excel / .CSV file and perform the above functions.
5. Using R, import the data from Excel / .CSV file and calculate the standard deviation,
variance and co-variance.
6. Using R, import the data from Excel / .CSV file and draw the skewness.
7. Import the data from Excel / .CSV and perform hypothesis testing.
8. Import the data from Excel / .CSV and perform the Chi-Square Test.
9. Using R, perform the binomial and normal distribution on the data.
10. Perform the Linear Regression using R.
11. Compute the Least Squares means using R.
12. Compute the Linear Least Square Regression.
Paper Pattern

Evaluation Scheme
1. Internal Evaluation: 25 Marks
(i) Test: 1 Class test of 20 marks.
Attempt any four of the following: (20)
(a)
(b)
(c)
(d)
(e)
(f)
(ii) 5 marks: Active participation in the class, overall conduct, attendance.
2. External Examination: 75 Marks
All questions are compulsory
(i) (Based on Unit 1) Attempt any three of the following: (15)
(a)
(b)
(c)
(d)
(e)
(f)
(ii) (Based on Unit 2) Attempt any three of the following: (15)
(a)
(b)
(c)
(d)
(e)
(f)
(iii) (Based on Unit 3) Attempt any three of the following: (15)
(a)
(b)
(c)
(d)
(e)
(f)
(iv) (Based on Unit 4) Attempt any three of the following: (15)
(a)
(b)
(c)
(d)
(e)
(f)
(v) (Based on Unit 5) Attempt any three of the following: (15)
(a)
(b)
(c)
(d)
(e)
(f)
3. Practical Exam: 50 marks
Certified copy journal is essential to appear for the practical examination.
1. Practical Question 1 (20)
2. Practical Question 2 (20)
3. Journal (5)
4. Viva Voce (5)
Contents

UNIT I
1. The Mean, Median, Mode and Other Measures of Central Tendency
2. The Standard Deviation and Other Measures of Dispersion
3. Introduction to R
UNIT II
4. Moments, Skewness and Kurtosis
5. Elementary Probability Theory
6. Elementary Sampling Theory
UNIT III
7. Statistical Estimation Theory
8. Statistical Decision Theory
9. Statistics in R
UNIT IV
10. Small Sampling Theory
11. The Chi-Square Test
UNIT V
12. Curve Fitting and the Method of Least Squares
13. Correlation Theory
Unit I
CHAPTER 1 The Mean, Median, Mode and Other Measures of Central Tendency
Structure
1.1 Index or Subscript, Notation
1.2 Summation Notation
1.3 Averages or Measures of Central Tendency
1.4 Arithmetic Mean
1.5 The Weighted Arithmetic Mean
1.6 Properties of the Arithmetic Mean
1.7 The Arithmetic mean Computed from Grouped Data
1.8 The Median
1.9 The Mode
1.10 The Empirical Relation between the Mean, Median and Mode
1.11 The Geometric Mean (G.M.)
1.12 The Harmonic Mean (H.M.)
1.13 The Relation Between Arithmetic, Geometric and Harmonic Means
1.14 The Root mean Square
1.15 Quartiles, Deciles and Percentiles
1.16 Software and Measures of Central Tendency
Solved Examples
Practice Examples

1.1 INDEX OR SUBSCRIPT, NOTATION


Let the symbol Xj (read ‘‘X sub j’’) denote any of the N values X1, X2, X3, ... , XN assumed by a
variable X. The letter j in Xj, which can stand for any of the numbers 1, 2, 3, ... , N is called a subscript,
or index. Clearly any letter other than j, such as i, k, p, q, or s, could have been used as well.

1.2 SUMMATION NOTATION


The symbol $\sum_{j=1}^{N} X_j$ is used to denote the sum of all the $X_j$'s from j = 1 to j = N; by definition,
$$\sum_{j=1}^{N} X_j = X_1 + X_2 + X_3 + \dots + X_N$$
The symbol ∑ is the Greek capital letter sigma, denoting sum.

1.3 AVERAGES OR MEASURES OF CENTRAL TENDENCY


According to Simpson and Kafka, a measure of central tendency is a typical value around which other figures congregate or which divides their number in half. Thus an average can be used to describe or represent a whole series of figures involving magnitudes of the same variable. That average is an overall single value which represents the series.

[Figure: Classification of measures of central tendency. Mathematical averages: Arithmetic Mean (A.M.), Geometric Mean (G.M.), Harmonic Mean (H.M.). Positional averages: Median, Mode, and Quartiles, Deciles and Percentiles.]

Measures of central tendenncy permits uss to compare individual items in the grroup with it and
M a also
permitts us to compare different series of figuures with regaard to their ceentral tendenccies.
A
Averages are derived
d figurees and not thee original datta.

Mean (Average)
• According to Clark, “An average is a figure that represents the whole group.”
• A. E. Waugh defines, “An average is a single value selected from a group of values to represent them in some way.”
• According to Croxton and Cowden, “An average is a single value within the range of the data that is used to represent all the values in the series. Since an average is somewhere within the range of the data it is sometimes called a measure of central value.”
• Crum and Smith say, “An average is sometimes called a ‘measure of central tendency’ because individual values of the variable usually cluster around it.”

Characteristics of a Good Average


An average should be:
1. Rigorously defined,
2. Easy to compute,
3. Capable of simple interpretation,
4. Dependent on all the observed values,
5. Not unduly influenced by one or two extremely large or small values,
6. Subject to relatively little fluctuation from one random sample to another,
7. Capable of mathematical manipulation.

1.4 THE ARITHMETIC MEAN


The arithmetic mean is a measure of central tendency and is popularly known as the mean. It is obtained by dividing the sum of the values of all items of a series by the number of items of that series. Normally, the arithmetic mean is denoted by $\bar{X}$, which is read as ‘X bar’. It can be computed for unclassified or ungrouped data (individual series) as well as for classified or grouped data (discrete or continuous series).

Practical Steps Involved in the Computation of Arithmetic Mean for Unclassified Data
Step 1: Treat the given values of the variable as X.
Step 2: Enter the given values in a column headed as X.
Step 3: Add together all the values of variable X and obtain the total, i.e. ∑X.
Step 4: Apply the following formula:
$$\bar{X} = \frac{\sum X}{N}$$
where, $\bar{X}$ = Arithmetic Mean
∑X = Sum of all values of variable X
N = Number of individual observations
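
The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `arithmetic_mean` and the sample marks are assumptions made for the example and do not come from the text.

```python
def arithmetic_mean(values):
    """Return sum(X) / N for an ungrouped (individual) series."""
    values = list(values)
    if not values:
        raise ValueError("at least one observation is required")
    # Step 3: total of all values; Step 4: divide by the number of observations.
    return sum(values) / len(values)


# Hypothetical data: marks of five students.
marks = [10, 20, 30, 40, 50]
print(arithmetic_mean(marks))  # 30.0
```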

1.5 THE WEIGHTED ARITHMETIC MEAN


While calculating the arithmetic mean, as discussed earlier, equal importance (or weight) is given to each observation in the data set. However, there are situations in which the values of individual observations in the data set are not of equal importance. If such values occur with different frequencies, then computing the A.M. of values (as opposed to the A.M. of observations) may not be truly representative of the data set and thus may be misleading. Under these circumstances, we may attach to each observation value a ‘weight’ $w_1, w_2, \dots, w_n$ as an indicator of its importance within the data set and compute a weighted mean or average, denoted by $\bar{X}_w$, as follows:

$$\bar{X}_w = \frac{\sum w_i X_i}{\sum w_i}$$


Note: The weighted arithmetic mean should be used
1. when the importance of all the numerical values in the given data set is not equal;
2. when the frequencies of various classes are widely varying;
3. where there is a change either in the proportion of numerical values or in the proportion of their frequencies;
4. when ratios, percentages or rates are being averaged.
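
A minimal sketch of the weighted mean, dividing the sum of weight times value by the sum of the weights. The function name `weighted_mean` and the scores and weights are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def weighted_mean(values, weights):
    """Return sum(w * X) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical data: three test scores with weights 1, 2 and 3.
scores = [70, 80, 90]
weights = [1, 2, 3]
# (70*1 + 80*2 + 90*3) / (1 + 2 + 3) = 500 / 6
print(round(weighted_mean(scores, weights), 2))  # 83.33
```

With equal weights the result reduces to the simple arithmetic mean, which is a quick way to check the function.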

1.6 PROPERTIES OF THE ARITHMETIC MEAN


1. The algebraic sum of the deviations of a set of numbers from their arithmetic mean is zero.
2. The sum of squares of deviations of observations is minimum when taken from their arithmetic mean.
3. The arithmetic mean is capable of being treated algebraically.
4. If $\bar{X}_1$ and $N_1$ are the mean and number of observations of a series and $\bar{X}_2$ and $N_2$ are the corresponding magnitudes of another series, then the mean $\bar{X}_{12}$ of the combined series of $N_1 + N_2$ observations is given by
$$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$$
5. If a constant B is added to (subtracted from) every observation, the mean of these observations also increases (decreases) by B.
6. If every observation is multiplied (divided) by a constant b, the mean of these observations also gets multiplied (divided) by b.
7. If some observations of a series are replaced by some other observations, then the mean of the original observations changes by the total change in the replaced observations divided by the number of observations in the series.
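
The properties above are easy to check numerically. The sketch below verifies properties 1, 4, 5 and 6 on two small made-up series; the data and variable names are assumptions for illustration, and `statistics.fmean` is the standard-library floating-point mean.

```python
from statistics import fmean

# Two hypothetical series.
a = [2, 4, 6, 8]   # mean 5
b = [10, 20]       # mean 15
mean_a, mean_b = fmean(a), fmean(b)

# Property 1: deviations from the mean sum to zero.
sum_of_deviations = sum(x - mean_a for x in a)

# Property 4: combined mean from the two means and the two counts.
combined = (len(a) * mean_a + len(b) * mean_b) / (len(a) + len(b))
direct = fmean(a + b)  # mean of the pooled observations

# Properties 5 and 6: adding or multiplying by a constant.
shifted = fmean([x + 3 for x in a])  # mean increases by 3
scaled = fmean([x * 2 for x in a])   # mean is multiplied by 2

print(sum_of_deviations, combined, direct, shifted, scaled)
```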

1.7 THE ARITHMETIC MEAN COMPUTED FROM GROUPED DATA


Practical Steps Involved in the Computation of Arithmetic Mean for Discrete Series
Step 1: Treat the given values of the variable as X and the frequencies as f.
Step 2: Enter the given values of variable X in a column headed as X.
Step 3: Enter the given frequencies f in a column headed as f and obtain the sum of these frequencies, i.e. N or ∑f.
Step 4: Multiply the value of the variable in each row by the respective frequency, denote these products by fX and enter them in a column headed as fX.
Step 5: Obtain the sum of these products, i.e. ∑fX.
Step 6: Apply the following formula:
$$\bar{X} = \frac{\sum fX}{N}$$
where, $\bar{X}$ = Arithmetic Mean
∑fX = Sum of products of frequency and value of variable X
N = ∑f = Sum of frequencies
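
As a sketch of the discrete-series procedure, the function below builds the fX products and divides their total by the total frequency. The name `mean_discrete` and the small frequency table are assumed for illustration only.

```python
def mean_discrete(values, freqs):
    """Return sum(f * X) / sum(f) for a discrete frequency series."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    n = sum(freqs)                                     # Step 3: N = sum of f
    if n == 0:
        raise ValueError("total frequency must be positive")
    sum_fx = sum(f * x for x, f in zip(values, freqs)) # Steps 4-5: sum of fX
    return sum_fx / n                                  # Step 6


# Hypothetical table: X = 1, 2, 3, 4 with frequencies 2, 3, 4, 1.
# sum(fX) = 2 + 6 + 12 + 4 = 24 and N = 10.
print(mean_discrete([1, 2, 3, 4], [2, 3, 4, 1]))  # 2.4
```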

Practical Steps Involved in the Computation of Arithmetic Mean for Continuous Series
Step 1: Enter the class intervals in the first column.
Step 2: Calculate the mid-point of each class, denote these mid-points as m and enter them in a column headed as m.
Note: Mid-point (m) = $\frac{\text{Lower limit} + \text{Upper limit}}{2}$
Step 3: Enter the given frequencies f in a column headed as f and obtain the sum of these frequencies, i.e. N or ∑f.
Step 4: Multiply the mid-point of each row by the respective frequency, denote these products by fm and enter them in a column headed as fm.
Step 5: Obtain the sum of these products, i.e. ∑fm.
Step 6: Apply the following formula:
$$\bar{X} = \frac{\sum fm}{N}$$
where, $\bar{X}$ = Arithmetic Mean
∑fm = Sum of products of mid-points and frequencies
N = ∑f = Sum of frequencies
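
For a continuous series the only extra step is replacing each class by its mid-point. The sketch below assumes classes are given as (lower limit, upper limit) pairs; the function name `mean_continuous` and the class table are hypothetical.

```python
def mean_continuous(classes, freqs):
    """Return sum(f * m) / sum(f), where m is the mid-point of each class."""
    if len(classes) != len(freqs):
        raise ValueError("classes and freqs must have the same length")
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    # Step 2: mid-point = (lower limit + upper limit) / 2
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    # Steps 4-6: sum of f*m divided by N
    return sum(f * m for m, f in zip(midpoints, freqs)) / n


# Hypothetical classes 0-10, 10-20, 20-30 with frequencies 2, 5, 3.
# Mid-points are 5, 15, 25, so sum(fm) = 10 + 75 + 75 = 160 and N = 10.
print(mean_continuous([(0, 10), (10, 20), (20, 30)], [2, 5, 3]))  # 16.0
```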

1.8 THE MEDIAN


The median is the central value of the variable that divides the series into two equal parts in such a way that half of the items lie above this value and the remaining half lie below it. The median is called a positional average because it is based on the position of a given observation in a series arranged in ascending or descending order, and the position of the median is such that an equal number of items lie on either side of it. The median is usually denoted by ‘Med’ or ‘Md’. It can be computed for both ungrouped data (individual series) and grouped data (discrete or continuous series).

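Computation of Median for Individual Series

Step 1: Arrange the observations in ascending or descending order of size.
Step 2: Ascertain the $\left(\frac{N+1}{2}\right)$th observation, where N is the number of observations.
Step 3: Calculate the median as follows:
(a) In case the $\left(\frac{N+1}{2}\right)$th observation works out to be a whole number,
Median = size or value of the $\left(\frac{N+1}{2}\right)$th observation in the data array
(b) In case the $\left(\frac{N+1}{2}\right)$th observation works out to be a fraction,
Median = size or value of the full item + 50% of the difference between the size of the immediately next item and the size of the full item, where the full item is the observation at the whole-number part of the position.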
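
The procedure for an individual series can be sketched as follows, assuming the position in question is the (N + 1)/2th observation of the sorted data. The function name `median_individual` and the sample values are assumptions made for the example.

```python
def median_individual(values):
    """Median of an ungrouped series using the (N + 1) / 2 position rule."""
    data = sorted(values)              # Step 1: arrange in ascending order
    n = len(data)
    if n == 0:
        raise ValueError("at least one observation is required")
    position = (n + 1) / 2             # Step 2: 1-based position
    whole = int(position)
    if position == whole:              # Step 3(a): whole-number position
        return data[whole - 1]
    # Step 3(b): full item plus 50% of the gap to the next item
    full_item, next_item = data[whole - 1], data[whole]
    return full_item + 0.5 * (next_item - full_item)


print(median_individual([7, 3, 9, 5, 1]))  # 5   (3rd of 1, 3, 5, 7, 9)
print(median_individual([4, 8, 6, 2]))     # 5.0 (midway between 4 and 6)
```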

Computation of Median for Discrete Series

Step 1: Arrange the observations in ascending or descending order of size.
Step 2: Calculate the Cumulative Frequencies (c.f.).
Step 3: Ascertain the $\left(\frac{N+1}{2}\right)$th observation, where N = ∑f.
Step 4: Ascertain the cumulative frequency which includes the $\left(\frac{N+1}{2}\right)$th observation.
Step 5: Calculate the median as follows:
Median = size or value of the observation corresponding to the cumulative frequency which includes the $\left(\frac{N+1}{2}\right)$th observation
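
A short sketch of the discrete-series procedure: accumulate the frequencies and return the first value whose cumulative frequency reaches the required position, taken here as (N + 1)/2 with N = ∑f. The name `median_discrete` and the table are hypothetical.

```python
def median_discrete(values, freqs):
    """Median of a discrete frequency series via cumulative frequencies."""
    pairs = sorted(zip(values, freqs))  # Step 1: order by size of X
    n = sum(f for _, f in pairs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    position = (n + 1) / 2              # Step 3
    cumulative = 0
    for x, f in pairs:
        cumulative += f                 # Step 2: running c.f.
        if cumulative >= position:      # Step 4: c.f. includes the position
            return x                    # Step 5
    raise AssertionError("unreachable when n > 0")


# Hypothetical table: N = 19, position = 10; c.f. = 2, 7, 15, 19.
print(median_discrete([10, 20, 30, 40], [2, 5, 8, 4]))  # 30
```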

Computation of Median for Grouped Data or Continuous Series

Step 1: Calculate the Cumulative Frequencies (c.f.).
Step 2: Ascertain the $\frac{N}{2}$th observation.
Step 3: Ascertain the cumulative frequency which includes the $\frac{N}{2}$th observation, the corresponding class frequency (f) and lower limit (L) of that class, the interval (i) between the upper and lower limits of the class, and the cumulative frequency of the preceding class (c.f.).
Step 4: Calculate the median as follows:
$$\text{Median} = L + \frac{\frac{N}{2} - \text{c.f.}}{f} \times i$$
Where, L = Lower limit of the median class
c.f. = Cumulative frequency of the preceding class
f = Frequency of the median class
i = Interval between the upper and lower limits of the median class
Note: To find the median value by interpolation, it is assumed that the numerical values of observations are evenly spaced over the entire class interval.
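
The interpolation can be sketched as below, assuming the median class is the first class whose cumulative frequency reaches N/2 and that classes are supplied as (lower limit, upper limit) pairs in ascending order. The function name `median_grouped` and the class table are illustrative assumptions.

```python
def median_grouped(classes, freqs):
    """Median = L + ((N/2 - c.f.) / f) * i for a continuous series."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    target = n / 2
    cf_before = 0  # cumulative frequency of the preceding classes
    for (lower, upper), f in zip(classes, freqs):
        if cf_before + f >= target:
            width = upper - lower  # class interval i
            return lower + (target - cf_before) / f * width
        cf_before += f
    raise AssertionError("unreachable when n > 0")


# Hypothetical table: N = 30, N/2 = 15; c.f. = 5, 13, 25, 30.
# Median class is 20-30 with c.f. = 13 before it and f = 12.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 8, 12, 5]
print(round(median_grouped(classes, freqs), 2))  # 21.67
```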

Merits of Median
1. The median is useful in case of frequency distribution with open-end classes.
2. The median is recommended if distribution has unequal classes.
3. Extreme values do not affect the median as strongly as they affect the mean.
4. It is the most appropriate average in dealing with qualitative data.
5. The value of the median can be determined graphically, whereas the value of the mean cannot be determined graphically.
6. It is easy to calculate and understand.

Demerits of Median
1. For calculating the median it is necessary to arrange the data, whereas other averages do not need arrangement.
2. Since it is a positional average, its value is not determined by all the observations in the series.
3. The median is not capable of further algebraic treatment.
4. The sampling stability of the median is less than that of the mean.

1.9 THE MODE


The mode is often said to be that value in a series which occurs most frequently or which has the greatest frequency. But this is not exactly true for every frequency distribution. Rather, it is that value around which the observations tend to concentrate most heavily. It is also called the most typical or fashionable value of a distribution because it is the value which has the greatest frequency density in its immediate neighbourhood. It is usually denoted by Mo. It may be noted that a distribution may have one mode or two modes or several modes.
1. Unimodal: A distribution is said to be unimodal if it has only one mode.
2. Bimodal: A distribution is said to be bimodal if it has two modes.
3. Multimodal: A distribution is said to be multimodal if it has more than two modes.

Computation of Mode for Individual Series
Step 1: Count the number of times the various values of the series repeat themselves.
Step 2: Ascertain the value occurring the maximum number of times.
Step 3: Mode = Value occurring the maximum number of times.

Computation of Mode for Discrete Series
Step 1: Ascertain the maximum frequency.
Step 2: Ascertain the value of the observation corresponding to the maximum frequency.
Step 3: Mode = Value of the observation corresponding to the maximum frequency.
Note: In case of a discrete series (i.e. where values of observations along with frequencies are given), the mode can be determined just by the inspection method.
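
Both counting procedures are short enough to sketch directly. The functions below return a list because a series may have more than one mode; the names `modes_individual` and `modes_discrete` and the data are assumptions for illustration.

```python
from collections import Counter


def modes_individual(values):
    """Return every value that occurs the maximum number of times."""
    counts = Counter(values)     # Step 1: count repetitions
    top = max(counts.values())   # Step 2: maximum count
    return sorted(x for x, c in counts.items() if c == top)


def modes_discrete(values, freqs):
    """Return the value(s) whose given frequency is the maximum."""
    top = max(freqs)
    return [x for x, f in zip(values, freqs) if f == top]


print(modes_individual([2, 3, 3, 5, 7, 3, 5]))  # [3]
print(modes_individual([1, 1, 2, 2, 3]))        # [1, 2] (bimodal)
print(modes_discrete([10, 20, 30], [4, 9, 6]))  # [20]
```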

Computation of Mode for Grouped Data or Continuous Series

Step 1: Ensure that the given series is a continuous exclusive series having equal class intervals. If the given series is not a continuous exclusive series, follow the procedure suggested below:
Given Series                      Procedure to be followed
Less than series                  Convert into continuous exclusive series
More than series                  Convert into continuous exclusive series
Inclusive series                  Convert into continuous exclusive series
Having unequal class intervals    Make the class intervals equal and adjust the frequencies, assuming that they are equally distributed throughout the class.
Step 2: Ascertain the modal class as follows:
(a) By preparing the Grouping Table and Analysis in case there is a small difference between the maximum frequency and the frequency preceding or succeeding it.
(b) By inspection in other cases. In this case the class with the maximum frequency is the Modal Class.
Step 3: Calculate the Mode as follows:
1. By inspection formula in case of a unimodal distribution (i.e. where there is a single mode)
(a) Where the modal class is the one having the maximum frequency
$$\text{Mode} = L + \frac{|f_1 - f_0|}{|f_1 - f_0| + |f_1 - f_2|} \times i$$
Where, L = Lower limit of the Modal Class
f1 = Frequency of the Modal Class
f0 = Frequency of the pre-modal class, i.e. the class preceding the modal class
f2 = Frequency of the post-modal class, i.e. the class succeeding the modal class
i = Class interval of the Modal Class
Notes:
1. If the Modal Class is the first class, f0 is taken as zero.
2. If the Modal Class is the last class, f2 is taken as zero.
3. Where the Modal Class is other than the one having the maximum frequency
2. By Empirical relationship formula in case of a bimodal or multimodal distribution (i.e. where there are two or more values having the same maximum frequency)
Mode = 3 Median – 2 Mean
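
A minimal sketch of the two calculations above, under stated assumptions: the modal class is taken by inspection as the first class with the maximum frequency, the interpolation uses the absolute differences between the modal frequency and its two neighbours, and classes are (lower limit, upper limit) pairs of equal width. The function names and data are hypothetical.

```python
def mode_grouped(classes, freqs):
    """Mode = L + |f1 - f0| / (|f1 - f0| + |f1 - f2|) * i, by inspection."""
    k = max(range(len(freqs)), key=freqs.__getitem__)  # modal class index
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0               # first class: f0 = 0
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0  # last class: f2 = 0
    d1, d2 = abs(f1 - f0), abs(f1 - f2)
    if d1 + d2 == 0:
        raise ValueError("mode is not defined by this formula")
    lower, upper = classes[k]
    return lower + d1 * (upper - lower) / (d1 + d2)


def mode_empirical(mean, median):
    """Empirical relation: Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


# Hypothetical table: modal class 10-20 with f1 = 10, f0 = 4, f2 = 6.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
print(mode_grouped(classes, [4, 10, 6, 2]))  # 16.0
print(mode_empirical(mean=20, median=22))    # 26
```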
Merits of Mode
1. It is easy to calculate and simple to understand.
2. It is not affected by the extreme values.
3. The value of mode can be determined graphically.
4. Its value can be determined in case of open-end class interval.
5. The mode is the most representative of the distribution.

Demerits of Mode
1. It is not suitable for further mathematical treatment.
2. The value of the mode cannot always be determined.
3. The value of the mode is not based on each and every item of the series.
4. The mode is not strictly defined.
5. It is difficult to calculate when one of the observations is zero or the sum of the observations is zero.

1.10 THE EMPIRICAL RELATION BETWEEN THE MEAN, MEDIAN AND MODE
If the values of mean, median and mode are equal, then the distribution of numerical values in the data set is symmetrical as shown in figure (a). But if these values are not equal, then the distribution of numerical values in the data set is not symmetrical, as shown in figure (b) and figure (c).

[Figure: (a) Symmetrical: Median = Mean = Mode. (b) Skewed to the Right: Mode, Median, Mean from left to right. (c) Skewed to the Left: Mean, Median, Mode from left to right.]

If most of the values fall either to the right or to the left of the mode, then such a distribution is said to be skewed. In such cases, a relationship between these three measures of central tendency as suggested by Karl Pearson is as follows:
Mean – Mode = 3 (Mean – Median)
or Mode = 3 Median – 2 Mean
If most of the values of observations in a distribution fall to the right of the mode as shown in figure (b), then it is said to be skewed to the right or positively skewed (i.e. values of higher magnitude are concentrated more to the right of the mode). In this case, the mode remains under the peak (i.e. representing the highest frequency) but the median (a value that depends on the number of observations) and the mean (a value that is affected by extreme values) move to the right. The order of magnitude of these measures will be
Mean > Median > Mode
But if the distribution is skewed to the left or negatively skewed (i.e. values of lower magnitude are concentrated more to the left of the mode), then the mode is again under the peak whereas the median and mean move to the left of the mode. The order of magnitude of these measures will be
Mean < Median < Mode

In both cases, the difference between the mean and the mode is approximately three times the difference between the mean and the median.
In general, for a unimodal skewed (non-symmetrical) distribution, the median is preferred to the mean for measuring location because it is neither influenced by the frequency of occurrence of a single observation value, as the mode is, nor affected by extreme values, as the mean is.

1.11 THE GEOMETRIC MEAN (G.M.)


In many business and economics problems, such as calculation of compound interest and
inflation, quantities (variables) change over a period of time. In such cases, a decision maker may like
to know an average percentage change rather than simple average value to represent the average
growth or declining rate in the variable value over a period of time. Thus, another measure of central
tendency called geometric mean (G.M.) is calculated.
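For example, consider the annual growth rate of output of a company in the last five years.
Year    Growth Rate (Percent)    Output at the End of the Year
2006    5.0                      105.00
2007    7.5                      112.87
2008    2.5                      115.69
2009    5.0                      121.47
2010    10.0                     133.61
The simple arithmetic mean of the growth rate is
$$\bar{X} = \frac{1}{5}(5 + 7.5 + 2.5 + 5 + 10) = 6$$
This value of the mean implies that if 6 percent is the growth rate, then the output at the end of year 2010 should be 133.82, which is slightly more than the actual value, 133.61. Thus the correct growth rate should be less than 6.
To find the correct growth rate, we apply the formula of the geometric mean:
$$\text{G.M.} = \sqrt[n]{\text{Product of all the } n \text{ values}} = \sqrt[n]{X_1 \cdot X_2 \cdots X_n} = (X_1 \cdot X_2 \cdot X_3 \cdots X_n)^{1/n}$$
In other words, the G.M. of a set of n observations is the nth root of their product.
Substituting the values of the growth rate in the given formula, we have
$$\text{G.M.} = \sqrt[5]{5 \times 7.5 \times 2.5 \times 5 \times 10} = \sqrt[5]{4687.5} \approx 5.42$$
This is the geometric mean of the percentage rates themselves. The average growth rate that reproduces the final output is obtained from the geometric mean of the growth factors: $(1.05 \times 1.075 \times 1.025 \times 1.05 \times 1.10)^{1/5} - 1 \approx 0.0597$, i.e. about 5.97 percent average growth, which is less than 6 as expected.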
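
The growth-rate example can be checked numerically. The sketch below assumes a base output of 100 before 2006 (implied by the table, where 5 percent growth gives 105.00, but not stated in the text) and averages the yearly growth factors 1 + rate/100 geometrically; the variable names are illustrative.

```python
import math

rates = [5.0, 7.5, 2.5, 5.0, 10.0]      # yearly growth in percent
factors = [1 + r / 100 for r in rates]  # 1.05, 1.075, ...

# Geometric mean of the growth factors, converted back to a percentage.
gm_factor = math.prod(factors) ** (1 / len(factors))
average_growth = (gm_factor - 1) * 100

# Output after five years from the assumed base of 100.
final_output = 100 * math.prod(factors)

print(round(average_growth, 2))  # 5.97
print(round(final_output, 2))    # 133.63
```

Compounding the assumed base of 100 at this average rate for five years returns the same final output, which the arithmetic mean of 6 percent does not.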

Computation of Geometric Mean for Individual Series


If the number of observations is more than three, then the G.M. can be calculated by taking logarithms on both sides of the equation. The formula for the G.M. for ungrouped data can be expressed in terms of logarithms as shown below:
$$\log(\text{G.M.}) = \frac{1}{n}\log(X_1 \cdot X_2 \cdots X_n) = \frac{1}{n}(\log X_1 + \log X_2 + \dots + \log X_n) = \frac{1}{n}\sum \log X_i$$
and therefore
$$\text{G.M.} = \text{Antilog}\left\{\frac{1}{n}\sum \log X_i\right\}$$
or
$$\text{G.M.} = \text{Antilog}\left[\frac{\sum \log X}{N}\right]$$
where N = n = Total no. of items
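
The logarithmic route can be sketched with base-10 logs, where the antilog is simply 10 raised to the mean of the logs. The function name `geometric_mean_logs` and the data are assumptions for the example; the values must all be positive for the logarithm to exist.

```python
import math


def geometric_mean_logs(values):
    """G.M. = Antilog(sum(log X) / N), using base-10 logarithms."""
    values = list(values)
    if not values or any(x <= 0 for x in values):
        raise ValueError("all observations must be positive")
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log  # antilog of the mean of the logs


# Hypothetical data: 2 * 4 * 8 = 64, whose cube root is 4.
print(round(geometric_mean_logs([2, 4, 8]), 6))  # 4.0
```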

Computation of Geometric Mean for Discrete Series


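If the observations X1, X2,…, Xn occur with frequencies f1, f2,…, fn, respectively, and the total frequency is $n = \sum f_i$, then the G.M. for such data is given by
$\log(\text{G.M.}) = \frac{1}{n}\{f_1 \log X_1 + f_2 \log X_2 + \cdots + f_n \log X_n\}$

$= \frac{1}{n}\sum (f_i \log X_i)$

$\text{G.M.} = \text{Antilog}\left\{\frac{1}{n}\sum f_i \log X_i\right\}$

OR $\text{G.M.} = \text{Antilog}\left[\frac{\sum f \log X}{N}\right]$ where, N = $\sum f$ = Total no. of items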
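For a discrete series, each logarithm is simply counted f times. A minimal sketch of the formula above, assuming base-10 logarithms; the function name and data are illustrative, not from the text.

```python
import math

def geometric_mean_discrete(values, freqs):
    """G.M. = antilog( sum(f_i * log10 X_i) / sum(f_i) )."""
    n = sum(freqs)  # total frequency
    weighted_logs = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (weighted_logs / n)

# Illustrative data: 2 occurs once, 4 occurs twice, 8 occurs once.
values = [2, 4, 8]
freqs = [1, 2, 1]
gm = geometric_mean_discrete(values, freqs)
print(round(gm, 4))  # 4.0, the 4th root of 2 * 4 * 4 * 8 = 256
```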

Computation of Geometric Mean for Grouped Data or Continuous Series


Step 1 Calculate the mid-point of each class and enter these mid-points in the column headed as 'm'.
Step 2 Take the logarithm of each mid-point and enter it in the column headed as log m.
Step 3 Multiply these logarithms (log m) by the respective frequencies, enter these products in the column headed as f log m, and then obtain their total, i.e. ∑f log m.
Step 4 Calculate the Geometric Mean as follows:

$\text{G.M.} = \text{Antilog}\left[\frac{\sum f \log m}{N}\right]$
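The four steps can be sketched as follows. The class intervals and frequencies are hypothetical, chosen only to illustrate the procedure, and base-10 logarithms are assumed.

```python
import math

# Hypothetical class intervals (lower, upper) and their frequencies.
classes = [(10, 20), (20, 30), (30, 40)]
freqs = [2, 3, 5]

# Step 1: mid-point m of each class.
mids = [(lo + hi) / 2 for lo, hi in classes]

# Steps 2 and 3: f * log m for each class, then the total.
sum_f_log_m = sum(f * math.log10(m) for m, f in zip(mids, freqs))

# Step 4: antilog of (sum of f log m) / N, where N is the total frequency.
n_total = sum(freqs)
gm_grouped = 10 ** (sum_f_log_m / n_total)

print(mids)
print(round(gm_grouped, 2))
```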

Weighted Geometric Mean


Like the weighted Arithmetic Mean, a Weighted Geometric Mean may be calculated.
Symbolically,
$\text{G.M.}_W = \left(X_1^{W_1} \times X_2^{W_2} \times \cdots \times X_n^{W_n}\right)^{1/\sum W}$

Computation of Weighted Geometric Mean


Step 1 Take the logarithms of each item of variable X and enter in the column headed as log X.
Step 2 Multiply these logarithms (log X) with the respective weights (W) and enter these products
(W log X) in the column headed as W log X and then obtain their total i.e. ∑W log X.
Step 3 Calculate Geometric Mean as follows:

G.M. = Antilog [ ]
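A minimal sketch of these steps, assuming base-10 logarithms; the function name, values and weights are illustrative only.

```python
import math

def weighted_geometric_mean(values, weights):
    """G.M._W = antilog( sum(W * log10 X) / sum(W) )."""
    sum_w_log_x = sum(w * math.log10(x) for x, w in zip(values, weights))
    return 10 ** (sum_w_log_x / sum(weights))

# Illustrative data: X = 2 with weight 2, X = 16 with weight 1.
gm_w = weighted_geometric_mean([2, 16], [2, 1])
print(round(gm_w, 4))  # 4.0, the cube root of 2**2 * 16 = 64
```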

Uses of Geometric Mean


(i) Geometric Mean is used to find the average percentage increase in sales, production, etc.
(ii) Geometric Mean is used in the construction of index numbers, since it shows the relative change.
(iii) When large weights are given to small items and small weights are given to large items, the
best measure of central tendency is Geometric Mean. That is, when there are extreme values,
the best measure of central tendency to be used is Geometric Mean.
Merits of Geometric Mean
(i) Geometric Mean is calculated based on all observations in the series.
(ii) Geometric Mean is clearly defined.
(iii) Geometric Mean is less affected by extreme values in the series than the Arithmetic Mean.
(iv) Geometric Mean is amenable to further algebraic treatment.
(v) Geometric Mean is useful in averaging ratios and percentages.
Demerits of Geometric Mean
(i) Geometric Mean is difficult to understand.
(ii) We cannot compute the geometric mean if both positive and negative values occur in
the series.
(iii) We cannot compute geometric mean if one or more of the values in the series is zero.

1.12 THE HARMONIC MEAN (H.M.)


The harmonic mean (H.M.) is defined as the reciprocal of the arithmetic mean of the reciprocals of
the individual observations.
$\text{H.M.} = \frac{N}{\left(\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_n}\right)}$

Where, X1, X2, …, Xn refer to the values of the various items of the series
N = Total number of items of the series

Computation of Harmonic Mean for Individual Series


Step 1 Calculate the reciprocal of each item of variable X, enter it in the column headed as 1/X, and obtain the total, i.e. ∑(1/X).

Step 2 Calculate the H.M. as follows: $\text{H.M.} = \frac{N}{\sum\left(\frac{1}{X}\right)}$
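The two steps can be sketched as follows. The speeds are illustrative (two equal distances covered at 40 km/h and 60 km/h), and the helper name is hypothetical; the result is cross-checked against `statistics.harmonic_mean` from the Python standard library.

```python
import statistics

def harmonic_mean(values):
    """H.M. = N / sum(1/X); undefined if any observation is zero."""
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    reciprocal_sum = sum(1 / x for x in values)  # Step 1: total of 1/X
    return len(values) / reciprocal_sum          # Step 2: N / sum(1/X)

speeds = [40, 60]  # km/h over two equal distances (illustrative)
hm = harmonic_mean(speeds)
print(round(hm, 2))                       # 48.0
print(statistics.harmonic_mean(speeds))   # library cross-check
```

The average speed for the whole journey is 48 km/h, not the arithmetic mean of 50 km/h, because more time is spent travelling at the slower speed.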

Computation of Harmonic Mean for Discrete Series


Step 1 Calculate the reciprocal of each item of variable X and enter it in the column headed as 1/X.

Step 2 Multiply these reciprocals (1/X) by the respective frequencies, enter these products (f/X) in the column headed as f/X, and then obtain their total, i.e. ∑(f/X).

Step 3 Calculate the Harmonic Mean as follows: $\text{H.M.} = \frac{N}{\sum\left(\frac{f}{X}\right)}$, where N = ∑f
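A minimal sketch of the discrete-series steps; the function name and the data are illustrative, not from the text.

```python
def harmonic_mean_discrete(values, freqs):
    """H.M. = N / sum(f / X), with N = sum(f)."""
    n = sum(freqs)
    sum_f_over_x = sum(f / x for x, f in zip(values, freqs))
    return n / sum_f_over_x

# Illustrative data: X = 2 occurs once, X = 4 occurs twice.
hm_discrete = harmonic_mean_discrete([2, 4], [1, 2])
print(hm_discrete)  # 3.0, since N = 3 and sum(f/X) = 0.5 + 0.5 = 1
```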

Computation of Harmonic Mean for Grouped Data or Continuous Series


Step 1 Calculate the mid-point of each item of variable X and enter these mid-points in
column headed as m.
Step 2 Calculate the reciprocals of the mid-points and in the column headed as .

Step 2 Multiply these reciprocals ( ) with the respective frequencies and enter these products ( )
in the column headed as and then obtain their total i.e. ∑( )

Step 3 Calculate Harmonic Mean as follows: H.M. =


∑( )
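The grouped-data steps can be sketched as below. The class intervals and frequencies are hypothetical and were chosen so that the arithmetic is easy to follow by hand.

```python
# Hypothetical class intervals (lower, upper) and their frequencies.
classes = [(0, 10), (10, 20), (20, 30)]
freqs = [1, 3, 5]

# Step 1: mid-point m of each class.
mids = [(lo + hi) / 2 for lo, hi in classes]

# Steps 2 and 3: f/m for each class, then the total.
sum_f_over_m = sum(f / m for m, f in zip(mids, freqs))

# Step 4: H.M. = N / sum(f/m), where N is the total frequency.
hm_grouped = sum(freqs) / sum_f_over_m

print(round(hm_grouped, 4))  # 15.0, since 9 / (0.2 + 0.2 + 0.2) = 15
```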

Weighted Harmonic Mean


Like the weighted Arithmetic Mean, a Weighted Harmonic Mean may be calculated. Symbolically,

$\text{H.M.}_W = \frac{\sum W}{\sum\left(\frac{W}{X}\right)}$
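A minimal sketch of the weighted form, assuming the weights are attached to the observations as in the weighted arithmetic mean; the function name and the data are illustrative only.

```python
def weighted_harmonic_mean(values, weights):
    """H.M._W = sum(W) / sum(W / X)."""
    sum_w_over_x = sum(w / x for x, w in zip(values, weights))
    return sum(weights) / sum_w_over_x

# Illustrative data: 1 km covered at 40 km/h and 3 km covered at 60 km/h,
# with the distances used as weights.
hm_w = weighted_harmonic_mean([40, 60], [1, 3])
print(round(hm_w, 2))  # 53.33 km/h, i.e. 4 km in 0.075 hours
```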

Uses of Harmonic Mean


(i) The H.M is used for computing the average rate of increase in profits of a concern.
(ii) The H.M is used to calculate the average speed at which a journey has been performed.
Merits of Harmonic Mean
(i) Its value is based on all the observations of the data.
(ii) It is less affected by the extreme values.
(iii) It is suitable for further mathematical treatment.
(iv) It is strictly defined.
Demerits of Harmonic Mean
(i) It is neither simple to calculate nor easy to understand.
(ii) It cannot be calculated if one of the observations is zero.
(iii) For positive values, the H.M. is never greater than the G.M. or the A.M., because it gives the largest weight to the smallest items.

1.13 THE RELATION BETWEEN ARITHMETIC, GEOMETRIC AND HARMONIC MEANS


(a) For any finite number of positive values of a variable, A.M. ≥ G.M. ≥ H.M.
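Proof: We shall prove it in the case of two positive numbers. Let x1 and x2 be the two positive numbers.
Now, the A.M. of x1 and x2 = $\frac{x_1 + x_2}{2}$, their G.M. = $\sqrt{x_1 \times x_2}$ and their H.M. = $\frac{2}{\frac{1}{x_1} + \frac{1}{x_2}}$

$(\sqrt{x_1} - \sqrt{x_2})^2 \geq 0$ (Since the square of a real number is non-negative)

$\Rightarrow (\sqrt{x_1})^2 + (\sqrt{x_2})^2 - 2\sqrt{x_1}\sqrt{x_2} \geq 0$

$\Rightarrow x_1 + x_2 \geq 2\sqrt{x_1 x_2}$

$\Rightarrow \frac{x_1 + x_2}{2} \geq \sqrt{x_1 x_2}$

$\Rightarrow$ A.M. ≥ G.M. … (I)

Again, $\left(\frac{1}{\sqrt{x_1}} - \frac{1}{\sqrt{x_2}}\right)^2 \geq 0$

$\Rightarrow \frac{1}{x_1} + \frac{1}{x_2} - \frac{2}{\sqrt{x_1 x_2}} \geq 0$

$\Rightarrow \sqrt{x_1 x_2} \geq \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}}$

$\Rightarrow$ G.M. ≥ H.M. … (II)
From Eq. (I) and (II), we get
A.M. ≥ G.M. ≥ H.M.
(b) For any two positive numbers, A.M. × H.M. = (G.M.)².
Proof: Let a and b be the two positive numbers. We have

$\text{A.M.} = \frac{a + b}{2}, \quad \text{G.M.} = \sqrt{ab}$

$\text{H.M.} = \frac{2}{\frac{1}{a} + \frac{1}{b}} = \frac{2ab}{a + b}$

$(\text{A.M.}) \times (\text{H.M.}) = \frac{a + b}{2} \times \frac{2ab}{a + b} = ab = (\text{G.M.})^2$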
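Both results can be verified numerically for any pair of positive numbers. A minimal sketch with the illustrative values a = 4 and b = 9 (not taken from the text):

```python
import math

a, b = 4.0, 9.0  # any two positive numbers (illustrative)

am = (a + b) / 2          # arithmetic mean
gm = math.sqrt(a * b)     # geometric mean
hm = 2 * a * b / (a + b)  # harmonic mean

print(am, gm, round(hm, 4))            # 6.5 6.0 5.5385
print(am >= gm >= hm)                  # True: A.M. >= G.M. >= H.M.
print(math.isclose(am * hm, gm ** 2))  # True: A.M. x H.M. = (G.M.)^2
```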

1.14 THE ROOT MEAN SQUARE


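The root mean square (RMS), or quadratic mean, of a set of numbers X1, X2, ... , XN is sometimes
denoted by $\sqrt{\overline{X^2}}$ and is defined by

$\text{RMS} = \sqrt{\overline{X^2}} = \sqrt{\frac{\sum_{j=1}^{N} X_j^2}{N}} = \sqrt{\frac{\sum X^2}{N}}$

This type of average is frequently used in physical applications.

Example: The RMS of the set 1, 3, 4, 5, and 7 is $\sqrt{\frac{1^2 + 3^2 + 4^2 + 5^2 + 7^2}{5}} = \sqrt{20} = 4.47$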
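The example can be reproduced in a few lines. A minimal Python sketch; the function name is illustrative.

```python
import math

def root_mean_square(values):
    """RMS = sqrt( sum(X^2) / N )."""
    return math.sqrt(sum(x * x for x in values) / len(values))

data = [1, 3, 4, 5, 7]
rms = root_mean_square(data)
print(round(rms, 2))  # 4.47, the square root of 100 / 5 = 20
```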

1.15 QUARTILES, DECILES AND PERCENTILES


If a set of data is arranged in order of magnitude, the middle value (or arithmetic mean of the two
middle values) that divides the set into two equal parts is the median. By extending this idea, we can
think of those values which divide the set into four equal parts. These values denoted by Q1, Q2, and
Q3, are called the first, second, and third quartiles, respectively, the value Q2 being equal to the
median. Similarly, the values that divide the data into 10 equal parts are called deciles and are denoted
by D1, D2,..., D9, while the values dividing the data into 100 equal parts are called percentiles and are
denoted by P1, P2,... , P99. The fifth decile and the 50th percentile correspond to the median. The 25th
and 75th percentiles correspond to the first and third quartiles, respectively. Collectively, quartiles,
deciles, percentiles, and other values obtained by equal subdivisions of the data are called quantiles.
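Quantiles can be computed with `statistics.quantiles` from the Python standard library. The sketch below uses illustrative data and the function's default "exclusive" interpolation method, which is an assumption: other software packages use slightly different conventions and may return slightly different values for the same data.

```python
import statistics

data = list(range(1, 12))  # illustrative data: 1, 2, ..., 11

# n=4 gives the three quartiles Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)  # 3.0 6.0 9.0

# n=10 gives the nine deciles D1..D9; n=100 gives the percentiles P1..P99.
deciles = statistics.quantiles(data, n=10)
percentiles = statistics.quantiles(data, n=100)

# The second quartile, fifth decile and 50th percentile all equal the median.
print(q2 == deciles[4] == percentiles[49] == statistics.median(data))  # True
```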

1.16 SOFTWARE AND MEASURES OF CENTRAL TENDENCY


The output for all five packages is given for the test scores:
Test Scores

25 28 28 28 29 30 32 33 33 33 34 34 35 36 37
38 41 42 42 45 46 47 51 51 53 53 53 55 56 57
57 60 61 62 62 62 67 68 69 71 72 73 73 75 75
79 82 85 86 86 86 88 88 89 91 93 94 96 96 99
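For comparison with the package outputs, the same measures can be computed with Python's standard `statistics` module. This is only a sketch: Python is not one of the five packages discussed, and the quartiles rely on the module's default "exclusive" method.

```python
import statistics

scores = [
    25, 28, 28, 28, 29, 30, 32, 33, 33, 33, 34, 34, 35, 36, 37,
    38, 41, 42, 42, 45, 46, 47, 51, 51, 53, 53, 53, 55, 56, 57,
    57, 60, 61, 62, 62, 62, 67, 68, 69, 71, 72, 73, 73, 75, 75,
    79, 82, 85, 86, 86, 86, 88, 88, 89, 91, 93, 94, 96, 96, 99,
]

mean = statistics.mean(scores)
median = statistics.median(scores)
modes = statistics.multimode(scores)          # every value tied for most frequent
quartiles = statistics.quantiles(scores, n=4)

print(len(scores), sum(scores))               # 60 3550
print(round(mean, 4), median)                 # 59.1667 57.0
print(modes)                                  # [28, 33, 53, 62, 86]
print(quartiles)                              # [37.25, 57.0, 78.0]
print(round(statistics.stdev(scores), 2))     # sample standard deviation
```

Five scores are tied as the most frequent value, each occurring three times, so a package that reports a single mode shows only one of them.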

EXCEL
If the pull-down ‘‘Tools => Data Analysis => Descriptive Statistics’’ is given, the measures of
central tendency median, mean, and mode as well as several measures of dispersion are obtained:
Mean                  59.16667
Standard Error        2.867425
Median                57
Mode                  28
Standard Deviation    22.21098
Sample Variance       493.3277
Kurtosis              -1.24413
Skewness              0.167175
Range                 74
Minimum               25
Maximum               99
Sum                   3550
Count                 60

MINITAB
If the pull-down ‘‘Stat=> Basic Statistics => Display Descriptive Statistics’’ is given, the
following output is obtained:
Descriptive Statistics: testscore
Variable    N    N*   Mean    SE Mean   StDev   Minimum   Q1      Median   Q3      Maximum
testscore   60   0    59.17   2.87      22.21   25.00     37.25   57.00    78.00   99.00

SPSS
If the pull-down ‘‘Analyze => Descriptive Statistics => Descriptives’’ is given, the following
output is obtained:
Descriptive Statistics
N Minimum Maximum Mean Std. Deviation
Testscore valid 60 25.00 99.00 59.1667 22.21098
N (listwise) 60

SAS
If the pull-down ‘‘Solutions =>Analysis => Analyst’’ is given and the data are read in as a file,
the pull-down ‘‘Statistics => Descriptive => Summary Statistics’’ gives the following output:

STATISTIX
If the pull-down ‘‘Statistics =>Summary Statistics => Descriptive Statistics’’ is given in the
software package STATISTIX, the following output is obtained:

SOLVED EXAMPLES
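Example 1: Write out the terms in each of the following indicated sums:
(a) $\sum_{j=1}^{6} X_j$ (b) $\sum_{j=1}^{4} (Y_j - 3)^2$ (c) $\sum_{j=1}^{N} a$ (d) $\sum_{k=1}^{5} f_k X_k$ (e) $\sum_{j=1}^{3} (X_j - a)$
Solution: (a) $X_1 + X_2 + X_3 + X_4 + X_5 + X_6$
(b) $(Y_1 - 3)^2 + (Y_2 - 3)^2 + (Y_3 - 3)^2 + (Y_4 - 3)^2$
(c) $a + a + a + \cdots + a = Na$
(d) $f_1 X_1 + f_2 X_2 + f_3 X_3 + f_4 X_4 + f_5 X_5$
(e) $(X_1 - a) + (X_2 - a) + (X_3 - a) = X_1 + X_2 + X_3 - 3a$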
Example 2: Express each of the following by using the summation notation:
(a) $X_1^2 + X_2^2 + X_3^2 + \cdots + X_{10}^2$
(b) $(X_1 + Y_1) + (X_2 + Y_2) + \cdots + (X_8 + Y_8)$
(c) $f_1 X_1^3 + f_2 X_2^3 + \cdots + f_{20} X_{20}^3$
(d) $a_1 b_1 + a_2 b_2 + a_3 b_3 + \cdots + a_N b_N$
(e) $f_1 X_1 Y_1 + f_2 X_2 Y_2 + f_3 X_3 Y_3 + f_4 X_4 Y_4$
Solution: (a) $\sum_{j=1}^{10} X_j^2$
(b) $\sum_{j=1}^{8} (X_j + Y_j)$
(c) $\sum_{j=1}^{20} f_j X_j^3$
(d) $\sum_{j=1}^{N} a_j b_j$
(e) $\sum_{j=1}^{4} f_j X_j Y_j$
Example 3: Calculate the arithmetic mean of the following observations.
32, 35, 36, 37, 39, 41, 43, 47, 48
Solution: A.M. = $\frac{\sum X}{n} = \frac{32 + 35 + 36 + 37 + 39 + 41 + 43 + 47 + 48}{9} = \frac{358}{9}$ ≈ 39.78
Example 4: In a survey of 5 cement companies, the profit (in ₹ crore) earned during a year was
15, 20, 10, 35 and 32. Find the arithmetic mean of the profit earned.
Solution: A.M. = $\frac{15 + 20 + 10 + 35 + 32}{5} = \frac{112}{5}$ = 22.4
Thus, the arithmetic mean of the profit earned by these companies during a year was ₹ 22.4 crore.
Example 5: An examination was held to decide the award of a scholarship. The weights of
various subjects were different. The marks obtained by 3 candidates (out of 100 in each subject) are
given below:
Subject        Weight    Student A    Student B    Student C
Mathematics    4         60           57           62
Physics        3         62           61           67
Chemistry      2         55           53           60
English        1         67           77           49
Calculate the weighted A.M. to award the scholarship.
Solution: The calculation of the weighted arithmetic mean is shown below:
Subject        Weight (wi)    Student A              Student B              Student C
                              Marks (Xi)    Xiwi     Marks (Xi)    Xiwi     Marks (Xi)    Xiwi
Mathematics    4              60            240      57            228      62            248
Physics        3              62            186      61            183      67            201
Chemistry      2              55            110      53            106      60            120
English        1              67            67       77            77       49            49
Total          10             244           603      248           594      238           618
Applying the formula for the weighted mean and, for comparison, the simple mean, we get:
$\bar{X}_{wA} = \frac{603}{10} = 60.3$; $\bar{X}_A = \frac{244}{4} = 61$

$\bar{X}_{wB} = \frac{594}{10} = 59.4$; $\bar{X}_B = \frac{248}{4} = 62$

$\bar{X}_{wC} = \frac{618}{10} = 61.8$; $\bar{X}_C = \frac{238}{4} = 59.5$

From the above calculations, it may be noted that student B should get the scholarship as per the simple
A.M. values, but according to the weighted A.M., student C should get the scholarship because the
subjects of the examination are not all of equal importance.
Example 6: The owner of a general store was interested in knowing the mean contribution (sales
price minus variable cost) of his stock of 5 items. The data is given below:
Product Contribution per Unit Quantity Sold
1 6 160
2 11 60
3 8 260
4 4 460
5 14 110
Solution: If the owner ignores the values of the individual products and gives equal importance
to each product, then the mean contribution per unit sold will be
$\bar{X} = \frac{1}{5}(6 + 11 + 8 + 4 + 14)$ = ₹ 8.60
However, ₹ 8.60 may not necessarily be the mean contribution per unit of different quantities of
the products sold. In this case, the owner has to take into consideration the number of units of each
product sold as different weights. The weighted A.M. is computed by multiplying the units sold (w) of each
product by its contribution (X) and dividing the total by the total number of units sold. That is,
$\bar{X}_w = \frac{6(160) + 11(60) + 8(260) + 4(460) + 14(110)}{160 + 60 + 260 + 460 + 110} = \frac{7{,}080}{1{,}050}$ = ₹ 6.74

This value, ₹ 6.74, is different from the earlier value, ₹ 8.60. The owner must use the value
₹ 6.74 for decision-making purposes.
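The two averages of Example 6 can be checked with a short Python sketch; the variable names are illustrative.

```python
# Data of Example 6: contribution per unit and quantity sold for 5 products.
contribution = [6, 11, 8, 4, 14]
quantity = [160, 60, 260, 460, 110]

# Simple A.M.: every product gets equal importance.
simple_mean = sum(contribution) / len(contribution)

# Weighted A.M.: quantities sold act as weights.
total_contribution = sum(x * w for x, w in zip(contribution, quantity))
weighted_mean = total_contribution / sum(quantity)

print(simple_mean)                         # 8.6
print(total_contribution, sum(quantity))   # 7080 1050
print(round(weighted_mean, 2))             # 6.74
```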
Example 7: Find the mean from the following data:
X 5 10 15 20 25 30 35 40
f 5 9 13 21 2 15 8 3
Solution: Total Frequency = ∑f = 5 + 9 + 13 + 21 + 2 + 15 + 8 + 3
= 76 = Number of values
X      f      fX
5      5      25
10     9      90
15     13     195
20     21     420
25     2      50
30     15     450
35     8      280
40     3      120
       ∑f = 76      ∑fX = 1630
∑fX = Sum of the products of X values with their respective frequencies
= Sum of the values = 1630

Arithmetic Mean = $\frac{\sum fX}{\sum f} = \frac{1630}{76}$ ≈ 21.45
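The frequency-table calculation of Example 7 can be verified with a minimal Python sketch; the variable names are illustrative.

```python
# Data of Example 7: values X and their frequencies f.
x_values = [5, 10, 15, 20, 25, 30, 35, 40]
freqs = [5, 9, 13, 21, 2, 15, 8, 3]

total_f = sum(freqs)                                   # sum of f
total_fx = sum(f * x for x, f in zip(x_values, freqs)) # sum of fX

mean = total_fx / total_f
print(total_f, total_fx)  # 76 1630
print(round(mean, 2))     # 21.45
```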

Example 8: If A, B, C and D are four chemicals costing ₹ 15, ₹ 12, ₹ 8 and ₹ 5 per 100 g,
respectively, and are contained in a given compound in the ratio of 1, 2, 3 and 4 parts, respectively,
then what should be the price of the resultant compound?
Solution: Weighted A.M. = $\frac{\sum w_i X_i}{\sum w_i} = \frac{15 \times 1 + 12 \times 2 + 8 \times 3 + 5 \times 4}{1 + 2 + 3 + 4} = \frac{83}{10}$ = ₹ 8.30 per 100 g

Example 9: The daily earning (in rupees) of 175 employees working on a daily basis in a firm
are:
Daily Earnings (₹)     100   120   140   160   180   200   220
Number of Employees    3     6     10    15    24    42    75
Calculate the average daily earning for all employees by the assumed mean method.
Solution: Let us take the assumed mean, A = 160.
The calculation of the average daily earning for employees is shown below:

Daily Earnings (in ₹)    Number of Employees    di = Xi - A      fi di
(Xi)                     (fi)                   = Xi - 160
100                      3                      -60              -180
120                      6                      -40              -240
140                      10                     -20              -200
160                      15                     0                0
180                      24                     20               480
200                      42                     40               1680
220                      75                     60               4500
                         ∑f = 175                                ∑fd = 6040
The required A.M. ($\bar{X}$) using the formula is given by:

$\bar{X} = A + \frac{\sum f_i d_i}{\sum f_i} = 160 + \frac{6040}{175}$ = ₹ 194.51
Thus, the average daily earning for all employees is ₹ 194.51.
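The assumed mean method of Example 9 can be sketched in Python and compared with the direct calculation ∑fX / ∑f; both must agree whatever value is chosen for A. The variable names are illustrative.

```python
# Data of Example 9: daily earnings X and number of employees f.
earnings = [100, 120, 140, 160, 180, 200, 220]
employees = [3, 6, 10, 15, 24, 42, 75]

assumed_mean = 160  # A, as chosen in the solution

# Deviations d = X - A and the total of f * d.
sum_fd = sum(f * (x - assumed_mean) for x, f in zip(earnings, employees))
total_f = sum(employees)

mean_assumed = assumed_mean + sum_fd / total_f

# Direct method for comparison.
mean_direct = sum(f * x for x, f in zip(earnings, employees)) / total_f

print(total_f, sum_fd)         # 175 6040
print(round(mean_assumed, 2))  # 194.51
print(round(mean_direct, 2))   # 194.51
```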
Example 10: A company is planning to improve plant safety. For this, accident data for the last
50 weeks were compiled. These data are grouped into the frequency distribution shown below.
Calculate the A.M. of the number of accidents per week.
Number of accidents 0-10 10-20 20-30 30-40 40-50
Number of weeks 6 20 10 8 2
Solution: The calculation of Arithmetic Mean is shown below:
