Chapter 2756
STATISTICAL TECHNIQUES
(As per the New Syllabus of Mumbai University for B.Sc. (Information Technology),
Semester IV, 2017-18)
Published by : Mrs. Meena Pandey for Himalaya Publishing House Pvt. Ltd.,
“Ramdoot”, Dr. Bhalerao Marg, Girgaon, Mumbai - 400 004.
Phone: 022-23860170, 23863863; Fax: 022-23877178
E-mail: [email protected]; Website: www.himpub.com
Branch Offices :
New Delhi : “Pooja Apartments”, 4-B, Murari Lal Street, Ansari Road, Darya Ganj, New Delhi - 110 002.
Phone: 011-23270392, 23278631; Fax: 011-23256286
Nagpur : Kundanlal Chandak Industrial Estate, Ghat Road, Nagpur - 440 018.
Phone: 0712-2738731, 3296733; Telefax: 0712-2721216
Bengaluru : Plot No. 91-33, 2nd Main Road, Seshadripuram, Behind Nataraja Theatre,
Bengaluru - 560 020. Phone: 080-41138821; Mobile: 09379847017, 09379847005
Chennai : New No. 48/2, Old No. 28/2, Ground Floor, Sarangapani Street, T. Nagar,
Chennai - 600 012. Mobile: 09380460419
Pune : “Laksha” Apartment, First Floor, No. 527, Mehunpura, Shaniwarpeth (Near Prabhat Theatre),
Pune - 411 030. Phone: 020-24496323, 24496333; Mobile: 09370579333
Lucknow : House No. 731, Shekhupura Colony, Near B.D. Convent School, Aliganj,
Lucknow - 226 022. Phone: 0522-4012353; Mobile: 09307501549
Ahmedabad : 114, “SHAIL”, 1st Floor, Opp. Madhu Sudan House, C.G. Road, Navrang Pura,
Ahmedabad - 380 009. Phone: 079-26560126; Mobile: 09377088847
Ernakulam : 39/176 (New No. 60/251), 1st Floor, Karikkamuri Road, Ernakulam, Kochi - 682 011.
Phone: 0484-2378012, 2378016; Mobile: 09387122121
Bhubaneswar : Plot No. 214/1342, Budheswari Colony, Behind Durga Mandap, Bhubaneswar - 751 006.
Phone: 0674-2575129; Mobile: 09338746007
Kolkata : 108/4, Beliaghata Main Road, Near ID Hospital, Opp. SBI Bank, Kolkata - 700 010.
Phone: 033-32449649; Mobile: 07439040301
Printed at : M/s. Aditya Offset Process (I) Pvt. Ltd., Hyderabad. On behalf of HPH.
Dedication
I would like to dedicate this book to my family for allowing me the time to
write it.
I would like to thank my husband Dr. Dinesh Gabhane for standing beside
me throughout my work. He is a source of inspiration and motivation for
continuing to improve my knowledge and move my career forward.
Special thanks go to my dear son Vedant, who gave me energy.
Last but not least, I thank my college principal Dr. V.S. Shivankar and all my friends and
colleagues who encouraged me at all times.
Ms. Madhuri S. Bankar
Preface
It gives us immense pleasure to present the first edition of the book “Computer Oriented
Statistical Techniques” to the teachers and students of Semester-IV of S.Y.B.Sc. (Information
Technology). This book has been written as per the syllabus prescribed by the University of
Mumbai with effect from the academic year 2017-18.
The whole syllabus is divided into V units and XIII chapters. In each chapter, the concepts
and theory are followed by a sufficient number of solved examples. We have tried our level best
to present the subject matter in simple language for a better understanding by readers. We
hope that this edition will meet the requirements of the students of S.Y.B.Sc. (IT) in their
examination preparation.
Constructive suggestions and comments from the readers will be sincerely appreciated.
We would be glad to hear from you, if you would like to suggest improvements or to
contribute in any way. Kindly send your correspondence to [email protected] or
[email protected].
Finally, we would like to express our sincere respect and gratitude to Kiran Gurbani
(Head of Computer Science and IT Department, R.K. Talreja College, Ulhasnagar) for reviewing
this book thoroughly, for providing an environment that stimulates new thinking and
innovation, and for her support, which helped us bring out this book in time.
We are thankful to Mr. S.K. Srivastava for giving us the opportunity and encouragement to
write this book. We also extend our thanks to the staff of Himalaya Publishing House Pvt. Ltd.
for assisting us in proofreading and compiling this book.
Evaluation Scheme
1. Internal Evaluation: 25 Marks
(i) Test: One class test of 20 marks.
Attempt any four of the following: (20)
(a), (b), (c), (d), (e), (f)
(ii) 5 marks: Active participation in the class, overall conduct, attendance.
2. External Examination: 75 Marks
All questions are compulsory.
(i) (Based on Unit 1) Attempt any three of the following: (15)
(a), (b), (c), (d), (e), (f)
(ii) (Based on Unit 2) Attempt any three of the following: (15)
(a), (b), (c), (d), (e), (f)
(iii) (Based on Unit 3) Attempt any three of the following: (15)
(a), (b), (c), (d), (e), (f)
(iv) (Based on Unit 4) Attempt any three of the following: (15)
(a), (b), (c), (d), (e), (f)
(v) (Based on Unit 5) Attempt any three of the following: (15)
(a), (b), (c), (d), (e), (f)
3. Practical Exam: 50 marks
A certified copy of the journal is essential to appear for the practical examination.
1. Practical Question 1 (20)
2. Practical Question 2 (20)
3. Journal (5)
4. Viva Voce (5)
Contents
UNIT I
1. The Mean, Median, Mode and Other Measures of Central Tendency 1 - 36
2. The Standard Deviation and Other Measures of Dispersion 37 - 57
3. Introduction to R 58 - 90
UNIT II
4. Moments, Skewness and Kurtosis 91 - 111
5. Elementary Probability Theory 112 - 125
6. Elementary Sampling Theory 126 - 137
UNIT III
7. Statistical Estimation Theory 138 - 145
8. Statistical Decision Theory 146 - 163
9. Statistics in R 164 - 174
UNIT IV
10. Small Sampling Theory 175 - 190
11. The Chi-Square Test 191 - 205
UNIT V
12. Curve Fitting and the Method of Least Squares 206 - 220
13. Correlation Theory 221 - 243
Unit I
CHAPTER 1 The Mean, Median, Mode and Other Measures of Central Tendency
Structure
1.1 Index or Subscript Notation
1.2 Summation Notation
1.3 Averages or Measures of Central Tendency
1.4 Arithmetic Mean
1.5 The Weighted Arithmetic Mean
1.6 Properties of the Arithmetic Mean
1.7 The Arithmetic Mean Computed from Grouped Data
1.8 The Median
1.9 The Mode
1.10 The Empirical Relation between the Mean, Median and Mode
1.11 The Geometric Mean (G.M.)
1.12 The Harmonic Mean (H.M.)
1.13 The Relation Between Arithmetic, Geometric and Harmonic Means
1.14 The Root Mean Square
1.15 Quartiles, Deciles and Percentiles
1.16 Software and Measures of Central Tendency
Solved Examples
Practice Examples
Measures of Central Tendency
Mathematical Averages: Arithmetic Mean (A.M.), Geometric Mean (G.M.), Harmonic Mean (H.M.)
Positional Averages: Median, Mode, Quartiles, Deciles and Percentiles
Measures of central tendency permit us to compare individual items in the group with the average,
and also permit us to compare different series of figures with regard to their central tendencies.
Averages are derived figures and not the original data.
Mean (Average)
According to Clark, “An average is a figure that represents the whole group.”
A. E. Waugh defines, “An average is a single value selected from a group of values to
represent them in some way.”
According to Croxton and Cowden, “An average is a single value within the range of the
data that is used to represent all the values in the series. Since an average is somewhere
within the range of the data, it is sometimes called a measure of central value.”
Crum and Smith say, “An average is sometimes called a ‘measure of central tendency’
because individual values of the variable usually cluster around it.”
The Mean, Median, Mode and Other Measures of Central Tendency 3
Step 3 Enter the given frequencies f in a column headed f and obtain the sum of these
frequencies, i.e., N = ∑f.
Step 4 Multiply the mid-point of each class by the respective frequency, denote these
products by fm and enter them in a column headed fm.
Step 5 Obtain the sum of these products, i.e., ∑fm.
Step 6 Apply the following formula:
$$\bar{X} = \frac{\sum fm}{N}$$
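Steps 3 to 6 can be sketched in Python as follows. This is a minimal illustration and is not taken from the book. The helper name `grouped_mean` is hypothetical, as are the class limits and frequencies in the usage line. Each class mid-point is assumed to be (lower limit + upper limit)/2.

```python
def grouped_mean(classes, freqs):
    """Arithmetic mean of a continuous series: sum(f * m) / N."""
    # mid-point m of each class (assumed to be the average of its two limits)
    mids = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)                             # Step 3: N = sum of f
    fm = [f * m for f, m in zip(freqs, mids)]  # Step 4: products fm
    return sum(fm) / n                         # Steps 5-6: sum(fm) / N


# Hypothetical data: classes 0-10, 10-20, 20-30 with frequencies 2, 3, 5
print(grouped_mean([(0, 10), (10, 20), (20, 30)], [2, 3, 5]))  # 18.0
```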
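Median = size (value) of the full item + 50% of the difference between the size of the immediately
next item and the size of the full item.
Here the “full item” is the item at the whole-number part of the median position (N + 1)/2. This rule
applies when N is even, because the median position then falls halfway between two items.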
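The rule can be sketched in Python as follows. The sketch assumes the usual convention that the median of an individual series is the item at position (N + 1)/2 of the arranged data. The helper name `median_individual` is hypothetical, and the second data set is made up for illustration.

```python
def median_individual(values):
    """Median of an individual series via the (N + 1)/2 position rule."""
    data = sorted(values)          # the data must be arranged first
    pos = (len(data) + 1) / 2      # 1-based position of the median item
    full = int(pos)                # the "full" (whole-numbered) item
    if pos == full:                # odd N: the median is that item itself
        return data[full - 1]
    # even N: full item + 50% of (next item - full item)
    return data[full - 1] + 0.5 * (data[full] - data[full - 1])


print(median_individual([48, 32, 41, 35, 36, 39, 43, 37, 47]))  # 39
print(median_individual([7, 1, 5, 3]))                          # 4.0
```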
6 Computer Oriented Statistical Techniques
Step 4 Ascertain the cumulative frequency that includes the (N/2)th observation, the corresponding
class frequency (f) and lower limit (L) of that class, the class interval (i) between the upper and lower
limits of the class, and the cumulative frequency of the preceding class (c.f.).
Step 5 Calculate the median as follows:
$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times i$$
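Below is a minimal Python sketch of the formula Median = L + (N/2 − c.f.)/f × i. The helper name `grouped_median` is hypothetical, and so are the class limits and frequencies. The function assumes the classes are listed in ascending order with no gaps between them.

```python
def grouped_median(classes, freqs):
    """Median of a continuous series: L + (N/2 - c.f.) / f * i."""
    half = sum(freqs) / 2      # N/2
    cum = 0                    # cumulative frequency of the classes seen so far
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= half:    # this class contains the (N/2)th observation
            i = upper - lower  # class interval
            # here cum is the c.f. of the preceding class and lower is L
            return lower + (half - cum) / f * i
        cum += f
    raise ValueError("no class contains the (N/2)th observation")


classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
print(grouped_median(classes, [5, 10, 20, 5]))  # 22.5
```

In the usage line, N = 40, so the median class is 20-30 (its cumulative frequency, 35, first reaches 20). That gives L = 20, c.f. = 15, f = 20 and i = 10.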
Merits of Median
1. The median is useful in the case of frequency distributions with open-end classes.
2. The median is recommended if the distribution has unequal classes.
3. Extreme values do not affect the median as strongly as they affect the mean.
4. It is the most appropriate average in dealing with qualitative data.
5. The value of the median can be determined graphically, whereas the value of the mean cannot be
determined graphically.
6. It is easy to calculate and understand.
Demerits of Median
1. For calculating the median, it is necessary to arrange the data, whereas other averages do not
need arrangement.
2. Since it is a positional average, its value is not determined by all the observations in the
series.
3. The median is not capable of further algebraic calculation.
4. The sampling stability of the median is lower than that of the mean.
Computation of Mode for Individual Series
Step 1 Count the number of times the various values of the series repeat themselves.
Step 2 Ascertain the value occurring the maximum number of times.
Step 3 Mode = Value occurring the maximum number of times.
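The three steps can be expressed with `collections.Counter` from Python's standard library. In this sketch, the name `mode_individual` and the data are hypothetical. The function returns a list, so a series with two or more equally frequent values is reported as multimodal instead of silently picking one.

```python
from collections import Counter


def mode_individual(values):
    """Mode(s) of an individual series: the value(s) repeated most often."""
    counts = Counter(values)      # Step 1: count repetitions of each value
    top = max(counts.values())    # Step 2: the maximum number of repetitions
    # Step 3: every value reaching that count (several values = multimodal)
    return [v for v, c in counts.items() if c == top]


print(mode_individual([3, 7, 5, 7, 2, 7, 5]))  # [7]
```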
Computation of Mode for Discrete Series
Step 1 Ascertain the maximum frequency.
Step 2 Ascertain the value of the observation corresponding to the maximum frequency.
Step 3 Mode = Value of the observation corresponding to the maximum frequency.
Note: In the case of a discrete series (i.e., where the values of the observations are given along with their frequencies), the mode can be
determined just by the inspection method.
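For a discrete series, the same inspection takes a few lines of Python. The helper `mode_discrete` is hypothetical. The X and f values are borrowed from Example 7 of this chapter, where the maximum frequency, 21, belongs to X = 20.

```python
def mode_discrete(xs, fs):
    """Mode of a discrete series by inspection: the X with the highest f."""
    max_f = max(fs)  # Step 1: maximum frequency
    # Steps 2-3: value(s) of X corresponding to that frequency
    return [x for x, f in zip(xs, fs) if f == max_f]


print(mode_discrete([5, 10, 15, 20, 25, 30, 35, 40],
                    [5, 9, 13, 21, 2, 15, 8, 3]))  # [20]
```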
Demerits of Mode
1. It is not suitable for further mathematical treatment.
2. The value of the mode cannot always be determined.
3. The value of the mode is not based on each and every item of the series.
4. The mode is not strictly defined.
5. It is difficult to calculate when one of the observations is zero or the sum of the observations
is zero.
1.10 THE EMPIRICAL RELATION BETWEEN THE MEAN, MEDIAN AND MODE
If the values of the mean, median and mode are equal, then the distribution of numerical values in the data
set is symmetrical, as shown in figure (a). But if these values are not equal, then the distribution of
numerical values in the data set is not symmetrical, as shown in figure (b) and figure (c).
In both these cases, provided the distribution is only moderately skewed, the difference between the mean
and the mode is approximately three times the difference between the mean and the median:
$$\text{Mean} - \text{Mode} \approx 3(\text{Mean} - \text{Median})$$
In general, for a unimodal skewed (non-symmetrical) distribution, the median is preferred to
the mean for measuring location. It is not influenced by the frequency of occurrence of a
single observation value, as the mode is, and it is not affected by extreme values, as the mean is.
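Rearranging the empirical relation gives Mode ≈ 3 Median − 2 Mean. This is sometimes used to estimate the mode when it is ill-defined. A one-line Python sketch follows. The function name and the mean and median values in the example are hypothetical, and the result is only an approximation for moderately skewed distributions.

```python
def empirical_mode(mean, median):
    """Approximate mode from Mean - Mode = 3(Mean - Median): 3*Median - 2*Mean."""
    return 3 * median - 2 * mean


print(empirical_mode(40, 38))  # 34 (hypothetical mean 40, median 38)
```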
= ∑ ( )
(iii) When large weights are given to small items and small weights are given to large items, the
best measure of central tendency is the geometric mean. That is, when there are extreme values,
the best measure of central tendency to be used is the geometric mean.
Merits of Geometric Mean
(i) Geometric Mean is calculated based on all observations in the series.
(ii) Geometric Mean is clearly defined.
(iii) Geometric Mean is less affected by extreme values in the series than the arithmetic mean.
(iv) Geometric Mean is amenable to further algebraic treatment.
(v) Geometric Mean is useful in averaging ratios and percentages.
Demerits of Geometric Mean
(i) Geometric Mean is difficult to understand.
(ii) Geometric mean cannot be computed if both positive and negative values occur in
the series.
(iii) Geometric mean cannot be computed if one or more of the values in the series is zero.
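The formula for the geometric mean is on a page not reproduced in this section. The sketch below therefore assumes the standard logarithmic form, G.M. = antilog(∑ log X / N), using common logarithms. The function name `geometric_mean_log` is hypothetical; Python's standard library also provides `statistics.geometric_mean`. In line with the demerits listed above, the function refuses zero or negative items.

```python
import math


def geometric_mean_log(values):
    """G.M. = antilog(sum(log X) / N), using common (base-10) logarithms."""
    if any(x <= 0 for x in values):
        # log is undefined for zero and negative items, so the method breaks down
        raise ValueError("geometric mean needs strictly positive values")
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log  # antilog of the mean of the logs


print(round(geometric_mean_log([2, 8]), 6))     # 4.0
print(round(geometric_mean_log([1, 3, 9]), 6))  # 3.0
```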
where X1, X2… Xn refer to the values of the various items of the series, and
N = total number of items in the series.
Step 2 Multiply these reciprocals (1/X) by the respective frequencies, enter these products (f/X)
in a column headed f/X, and then obtain their total, i.e., ∑(f/X).
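The final step is not reproduced in this section. The sketch below assumes the usual formula for a discrete series, H.M. = N / ∑(f/X), where N = ∑f. The helper name and the small series are hypothetical.

```python
def harmonic_mean_discrete(xs, fs):
    """H.M. of a discrete series: N / sum(f / X), where N = sum(f)."""
    if any(x == 0 for x in xs):
        raise ValueError("H.M. is undefined when an item is zero")
    n = sum(fs)                                 # N = total frequency
    total = sum(f / x for x, f in zip(xs, fs))  # sum of f * (1/X)
    return n / total


# Hypothetical series: X = 2, 4, 8 with frequencies 1, 2, 1
print(round(harmonic_mean_discrete([2, 4, 8], [1, 2, 1]), 4))  # 3.5556
```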
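$$\Rightarrow a + b \ge 2\sqrt{ab}$$
$$\Rightarrow \frac{a + b}{2} \ge \sqrt{ab}$$
$$\Rightarrow \text{A.M.} \ge \text{G.M.} \quad \ldots (I)$$
Again, $\left(\frac{1}{\sqrt{a}} - \frac{1}{\sqrt{b}}\right)^2 \ge 0$
$$\Rightarrow \frac{1}{a} + \frac{1}{b} - \frac{2}{\sqrt{ab}} \ge 0$$
$$\Rightarrow \sqrt{ab} \ge \frac{2}{\frac{1}{a} + \frac{1}{b}}$$
$$\Rightarrow \text{G.M.} \ge \text{H.M.} \quad \ldots (II)$$
From Eqs. (I) and (II), we get
A.M. ≥ G.M. ≥ H.M.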
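(b) For any two positive numbers, A.M. × H.M. = (G.M.)².
Proof: Let a and b be the two positive numbers. We have
$$\text{A.M.} = \frac{a + b}{2}, \qquad \text{G.M.} = \sqrt{ab}$$
$$\text{H.M.} = \frac{2}{\frac{1}{a} + \frac{1}{b}} = \frac{2ab}{a + b}$$
$$(\text{A.M.}) \times (\text{H.M.}) = \frac{a + b}{2} \times \frac{2ab}{a + b} = ab = \left(\sqrt{ab}\right)^2 = (\text{G.M.})^2$$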
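Both results can be checked numerically for any pair of positive numbers. The Python sketch below uses a helper name and the pair 4 and 9, both chosen only for illustration. It computes the three means and asserts A.M. ≥ G.M. ≥ H.M. and A.M. × H.M. = (G.M.)². `math.isclose` is used because floating-point products are rarely exact.

```python
import math


def means_of_two(a, b):
    """Return (A.M., G.M., H.M.) of two positive numbers a and b."""
    am = (a + b) / 2
    gm = math.sqrt(a * b)
    hm = 2 * a * b / (a + b)
    return am, gm, hm


am, gm, hm = means_of_two(4, 9)
print(am, gm, round(hm, 4))            # 6.5 6.0 5.5385
assert am >= gm >= hm                  # A.M. >= G.M. >= H.M.
assert math.isclose(am * hm, gm ** 2)  # A.M. x H.M. = (G.M.)^2
```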
and 75th percentiles correspond to the first and third quartiles, respectively. Collectively, quartiles,
deciles, percentiles, and other values obtained by equal subdivisions of the data are called quantiles.
25 28 28 28 29 30 32 33 33 33 34 34 35 36 37
38 41 42 42 45 46 47 51 51 53 53 53 55 56 57
57 60 61 62 62 62 67 68 69 71 72 73 73 75 75
79 82 85 86 86 86 88 88 89 91 93 94 96 96 99
EXCEL
If the pull-down “Tools => Data Analysis => Descriptive Statistics” is given, the measures of
central tendency (median, mean and mode) as well as several measures of dispersion are obtained:
Mean 59.16667
Standard Error 2.867425
Median 57
Mode 28
Standard Deviation 22.21098
Sample Variance 493.3277
Kurtosis 1.24413
Skewness 0.167175
Range 74
Minimum 25
Maximum 99
Sum 3550
Count 60
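The figures in the Excel output can be cross-checked with Python's standard `statistics` module. This sketch is not part of any of the packages described here, and it assumes Python 3.8 or later. In those versions `statistics.mode` returns the first of several equally common values. It therefore reports 28, as Excel does, even though 28, 33, 53, 62 and 86 each occur three times in these data. Excel's kurtosis and skewness use their own small-sample adjustments and are left out.

```python
import math
import statistics

# The 60 test scores listed above (the variable called "testscore" in the outputs)
scores = [
    25, 28, 28, 28, 29, 30, 32, 33, 33, 33, 34, 34, 35, 36, 37,
    38, 41, 42, 42, 45, 46, 47, 51, 51, 53, 53, 53, 55, 56, 57,
    57, 60, 61, 62, 62, 62, 67, 68, 69, 71, 72, 73, 73, 75, 75,
    79, 82, 85, 86, 86, 86, 88, 88, 89, 91, 93, 94, 96, 96, 99,
]

sd = statistics.stdev(scores)  # sample standard deviation (divisor N - 1)
summary = {
    "Mean": statistics.mean(scores),
    "Standard Error": sd / math.sqrt(len(scores)),
    "Median": statistics.median(scores),
    "Mode": statistics.mode(scores),  # first of several tied modes
    "Standard Deviation": sd,
    "Sample Variance": statistics.variance(scores),
    "Range": max(scores) - min(scores),
    "Minimum": min(scores),
    "Maximum": max(scores),
    "Sum": sum(scores),
    "Count": len(scores),
}
for name, value in summary.items():
    print(f"{name:<20}{value:.6g}")
```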
MINITAB
If the pull-down “Stat => Basic Statistics => Display Descriptive Statistics” is given, the
following output is obtained:
Descriptive Statistics: testscore
Variable N N* Mean SE Mean StDev Minimum Q1 Median Q3 Maximum
testscore 60 0 59.17 2.87 22.21 25.00 37.25 57.00 78.00 99.00
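The quartiles in the MINITAB output can be reproduced with `statistics.quantiles` from Python's standard library (Python 3.8 or later). The `method="exclusive"` rule places the p-th quantile at position (N + 1)p in the sorted data. With it, the results agree with MINITAB's Q1 = 37.25, Median = 57.00 and Q3 = 78.00 for these 60 scores. Other software may use a different interpolation rule and report slightly different quartiles.

```python
import statistics

# The 60 test scores listed earlier in this section
scores = [
    25, 28, 28, 28, 29, 30, 32, 33, 33, 33, 34, 34, 35, 36, 37,
    38, 41, 42, 42, 45, 46, 47, 51, 51, 53, 53, 53, 55, 56, 57,
    57, 60, 61, 62, 62, 62, 67, 68, 69, 71, 72, 73, 73, 75, 75,
    79, 82, 85, 86, 86, 86, 88, 88, 89, 91, 93, 94, 96, 96, 99,
]

# n=4 cuts the sorted data into four equal parts: Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")
print(q1, q2, q3)  # 37.25 57.0 78.0
```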
SPSS
If the pull-down “Analyze => Descriptive Statistics => Descriptives” is given, the following
output is obtained:
Descriptive Statistics
N Minimum Maximum Mean Std. Deviation
testscore 60 25.00 99.00 59.1667 22.21098
Valid N (listwise) 60
SAS
If the pull-down “Solutions => Analysis => Analyst” is given and the data are read in as a file,
the pull-down “Statistics => Descriptive => Summary Statistics” gives the following output:
STATISTIX
If the pull-down “Statistics => Summary Statistics => Descriptive Statistics” is given in the
software package STATISTIX, the following output is obtained:
SOLVED EXAMPLES
Example 1: Write out the terms in each of the following indicated sums:
(a) ∑ (b) ∑ −3 (c) ∑ (d) ∑ (e) ∑ −
Solution: (a) + + + + +
(b) ( − 3) + ( − 3) + ( − 3) + ( − 3)
(c) + + + ⋯+ =
(d) + + + +
(e) ( − )+) ( − )+) ( − )= + + − 3a
Example 2: Express each of the following by using the summation notation:
(a) X + X + X + ⋯ + X
(b) (X + Y ) + (X + Y ) + ⋯ + (X + Y )
(c) f X + f X + ⋯ + f X
(d) $a_1b_1 + a_2b_2 + a_3b_3 + \cdots + a_Nb_N$
(e) $f_1X_1Y_1 + f_2X_2Y_2 + f_3X_3Y_3 + f_4X_4Y_4$
Solution: (a) ∑ X
(b) ∑ X +Y
(c) ∑ fX
(d) $\sum_{j=1}^{N} a_j b_j$
(e) $\sum_{j=1}^{4} f_j X_j Y_j$
Example 3: Calculate the arithmetic mean of the following observations.
32, 35, 36, 37, 39, 41, 43, 47, 48
Solution: $\text{A.M.} = \frac{\sum X}{N} = \frac{358}{9} = 39.78$
Example 4: In a survey of 5 cement companies, the profit (in ₹ crore) earned during a year was
15, 20, 10, 35 and 32. Find the arithmetic mean of the profit earned.
Solution: $\text{A.M.} = \frac{15 + 20 + 10 + 35 + 32}{5} = \frac{112}{5} = 22.4$
Thus, the arithmetic mean of the profit earned by these companies during a year was ₹ 22.4 crore.
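Examples 3 and 4 can be checked with a two-line Python function. The name `arithmetic_mean` is ours; Python's `statistics.mean` would give the same results.

```python
def arithmetic_mean(values):
    """Simple A.M. of an individual series: sum(X) / N."""
    return sum(values) / len(values)


print(round(arithmetic_mean([32, 35, 36, 37, 39, 41, 43, 47, 48]), 2))  # 39.78
print(arithmetic_mean([15, 20, 10, 35, 32]))                          # 22.4
```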
Example 5: An examination was held to decide the award of a scholarship. The weights of the
various subjects were different. The marks obtained by 3 candidates (out of 100 in each subject) are
given below:
Subject Weight Student A Student B Student C
Mathematics 4 60 57 62
Physics 3 62 61 67
Chemistry 2 55 53 60
English 1 67 77 49
Calculate the weighted A.M. to award the scholarship.
Solution: The calculation of the weighted arithmetic mean is shown below:
Subject Weight (w) A: Marks (X) A: wX B: Marks (X) B: wX C: Marks (X) C: wX
Mathematics 4 60 240 57 228 62 248
Physics 3 62 186 61 183 67 201
Chemistry 2 55 110 53 106 60 120
English 1 67 67 77 77 49 49
Total 10 244 603 248 594 238 618
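Applying the formulas for the weighted and simple means, we get:
$$\bar{X}_{wA} = \frac{\sum wX}{\sum w} = \frac{603}{10} = 60.3; \qquad \bar{X}_A = \frac{\sum X}{N} = \frac{244}{4} = 61$$
$$\bar{X}_{wB} = \frac{594}{10} = 59.4; \qquad \bar{X}_B = \frac{248}{4} = 62$$
$$\bar{X}_{wC} = \frac{618}{10} = 61.8; \qquad \bar{X}_C = \frac{238}{4} = 59.5$$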
From the above calculations, it may be noted that student B should get the scholarship as per the simple
A.M. values. According to the weighted A.M., however, student C should get the scholarship, because not all
subjects of the examination are of equal importance.
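The weighted and simple means in Example 5 can be reproduced with a short Python sketch. The helper `weighted_mean` is hypothetical. Its weights are listed in the subject order of the table (Mathematics, Physics, Chemistry, English).

```python
def weighted_mean(values, weights):
    """Weighted A.M.: sum(w * X) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


weights = [4, 3, 2, 1]  # Mathematics, Physics, Chemistry, English
marks = {
    "A": [60, 62, 55, 67],
    "B": [57, 61, 53, 77],
    "C": [62, 67, 60, 49],
}

for student, m in marks.items():
    simple = sum(m) / len(m)
    print(student, round(simple, 2), round(weighted_mean(m, weights), 2))
# A 61.0 60.3
# B 62.0 59.4
# C 59.5 61.8
```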
Example 6: The owner of a general store was interested in knowing the mean contribution (sales
price minus variable cost) of his stock of 5 items. The data are given below:
Product Contribution per Unit Quantity Sold
1 6 160
2 11 60
3 8 260
4 4 460
5 14 110
Solution: If the owner ignores the quantities sold of the individual products and gives equal importance
to each product, then the mean contribution per unit sold (in ₹) will be
$$\bar{X} = \frac{1}{5}(6 + 11 + 8 + 4 + 14) = \frac{43}{5} = 8.60$$
However, ₹ 8.60 may not necessarily be the mean contribution per unit of the different quantities of
the products sold. In this case, the owner has to take the number of units sold of each product
into consideration as weights. The weighted A.M. is computed by multiplying the units sold (w) of each
product by its contribution (X). That is,
$$\bar{X}_w = \frac{\sum wX}{\sum w} = \frac{6(160) + 11(60) + 8(260) + 4(460) + 14(110)}{160 + 60 + 260 + 460 + 110} = \frac{7{,}080}{1{,}050} \approx 6.74$$
This value, ₹ 6.74, is different from the earlier value, ₹ 8.60. The owner must use the value
₹ 6.74 for decision-making purposes.
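The contrast in Example 6 between the simple and the weighted mean can be checked in Python as follows. The variable names are ours, and the quantities sold are used as weights.

```python
contribution = [6, 11, 8, 4, 14]     # X: contribution per unit of products 1 to 5
quantity = [160, 60, 260, 460, 110]  # w: quantity sold, used as weights

simple = sum(contribution) / len(contribution)
weighted = sum(x * w for x, w in zip(contribution, quantity)) / sum(quantity)
print(round(simple, 2), round(weighted, 2))  # 8.6 6.74
```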
Example 7: Find the mean from the following data:
X 5 10 15 20 25 30 35 40
f 5 9 13 21 2 15 8 3
Solution: Total Frequency = ∑f = 5+9+13+21+2+15+8+3
= 76 = Number of values
X f fX
5 5 25
10 9 90
15 13 195
20 21 420
25 2 50
30 15 450
35 8 280
40 3 120
∑f = 76 ∑fX = 1630
$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{1630}{76} \approx 21.45$$
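The totals in the table and the resulting mean can be verified with a few lines of Python. The variable names are illustrative.

```python
xs = [5, 10, 15, 20, 25, 30, 35, 40]  # values X
fs = [5, 9, 13, 21, 2, 15, 8, 3]      # frequencies f

n = sum(fs)                                  # sum of f
sum_fx = sum(f * x for x, f in zip(xs, fs))  # sum of fX
print(n, sum_fx, round(sum_fx / n, 2))       # 76 1630 21.45
```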
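Example 8: If A, B, C and D are four chemicals costing ₹ 15, ₹ 12, ₹ 8 and ₹ 5 per 100 g,
respectively, and are contained in a given compound in the ratio of 1, 2, 3 and 4 parts, respectively,
what should be the price of the resultant compound?
Solution: Taking the parts as weights (w) and the prices as values (X),
$$\text{Weighted A.M.} = \frac{\sum wX}{\sum w} = \frac{15 \times 1 + 12 \times 2 + 8 \times 3 + 5 \times 4}{1 + 2 + 3 + 4} = \frac{83}{10} = 8.30$$
Thus, the price of the resultant compound is ₹ 8.30 per 100 g.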
Example 9: The daily earnings (in rupees) of 175 employees working on a daily basis in a firm
are:
Daily Earnings (₹) 100 120 140 160 180 200 220
Number of Employees 3 6 10 15 24 42 75
Calculate the average daily earning for all employees by the assumed mean method.
Solution: Let us take assumed mean, A = 160.
The calculation of average daily earning for employees is shown below: