Statistics
Chapter 1
Statistics is used in various fields, some of which are:
Medical Research
Stock Market
Sales projection
Weather Forecasting
Sampling
Sampling is the process of collecting data on which to perform analysis.
Sample vs Population
The population is the entire data set, such as the whole population of a country. A sample is a subset of the population that is selected for analysis.
Sampling Frame
A sampling frame is the list from which a sample is selected, such as a citizen register for a country or
an employee list for a company.
Sampling error
Sampling error is the error that arises when our sample does not accurately represent our population.
Random sampling
Random sampling is the process of selecting a subset (sample) from a population in such a way that every data point is
equally likely to be included in the sample.
Stratified sampling
Stratified sampling is the process of dividing the population into layers or groups (strata) and then performing random sampling within each group.
For example, if the population consists of two groups, male and female, and we need to select 50%,
we select 50% from each of the two groups.
So out of 12 people, 6 are selected: 3 from the male group and 3 from the female group.
Systematic sampling
Systematic sampling is the process of selecting your sample by picking every kth element in your population. You don’t need
a list for this.
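The three sampling schemes above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names, the 12-person population (M1 to M6 and F1 to F6) and the fixed seed are assumptions made for the example.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Pick k items so that every item is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from every stratum (group)."""
    rng = random.Random(seed)
    picked = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)
        picked[name] = rng.sample(members, k)
    return picked


def systematic_sample(population, k, start=0):
    """Pick every k-th item, beginning at index `start`."""
    return population[start::k]


# Hypothetical population: 6 males (M1..M6) followed by 6 females (F1..F6).
people = [f"M{i}" for i in range(1, 7)] + [f"F{i}" for i in range(1, 7)]
strata = {"male": people[:6], "female": people[6:]}

random_pick = simple_random_sample(people, 6, seed=1)
stratified_pick = stratified_sample(strata, 0.5, seed=1)  # 3 males, 3 females
systematic_pick = systematic_sample(people, 4)            # every 4th person
```

Note that the simple random sample may contain any mix of males and females, while the stratified sample always contains exactly 3 of each.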
Central Tendencies
Central tendency indicates where the middle or center of the distribution of our data lies.
Mean
Mode
Median
Mode – is the most frequent data point, in other words the one which occurs the greatest
number of times.
Median – is the middle of the data. If the data is arranged in ascending order, the data element which occurs
right at the center is the median. If the number of values is even, the median is the average of the two middle values.
Mean – is the average of the data. In simpler terms, it is the sum of the values divided by the total number of
values. The sum is written with the Greek capital letter sigma (Σ), so mean = Σx/n.
Trimmed Mean – is used to deal with outliers by trimming, or removing, the same number of values from both ends
of the sorted data so as to get rid of outliers.
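A small sketch of these measures using Python's standard `statistics` module. The data list and the `trimmed_mean` helper are hypothetical, chosen so that one outlier (100) pulls the mean away from the center.

```python
import statistics


def trimmed_mean(values, k):
    """Mean after dropping the k smallest and the k largest values."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k trims away every value")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical data with one outlier (100).
values = [2, 3, 3, 5, 7, 8, 100]

mean_value = statistics.mean(values)      # pulled upward by the outlier
median_value = statistics.median(values)  # middle value of the sorted data
mode_value = statistics.mode(values)      # most frequent value
trimmed = trimmed_mean(values, 1)         # drops 2 and 100 before averaging
```

Here the median (5) and the trimmed mean (5.2) stay close to the bulk of the data, while the plain mean is about 18.3.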
Weighted mean – is used when certain values are supposed to count more in some context, for example
calculating the average grade of a student based on the weight of each assessment.
Item   Weight (%)   Grade   Grade × weight
Q1     10           4       4 × 0.10 = 0.4
Q2     20           3.5     3.5 × 0.20 = 0.7
Q3     70           3       3 × 0.70 = 2.1
Weighted mean = 0.4 + 0.7 + 2.1 = 3.2
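The weighted-mean calculation in the table above can be checked with a short Python sketch. The function name `weighted_mean` is an assumption for the example; the grades and weights are the ones from the table.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the total weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


grades = [4, 3.5, 3]     # grades for Q1, Q2, Q3
weights = [10, 20, 70]   # weights in percent

result = weighted_mean(grades, weights)  # (40 + 70 + 210) / 100
```

Dividing by the total weight means the weights do not have to add up to 100.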
Calculating Mean
Mean
Direct and short cut method
Direct method
Employee   Monthly income (X)
1          1780
2          1760
3          1690
4          1750
5          1840
6          1920
7          1100
8          1810
9          1050
10         1950
Total      16650
Formula: mean = ΣX/n = 16650/10 = 1665
Shortcut method
Assumed mean A = 1920 (the income of employee 6)
Employee   Monthly income (X)   d = X - A
1          1780                 -140
2          1760                 -160
3          1690                 -230
4          1750                 -170
5          1840                 -80
6 (A)      1920                 0
7          1100                 -820
8          1810                 -110
9          1050                 -870
10         1950                 30
Total                           -2550
Formula: mean = A + Σd/n = 1920 + (-2550)/10 = 1665
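Both methods can be verified with a few lines of Python. This is a sketch using the ten monthly incomes from the tables above; the function names are assumptions for the example.

```python
incomes = [1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950]


def direct_mean(xs):
    """Direct method: sum of the values divided by their count."""
    return sum(xs) / len(xs)


def assumed_mean_method(xs, assumed):
    """Shortcut method: assumed mean plus the average deviation from it."""
    deviations = [x - assumed for x in xs]
    return assumed + sum(deviations) / len(xs)


direct = direct_mean(incomes)                  # 16650 / 10
shortcut = assumed_mean_method(incomes, 1920)  # 1920 + (-2550) / 10
other = assumed_mean_method(incomes, 1700)     # a different assumed mean
```

Any assumed mean gives the same result; choosing one close to the true mean only keeps the deviations small and the arithmetic easy.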
Continuous series
shortcut method
Continuous series
step deviation method
Mode
Continuous series
Mode
step deviation method
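Marks     f
0-10      3
10-20     5
20-30     7
30-40     10
40-50     12   f0
50-60     15   f1
60-70     12   f2
70-80     6
80-90     2
90-100    8
The highest frequency is 15, so the modal class is 50-60, with lower limit l = 50 and class width i = 10.
Mode = l + (f1 - f0)/(2f1 - f0 - f2) × i = 50 + (15 - 12)/(30 - 12 - 12) × 10 = 55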
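The grouped-data mode formula can be sketched in Python as below. The function name `grouped_mode` is an assumption; the sketch also assumes equal class widths and a single modal class, and it does not handle the case where the denominator 2f1 - f0 - f2 is zero.

```python
def grouped_mode(classes, freqs):
    """Mode of a continuous series: l + (f1 - f0) / (2*f1 - f0 - f2) * i.

    classes: list of (lower, upper) class limits, all of equal width.
    freqs:   frequency of each class, in the same order.
    """
    m = max(range(len(freqs)), key=lambda idx: freqs[idx])  # modal class index
    f1 = freqs[m]
    f0 = freqs[m - 1] if m > 0 else 0               # class before the modal class
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0  # class after the modal class
    lower, upper = classes[m]
    width = upper - lower
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * width


classes = [(lo, lo + 10) for lo in range(0, 100, 10)]  # 0-10, 10-20, ..., 90-100
freqs = [3, 5, 7, 10, 12, 15, 12, 6, 2, 8]

mode_value = grouped_mode(classes, freqs)  # 50 + 3/6 * 10
```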
Variation
Variation in statistics is used to show how data is dispersed or spread out. Several measures of
variation are used in statistics.
Range
Quartiles
Variance
Range – is the difference between the highest and the lowest values in our data set. The range tells us the
distance between the lowest and highest values in our data.
Percentiles – are scores that describe the value below which a given percentage of observations fall. For example,
if x is at the 70th percentile, it means 70% of the other data points from our sample are below x.
Quartiles – are used to break the data into 4 parts so as to better find the spread of data in a way that is
less influenced by outliers.
Q1 – 25th percentile
Q2 – 50th percentile (the median)
Q3 – 75th percentile
Interquartile Range (IQR) – is the difference between the upper and lower quartiles: IQR = Q3 - Q1.
This gives us a better idea of the spread of the data, because it is not affected by the extreme values.
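A short Python sketch of the range, the quartiles and the IQR, using `statistics.quantiles` from the standard library. The data set is hypothetical and contains one outlier (100). Several conventions exist for computing quartiles; the `method="inclusive"` option used here is one of them, so other tools may report slightly different values.

```python
import statistics

# Hypothetical data with one outlier (100).
data = [1, 3, 5, 7, 9, 11, 13, 15, 100]

data_range = max(data) - min(data)  # stretched by the outlier

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                       # spread of the middle half of the data
```

The range is 99 because of the single outlier, while the IQR is only 8.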
Chapter 2
Correlation
Correlation is a measure of the strength of a linear relationship between two quantitative variables.
Topics covered for correlation
Significance
Types of correlation
Methods of studying correlation
1-Scatter diagram method
2-Graphic method
3-Karl Pearson’s coefficient of correlation
4-Concurrent deviation method
Coefficient of determination
Covariance
Properties of r
Merits and limitations of r
Types of correlation
Positive Correlation – is a term that is used to describe a positive linear relationship between
two quantitative variables, that is, the two variables tend to increase or decrease together.
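As an illustration, Karl Pearson's coefficient of correlation r can be computed with the sketch below. The function name and the two small data sets are hypothetical; r is calculated as the sum of the products of deviations divided by the square root of the product of the sums of squared deviations.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


# Hypothetical data: y tends to rise as x rises, so r is positive.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)                        # about 0.77

# A perfectly linear increasing relationship gives r = 1.
r_perfect = pearson_r([1, 2, 3], [2, 4, 6])
```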
The standard deviation (SD) is free from the defects from which the earlier measures suffer, and it satisfies most of the
properties of a good measure of dispersion.
SD is also known as the root mean square deviation.
The SD measures the absolute dispersion of a distribution: the greater the amount of dispersion, the greater the SD.
A small SD means a high degree of uniformity of the observations as well as homogeneity of the series.
2 methods
1-By taking deviations of items from the actual mean
When deviations are taken from the actual mean, the following formula is applied:
SD = √(Σx²/N), where x = X - mean and N is the number of observations
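The deviation-from-actual-mean calculation can be sketched in Python as follows. The sketch assumes the usual root mean square form, SD = sqrt(sum of squared deviations / N), which divides by N; the sample form divides by N - 1 instead. The function name and the data are hypothetical.

```python
import math
import statistics


def sd_from_actual_mean(xs):
    """Root mean square deviation of the items from their actual mean."""
    n = len(xs)
    mean = sum(xs) / n
    squared_deviations = [(x - mean) ** 2 for x in xs]
    return math.sqrt(sum(squared_deviations) / n)


# Hypothetical data: mean is 5, squared deviations add up to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sd = sd_from_actual_mean(data)       # sqrt(32 / 8)
sd_check = statistics.pstdev(data)   # standard-library population SD
```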
Chapter 4
Regression Analysis
Introduction
1- After establishing (through correlation) the fact that two variables are closely related, we may be
interested in estimating or predicting the value of one variable given the value of another.
2- Example: if we know that advertising and sales are closely correlated, we can find the expected
amount of sales for a given advertising expenditure, or the required amount of expenditure
for attaining a given amount of sales.
3- Similarly, if we know that the yield of rice and the amount of rainfall are closely related, we may
find out the amount of rain required to achieve a certain production figure.
4- Regression analysis reveals the average relationship between two variables, and this makes
estimation or prediction possible.
5- Regression literally means the act of returning or going back.
6- The term regression was first used by Sir Francis Galton.
7- Regression analysis (RA) attempts to establish the nature of the relationship between variables, that is, to study the
functional relationship between the variables and thereby provide a mechanism for prediction
or forecasting.
8- RA is a statistical device with the help of which we are in a position to estimate or predict
the unknown values of one variable from the known values of another variable.
9- The variable which is used to predict the variable of interest is called the independent
variable or explanatory variable; the variable we are trying to predict is called the dependent variable or
explained variable.
10- The analysis used is called simple linear regression analysis: simple because there is only one
predictor or independent variable, and linear because of the assumed linear relationship
between the dependent and the independent variable.
11- The term linear means the equation of a straight line, y = a + bx,
where a and b are constants.
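Find a and b
y on x
y = a + bx
Σy = Na + bΣx ..... eq 1
Σxy = aΣx + bΣx² ..... eq 2
Now solve eq 1 and eq 2 with N = 5, Σx = 30, Σy = 40, Σxy = 214, Σx² = 220
40 = 5a + 30b ..... eq 1 (multiply by 6)
214 = 30a + 220b ..... eq 2
240 = 30a + 180b ..... eq 1 × 6
214 = 30a + 220b ..... eq 2
Subtracting: -26 = 40b, so b = -0.65
Substituting b in eq 1: 5a = 40 + 19.5 = 59.5, so a = 11.9
y = 11.9 - 0.65x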
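x on y
x = a + by
Σx = Na + bΣy ..... eq 1
Σxy = aΣy + bΣy² ..... eq 2
Now solve eq 1 and eq 2 with N = 5, Σx = 30, Σy = 40, Σxy = 214, Σy² = 340
30 = 5a + 40b ..... eq 1 (multiply by 8)
214 = 40a + 340b ..... eq 2
240 = 40a + 320b ..... eq 1 × 8
214 = 40a + 340b ..... eq 2
Subtracting: -26 = 20b, so b = -1.3 and a = 16.4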
Substituting b in eq 1: 30 = 5a + 40(-1.3)
5a = 30 + 52 = 82, so a = 16.4
x = 16.4 - 1.3y
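Both regression lines can be checked by solving the two normal equations directly from the sums. The sketch below is one way to do it; the function name `fit_from_sums` and its parameter names are assumptions, and the sums are the ones used in the worked examples above (N = 5, Σx = 30, Σy = 40, Σxy = 214, Σx² = 220, Σy² = 340).

```python
def fit_from_sums(n, sum_u, sum_v, sum_uv, sum_uu):
    """Solve the normal equations for the line v = a + b*u.

        sum_v  = n*a     + b*sum_u
        sum_uv = a*sum_u + b*sum_uu
    """
    b = (n * sum_uv - sum_u * sum_v) / (n * sum_uu - sum_u ** 2)
    a = (sum_v - b * sum_u) / n
    return a, b


# y on x: u is x, v is y.
a_yx, b_yx = fit_from_sums(5, 30, 40, 214, 220)  # y = 11.9 - 0.65x

# x on y: u is y, v is x.
a_xy, b_xy = fit_from_sums(5, 40, 30, 214, 340)  # x = 16.4 - 1.3y
```

The formula for b is what elimination produces when eq 1 is multiplied so that the coefficients of a match and the two equations are subtracted.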
Chapter 5- Time Series Analysis
Method 4 – Square root method
Probability