Statistics

Chapter 1
Statistics is used in many fields, for example:

• Medical research
• Stock market
• Sales projection
• Weather forecasting

Sampling
Sampling is the process of selecting the data on which an analysis will be performed.

Sample vs Population
A population is the entire data set, such as the whole population of a country.

A sample is a subset of that population which is analyzed to make inferences about the population.

Sampling Frame
A sampling frame is the list from which a sample is selected, such as a citizen register for a country or
an employee list for a company.

Sampling error
Sampling error is the error that arises because a sample is only part of the population and so does not represent the population exactly.

Non-sampling error
Non-sampling error occurs due to poor sample design, inaccurate measurement, bias in data collection, etc.

Random sampling
Random sampling is the process of selecting a subset (sample) from a population in such a way that every data point is
equally likely to be included in the sample.

For example, in a group of 5 males and 5 females, each person is equally likely to be selected.

Stratified sampling
Stratified sampling is the process of dividing the population into layers or groups (strata) and then performing random sampling within each group.

For example, if a group of 12 people consists of 6 males and 6 females and we need to select 50%,
we sample from both categories: 6 people are selected, 3 from the males and 3 from the females.

Systematic sampling
Systematic sampling is the process of selecting your sample by picking every Kth element in your population. You don't need
a complete list for this.
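
The three sampling methods above can be sketched in Python. This is only an illustration: the function names and the labels M1..M6 and F1..F6 are hypothetical, chosen to mirror the 6 males / 6 females example.

```python
import random

def simple_random_sample(population, k, rng):
    """Random sampling: every element has the same chance of being picked."""
    return rng.sample(population, k)

def stratified_sample(strata, fraction, rng):
    """Stratified sampling: draw the same fraction at random from every group."""
    picked = []
    for members in strata.values():
        k = round(len(members) * fraction)
        picked.extend(rng.sample(members, k))
    return picked

def systematic_sample(population, k, start=0):
    """Systematic sampling: pick every k-th element, beginning at index `start`."""
    return population[start::k]

rng = random.Random(42)  # fixed seed so the run is repeatable
males = [f"M{i}" for i in range(1, 7)]      # hypothetical labels
females = [f"F{i}" for i in range(1, 7)]    # hypothetical labels
people = males + females

random_pick = simple_random_sample(people, 6, rng)
stratified_pick = stratified_sample({"male": males, "female": females}, 0.5, rng)
systematic_pick = systematic_sample(people, 3)

print(random_pick)       # any 6 people; the male/female split can vary
print(stratified_pick)   # always 3 males and 3 females
print(systematic_pick)   # ['M1', 'M4', 'F1', 'F4']
```

Note that only the stratified version guarantees the 3 + 3 split; plain random sampling may return an uneven mix.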

Central Tendencies
Central tendency indicates where the middle or center of the distribution of our data lies.

• Mean
• Mode
• Median

Mode – the most frequent data point, in other words the value that occurs the greatest
number of times.

Median – the middle of the data: when the data is arranged in ascending order, the element that falls
right at the center is the median.

Mean – the average of the data. In simpler terms, it is the sum of the values divided by the total number of
values. The population mean is represented by the Greek letter μ (mu) and the sample mean by x̄; the Greek capital letter Σ (sigma) denotes the sum in the formula.

Trimmed Mean – deals with outliers by trimming (removing) the same number of values from both ends of the
sorted data before averaging, so that extreme values do not distort the result.

Weighted mean – used when certain values are supposed to count more than others, for example when
calculating the average grade of a student based on the weight of each graded item.

Item   Weight (%)   Grade   Weight × Grade
Q1     10           4       4 × 0.10 = 0.4
Q2     20           3.5     3.5 × 0.20 = 0.7
Q3     70           3       3 × 0.70 = 2.1

Weighted mean = 0.4 + 0.7 + 2.1 = 3.2
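
A minimal Python sketch of the weighted and trimmed means. The function names are hypothetical. The weights and grades come from the grade example above, and the income figures are the ten monthly incomes used in the mean calculation of these notes; trimming 2 values from each end is an arbitrary choice for illustration.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def trimmed_mean(values, k):
    """Drop the k smallest and k largest values, then average the rest."""
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

grades = [4, 3.5, 3]
weights = [10, 20, 70]          # percentages
grade_average = weighted_mean(grades, weights)
print(grade_average)            # 3.2

incomes = [1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950]
trimmed = trimmed_mean(incomes, 2)   # ignores 1050, 1100, 1920 and 1950
print(round(trimmed, 2))
```

The trimmed mean of the incomes is noticeably higher than the plain mean of 1665, because the two very low incomes (1050 and 1100) no longer pull the average down.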

Calculating Mean
Direct and shortcut methods

Direct method

Employee   Monthly income (X)
1          1780
2          1760
3          1690
4          1750
5          1840
6          1920
7          1100
8          1810
9          1050
10         1950
Total      16650

Formula: Mean = ΣX / n = 16650 / 10 = 1665

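Shortcut method
Take an assumed mean A (here A = 1920, the income of employee 6) and work with the deviations x = X - A.

Employee   Monthly income (X)   x = X - A
1          1780                 -140
2          1760                 -160
3          1690                 -230
4          1750                 -170
5          1840                 -80
6 (A)      1920                 0
7          1100                 -820
8          1810                 -110
9          1050                 -870
10         1950                 30
Total                           -2550

Formula: Mean = A + Σx / n = 1920 + (-2550 / 10) = 1665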
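
The two methods can be checked against each other with a few lines of Python. This is a sketch using the ten incomes above; the variable names are illustrative.

```python
incomes = [1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950]
n = len(incomes)

# Direct method: sum of the values divided by their count
direct_mean = sum(incomes) / n

# Shortcut method: deviations from an assumed mean A
assumed_mean = 1920
deviations = [x - assumed_mean for x in incomes]
shortcut_mean = assumed_mean + sum(deviations) / n

print(sum(incomes), sum(deviations))   # 16650 -2550
print(direct_mean, shortcut_mean)      # 1665.0 1665.0
```

Any value can serve as the assumed mean; the result is the same, only the size of the deviations changes.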

Continuous series
Shortcut method (assumed mean A = 35)

Marks    f     m (X)   x = X - A   fx
0-10     5     5       -30         -150
10-20    10    15      -20         -200
20-30    25    25      -10         -250
30-40    30    35      0           0
40-50    20    45      10          200
50-60    10    55      20          200
Total    100                       -200

Formula: Mean = A + Σfx / Σf = 35 + (-200 / 100) = 33

Continuous series
Step deviation method (assumed mean A = 35, class width i = 10)

Marks    f     m (X)   x = X - A   x' = x / i   fx'
0-10     5     5       -30         -3           -15
10-20    10    15      -20         -2           -20
20-30    25    25      -10         -1           -25
30-40    30    35      0           0            0
40-50    20    45      10          1            20
50-60    10    55      20          2            20
Total    100                                    -20

Formula: Mean = A + (Σfx' / Σf) × i = 35 + (-20 / 100) × 10 = 33
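
A short Python sketch of both grouped-data calculations, using the marks table above. The tuple layout (lower limit, upper limit, frequency) and the variable names are assumptions made for the example.

```python
# (lower limit, upper limit, frequency) for each class of marks
classes = [(0, 10, 5), (10, 20, 10), (20, 30, 25),
           (30, 40, 30), (40, 50, 20), (50, 60, 10)]

A = 35   # assumed mean (midpoint of the 30-40 class)
i = 10   # common class width

total_f = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# Shortcut method: deviations of the midpoints from A
sum_fx = sum(f * (m - A) for f, m in zip(freqs, midpoints))
shortcut_mean = A + sum_fx / total_f

# Step deviation method: the same deviations divided by the class width
sum_fx_step = sum(f * ((m - A) / i) for f, m in zip(freqs, midpoints))
step_mean = A + (sum_fx_step / total_f) * i

print(total_f, sum_fx, sum_fx_step)   # 100 -200.0 -20.0
print(shortcut_mean, step_mean)       # both 33.0
```

The step deviation method only rescales the deviations, so it needs a common class width and always agrees with the shortcut method.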
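
Median
Continuous series

Class    f     Midpoint (X)   cf
0-5      2     2.5            2
5-10     5     7.5            7
10-20    1     15             8
20-30    3     25             11
30-40    12    35             23
Total    23    85

N = Σf = 23, so N/2 = 11.5. The median class is the first class whose cumulative frequency (cf) reaches 11.5, which is 30-40.
So L = 30 (lower limit of the median class), f = 12 (its frequency), i = 10 (its width) and cf = 11 (cumulative frequency of the class before it).

Formula: Median = L + ((N/2 - cf) / f) × i = 30 + ((11.5 - 11) / 12) × 10 ≈ 30.42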
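
A minimal sketch of the grouped median formula L + ((N/2 - cf) / f) × i in Python. It assumes the class frequencies 2, 5, 1, 3, 12 from the table above and recomputes the cumulative frequencies from them; the function name is hypothetical.

```python
def grouped_median(classes):
    """classes: list of (lower limit, upper limit, frequency), in ascending order."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cf_before = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cf_before + f >= half:
            # This is the median class: interpolate inside it
            return lower + (half - cf_before) / f * (upper - lower)
        cf_before += f
    raise ValueError("no classes given")

classes = [(0, 5, 2), (5, 10, 5), (10, 20, 1), (20, 30, 3), (30, 40, 12)]
median = grouped_median(classes)
print(round(median, 2))   # 30.42
```

The median has to fall inside the median class (30-40 here), which is a quick sanity check on any hand calculation.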

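Mode
Continuous series

Marks     f
0-10      3
10-20     5
20-30     7
30-40     10
40-50     12   (f0)
50-60     15   (f1)
60-70     12   (f2)
70-80     6
80-90     2
90-100    8

The highest frequency is 15, so the modal class is 50-60, with L = 50 and class width i = 10.
f1 is the frequency of the modal class, f0 that of the class before it and f2 that of the class after it.

Formula: Mode = L + ((f1 - f0) / (2f1 - f0 - f2)) × i = 50 + ((15 - 12) / (30 - 12 - 12)) × 10 = 55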
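
The grouped mode formula can be sketched in Python as follows. The function name is hypothetical, and treating a missing neighbour class as frequency 0 is an assumption of this sketch.

```python
def grouped_mode(classes):
    """classes: list of (lower limit, upper limit, frequency), in ascending order."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                       # position of the modal class
    lower, upper, f1 = classes[k]
    f0 = freqs[k - 1] if k > 0 else 0                 # class before the modal class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0    # class after the modal class
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

classes = [(0, 10, 3), (10, 20, 5), (20, 30, 7), (30, 40, 10), (40, 50, 12),
           (50, 60, 15), (60, 70, 12), (70, 80, 6), (80, 90, 2), (90, 100, 8)]
mode = grouped_mode(classes)
print(mode)   # 55.0
```

Because f0 and f2 are equal (12 and 12), the mode falls exactly in the middle of the modal class.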
Variation
Variation in statistics shows how data is dispersed or spread out. Several measures of
variation are used in statistics:

• Range
• Quartiles
• Variance

Range – the difference between the highest and the lowest values in our data set. The range tells us the
distance between the lowest and highest values in our data.

Percentiles – scores that describe the value below which a given percentage of observations fall. Example:
if x is at the 70th percentile, it means 70% of the data points in our sample are below x.

Quartiles – break the data into 4 parts so as to describe the spread of the data in a way that is
less influenced by outliers.

Q1 - 25th percentile

Q2 - 50th percentile (the median)

Q3 - 75th percentile

Interquartile Range (IQR) – the difference between the upper and lower quartiles, Q3 - Q1.
This gives us a better idea of the spread of the middle half of the data.
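
A small Python sketch of the range, quartiles and IQR, using the standard library `statistics.quantiles` function. The data are the ten monthly incomes from the mean calculation earlier in these notes. The `method="inclusive"` setting is one of several quartile conventions, so other tools may give slightly different quartile values.

```python
import statistics

incomes = [1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950]

# Range: highest value minus lowest value
data_range = max(incomes) - min(incomes)

# Quartiles: the three cut points that split the sorted data into 4 parts
q1, q2, q3 = statistics.quantiles(incomes, n=4, method="inclusive")

# Interquartile range: spread of the middle half of the data
iqr = q3 - q1

print(data_range)    # 900
print(q1, q2, q3)    # 1705.0 1770.0 1832.5
print(iqr)           # 127.5
```

The range (900) is dominated by the two very low incomes, while the IQR (127.5) shows that the middle half of the incomes is tightly grouped.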

Chapter 2

Correlation

Correlation is a measure of the strength of a linear relationship between two quantitative variables.

Topics covered for correlation
Significance
Types of correlation
Methods of studying correlation
1- Scatter diagram method
2- Graphic method
3- Karl Pearson's coefficient of correlation
4- Concurrent deviation method
Coefficient of determination
Covariance
Properties of r
Merits and limitations of r

Type of correlation
Positive Correlation – describes a linear relationship in which two quantitative variables
move in the same direction: as one increases, the other increases.

Example: as the temperature increases, cola sales increase.

Negative Correlation – describes a linear relationship in which two quantitative variables
move in opposite directions: as one increases, the other decreases.

Example: as the temperature increases, sales of hot drinks decrease.
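
The sign and strength of a linear relationship can be computed numerically. The sketch below assumes the standard definition of Karl Pearson's coefficient, r = covariance(x, y) / (SD of x × SD of y), which these notes name but do not spell out. The x and y values are taken from the linear regression table later in these notes, and the function name is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]
r = pearson_r(xs, ys)
print(round(r, 4))   # -0.9192
```

The value is close to -1, so these two variables have a strong negative correlation: y tends to fall as x rises.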

Methods of studying correlation

1- Scatter diagram method
2- Graphic method
3- Karl Pearson's coefficient of correlation
4- Concurrent deviation method

2- Graphic Method
3- Karl Pearson's Coefficient of Correlation (important)
Covariance
4- Rank Correlation Coefficient

Chapter 3

Method of Studying Variation


1- Range
2- The interquartile range and the quartile deviation
3- The Mean Deviation or Average Deviation
4- The Standard Deviation

4- Variance and Standard Deviation

The concept of standard deviation was introduced by Karl Pearson in 1893. It is by far the most important
and widely used measure of dispersion.

It is free from the defects from which the earlier methods suffer and satisfies most of the
properties of a good measure of dispersion.
SD is also known as root mean square deviation.

The SD measures the absolute dispersion of a distribution: the greater the amount of dispersion, the greater the standard deviation.

A small SD means a high degree of uniformity of the observations as well as homogeneity of the series.

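Calculation of Standard Deviation (Individual Observations)

There are 2 methods:
1- By taking deviations of the items from the actual mean

2- By taking deviations of the items from an assumed mean

Deviations taken from the actual mean: when deviations are taken from the actual mean, the following formula is applied:

σ = √(Σx² / N), where x = X - X̄ is the deviation of each item from the actual mean X̄ and N is the number of observations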
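
Both methods can be sketched in Python. This is an illustration with assumptions: the data are the ten monthly incomes from the mean calculation earlier in these notes, the SD is the population form (dividing by N), and the assumed-mean version uses the standard identity σ = √(Σd²/N - (Σd/N)²) with d = X - A, which these notes do not state.

```python
import math

incomes = [1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950]
n = len(incomes)

# Method 1: deviations from the actual mean
mean = sum(incomes) / n
sd_actual = math.sqrt(sum((x - mean) ** 2 for x in incomes) / n)

# Method 2: deviations from an assumed mean A
A = 1920
d = [x - A for x in incomes]
sd_assumed = math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)

print(round(sd_actual, 2), round(sd_assumed, 2))   # both about 304.21
```

The correction term (Σd/N)² removes the effect of choosing A instead of the true mean, which is why the two methods agree.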

Chapter 4

Regression Analysis

Introduction
1- After establishing (through correlation) that two variables are closely related, we may be
interested in estimating or predicting the value of one variable given the value of the other.
2- Example: if we know that advertising and sales are closely correlated, we can find the expected
amount of sales for a given advertising expenditure, or the expenditure required
to attain a given amount of sales.
3- Similarly, if we know that the yield of rice and the amount of rainfall are closely related, we may
find the amount of rain required to achieve a certain production figure.
4- Regression analysis reveals the average relationship between two variables, and this makes
estimation or prediction possible.
5- The literal meaning of regression is "the act of returning or going back".
6- The term regression was first used by Sir Francis Galton.
7- Regression analysis (RA) attempts to establish the nature of the relationship between variables, that is, to study the
functional relationship between the variables and thereby provide a mechanism for prediction
or forecasting.
8- RA is a statistical device with the help of which we can estimate or predict
the unknown values of one variable from known values of another variable.
9- The variable which is used to predict the variable of interest is called the independent
variable or explanatory variable; the variable we are trying to predict is called the dependent variable or
explained variable.
10- The analysis used is called simple linear regression analysis: simple because there is only one
predictor or independent variable, and linear because of the assumed linear relationship
between the dependent and the independent variable.
11- The term linear refers to the equation of a straight line, y = a + bx,
where a and b are constants.

Correlation Analysis Vs Regression Analysis

1- CA – It measures the degree of covariability between x and y.
RA – It studies the nature of the relationship between the variables.
2- CA – We cannot say that one variable is the cause and the other the effect.
RA – One variable is taken as dependent and the other as independent, which makes it
possible to study the cause-and-effect relationship.

Linear Regression Method 1

N      x     y     xy     x^2    y^2
1      6     9     54     36     81
2      2     11    22     4      121
3      10    5     50     100    25
4      4     8     32     16     64
5      8     7     56     64     49
Sum    30    40    214    220    340

Find a and b.

Regression of y on x
y = a + bx
Σy = Na + bΣx            ... eq 1
Σxy = aΣx + bΣx^2        ... eq 2
Now solve eq 1 and eq 2:
40 = 5a + 30b            ... eq 1 (multiply by 6)
214 = 30a + 220b         ... eq 2

240 = 30a + 180b         ... eq 1
214 = 30a + 220b         ... eq 2

Subtracting eq 2 from eq 1: 26 = -40b, so b = -0.65
40 = 5a + 30(-0.65)
5a = 40 + 19.5 = 59.5, so a = 11.9
y = 11.9 - 0.65x

Regression of x on y
x = a + by
Σx = Na + bΣy            ... eq 1
Σxy = aΣy + bΣy^2        ... eq 2
Now solve eq 1 and eq 2:
30 = 5a + 40b            ... eq 1 (multiply by 8)
214 = 40a + 340b         ... eq 2

240 = 40a + 320b         ... eq 1
214 = 40a + 340b         ... eq 2

Subtracting eq 2 from eq 1: 26 = -20b, so b = -1.3
30 = 5a + 40(-1.3)
5a = 30 + 52 = 82, so a = 16.4
x = 16.4 - 1.3y
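
The worked example can be reproduced in Python by solving the two normal equations directly. This is a sketch; the function name `fit_line` is hypothetical, and the closed-form expressions for a and b are simply the algebraic solution of eq 1 and eq 2.

```python
def fit_line(u, v):
    """Least-squares line v = a + b*u, obtained by solving the two normal equations."""
    n = len(u)
    sum_u, sum_v = sum(u), sum(v)
    sum_uv = sum(p * q for p, q in zip(u, v))
    sum_uu = sum(p * p for p in u)
    b = (n * sum_uv - sum_u * sum_v) / (n * sum_uu - sum_u ** 2)
    a = (sum_v - b * sum_u) / n
    return a, b

xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]

a_yx, b_yx = fit_line(xs, ys)   # regression of y on x
a_xy, b_xy = fit_line(ys, xs)   # regression of x on y

print(round(a_yx, 2), round(b_yx, 2))   # 11.9 -0.65
print(round(a_xy, 2), round(b_xy, 2))   # 16.4 -1.3

# Predicting y for a given x with the first line, e.g. x = 5
print(round(a_yx + b_yx * 5, 2))        # 8.65
```

The two lines are different: y on x is used to predict y from x, and x on y is used to predict x from y.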
Chapter 5- Time Series Analysis
Method 4 – Square root method
Probability
