Introduction To Statistics PDF
K M Billah
Lecturer, Dept. of Civil Engineering, UU
Two Types of Statistics
• Descriptive statistics of a POPULATION
• Relevant notation (Greek):
– μ mean
– N population size
– Σ sum
88 + 95 = 183
97 is the MODE
Central Tendency and the
Shape of the Distribution
• Because the mean, the median, and the mode are all measuring
central tendency, the three measures are often systematically
related to each other.
• In a symmetrical distribution, the mean and median will always be
equal.
• If a symmetrical distribution has only one mode, then the mode,
mean, and median will all have the same value.
• In a skewed distribution, the mode will be located at the peak on one
side and the mean usually will be displaced toward the tail on the
other side.
• The median is usually located between the mean and the mode.
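The ordering described above can be checked on a small example. The sketch below uses Python's standard `statistics` module on a hypothetical right-skewed data set (the values are an illustration, not data from the slides); for such data the mode sits at the peak, the mean is pulled toward the tail, and the median falls in between.

```python
import statistics

# Hypothetical right-skewed data (an assumption for illustration):
# most values are small, and one large value forms the tail.
scores = [2, 3, 3, 3, 4, 5, 6, 9, 19]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

print(mode, median, mean)  # prints: 3 4 6
```

Here the mode (3) is at the peak, the mean (6) is displaced toward the long right tail, and the median (4) lies between them.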
[Figure: distributions with the mean, median, and mode marked; horizontal axis from 10 to 90]
Shape of the distribution:
Skewness
• A measure of the lack of symmetry, or the
lopsidedness of a distribution.
• For a skewed distribution, use the median as the measure of central tendency.
Shape of Distribution:
Kurtosis
How flat or peaked a distribution appears.
(Does not affect the central tendency)
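The slides describe skewness and kurtosis only in words. One common way to put numbers on them, assumed here and not taken from the slides, is to use moment coefficients: the third central moment divided by the variance to the power 1.5 for skewness, and the fourth central moment divided by the squared variance for kurtosis. The function names and data below are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean)**k over all values."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    """Moment coefficient of skewness; 0 for a perfectly symmetrical data set.

    Undefined (division by zero) when all values are identical.
    """
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment coefficient of kurtosis; larger values mean a more peaked,
    heavier-tailed shape.

    Undefined (division by zero) when all values are identical.
    """
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical data sets, chosen only for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [2, 3, 3, 3, 4, 5, 6, 9, 19]

print(skewness(symmetric), skewness(right_skewed))
print(kurtosis(symmetric), kurtosis(right_skewed))
```

With these definitions the symmetrical set has skewness 0, while the set with a long right tail has positive skewness and a larger kurtosis.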
97 − 63 = 34
34 is the RANGE or spread of this set of data
Variance: a measure of how data
points differ from the mean
• Data Set 1: 3, 5, 7, 10, 10
• Data Set 2: 7, 7, 7, 7, 7
Both data sets have the same mean, 7, but we know that the two data sets are not identical!
The variance shows how they are different.
We want to find a way to represent this difference between the two data sets numerically.
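As a quick check of the two data sets above, the following sketch uses Python's standard `statistics` module to show that their means agree while their variances do not. It uses `pvariance`, which treats each data set as a whole population; that choice is an assumption made for this illustration.

```python
import statistics

data_set_1 = [3, 5, 7, 10, 10]
data_set_2 = [7, 7, 7, 7, 7]

# Both data sets have the same mean...
mean_1 = statistics.mean(data_set_1)
mean_2 = statistics.mean(data_set_2)

# ...but very different spread around that mean.
var_1 = statistics.pvariance(data_set_1)
var_2 = statistics.pvariance(data_set_2)

print("means:", mean_1, mean_2)
print("population variances:", var_1, var_2)
```

The means are both 7, while the population variances are 7.6 and 0, so the variance captures the difference that the mean hides.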
How to Calculate?
• If we conceptualize the spread of a distribution
as the extent to which the values in the
distribution differ from the mean and from each
other, then a reasonable measure of spread
might be the average deviation, or difference, of
the values from the mean.
\[ \frac{\sum (x - \bar{X})}{N} \]
• Although this might seem reasonable, this expression
always equals 0, because the negative deviations about the
mean always cancel out the positive deviations about the
mean.
• We could just drop the negative signs, which is mathematically the same
as taking the absolute value of each deviation; the resulting average is
known as the mean deviation.
• The concept of absolute value does not lend itself to the kind
of advanced mathematical manipulation necessary for the
development of inferential statistical formulas.
• The average of the squared deviations about the mean is
called the variance.
\[ \frac{\sum (x - \bar{X})^2}{N} \]
For sample variance:
\[ s^2 = \frac{\sum (x - \bar{X})^2}{n - 1} \]
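The three candidate measures of spread discussed above (the average signed deviation, the mean deviation, and the average squared deviation) can be compared directly on Data Set 1. This is a minimal sketch in plain Python; the variable names are chosen here for illustration.

```python
data = [3, 5, 7, 10, 10]  # Data Set 1 from the slides
n = len(data)
mean = sum(data) / n  # 7.0

# Signed deviation of each value from the mean.
deviations = [x - mean for x in data]

# Average signed deviation: positive and negative deviations cancel.
average_deviation = sum(deviations) / n

# Mean deviation: average of the absolute deviations.
mean_absolute_deviation = sum(abs(d) for d in deviations) / n

# Variance: average of the squared deviations (dividing by N).
variance = sum(d ** 2 for d in deviations) / n

print(average_deviation, mean_absolute_deviation, variance)  # prints: 0.0 2.4 7.6
```

The first quantity is 0 for any data set, which is why it cannot serve as a measure of spread; the other two are both positive here.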
MEASURES OF VARIABILITY
POPULATION VARIANCE
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
• Where σ² stands for the population variance
• μ is the population mean
• N is the total number of values in the population
• x_i is the value of the i-th observation.
• Σ represents a summation
MEASURES OF VARIABILITY
SAMPLE VARIANCE
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
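The population and sample formulas differ only in the divisor (N versus n − 1). The sketch below writes both out by hand and compares them with `pvariance` and `variance` from Python's standard `statistics` module; the function names `population_variance` and `sample_variance` are hypothetical helpers defined here, and Data Set 1 is reused as the input.

```python
import statistics


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [3, 5, 7, 10, 10]  # Data Set 1 from the slides

pop_var = population_variance(data)   # 38 / 5
samp_var = sample_variance(data)      # 38 / 4

print(pop_var, statistics.pvariance(data))
print(samp_var, statistics.variance(data))
```

For the same data the sample variance (9.5) is larger than the population variance (7.6), because the same sum of squared deviations is divided by a smaller number.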