W2 Descriptive Statistics
STATISTICS
KEY STATISTICAL CONCEPTS
Population
• the group of all items of interest to a statistics practitioner.
• frequently very large; sometimes infinite.
Sample
• a set of data drawn from the population.
• potentially large, but less than the population.
KEY STATISTICAL CONCEPTS
Statistic
Parameter
110.54 110.23
• Organize Data
– Tables
– Graphs
• Summarize Data
– Central Tendency
– Variation
DESCRIPTIVE STATISTICS
• Organize Data
– Tables
• Frequency Distribution
• Relative Frequency Distribution
– Graphs
• Bar Chart
• Histogram
• Stem and Leaf Plot
• Frequency Polygon
• Pie Chart
• Scatter Plot
SPSS OUTPUT FOR FREQUENCY DISTRIBUTION
GROUPED RELATIVE FREQUENCY DISTRIBUTION
Class Interval   Frequency   Percent   Cumulative Percent
80 – 89          3           11.5      11.5
90 – 99          5           19.2      30.7
100 – 109        7           26.9      57.6
110 – 119        4           15.4      73.0
120 – 129        3           11.5      84.5
130 – 139        2           7.7       92.2
140 – 149        1           3.8       96.0
150 and over     1           3.8       100.0
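A grouped relative frequency distribution like the one above can be rebuilt by hand or in code. The sketch below is only an illustration, not the SPSS procedure: it assumes the 26 scores are the ones shown in the stem-and-leaf display in these slides, and the function name `grouped_frequencies` and its parameters are made up for this example.

```python
# Scores read off the stem-and-leaf display in these slides (n = 26).
scores = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
          109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]


def grouped_frequencies(values, start=80, width=10, top=150):
    """Return rows of (label, frequency, percent, cumulative percent).

    Classes run from start in steps of width; every value >= top falls
    into one open-ended class, as in the table on the slide.
    """
    lowers = list(range(start, top, width))
    counts = [sum(lo <= v < lo + width for v in values) for lo in lowers]
    counts.append(sum(v >= top for v in values))
    labels = [f"{lo} - {lo + width - 1}" for lo in lowers]
    labels.append(f"{top} and over")

    n = len(values)
    rows, running = [], 0
    for label, count in zip(labels, counts):
        running += count  # running total of cases so far
        rows.append((label, count, 100 * count / n, 100 * running / n))
    return rows


table = grouped_frequencies(scores)
for label, freq, pct, cum in table:
    print(f"{label:<14}{freq:>3}{pct:>8.1f}{cum:>8.1f}")
```

The cumulative column here is computed from running counts, so it can differ in the last digit from a column obtained by adding up percentages that were already rounded.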
Stem | Leaf
8    | 0 7 9
9    | 3 3 6 7 8
10   | 2 3 5 6 9 9 9
11   | 0 1 5 9
12   | 0 7 8
13   | 1 1
14   | 0
15   |
16   | 2
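A stem-and-leaf display splits each score into a stem (here, the tens and hundreds) and a leaf (the units digit). The sketch below shows one way to build it, assuming the 26 scores are the ones listed in the display above; `stem_and_leaf` is a hypothetical helper written for this example.

```python
from collections import defaultdict

# Scores as read from the stem-and-leaf display above (n = 26).
scores = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
          109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]


def stem_and_leaf(values):
    """Map each stem (score // 10) to its sorted leaves (score % 10)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # Keep empty stems so that gaps in the data stay visible.
    first, last = min(leaves), max(leaves)
    return {stem: leaves.get(stem, []) for stem in range(first, last + 1)}


display = stem_and_leaf(scores)
for stem, leaf_list in display.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaf_list)}")
```

Keeping the empty stem 15 matters: it shows that the score of 162 sits apart from the rest of the data.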
SPSS OUTPUT OF A FREQUENCY POLYGON
PIE CHART
SCATTER PLOT
DESCRIPTIVE STATISTICS
Summarizing Data:
$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$$
MEAN
What’s up with all those symbols, man?
98 106 80 109
93 87
140 119
120 105
93 97
109
110
MEAN
The mean is the “balance point.”
Each person’s score is like 1 pound placed at the score’s
position on a see-saw. Below, on a 200 cm see-saw, the
mean equals 110, the place on the see-saw where a
fulcrum finds balance:
1 lb at 93 cm: 17 units below the mean
1 lb at 106 cm: 4 units below the mean
Fulcrum at 110 cm: 0 units from the mean
1 lb at 131 cm: 21 units above the mean
The scale is balanced because…
17 + 4 on the left = 21 on the right
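The balance idea can be checked with a few lines of code. This is a small sketch using the three see-saw positions from the slide (93, 106, and 131 cm); the variable names are chosen for this example only.

```python
# Positions (in cm) of the three 1 lb weights on the see-saw.
positions = [93, 106, 131]

# The mean is the sum of the scores divided by how many there are.
mean = sum(positions) / len(positions)

# A deviation is the distance of each score from the mean.
deviations = [x - mean for x in positions]

# Total distance below the mean versus total distance above it.
below = -sum(d for d in deviations if d < 0)
above = sum(d for d in deviations if d > 0)

print("mean:", mean)
print("deviations:", deviations)
print("units below:", below, "units above:", above)
```

The deviations below and above the mean always cancel out, which is why the fulcrum balances at the mean and nowhere else.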
MEAN
Income in Malaysia.
[Figure: income distribution with the labels "All of Us", "Mean", "Outlier", and "Syed Al-Bukhary"]
MEDIAN
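[Figure: two distributions. Symmetric: the mean and median fall at the same point. Skewed: the median, then the mean, in order toward the tail.]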
MEDIAN
The data distribution on the right is “bimodal” (even statistics can be open-minded).
MODE
1. It may give you the most likely experience
rather than the “typical” or “central”
experience.
2. In symmetric distributions, the mean,
median, and mode are the same.
3. In skewed data, the mean and median lie
further toward the skew than the mode.
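[Figure: two distributions. Symmetric: the mean, median, and mode fall at the same point. Skewed: the mode, then the median, then the mean, in order toward the tail.]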
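The ordering of the three measures in skewed data can be seen with real numbers. The sketch below assumes the 26 scores from the stem-and-leaf display in these slides, which have one very high score (162), and uses Python's standard `statistics` module rather than SPSS.

```python
from statistics import mean, median, mode

# Scores read off the stem-and-leaf display in these slides (n = 26).
scores = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
          109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

score_mean = mean(scores)      # balance point, pulled by extreme scores
score_median = median(scores)  # middle score once the data are sorted
score_mode = mode(scores)      # most frequently occurring score

print("mean:", round(score_mean, 2))
print("median:", score_median)
print("mode:", score_mode)
```

In these scores the median and mode land on the same value, while the mean sits above both because the high score of 162 pulls it toward the right-hand tail.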
Choosing a Measure of Central Tendency
Summarizing Data:
The 25th percentile is a quartile that divides the first ¼ of cases from the latter ¾.
The 75th percentile is a quartile that divides the first ¾ of cases from the latter ¼.
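Quartiles can be computed with Python's standard `statistics.quantiles` function. The sketch below assumes the 26 scores from the stem-and-leaf display in these slides. Different programs interpolate between neighbouring scores in slightly different ways, so the values here need not match SPSS output exactly.

```python
from statistics import quantiles

# Scores read off the stem-and-leaf display in these slides (n = 26).
scores = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
          109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(scores, n=4)

# Share of cases falling below the 25th percentile (should be about 1/4).
share_below_q1 = sum(v < q1 for v in scores) / len(scores)

print("25th percentile:", q1)
print("50th percentile (median):", q2)
print("75th percentile:", q3)
print("share of cases below the 25th percentile:", round(share_below_q1, 2))
```

With only 26 cases the share below the 25th percentile cannot be exactly one quarter, but it comes close.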
The smaller the variance, the closer the individual scores are to the mean.
VARIANCE
Variance is a number that at first seems complex to calculate.
$\sqrt{235.45} = 15.34$
Review:
1. Deviation
2. Deviation squared
3. Sum of squares
4. Variance
5. Standard deviation
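The five review steps can be walked through one at a time. The sketch below reuses the three see-saw scores from the MEAN slide (93, 106, 131). Dividing the sum of squares by n - 1 is an assumption made here (it gives the sample variance, which is what most statistics packages report); the slides do not state the divisor.

```python
import math
from statistics import variance

scores = [93, 106, 131]  # the three see-saw scores from the MEAN slide
n = len(scores)
mean = sum(scores) / n

# 1. Deviation: how far each score is from the mean.
deviations = [y - mean for y in scores]

# 2. Deviation squared: squaring removes the minus signs.
squared_deviations = [d ** 2 for d in deviations]

# 3. Sum of squares: add up the squared deviations.
sum_of_squares = sum(squared_deviations)

# 4. Variance: average squared deviation (assumption: divide by n - 1).
sample_variance = sum_of_squares / (n - 1)

# 5. Standard deviation: square root of the variance.
standard_deviation = math.sqrt(sample_variance)

print("deviations:", deviations)
print("squared deviations:", squared_deviations)
print("sum of squares:", sum_of_squares)
print("variance:", sample_variance)
print("standard deviation:", round(standard_deviation, 2))

# Cross-check against the standard library's sample variance.
print("library variance:", variance(scores))
```

Taking the square root in the last step puts the result back into the original units of the scores.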
STANDARD DEVIATION
1. Larger s.d. = greater amounts of variation around the mean.
For example:
Left: 19 25 31, Y-bar = 25, s.d. = 3
Right: 13 25 37, Y-bar = 25, s.d. = 6
2. s.d. = 0 only when all values are the same (only when you have
a constant and not a “variable”)
3. If you were to “rescale” a variable, the s.d. would change by the
same factor: if we changed units above so the mean equaled 250
(every value multiplied by 10), the s.d. on the left would be 30,
and on the right, 60
4. Like the mean, the s.d. will be inflated by an outlier case value.
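Points 2 to 4 above can be checked numerically. The sketch below assumes the 26 scores from the stem-and-leaf display in these slides and uses `statistics.stdev` (the sample standard deviation) from Python's standard library; the constant data set is made up for illustration.

```python
from statistics import stdev

# Scores read off the stem-and-leaf display in these slides (n = 26).
scores = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
          109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

sd_original = stdev(scores)

# Point 2: a "variable" that never varies has s.d. = 0.
sd_constant = stdev([25, 25, 25])  # hypothetical constant data

# Point 3: multiplying every score by 10 multiplies the s.d. by 10.
sd_rescaled = stdev([10 * y for y in scores])

# Point 4: dropping the outlying score of 162 shrinks the s.d.
sd_without_outlier = stdev([y for y in scores if y != 162])

print("original s.d.:", round(sd_original, 2))
print("constant s.d.:", sd_constant)
print("rescaled s.d.:", round(sd_rescaled, 2))
print("s.d. without the outlier:", round(sd_without_outlier, 2))
```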
STANDARD DEVIATION
• Note about computational formulas:
– Your book provides a useful short-cut formula for computing
the variance and standard deviation.
– This is intended to make hand calculations as quick as
possible.
– They obscure the conceptual understanding of our statistics.
– SPSS and the computer are “computational formulas” now.
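The book's own short-cut formula is not reproduced in these slides. As an assumption, the sketch below uses one commonly taught short-cut for the sum of squares, ΣY² − (ΣY)²/n, and checks that it agrees with the definitional version Σ(Y − Y-bar)² on the three see-saw scores from the MEAN slide.

```python
import math

scores = [93, 106, 131]  # the three see-saw scores from the MEAN slide
n = len(scores)
mean = sum(scores) / n

# Definitional formula: square each deviation from the mean, then add.
ss_definitional = sum((y - mean) ** 2 for y in scores)

# Short-cut formula (assumed form): no deviations are needed, only
# the sum of the scores and the sum of the squared scores.
ss_shortcut = sum(y ** 2 for y in scores) - sum(scores) ** 2 / n

print("definitional sum of squares:", ss_definitional)
print("short-cut sum of squares:", ss_shortcut)
print("same answer:", math.isclose(ss_definitional, ss_shortcut))
```

Both routes give the same number; the definitional one simply makes it visible that variance is about distance from the mean.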
SYMBOLS IN STATISTICS
DESCRIPTIVE STATISTICS
Summarizing Data:
[Figure: box plot with the labeled values 162, 123.5, M = 110.5, 106.5, 96.5, and 82]
SPSS OUTPUT OF CLASS A & B
SHAPE OF DISTRIBUTIONS
• Shape of distribution is measured by
– Skewness & Kurtosis
• When the scores in your distribution tend to cluster toward
one end of the scale (i.e., a cluster of high scores or a cluster
of low scores), the distribution is skewed.
– Positively Skewed Distributions occur when there is a
cluster of lower scores; the smaller, more spread-out tail
will be on the right (i.e., fewer high scores).
– Negatively Skewed Distributions occur when there is a
cluster of higher scores; the smaller, more spread-out tail
will be on the left (i.e., fewer low scores).
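Skewness and kurtosis can be computed directly from the deviations around the mean. The sketch below uses the simple moment-based definitions, which is an assumption of this example: statistics packages may apply small-sample adjustments, so their output can differ somewhat. The scores are the 26 values from the stem-and-leaf display in these slides, and the helper names are made up here.

```python
def central_moment(values, k):
    """Average of the k-th power of the deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** k for v in values) / len(values)


def skewness(values):
    """Positive when the long tail is on the right, negative when on the left."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """About 0 for a normal curve; higher means heavier tails."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Scores read off the stem-and-leaf display in these slides (n = 26).
scores = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
          109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

# Flipping the sign of every score mirrors the distribution left to right.
mirrored = [-v for v in scores]

print("skewness of scores:", round(skewness(scores), 2))
print("skewness of mirrored scores:", round(skewness(mirrored), 2))
print("excess kurtosis of scores:", round(excess_kurtosis(scores), 2))
```

The scores come out positively skewed because most of them cluster at the lower end while a few high scores stretch the tail to the right; mirroring them reverses the sign.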
• Statisticians use several specific terms to describe the different
shapes these distributions can assume.
– Unimodal Distributions have one prominent category or high point.
– Multimodal Distributions have several prominent categories or
high points.
SAMPLE RESEARCH ARTICLE
DESCRIPTIVE STATISTICS
• Now you are qualified to use descriptive statistics!
• Questions?
• Do your Quiz 1 (Week 1 and Week 2’s lectures) online, PLEASE!