Data Management Part 1 2024
Definition of Statistics
Statistics is used in everyday life, often without people realizing it.
Statistics is the science of the classification and manipulation of data in
order to draw inferences.
The word "statistics" is derived from the Latin word "status", meaning
"state".
◦ Two basic meanings of the word Statistics:
1. It refers to actual numbers derived from the data.
2. It refers to a method of analysis.
Statistics is a collection of quantitative data, such as
statistics of crimes, statistics of enrolment, statistics
of unemployment. Statistics is also the study of how
to collect, organize, analyze, and interpret
numerical information from data.
It simplifies a mass of data (condensation).
Methods of Data Presentation
1. Textual Method
2. Tabular Method
3. Graphical Method
The textual method uses a narrative description of the gathered data.
Tabular Method
[Figure: parts of a table, with labels for the stubs (classes) and the body]
Graphical Method
Qualities of a Good Graph
1. It is accurate
2. It is clear
3. It is simple
4. It has a good appearance
Mean, Median, Mode
• Mean: the sum of all observed values divided by the number of observations. Applies to quantitative data.
• Median: the positional middle value when observations are ordered from smallest to largest (or vice versa). Applies to quantitative data.
• Mode: the observed value that occurs most frequently. The data is said to be unimodal if there is only one mode, bimodal if there are two modes, and trimodal if there are three modes. Applies to quantitative and qualitative data.
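The mode is the only one of the three measures that also works for qualitative data. The minimal sketch below uses Python's standard `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency; the helper name `modality` and its labels are illustrative choices. One caveat: when every value occurs equally often, `multimode` returns all of them, whereas some textbooks say such data have no mode.

```python
from statistics import multimode

def modality(data):
    """Return the modes of the data and a label for how many there are.

    Labels follow the definitions above: one mode is unimodal,
    two modes is bimodal, three modes is trimodal.
    """
    modes = multimode(data)  # every value tied for the highest frequency
    labels = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return modes, labels.get(len(modes), f"{len(modes)} modes")

print(modality([4, 6, 5, 7, 3, 4, 5, 4]))   # ([4], 'unimodal')
print(modality([1, 1, 2, 2, 3]))            # ([1, 2], 'bimodal')
print(modality(["red", "blue", "red"]))     # (['red'], 'unimodal'), qualitative data
```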
Formula for getting the mean of ungrouped data: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n is the number of observations
EXAMPLE#1: MEAN
Data: 4 6 5 7 3 4 5 4
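Example #1 can be checked with a short Python sketch of the formula above. The function name `mean_ungrouped` is an illustrative choice; Python's built-in `statistics.mean` gives the same result.

```python
def mean_ungrouped(values):
    """Mean of ungrouped data: the sum of the x_i divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean requires at least one observation")
    return sum(values) / n

data = [4, 6, 5, 7, 3, 4, 5, 4]
print(sum(data), len(data), mean_ungrouped(data))  # 38 8 4.75
```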
EXAMPLE#2: MEAN
Data: Scores of 14 students in Math122a Midterm exam
72 83 84 82 72 80 79 80 76 80 85 79 90 91
What is the mean?
What if a 15th student took the Midterm exam just by guessing and got a
score of 10?
What happens to the mean?
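A quick way to explore Example #2 is to compute the mean before and after adding the guessed score. This sketch uses the standard `statistics.mean` and rounds to two decimal places for display.

```python
from statistics import mean

scores = [72, 83, 84, 82, 72, 80, 79, 80, 76, 80, 85, 79, 90, 91]
print(round(mean(scores), 2))       # 80.93  (1133 / 14)

# A 15th student guesses and scores 10.
with_guess = scores + [10]
print(round(mean(with_guess), 2))   # 76.2   (1143 / 15)
```

The single low score pulls the mean down by roughly 4.7 points, which illustrates that the mean is sensitive to extreme values.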
The Median
How to get the median of ungrouped data:
• Arrange the scores in ascending or descending order.
• If n is odd, the median is the middle score; if n is even, the median is the average
of the two middlemost scores (n is the number of observations).
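For the ordered values $X_i$, $i = 1, 2, 3, \dots, n$:
$M_d = X_{\left(\frac{n+1}{2}\right)}$ for n that is odd
$M_d = \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2}$ for n that is even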
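The odd/even rule translates directly into code. This is a minimal sketch (the name `median_ungrouped` is illustrative); Python's `statistics.median` implements the same rule.

```python
def median_ungrouped(values):
    """Median of ungrouped data using the odd/even rule."""
    x = sorted(values)                  # arrange in ascending order
    n = len(x)
    if n == 0:
        raise ValueError("the median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:                      # n odd: the ((n + 1) / 2)-th value
        return x[mid]
    return (x[mid - 1] + x[mid]) / 2    # n even: average of the two middle values

print(median_ungrouped([4, 6, 5, 7, 3, 4, 5, 4]))  # 4.5

scores = [72, 83, 84, 82, 72, 80, 79, 80, 76, 80, 85, 79, 90, 91]
print(median_ungrouped(scores))         # 80.0
print(median_ungrouped(scores + [10]))  # 80
```

Unlike the mean, the median of the Example #2 scores stays at 80 when the guessed score of 10 is added.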
Weighted mean = $\frac{\sum (x \cdot w)}{\sum w}$
where $\sum (x \cdot w)$ is the sum of the products of each number and its
assigned weight, and $\sum w$ is the sum of all the weights.
Examples:
The tables below show Vincent’s first-semester course grades and the grade point
assigned to each letter grade. Use the weighted mean formula to find Vincent’s GPA
for the semester.
Course      Course grade   Course units
MMW         A              3
Calculus    B              4
Chemistry   C              3
P.E.        D              2

Grade   Grade point
A       4
B       3
C       2
D       1
F       0
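Vincent's GPA is a weighted mean with the grade points as the values and the course units as the weights. A small Python sketch follows; the helper name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of (x * w) divided by the sum of w."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight

grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
courses = [("MMW", "A", 3), ("Calculus", "B", 4), ("Chemistry", "C", 3), ("P.E.", "D", 2)]

points = [grade_points[grade] for _, grade, _ in courses]   # x: 4, 3, 2, 1
units = [u for _, _, u in courses]                           # w: 3, 4, 3, 2
gpa = weighted_mean(points, units)
print(round(gpa, 2))  # 2.67  (32 / 12)
```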
Examples
Ages of Science Fair Contestants
Age   Frequency
7     3
8     4
9     6
10    15
11    11
12    7
13    1
Find the mean, the median and all modes for the data in the given table.
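One way to check the answers is to expand the frequency table into individual observations and apply the standard `statistics` functions. The expansion is only a convenience for a small table like this one; the mean is the same as the weighted mean with the frequencies as weights.

```python
from statistics import mean, median, multimode

# Ages of science fair contestants: age -> frequency
table = {7: 3, 8: 4, 9: 6, 10: 15, 11: 11, 12: 7, 13: 1}

# Expand the frequency table into the individual observations.
ages = [age for age, freq in table.items() for _ in range(freq)]

n = len(ages)                                            # 47 contestants
total = sum(age * freq for age, freq in table.items())   # sum of age * frequency
print(n, total, round(mean(ages), 2))   # 47 475 10.11
print(median(ages))                     # 10  (the 24th ordered age)
print(multimode(ages))                  # [10]  (frequency 15, so unimodal)
```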
Measures of Location
Measures of location are values below which a specified fraction or percentage
of the observations in a given set must fall. Examples:
• Percentile
• Decile
• Quartile
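Python's standard `statistics.quantiles` (Python 3.8+) splits data into n equal-probability groups, so n=4, 10 and 100 give the quartiles, deciles and percentiles. Its default `method="exclusive"` interpolates at position (n + 1)p in the sorted data, which is only one of several textbook conventions, so hand-computed values may differ slightly. The sample data below are an illustrative choice.

```python
from statistics import median, quantiles

data = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]

q = quantiles(data, n=4)    # 3 cut points: Q1, Q2, Q3
d = quantiles(data, n=10)   # 9 cut points: D1 ... D9
p = quantiles(data, n=100)  # 99 cut points: P1 ... P99

print(q)                                    # [2.75, 6.0, 8.25]
# Q2, D5 and P50 all coincide with the median.
print(q[1] == d[4] == p[49] == median(data))  # True
```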
Measures of Dispersion
Measures of dispersion indicate the extent to which individual items in a series
are scattered about an average.
• Absolute dispersion: range, variance, standard deviation
• Relative dispersion: coefficient of variation, standard score
[Figure: two charts of score distributions, each plotting scores 1 to 10 on the horizontal axis against values from 0 to 125 on the vertical axis]
Which of the distributions of scores has the larger dispersion?
Measures of dispersion
Measures of dispersion indicate the extent to which individual items in a
series are scattered about an average.
◦ The more similar the scores are to each other, the lower the measure of
dispersion will be.
◦ The less similar the scores are to each other, the higher the measure of
dispersion will be.
◦ In general, the more spread out a distribution is, the larger the measure
of dispersion will be.
Measures of Absolute Dispersion
Measures of absolute dispersion are expressed in the units of
the original observations.
There are three main measures of absolute dispersion:
◦ The range
◦ The semi-interquartile range (SIR)
◦ The variance / standard deviation
The Range
The range is defined as the difference between the largest score in the set
of data and the smallest score in the set of data, $X_L - X_S$.
The range is used when
◦ you have ordinal data or
◦ you are presenting your results to people with little or no knowledge of
statistics
What is the range of the following data?
4 8 1 6 6 2 9 3 6 9
Two very different sets of data can have the same range:
1 1 1 1 9 vs 1 3 5 7 9
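The range is a one-line computation, $X_L - X_S$. The sketch below (the name `data_range` is illustrative) answers the exercise and confirms that the two contrasting data sets share the same range even though their spreads differ.

```python
def data_range(values):
    """Range = largest score minus smallest score (X_L - X_S)."""
    return max(values) - min(values)

print(data_range([4, 8, 1, 6, 6, 2, 9, 3, 6, 9]))  # 8  (9 - 1)

# Same range, very different spread:
a = [1, 1, 1, 1, 9]
b = [1, 3, 5, 7, 9]
print(data_range(a), data_range(b))               # 8 8
```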
The Standard Deviation and the Variance
Variance is the mean of the squared deviation scores (the squared differences
between each score and the mean).
The larger the variance is, the more the scores deviate, on
average, away from the mean.
The smaller the variance is, the less the scores deviate, on
average, from the mean.
When the deviation scores are squared to compute the variance, their unit of
measure is squared as well.
◦ E.g. If people’s weights are measured in pounds, then the variance of the
weights would be expressed in pounds² (squared pounds).
Since squared units of measure are often awkward to deal with, the square
root of the variance is often used instead.
◦ The standard deviation is the square root of the variance.
Sample:
s: standard deviation
s²: variance
Population:
σ: standard deviation
σ²: variance
N is the population size; n is the sample size.
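The formulas that originally accompanied these symbols did not survive extraction. For reference, the standard definitional formulas are sketched below, assuming μ denotes the population mean and x̄ the sample mean (neither symbol is defined on the original slide). Note that the sample variance divides by n − 1 rather than n.

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}

s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{s^2}
```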
Computational Formula Example
Data: 9, 8, 6, 5, 8, 6 (n = 6, so x̄ = 42/6 = 7)
xi        xi − x̄    (xi − x̄)²
9          2         4
8          1         1
6         −1         1
5         −2         4
8          1         1
6         −1         1
Σ = 42    Σ = 0     Σ = 12
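The sketch below recomputes the table's columns and then divides the sum of squares by n − 1 (treating the six scores as a sample) and by n (treating them as the whole population). Which divisor applies depends on how the data were collected; both are shown for illustration. The standard functions `statistics.variance`/`statistics.stdev` (sample) and `statistics.pvariance`/`statistics.pstdev` (population) compute the same quantities.

```python
import math

x = [9, 8, 6, 5, 8, 6]
n = len(x)
x_bar = sum(x) / n                        # 42 / 6 = 7.0

deviations = [xi - x_bar for xi in x]     # x_i - x̄
squared = [d ** 2 for d in deviations]    # (x_i - x̄)^2
ss = sum(squared)                         # sum of squares = 12.0

sample_var = ss / (n - 1)                 # s^2 = 12 / 5
pop_var = ss / n                          # sigma^2 = 12 / 6

print(sum(deviations), ss)                             # 0.0 12.0
print(sample_var, round(math.sqrt(sample_var), 3))     # 2.4 1.549
print(pop_var, round(math.sqrt(pop_var), 3))           # 2.0 1.414
```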
Measures of Relative Dispersion
Measures of relative dispersion are unit-less and are used
when one wishes to compare the scatter of one
distribution with another distribution.
Coefficient of Variation
The Coefficient of Variation, CV, is the ratio of the
standard deviation (SD) to the mean and is usually
expressed as a percentage. It is computed as
$CV = \frac{SD}{\text{mean}} \times 100\% = \frac{\sigma}{\mu} \times 100\%$
Example: A laboratory technician studied recent measurements made with two
different instruments. The first measured the diameter of a ball bearing and
obtained a mean of 4.96 mm with an SD of 0.022 mm. The second measured the
diameter of a metal rod and obtained a mean of 6.48 mm with an SD of 0.032 mm.
Which of the two instruments was relatively more precise?
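Solution:
Instrument #1: $CV_1 = \frac{0.022\ \text{mm}}{4.96\ \text{mm}} \times 100\% \approx 0.44\%$
Instrument #2: $CV_2 = \frac{0.032\ \text{mm}}{6.48\ \text{mm}} \times 100\% \approx 0.49\%$
Since $CV_1 < CV_2$, the first instrument (the one measuring the ball bearing) is relatively more precise.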
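The same comparison as a short Python sketch; the helper name `coefficient_of_variation` is illustrative, and the result is returned as a percentage.

```python
def coefficient_of_variation(sd, mean):
    """CV = SD / mean * 100%, returned as a percentage."""
    return sd / mean * 100

cv1 = coefficient_of_variation(0.022, 4.96)   # ball bearing
cv2 = coefficient_of_variation(0.032, 6.48)   # metal rod
print(round(cv1, 2), round(cv2, 2))           # 0.44 0.49

# The lower CV indicates the relatively more precise instrument.
print("Instrument #1" if cv1 < cv2 else "Instrument #2")  # Instrument #1
```

Because CV is unit-less, the two instruments can be compared even though they measured objects of different sizes.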
Standard Score
Example: Mario got a grade of 75% in English and a grade of
90% in History. The mean grade in English is 65% and the SD is
10%, whereas in History the mean grade is 80% and the SD is
20%. In which subject did Mario perform better?
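$z_{\text{English}} = \frac{x - \bar{x}}{s} = \frac{75 - 65}{10} = 1.0$
$z_{\text{History}} = \frac{x - \bar{x}}{s} = \frac{90 - 80}{20} = 0.5$
Mario's English grade lies 1.0 SD above the mean, while his History grade lies only 0.5 SD above the mean, so he performed better in English.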
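A standard score expresses how many standard deviations a value lies above (positive) or below (negative) the mean, which makes grades from different subjects comparable. A minimal sketch with an illustrative helper name:

```python
def z_score(x, mean, sd):
    """Standard score: number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_english = z_score(75, 65, 10)
z_history = z_score(90, 80, 20)
better = "English" if z_english > z_history else "History"
print(z_english, z_history, better)  # 1.0 0.5 English
```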