Data Management Part 1 2024


DATA MANAGEMENT: (Statistics)

Definition of Statistics
Statistics is used in everyday life, often without people realizing it.
It is the science of classification and manipulation of data in
order to draw inferences.
Statistics is derived from the Latin word "status" meaning
state.
◦ Two basic meanings of the word Statistics:
1. It refers to actual numbers derived from the data.
2. It refers to a method of analysis.
Definition of Statistics
Statistics is a collection of quantitative data, such as
statistics of crimes, statistics of enrolment, statistics
of unemployment. Statistics is also the study of how
to collect, organize, analyze, and interpret
numerical information from data.
Importance of Statistics
◦ It simplifies a mass of data (condensation);
◦ It helps to get concrete information about any problem;
◦ It helps in reliable and objective decision making;
◦ It presents facts in a precise and definite form;
◦ It facilitates comparison (measures of central tendency and measures of dispersion);
◦ It facilitates prediction (time series and regression analysis are the most commonly used methods for prediction);
◦ It helps in the formulation of suitable policies.


Importance of Statistics
In Engineering – The engineer samples a product quality characteristic along with various controlled process variables to assist in locating important variables related to product quality.
In Manufacturing – Newly manufactured fuses are sampled before shipping to decide whether to ship or hold individual lots.
Quality Control – Determining techniques for evaluation of quality through adequate sampling, in-process control, consumer surveys, and experimental design in product development, etc.
Two Kinds of Statistics
Descriptive Statistics – Deals with the methods of organizing, summarizing and presenting a mass of data so as to yield meaningful information.
Inferential Statistics – Deals with making generalizations about a body of data where only part of it is examined. This comprises those methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data.
POPULATION
AND SAMPLE
VARIABLE
◦ Refers to any characteristic of interest measurable on each and every individual in the population.
◦ Others also refer to this as data.
Quantitative Variable
Data that is numerical, counted, or compared on a scale
◦ Demographic data
◦ Answers to closed-ended survey items
◦ Scores on standardized instruments
Classification of Quantitative Variable
Discrete quantitative variable – results from either a finite number of possible values or a countable number of possible values
Continuous quantitative variable – results from infinitely many possible values that can be associated with points on a continuous scale
Qualitative Variable
Narratives, logs, experience
◦ Focus groups
◦ Interviews
◦ Open-ended survey items
◦ Diaries and journals
◦ Notes from observations
◦ Photographs or video recordings
Levels of Measurement
Level 1. Nominal is characterized by data that consists of names, labels, or categories only.
Ex. name, civil status, sex, religion, address, degree program
Level 2. Ordinal involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless.
Ex. military rank, job position, year level
Level 3. Interval is like the ordinal level, with the additional property that meaningful amounts of differences between data can be determined. However, there is no inherent (natural) zero starting point.
Ex. IQ score, temperature (in °C)
Level 4. Ratio is the interval level modified to include the inherent zero starting point. For values at this level, differences and ratios are meaningful.
Ex. height, area, weekly allowance
METHODS OF DATA PRESENTATION
◦ Textual Method
◦ Tabular Method
◦ Graphical Method
Textual Method
Textual method uses a narrative description of the data gathered.

Example: "The survey returns showed that 54 percent of the respondents indicated they believe their weight is ideal for their height, and they also think that they are in overall good health. However, when the respondents gave their actual height and weight, it appeared that only 45% fit the normal category (applying the Asian body mass index range, or BMI)."
Tabular Method
Tabular Method is a systematic arrangement of information into columns
and rows.
Frequency Distribution Table (FDT)
1. Qualitative FDT
2. Quantitative FDT
Tabular Method
FDT for Qualitative or Categorical Data
- the data are grouped according to some qualitative characteristics / non-numerical categories.
TABLE 2: Frequency Distribution of the Brands of Gas Range Used
[Figure: the table with its parts labeled: table heading, caption, stubs / classes, body]
Graphical Method
Qualities of a Good Graph
1. It is accurate
2. It is clear
3. It is simple
4. It has a good appearance

Qualitative Data
1. Pie Chart
2. Column or Bar Graph

Quantitative Data
1. Scatter Graph
2. Line Chart
3. Frequency Histogram
4. Relative Frequency Histogram
5. Frequency Polygon
6. Ogives
Common Types of Graph
Graphical Presentation of the Frequency Distribution Table
Measure of Central Tendency
- any single value that is used to identify the "center" of the data or typical value.
◦ MEAN
◦ MEDIAN
◦ MODE
MEAN (Quantitative Data)
• Sum of all observed values divided by the number of observations
• Most popular measure of central location
• Affected by extreme values
• It is unique - there is only one answer.
• Useful when comparing sets of data.

MEDIAN (Quantitative Data)
• Defined as the positional middle value when observations are ordered from smallest to largest (or vice versa)
• Extreme values do not affect the median as strongly as they do the mean
• Useful when comparing sets of data
• It is unique - there is only one answer.

MODE (Quantitative & Qualitative Data)
• The observed value that occurs most frequently.
• The data is said to be unimodal if there is only one mode, bimodal if there are two modes, trimodal if there are three modes.
• May not exist
• May also not be unique
• Extreme values do not affect the mode.
• Not necessarily unique - may have more than one value.
• When no values repeat in the data set, the mode is every value and is useless.
• When there is more than one mode, it is difficult to interpret and/or compare
Mean, Median and Mode of ungrouped data
The Arithmetic Mean
Formula for getting the mean of ungrouped data: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n is the number of observations
EXAMPLE#1: MEAN
Data: 4 6 5 7 3 4 5 4
EXAMPLE#2: MEAN
Data: Scores of 14 students in Math122a Midterm exam
72 83 84 82 72 80 79 80 76 80 85 79 90 91
◦ What is the mean?
◦ What if a 15th student took the Midterm exam just by guessing and got a score of 10?
◦ What happens to the mean?
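The two mean examples can be checked with a short script. This is a minimal sketch: Python and its standard `statistics` module are an assumption (the slides do not name a tool), and the variable names are illustrative.

```python
from statistics import mean

example1 = [4, 6, 5, 7, 3, 4, 5, 4]
scores = [72, 83, 84, 82, 72, 80, 79, 80, 76, 80, 85, 79, 90, 91]

mean1 = mean(example1)          # 38 / 8
mean14 = mean(scores)           # 1133 / 14
mean15 = mean(scores + [10])    # 1143 / 15, after adding the guessed score of 10

print(mean1)
print(round(mean14, 2), round(mean15, 2))
```

The single low score pulls the mean down from about 80.93 to 76.2, which illustrates that the mean is affected by extreme values.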
The Median
How to get the median of ungrouped data:
• Arrange the scores in ascending or descending order.
• If n is odd, the median is the middle score; if n is even, the median is the average of the two middlemost scores (n is the number of observations).
For the ordered values $X_i$, for i = 1, 2, 3, …, n:
$Md = X_{\frac{n+1}{2}}$ for n that is odd
$Md = \frac{X_{\frac{n}{2}} + X_{\frac{n}{2}+1}}{2}$ for n that is even
EXAMPLE#1: MEDIAN Data: 4 6 5 7 3 4 5 4


EXAMPLE#2: MEDIAN Data: 4 6 5 3 4 5 4
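The median procedure above can be sketched as a small function. The function name `median` is illustrative, and Python is an assumed tool, not one named in the slides.

```python
def median(values):
    """Median of ungrouped data: the middle value if n is odd,
    the average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # X_{(n+1)/2} in 1-based indexing
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of X_{n/2} and X_{n/2+1}

median_even = median([4, 6, 5, 7, 3, 4, 5, 4])  # Example 1, n = 8
median_odd = median([4, 6, 5, 3, 4, 5, 4])      # Example 2, n = 7
print(median_even, median_odd)
```

For Example 1 the ordered data are 3 4 4 4 5 5 6 7, so the median is the average of 4 and 5; for Example 2 the ordered data are 3 4 4 4 5 5 6, so the median is the fourth value.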
The Mode
How to find the mode of ungrouped data:
✓ Simply find the score or the value that occurs the most
EXAMPLE#1: MODE
Data: 4 6 5 7 3 4 5 4
EXAMPLE#2: MODE
Data: 72 83 64 82 71 60 79
EXAMPLE#3: MODE
Data: Blood Type of 20 patients in UMC
A, A, AB, O, O, B, A, O, O, O, A, A, A, B, B, O, B, B, B, AB
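The three mode examples can be counted with a frequency tally. This is a sketch under one stated assumption: when no value repeats, the function reports no mode (an empty list). The helper name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return every value that has the highest frequency.
    Assumed convention: if no value repeats, there is no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]

blood = "A A AB O O B A O O O A A A B B O B B B AB".split()

print(modes([4, 6, 5, 7, 3, 4, 5, 4]))       # Example 1
print(modes([72, 83, 64, 82, 71, 60, 79]))   # Example 2
print(modes(blood))                          # Example 3
```

Example 1 is unimodal (4 occurs three times), Example 2 has no repeated value, and Example 3 is trimodal because types A, O and B each occur six times. This also shows that the mode works for qualitative data.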
Examples:
1. A statistics class of 60 students took a quiz. In this class, 18
students scored 4 points, 15 students scored 3 points, 9
students scored 2 points, 12 students scored 1 point, and 6
students scored 0.
a. Find the Mean
b. Find the Mode
c. Find the Median
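One way to check this item is to expand the frequencies into the 60 individual scores and reuse the ungrouped-data measures. A minimal sketch, assuming Python's standard `statistics` module; the dictionary name `freq` is illustrative.

```python
from statistics import mean, median, mode

# score -> number of students, as given in the problem
freq = {4: 18, 3: 15, 2: 9, 1: 12, 0: 6}

# Expand the frequency table into the individual scores
scores = [score for score, count in freq.items() for _ in range(count)]

quiz_mean = mean(scores)      # 147 / 60
quiz_median = median(scores)  # average of the 30th and 31st ordered scores
quiz_mode = mode(scores)      # the score with the largest frequency

print(len(scores), quiz_mean, quiz_median, quiz_mode)
```

The mean is 2.45, the median is 3 (the 30th and 31st ordered scores are both 3), and the mode is 4.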
Examples:
2. The mean of five values is 8.2, and four of the values are 6,
10, 7, and 12. Find the fifth value.
3. The lives of car batteries (in years) were obtained by a
manufacturing company, and the following data were
recorded: 3.2, 1.8, 2.1, 3.5, 4.0, 2.4, 2.7, 3.1, 3.6, and 4.2. What is the
median?
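Items 2 and 3 can be worked as follows. This is a sketch in Python (an assumed tool); item 2 uses the fact that the sum of all values equals n times the mean.

```python
from statistics import median

# Item 2: total of the five values = 5 * 8.2, so subtract the four known values
known = [6, 10, 7, 12]
fifth = round(5 * 8.2 - sum(known), 2)   # rounding guards against float noise

# Item 3: battery lives in years
lives = [3.2, 1.8, 2.1, 3.5, 4.0, 2.4, 2.7, 3.1, 3.6, 4.2]
battery_median = median(lives)           # average of the 5th and 6th ordered values

print(fifth, round(battery_median, 2))
```

The fifth value is 41 − 35 = 6, and the median battery life is the average of 3.1 and 3.2, which is 3.15 years.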
Examples:
4. The numbers of buses observed on ten major roads
in Metro Manila are 667, 705, 645, 705, 800, 759, 724,
759, 769 and 750.
a. Find the mean.
b. Find the median.
5. What happens to the median if two observations
are included with counts of 150 and 1,500 buses? Will
it increase, decrease, or stay the same?
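Items 4 and 5 can be checked with the sketch below, assuming Python's standard `statistics` module; the list names are illustrative.

```python
from statistics import mean, median

buses = [667, 705, 645, 705, 800, 759, 724, 759, 769, 750]

bus_mean = mean(buses)        # 7283 / 10
bus_median = median(buses)    # average of the 5th and 6th ordered counts

# Item 5: add one very low and one very high count
extended = buses + [150, 1500]
extended_median = median(extended)

print(bus_mean, bus_median, extended_median)
```

One added observation falls below the middle and one above it, so the two middle values (724 and 750) are unchanged and the median stays at 737.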
The Weighted Mean
The weighted mean of the n numbers $x_1, x_2, x_3, \ldots, x_n$ with the respective assigned weights $w_1, w_2, w_3, \ldots, w_n$ is

$\text{Weighted mean} = \frac{\sum (x \cdot w)}{\sum w}$

where $\sum (x \cdot w)$ is the sum of the products of each number and its assigned weight, and $\sum w$ is the sum of all the weights.
Examples:
The table below shows Vincent’s first semester course grades.
Use the weighted mean formula to find Vincent’s GPA for the
semester.
Course      Course grade   Course units
MMW         A              3
Calculus    B              4
Chemistry   C              3
P.E.        D              2

Grade   Grade point
A       4
B       3
C       2
D       1
F       0
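Vincent's GPA can be computed with the weighted mean formula, using the grade points as the values and the course units as the weights. A minimal Python sketch; the function and variable names are illustrative.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
courses = [("MMW", "A", 3), ("Calculus", "B", 4), ("Chemistry", "C", 3), ("P.E.", "D", 2)]

points = [grade_points[grade] for _, grade, _ in courses]
units = [unit for _, _, unit in courses]

gpa = weighted_mean(points, units)   # (12 + 12 + 6 + 2) / 12
print(round(gpa, 2))
```

The weighted sum is 32 over 12 units, giving a GPA of about 2.67.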
Examples
Find the mean, the median and all modes for the data in the given table.

Ages of Science Fair Contestants
Age   Frequency
7     3
8     4
9     6
10    15
11    11
12    7
13    1
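This table can also be handled without expanding it, by working with the frequencies directly. A sketch in Python (an assumed tool); `value_at` is a hypothetical helper that walks the cumulative frequencies.

```python
ages = {7: 3, 8: 4, 9: 6, 10: 15, 11: 11, 12: 7, 13: 1}   # age -> frequency

n = sum(ages.values())
mean_age = sum(age * f for age, f in ages.items()) / n     # 475 / 47

def value_at(position):
    """Value at a 1-based position in the ordered data, found by cumulative frequency."""
    cumulative = 0
    for age in sorted(ages):
        cumulative += ages[age]
        if position <= cumulative:
            return age

# n is odd here, so the median is the value at position (n + 1) / 2
median_age = value_at((n + 1) // 2)

top = max(ages.values())
mode_ages = [age for age, f in ages.items() if f == top]

print(n, round(mean_age, 2), median_age, mode_ages)
```

There are 47 contestants, the mean age is about 10.11, the 24th ordered value is 10 (the median), and 10 is the only mode.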
Measure of Location
- values below which a specified fraction or percentage of the observations in a given set must fall
◦ PERCENTILE
◦ DECILE
◦ QUARTILE
Measure of Dispersion
- indicates the extent to which individual items in a series are scattered about an average.
Absolute Dispersion
- range, variance, standard deviation
Relative Dispersion
- coefficient of variation, standard score
Which of the distributions of scores has the larger dispersion?
[Figure: two bar charts of score distributions, each with x-axis values 1 to 10 and y-axis from 0 to 125]
Measures of dispersion
Measures of dispersion indicate the extent to which individual items in a
series are scattered about an average.

◦ The more similar the scores are to each other, the lower the measure of
dispersion will be
◦ The less similar the scores are to each other, the higher the measure of
dispersion will be
◦ In general, the more spread out a distribution is, the larger the measure
of dispersion will be

Measures of Absolute Dispersion
◦ Measures of absolute dispersion are expressed in the units of the original observations.
◦ There are three main measures of absolute dispersion:
  ◦ The range
  ◦ The semi-interquartile range (SIR)
  ◦ Variance / standard deviation
The Range
The range is defined as the difference between the largest score in the set
of data and the smallest score in the set of data, XL – XS
The range is used when
◦ you have ordinal data or
◦ you are presenting your results to people with little or no knowledge of
statistics
What is the range of the following data:
4 8 1 6 6 2 9 3 6 9

Two very different sets of data can have the same range:
1 1 1 1 9 vs 1 3 5 7 9
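The range question and the comparison of the two data sets can be checked with a few lines. A minimal Python sketch; the function name `data_range` is illustrative.

```python
def data_range(values):
    """Largest value minus smallest value (XL - XS)."""
    return max(values) - min(values)

range_main = data_range([4, 8, 1, 6, 6, 2, 9, 3, 6, 9])   # 9 - 1
range_a = data_range([1, 1, 1, 1, 9])
range_b = data_range([1, 3, 5, 7, 9])

print(range_main, range_a, range_b)
```

All three results are 8. The range uses only the two extreme values, so it cannot distinguish the clustered set 1 1 1 1 9 from the evenly spread set 1 3 5 7 9.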
The Standard Deviation and the Variance
◦ Variance is the mean of the squared deviation scores
◦ The larger the variance is, the more the scores deviate, on average, from the mean
◦ The smaller the variance is, the less the scores deviate, on average, from the mean
When the deviation scores are squared in variance, their unit of measure is squared as well
◦ E.g. If people's weights are measured in pounds, then the variance of the weights would be expressed in pounds² (or squared pounds)
Since squared units of measure are often awkward to deal with, the square root of variance is often used instead
◦ The standard deviation is the square root of variance
The Standard Deviation and the Variance
Sample: s = standard deviation, s² = variance
Population: σ = standard deviation, σ² = variance

N is the population size.
n is the sample size.
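The formulas on this slide did not survive extraction. As a reference sketch, the standard textbook definitions are given below; this is an assumption about what the slide showed, with $\mu$ the population mean and $\bar{x}$ the sample mean.

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}

s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{s^2}
```

The population variance divides by N, while the usual sample variance divides by n − 1.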
Computational Formula Example
(mean: x̄ = 42 / 6 = 7)

xi       xi − x̄    (xi − x̄)²
9         2         4
8         1         1
6        −1         1
5        −2         4
8         1         1
6        −1         1
Σ = 42   Σ = 0     Σ = 12
Measures of Relative Dispersion
◦ Measures of relative dispersion are unit-less and are used when one wishes to compare the scatter of one distribution with another distribution.
◦ Some measures of relative dispersion:
  ◦ Coefficient of Variation
  ◦ Standard Score
Coefficient of Variation
The Coefficient of Variation, CV, is the ratio of the
standard deviation (SD) to the mean and is usually
expressed as a percentage. It is computed as

$CV = \frac{SD}{\text{mean}} \times 100\% = \frac{\sigma}{\mu} \times 100\%$

It answers the question: how big is the SD of the
distribution relative to the mean of the distribution?
Coefficient of Variation
Example: A laboratory technician studied recent
measurements made with two different instruments. The first
measured the diameter of a ball bearing and obtained a
mean of 4.96 mm with SD of 0.022 mm. The second
measured the diameter of a metal rod and obtained a mean
of 6.48 mm with SD of 0.032 mm. Which of the two was
relatively more precise?

Solution:
Instrument #1: $CV_1 = \frac{0.022\ \text{mm}}{4.96\ \text{mm}} \times 100\% = 0.44\%$

Instrument #2: $CV_2 = \frac{0.032\ \text{mm}}{6.48\ \text{mm}} \times 100\% = 0.49\%$

∴ Instrument #1 is relatively more precise.
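The same comparison can be sketched in code. Python is an assumed tool, and the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv1 = coefficient_of_variation(0.022, 4.96)   # instrument 1, ball bearing
cv2 = coefficient_of_variation(0.032, 6.48)   # instrument 2, metal rod

print(round(cv1, 2), round(cv2, 2))
print("Instrument 1" if cv1 < cv2 else "Instrument 2", "is relatively more precise")
```

Because the millimetre units cancel, the two percentages can be compared directly even though the measured objects differ in size; the smaller CV indicates less scatter relative to the mean.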


Standard Score
• It measures how many standard deviations a value is above or below the
mean. It is computed as

$z = \frac{x - \mu}{\sigma}$

and the sample counterpart is

$z = \frac{x - \bar{x}}{s}$

• Not strictly a measure of relative dispersion, but a related one.
• Useful for comparing two values from different series, especially when
the two series differ with respect to the mean or SD or both, or are
expressed in different units.
Standard Score
Example: Mario got a grade of 75% in English and a grade of
90% in History. The mean grade in English is 65% and the SD is
10%, whereas in History the mean grade is 80% and the SD is
20%. In which subject did Mario perform better?

$z_{History} = \frac{x - \bar{x}}{s} = \frac{90 - 80}{20} = 0.5$

$z_{English} = \frac{x - \bar{x}}{s} = \frac{75 - 65}{10} = 1.0$

∴ Mario performed better in English.
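The standard score comparison can be sketched as follows. Python is an assumed tool, and the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_english = z_score(75, 65, 10)
z_history = z_score(90, 80, 20)

print(z_english, z_history)
print("English" if z_english > z_history else "History")
```

Although the raw History grade is higher, Mario's English grade is one full standard deviation above its class mean, while his History grade is only half a standard deviation above its mean.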

