Chapter 4 Data Management
Introduction
Statistics – the branch of science that deals
with the COLLECTION, ORGANIZATION,
PRESENTATION, ANALYSIS, and
INTERPRETATION of data.
Major Areas:
• Descriptive Statistics – concerned with
collecting and describing a set of data so as to
yield meaningful observations.
• Inferential Statistics – deals with the analysis of
a subset of data, leading to predictions or inferences
about the entire set of data.
Definition of Terms:
Population – the totality of the
observations.
Sample – a subset of the population.
Example of a Population and a Sample:
Population: All First Year students
(BS Accountancy, BS Nursing, BS Psychology, etc.)
Sample: All First Year BS Agriculture students
Experimental Unit – the individual or object on which a
variable is measured; the object treated in an experiment.
Variable – an attribute or characteristic of a person or
object that can assume different values or labels. The
value of the variable can “vary” from one entity to
another.
Examples of variables:
• Height
• Number of Leaves
• Fruit Bearing (Yes or No)
• Flower Bearing (Yes or No)
• Diameter of Branch
• Species
Types of variables:
• Qualitative Variable
• Quantitative Variable: Discrete or Continuous
70 80 75 82 72 83 81 81 75 85
96 94 82 71 85 82 75 85 76 86
87 88 88 75 78 77 91 92 90 79
87 74 79 77 86 89 74 84 83 82
A. Data Gathering, Organizing,
Representing and Interpreting
2. Data Organization and Presentation
Frequency Distribution Table:
70 71 72 74 74 75 75 75 75 76
77 77 78 79 79 80 81 81 82 82
82 82 83 83 84 85 85 85 86 86
87 87 88 88 89 90 91 92 94 96
Frequency Distribution Table (column headings):
Class Interval | Frequency | Class Mark | Class Boundary | Relative Frequency | Relative Frequency (%) | <CF | >CF
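As an illustration, the columns listed above can be filled in for the 40 sorted values with a short Python sketch. The class width of 5 and the starting lower limit of 70 are assumptions made for this example, since the source does not state the class intervals, and the function and key names are hypothetical.

```python
# Sorted scores from the array above (n = 40).
scores = [
    70, 71, 72, 74, 74, 75, 75, 75, 75, 76,
    77, 77, 78, 79, 79, 80, 81, 81, 82, 82,
    82, 82, 83, 83, 84, 85, 85, 85, 86, 86,
    87, 87, 88, 88, 89, 90, 91, 92, 94, 96,
]


def frequency_table(data, start, width):
    """Build one row per class interval (start and width are assumed)."""
    n = len(data)
    rows = []
    less_cf = 0  # running "less than" cumulative frequency (<CF)
    lower = start
    while lower <= max(data):
        upper = lower + width - 1
        freq = sum(1 for x in data if lower <= x <= upper)
        less_cf += freq
        rows.append({
            "interval": (lower, upper),
            "frequency": freq,
            "class_mark": (lower + upper) / 2,       # midpoint of the limits
            "boundary": (lower - 0.5, upper + 0.5),  # true class limits
            "rel_freq": freq / n,
            "rel_freq_pct": 100 * freq / n,
            "less_cf": less_cf,
            "greater_cf": n - less_cf + freq,        # >CF
        })
        lower += width
    return rows


table = frequency_table(scores, start=70, width=5)
for row in table:
    print(row)
```

A different choice of class width or starting limit would give a different but equally valid table.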
3. Data Analysis and Interpretation
Descriptive Statistics:
• Measures of Central Tendency
• Measures of Dispersion
• Measures of Skewness and Kurtosis
Inferential Statistics – techniques in which
samples are used to make generalizations
about the populations from which the samples
were drawn.
Inferential statistics arises from the fact that
sampling naturally incurs sampling error, so
a sample is not expected to represent
the population perfectly.
Methods of Inferential Statistics: Estimation of
Parameters and Hypothesis Testing.
B. Measures of Central Tendency
Measures of central tendency indicate the center
of a set of data arranged in order of magnitude.
A measure of central tendency is the point about
which the scores tend to cluster and is therefore
regarded as a sort of average of the series.
It is a single number that describes the totality
of the set of data collected.
1. Mean or Arithmetic Mean (or Average) – the most
popular and well-known measure of central
tendency. It can be used with both discrete and
continuous data.
Weighted Mean – a mean in which the weight of each
observation is considered in the computation.
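A minimal Python sketch of both means follows. The first call uses the 40 raw scores listed earlier in the chapter; the grades and unit weights in the second call are hypothetical values invented only to illustrate the weighted mean.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean: the sum of value * weight divided by the sum of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


scores = [
    70, 80, 75, 82, 72, 83, 81, 81, 75, 85,
    96, 94, 82, 71, 85, 82, 75, 85, 76, 86,
    87, 88, 88, 75, 78, 77, 91, 92, 90, 79,
    87, 74, 79, 77, 86, 89, 74, 84, 83, 82,
]

grades = [85, 90, 78]  # hypothetical course grades
units = [3, 2, 5]      # hypothetical weights (credit units)

print(mean(scores))
print(weighted_mean(grades, units))
```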
Properties of the Mean:
1. The sum of the deviations of the observations
from the mean is zero.
2. The sum of the squared deviations of the
observations from the mean is a minimum.
3. The mean reflects the magnitude of every
observation, since every observation contributes
to the value of the mean.
4. The mean is easily affected by outliers.
5. The means of subgroups may be combined when
properly weighted; the combined mean is called the
weighted mean.
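Properties 1, 2, and 5 can be checked numerically. The sketch below uses the 40 raw scores from earlier in the chapter; the split into two subgroups and the candidate centers are arbitrary choices made only for illustration.

```python
scores = [
    70, 80, 75, 82, 72, 83, 81, 81, 75, 85,
    96, 94, 82, 71, 85, 82, 75, 85, 76, 86,
    87, 88, 88, 75, 78, 77, 91, 92, 90, 79,
    87, 74, 79, 77, 86, 89, 74, 84, 83, 82,
]
n = len(scores)
mean = sum(scores) / n

# Property 1: deviations from the mean sum to zero (up to rounding error).
sum_dev = sum(x - mean for x in scores)


def sum_sq_dev(center):
    """Sum of squared deviations of the scores about any center."""
    return sum((x - center) ** 2 for x in scores)


# Property 2: any other center gives a larger sum of squared deviations.
candidates = [mean - 1, mean - 0.5, mean + 0.5, mean + 1]

# Property 5: subgroup means weighted by subgroup size give the overall mean.
group_a, group_b = scores[:10], scores[10:]  # arbitrary split
mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
combined = (len(group_a) * mean_a + len(group_b) * mean_b) / n

print(sum_dev, sum_sq_dev(mean), combined)
```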
2. Median – the middle score of a set of data
arranged in order of magnitude. The median is best
used when the data have several extreme entries.
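Grouped – $\tilde{x} = L + \left[\dfrac{\frac{n}{2} - cf}{f}\right] i$,
where $L$ is the lower boundary of the median class, $n$ is the
number of observations, $cf$ is the cumulative frequency before the
median class, $f$ is the frequency of the median class, and $i$ is
the class width.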
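The sketch below computes the median of the 40 raw scores and then a grouped median using the common textbook formula L + ((n/2 - cf) / f) * i. The class intervals (width 5, starting at 70) and their frequencies are an assumed grouping of the same scores, and `grouped_median` is a hypothetical helper name.

```python
from statistics import median

scores = [
    70, 80, 75, 82, 72, 83, 81, 81, 75, 85,
    96, 94, 82, 71, 85, 82, 75, 85, 76, 86,
    87, 88, 88, 75, 78, 77, 91, 92, 90, 79,
    87, 74, 79, 77, 86, 89, 74, 84, 83, 82,
]

# Assumed grouping: (lower class limit, frequency) for classes of width 5.
classes = [(70, 5), (75, 10), (80, 10), (85, 10), (90, 4), (95, 1)]


def grouped_median(classes, width):
    """Grouped median: L + ((n/2 - cf) / f) * i."""
    n = sum(freq for _, freq in classes)
    cum = 0  # cumulative frequency before the current class
    for lower_limit, freq in classes:
        if cum + freq >= n / 2:             # this is the median class
            lower_boundary = lower_limit - 0.5
            return lower_boundary + ((n / 2 - cum) / freq) * width
        cum += freq
    raise ValueError("empty frequency table")


print(median(scores))
print(grouped_median(classes, width=5))
```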
Properties of the Median:
1. Not affected by outliers.
2. The sum of the absolute deviations from the median is a minimum.
3. Not amenable to further computation and hence
cannot be combined in the same manner as the mean.
4. The median of grouped data can be calculated even with
open-ended intervals, provided the median class is not
open-ended.
3. Mode – the most frequent score in the data set.
It is sometimes regarded as the most popular option.
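Grouped – $\hat{x} = L + \left[\dfrac{d_1}{d_1 + d_2}\right] i$,
where $L$ is the lower boundary of the modal class, $d_1$ is the
difference between the frequency of the modal class and that of the
class before it, $d_2$ is the difference between the frequency of the
modal class and that of the class after it, and $i$ is the class width.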
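A short Python sketch of the mode follows. For ungrouped data it returns every value tied for the highest frequency, since a data set can have more than one mode. The grouped version uses the common textbook formula L + (d1 / (d1 + d2)) * i; the frequencies passed to it are hypothetical, and both function names are illustrative only.

```python
from collections import Counter


def modes(data):
    """Return all values that share the highest frequency, sorted."""
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


def grouped_mode(lower_boundary, f_modal, f_before, f_after, width):
    """Grouped mode: L + (d1 / (d1 + d2)) * i."""
    d1 = f_modal - f_before  # modal class vs. the class before it
    d2 = f_modal - f_after   # modal class vs. the class after it
    return lower_boundary + (d1 / (d1 + d2)) * width


scores = [
    70, 80, 75, 82, 72, 83, 81, 81, 75, 85,
    96, 94, 82, 71, 85, 82, 75, 85, 76, 86,
    87, 88, 88, 75, 78, 77, 91, 92, 90, 79,
    87, 74, 79, 77, 86, 89, 74, 84, 83, 82,
]

print(modes(scores))  # the scores have two modes (bimodal)

# Hypothetical modal class 80-84 with frequency 12, neighbors 8 and 10.
print(grouped_mode(79.5, f_modal=12, f_before=8, f_after=10, width=5))
```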
C. Measures of Dispersion
Identify how a set of values spreads or fluctuates.
• Range
• Mean absolute deviation and variance
• Standard Deviation
• Coefficient of Variation
• Coefficient of Skewness
• Boxplot
Range – the difference between the highest and lowest
scores.
Ungrouped – R = |Max – Min|
Grouped – RG = |ULHC – LLLC|, where ULHC is the upper limit of the
highest class and LLLC is the lower limit of the lowest class.
Properties of the Range:
1. A quick but rough measure of dispersion.
2. The larger the value, the more dispersed the data.
3. Considers only the highest and lowest values.
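Both range formulas can be sketched in a few lines of Python. The ungrouped range uses the 40 raw scores from earlier in the chapter; the class limits used for the grouped range are an assumed grouping (width 5, starting at 70), not one given in the source.

```python
scores = [
    70, 80, 75, 82, 72, 83, 81, 81, 75, 85,
    96, 94, 82, 71, 85, 82, 75, 85, 76, 86,
    87, 88, 88, 75, 78, 77, 91, 92, 90, 79,
    87, 74, 79, 77, 86, 89, 74, 84, 83, 82,
]

# Ungrouped: R = |Max - Min|
ungrouped_range = abs(max(scores) - min(scores))

# Assumed class limits (lower, upper) for the grouped case.
class_limits = [(70, 74), (75, 79), (80, 84), (85, 89), (90, 94), (95, 99)]

# Grouped: RG = |upper limit of highest class - lower limit of lowest class|
highest_upper = max(upper for _, upper in class_limits)
lowest_lower = min(lower for lower, _ in class_limits)
grouped_range = abs(highest_upper - lowest_lower)

print(ungrouped_range, grouped_range)
```

The grouped range is at least as large as the ungrouped one because the class limits enclose the extreme scores.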
Mean Absolute Deviation and Variance – the simplest
methods of taking into account the variation, or
spread, of all items in a series about the
point of central tendency. The mean absolute deviation
is based on the absolute deviations from the mean; the
variance is based on the squared deviations.
$\sigma^2$ – Population variance
$s^2$ – Sample variance
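Formula (Ungrouped):
Population: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$
Sample: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
(Computational Formula) $s^2 = \dfrac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}$
Formula (Grouped):
$s^2 = \dfrac{\sum f_i (x_i - \bar{x})^2}{n - 1}$, where $x_i$ is the class mark
and $f_i$ is the frequency of the $i$-th class.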
Properties of the Variance:
1. Always non-negative.
2. The larger the value, the more dispersed the data.
3. Can easily be manipulated mathematically.
4. Each observation contributes to the magnitude
of the variance.
5. Its unit is the square of the unit of the original data.
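The following Python sketch computes the variance of the 40 raw scores three ways, assuming the standard textbook forms: the population variance (divide by N), the sample variance (divide by n - 1), and the computational form of the sample variance. The function names are hypothetical; the standard library's `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
import statistics

scores = [
    70, 80, 75, 82, 72, 83, 81, 81, 75, 85,
    96, 94, 82, 71, 85, 82, 75, 85, 76, 86,
    87, 88, 88, 75, 78, 77, 91, 92, 90, 79,
    87, 74, 79, 77, 86, 89, 74, 84, 83, 82,
]


def population_variance(data):
    """Mean of the squared deviations from the mean (divide by N)."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def sample_variance_computational(data):
    """Computational form: (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(data)
    return (n * sum(x * x for x in data) - sum(data) ** 2) / (n * (n - 1))


print(population_variance(scores), statistics.pvariance(scores))
print(sample_variance(scores), statistics.variance(scores))
print(sample_variance_computational(scores))
```

The computational form avoids computing the mean first, which is convenient for hand calculation.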
Standard Deviation – based on the deviations of
all the scores in the series and always computed from
the mean. It is the positive square root of the variance.
$\sigma = \sqrt{\sigma^2}$ (population)
$s = \sqrt{s^2}$ (sample)
Properties of the Standard Deviation:
Same properties as the variance except for the
unit of measure: the unit of the standard
deviation is the same as that of the original data.
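As a minimal sketch, the standard deviation can be obtained by taking the positive square root of the variance. The helper names below are hypothetical, and `statistics.pstdev` and `statistics.stdev` from the Python standard library serve as a cross-check.

```python
import math
import statistics

scores = [
    70, 80, 75, 82, 72, 83, 81, 81, 75, 85,
    96, 94, 82, 71, 85, 82, 75, 85, 76, 86,
    87, 88, 88, 75, 78, 77, 91, 92, 90, 79,
    87, 74, 79, 77, 86, 89, 74, 84, 83, 82,
]


def population_std(data):
    """Positive square root of the population variance."""
    n = len(data)
    mu = sum(data) / n
    return math.sqrt(sum((x - mu) ** 2 for x in data) / n)


def sample_std(data):
    """Positive square root of the sample variance."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))


print(population_std(scores), statistics.pstdev(scores))
print(sample_std(scores), statistics.stdev(scores))
```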
Coefficient of Variation – also known as the
relative dispersion, is the ratio of the standard
deviation to the mean and is usually expressed in
percent.
$CV = \dfrac{\sigma}{\mu} \times 100\%$ (population), $CV = \dfrac{s}{\bar{x}} \times 100\%$ (sample)
It has no unit of measure.
The higher the value of CV, the more dispersed the data.
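The sketch below illustrates the two remarks above using the sample form of the coefficient of variation. The height values are hypothetical: the same heights expressed in centimeters and in meters give the same CV, and a more spread-out set gives a higher CV.

```python
import statistics


def coefficient_of_variation(data):
    """Sample CV in percent: (standard deviation / mean) * 100."""
    return statistics.stdev(data) / statistics.mean(data) * 100


heights_cm = [150, 160, 165, 170, 180]       # hypothetical heights in cm
heights_m = [h / 100 for h in heights_cm]    # the same heights in meters
wide_cm = [120, 150, 165, 180, 210]          # hypothetical, more dispersed

print(coefficient_of_variation(heights_cm))
print(coefficient_of_variation(heights_m))
print(coefficient_of_variation(wide_cm))
```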
Skewness – measures how asymmetric the
distribution of the data is about the mean.
If Mean = Median = Mode, SK is zero.
If Mean > Median > Mode, SK is positive.
If Mean < Median < Mode, SK is negative.
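The source does not say which coefficient of skewness is intended. The sketch below assumes Pearson's second coefficient, SK = 3(mean - median) / s, which takes its sign from the comparison of the mean and the median; the three small data sets are hypothetical.

```python
import statistics


def pearson_skewness(data):
    """Pearson's second skewness coefficient: 3 * (mean - median) / s."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)
    return 3 * (mean - median) / s


symmetric = [1, 2, 3, 4, 5]       # mean equals median
right_skewed = [1, 2, 2, 3, 10]   # long right tail pulls the mean up
left_skewed = [1, 8, 9, 9, 10]    # long left tail pulls the mean down

print(pearson_skewness(symmetric))
print(pearson_skewness(right_skewed))
print(pearson_skewness(left_skewed))
```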
Kurtosis – the peakedness or flatness of the
distribution.
Peaked – Leptokurtic, K > 3
Normal – Mesokurtic, K = 3
Flat – Platykurtic, K < 3
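The source gives no formula for K. A common choice that matches the reference value of 3 for a normal distribution is the moment-based coefficient K = m4 / m2^2, where m2 and m4 are the second and fourth central moments. The sketch below assumes that definition and uses two hypothetical data sets.

```python
def kurtosis(data):
    """Moment coefficient of kurtosis: m4 / m2**2 (about 3 for normal data)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2


flat = [1, 2, 3, 4, 5]          # evenly spread values
peaked = [0] * 8 + [-10, 10]    # most values at the center, two far out

print(kurtosis(flat))    # below 3: platykurtic
print(kurtosis(peaked))  # above 3: leptokurtic
```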
D. Measure of Relative Position
Percentile – Divides the whole data set into 100
equal parts.
Decile – Divides the whole data set into 10 equal
parts.
Quartile – Divides the whole data set into 4 equal
parts.
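The cut points for all three measures can be sketched with `statistics.quantiles` from the Python standard library (Python 3.8 or later), applied to the 40 sorted scores from earlier in the chapter. Dividing the data into k equal parts yields k - 1 cut points. The default "exclusive" interpolation method is an assumption here; textbooks use several slightly different conventions, so hand-computed values may differ a little.

```python
import statistics

scores = [
    70, 71, 72, 74, 74, 75, 75, 75, 75, 76,
    77, 77, 78, 79, 79, 80, 81, 81, 82, 82,
    82, 82, 83, 83, 84, 85, 85, 85, 86, 86,
    87, 87, 88, 88, 89, 90, 91, 92, 94, 96,
]

quartiles = statistics.quantiles(scores, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(scores, n=10)       # D1 ... D9
percentiles = statistics.quantiles(scores, n=100)  # P1 ... P99

print(quartiles)
print(deciles[4], percentiles[49])  # D5 and P50 both coincide with the median
```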