Chapter 4 Data Management

Introduction
Statistics – The branch of science that deals with the COLLECTION, ORGANIZATION, PRESENTATION, ANALYSIS, and INTERPRETATION of data.
Major Areas:
• Descriptive Statistics – Concerned with collecting and describing a set of data so as to yield meaningful observations.
• Inferential Statistics – Deals with the analysis of a subset of data, leading to predictions or inferences about the entire set of data.
Definition of Terms:
Population – The totality of the observations.
Sample – A subset of the population.
Population and Sample:
Population: All first-year students (BS Accountancy, BS Nursing, BS Psychology, etc.)
Sample: All first-year BS Agriculture students
Experimental Unit – The individual or object on which a variable is being measured; the object treated in an experiment.
Variable – An attribute or characteristic of a person or object that can assume different values or labels. The value of the variable can “vary” from one entity to another.
Example: For a plant as the experimental unit, the variables include Height, Number of Leaves, Fruit Bearing (Yes or No), Flower Bearing (Yes or No), Diameter of Branch, and Species.

Dependent Variable – A variable whose value depends on another variable. It is the variable that is measured (the effect).
Independent Variable – The variable the experimenter changes or controls, which is assumed to have a direct effect on the dependent variable. It is the variable that is controlled (the cause).
Types of Variables:
Qualitative Variable – An attribute or characteristic whose values are assigned categorically.
Quantitative Variable – An attribute or characteristic whose values are assigned numerically.
Types of Variables (Example):
Qualitative Variables – Fruit Bearing (Yes or No), Flower Bearing (Yes or No), Species
Quantitative Variables – Height, Number of Leaves, Circumference of Branch
Types of Quantitative Variables:
Discrete – Quantitative variables whose values are countable.
Continuous – Quantitative variables whose values are measurable.
Examples: Number of Leaves is discrete; Height and Diameter of Branch are continuous.
A. Data Gathering, Organizing,
Representing and Interpreting
1. Data Gathering (Methods):
• Direct or Interview – A person-to-person encounter between the interviewer and the interviewee. It can be done in person, by phone, or over the internet.
• Indirect or Questionnaire – A questionnaire is used to elicit the information.
• Registration – Data are obtained from the records of a government agency authorized by law to keep such data or information and to make them available to the researcher.
• Observation – Data are obtained through observation, mostly of the behavior of an individual or group in a given situation.
• Experimental – Data are gathered from the results of a series of experiments on some variables. Mostly used in scientific inquiries.
2. Data Organization and Presentation
• Levels of Measurement:
   ◦ Nominal Scale – Assigns names, labels, or categories.
   ◦ Ordinal Scale – Assigns names or labels with an order or ranking.
   ◦ Interval Scale – Equal differences between values are meaningful, but the zero point is arbitrary; there is no “true zero point” (e.g., temperature in degrees Celsius).
   ◦ Ratio Scale – Has a “true zero point” (e.g., height, weight).
• Ways to present data:
   ◦ Textual – Words, sentences, and paragraphs.
   ◦ Tabular – Rows and columns; data are classified in an array.
   ◦ Graphical – Pictorial form: graphs, symbols, and visual aids.
• A tabular presentation should be:
   ◦ Simple
   ◦ Focused on the data
   ◦ Clear about the meaning and significance of the information being presented
• A statistical table should have the following parts:
   ◦ Heading (table number, title, and head note)
   ◦ Box head
   ◦ Stub
   ◦ Footnote
   ◦ Source note
• A good chart should be:
   ◦ Accurate
   ◦ Simple
   ◦ Clear
   ◦ Attractive
• Types of Graphs:
   ◦ Line graph
   ◦ Bar graph
   ◦ Pie graph
   ◦ Pictograph or pictogram
• Two ways of organizing collected numerical data:
   ◦ Array
   ◦ Frequency Distribution Table (FDT)
• Parts of a Frequency Distribution Table:
   ◦ Classes
   ◦ Class Frequency
   ◦ Class Mark or Class Midpoint
   ◦ Cumulative Frequency
   ◦ Relative Frequency
Example: Frequency Distribution Table (FDT)
Columns: Class Interval | Frequency | Class Mark | Class Boundary | Relative Frequency | Relative Frequency (%) | <CF | >CF

Raw data (40 observations):
70 80 75 82 72 83 81 81 75 85
96 94 82 71 85 82 75 85 76 86
87 88 88 75 78 77 91 92 90 79
87 74 79 77 86 89 74 84 83 82

Array (the same observations in ascending order):
70 71 72 74 74 75 75 75 75 76
77 77 78 79 79 80 81 81 82 82
82 82 83 83 84 85 85 85 86 86
87 87 88 88 89 90 91 92 94 96
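The path from raw data to an array to an FDT can be sketched in Python. This is a minimal sketch under assumptions the example does not fix: the number of classes comes from Sturges' rule, k = 1 + 3.322 log n, the class width is the range divided by k and rounded up, the first class starts at the lowest value, and because the data are whole numbers each class boundary is the class limit minus or plus 0.5. The function names `suggested_width` and `frequency_table` are hypothetical.

```python
import math

# Illustrative sketch; function names and the class-width rule are assumptions.
# Raw data from the FDT example (40 observations).
SCORES = [
    70, 80, 75, 82, 72, 83, 81, 81, 75, 85,
    96, 94, 82, 71, 85, 82, 75, 85, 76, 86,
    87, 88, 88, 75, 78, 77, 91, 92, 90, 79,
    87, 74, 79, 77, 86, 89, 74, 84, 83, 82,
]


def suggested_width(data):
    """Class width from Sturges' rule (assumed; not stated in the example)."""
    k = math.ceil(1 + 3.322 * math.log10(len(data)))  # number of classes
    return math.ceil((max(data) - min(data)) / k)


def frequency_table(data, width):
    """Return FDT rows for integer data, starting at the minimum value.

    Each row is a tuple:
    (lower limit, upper limit, f, class mark, lower boundary, upper boundary,
     relative frequency, relative frequency %, <CF, >CF)
    """
    arrayed = sorted(data)  # the "array": values in ascending order
    n = len(arrayed)
    rows = []
    less_cf = 0
    lower = arrayed[0]
    while lower <= arrayed[-1]:
        upper = lower + width - 1
        f = sum(1 for x in arrayed if lower <= x <= upper)
        greater_cf = n - less_cf  # observations in this class or above
        less_cf += f              # observations in this class or below
        rows.append((lower, upper, f, (lower + upper) / 2,
                     lower - 0.5, upper + 0.5,
                     f / n, 100 * f / n, less_cf, greater_cf))
        lower += width
    return rows


for row in frequency_table(SCORES, suggested_width(SCORES)):
    lo, hi, f, mark, lb, ub, rf, pct, lcf, gcf = row
    print(f"{lo}-{hi}  f={f}  mark={mark}  {lb}-{ub}  "
          f"rf={rf:.3f}  {pct:.1f}%  <CF={lcf}  >CF={gcf}")
```

Under these assumptions the 40 observations fall into seven classes of width 4 (70-73 up to 94-97) with frequencies 3, 9, 6, 10, 7, 3, and 2. Another class width or starting point gives a different but equally valid table.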
3. Data Analysis and Interpretation
• Descriptive Statistics:
   ◦ Measures of Central Tendency
   ◦ Measures of Dispersion
   ◦ Measures of Skewness and Kurtosis
• Inferential Statistics are techniques wherein samples are used to make generalizations about the populations from which the samples were drawn.
• The need for inferential statistics arises from the fact that sampling naturally incurs sampling error; thus, a sample is not expected to represent the population perfectly.
• Methods of Inferential Statistics: estimation of parameters and hypothesis testing.
B. Measures of Central Tendency
Measures of central tendency indicate the center of a set of data arranged in order of magnitude. A measure of central tendency is described as the point about which the scores tend to cluster; hence, it is regarded as a sort of average of the series. It is a single number that describes the whole set of data collected.
1. Mean or Arithmetic Mean (or Average) – The most popular and well-known measure of central tendency. It can be used with both discrete and continuous data.
Weighted Mean – A mean in which the weight of each observation is taken into account in the computation.
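A minimal sketch of the mean and the weighted mean, assuming the usual formulas x̄ = Σx / n and x̄_w = Σ(w·x) / Σw. The grades, units, and helper names below are hypothetical illustration values.

```python
# Illustrative sketch; helper names and data are hypothetical.


def mean(values):
    """Arithmetic mean: the sum of the observations divided by their number."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean: sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical example: course grades weighted by their number of units.
grades = [1.5, 2.0, 1.25]
units = [3, 5, 2]
print(mean(grades))                  # every grade counts equally
print(weighted_mean(grades, units))  # each grade counts in proportion to its units
```

Here the weighted mean is (1.5·3 + 2.0·5 + 1.25·2) / 10 = 17 / 10 = 1.7, whereas the unweighted mean of the three grades ignores the units.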
Properties of Mean:
1. The sum of the deviations of the observations from the mean is zero.
2. The sum of the squared deviations of the observations from the mean is a minimum.
3. The mean reflects the magnitude of every observation, since every observation contributes to its value.
4. The mean is easily affected by outliers.
5. The means of subgroups may be combined when properly weighted; the combined mean is called the weighted mean.
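Properties 1, 2, 4, and 5 can be checked numerically. In this sketch (hypothetical data and helper names) Python's `fractions.Fraction` keeps the arithmetic exact, so the sum of deviations prints as exactly 0 rather than a tiny floating-point remainder.

```python
from fractions import Fraction

# Illustrative sketch; data and helper names are hypothetical.


def mean(values):
    """Exact arithmetic mean of integer data, returned as a Fraction."""
    return Fraction(sum(values), len(values))


def sum_sq_dev(values, c):
    """Sum of squared deviations of the values from a point c."""
    return sum((x - c) ** 2 for x in values)


data = [4, 8, 15, 16, 23, 42]          # hypothetical observations, mean 18
m = mean(data)

# Property 1: the deviations from the mean sum to zero.
print(sum(x - m for x in data))

# Property 2: squared deviations are smallest about the mean
# (checked here against every whole number from 0 to 49).
print(all(sum_sq_dev(data, m) <= sum_sq_dev(data, c) for c in range(50)))

# Property 4: a single outlier pulls the mean strongly.
small = [10, 12, 14]
print(mean(small), mean(small + [100]))

# Property 5: subgroup means combine when weighted by subgroup size.
group_a = [78, 82, 80]                 # n = 3, mean 80
group_b = [90, 94]                     # n = 2, mean 92
combined = (len(group_a) * mean(group_a) + len(group_b) * mean(group_b)) / (
    len(group_a) + len(group_b)
)
print(combined, combined == mean(group_a + group_b))
```

Adding the single value 100 moves the mean of 10, 12, 14 from 12 to 34, which is the sensitivity to outliers described in property 4.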
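2. Median – The middle score of a set of data arranged in order of magnitude. The median is best used when the data have several extreme values.
Grouped – $Md = LB + \left[\frac{\frac{n}{2} - cf_{<}}{f_m}\right] i$, where $LB$ is the lower class boundary of the median class, $n$ is the total frequency, $cf_{<}$ is the less-than cumulative frequency of the class just before the median class, $f_m$ is the frequency of the median class, and $i$ is the class width.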
Properties of Median:
1. Not affected by outliers.
2. The sum of the absolute deviations from the median is a minimum.
3. Not amenable to further computation; hence, medians cannot be combined in the same manner as means.
4. The median of grouped data can be calculated even with an open-ended interval, provided the median class is not open-ended.
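A sketch of both cases. For ungrouped data, Python's `statistics.median` sorts the values and, when n is even, averages the two middle ones. For grouped data the sketch applies the interpolation formula Md = LB + [(n/2 − cf)/f_m]·i, assuming integer class limits (so LB is the lower limit minus 0.5). `CLASSES` is a hypothetical grouping of the 40 observations in the FDT example into classes of width 4, and `grouped_median` is an illustrative name.

```python
import statistics

# Illustrative sketch; CLASSES and grouped_median are hypothetical.
# Hypothetical width-4 grouping of the FDT example: (lower limit, upper limit, f).
CLASSES = [(70, 73, 3), (74, 77, 9), (78, 81, 6), (82, 85, 10),
           (86, 89, 7), (90, 93, 3), (94, 97, 2)]


def grouped_median(classes):
    """Median of grouped data with integer class limits, by interpolation."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cf + f >= half:          # first class whose <CF reaches n/2
            lb = lower - 0.5        # lower class boundary
            width = upper - lower + 1
            return lb + (half - cf) / f * width
        cf += f
    raise ValueError("empty frequency table")


print(statistics.median([7, 3, 9, 1]))  # even n: average of the middle values 3 and 7
print(grouped_median(CLASSES))
```

With this grouping, n/2 = 20 falls in the class 82-85 (cf = 18 before it, f_m = 10), so Md = 81.5 + (2/10)·4 = 82.3.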
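3. Mode – The most frequent score in the data set, sometimes considered the most popular option.
Grouped – $Mo = LB + \left[\frac{d_1}{d_1 + d_2}\right] i$, where $LB$ is the lower class boundary of the modal class, $d_1$ is the frequency of the modal class minus the frequency of the class just before it, $d_2$ is the frequency of the modal class minus the frequency of the class just after it, and $i$ is the class width.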
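A sketch of the mode. `statistics.multimode` (Python 3.8 and later) returns every value tied for the highest frequency, which exposes bimodal data. For grouped data the sketch uses the interpolation formula Mo = LB + [d1/(d1 + d2)]·i, with a hypothetical width-4 grouping of the 40 observations in the FDT example and integer class limits. The name `grouped_mode` is illustrative.

```python
import statistics

# Illustrative sketch; CLASSES and grouped_mode are hypothetical.
# Hypothetical width-4 grouping of the FDT example: (lower limit, upper limit, f).
CLASSES = [(70, 73, 3), (74, 77, 9), (78, 81, 6), (82, 85, 10),
           (86, 89, 7), (90, 93, 3), (94, 97, 2)]


def grouped_mode(classes):
    """Mode of grouped data with integer class limits, by interpolation."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                    # first modal class
    f_before = freqs[k - 1] if k > 0 else 0        # 0 if there is no class before
    f_after = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1 = freqs[k] - f_before
    d2 = freqs[k] - f_after
    if d1 + d2 == 0:
        raise ValueError("formula undefined: neighboring classes tie the modal class")
    lower, upper, _ = classes[k]
    width = upper - lower + 1
    return (lower - 0.5) + d1 / (d1 + d2) * width


print(statistics.multimode([2, 3, 3, 5, 7, 7, 9]))  # two modes: bimodal data
print(grouped_mode(CLASSES))
```

Here the modal class is 82-85 (f = 10) with neighbors 6 and 7, so d1 = 4, d2 = 3, and Mo = 81.5 + (4/7)·4 ≈ 83.79.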
C. Measures of Dispersion
Measures of dispersion identify how a set of values spreads or fluctuates.
• Range
• Mean absolute deviation and variance
• Standard Deviation
• Coefficient of Variation
• Coefficient of Skewness
• Boxplot
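Range – The difference between the highest and lowest scores.
Ungrouped – $R = |\text{Max} - \text{Min}|$
Grouped – $R_G = |ULHC - LLLC|$, where ULHC is the upper limit of the highest class and LLLC is the lower limit of the lowest class.
Properties:
1. A quick but rough measure of dispersion.
2. The larger the value, the more dispersed the data.
3. Considers only the highest and lowest values.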
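A small sketch of both range formulas, assuming ULHC and LLLC stand for the upper limit of the highest class and the lower limit of the lowest class. The two hypothetical data sets illustrate property 3: they share the same range even though their values are spread differently.

```python
# Illustrative sketch; function names and data are hypothetical.


def ungrouped_range(values):
    """R = Max - Min for raw (ungrouped) data."""
    return max(values) - min(values)


def grouped_range(classes):
    """R_G = upper limit of the highest class - lower limit of the lowest class.

    `classes` is a list of (lower limit, upper limit) pairs in ascending order.
    """
    return classes[-1][1] - classes[0][0]


# Property 3: same range, different spreads (only the extremes matter).
a = [10, 50, 50, 50, 90]
b = [10, 10, 50, 90, 90]
print(ungrouped_range(a), ungrouped_range(b))
print(grouped_range([(70, 73), (74, 77), (78, 81)]))
```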
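Mean Absolute Deviation and Variance – Methods of taking into account the variation, or spread, of all items in a series from the point of central tendency.
σ² – Population variance
s² – Sample variance
Mean absolute deviation: $MAD = \frac{\sum |x - \bar{x}|}{n}$
Formula (Population): $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$; computational formula: $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
Formula (Ungrouped, sample): $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$; computational formula: $s^2 = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}$
Formula (Grouped, sample): $s^2 = \frac{\sum f (x - \bar{x})^2}{n - 1}$, where $x$ is the class mark and $f$ is the class frequency.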
Properties:
1. Always non-negative.
2. The larger the value, the more dispersed the data.
3. Can easily be manipulated mathematically.
4. Each observation contributes to the magnitude of the variance.
5. Its unit is the square of the unit of the original data.
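The sketch below computes the mean absolute deviation and the variances with the standard textbook formulas (definitional and computational) and cross-checks them against Python's `statistics.pvariance` and `statistics.variance`. The data and function names are hypothetical.

```python
import statistics

# Illustrative sketch; data and function names are hypothetical.


def mean(xs):
    return sum(xs) / len(xs)


def mean_absolute_deviation(xs):
    """MAD = sum(|x - mean|) / n."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def population_variance(xs):
    """sigma^2 = sum((x - mu)^2) / N (definitional formula)."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1) (definitional formula)."""
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


def sample_variance_shortcut(xs):
    """s^2 = (n * sum(x^2) - (sum x)^2) / (n * (n - 1)) (computational formula)."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))


def grouped_sample_variance(marks, freqs):
    """s^2 = sum(f * (x - xbar)^2) / (n - 1), with x the class marks."""
    n = sum(freqs)
    xbar = sum(f * x for x, f in zip(marks, freqs)) / n
    return sum(f * (x - xbar) ** 2 for x, f in zip(marks, freqs)) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores
print(mean_absolute_deviation(data))
print(population_variance(data), statistics.pvariance(data))
print(sample_variance(data), sample_variance_shortcut(data), statistics.variance(data))
print(grouped_sample_variance([2, 5, 8], [1, 2, 1]))  # hypothetical class marks and f
```

For the sample data, Σ(x − x̄)² = 32, so σ² = 32/8 = 4 when the eight values are treated as a population and s² = 32/7 ≈ 4.57 when they are a sample; the computational formula gives the same s² without computing each deviation.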
Standard Deviation – Based on the deviations of all the scores in the series and always computed from the mean. It is the positive square root of the variance:
$\sigma = \sqrt{\sigma^2}$ (population), $s = \sqrt{s^2}$ (sample)
Properties:
Same properties as the variance, except for the unit of measure: the standard deviation has the same unit as the original data.
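A sketch of the standard deviation as the square root of the variance, cross-checked with `statistics.pstdev` and `statistics.stdev`. The second part illustrates the unit property with hypothetical heights: rescaling metres to centimetres multiplies the standard deviation by 100 but the variance by 100².

```python
import math
import statistics

# Illustrative sketch; function names and data are hypothetical.


def population_sd(xs):
    """sigma = positive square root of the population variance."""
    return math.sqrt(statistics.pvariance(xs))


def sample_sd(xs):
    """s = positive square root of the sample variance."""
    return math.sqrt(statistics.variance(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]                  # hypothetical scores
print(population_sd(data), statistics.pstdev(data))
print(sample_sd(data), statistics.stdev(data))

# Unit property: metres -> centimetres multiplies the SD by 100,
# but the variance by 100 ** 2 because it is in squared units.
heights_m = [1.50, 1.62, 1.58, 1.71, 1.66]      # hypothetical heights
heights_cm = [100 * h for h in heights_m]
print(sample_sd(heights_cm) / sample_sd(heights_m))                       # ≈ 100
print(statistics.variance(heights_cm) / statistics.variance(heights_m))   # ≈ 10000
```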
Coefficient of Variation – Also known as the relative dispersion; the ratio of the standard deviation to the mean, usually expressed in percent.
$CV = \frac{\sigma}{\mu} \times 100\%$ (population), $CV = \frac{s}{\bar{x}} \times 100\%$ (sample)
It has no unit of measure.
The higher the value of CV, the more dispersed the data.
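A sketch of the coefficient of variation with hypothetical data. The two sets have the same standard deviation, but relative to their means the weights are more dispersed, which is what the CV captures. The `sample` flag choosing between s/x̄ and σ/μ is an illustrative design choice; the CV is normally used only for data with a positive mean.

```python
import statistics

# Illustrative sketch; function name, flag, and data are hypothetical.


def coefficient_of_variation(xs, sample=True):
    """CV = SD / mean * 100 (%); s and x-bar by default, sigma and mu if sample=False."""
    sd = statistics.stdev(xs) if sample else statistics.pstdev(xs)
    return sd / statistics.mean(xs) * 100


weights_kg = [50, 55, 60, 65, 70]        # hypothetical weights
heights_cm = [150, 155, 160, 165, 170]   # hypothetical heights
print(statistics.stdev(weights_kg), statistics.stdev(heights_cm))  # equal SDs
print(coefficient_of_variation(weights_kg), coefficient_of_variation(heights_cm))
```

Both sets have s = √62.5 ≈ 7.91, but the CV is about 13.2% for the weights (mean 60) and about 4.9% for the heights (mean 160).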
Skewness – Measures how asymmetric the distribution of the data is about the mean.
If Mean = Median = Mode, SK is zero (symmetric distribution).
If Mean > Median > Mode, SK is positive (skewed to the right).
If Mean < Median < Mode, SK is negative (skewed to the left).
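The sign rules above can be checked with Pearson's second coefficient of skewness, SK = 3(mean − median)/s. This is one common choice of coefficient, assumed here; other definitions (for example, moment-based skewness) give different values but usually the same sign. The data sets are hypothetical.

```python
import statistics

# Illustrative sketch; Pearson's coefficient is an assumed choice, data are hypothetical.


def pearson_skewness(xs):
    """Pearson's second coefficient of skewness: SK = 3 * (mean - median) / s."""
    return 3 * (statistics.mean(xs) - statistics.median(xs)) / statistics.stdev(xs)


symmetric = [1, 2, 3, 4, 5]               # mean = median = 3
right_tail = [1, 2, 2, 3, 3, 3, 4, 10]    # mean 3.5 > median 3
left_tail = [1, 8, 8, 9, 9, 9, 10]        # mean about 7.71 < median 9
for name, xs in [("symmetric", symmetric), ("right", right_tail), ("left", left_tail)]:
    print(name, round(pearson_skewness(xs), 3))
```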
Kurtosis – The peakedness or flatness of the distribution.
Peaked – Leptokurtic, K > 3
Normal – Mesokurtic, K = 3
Flat – Platykurtic, K < 3
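A sketch of the moment coefficient of kurtosis, K = m4 / m2², where m_k = Σ(x − x̄)^k / n. With this definition a normal distribution has K = 3, matching the thresholds above; some software reports excess kurtosis, K − 3, instead. The data sets and function names are hypothetical.

```python
# Illustrative sketch; function names and data are hypothetical.


def moment(xs, k):
    """k-th moment about the mean: sum((x - mean)^k) / n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def kurtosis(xs):
    """Moment coefficient of kurtosis K = m4 / m2^2 (3 for a normal distribution)."""
    return moment(xs, 4) / moment(xs, 2) ** 2


flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]       # evenly spread values
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]    # most values at the center, two far out
print(kurtosis(flat), kurtosis(peaked))
```

The evenly spread set gives K ≈ 1.77 (platykurtic) and the set concentrated at the center with two far values gives K = 4.5 (leptokurtic).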
D. Measures of Relative Position
Percentile – Divides the whole data set into 100
equal parts.
Decile – Divides the whole data set into 10 equal
parts.
Quartile – Divides the whole data set into 4 equal
parts.
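Quartiles and deciles are special percentiles (Q1 = P25, Q2 = P50, Q3 = P75, and D_k = P_{10k}). There are several conventions for locating a percentile; this sketch assumes the position (n + 1)·p/100 in the array, with linear interpolation between neighboring values. For positions inside the data this matches the default "exclusive" method of Python's `statistics.quantiles`. The data and function names are hypothetical.

```python
import statistics

# Illustrative sketch; the (n + 1)p/100 convention, names, and data are assumptions.


def percentile(values, p):
    """p-th percentile using position (n + 1) * p / 100 and linear interpolation."""
    xs = sorted(values)
    n = len(xs)
    pos = (n + 1) * p / 100      # 1-based position in the array
    if pos <= 1:
        return float(xs[0])
    if pos >= n:
        return float(xs[-1])
    k = int(pos)                 # whole part: the k-th value
    frac = pos - k               # fractional part used to interpolate
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])


def quartile(values, q):
    """Q_q = P_(25q), q = 1, 2, 3."""
    return percentile(values, 25 * q)


def decile(values, d):
    """D_d = P_(10d), d = 1, ..., 9."""
    return percentile(values, 10 * d)


DATA = [2, 4, 4, 5, 7, 9, 10, 12, 15, 18, 20]   # hypothetical, n = 11
print([quartile(DATA, q) for q in (1, 2, 3)])
print(statistics.quantiles(DATA, n=4))           # same convention ("exclusive" method)
print(decile(DATA, 9))
```

For the 11 values, Q1, Q2, and Q3 sit at positions 3, 6, and 9 (values 4, 9, and 15), and D9 sits at position 10.8, that is, 80% of the way from 18 to 20, giving 19.6.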
