Data Management
MAIN BRANCHES OF STATISTICS
Inferential Statistics refers to generalizing from a sample to a population, estimating unknown population parameters, drawing conclusions, and making decisions.
Population and Sample
Population refers to all the items (infinite or finite) that we are interested in. It consists of the totality of the observations, individuals, or objects in which the investigator/researcher is interested.
Sample is a subset or portion of the population. It involves looking only at some items selected from a population.
PARAMETER and STATISTIC
Parameter – is a value calculated using all the data from a population.
Statistic – is a value calculated using the data from the sample.
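The difference between a parameter and a statistic can be illustrated with a short sketch. The population values, the sample size of 4, and the fixed random seed are all hypothetical choices made only for illustration.

```python
import random
from statistics import mean

# Hypothetical population: ages of all 10 members of a small club.
population = [18, 21, 22, 25, 27, 30, 31, 35, 40, 51]

# Parameter: a value computed from every item in the population.
population_mean = mean(population)

# Statistic: a value computed from a sample (a subset of the population).
rng = random.Random(0)  # fixed seed so the same sample is drawn each run
sample = rng.sample(population, 4)
sample_mean = mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The sample mean will usually differ somewhat from the population mean, which is why inferential statistics treats a statistic as an estimate of the parameter.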
What is a variable?
A Variable is a characteristic of interest about an object under investigation that can take on different possible outcomes, such as age, hair color, height, weight, and religious preference.
Two kinds of Variables
Qualitative Variables – These are variables that can be placed into distinct categories, according to some characteristics or attributes.
Quantitative Variables – These are numerical and can be ordered or ranked. These consist of two types: Discrete and Continuous.
Discrete variables take countable values, obtained by means of counting.
Continuous variables take values obtained by measurement.
Data
Data is a set of values collected from the
variable from each of the subjects that
belong to the sample. It refers to a
collection of natural phenomena
descriptors such as results from
experiences, observations or
experiments, or a set of premises. It may
consist of numbers, words, or images.
Data can be classified according to the
type of variable from which it was drawn.
There are two general types of data
according to how the data vary across
cases:
Types of Data
Categorical data have values that are described by words rather than numbers. It is of limited statistical use. On occasion, the values of these variables might be represented using numbers. This is called coding. Example: 1 = cash; 2 = check; 3 = credit/debit; 4 = gift card
Coding a category as a number does not make the data numerical, and the numbers do not typically imply a rank, although some codes do follow a natural order. Example: 1 = Bachelor’s; 2 = Master’s; 3 = Doctorate
Numerical data arise from counting, measuring something, or some kind of mathematical operation. Example: number of insurance claims; sales of the last quarter; accounting data; economic indicators; financial ratios.
Two types of numerical data: discrete (distinct number or integer) and continuous (any value within an interval).
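A small sketch may help show why coded categories remain categorical. The code-to-label mapping comes from the payment example above; the list of coded observations is hypothetical.

```python
from collections import Counter

# Coding scheme from the payment example: the numbers are labels, not quantities.
PAYMENT_CODES = {1: "cash", 2: "check", 3: "credit/debit", 4: "gift card"}

# Hypothetical coded observations (one payment method per transaction).
coded = [1, 3, 3, 2, 1, 4, 3, 1, 3]

# Decode back to category names before summarizing.
decoded = [PAYMENT_CODES[c] for c in coded]

# Counting how often each category occurs is meaningful for categorical data.
counts = Counter(decoded)
print(counts.most_common())

# Arithmetic on the codes is not meaningful: this "average payment method"
# would change if the categories were numbered in a different order.
meaningless_average = sum(coded) / len(coded)
print(meaningless_average)
```

Counts (and the mode) are the natural summaries here; the average of the codes is printed only to show that it has no interpretation.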
NOMINAL LEVEL
From the Latin nomen, meaning “name”, this is the weakest level of measurement. It merely identifies a category.
ORDINAL LEVEL
It can be treated as nominal but not vice versa.
INTERVAL LEVEL
It is rank data and has meaningful intervals between scale points. Since intervals between numbers represent distances, mathematical operations can be done, such as taking the “average.”
R = (f / N) x 100%
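The formula above can be sketched in code. The slide does not define its symbols, so this assumes the usual reading: R is the relative frequency in percent, f is the frequency of one category, and N is the total number of observations. The function name and the responses are hypothetical.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return {category: percent of observations}, using R = (f / N) * 100."""
    n = len(observations)           # N: total number of observations (assumed meaning)
    counts = Counter(observations)  # f: frequency of each category (assumed meaning)
    return {category: f / n * 100 for category, f in counts.items()}

# Hypothetical survey responses.
responses = ["yes", "no", "yes", "yes", "no", "undecided", "yes", "no"]

for category, percent in relative_frequencies(responses).items():
    print(f"{category}: {percent:.1f}%")
```

Under this reading, the relative frequencies of all categories add up to 100%.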
MEASURES OF CENTRAL TENDENCY
Types of Measures for Center
Once the data are collected, it is
useful to summarize the data set by
identifying a value around which the
data are centered.
Mean = P101,200
Median = P36,000
Mode = P20,000
Most of the employees of this company would probably
agree that the median of P36,000 better represents the average
of the salaries than does either the mean or the mode.
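A minimal sketch of how the three measures behave when one value is much larger than the rest. The salary list is hypothetical and is not the data behind the figures above; it only mimics the same pattern of a few very high salaries pulling the mean upward.

```python
from statistics import mean, median, mode

# Hypothetical monthly salaries; one very large value skews the data.
salaries = [20_000, 20_000, 30_000, 36_000, 40_000, 50_000, 500_000]

salary_mean = mean(salaries)      # pulled upward by the largest salary
salary_median = median(salaries)  # middle value of the sorted data
salary_mode = mode(salaries)      # most frequent value

print("mean:  ", round(salary_mean))
print("median:", salary_median)
print("mode:  ", salary_mode)
```

With skewed data like this, the median is less affected by the extreme value than the mean is, which is the point made about the company salaries.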
MEASURES OF DISPERSION
Types of Measures of Dispersion or Variability
Another important feature that can help us
understand more about a data set is the way the data
are distributed.
Range is the difference between the largest value
(maximum) and the smallest value (minimum) in the
data.
Standard deviation is an extremely important
measure of spread that is based on the mean. It
measures, roughly, how far the data points
typically lie from the mean.
Variance is the square of the standard deviation of
the data. Its unit is the square of the unit of the
original data, so it is not expressed in the same unit.
Illustration in computing
EXAMPLE: A consumer group has tested a sample of 8 size AAA
batteries from each of 3 companies. The results of the tests are shown
in the following table. According to these tests, which company
produces batteries for which the values representing hours of constant
use have the smallest standard deviation?
Company        Hours of constant use per battery
EverSoBright   6.2, 6.4, 7.1, 5.9, 8.3, 5.3, 7.5, 9.3
Dependable     6.8, 6.2, 7.2, 5.9, 7.0, 7.4, 7.3, 8.2
Beacon         6.1, 6.6, 7.3, 5.7, 7.1, 7.6, 7.1, 8.5
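The battery comparison can be checked with a short sketch. It assumes the sample standard deviation (dividing by n - 1), since each company's data is described as a sample of 8 batteries; the variable names are illustrative.

```python
from statistics import mean, stdev, variance

# Hours of constant use per battery, from the table above.
hours = {
    "EverSoBright": [6.2, 6.4, 7.1, 5.9, 8.3, 5.3, 7.5, 9.3],
    "Dependable": [6.8, 6.2, 7.2, 5.9, 7.0, 7.4, 7.3, 8.2],
    "Beacon": [6.1, 6.6, 7.3, 5.7, 7.1, 7.6, 7.1, 8.5],
}

summary = {}
for company, values in hours.items():
    summary[company] = {
        "mean": mean(values),
        "range": max(values) - min(values),  # largest minus smallest
        "variance": variance(values),        # sample variance (n - 1)
        "stdev": stdev(values),              # sample standard deviation
    }

for company, stats in summary.items():
    print(f"{company}: range={stats['range']:.1f}, "
          f"variance={stats['variance']:.3f}, stdev={stats['stdev']:.3f}")

# Company whose battery life varies the least.
most_consistent = min(summary, key=lambda c: summary[c]["stdev"])
print("smallest standard deviation:", most_consistent)
```

All three samples have a mean of about 7.0 hours, so the mean alone cannot separate the companies. On this data the smallest standard deviation belongs to Dependable, at about 0.72 hours.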
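Measures of Relative Position
Measures of position are used to locate the relative position of a value in the data set. These measures include percentiles, quartiles, and z-scores.
Percentile = (number of values below x + 0.5) / (total number of values) x 100
Example: Find the percentile of a score of 24 among the following class scores:
23 25 19 21 28 15 20 24 22 27
percentile = (6 + 0.5) / 10 x 100
percentile = 65th percentile
• This means that a student with a score of 24 did better than 65% of the class.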
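The percentile example can be reproduced with a short sketch. It assumes the common textbook formula (number of values below the score + 0.5) / (total number of values) x 100, which gives the 65 reported for a score of 24; the function name is hypothetical.

```python
def percentile_rank(score, scores):
    """Percentile of `score` within `scores` (assumed textbook formula)."""
    below = sum(1 for s in scores if s < score)  # values strictly below the score
    return 100 * (below + 0.5) / len(scores)

# Class scores from the example above.
scores = [23, 25, 19, 21, 28, 15, 20, 24, 22, 27]

print(percentile_rank(24, scores))
```

Six scores (15, 19, 20, 21, 22, 23) fall below 24, so the computation is 100 x 6.5 / 10.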
Quartiles are positional measures that divide the distribution into four parts, marked by the first quartile (Q1), second quartile (Q2), and third quartile (Q3).
Example: 2, 4, 4, 5, 6, 7, 8
Q1 = 4 (lower), Q2 = 5 (middle), Q3 = 7 (upper)
This means that 50% of the students in the class got a score of 5 or less, 25% of the students obtained a score of 4 or below, and 75% of the students got a score of 7 or below.
Interquartile range = Q3 - Q1
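Box and whisker Plot
• We can show all the important values (minimum, Q1, median, Q3, and maximum) in a box and whisker plot.
Example: Box and whisker plot and interquartile range for: 4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11
Sorted: 3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18
Quartile 1 = (4 + 4) / 2 = 4
Quartile 2 = (10 + 11) / 2 = 10.5
Quartile 3 = (14 + 16) / 2 = 15
Interquartile range = 15 - 4 = 11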
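The quartile values in both examples can be checked with a short sketch. It assumes the "median of each half" convention: Q2 is the median, Q1 is the median of the values below it, and Q3 is the median of the values above it, with the middle value left out of both halves when the count is odd. Other conventions exist and can give slightly different quartiles; the function name is hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                               # values below the median position
    upper = data[half + 1:] if n % 2 else data[half:]  # skip the middle value if n is odd
    return median(lower), median(data), median(upper)

# The seven class scores from the quartile example.
q1, q2, q3 = quartiles([2, 4, 4, 5, 6, 7, 8])
print(q1, q2, q3, "IQR =", q3 - q1)

# The twelve values from the box and whisker example.
b1, b2, b3 = quartiles([4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11])
print(b1, b2, b3, "IQR =", b3 - b1)
```

Together with the minimum and maximum, these three quartiles are the values a box and whisker plot displays.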
Sample z = (x - x̄) / s
where x = observed value
x̄ = sample mean
s = sample standard deviation
Example 1. Johnny scored 72 in a quiz in algebra for which the
average score of the class was 65 with a standard deviation of 8.
He also took a quiz in statistics and scored 60, for which the
average score of the class was 45 and the standard deviation
was 12. Relative to other students in the class, did Johnny do
better in Algebra or Statistics?
Solution. Compute the z-score of Johnny’s scores for each quiz.
For Algebra: z = (72 - 65) / 8 = 0.875
For Statistics: z = (60 - 45) / 12 = 1.25
• This indicates that relative to his classmates, Johnny scored
better in Statistics than in Algebra.
Example 2. A national achievement test is administered annually
to 3rd graders. The test has a mean score of 100 and a standard
deviation of 15. If Jane’s z-score is 1.20, what was her score on
the test?
Solution. Substituting into the z-score formula gives 1.20 = (x - 100) / 15.
Solving for Jane’s test score (x), we get x = 100 + 1.20(15) = 118.
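Both z-score examples can be checked with a short sketch. The function names are hypothetical; the second function simply rearranges z = (x - mean) / sd to solve for x.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def score_from_z(z, mean, sd):
    """Invert the z-score formula to recover the observed value x."""
    return mean + z * sd

# Example 1: Johnny's two quizzes.
algebra_z = z_score(72, 65, 8)
statistics_z = z_score(60, 45, 12)
print("Algebra z:", algebra_z)
print("Statistics z:", statistics_z)
print("Better relative result:", "Statistics" if statistics_z > algebra_z else "Algebra")

# Example 2: Jane's test score from her z-score.
jane_score = score_from_z(1.20, 100, 15)
print("Jane's score:", round(jane_score, 2))
```

Comparing z-scores rather than raw scores is what allows quizzes with different means and spreads to be placed on the same scale.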