Data Management


MATHEMATICS OF THE MODERN WORLD
Data Management

Mary Joy P. Dio, LPT


A Mathematical Tool: Data Management
What is Statistics?
• Statistics is a branch of applied mathematics concerned with collecting, organizing, and interpreting data. It attempts to infer the properties of a large collection of data from inspection of a sample of the collection, thereby allowing educated guesses to be made with a minimum of expense.
MAIN BRANCHES OF STATISTICS
Descriptive Statistics refers to the collection, presentation, and summary of data (either using charts and graphs or using a numerical summary).

Inferential Statistics refers to generalizing from a sample to a population, estimating unknown population parameters, drawing conclusions, and making decisions.
Population and Sample
Population refers to all the items (infinite or finite) that we are interested in. It consists of the totality of the observations, individuals, or objects in which the investigator/researcher is interested.
Sample is a subset or portion of the population. It involves looking only at some items selected from a population.
PARAMETER and STATISTIC
Parameter – a value calculated using all the data from a population.
Statistic – a value calculated using the data from a sample.
What is a variable?
• A variable is a characteristic of interest about an object under investigation that can take on different possible outcomes, such as age, hair color, height, weight, and religious preference.
• Two kinds of variables
Qualitative Variables – These are variables that can be placed into distinct categories according to some characteristic or attribute.
Quantitative Variables – These are numerical and can be ordered or ranked. They consist of two types: discrete and continuous.
• Discrete variables take countable values, obtained by means of counting.
• Continuous variables can take any value within an interval, obtained by means of measurement.
Data
• Data is a set of values of a variable collected from each of the subjects that belong to the sample. It refers to a collection of descriptors of natural phenomena, such as results from experiences, observations, or experiments, or a set of premises. It may consist of numbers, words, or images.
• Data can be classified according to the type of variable from which it was drawn. There are two general types of data according to how the data vary across cases: categorical and numerical.

Types of Data
• Categorical data have values that are described by words rather than numbers. They are of limited statistical use. On occasion, the values of these variables are represented using numbers. This is called coding. Example: 1 = cash; 2 = check; 3 = credit/debit; 4 = gift card.
Coding a category as a number does not make the data numerical, and the numbers do not typically imply a rank. Example: 1 = Bachelor’s; 2 = Master’s; 3 = Doctorate.
• Numerical data arise from counting, measuring something, or some kind of mathematical operation. Examples: number of insurance claims; sales of the last quarter; accounting data; economic indicators; financial ratios.
Two types of numerical data: discrete (distinct numbers or integers) and continuous (any value within an interval).
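As a rough illustration of coding, the sketch below maps the payment-type codes from the example to their labels and counts them. The observations in `coded_payments` are made up for illustration and do not come from the slides. The point is only that counting coded categories is meaningful, while the code numbers themselves are arbitrary labels.

```python
from collections import Counter

# Code book from the example: each number is only a label for a category.
PAYMENT_CODES = {1: "cash", 2: "check", 3: "credit/debit", 4: "gift card"}

# Hypothetical coded observations (illustrative only, not from the source).
coded_payments = [1, 3, 3, 2, 1, 4, 3, 1]

# Counting is a valid operation on categorical data.
counts = Counter(PAYMENT_CODES[code] for code in coded_payments)

for label in PAYMENT_CODES.values():
    print(f"{label}: {counts[label]}")
```

Averaging the codes (for example, taking the mean of `coded_payments`) would run without error but would not describe anything real, which is why the numbers are treated as names only.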
LEVELS OF MEASUREMENT
NOMINAL LEVEL
From the Latin nomen, meaning “name”; the weakest level of measurement.
• It merely identifies a category.
• Nominal data are the same as “qualitative”, “categorical”, or “classification” data.
• These data may be coded numerically. The codes are arbitrary placeholders with no numerical meaning.
• With these data, the only permissible mathematical operation is counting (e.g., frequencies).
ORDINAL LEVEL
Ordinal data codes connote a ranking of data values.
Ordinal data can be treated as nominal, but not vice versa.
There is no clear meaning to the distance between 1 and 2, between 2 and 3, or between 3 and 4 (for example, no clear meaning to the distance between “rarely” and “never”).

INTERVAL LEVEL
Interval data are ranked data with meaningful intervals between scale points.
Since intervals between numbers represent distances, mathematical operations such as taking the “average” can be done.
The absence of a true (meaningful) zero is a key characteristic of interval data.


RATIO LEVEL
• It has all the properties of the other three data types and is considered the strongest level of measurement.
• It possesses a meaningful zero that represents the absence of the quantity being measured.
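Data: (male, female, male, male, female)

Table 1: Respondents in terms of sex, n = 5

Sex       Frequency   %
male      3           60
female    2           40
TOTAL     5           100

Relative frequency: R = (f / N) × 100%
where f is the frequency of a category and N is the total number of observations.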
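The table above can be reproduced with a short Python sketch. The helper name `frequency_table` is hypothetical, and the percentage is computed as frequency divided by the total count, times 100, which is how the 60% and 40% in the table arise.

```python
from collections import Counter

def frequency_table(data):
    """Return (category, frequency, percent) rows, most frequent first."""
    n = len(data)
    counts = Counter(data)
    # percent = f / n * 100, written as f * 100 / n to keep the arithmetic exact here
    return [(category, f, f * 100 / n) for category, f in counts.most_common()]

sex = ["male", "female", "male", "male", "female"]
rows = frequency_table(sex)

print(f"{'Sex':<8}{'Frequency':>10}{'%':>6}")
for category, f, percent in rows:
    print(f"{category:<8}{f:>10}{percent:>6.0f}")
print(f"{'TOTAL':<8}{len(sex):>10}{100:>6}")
```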
MEASURES OF CENTRAL TENDENCY
Types of Measures of Center
• Once the data are collected, it is useful to summarize the data set by identifying a value around which the data are centered.

Mode – the most frequently occurring number in a data set.
Median – the middle number, or the mean of the two middle numbers, in an ordered set of data.
Mean – the numerical balancing point of the data set.
• The mean is easy to compute: add the values and divide by how many there are. The median first requires putting the data in order.
• The mean is affected by outliers while the median is resistant. In a sense, the median can resist the pull of a faraway value, but the mean is drawn to such values.
• A change in any of the numbers changes the mean, and the mean can be changed drastically by changing an extreme value.
• In contrast, the median and the mode of a set of data are usually not changed by changing an extreme value.
• The mean, the median, and the mode are all averages; however, they are generally not equal.
Example
• Which measure of center is most useful?
• A teacher wants to know about her students’ family situations. She asks for the number of children in their families:
6 3 2 3 4 1 2 2 4 3 1 2 2 4
• A shoe manufacturer wants to know the average shoe size of women.
• Another teacher wants to know how well her class performed in a long test.
Compare the mean, the median, and the mode for the salaries of
5 employees of a small company.

Salaries: P370,000 P60,000 P36,000 P20,000 P20,000

Mean = P101,200
Median = P36,000
Mode = P20,000
Most of the employees of this company would probably
agree that the median of P36,000 better represents the average
of the salaries than does either the mean or the mode.
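The three values above can be checked with Python's standard `statistics` module. The second half of the sketch is an added illustration, not part of the original example: it drops the P370,000 salary to show how strongly the mean reacts to one extreme value compared with the median.

```python
from statistics import mean, median, mode

salaries = [370_000, 60_000, 36_000, 20_000, 20_000]

salary_mean = mean(salaries)      # sum of the values divided by 5
salary_median = median(salaries)  # middle value of the ordered list
salary_mode = mode(salaries)      # most frequent value

print("Mean:  ", salary_mean)
print("Median:", salary_median)
print("Mode:  ", salary_mode)

# Illustration only: remove the extreme salary and recompute.
without_top = [s for s in salaries if s != 370_000]
print("Mean without P370,000:  ", mean(without_top))
print("Median without P370,000:", median(without_top))
```

In this data set the mean falls by about two thirds when the extreme salary is removed, while the median moves far less.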
MEASURES OF DISPERSION
Types of Measures of Dispersion or Variability
Another important feature that can help us understand more about a data set is the way the data are spread out.
• Range is the difference between the largest value (maximum) and the smallest value (minimum) in the data.
• Standard deviation is an extremely important measure of spread that is based on the mean. It measures the typical deviation of the data points from the mean.
• Variance is the square of the standard deviation of the data. It is expressed in squared units, not in the same unit of measure as the original data.
Illustration in computing
• EXAMPLE: A consumer group has tested a sample of 8 size AAA batteries from each of 3 companies. The results of the tests are shown in the following table. According to these tests, which company produces batteries for which the values representing hours of constant use have the smallest standard deviation?

Company         Hours of constant use per battery
EverSoBright    6.2, 6.4, 7.1, 5.9, 8.3, 5.3, 7.5, 9.3
Dependable      6.8, 6.2, 7.2, 5.9, 7.0, 7.4, 7.3, 8.2
Beacon          6.1, 6.6, 7.3, 5.7, 7.1, 7.6, 7.1, 8.5
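One way to answer the question is sketched below with Python's `statistics.stdev`. This assumes the sample standard deviation (dividing by n - 1) is intended, since each company's 8 batteries are described as a sample; the slides do not show the worked solution.

```python
from statistics import mean, stdev

hours = {
    "EverSoBright": [6.2, 6.4, 7.1, 5.9, 8.3, 5.3, 7.5, 9.3],
    "Dependable": [6.8, 6.2, 7.2, 5.9, 7.0, 7.4, 7.3, 8.2],
    "Beacon": [6.1, 6.6, 7.3, 5.7, 7.1, 7.6, 7.1, 8.5],
}

# Sample standard deviation (n - 1 in the denominator) for each company.
spread = {company: stdev(values) for company, values in hours.items()}

for company, values in hours.items():
    print(f"{company}: mean = {mean(values):.2f}, s = {spread[company]:.3f}")

most_consistent = min(spread, key=spread.get)
print("Smallest standard deviation:", most_consistent)
```

All three companies have the same mean battery life, so the standard deviation is what separates them: the company with the smallest value has the most consistent batteries.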
Measures of Relative Position
Measures of position are used to locate the relative position of a value in the data set. These measures are:

Percentiles – measures of relative position that divide the distribution into 100 equal parts.

Percentile of a value x = (number of values below x + 0.5) / (total number of values) × 100

• 25th percentile – also called the 1st quartile (Q1)
• 50th percentile – the median (Q2)
• 75th percentile – also called the 3rd quartile (Q3)
• Interquartile range – Q3 - Q1
Example. A 30-point quiz was given to 10 students and the scores are shown below. What is the percentile rank of 24?

23 25 19 21 28 15 20 24 22 27

Solution. Arrange the data in ascending order.

15 19 20 21 22 23 24 25 27 28

There are 6 values below 24 and 10 values in all.

Determine the percentile using the formula.

percentile = (6 + 0.5) / 10 × 100
percentile = 65

• This means that a student with a score of 24 did better than about 65% of the class.
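The computation above can be sketched in Python. The function name `percentile_rank` is hypothetical, and the formula it uses, (number of values below + 0.5) / (total number of values) × 100, is the one that yields 65 for this example.

```python
def percentile_rank(scores, value):
    """Percentile rank of `value`: (values below + 0.5) / total * 100."""
    below = sum(1 for s in scores if s < value)
    # Multiply before dividing so the arithmetic stays exact for this example.
    return (below + 0.5) * 100 / len(scores)

quiz_scores = [23, 25, 19, 21, 28, 15, 20, 24, 22, 27]
rank_of_24 = percentile_rank(quiz_scores, 24)
print(f"Percentile rank of 24: {rank_of_24:.0f}")
```

Sorting is not needed in code, because counting the values below 24 gives the same result in any order.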
Quartiles – positional measures that divide the distribution into four equal parts. The cut points are the first quartile (Q1), second quartile (Q2), and third quartile (Q3).

• Put the list of numbers in order.
• Then cut the list into four equal parts.
• The quartiles are at the “cuts”.
Example. Find the value of Q1, Q2, and Q3 of the following scores of students in a class.

Put them in order: 2, 4, 4, 5, 6, 7, 8

Cut the list into quarters:

Q1 = 4 (lower quartile)
Q2 = 5 (middle quartile)
Q3 = 7 (upper quartile)

This means that 50% of the students in the class got a score of 5 or less, 25% of the students obtained a score of 4 or below, and 75% of the students got a score of 7 or below. Equivalently, about 25% of the class got a score higher than 7.


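Interquartile Range

The interquartile range is the distance from Q1 to Q3. Each of the four parts cut off by Q1, Q2, and Q3 holds 25% of the data, so the interquartile range spans the middle 50%.

interquartile range = Q3 - Q1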
Box and Whisker Plot

• We can show all the important values (the lowest value, Q1, Q2, Q3, and the highest value) in a box and whisker plot.
Example: Box and whisker plot and interquartile range for 4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11

Step 1. Put them in order:

3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18

Step 2. Cut into quarters:

3, 4, 4 | 4, 7, 10 | 11, 12, 14 | 16, 17, 18

In this case all the quartiles are between numbers:

Quartile 1 (Q1) = (4 + 4) / 2 = 4
Quartile 2 (Q2) = (10 + 11) / 2 = 10.5
Quartile 3 (Q3) = (14 + 16) / 2 = 15

Also, the lowest value is 3 and the highest value is 18.

So now we have enough data for the box and whisker plot: lowest value = 3, Q1 = 4, Q2 = 10.5, Q3 = 15, highest value = 18.

And the interquartile range is:

Q3 - Q1 = 15 - 4 = 11
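The "cut the list" procedure can be sketched in Python as a median-of-halves rule: Q2 is the median of the whole list, and Q1 and Q3 are the medians of the lower and upper halves (leaving out the middle value when the count is odd). The function name `quartiles` is hypothetical, and this rule is an assumption chosen to match the worked examples; library routines such as `statistics.quantiles` use interpolation rules that can give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, the middle value belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

values = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]
q1, q2, q3 = quartiles(values)
iqr = q3 - q1

# Five values needed for a box and whisker plot.
five_number_summary = (min(values), q1, q2, q3, max(values))
print("Five-number summary:", five_number_summary)
print("Interquartile range:", iqr)
```

The same function gives Q1 = 4, Q2 = 5, and Q3 = 7 for the seven ordered scores 2, 4, 4, 5, 6, 7, 8.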
Standard score or z-score – the number of standard deviations that a value is above or below the mean of the data set. Observed values above the mean have positive z-scores, while values below the mean have negative z-scores.

The standard score or z-score for a sample can be computed using the following formula:

z = (x - x̄) / s
where x = observed value
x̄ = sample mean
s = sample standard deviation
Example 1. Johnny scored 72 in a quiz in algebra for which the average score of the class was 65 with a standard deviation of 8. He also took a quiz in statistics and scored 60, for which the average score of the class was 45 and the standard deviation was 12. Relative to other students in the class, did Johnny do better in Algebra or Statistics?
Solution. Compute the z-score of Johnny’s score for each quiz.
For Algebra: z = (72 - 65) / 8 = 0.875
For Statistics: z = (60 - 45) / 12 = 1.25
• This indicates that, relative to his classmates, Johnny scored better in Statistics than in Algebra.
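The comparison can be checked with a few lines of Python. The function name `z_score` and its parameter names are illustrative choices; the formula is the difference between the observed value and the mean, divided by the standard deviation.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_algebra = z_score(72, mean=65, sd=8)
z_statistics = z_score(60, mean=45, sd=12)

print(f"Algebra z-score:    {z_algebra}")
print(f"Statistics z-score: {z_statistics}")

better = "Statistics" if z_statistics > z_algebra else "Algebra"
print("Relatively better quiz:", better)
```

Although the raw algebra score (72) is higher than the raw statistics score (60), the z-scores put each score on a common scale relative to its own class.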
Example 2. A national achievement test is administered annually to 3rd graders. The test has a mean score of 100 and a standard deviation of 15. If Jane’s z-score is 1.20, what was her score on the test?

Solution.
Solving for Jane’s test score (x), we get

x = (z × s) + x̄
x = (1.20 × 15) + 100
x = 118

This indicates that Jane’s test score is 118.
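Going from a z-score back to a raw score is the same relationship rearranged. A minimal sketch, with the hypothetical helper name `score_from_z`:

```python
def score_from_z(z, mean, sd):
    """Raw score that lies z standard deviations from the mean."""
    return z * sd + mean

jane_score = score_from_z(1.20, mean=100, sd=15)
print(f"Jane's test score: {jane_score:.0f}")
```

The result is formatted to a whole number because floating-point multiplication may carry a tiny rounding error.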
THANK YOU!
