Descriptive Statistics


MODULE 3: DESCRIPTIVE STATISTICS

OBJECTIVES:
After successful completion of this module, you should be able to:
✦ Distinguish the three main forms of data presentation.
✦ Know the different parts of the table.
✦ Choose appropriate diagrams/graphs to present a given set of data.
✦ Organize qualitative and quantitative data in tables.
✦ Compute measures of central tendency, measures of variation and measures of relative position of grouped and ungrouped data.
✦ Describe the shape of a distribution.
✦ Identify regions under the normal curve corresponding to different standard normal values.
✦ Compute probabilities using the standard normal table and Excel.

Data Presentation
Data are usually collected in a raw format, and thus the inherent information is difficult to understand. Therefore, raw data need to be summarized, processed, and analyzed to usefully derive information from them. However, no matter how well manipulated, the information derived from the raw data should be presented in an effective format; otherwise, it would be a great loss for both authors and readers. Planning how the data will be presented is essential before appropriately processing raw data.

Polytechnic University of the Philippines
College of Science
Department of Mathematics and Statistics

Presentation of Data
Presentation of data refers to an exhibition or putting up data in an attractive and useful manner such that it can be easily interpreted.
The three main forms of presentation of data are:
Textual Presentation
Tabular Presentation
Graphical Presentation

Textual Presentation
• All the data is presented in the form of text, phrases, or paragraphs.
• It involves enumerating important characteristics, emphasizing significant figures and identifying important features of data.
• Text is the principal method for explaining findings, outlining trends, and providing contextual information.

Example:
A researcher is asked to present the performance of a section in the statistics test. The following are the test scores:
34 42 20 50 17 9 34 43
50 18 35 43 50 23 23 35
37 38 38 39 39 38 38 39
24 29 25 26 28 27 44 44
49 48 46 45 45 46 45 46
The data presented in textual form would be like this:
In the statistics class of 40 students, 3 obtained the perfect score of 50. Sixteen students got a score of 40 and above, while only 3 got 19 and below. Generally, the students performed well in the test, with 23 or 57.5% getting a passing score of 38 and above.

Advantage of Textual Presentation
✦ The data can be interpreted in more detail.
✦ Can help in emphasizing some important points in data.
✦ Small sets of data can be easily presented.

Remember!
✦ Keep your paragraphs simple and short.
✦ Always make sure that the readers are provided with additional explanations about the relevance of the figures and their implications.
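
The counts quoted in the textual presentation of the test scores can be re-derived from the raw data. The following is a minimal Python sketch, not part of the original module; the variable names are chosen here for illustration only.

```python
# Test scores of the 40 students, copied from the example.
scores = [
    34, 42, 20, 50, 17, 9, 34, 43,
    50, 18, 35, 43, 50, 23, 23, 35,
    37, 38, 38, 39, 39, 38, 38, 39,
    24, 29, 25, 26, 28, 27, 44, 44,
    49, 48, 46, 45, 45, 46, 45, 46,
]

n = len(scores)                              # number of students
perfect = sum(s == 50 for s in scores)       # perfect score of 50
at_least_40 = sum(s >= 40 for s in scores)   # 40 and above
at_most_19 = sum(s <= 19 for s in scores)    # 19 and below
passed = sum(s >= 38 for s in scores)        # passing score of 38 and above
pass_rate = 100 * passed / n                 # percentage of passers

print(f"n = {n}, perfect = {perfect}, 40 and above = {at_least_40}")
print(f"19 and below = {at_most_19}, passed = {passed} ({pass_rate:.1f}%)")
```

Recomputing the figures this way is a quick check before they are written into a paragraph.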

Tabular Presentation:
• It is a systematic and logical arrangement of data in the form of rows and columns with respect to the characteristics of data.
• A table is best suited for representing individual information and represents both quantitative and qualitative information.

Advantage of Tabular Presentation
✦ More information may be presented.
✦ Exact values can be read from a table to retain precision.
✦ Flexibility is maintained without distortion of data.
✦ Less work and less cost are required in the preparation.

Preparing Tables
The making of a compact table itself is an art. A table should contain all the information needed within the smallest possible space. What the purpose of tabulation is and how the tabulated information is to be used are the main points to be kept in mind while preparing a statistical table. An ideal table should consist of the following main parts:
A. Title: The title must tell as simply as possible what is in the table. It should answer the questions:
✦ Who? White females with breast cancer, black males with lung cancer.
✦ What are the data? Counts, percentage distributions, rates.
✦ Where are the data from? Example: One hospital, or the entire population covered by your registry.
✦ When? A particular year, time period.
B. Boxhead: The boxhead contains the captions or column headings. The heading of each column should contain as few words as possible, yet explain exactly what the data in the columns represent.
C. Stubs: The row captions are known as the stub. Items in the stub should be grouped to facilitate interpretation of the data. For example, rows may stand for score classes and columns for data related to sex of students. In the process, there will be many rows for score classes but only two columns for male and female students.

D. Footnotes: Footnotes are given at the foot of the table for explanation of any fact or information included in the table which needs some explanation. Thus, they are meant for explaining or providing further details about the data that have not been covered in the title, captions and stubs.
E. Sources of Data: We should also mention the source of information from which data are taken. This should preferably include the name of the author, volume, page and the year of publication. It should also state whether the data contained in the table are of a ‘primary or secondary’ nature.

Parts of the Table
https://byjus.com/commerce/tabular-presentation-of-data/

Construction of Data Tables
✦ The title should be in accordance with the objective of study
✦ Comparison
✦ Alternative location of stubs
✦ Headings
✦ Footnote
✦ Size of columns
✦ Use of abbreviations
✦ Units

Example: Simple or One-Way Table
Optionally, the table may also include totals or percentages.

Example: Compound Table
A compound table is just an extension of a simple table in which there is more than one variable distributed among its attributes (subvariables). An attribute is just a quality, property or component of a variable according to which it can be differentiated with respect to other variables.
We may refer to a compound table as a cross tabulation or even as a contingency table, depending on the context in which it is used.

Organize Quantitative Variable in Table
Classes are categories into which data are grouped. When a data set consists of a large number of different discrete data values, or when a data set consists of continuous data, we create classes by using intervals of numbers.
Make sure that the classes do not overlap. This is necessary to avoid confusion as to which class a data value belongs. Also, make sure that the class widths are equal for all classes.

Age        Number (in thousands)
25 - 34    14,482
35 - 44    14,156
45 - 54    13,801
55 - 64    12,123
65 - 74     7,010

In the class 25 - 34, 25 is the Lower Class Limit (LC) and 34 is the Upper Class Limit (UC).
The class width is the difference between consecutive lower class limits. Here the class width is 35 − 25 = 10.

One exception to the requirement of equal class widths occurs in open-ended tables. A table is open ended if the first class has no lower class limit or the last class has no upper class limit.

Scores         Frequency
10 - 19        25
20 - 29        36
30 - 39        40
40 and over    12

Guidelines for Determining the Lower Class Limit of the First Class and Class Width
Choosing the Lower Class Limit of the First Class:
Choose the smallest observation in the data set or a convenient number slightly lower than the smallest observation in the data set.
For example, the smallest observation is 10.2. A convenient lower class limit of the first class is 10.

Determining the Class Width:
• Decide on the number of classes. Generally, there should be between 5 and 20 classes. The smaller the data set, the fewer classes you should have.
• Determine the class width by computing:
$$cw = \frac{x_{\max} - x_{\min}}{nc}$$
where cw is the class width, nc is the number of classes, and x_max and x_min are the largest and smallest observations.
Round this value up to a convenient number.

Remember!
Creating the classes for summarizing continuous data is an art form. There is no such thing as the correct frequency distribution. However, there can be less desirable frequency distributions. The larger the class width, the fewer classes a frequency distribution will have.
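
The class-width guideline can be tried on the 40 statistics test scores from the textual presentation example. This is only a sketch: the choice of 6 classes and of the smallest score as the first lower class limit are assumptions made here for illustration, and the upper limits assume whole-number scores.

```python
import math

# Test scores from the textual presentation example.
scores = [
    34, 42, 20, 50, 17, 9, 34, 43,
    50, 18, 35, 43, 50, 23, 23, 35,
    37, 38, 38, 39, 39, 38, 38, 39,
    24, 29, 25, 26, 28, 27, 44, 44,
    49, 48, 46, 45, 45, 46, 45, 46,
]

num_classes = 6  # assumption: any value between 5 and 20 is allowed by the guideline

# cw = (x_max - x_min) / nc, rounded up to a convenient (whole) number.
raw_width = (max(scores) - min(scores)) / num_classes
class_width = math.ceil(raw_width)

# Assumption: start the first class at the smallest observation.
first_lower_limit = min(scores)

classes = []
for k in range(num_classes):
    lc = first_lower_limit + k * class_width   # lower class limit
    uc = lc + class_width - 1                  # upper class limit (integer data)
    freq = sum(lc <= s <= uc for s in scores)  # class frequency
    classes.append((lc, uc, freq))

for lc, uc, freq in classes:
    print(f"{lc:>2} - {uc:<2}  {freq}")
```

Because the classes are built from consecutive lower limits, they do not overlap and all have the same width.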

How to Construct Frequency Distribution Table?
A frequency distribution lists each category of data and the number of occurrences for each category of data.

Example: Use the “Sample Data file”.
Solution:
To answer this question we need to construct a frequency distribution to determine how many female and male respondents participated in the study.

Procedure in Constructing Frequency Table
✦ If the data is in the form of qualitative data
To construct the frequency distribution using Excel, use the command:
=FREQUENCY(data_array,bins_array)
Then press Ctrl + Shift + Enter so that it is entered as an array formula:
{=FREQUENCY(data_array,bins_array)}
Note that FREQUENCY counts numeric values only, so the categories are assumed to be coded as numbers in the data file.

Final Output
Table 1 shows the frequency and percentage distribution of the respondents in terms of sex. It can be gleaned from the table that, out of 128 respondents considered in the study, 65 or 50.8% are male and 63 or 49.2% are female.

Example: Use the “Sample Data file”.
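
The same frequency and percentage distribution can be sketched outside Excel. The Python snippet below is an illustration only: the actual “Sample Data file” is not reproduced here, so the list of responses is rebuilt from the counts reported in the interpretation (65 male, 63 female).

```python
from collections import Counter

# Assumption: stand-in for the sex column of the sample data file,
# rebuilt from the reported counts rather than read from the file itself.
responses = ["Male"] * 65 + ["Female"] * 63

counts = Counter(responses)      # frequency of each category
total = sum(counts.values())     # total number of respondents

# Frequency and percentage (one decimal place) for each category.
table = {sex: (f, round(100 * f / total, 1)) for sex, f in counts.items()}

for sex, (f, pct) in table.items():
    print(f"{sex:<8}{f:>5}{pct:>8}%")
print(f"{'Total':<8}{total:>5}")
```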

Procedure in Constructing Frequency Table
✦ If the data is in the form of quantitative data
Steps
1. Set an interval or range for your data. It is needed for the “BIN RANGE”.
2. Click “DATA” on the menu bar and click “DATA ANALYSIS” on the toolbar.
3. The dialog box “DATA ANALYSIS” will appear. Choose “HISTOGRAM” in the dialog box, then click “OK”.
4. Highlight your data for the “INPUT RANGE”.
5. Highlight the intervals set in Step 1 for the “BIN RANGE”.
6. Click the box of “LABELS IN FIRST ROW”, then click “OK”.
7. The result will appear on a new worksheet of the Excel file. Get the percentage and total.

Final Output

Example: Identify problems with the following table.
Answer:
✦ Useless Information – Don’t show decimals if they are not needed.
✦ Poor Alignment – Make sure alignment makes sense.
• Don’t center numbers; always right justify, and try to align decimal points.
• Consider the appropriate placement of row titles.
✦ Difficult to Read – Use commas when the number exceeds a thousand.

Graphical Presentation
✦ A graph is a very effective visual tool as it displays data at a glance, facilitates comparison, and can reveal trends and relationships within the data such as changes over time, and correlation or relative share of a whole.
✦ It is considered an important medium of communication because we are able to create a pictorial representation of the numerical figures.
✦ Suited when we need to show the results of the study to nonprofessionals and/or people who dislike numbers and overly lengthy texts.

Bar Graph
✦ It is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency of the category on the other axis. Rectangles of equal width are drawn for each category. The height of each rectangle represents the category’s frequency or relative frequency.
✦ It is used to organize discrete data.

Example: Simple Bar Graph
The simple bar chart is used for the case of one variable only.

Example: Multiple Bar Graph / Grouped Column Chart
The multiple bar chart is an extension of a simple bar chart when there are quantities of several variables to be displayed. The bars representing the quantities for the different variables are placed next to one another for each attribute. The figure becomes very cumbersome when there are too many variables and components.

Example: Component Bar Graph / Subdivided Column Chart
In this type of bar chart, the components (quantities) of each variable are piled on top of one another. It saves space as compared to a multiple bar chart. One disadvantage of this graph is that it is not always easy to compare the sizes of the components, or parts. It is used to represent data in which the total magnitude is divided into different parts or components.

Remember!
• Bar graphs may also be drawn with horizontal bars. Horizontal bars are preferable when category names are lengthy.
• In bar graphs, the order of the categories does not usually matter. However, bar graphs that have categories arranged in decreasing order of frequency help prioritize categories for decision-making purposes in areas such as quality control, human resources, and marketing.

Histogram
✦ It is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same and the rectangles touch each other.
✦ It is a graph used to present quantitative data and is similar to the bar graph.
✦ It is used to organize continuous data.
https://newonlinecourses.science.psu.edu/stat500/lesson/1/1.6/1.6.2

Pie Chart
It is a circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category.
✦ Pie charts are typically used to present the relative frequency of qualitative data. In most cases the data are nominal, but ordinal data can also be displayed in a pie chart.

When should a bar graph or a pie chart be used?
✦ Pie charts are useful for showing the division of all possible values of a qualitative variable into its parts.
✦ Bar graphs are useful when we want to compare the different parts, not necessarily the parts to the whole.

Line Graph
✦ A graph that shows information that is connected in some way (such as change over time).
✦ Points are plotted, and line segments are then drawn connecting the points. It is used to organize continuous data.
✦ Very useful in identifying trends in the data over time.

Example: Simple Line Graph
The simplest of line graphs is the single line graph, so called because it displays information concerning one variable only, in terms of its frequencies.

Example: Multiple Line Graph
Multiple line graphs illustrate information on several variables so that comparison is possible between them.

Guidelines for Constructing Good Graphics
✦ Title and label the graphic axes clearly, providing explanations if needed. Include units of measurement and a data source when appropriate.
✦ Avoid distortion.
✦ Minimize the amount of white space in the graph. Use the available space to let the data stand out. If you truncate the scales, clearly indicate this to the reader.

Guidelines for Constructing Good Graphics (continued)
✦ Avoid clutter, such as excessive gridlines and unnecessary backgrounds or pictures.
✦ Don’t distract the reader.
✦ Avoid three dimensions.
✦ Do not use more than one design in the same graphic. Let the data speak for themselves.

Grouped and Ungrouped Data
Data is often described as ungrouped or grouped.
Grouped data is the type of data which is classified into groups after collection.

Scores     Frequency
1 - 10     5
11 - 20    9
21 - 30    10
31 - 40    12
41 - 50    24
Total      60

Ungrouped data, which is also known as raw data, is data that has not been placed in any group or category after collection.

Ungrouped data without a frequency distribution
1, 5, 4, 7, 2, 4, 1, 3, 8, 2, 2, 9

Ungrouped data with a frequency distribution
No. of Television Sets    Frequency
0                         7
1                         15
2                         12
3                         4
4                         5
5                         2
Total                     45

Measures of Central Tendency: MEAN
• It is the sum of the data values divided by the number of data values.
• It is also called the average.
• It is appropriate only for data under interval and ratio scale measurement.
Advantage of Mean
✦ Simple to understand and easy to calculate.
✦ It is rigidly defined.
✦ It is least affected by fluctuations of sampling.
✦ It takes into account all the values in the series.

Formula for Mean:
Sample Mean
✦ For Ungrouped Data
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where:
x_i = data values
n = no. of sample observations
✦ For Grouped Data
$$\bar{x} = \frac{\sum_{i=1}^{r} f x_i}{n}$$
where:
x_i = data values
f = frequency
r = no. of classes
n = no. of sample observations
Population Mean
✦ For Ungrouped Data
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
where:
x_i = data values
N = no. of observations
✦ For Grouped Data
$$\mu = \frac{\sum_{i=1}^{r} f x_i}{N}$$
where:
x_i = data values
f = frequency
r = no. of classes
N = no. of observations

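
As a small illustration of the two mean formulas, the sketch below applies them to the two ungrouped data sets shown under Grouped and Ungrouped Data: the raw list of 12 values, and the television-set counts given with their frequencies. The variable names are chosen here for illustration and the snippet is not part of the original module.

```python
# Ungrouped data without a frequency distribution.
raw = [1, 5, 4, 7, 2, 4, 1, 3, 8, 2, 2, 9]

# Mean = (sum of x_i) / n  ->  48 / 12
mean_raw = sum(raw) / len(raw)

# Ungrouped data with a frequency distribution:
# number of television sets -> frequency.
tv_sets = {0: 7, 1: 15, 2: 12, 3: 4, 4: 5, 5: 2}

# n is the total frequency; mean = (sum of f * x_i) / n  ->  81 / 45
n = sum(tv_sets.values())
mean_tv = sum(f * x for x, f in tv_sets.items()) / n

print(f"mean of raw data        = {mean_raw}")
print(f"mean no. of TV sets     = {mean_tv}")
```

Weighting each value by its frequency gives the same result as listing every observation individually and averaging.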
Measures of Central Tendency: MEDIAN
It is the “middle observation” when the data set is sorted (in either increasing or decreasing order).
• The median divides the distribution into two equal parts.
Advantage of Median
✦ The median is not affected by the size of extreme values but by the number of observations.
✦ The median can be calculated even when the frequency distribution contains “open-ended” intervals.
✦ It can also be used to define the middle of a number of objects, properties, or quantities which are not really quantitative in nature.
✦ It can be easily interpreted.

Formula for Median:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest (or highest to lowest).
2. For an odd number of data, the median of a data set is the “middle observation”. When the number of data is even, the median is the “average of the two middle scores”.
✦ For Grouped Data
$$\tilde{x} = LB + \frac{\left(\frac{n}{2} - {<}cf\right) i}{f}$$
where:
LB = lower boundary of the median class
i = class width
n = no. of observations
<cf = less-than cumulative frequency of the class preceding the median class
f = frequency of the median class

Measures of Central Tendency: MODE
It is the most frequently occurring value in a list of data.
• It is sometimes called nominal average.
• It is an appropriate measure of average for data using the nominal scale of measurement.
• It is the only measure of central tendency used in both quantitative and qualitative data.
Advantage of Mode
✦ The mode is easy to understand.
✦ Like the median, it is not greatly affected by extreme values.
✦ Like the median, it can be computed even when the frequency distribution contains “open-ended” intervals.

Formula for Mode:
✦ For Ungrouped Data
1. Obtain a frequency distribution of the distinct values of the data.
2. The mode is the most frequently occurring data (if there is one).
✦ For Grouped Data
$$\hat{x} = LB + \left(\frac{d_1}{d_1 + d_2}\right) i$$
where:
LB = lower boundary of the modal class
i = class width
d1 = difference between the frequency of the modal class and the class preceding it
d2 = difference between the frequency of the modal class and the class following it

Remember!
• Whenever you hear the word average, be aware that the word may not always be referring to the mean. One average could be used to support one position, while another average could be used to support a different position.
• Mode is not always present in the data sets unlike mean and median.
• If you are interested in the “center of gravity” of your data, then use the mean; if you are interested in the “middle value” within your data, then use the median.

Choosing a Measure of Central Tendency:
We have discussed three measures of central tendency (the mode, the mean, and the median) and examined how they differ in terms of finding the center of a data distribution. The next legitimate question to ask may be “When do we use which measure?”
Consider the following data sets:
Data Set I     108 112 116 120 124
Data Set II    108 112 116 120 205
Determine the mean, median and mode.

In both data sets, the median is 116, as it is the number that divides the data set into two exact halves. However, you will notice that the mean is not identical in both data sets. For the first data set, the mean is equal to 116, whereas the mean of the second data set is equal to 132.2.
Notice how the mean of the second data set has been influenced by the presence of an unusual case/outlier in the data set. If we were to say the mean is equal to 132.2 for the second data set and it represents a typical case, this will not make much sense because the majority of data values are less than 120. Therefore, the mean should not be used when unusual, or outlying, data values are present in the data set, as the mean tends to be extremely sensitive to the unusual values. Rather, the median should be reported in this case.
• The mode is simply the most frequently occurring data value in the data set. Therefore, it is mainly useful for the nominal level of measurement. Both median and mean are useful when the variable being measured can be quantified. Also, both data sets have no mode, which is why the mode is not an appropriate measure to use in these data sets.
• It is better to use the median than the mean when the sample is small or asymmetrical (i.e., skewed) and unusual cases/outliers are present in the data set. This is why housing prices are typically summarized with the median, since even one million-dollar house can distort the average housing price when most of the houses are in the Php500,000–Php650,000 range.

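
The comparison of Data Set I and Data Set II can be checked with a few lines of Python. This is a sketch using the standard `statistics` module; the helper `modes` is a hypothetical function written here that treats a data set in which every value occurs once as having no mode, as the discussion does.

```python
from collections import Counter
from statistics import mean, median

data_sets = {
    "I": [108, 112, 116, 120, 124],
    "II": [108, 112, 116, 120, 205],
}


def modes(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:  # every value occurs once: treated here as "no mode"
        return []
    return [v for v, c in counts.items() if c == top]


summary = {
    name: {"mean": mean(v), "median": median(v), "mode": modes(v)}
    for name, v in data_sets.items()
}

for name, stats in summary.items():
    print(f"Data Set {name}: {stats}")
```

Only the mean changes between the two data sets; the single outlying value 205 leaves the median where it was.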
Example:
The data given below is the age of the residents in Barangay 634, Sta. Mesa, Manila. Compute the mean, median and mode.

Class Interval    Frequency
55 - 59           3
50 - 54           6
45 - 49           7
40 - 44           9
35 - 39           6
30 - 34           4
25 - 29           5

Solution:
To compute the mean of grouped data, first you need to fill out this table.

Class Interval    Frequency (f)    x    fx
55 - 59           3
50 - 54           6
45 - 49           7
40 - 44           9
35 - 39           6
30 - 34           4
25 - 29           5
Total             n =                   $\sum_{i=1}^{7} f x_i =$

x is the midpoint of every class interval. To compute this:
$$x = \frac{LC + UC}{2}$$
Ex:
$$x = \frac{55 + 59}{2} = 57$$
$$x = \frac{50 + 54}{2} = 52$$

Solution:
Class Interval    Frequency (f)    x     fx
55 - 59           3                57    171
50 - 54           6                52    312
45 - 49           7                47    329
40 - 44           9                42    378
35 - 39           6                37    222
30 - 34           4                32    128
25 - 29           5                27    135
Total             n = 40                 $\sum_{i=1}^{7} f x_i = 1{,}675$

$$\bar{x} = \frac{\sum_{i=1}^{7} f x_i}{n} = \frac{1{,}675}{40} = 41.88$$
The average age is 41.88.

Solution:
To compute the median and mode of grouped data, first you need to fill out this table.

Class Interval    f    LB    <cf
55 - 59           3
50 - 54           6
45 - 49           7
40 - 44           9
35 - 39           6
30 - 34           4
25 - 29           5
Total             n =

To compute the lower boundary, always subtract 0.5 from the lower class limit (LC).
Ex:
55 − 0.5 = 54.5
50 − 0.5 = 49.5
45 − 0.5 = 44.5

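
The grouped-mean table for the age data can be reproduced with a short Python sketch. It follows the same steps (midpoint, fx, total, divide by n); the tuple layout `(LC, UC, f)` is a convention assumed here for illustration.

```python
# Age classes as (lower class limit, upper class limit, frequency).
classes = [
    (55, 59, 3),
    (50, 54, 6),
    (45, 49, 7),
    (40, 44, 9),
    (35, 39, 6),
    (30, 34, 4),
    (25, 29, 5),
]

n = sum(f for _, _, f in classes)  # total frequency

# Build the x and fx columns: x is the class midpoint (LC + UC) / 2.
rows = []
for lc, uc, f in classes:
    x = (lc + uc) / 2
    rows.append((lc, uc, f, x, f * x))

total_fx = sum(fx for *_, fx in rows)  # sum of the fx column
grouped_mean = total_fx / n            # 1675 / 40 = 41.875, reported as 41.88

for lc, uc, f, x, fx in rows:
    print(f"{lc} - {uc}  f={f}  x={x:g}  fx={fx:g}")
print(f"n = {n}, sum of fx = {total_fx:g}, mean = {grouped_mean}")
```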
Solution:
If the arrangement of the class intervals is in descending order, always start at the bottom part. Copy the frequency of the lowest class interval, then add the frequency of each succeeding class going up:
5 + 4 = 9, 9 + 6 = 15, 15 + 9 = 24, 24 + 7 = 31, 31 + 6 = 37, 37 + 3 = 40

Class Interval    f    LB      <cf
55 - 59           3    54.5    40
50 - 54           6    49.5    37
45 - 49           7    44.5    31
40 - 44           9    39.5    24
35 - 39           6    34.5    15
30 - 34           4    29.5    9
25 - 29           5    24.5    5
Total             n = 40

Solution:
First, compute n/2; it will help us to determine the median class and the <cf.
$$\frac{n}{2} = \frac{40}{2} = 20$$
The median class is the class containing the 20th item. Hence, the median class is 40 - 44.
$$\tilde{x} = LB + \frac{\left(\frac{n}{2} - {<}cf\right) i}{f} = 39.5 + \frac{(20 - 15)5}{9} = 42.28$$
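
The grouped-median computation can be sketched in Python as follows. The function name `grouped_median` and the `(LC, UC, f)` tuple layout are hypothetical choices made here; the classes are listed from lowest to highest so that the running total is the less-than cumulative frequency, and the class width assumes whole-number class limits.

```python
def grouped_median(classes):
    """Median of grouped data.

    classes: list of (lower limit, upper limit, frequency), lowest class first.
    Assumes integer class limits, so LB = LC - 0.5 and width = UC - LC + 1.
    """
    n = sum(f for _, _, f in classes)
    half = n / 2
    cf_before = 0  # <cf of the class preceding the current one
    for lc, uc, f in classes:
        if cf_before + f >= half:  # first class whose <cf reaches n/2
            lb = lc - 0.5
            width = uc - lc + 1
            return lb + (half - cf_before) * width / f
        cf_before += f
    raise ValueError("no classes with positive frequency")


# Age data from the example, lowest class first.
ages = [
    (25, 29, 5),
    (30, 34, 4),
    (35, 39, 6),
    (40, 44, 9),
    (45, 49, 7),
    (50, 54, 6),
    (55, 59, 3),
]

median_age = grouped_median(ages)
print(f"median = {median_age:.2f}")
```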

Solution:
Class Interval    f    LB      <cf
55 - 59           3    54.5    40
50 - 54           6    49.5    37
45 - 49           7    44.5    31
40 - 44           9    39.5    24
35 - 39           6    34.5    15
30 - 34           4    29.5    9
25 - 29           5    24.5    5

The modal class is the class interval with the highest frequency. The modal class is 40 - 44.
If there are two class intervals that contain the highest frequency, always choose the highest class interval.
d1 = 9 − 6 = 3
d2 = 9 − 7 = 2
$$\hat{x} = LB + \left(\frac{d_1}{d_1 + d_2}\right) i = 39.5 + \left(\frac{3}{3 + 2}\right) 5 = 42.5$$

Measures of Relative Position
Quantiles are statistics that describe various subdivisions of a frequency distribution into equal proportions.
Three special Quantiles:
1. Quartiles
2. Deciles
3. Percentiles

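
The grouped-mode solution for the age data can likewise be sketched in Python. The function name `grouped_mode` is hypothetical, and two choices are assumptions made here rather than rules stated in the module: a missing neighbouring class is given a frequency of 0, and the case d1 + d2 = 0 is rejected with an error.

```python
def grouped_mode(classes):
    """Mode of grouped data.

    classes: list of (lower limit, upper limit, frequency), lowest class first.
    Assumes integer class limits, so LB = LC - 0.5 and width = UC - LC + 1.
    """
    freqs = [f for _, _, f in classes]
    top = max(freqs)
    # On ties, choose the highest class interval, as in the solution above.
    m = max(i for i, f in enumerate(freqs) if f == top)
    lc, uc, f = classes[m]
    before = freqs[m - 1] if m > 0 else 0              # class preceding (lower)
    after = freqs[m + 1] if m < len(freqs) - 1 else 0  # class following (higher)
    d1 = f - before
    d2 = f - after
    if d1 + d2 == 0:
        raise ValueError("d1 + d2 is zero; the formula cannot be applied")
    return (lc - 0.5) + d1 * (uc - lc + 1) / (d1 + d2)


# Age data from the example, lowest class first.
ages = [
    (25, 29, 5),
    (30, 34, 4),
    (35, 39, 6),
    (40, 44, 9),
    (45, 49, 7),
    (50, 54, 6),
    (55, 59, 3),
]

mode_age = grouped_mode(ages)
print(f"mode = {mode_age}")
```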
Quartiles - split the ordered data into four quarters.
Deciles - split the ordered data into ten equal parts.
Percentiles - split the ordered data into 100 equal parts.

Formula for Quartile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$$Q_{class} = \frac{nk}{4} + 0.5$$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the quartile. If not, use interpolation.
✦ For Grouped Data
$$Q_k = LB + \frac{\left(\frac{nk}{4} - {<}cf\right) i}{f}$$
where:
LB = lower boundary of the quartile class
i = class width
n = no. of observations
k = quartile position
<cf = less-than cumulative frequency of the class preceding the quartile class
f = frequency of the quartile class

Formula for Decile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$$D_{class} = \frac{nk}{10} + 0.5$$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the decile. If not, use interpolation.
✦ For Grouped Data
$$D_k = LB + \frac{\left(\frac{nk}{10} - {<}cf\right) i}{f}$$
where:
LB = lower boundary of the decile class
i = class width
n = no. of observations
k = decile position
<cf = less-than cumulative frequency of the class preceding the decile class
f = frequency of the decile class

Formula for Percentile:
✦ For Ungrouped Data
1. Arrange the data from lowest to highest. Then use this formula.
$$P_{class} = \frac{nk}{100} + 0.5$$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the percentile. If not, use interpolation.
✦ For Grouped Data
$$P_k = LB + \frac{\left(\frac{nk}{100} - {<}cf\right) i}{f}$$
where:
LB = lower boundary of the percentile class
i = class width
n = no. of observations
k = percentile position
<cf = less-than cumulative frequency of the class preceding the percentile class
f = frequency of the percentile class

Example 1:
The data given below are the total number of hours lost due to tardiness and absences of employees in a company in a given year. Find Q3, D4, and P55.

Month        Hours Lost (x)
January      55
February     23
March        37
April        37
May          48
June         42
July         27
August       20
September    30
October      32
November     24
December     40

Solution: To compute Q3 of ungrouped data:

1. Arrange the data from lowest to highest.

Value:      20  23  24  27  30  32  37  37  40  42  48  55
Position:    1   2   3   4   5   6   7   8   9  10  11  12

$$Q_{class} = \frac{(12)(3)}{4} + 0.5 = 9.5$$

2. Use interpolation since the computed Qclass is not an integer. Position 9.5 lies halfway between the 9th value (40) and the 10th value (42).

$$Q_3 = 40 + 0.5(42 - 40) = 41$$

Solution: To compute D4 of ungrouped data:

1. Arrange the data from lowest to highest.

Value:      20  23  24  27  30  32  37  37  40  42  48  55
Position:    1   2   3   4   5   6   7   8   9  10  11  12

$$D_{class} = \frac{(12)(4)}{10} + 0.5 = 5.3$$

2. Use interpolation since the computed Dclass is not an integer. Position 5.3 lies between the 5th value (30) and the 6th value (32).

$$D_4 = 30 + 0.3(32 - 30) = 30.6$$

Solution: To compute P55 of ungrouped data:

1. Arrange the data from lowest to highest (same ordered list as above).

$$P_{class} = \frac{(12)(55)}{100} + 0.5 = 7.1$$

2. Use interpolation since the computed Pclass is not an integer. Position 7.1 lies between the 7th value (37) and the 8th value (37).

$$P_{55} = 37 + 0.1(37 - 37) = 37$$
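The positioning rule for ungrouped data can be checked with a short script. This is a minimal sketch, assuming the rule position = nk/parts + 0.5 with linear interpolation between neighboring ordered values; the function name `quantile_ungrouped` and its handling of positions outside 1..n are illustrative choices, not part of the module.

```python
import math

def quantile_ungrouped(data, k, parts):
    """k-th quantile (parts = 4, 10 or 100) using position = n*k/parts + 0.5."""
    values = sorted(data)                 # step 1: arrange from lowest to highest
    n = len(values)
    pos = n * k / parts + 0.5             # 1-based positioning point
    if pos <= 1:                          # assumption: clamp to the smallest value
        return float(values[0])
    if pos >= n:                          # assumption: clamp to the largest value
        return float(values[-1])
    lower = math.floor(pos)               # position of the value just below
    frac = pos - lower                    # fractional part drives the interpolation
    below, above = values[lower - 1], values[lower]
    return below + frac * (above - below)

hours_lost = [55, 23, 37, 37, 48, 42, 27, 20, 30, 32, 24, 40]
q3 = quantile_ungrouped(hours_lost, 3, 4)
d4 = quantile_ungrouped(hours_lost, 4, 10)
p55 = quantile_ungrouped(hours_lost, 55, 100)
print(round(q3, 2), round(d4, 2), round(p55, 2))  # 41.0 30.6 37.0
```

Other software may use a different positioning convention, so small differences from library quantile functions are expected.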
Example 2:
The data given below are the ages of the residents in Barangay 634, Sta. Mesa, Manila. Compute Q1, D7, and P10.

Class Interval    Frequency
55 - 59           3
50 - 54           6
45 - 49           7
40 - 44           9
35 - 39           6
30 - 34           4
25 - 29           5

Solution:
To compute Q1, D7, and P10 of grouped data, first fill out a table with the columns Class Interval, f, LB, and < cf.

To compute the lower boundary (LB), always subtract 0.5 from the lower class limit (LC).
Ex:
55 − 0.5 = 54.5
50 − 0.5 = 49.5
45 − 0.5 = 44.5

Solution:
If the class intervals are arranged in descending order, always start the < cf column at the bottom. Copy the frequency of the lowest class interval, then keep adding the next frequency going up:

5,  5 + 4 = 9,  9 + 6 = 15,  15 + 9 = 24,  24 + 7 = 31,  31 + 6 = 37,  37 + 3 = 40

Class Interval    f     LB      < cf
55 - 59           3     54.5    40
50 - 54           6     49.5    37
45 - 49           7     44.5    31
40 - 44           9     39.5    24
35 - 39           6     34.5    15
30 - 34           4     29.5    9
25 - 29           5     24.5    5
Total         n = 40

First, compute nk/4; it will help us determine the quartile class and the < cf.

$$\frac{nk}{4} = \frac{(40)(1)}{4} = 10$$

The quartile class is the class containing the 10th item. Hence, the quartile class is 35 - 39.

$$Q_k = LB + \frac{\left(\frac{nk}{4} - {<}cf\right) i}{f}$$

$$Q_1 = 34.5 + \frac{(10 - 9)5}{6} = 35.33$$
Solution (D7):

Class Interval    f     LB      < cf
55 - 59           3     54.5    40
50 - 54           6     49.5    37
45 - 49           7     44.5    31
40 - 44           9     39.5    24
35 - 39           6     34.5    15
30 - 34           4     29.5    9
25 - 29           5     24.5    5
Total         n = 40

First, compute nk/10; it will help us determine the decile class and the < cf.

$$\frac{nk}{10} = \frac{(40)(7)}{10} = 28$$

The decile class is the class containing the 28th item. Hence, the decile class is 45 - 49.

$$D_k = LB + \frac{\left(\frac{nk}{10} - {<}cf\right) i}{f}$$

$$D_7 = 44.5 + \frac{(28 - 24)5}{7} = 47.36$$

Solution (P10):
Using the same table, first compute nk/100; it will help us determine the percentile class and the < cf.

$$\frac{nk}{100} = \frac{(40)(10)}{100} = 4$$

The percentile class is the class containing the 4th item. Hence, the percentile class is 25 - 29. No class precedes it, so < cf = 0.

$$P_k = LB + \frac{\left(\frac{nk}{100} - {<}cf\right) i}{f}$$

$$P_{10} = 24.5 + \frac{(4 - 0)5}{5} = 28.5$$

Example 3:
The ages of the townspeople in a certain community are as follows. Find Q2, D5, and P50.

Class Interval    Frequency
18 - 24           28
25 - 31           54
32 - 38           38
39 - 45           20
46 - 52           17
53 - 59           3

Solution:
To compute Q2, D5, and P50 of grouped data, first fill out a table with the columns Class Interval, f, LB, and < cf.

To compute the lower boundary (LB), always subtract 0.5 from the lower class limit (LC).
Ex:
18 − 0.5 = 17.5
25 − 0.5 = 24.5
32 − 0.5 = 31.5
Solution:
If the class intervals are arranged in ascending order, always start the < cf column at the top. Copy the frequency of the lowest class interval, then keep adding the next frequency going down:

28,  28 + 54 = 82,  82 + 38 = 120,  120 + 20 = 140,  140 + 17 = 157,  157 + 3 = 160

Class Interval    f      LB      < cf
18 - 24           28     17.5    28
25 - 31           54     24.5    82
32 - 38           38     31.5    120
39 - 45           20     38.5    140
46 - 52           17     45.5    157
53 - 59           3      52.5    160
Total         n = 160

First, compute nk/4; it will help us determine the quartile class and the < cf.

$$\frac{nk}{4} = \frac{(160)(2)}{4} = 80$$

The quartile class is the class containing the 80th item. Hence, the quartile class is 25 - 31.

$$Q_k = LB + \frac{\left(\frac{nk}{4} - {<}cf\right) i}{f}$$

$$Q_2 = 24.5 + \frac{(80 - 28)7}{54} = 31.24$$

Solution (D5):

Class Interval    f      LB      < cf
18 - 24           28     17.5    28
25 - 31           54     24.5    82
32 - 38           38     31.5    120
39 - 45           20     38.5    140
46 - 52           17     45.5    157
53 - 59           3      52.5    160
Total         n = 160

First, compute nk/10; it will help us determine the decile class and the < cf.

$$\frac{nk}{10} = \frac{(160)(5)}{10} = 80$$

The decile class is the class containing the 80th item. Hence, the decile class is 25 - 31.

$$D_5 = 24.5 + \frac{(80 - 28)7}{54} = 31.24$$

Solution (P50):
Using the same table, first compute nk/100; it will help us determine the percentile class and the < cf.

$$\frac{nk}{100} = \frac{(160)(50)}{100} = 80$$

The percentile class is the class containing the 80th item. Hence, the percentile class is 25 - 31.

$$P_{50} = 24.5 + \frac{(80 - 28)7}{54} = 31.24$$

Q2, D5, and P50 all mark the middle of the ordered data, so the three values coincide.
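The grouped-data formula LB + ((nk/parts − <cf) · i) / f can be sketched in code as follows. The function name `quantile_grouped` and the tuple layout are hypothetical, and the sketch assumes integer class limits listed in ascending order, so that the lower boundary is the lower limit minus 0.5 and the class width is upper − lower + 1.

```python
def quantile_grouped(classes, k, parts):
    """classes: list of (lower_limit, upper_limit, frequency) in ascending order."""
    n = sum(f for _, _, f in classes)
    target = n * k / parts                # nk/4, nk/10 or nk/100
    cum_before = 0                        # <cf of the class preceding the current one
    for lower, upper, f in classes:
        if f > 0 and cum_before + f >= target:
            lb = lower - 0.5              # lower class boundary (assumes integer limits)
            width = upper - lower + 1     # class width i
            return lb + (target - cum_before) * width / f
        cum_before += f
    raise ValueError("k/parts must lie between 0 and 1")

ages = [(18, 24, 28), (25, 31, 54), (32, 38, 38),
        (39, 45, 20), (46, 52, 17), (53, 59, 3)]
q2 = quantile_grouped(ages, 2, 4)
d5 = quantile_grouped(ages, 5, 10)
p50 = quantile_grouped(ages, 50, 100)
print(round(q2, 2), round(d5, 2), round(p50, 2))  # 31.24 31.24 31.24
```

The same function works for a table given in descending order once the rows are reversed before the call.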
Sample Interpretation:
1. Jennifer just received the results of her SAT exam. Her SAT Mathematics score of 600 is in the 74th percentile. What does this mean?
A percentile rank of 74% means that 74% of SAT Mathematics scores are less than or equal to 600 and 26% of the scores are greater. So 26% of the students who took the exam scored better than Jennifer.

2. The time taken to finish a test is 35 minutes. This time was the first quartile. What does this mean?
25% of the learners finished the exam in 35 minutes or less, and 75% of the learners finished the exam in more than 35 minutes.

Measures of Dispersion/Variability
Based on the figures below, determine which of the two scatter diagrams illustrates larger variability.

[Figure 1 and Figure 2: scatter diagrams]

Since the data points in Figure 2 are more scattered than the data points in Figure 1, the data set depicted in Figure 2 is more varied.

Measures of Dispersion/Variability:

RANGE
It is the difference between the largest and the smallest observations or items in a set of data.

$$R = X_{max} - X_{min}$$

Range is simple to calculate. However, we should be cautious about using range as a measure of variability.

Range is a very crude measure of variability, as it uses only the highest and lowest values in its computation. Therefore, it does not accurately capture how the data values in the set differ when the data set contains unusual cases/outliers.

Measures of Dispersion/Variability:

STANDARD DEVIATION
• It is a measure of how far away items in a data set are from the mean.
• The larger the standard deviation, the more variation there is in the data set.
• The standard deviation can never be a negative number, due to the way it is calculated and the fact that it measures a distance (distances are never negative numbers).
• The smallest possible value for the standard deviation is 0, and that happens only in contrived situations where every single number in the data set is exactly the same (no deviation).
Formula for Standard Deviation:

Sample Standard Deviation
✦ For Ungrouped Data
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
where:
x_i = data values
x̄ = mean
n = no. of sample observations

✦ For Grouped Data
$$s = \sqrt{\frac{\sum_{i=1}^{r} f(x_i - \bar{x})^2}{n - 1}}$$
where:
x_i = data values (the class marks of the r classes)
x̄ = mean
f = frequency
n = no. of sample observations

Population Standard Deviation
✦ For Ungrouped Data
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
where:
x_i = data values
μ = mean
N = no. of observations

✦ For Grouped Data
$$\sigma = \sqrt{\frac{\sum_{i=1}^{r} f(x_i - \mu)^2}{N}}$$
where:
x_i = data values (the class marks of the r classes)
μ = mean
f = frequency
N = no. of observations

Measures of Dispersion/Variability:

VARIANCE
It takes all data points in a set into account and is calculated by averaging the squared deviations of the data points from the mean.

Variance is not easy to read because it is expressed in squared units and hence is not easily interpretable. The standard deviation, however, is in the same units as the mean, so we can easily understand the spread of the data.

Formula for Variance:

Sample Variance
✦ For Ungrouped Data
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where:
x_i = data values
x̄ = mean
n = no. of sample observations

✦ For Grouped Data
$$s^2 = \frac{\sum_{i=1}^{r} f(x_i - \bar{x})^2}{n - 1}$$
where:
x_i = data values (the class marks of the r classes)
x̄ = mean
f = frequency
n = no. of sample observations

Population Variance
✦ For Ungrouped Data
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where:
x_i = data values
μ = mean
N = no. of observations

✦ For Grouped Data
$$\sigma^2 = \frac{\sum_{i=1}^{r} f(x_i - \mu)^2}{N}$$
where:
x_i = data values (the class marks of the r classes)
μ = mean
f = frequency
N = no. of observations

Example 1:
The data given below are the ages of the residents in Barangay 634, Sta. Mesa, Manila. Compute the sample standard deviation and sample variance.

Class Interval    Frequency
55 - 59           3
50 - 54           6
45 - 49           7
40 - 44           9
35 - 39           6
30 - 34           4
25 - 29           5
Solution:
To compute the SD and Var of grouped data, first fill out a table with the columns f, x (class mark), fx, (xi − x̄)², and f(xi − x̄)².

Class Interval    f    x     fx     (xi − x̄)²
55 - 59           3    57    171    228.61
50 - 54           6    52    312    102.41
45 - 49           7    47    329    26.21
40 - 44           9    42    378    0.01
35 - 39           6    37    222    23.81
30 - 34           4    32    128    97.61
25 - 29           5    27    135    221.41
Total         n = 40         1,675

$$\bar{x} = \frac{\sum_{i=1}^{7} f x_i}{n} = \frac{1{,}675}{40} = 41.88$$

(x1 − x̄)² = (57 − 41.88)² = 228.61
(x2 − x̄)² = (52 − 41.88)² = 102.41
(x3 − x̄)² = (47 − 41.88)² = 26.21

Solution:

Class Interval    f    x     fx     (xi − x̄)²    f(xi − x̄)²
55 - 59           3    57    171    228.61        685.83
50 - 54           6    52    312    102.41        614.46
45 - 49           7    47    329    26.21         183.47
40 - 44           9    42    378    0.01          0.09
35 - 39           6    37    222    23.81         142.86
30 - 34           4    32    128    97.61         390.44
25 - 29           5    27    135    221.41        1107.05
Total         n = 40         1,675                3,124.20

f(x1 − x̄)² = 3(228.61) = 685.83
f(x2 − x̄)² = 6(102.41) = 614.46
f(x3 − x̄)² = 7(26.21) = 183.47

Sample standard deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{7} f(x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{3{,}124.20}{40 - 1}} = 8.95$$

Sample variance:
$$s^2 = \frac{\sum_{i=1}^{7} f(x_i - \bar{x})^2}{n - 1} = \frac{3{,}124.20}{40 - 1} = 80.11$$
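The table-based computation can be reproduced with a few lines of code. This is a minimal sketch, assuming the class marks stand in for the data values; the helper name `grouped_sample_variance` is illustrative. It keeps the unrounded mean, so intermediate figures can differ slightly from a hand computation that rounds the mean first.

```python
import math

def grouped_sample_variance(class_marks, freqs):
    """Return (mean, sample variance) of grouped data from class marks and frequencies."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(class_marks, freqs)) / n
    # Sum of f * (x - mean)^2 over all classes
    ss = sum(f * (x - mean) ** 2 for x, f in zip(class_marks, freqs))
    return mean, ss / (n - 1)

marks = [57, 52, 47, 42, 37, 32, 27]   # class marks of 55-59 down to 25-29
freqs = [3, 6, 7, 9, 6, 4, 5]
mean, var = grouped_sample_variance(marks, freqs)
sd = math.sqrt(var)
print(round(mean, 2), round(var, 2), round(sd, 2))  # 41.88 80.11 8.95
```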
How to interpret variance and standard deviation?

Consider the following data set of toddler weights taken in an outpatient clinic:

Data Set: 15  13  20  19  14

Computed variance for this data set is 9.7.
Computed standard deviation for this data set is 3.11.
What does this mean?

Variance by itself is hard to use as a measure of variability. Let us assume that the values represent weights measured in pounds taken from five subjects. Because the deviation of each observation from the mean has been squared, the unit for the variance is now (pound)². What does (pound)² mean? If we were to say that data values differ from the mean on average by about 9.7 (pound)², would this claim make sense? Probably not, since there is no such unit as a (pound)².

Why do we then take the square of the deviation if the (unit)² will not make sense to interpret at the end? The answer is simple: If you do not square the deviations and simply sum them, they will always add up to zero no matter what data set you work with.

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0 \quad\rightarrow\quad \sum_{i=1}^{n} (x_i - \bar{x})^2 \neq 0$$

How can we then talk about variability if the measure of variability comes out to be equal to zero? This is why we take the square of the deviation to compute the variance first and then take the square root of it to compute the standard deviation, bringing us back to the original unit of measurement.

We get the standard deviation of 3.11 by taking the square root of 9.7; we can then say that the data values differ from the mean (16.2 lbs.) by about 3.11 pounds on average. We can interpret this finding to mean that typical weights fall within one standard deviation of the mean, between 13.09 and 19.31 pounds:

16.2 − 3.11 = 13.09
16.2 + 3.11 = 19.31

This makes more sense when you look at the data set, compared to the variance. Note that the mean and standard deviation should always be reported together!

Choosing a Measure of Dispersion/Variability:
We have discussed four measures of dispersion/variability (the range, the interquartile range, the variance, and the standard deviation) and examined how they differ. The next legitimate question to ask may be “When do we use which measure?”

You should use the range only as a crude measure, since it is extremely sensitive to unusual values in the data set. The interquartile range is not as sensitive to unusual data values, whereas the standard deviation is very sensitive to unusual values. Therefore, the interquartile range should be used with the median when the data contain unusual data values, and the standard deviation should be used with the mean when the data are free of unusual data values.
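The toddler-weight figures can be verified with Python's standard `statistics` module, which computes the sample (n − 1) variance and standard deviation. The sketch below also shows that the raw deviations sum to zero (up to floating-point rounding), which is the reason for squaring them.

```python
import statistics

weights = [15, 13, 20, 19, 14]             # toddler weights in pounds
mean = statistics.mean(weights)            # 16.2
deviations = [x - mean for x in weights]   # raw deviations from the mean
variance = statistics.variance(weights)    # sample variance, divides by n - 1
sd = statistics.stdev(weights)             # square root of the sample variance

print(abs(sum(deviations)) < 1e-9)         # True: deviations cancel out
print(round(variance, 2), round(sd, 2))    # 9.7 3.11
print(round(mean - sd, 2), round(mean + sd, 2))
```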
Shape of Distribution

Skewness and kurtosis give you insights into the shape of the distribution.

✦ Skewness is the degree of distortion from the symmetrical bell curve or the normal distribution. It measures the lack of symmetry in a data distribution.
✦ Kurtosis is a measure of the combined sizes of the two tails. It tells you how tall and sharp the central peak is, relative to a standard bell curve.

Skewness
A symmetrical distribution will have a skewness of 0. So, a normal distribution will have a skewness of 0.

In a symmetrical distribution, the Mean, Median and Mode are equal to each other and the ordinate at the mean divides the distribution into two equal parts.

There are two types of Skewness:

• Negatively Skewed/Skewed Left is when the tail on the left side of the distribution is longer or fatter than the tail on the right side. The mean and median will be less than the mode.
• Positively Skewed/Skewed Right is when the tail on the right side of the distribution is longer or fatter. The mean and median will be greater than the mode.

[Figure: three curves labeled Skewness < 0, Skewness > 0, and Skewness = 0]

Karl Pearson’s Measure of Skewness
Notice that the mean, median and mode are not equal in a skewed distribution.

Karl Pearson's measure of skewness is based upon the divergence of the mean from the mode in a skewed distribution. Karl Pearson’s Coefficient of Skewness (Sk) is given by

$$S_k = \frac{\bar{x} - \hat{x}}{s}$$

where:
x̄ is the mean
x̂ is the mode
s is the sample standard deviation
So far we have seen that Sk depends on the mode. If the mode is not defined for a distribution, we cannot find Sk. But the empirical relation between mean, median and mode states that, for a moderately skewed distribution, we have

Mean − Mode ≈ 3(Mean − Median)

Hence Karl Pearson's coefficient of skewness is defined in terms of the median as

$$S_k = \frac{3(\bar{x} - \tilde{x})}{s}$$

where:
x̄ is the mean
x̃ is the median
s is the sample standard deviation

Kurtosis
It is essentially a measure of the outliers present in the distribution. The outliers in a sample, therefore, have even more effect on the kurtosis than they do on the skewness.

Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. In other words, it’s the tails that mostly account for kurtosis, not the central peak.

The kurtosis decreases as the tails become lighter. It increases as the tails become heavier.
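As an illustration of the median-based coefficient Sk = 3(x̄ − x̃)/s, the sketch below applies it to the small weight data set 15, 13, 20, 19, 14. The function name `pearson_median_skewness` is hypothetical, and the code assumes the sample standard deviation is nonzero.

```python
import statistics

def pearson_median_skewness(data):
    """Pearson's coefficient of skewness using the median: 3 * (mean - median) / s."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)             # sample standard deviation (n - 1)
    return 3 * (mean - median) / s

weights = [15, 13, 20, 19, 14]
sk = pearson_median_skewness(weights)
print(round(sk, 2))  # 1.16
```

The mean (16.2) exceeds the median (15), so the coefficient is positive, which points to a right-skewed sample.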

• Mesokurtic (Kurtosis = 3): This distribution has a kurtosis statistic similar to that of the normal distribution.
• Leptokurtic (Kurtosis > 3): The peak is higher and sharper than that of the normal distribution, which means that the data are heavy-tailed or have a profusion of outliers.
• Platykurtic (Kurtosis < 3): Compared to a normal distribution, its tails are shorter and thinner, and often its central peak is lower and broader.

Percentile Coefficient of Kurtosis
A measure of kurtosis based on quartiles and percentiles is

$$k = \frac{QD}{P_{90} - P_{10}}$$

where QD is the semi-interquartile range:

$$QD = \frac{Q_3 - Q_1}{2}$$
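To get a feel for the percentile coefficient of kurtosis, the sketch below evaluates k = QD/(P90 − P10) for the standard normal distribution, using `statistics.NormalDist.inv_cdf` from the Python standard library to obtain the theoretical quartiles and percentiles. The result, about 0.263, is the value usually taken as the reference point for this coefficient.

```python
from statistics import NormalDist

z = NormalDist()                          # standard normal: mean 0, sd 1
q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
p10, p90 = z.inv_cdf(0.10), z.inv_cdf(0.90)

qd = (q3 - q1) / 2                        # semi-interquartile range
k = qd / (p90 - p10)                      # percentile coefficient of kurtosis
print(round(k, 3))  # 0.263
```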
How to Calculate Measures of Central Tendency, Measures of Variation, Skewness and Kurtosis for Ungrouped and Sample Data Using Excel?

Example:
The data given below are the scores of randomly selected applied statistics undergraduate students in Section A and Section B. Compare the scores of Section A and Section B based on measures of central tendency and measures of variation, and determine which section performed better in their final examination. Also, describe the shape of the distribution of these two data sets using skewness and kurtosis.

Data Set A: 40  38  42  40  39  39  43  40  39  40
Data Set B: 46  37  40  33  42  36  40  47  34  45

1. Click “DATA” on the menu bar and click “DATA ANALYSIS” on the toolbar. The dialog box will appear.
2. Select “Descriptive Statistics” then click “OK”.

3. Highlight your data for the “INPUT RANGE” and click the box of “LABELS IN FIRST ROW” then click “OK”.
4. Click “Summary statistics” and then click “OK”. Repeat the process for Data Set B.

When comparing distributions, it is better to use a measure of variation/dispersion in addition to a measure of central tendency. Because Data Set A and Data Set B in this example have the same values for the measures of central tendency, we will use only the measures of variation/dispersion to compare these two data sets.
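The central tendency and variation figures from the Excel output can be cross-checked with Python's standard `statistics` module. This is a small sketch rather than a replacement for the Excel procedure; the `summarize` helper is illustrative and reports only a subset of what Descriptive Statistics produces.

```python
import statistics

section_a = [40, 38, 42, 40, 39, 39, 43, 40, 39, 40]
section_b = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]

def summarize(scores):
    """Central tendency and sample-based variation measures for one data set."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "range": max(scores) - min(scores),
        "variance": statistics.variance(scores),   # sample variance (n - 1)
        "stdev": statistics.stdev(scores),         # sample standard deviation
    }

summary_a = summarize(section_a)
summary_b = summarize(section_b)
for name, summary in (("A", summary_a), ("B", summary_b)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Both sections share a mean, median, and mode of 40, while Section B's range and standard deviation are larger.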
Based on the result, Data Set B has larger variability since it has larger values on the different measures of variation. This means that Data Set B is much more spread out than Data Set A.

In this example, we want a data set with a large mean value and a small standard deviation so we can say that this is the section that performed better. Section A and Section B have the same mean value, but Section A has a smaller standard deviation than Section B; therefore, Section A performed better in their final examination.

In terms of the shape of the distribution, these two data sets have the same shape in terms of skewness and kurtosis. The results show that Data Set A and Data Set B are both platykurtic and skewed to the right.

Normal Distribution
✦ The normal distribution is sometimes called the bell curve because the graph of its probability density looks like a bell.
✦ It is also known as the Gaussian distribution, after the German mathematician Carl Friedrich Gauss who first described it.
✦ It is a probability function that describes how the values of a variable are distributed.

Normal Curve

[Figure: histogram with a red normal curve overlaid; x-axis marked at 50, 100, 150]

The red curve is a model called the normal curve, which is used to describe continuous random variables that are said to be normally distributed.

A continuous random variable is normally distributed, or has a normal probability distribution, if its relative frequency histogram has the shape of a normal curve.

No data will ever be exactly/perfectly normally distributed in reality. If so, how do we know whether or not a collected data set is normally distributed?

We can begin with a visual display of the data in a histogram to see if the data set is normally distributed. However, a visual check alone may not be sufficient to know whether the data are normally distributed. There are statistical measures, skewness and kurtosis, which, along with a histogram, allow us to determine whether the set is normally distributed.
Why is it important to know if the data follow a normal distribution?
The most important reason is that many human characteristics fall into an approximately normal distribution and that the measurement scores are assumed to be normally distributed when running most statistical analyses. Therefore, the statistical results you get at the end may not be trustworthy if the variable is not normally distributed.

Properties of Normal Curve
1. The normal curve is bell-shaped and symmetric about the mean, μ.
2. Because mean, median and mode are equal, the normal curve has a single peak and the highest point occurs at x = μ.
3. The normal curve has inflection points at μ − σ and μ + σ.

[Figure: normal curve with inflection points marked at μ − σ and μ + σ, centered at μ]

Properties of Normal Curve
4. The area under the normal curve is 1.
5. The area under the normal curve to the right of μ equals the area under the curve to the left of μ, which equals 0.50.
6. The normal curve approaches, but never touches, the x-axis as it extends farther and farther away from the mean.

[Figure: normal curve with total area = 1, split into 0.50 on each side of the mean]

Mean:
✦ Changing the mean shifts the entire curve left or right on the X-axis.

Standard Deviation:
✦ Changing the standard deviation either tightens or spreads out the width of the distribution along the X-axis. Larger standard deviations produce distributions that are more spread out.

[Figure: three pairs of normal curves, with panels μ1 = μ2, σ1 < σ2; μ1 < μ2, σ1 < σ2; and μ1 < μ2, σ1 = σ2]
Determine whether each graph represents a normal curve.

[Figure: four graphs labeled A, B, C, and D]

None of them represents a normal curve.

Role of Area under a Normal Curve
Suppose that a random variable X is normally distributed with mean μ and standard deviation σ. The area under the normal curve for any interval of values of the random variable X represents either
✦ the proportion of the population with the characteristic described by the interval of values or
✦ the probability that a randomly selected individual from the population will have the characteristic described by the interval of values.

Standard Normal Distribution
A normal random variable having mean value μ = 0 and standard deviation σ = 1 is called a standard normal random variable, and its density curve is called the standard normal curve.

It will always be denoted by the letter Z.

Standardizing a Normal Random Variable
The normal random variable of a standard normal distribution is called a standard score or a z-score. Every normal random variable X can be transformed into a z-score via the following equation:

$$z = \frac{x - \mu}{\sigma}$$

where X is a normal random variable, μ is the mean of X, and σ is the standard deviation of X.

Probabilities for a standard normal random variable are computed using the Standard Normal Distribution Table, which shows the cumulative probability associated with a particular z-score.
Remember!
Positive values of the z-score indicate how far above the mean a score falls, and negative values indicate how far below the mean a score falls.

Whether positive or negative, larger z-scores mean that scores are far away from the mean and smaller z-scores mean that scores are close to the mean.

[Table: Standard Normal Distribution Table 1 (Positive Side, P(Z < z))]
[Table: Standard Normal Distribution Table 2 (Negative Side, P(Z < − z))]

Patterns for Finding Areas under a Standard Normal Curve Using Table 1
In the patterns below, "Area" denotes a value read from Table 1.

A. Area to the right of a negative z value or to the left of a positive z value.
Use Table 1 directly.

B. Area between z values on either side of 0 (z1 < 0 < z2).
[Figure: (area to the left of z2) − (area to the left of z1), where the area to the left of the negative value z1 is 1 − Area]

C. Area between z values on the same side of 0 (z1 < z2 < 0).
[Figure: (area to the left of z2) − (area to the left of z1), where each area is 1 − Area]
Patterns for Finding Areas under a Standard Normal Curve Using Table 1 (continued)

D. Area to the right of a positive z value or to the left of a negative z value.
[Figure: (total area = 1) − (area to the left of z1)]

E. Area between a given z value and 0.
[Figure: (area to the left of z1) − (area to the left of 0, which is 0.50)]

Patterns for Finding Areas under a Standard Normal Curve Using Table 2
In the patterns below, "Area" denotes a value read from Table 2.

A. Area to the right of a positive z value or to the left of a negative z value.
Use Table 2 directly.

B. Area between z values on the same side of 0 (z1 < z2 < 0).
[Figure: (area to the left of z2) − (area to the left of z1)]

C. Area between z values on either side of 0 (z1 < 0 < z2).
[Figure: (0.50 − Area for z1) + (0.50 − Area for z2)]

Patterns for Finding Areas under a Standard Normal Curve Using Table 2 (continued)

D. Area to the right of a negative z value or to the left of a positive z value.
[Figure: (0.50 − Area) + (0.50, the area on the other side of 0)]

E. Area between a given z value and 0.
[Figure: (0.50, the area on one side of 0) − Area]

Example 1:
Scores on a standardized college entrance examination (CEE) are normally distributed with mean 510 and standard deviation 60. A selective university considers for admission only applicants with CEE scores over 560. Find the proportion of all individuals who took the CEE who meet the university's CEE requirement for consideration for admission.

Solution:
Given: μ = 510, σ = 60 and x = 560
Step 1: Draw a normal curve and shade the desired area.
Area = P(X > 560)

[Figure: normal curve over X with ticks at 450, 510, 570; the region to the right of 560 is shaded]
Using Table 1 (By-hand Approach)
Step 2: Convert the value of x to a z-score.

$$P(X > 560) = P\left(Z > \frac{560 - 510}{60}\right) = P(Z > 0.83)$$

Use the Complement Rule and determine one minus the area.

$$P(Z > 0.83) = 1 - P(Z \le 0.83) = 1 - 0.7967 = 0.2033$$

[Figure: standard normal curve over Z with ticks at −2, −1, 0, 1, 2; the region to the right of 0.83 is shaded, Area = P(Z > 0.83) = 0.2033]

The proportion of all CEE scores that exceed 560 is 0.2033 or 20.33%.

Using Table 2 (By-hand Approach)
Step 2: Convert the value of x to a z-score.

$$P(X > 560) = P\left(Z > \frac{560 - 510}{60}\right) = P(Z > 0.83)$$

The area to the right of a positive z value is read directly from Table 2:

$$P(Z > 0.83) = 0.2033$$

The proportion of all CEE scores that exceed 560 is 0.2033 or 20.33%.

Step 2 (Technology Approach): Use Excel to determine the area under any normal curve.
Use "TRUE" for cumulative, since we want the area under the normal curve.

The proportion of all CEE scores that exceed 560 is 0.2033 or 20.33%.

Example 2:
A pediatrician obtains the heights of her three-year-old female patients. The heights are approximately normally distributed, with mean 38.72 inches and standard deviation 3.17 inches. Determine the proportion of the three-year-old females that have a height less than 35 inches.

Solution:
Given: μ = 38.72, σ = 3.17, and x = 35
Area = P(X < 35)

Step 1: Draw a normal curve and shade the desired area.
[Figure: normal curve for X with ticks at 35.55, 38.72, and 41.89; the area to the left of 35 is shaded.]
Step 2: Convert the value of x to a z-score.

Using Table 1 (By-hand Approach)
P(X < 35) = P(Z < (35 − 38.72)/3.17)
          = P(Z < −1.17)
          = 1 − P(Z ≥ −1.17)
          = 1 − 0.8790
          = 0.1210
Use the Complement Rule and determine one minus the area.

Using Table 2 (By-hand Approach)
P(X < 35) = P(Z < (35 − 38.72)/3.17)
          = P(Z < −1.17)
          = 0.1210

[Figure: standard normal curve for Z with ticks at −2, −1, 0, 1, 2; the area to the left of −1.17 is shaded. Area = P(Z < −1.17) = 0.1210.]

The proportion of the pediatrician's three-year-old females who are less than 35 inches tall is 0.1210 or 12.10%.
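As a cross-check on Example 2, the same left-tail area can be computed with Python's standard-library `statistics.NormalDist` (available in Python 3.8 and later). This is a sketch added for illustration, not the module's own method. Its `cdf` method returns a cumulative (left-tail) probability, which is the same kind of quantity as the cumulative option described for the spreadsheet approach. The variable names are assumptions.

```python
from statistics import NormalDist

# Values given in Example 2
mu, sigma, x = 38.72, 3.17, 35

# Route 1: work directly with the distribution of heights X
heights = NormalDist(mu=mu, sigma=sigma)
area_direct = heights.cdf(x)             # P(X < 35), no rounding of z

# Route 2: mimic the by-hand steps (standardize, round z, use standard normal)
z = round((x - mu) / sigma, 2)           # rounded to two decimals for table lookup
area_table = NormalDist().cdf(z)         # P(Z < z) for the standard normal

print(f"z rounded for the table: {z:.2f}")
print(f"P(X < 35) with rounded z:   {area_table:.4f}")
print(f"P(X < 35) with unrounded z: {area_direct:.4f}")
```

Route 2 follows the table-based steps, so it should agree with the by-hand answer to four decimals. Route 1 skips the rounding of z and can differ slightly in the third or fourth decimal place.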

Step 2 (Technology Approach): Use Excel to determine the area under any normal curve.
Use "TRUE" for cumulative, since we want the area under the normal curve.

The proportion of the pediatrician's three-year-old females who are less than 35 inches tall is 0.1210 or 12.10%.

Example 3:
A pediatrician obtains the heights of her three-year-old female patients. The heights are approximately normally distributed, with mean 38.72 inches and standard deviation 3.17 inches. Determine the probability that a randomly selected three-year-old girl is between 35 and 40 inches tall, inclusive.

Solution:
Given: μ = 38.72, σ = 3.17, and 35 ≤ X ≤ 40
Area = P(35 ≤ X ≤ 40)

Step 1: Draw a normal curve and shade the desired area.
[Figure: normal curve for X with ticks at 35.55, 38.72, and 41.89; the area between 35 and 40 is shaded.]
Step 2: Convert the values of x to z-scores.

Using Table 1 (By-hand Approach)
P(35 ≤ X ≤ 40) = P((35 − 38.72)/3.17 ≤ Z ≤ (40 − 38.72)/3.17)
               = P(−1.17 ≤ Z ≤ 0.40)
               = P(Z ≤ 0.40) − [1 − P(Z ≥ −1.17)]
               = 0.6554 − [1 − 0.8790]
               = 0.6554 − 0.1210
               = 0.5344

Using Table 2 (By-hand Approach)
P(35 ≤ X ≤ 40) = P((35 − 38.72)/3.17 ≤ Z ≤ (40 − 38.72)/3.17)
               = P(−1.17 ≤ Z ≤ 0.40)
               = [0.50 − P(Z ≥ 0.40)] + [0.50 − P(Z ≤ −1.17)]
               = [0.50 − 0.3446] + [0.50 − 0.1210]
               = 0.1554 + 0.3790
               = 0.5344

[Figure: standard normal curve for Z with ticks at −2, −1, 0, 1, 2; the area between −1.17 and 0.40 is shaded. Area = P(−1.17 ≤ Z ≤ 0.40) = 0.5344.]

The probability that a randomly selected three-year-old female is between 35 and 40 inches tall is 0.5344.
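The two by-hand routes for Example 3 can be compared side by side in code. The sketch below is illustrative only. It assumes that one table lists cumulative areas P(Z ≤ z) and the other lists tail areas, which is how the two solutions above use them. It uses `statistics.NormalDist` from the Python standard library, and all variable names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()                # mean 0, standard deviation 1

# Values given in Example 3
mu, sigma = 38.72, 3.17
z_low = round((35 - mu) / sigma, 2)      # lower bound, rounded for table lookup
z_high = round((40 - mu) / sigma, 2)     # upper bound, rounded for table lookup

# Route 1: cumulative areas, P(Z <= z_high) - P(Z <= z_low)
area_cumulative = std_normal.cdf(z_high) - std_normal.cdf(z_low)

# Route 2: tail areas, (0.50 - upper tail) + (0.50 - lower tail)
upper_tail = 1 - std_normal.cdf(z_high)  # P(Z >= z_high)
lower_tail = std_normal.cdf(z_low)       # P(Z <= z_low)
area_tails = (0.50 - upper_tail) + (0.50 - lower_tail)

print(f"z bounds: {z_low:.2f} and {z_high:.2f}")
print(f"Area via cumulative areas: {area_cumulative:.4f}")
print(f"Area via tail areas:       {area_tails:.4f}")
```

Both routes are algebraically the same quantity, so they agree up to floating-point error. Which one to use by hand depends only on which kind of area the available table lists.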

Step 2 (Technology Approach): Use Excel to determine the area under any normal curve.
Use "TRUE" for cumulative, since we want the area under the normal curve.

ACTIVITIES/ASSESSMENTS:

1. Which one do you think is more informative? Why?
2. What features of the 'Good Presentation' make it better than the 'Bad Presentation'?
[Figures A and B: the two presentations being compared.]

3. Review the table and consider questions such as the following.

Origin / Rating   Poor   Needs Improvement   Satisfactory   V Good   Excellent   Total
External           0%           2%               12%          19%        9%       41%
Internal           4%           8%               15%          23%        9%       59%
Grand Total        4%          10%               27%          41%       17%      100%

   1. What percentage of the employees originated from within the organization?
   2. What percentage of the employees are both internal and rated 'Very Good'?
   3. What percentage of the employees received 'Needs Improvement' or 'Poor'?
   4. What category contains the greatest number of employees?
   5. Do you see any notable differences in the percentage by category?

4. Consider the following Frequency Distribution of Salaries.

Salary               Frequency   Percentage
41,000 - 50,000           1          1%
51,000 - 60,000          20         13%
61,000 - 70,000          53         35%
71,000 - 80,000          43         29%
81,000 - 90,000          26         17%
91,000 - 100,000          6          4%
101,000 - 110,000         1          1%
Total                   150        100%

   1. What percentage of the employees earn less than or equal to 80,000?
   2. What is the range of salary values?
   3. What salary categories have a percentage less than 5?
   4. What salary category includes the most employees?

5. The length of life of an instrument produced by a machine has a normal distribution with a mean of 12 months and a standard deviation of 2 months. Find the probability that an instrument produced by this machine will last
   A. less than 7 months.
   B. between 7 and 12 months.
   Be sure to draw a normal curve with the area corresponding to the probability shaded.

6. The lengths of human pregnancies are approximately normally distributed, with mean μ = 266 days and standard deviation σ = 16 days.
   A. What proportion of pregnancies lasts more than 270 days?
   B. What proportion of pregnancies lasts less than 250 days?
   C. What proportion of pregnancies lasts between 240 and 280 days?
   D. What is the probability that a randomly selected pregnancy lasts more than 280 days?
   Be sure to draw a normal curve with the area corresponding to the probability shaded.
7. Construct a frequency distribution table based on the scores of 75 randomly selected students.

37 46 37 26 30 41 28 49 29 34 46 50 38 35 42
35 46 45 27 41 26 45 39 43 46 36 32 46 36 48
49 47 30 43 31 34 38 41 39 45 28 43 37 39 26
38 30 29 38 26 31 42 44 48 43 37 46 38 27 50
42 33 42 42 43 39 39 31 46 46 48 48 50 45 31

Scores     Frequency   Percentage (%)
26 to 30
31 to 35
36 to 40
41 to 45
46 to 50
Total

   A. Based on the frequency distribution, compute measures of central tendency, measures of variation, Q1, D9, P10, skewness, and kurtosis.
   B. Based on the raw data, compute measures of central tendency, measures of variation, skewness, and kurtosis using Excel.
   C. Compute skewness and kurtosis of grouped and ungrouped data. Make sure to describe the shape of the distribution.
   D. Do you think that the computed values for grouped and ungrouped data are the same?

8. Begin with the following set of data, call it Data Set I.
   5, −2, 6, 14, −3, 0, 1, 4, 3, 2, 5
   A. Compute the sample standard deviation and sample mean of Data Set I.

   B. Form a new data set, Data Set II, by adding 3 to each number in Data Set I. Calculate the sample standard deviation and sample mean of Data Set II.
   C. Form a new data set, Data Set III, by subtracting 6 from each number in Data Set I. Calculate the sample standard deviation and sample mean of Data Set III.
   D. Comparing the answers to parts A, B, and C, can you guess the pattern? State the general principle that you expect to be true.

9. Using the "Encoded Data file", construct a frequency distribution table for age, sex, marital status, and educational attainment, and interpret the table.

References
https://prezi.com/rirrca9ckuiz/textual-presentation-of-data/
https://www.toppr.com/guides/economics/presentation-of-data/textual-and-tabular-presentation-of-data/
Sullivan, Michael, III. Statistics: Informed Decisions Using Data. Fifth Edition.
