Module # 3 - MMW Part 1 Central Tendency


Module 3

Mathematics as a Tool

Data Management

Mathematics in the Modern World




Overview

Hello students! How are you keeping up with our modular classes? I hope that you are
all doing fine. If not, please feel free to ask for our help. In this module, we are going to
review some concepts that you have learned from the Statistics courses you took in
your basic education. Specifically, this will cover the following topics: Descriptive
Statistics, Inferential Statistics, and Planning or Conducting an Experiment or Study. We
will briefly discuss the measures of central tendency, measures of variation, normal
distributions, and linear regression and correlation.

This module is designed for you to finish in three weeks. Time management and self-motivation
are the key to accomplishing all designated tasks and activities of this module and to achieving
every objective and outcome of this course.

I hope you will be able to appreciate the importance of managing data as it is relevant to the
pandemic that we are struggling with right now. It can help us fight through this battle and all
other disasters which could strike us in the future. Stay safe everyone!

Learning Outcomes

After completing the study of this module, you should be able to:

Use a variety of appropriate statistical tools to process and manage numerical data

Apply the methods of linear regression and correlation to predict the value of a variable
given certain conditions

Make use of statistical data in making important decisions

Initial Activity (Accessing Prior Knowledge)

I want you to access the DOH COVID-19 Tracker through the link
www.doh.gov.ph/covid19tracker and browse the webpage. If you can't
access this page, refer to the screenshot below of a portion of the webpage. Of all
the data shown on this platform, can you identify which data show average
values?

Can you interpret these? Do the graphs make sense to you? Do they display a trend which
could help us predict the behavior of the data in the next 7-14 days or more?

Review of Descriptive Statistics

Data management is an administrative process that includes acquiring, validating,
storing, protecting, and processing required data to ensure the accessibility, reliability, and
timeliness of the data for its users. Data are individual pieces of factual information recorded
and used for the purpose of analysis. They are the raw information from which statistics are created.
Statistics are the results of data analysis, that is, the interpretation and presentation of data.
Statistics has two branches. The branch that involves the collection, organization, summarization,
and presentation of data is called descriptive statistics. The branch that involves interpretation
and drawing conclusions is called inferential statistics.

Typically, there are two general types of statistics that are used to describe data:
measures of central tendency and measures of variation. A measure of central
tendency, which is sometimes referred to as an "average", describes a set of data by
identifying its central position. It provides a value that is typical or representative of the
whole data set. In this module, we will consider three types of averages: the mean, the median,
and the mode. In the first section of the discussion, we will look at these measures,
specifically at how to calculate them.
Mean
The mean, also called the arithmetic mean, is the most frequently used measure of central
tendency. It is obtained by getting the sum of all values and dividing it by the number of
values in the data set. So if we have n values in the data set and they have values
x₁, x₂, x₃, …, xₙ, the mean, usually denoted by x̄ (read as "x bar"), is:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

where
• the summation notation "Σ" denotes the sum of all values in a given data set;
• x is the variable usually used to represent the individual data values;
• n represents the number of values in a sample.

Did you know?
• Seaweed can grow up to 12 inches per day on average.
• An average person eats almost 1,500 pounds of food a year.
• The human eye blinks an average of 4,200,000 times a year.
• The most children born to one woman is 69. She was a peasant who lived a 40-year life,
in which she had 16 pairs of twins, 7 sets of triplets, and 4 sets of quadruplets.
*Source: statisticbrain.com

To be precise, x̄ is known as the sample mean. In some situations, data are collected from
small portions of a large group in order to obtain information about the whole group. The
whole group under consideration is called a population, while any subset of the population
is called a sample.
So if we want to compute the value of the population mean, denoted by μ, we use the
formula:

$$\mu = \frac{\sum x}{N}$$

where N is the size of the population.
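The two formulas above involve the same arithmetic; only the symbol (x̄ or μ) and the name of the divisor (n or N) change. As a minimal Python sketch, assuming the data are a plain list of numbers, a helper could look like this (the name `mean` is our own, chosen for illustration, not something defined in this module):

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count.

    The same computation gives the sample mean (divide by n) or the
    population mean (divide by N); only the interpretation changes.
    """
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Bath soap prices (₱) used in Example 1 below
soap_prices = [35, 46, 40, 31, 39]
print(mean(soap_prices))  # 38.2

# Ages of the eight employees listed in Example 2 below
ages = [25, 32, 61, 42, 39, 48, 56, 29]
print(mean(ages))  # 41.5
```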

Example
1.) In a grocery store, the prices (₱) of bath soap products of different brands are the following:
35 46 40 31 39
Find the mean of the prices of these five brands of bath soap.

Solution
The five brands are all of the bath soap brands in the grocery store (population size N = 5). We use
μ to represent the mean.

$$\mu = \frac{\sum x}{N} = \frac{35 + 46 + 40 + 31 + 39}{5} = \frac{191}{5} = 38.2$$

Thus, the mean of the prices of these five brands of bath soap is ₱38.20.
Example (cont.)

2.) The following are the ages (in years) of eight of the 67 employees of a small company:
25 32 61 42 39 48 56 29
Find the mean age of the employees.

Solution
Because the given data set includes only eight of the 67 employees of the company, it is a
sample, not the whole population. Hence, n = 8, and we compute the sample mean:

$$\bar{x} = \frac{\sum x}{n} = \frac{25 + 32 + 61 + 42 + 39 + 48 + 56 + 29}{8} = \frac{332}{8} = 41.5 \approx 42$$

Thus, the mean age of the eight sampled employees is 41.5 years, or about 42 years. This
sample mean serves as an estimate of the mean age of all 67 employees of the company.

Sometimes a data set may contain a few very small or a few very large values; such values are
called outliers or extreme values. Outliers may contain valuable information or be meaningless
aberrations caused by measurement and recording errors. A major shortcoming of the mean as
a measure of central tendency is that it is very sensitive to outliers.

Example

3.) The following are the savings (₱) of five siblings of a certain family (arranged from the
youngest to the eldest sibling):

2500 3000 2800 2900 35900


Notice that the eldest sibling's savings (₱35,900) are very large compared with those of the
others. Hence, this value is an outlier. Show how it affects the value of the mean.

Solution
If we do not include the data of the eldest sibling, then the mean is

$$\text{Mean} = \frac{2500 + 3000 + 2800 + 2900}{4} = \frac{11200}{4} = \text{₱}2800$$

Now, if we include it in our data, then the mean is

$$\text{Mean} = \frac{2500 + 3000 + 2800 + 2900 + 35900}{5} = \frac{47100}{5} = \text{₱}9420$$

Thus, including the savings of the eldest sibling more than triples the value of the mean,
which changes from ₱2800 to ₱9420.
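The effect of the outlier can be checked with a few lines of Python. This is a minimal sketch: `mean` is a helper of our own defined inline, and the list simply restates the savings from Example 3 in the given order (youngest to eldest).

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


savings = [2500, 3000, 2800, 2900, 35900]  # youngest to eldest; the last value is the outlier

without_eldest = mean(savings[:-1])  # leave out the eldest sibling's savings
with_eldest = mean(savings)          # include every sibling

print(without_eldest, with_eldest)             # 2800.0 9420.0
print(round(with_eldest / without_eldest, 2))  # 3.36
```

The ratio of about 3.36 confirms that the mean became more than three times as large: a single extreme value moved the mean far away from where most of the data lie.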

Note: In the last example, when we included the outlier in the computation of the
mean, the result did not seem to give a value that is typical or representative of the
whole data set; the first computation seemed to give the more sensible value.
However, it is NOT acceptable to drop an observation just because it is an outlier.
Outliers can be legitimate observations and are sometimes the most interesting ones. It is
important to investigate the nature of an outlier before deciding whether or not to drop
it from the analysis of data. Please check out the following link
https://www.theanalysisfactor.com/outliers-to-drop-or-not-to-drop/ for guidelines in dealing with
outliers.

Learning check

Activity 1
Compute the mean of each of the following sets of data values.
a.) Anika went to the supermarket to buy some packed potatoes. Unsure about how the sizes
of the packs differed, she looked at the price tags and found the following prices (₱):
125 142 132.5 201.25 160 172.75
b.) Archie collects stamps and glues them into his notebook. The following shows the
tally of his monthly collection from June to November:
16 25 18 30 29 23

c.) The following shows the grades of a student in Physics, Math, English, Biology,
Chemistry, and Filipino, respectively.
86 89 91 93 88 90

Median
Another important measure of central tendency is the median. The median of a set of
scores is the middle value when the scores are arranged in order of increasing (or
decreasing) magnitude. Aside from being a measure of central tendency, it is also a positional
measure, as it divides the data set into two equal parts: one-half of the observations lie at or
above it, and the other half lie at or below it. To find the median of a data set, we first rank the
data values (arrange them in increasing or decreasing order), then get the median in one of the
following ways.
• If the number of scores is odd, the median is the number that is exactly in the middle of
the list.
• If the number of scores is even, the median is the mean of the two middle numbers.
That is, we add the two middlemost values and divide the sum by 2.
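The two rules above translate directly into code. A minimal Python sketch follows; `median` is a helper name of our own, and Python's standard library also offers `statistics.median`, which follows the same odd/even rule.

```python
def median(values):
    """Middle value of the ranked data, or the mean of the two middle values if n is even."""
    if len(values) == 0:
        raise ValueError("the median of an empty data set is undefined")
    ranked = sorted(values)  # step 1: rank the data in increasing order
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]  # odd n: the single middle value
    return (ranked[mid - 1] + ranked[mid]) / 2  # even n: mean of the two middle values


print(median([12, 15, 17, 11, 13, 16, 19]))      # 15  (seven values)
print(median([7, 9, 4, 10, 5, 6, 5, 8, 12, 5]))  # 6.5 (ten values)
```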
Example
1.) The following data give the ages of students in a ballet class. Find the median age.
12 15 17 11 13 16 19
Solution

First, we rank the given data in increasing order:
11 12 13 15 16 17 19
Since there are seven (an odd number of) values, the fourth value is the middle value.
Hence, Median = 15.

The median age of the students in the ballet class is 15.

2.) The following are the numbers of smartphone users in 10 households in a certain
barangay. Find the median.
7 9 4 10 5 6 5 8 12 5

Solution

First, we rank the data in increasing order:
4 5 5 5 6 7 8 9 10 12
Since there are ten (an even number of) values, the fifth and sixth values, 6 and 7, are the
middle terms. To get the median, we get the mean of the two middle terms:

$$\text{Median} = \frac{6 + 7}{2} = 6.5$$

The median number of smartphone users among the 10 households in the barangay is 6.5.

Learning check

Activity 2
Compute the median of each of the following sets of data
a.) 3, 4, 7, 11, 12, 12, 15, 16
b.) -8, -5, -12, -1, 4, 7, 11
c.) 6, 4, 8.5, 9, 11, 8.25, 6.5, 8.75
d.) The following are the ages of the 5 siblings of the Reyes family. Find the age of the
middle child.
25 21 23 18 27
e.) In a supermarket, the following shows the number of tissue rolls of different
brands sold in a certain week (Brand 1 to Brand 8, respectively):
25 12 5 17 20 13 9 21
Mode
Another measure of central tendency is the mode, which is the value that occurs most
frequently in a data set. There is no formula for finding the mode of ungrouped data; it is simply
found by inspection. When two scores occur with the same greatest frequency, each one is a
mode and the data set is bimodal. When more than two scores occur with the same greatest
frequency, each is a mode and the data set is said to be multimodal. It is also possible for a
data set to have no mode; this happens when no score is repeated more often than the others.
In this case, we stipulate that there is no mode.
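The inspection procedure, including the bimodal, multimodal, and no-mode cases, can be sketched in Python with `collections.Counter` from the standard library. The helper name `modes` is our own, and it follows this module's convention of reporting no mode (an empty list) when every value occurs equally often. Python's built-in `statistics.multimode` returns every value in that case instead, so it does not follow this convention.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of every value tied for the highest frequency.

    Following the convention in this module, return an empty list ("no mode")
    when no value occurs more often than the others.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if all(c == top for c in counts.values()):
        return []  # every value ties, so no score is repeated more than the others
    return sorted(v for v, c in counts.items() if c == top)


print(modes([95, 93.5, 91, 89, 92, 94, 91, 93, 90, 91.5]))  # [91]
print(modes([1, 3, 2, 0, 2, 1, 3, 5]))                      # [1, 2, 3]
print(modes([82, 78, 95, 83, 89, 75, 90, 80, 88, 92]))      # []
```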

Example
1.) The following are the general weighted averages of the top 10 students with the highest
grades in a certain class.
95 93.5 91 89 92 94 91 93 90 91.5
Solve for the mode.

Solution
In this data set, all values appear only once except for 91, which appears twice. Because
91 has the highest frequency,
Mode = 91

2.) The following are the numbers of ball pens owned by each of the eight students in a
certain group of friends: 1 3 2 0 2 1 3 5
Solve for the mode.

Solution
Three data values share the highest frequency in this data set (each appears twice): 1, 2, and 3.
Therefore, the data set is multimodal and the modes are:
Modes = 1, 2, 3

Example
3.) A statistician conducted a survey in a certain barangay to obtain the profile of households,
gathering information about the appliances owned by each household.
The table below summarizes the number of households that own each appliance.
Name of Appliance No. of Households
Television 35
Refrigerator 26
Microwave oven 10
Kitchen Stove 36
Washing Machine 15
Clothing Iron 30
What is the most popular appliance owned in that barangay?

Solution
Since the kitchen stove is the appliance with the highest frequency, the most popular
appliance owned in that barangay is the kitchen stove.
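For data at the nominal level, such as appliance names, the mode is simply the category with the highest frequency. A minimal Python sketch, restating the table above as a dictionary (our own representation) and keeping ties in case two appliances share the top count:

```python
# Number of households owning each appliance, restated from the table above
appliances = {
    "Television": 35,
    "Refrigerator": 26,
    "Microwave oven": 10,
    "Kitchen Stove": 36,
    "Washing Machine": 15,
    "Clothing Iron": 30,
}

top = max(appliances.values())  # highest frequency
most_popular = [name for name, count in appliances.items() if count == top]
print(most_popular)  # ['Kitchen Stove']
```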
Example
4.) The following are the numbers of tourist arrivals in Honda Bay for a sample of 1
week, from Monday to Sunday:
87, 45, 63, 49, 75, 80, 100
What is the busiest day of the week in Honda Bay?

Solution
Since the number of tourists is highest on Sunday (100 arrivals), the busiest day of the week
in Honda Bay is Sunday.

Example
5.) The following are the scores of 10 students in Math.
82, 78, 95, 83, 89, 75, 90, 80, 88, 92
What is the modal score?

Solution
Since no score occurs more often than the others, this data set has no mode.

Note:
• Among the different measures of central tendency, the mode is the only one that can be
used with data at the nominal level of measurement. The median can only be used if
the variable is at least at the ordinal level of measurement, while the mean requires data
at the interval or ratio level. For a review of the levels of measurement, check out this link
https://conjointly.com/kb/levels-of-measurement/.

• Unfortunately, the term average is sometimes used for any measure of central tendency
and sometimes used specifically for the mean. Because of this ambiguity, we should not use
the term average when referring to a specific measure of central tendency. Instead, we
should use the specific term, such as mean, median, or mode.
Learning check

Activity 3
Solve for the mode of each of the given data sets.

a.) 13, 15, 16, 12, 11, 17, 13, 15, 10, 14
b.) 7.5, 8.5, 6.5, 3.5, 5, 5.5, 5, 9.5, 10.5
c.) -11, -4, -5, |-1|, 0, 1, 5, 4, 11
d.) The following shows the number of hops in a skipping rope that 10 students can
complete in one minute
60 35 55 78 56 55 69 48 58 57
e.) The following table shows an estimate of the number of strawberries that a farmer
was able to harvest per day in a certain week
Day No. of Strawberries Harvested
Sunday 560
Monday 980
Tuesday 760
Wednesday 670
Thursday 950
Friday 980
Saturday 1000

Mean for Grouped Data

If the data are given in the form of a frequency table, we no longer know the values of
individual observations. In such cases, we cannot obtain the exact sum of the individual values.
Instead, we approximate this sum by treating every value in a class interval as if it were equal
to the class midpoint, which gives the formulas below:

$$\mu = \frac{\sum mf}{N} \quad \text{(population)} \qquad\qquad \bar{x} = \frac{\sum mf}{n} \quad \text{(sample)}$$

where m is the midpoint of a class interval; f is the frequency of a class interval; mf is the
product of the midpoint and frequency in a class interval; Σmf is the sum of these products
over all class intervals; and N (or n) is the total frequency.

For a quick review of some important terms about a frequency distribution table, we have the
following sidetrip lesson:

The Frequency Distribution (fd). When the set of data contains a large number of elements
or observations, grouping a data set using a frequency distribution (fd) table can give us a
better picture of the behavior of the data. A frequency distribution table is an arrangement of
data into mutually exclusive classes along with the corresponding frequency falling in each
class. Below is an example of a simple frequency distribution and some terms described.
Height of Men in inches (X)   Number of Men (frequency)
50-54 1
55-59 2
60-64 3
65-69 49
70-74 46
75-79 1

• Class intervals are the mutually exclusive classes or categories. In our sample
frequency distribution, the class intervals are 50-54, 55-59, 60-64, 65-69, 70-74, and 75-79.
• The class size or class width is the distance from the lower limit of one class to the lower
limit of the next class, and it is uniform in all classes. For example, the lower limit of the
interval 50-54 is 50 and the lower limit of the next interval is 55, so the class width is
55 − 50 = 5 (the same as the distance from 55 to 60, from 60 to 65, and so on). Equivalently,
the interval 50-54 contains the five whole-number values 50, 51, 52, 53, and 54.
• The midpoint of an interval is found by adding the lower limit and upper limit of the class
and dividing the sum by 2. For example, the midpoint of 50-54 is (50 + 54)/2 = 52.
Example

The table below gives the frequency distribution of the daily commuting times (in minutes)
from home to work for all 40 employees of a company.
Daily Commuting Time (minutes)   Number of Employees
1-9 4
10-18 12
19-27 9
28-36 7
37-45 5
46-54 3

Let x denote the daily commuting times (in minutes) from home to work of the 40 employees
of a company, and f denote the frequency. The values of m and mf are calculated in the table
below.
x f m mf
1-9 4 5 20
10-18 12 14 168
19-27 9 23 207
28-36 7 32 224
37-45 5 41 205
46-54 3 50 150
N = 40   Σmf = 974

Since the data cover all 40 employees (a population), we use μ:

$$\mu = \frac{\sum mf}{N} = \frac{974}{40} = 24.35$$

Hence, the mean daily commuting time of the 40 employees of the company is 24.35
minutes.
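The table computation can be sketched in Python. This is a minimal sketch, assuming each class is given as a `(lower_limit, upper_limit, frequency)` tuple (a layout we chose for illustration); the helper name `grouped_mean` is also our own. Dividing by the total frequency gives μ here because the 40 employees are the whole population; the same arithmetic gives x̄ for a sample.

```python
def grouped_mean(classes):
    """Approximate mean of grouped data: sum(m * f) / sum(f).

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples.
    Each class midpoint m stands in for every observation in that class.
    """
    total_f = sum(f for _, _, f in classes)                     # N (or n)
    total_mf = sum((lo + hi) / 2 * f for lo, hi, f in classes)  # sum of m * f
    return total_mf / total_f


commute = [(1, 9, 4), (10, 18, 12), (19, 27, 9),
           (28, 36, 7), (37, 45, 5), (46, 54, 3)]
print(grouped_mean(commute))  # 24.35
```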

Learning check

Activity 4

The table below shows the frequency distribution of the number of customers received in a
cafe each day during the past 86 days. Compute the mean number of customers per day.
Number of Customers Number of Days
10-12 12
13-15 20
16-18 14
19-21 25
22-24 15
Median of Grouped Data

To compute the median of a grouped set of data, we first identify the median class by
locating the (n/2)th observation (the middle, or half, of the data) in the cf< column. Then we
use the formula below:

$$\text{Median} = L + \left(\frac{\frac{n}{2} - cf_b}{f_m}\right) w$$

where L is the lower class boundary of the median class, n is the number of observations in the
data set, cf_b is the cumulative frequency of the class before the median class, f_m
is the frequency of the median class, and w is the class width.

Note:
• The median class is found by dividing n by 2 and looking up the quotient n/2 in the cf< column.
Pick the row whose cf< value is equal to or nearest greater than n/2.

• The less-than cumulative frequency (cf<) column is found by starting with the frequency of
the lowest class interval and adding the frequency of the next class each time.

• The lower boundary L is found by subtracting one-half of the distance between an upper
limit and the succeeding lower limit from the lower limit. Sounds vague? Here's what it
means. Suppose the class intervals are 50-54, 55-59, 60-64, 65-69, 70-74, and 75-79.
The distance between an upper limit and the succeeding lower limit is 1 (see 54 and 55;
59 and 60; and so on), and one-half of 1 is 0.5. So the lower boundary of the class 50-54 is
49.5 (that's 50 − 0.5), the lower boundary of the class 55-59 is 54.5, and so on. Now,
suppose the class intervals are 1.0-1.4, 1.5-1.9, 2.0-2.4, and 2.5-2.9. The distance
between an upper limit and the succeeding lower limit is 0.1, and one-half of it is 0.05.
So the lower boundary of the class 1.0-1.4 is 0.95 (that's 1.0 − 0.05), the lower boundary
of the class 1.5-1.9 is 1.45, and so on. (This second kind of example is seldom used.)
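The cf< column and the class boundaries described in these notes can be computed as follows. This minimal Python sketch assumes the classes are listed in increasing order as `(lower_limit, upper_limit)` pairs with the same gap between consecutive classes; the helper names are our own. Results are rounded to 6 decimal places only to hide floating-point noise in cases such as 1.0-1.4.

```python
from itertools import accumulate


def cumulative_less_than(freqs):
    """cf< column: running total of the frequencies, starting from the lowest class."""
    return list(accumulate(freqs))


def class_boundaries(limits):
    """Return (lower, upper) class boundaries for (lower_limit, upper_limit) pairs.

    Half the gap between an upper limit and the next lower limit is subtracted
    from each lower limit and added to each upper limit. Assumes at least two
    classes, listed in increasing order with the same gap throughout.
    """
    half_gap = (limits[1][0] - limits[0][1]) / 2
    return [(round(lo - half_gap, 6), round(hi + half_gap, 6)) for lo, hi in limits]


hours = [(0, 3), (4, 7), (8, 11), (12, 15), (16, 19)]  # study-hours classes from the example below
print(cumulative_less_than([9, 13, 10, 11, 7]))  # [9, 22, 32, 43, 50]
print(class_boundaries(hours))
# [(-0.5, 3.5), (3.5, 7.5), (7.5, 11.5), (11.5, 15.5), (15.5, 19.5)]
print(class_boundaries([(50, 54), (55, 59)]))  # [(49.5, 54.5), (54.5, 59.5)]
```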
Example

The table below gives the frequency distribution of the number of hours spent in studying by
50 students before a quarter exam.

Hours Number of Students


0-3 9
4-7 13
8-11 10
12-15 11
16-19 7

Solution
Let x denote the number of hours spent in studying by 50 students before a quarter
exam, and f denote the frequency. The less-than cumulative frequencies (cf<) and the
class boundaries (CB) are calculated in the table below.
X f cf< CB
0-3 9 9 -0.5 to 3.5
4-7 13 22 3.5 to 7.5
8-11 10 32 7.5 to 11.5 (median class)
12-15 11 43 11.5 to 15.5
16-19 7 50 15.5 to 19.5

First, we have to identify the median class. In this example, n/2 = 50/2 = 25. Hence, the 25th
observation can be found in the third class. (The cumulative count reaches only 22 by the end of
the second class, while it reaches 32 by the end of the third class, which includes the 25th one.)
Remember to pick the class whose cf< value is equal to or nearest greater than the quotient n/2.

With the third class as the median class, L = 7.5, cf_b = 22, f_m = 10, and w = 4:

$$\text{Median} = 7.5 + \left(\frac{25 - 22}{10}\right) \times 4 = 7.5 + 1.2 = 8.7$$

Therefore, the median number of hours spent in studying by 50 students before a
quarter exam is 8.7 hours.
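The steps of this example (build the cf< column, locate the median class, read off L, cf_b, f_m, and w, then apply the formula) can be sketched in Python. This is a minimal sketch, assuming each class is a `(lower_limit, upper_limit, frequency)` tuple in increasing order with equal widths and equal gaps between classes; the function name `grouped_median` is our own.

```python
from itertools import accumulate


def grouped_median(classes):
    """Median of grouped data: L + ((n/2 - cf_b) / f_m) * w.

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples in
    increasing order, with equal class widths and equal gaps between classes.
    """
    freqs = [f for _, _, f in classes]
    cf = list(accumulate(freqs))       # cf< column
    half = cf[-1] / 2                  # n / 2
    # Median class: first class whose cf< is equal to or greater than n/2
    k = next(i for i, c in enumerate(cf) if c >= half)
    half_gap = (classes[1][0] - classes[0][1]) / 2
    L = classes[k][0] - half_gap       # lower class boundary of the median class
    w = classes[1][0] - classes[0][0]  # class width (distance between lower limits)
    cf_b = cf[k - 1] if k > 0 else 0   # cf< of the class before the median class
    return L + (half - cf_b) / freqs[k] * w


study_hours = [(0, 3, 9), (4, 7, 13), (8, 11, 10), (12, 15, 11), (16, 19, 7)]
print(round(grouped_median(study_hours), 2))  # 8.7
```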
Learning check

Activity 5
Calculate the median score in a Math quiz as shown in the table below:

Score Number of Students


12-15 4
16-19 6
20-23 4
24-27 5
28-31 7
32-35 4

Mode of Grouped Data

To compute the mode of a grouped set of data, we first identify the modal class, which is the
class with the highest frequency in the f column. If more than one class has the highest
frequency, we compute a mode for each of them.
Then we use the formula below:

$$\text{Mode} = L + \left(\frac{f_{mo} - f_{mo-1}}{(f_{mo} - f_{mo-1}) + (f_{mo} - f_{mo+1})}\right) w$$

where L is the lower boundary of the modal class, f_mo is the frequency of the modal class,
f_(mo−1) is the frequency of the class before the modal class, f_(mo+1) is the frequency of the
class after the modal class, and w is the class width.
The formula above can be written as

$$\text{Mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) w$$

where
d₁ = the difference between the frequency of the modal class and the frequency of the class
before the modal class;
d₂ = the difference between the frequency of the modal class and the frequency of the class
after the modal class.
Example
Calculate the mode of the number of hours spent in studying by 50 students before a quarter
exam.
Hours Number of Students
0-3 9
4-7 13
8-11 10
12-15 11
16-19 7

Solution
Let x denote the number of hours spent in studying by 50 students before a quarter exam,
and f denote the frequency. The less-than cumulative frequencies (cf<) and the class
boundaries (CB) are shown in the table below.
x f cf< CB
0-3 9 9 -0.5 to 3.5
4-7 13 22 3.5 to 7.5 (modal class)
8-11 10 32 7.5 to 11.5
12-15 11 43 11.5 to 15.5
16-19 7 50 15.5 to 19.5

In this frequency distribution table, the modal class is the second class because it has the
highest frequency, which is 13. So, L = 3.5, f_mo = 13, f_(mo−1) = 9, f_(mo+1) = 10, and w = 4.

$$\begin{aligned}
\text{Mode} &= 3.5 + \left(\frac{13 - 9}{(13 - 9) + (13 - 10)}\right) \times 4 \\
&= 3.5 + \left(\frac{4}{4 + 3}\right) \times 4 \\
&= 3.5 + \frac{4}{7} \times 4 \\
&\approx 5.79
\end{aligned}$$

Hence, the mode of the number of hours spent in studying by 50 students before a
quarter exam is 5.79 hours.
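The same steps can be sketched in Python, including the case where more than one class has the highest frequency. This is a minimal sketch, assuming each class is a `(lower_limit, upper_limit, frequency)` tuple in increasing order with equal widths and gaps; the function name is our own, and treating a missing neighbouring class (a modal class at either end of the table) as having frequency 0 is our assumption, not a rule stated in this module.

```python
def grouped_modes(classes):
    """Mode(s) of grouped data: L + d1 / (d1 + d2) * w for each modal class.

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples in
    increasing order with equal widths and gaps. ASSUMPTION: a modal class at
    either end is given a missing neighbour frequency of 0.
    """
    freqs = [f for _, _, f in classes]
    top = max(freqs)
    half_gap = (classes[1][0] - classes[0][1]) / 2
    w = classes[1][0] - classes[0][0]  # class width
    result = []
    for k, f in enumerate(freqs):
        if f != top:
            continue  # not a modal class
        before = freqs[k - 1] if k > 0 else 0
        after = freqs[k + 1] if k + 1 < len(freqs) else 0
        d1, d2 = f - before, f - after
        if d1 + d2 == 0:
            raise ValueError("formula undefined: both neighbours share the modal frequency")
        L = classes[k][0] - half_gap  # lower class boundary of the modal class
        result.append(L + d1 / (d1 + d2) * w)
    return result


study_hours = [(0, 3, 9), (4, 7, 13), (8, 11, 10), (12, 15, 11), (16, 19, 7)]
print([round(m, 2) for m in grouped_modes(study_hours)])  # [5.79]
```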
Learning check

Activity 6
Calculate the modal score in a Math quiz as shown in the table below:

Score Number of Students


12-15 4
16-19 6
20-23 4
24-27 5
28-31 7
32-35 4

Relationships among the Mean, Median, and Mode

A histogram or a frequency distribution curve can assume shapes that are symmetric or
skewed. The shape of a frequency distribution curve can be identified or described using the
values of the mean, median, and mode of a given set of data.

1. If the values of the mean, median, and mode are identical and lie at the center of
the distribution, then the histogram and frequency distribution curve are symmetric with one
peak. The familiar bell-shaped normal curve is an example of this shape.
2. If the value of the mean is the largest, that of the mode is the smallest, and the value of
the median lies between these two, then the histogram and frequency distribution curve
are skewed to the right. (Notice that the mode always occurs at the peak point.) This graph
is called positively skewed.

The value of the mean is the largest in this case because it is sensitive to outliers that occur in
the right tail. These outliers pull the mean to the right. (Note: The horizontal line in the graph is
the x-axis, so values further to the right are greater than those on the left. Hence we say that
the mean here is the largest among the three averages.)

3. If the value of the mean is the smallest and that of the mode is the largest, with the
value of the median lying between these two, then the histogram and frequency distribution
curve are skewed to the left.

In this case, the outliers in the left tail pull the mean to the left. This graph is called negatively
skewed.
The following matrix compares the three averages.

COMPARISON OF MEAN, MEDIAN, AND MODE

Average | Definition | How Common | Existence | Takes Every Score into Account? | Affected by Extreme Scores? | Advantages and Disadvantages
Mean | x̄ = Σx/n | Most familiar "average" | Always exists | Yes | Yes | Works well with many statistical methods
Median | Middle score | Commonly used | Always exists | No | No | Often a good choice if there are some extreme scores
Mode | Most frequent score | Sometimes used | Might not exist; may be more than one mode | No | No | Appropriate for data at the nominal level

Here are some more takeaways in the comparison of the three averages.

• For a data collection that is approximately symmetric with one mode, the mean, median,
and mode tend to be about the same.
• For a data collection that is obviously skewed (not symmetric), it is good practice to report
both the mean and the median.
• The mean is relatively reliable. That is, when samples are drawn from the same
population, the sample means tend to be more consistent than the other
averages (consistent in the sense that the means of samples drawn from the same
population don't vary as much as the other averages do).
• A comparison of the mean, median, and mode can reveal information about
skewness. A distribution of data is skewed if it is not symmetric and
extends more to one side than the other.
• With a symmetric distribution, if the data are graphed using a histogram, we will see that
the left half of the histogram is roughly a mirror image of its right half. The graph is often
roughly bell-shaped.
• Data skewed to the left are said to be negatively skewed; the mean and median are to
the left of the mode. Although not always predictable, negatively skewed data generally
have the mean to the left of the median.
• Data skewed to the right are said to be positively skewed; the mean and median are to
the right of the mode. Again, although not always predictable, positively skewed data
generally have the mean to the right of the median.
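The mean-versus-median comparison in the last two points can be sketched with Python's standard `statistics` module. As the text itself warns, this is only a rough hint, not a formal measure of skewness. The helper name `skew_hint` is our own, and the second data set is made up for illustration.

```python
from statistics import mean, median


def skew_hint(values):
    """Rough skewness hint based only on where the mean lies relative to the median."""
    m, md = mean(values), median(values)
    if m > md:
        return "mean > median: suggests positive (right) skew"
    if m < md:
        return "mean < median: suggests negative (left) skew"
    return "mean = median: consistent with a symmetric distribution"


# Savings from Example 3 (mean 9420, median 2900): one very large value
print(skew_hint([2500, 3000, 2800, 2900, 35900]))
# Made-up data with one very small value (mean 42.2, median 52)
print(skew_hint([1, 50, 52, 53, 55]))
```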

REFERENCES

1. Bluman, A. G. (2003). Elementary Statistics: A Step by Step Approach (5th ed.). McGraw-Hill, Inc.
2. CENGAGE (2018). Mathematics in the Modern World.
3. Guillermo, R. M. (2018). Mathematics in the Modern World. Quezon City: Nieme Publishing
House Co. Ltd.
4. Lactuan, I. R. et al. (2018). Instructional Material in Mathematics in the Modern World.
Puerto Princesa City: Palawan State University.
5. Walpole, M. and M. (2002). Probability and Statistics for Engineers and Scientists (7th ed.).
Prentice Hall Int'l. Inc.
6. Triola, M. F. (1994). Elementary Statistics (6th ed.). Addison-Wesley.
7. wac.colostate.edu. http://wac.colostate.edu/docs/llad/v4n1/jamison.pdf
8. https://www.gs.washington.edu/academics/courses/akey/56008/lecture/lecture9.pdf
9. https://www.dummies.com/education/math/statistics/how-to-interpret-a-correlation-coefficient-r/
10. https://www.theanalysisfactor.com/outliers-to-drop-or-not-to-drop/
11. https://conjointly.com/kb/levels-of-measurement/
