Ge104 Chapter4 Module
VISION
A center of human development committed to the pursuit of wisdom, truth, justice, pride, dignity, and local/global competitiveness via a quality but affordable education for all qualified clients.
MISSION
Establish and maintain an academic environment promoting the pursuit of excellence and the total development of its students as human beings, with fear of God and love of country and fellowmen.
GOALS
Kolehiyo ng Lungsod ng Lipa aims to:
1. foster the spiritual, intellectual, social, moral, and creative life of its clients via affordable but quality tertiary education;
2. provide the clients with a rich, substantial, and relevant wide range of academic disciplines, and expose them to varied curricular and co-curricular experiences which nurture and enhance their personal dedication and commitment to social, moral, cultural, and economic transformations;
3. work with the government and the community in the pursuit of achieving national developmental goals; and
4. develop deserving and qualified clients with different life skills and prepare them for local and global competitiveness.
MODULE
FIRST Semester, AY 2020-2021
IV. ENGAGEMENT
Descriptive Statistics. If statistics, in general, deals with the analysis of data, then descriptive statistics is the part of the field that is about “describing” data in symbolic form and in abbreviated fashion. Sometimes we deal with an amount of data so large that it is impossible to describe it as it is; descriptive statistics provides tools that make the data manageable to handle and convenient to describe.
This statement is a piece of information that describes a particular trait or characteristic of a group of workers. Starting from this single piece of information, and armed with statistical inquisitiveness, descriptive statistics can describe the given information further, in both depth and breadth.
Inferential Statistics. We could argue that descriptive statistics, with its ability to describe, is sufficient to depict any given information. While it is effective for describing a manageable amount of data, it can hardly cope with data too large to gather in full. For this kind of situation, inferential statistics is the alternative technique. Inferential statistics has the ability to “infer” and to generalize, and it offers the right tools to estimate values that are not actually known.
Let us consider the fictitious situation we made under descriptive statistics, but this time, instead of reporting the approximate monthly earnings of some workers, we want to estimate the monthly earnings of all the workers in a certain region. With descriptive statistics alone, we would have to ask every worker in the region about their monthly income, which is impossible. Using inferential statistics, we would instead select just a small number of workers and ask them about their monthly income. From there, we can predict or approximate, in a “more or less” fashion, the monthly income of all workers in the entire region.
Of course, inference or generalization is a risky process, which is why we need to ensure that the small group of workers we select is reasonably representative of the workers in the entire region. Nevertheless, such an inference or prediction is more accurate than chance.
Measurement
It essentially means quantifying an observation according to a certain rule. For instance, the presence of fever can be quantified by using a thermometer. Body weight can be determined by using a weighing scale. Mental ability can be quantified by using a written examination that generates scores. Sometimes the quantification can be done by simply counting. In quantifying an observation, there are two types of quantitative information: variable and constant. A variable is something that can be measured and observed to vary, while a constant is something that does not vary and maintains a single value.
Scales of Measurement
- Nominal Scale : Categorical Data
- Ordinal Scale : Ranked Data
- Interval/Ratio Scale : Measurement Data
Nominal Scale. It is concerned with categorical data. It simply means using numbers to label categories and then counting the frequency of occurrence within each category. One condition is that the categories must be independent or mutually exclusive. This implies that once something is identified under a certain category, it cannot at the same time be assigned to another category.
For example, suppose we want to measure a group of people according to marital status. We can categorize marital status by simply assigning a number, for instance “1” for single and “2” for married. Obviously, those numbers only serve as labels and do not carry any numerical weight. Thus, we cannot say that married people (having been labelled 2) have more marital status than single people (having been labelled 1).
Ordinal Scale. It is concerned with ranked data. There are instances wherein comparison is necessary and cannot be avoided. The ordinal scale ranks the observations in order to generate information to the extent of “greater than” or “less than”. But ranked data is also limited to the extent of “greater than” or “less than”; it cannot tell how much greater or how much less.
The ordinal scale is best illustrated by sports activities like a fun run. Finding the order of finish among the participants in a fun run always yields a ranking. However, ranked data cannot provide information about the difference in time between the 1st placer and the 2nd placer. Relative to this, reading reports with ordinal information can be tricky. For example, a TV commercial extols a certain brand for being the number one product in the country. This may seem acceptable, but if you learn that there is no other product of its kind, then the message of the commercial will surely be received with a smirk.
Interval Scale. It deals with measurement data. In the nominal scale, we use numbers to label categories, while in the ordinal scale we use numbers merely to provide information regarding greater than or less than. In the interval scale, however, we assign numbers in such a way that the distance between points on the scale has meaning and weight. This scale of measurement provides more information about the data.
Consider the comparative illustration below:
As you may have noticed, the interval scale provides substantial information about the grades of the students. Student A earned a grade of 99, and so on and so forth. Now look at the information given by the ordinal data. It is simply about ranking. With this kind of information, Student B can proudly and rightfully claim 2nd place in the ranking. The ordinal scale is a trusted friend that keeps a secret: the grade of Student B, though in 2nd place, is actually 74. Now let us analyze the nominal data in our example. With this scale, the school can only announce, sadly, that one student passed and four students failed. Nominal data cannot provide more information, and in particular cannot put Student A in a brighter limelight. The audience may assume that Student A got a grade just a little higher than the passing mark, and Student A's grade of 99 will remain hidden forever.
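The contrast between the three scales can be sketched in a few lines of Python. Only the grades of Students A (99) and B (74) and the count of one pass and four failures come from the discussion above; the grades of Students C, D, and E and the passing mark of 75 are hypothetical values chosen to be consistent with it.

```python
# Interval data: the actual grades.
# A and B are from the text; C, D, and E are hypothetical fillers.
grades = {"A": 99, "B": 74, "C": 73, "D": 72, "E": 70}
PASSING_MARK = 75  # assumed passing mark, not given in the text

# Ordinal data: keep only the order of finish (1 = highest grade).
ordered = sorted(grades, key=grades.get, reverse=True)
ranks = {student: position for position, student in enumerate(ordered, start=1)}

# Nominal data: keep only the category each student falls into.
categories = {s: ("passed" if g >= PASSING_MARK else "failed")
              for s, g in grades.items()}
counts = {label: list(categories.values()).count(label)
          for label in ("passed", "failed")}

print(ranks)   # the ranking hides the 25-point gap between A and B
print(counts)  # the category counts hide every individual grade
```

Each step discards information: the ranking no longer shows how far apart the grades are, and the pass/fail counts no longer show who ranked where.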
Ratio Scale. This is an extension of the interval scale. It also pertains to measurement data, but the ratio scale is concerned with absolute values, that is, values measured from a true zero point. Because of this, we oftentimes cannot use the ratio scale in the social sciences. We cannot justify an absolute value to gauge intelligence. We cannot say that our Student A, with a grade of 99, has an intelligence several points superior to that of Student E, who barely but successfully achieved a grade of 70.
Population. A population can be defined as an entire group of people, things, or events having at least one trait in common (Sprinthall, 1994). A common trait is the binding factor that allows us to group a cluster and call it a population. A mere clustering of people, things, or events cannot be considered a population; at least one common trait must be established. On the other hand, adding too many common traits limits the size of the population. In the illustration below, notice how each added trait can severely reduce the size or membership of the population.
As we read the list, we can visualize the size of the population becoming dramatically smaller, and as we add more traits we may wonder if anyone still qualifies. The more common traits we add, the more we reduce the designated population.
Sample. A smaller number of observations taken from the total number making up a population is called a sample. As long as the observations or data do not cover the entire population, they are always considered a sample. For instance, in a population of 100, 1 is considered a sample. 30 is clearly a sample. It may seem absurd, but 99 taken from 100 is still considered a sample. Not until we include that last one (making it 100) can we claim that it is already a population and no longer a sample.
Statistic. Any measure obtained from the sample is called a statistic. Whenever we describe the sample, we are dealing with statistics. Since a sample is easier to observe or gather than the population, statistics are simpler to obtain than parameters, which are the corresponding measures of the entire population.
Graphical representation
Graphs. A graph is another way to visually show the behavior of data. To create a graph, the distribution of scores must first be organized. For instance, presenting the scores below in an unorganized manner provides confusing information or no information at all; reporting raw scores can even keep some significant scores from being noticed.
120, 65, 110, 75, 105, 80, 105,
85, 100, 85, 100, 90, 95, 90, 90
But when we arrange the scores from highest to lowest, which is a form of score distribution, some pieces of information are gradually brought forth and exposed.
Distribution of Scores
120
110
105
105
100
100
95
90
90
90
85
85
80
75
65
The score distribution can be organized further in the form of a frequency distribution. A frequency distribution lists each raw score together with its frequency of occurrence, and it provides clearer insights about the behavior of the scores.
X f
(Raw score) (Frequency of Occurrence)
---------------------------------------------------------------------------
120 1
110 1
105 2
100 2
95 1
90 3
85 2
80 1
75 1
65 1
------------------------------------------------------------------------
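As a minimal sketch, the same score distribution and frequency distribution can be built in Python from the fifteen raw scores listed earlier. The variable names are illustrative only.

```python
from collections import Counter

# The fifteen raw scores, in the unorganized order given in the text.
scores = [120, 65, 110, 75, 105, 80, 105,
          85, 100, 85, 100, 90, 95, 90, 90]

# Score distribution: raw scores listed from highest to lowest.
distribution = sorted(scores, reverse=True)

# Frequency distribution: each distinct score with its number of occurrences.
frequency = Counter(scores)

print("X    f")
for x in sorted(frequency, reverse=True):
    print(f"{x:<4} {frequency[x]}")
```

The frequencies must add up to the number of raw scores, which is a quick check that no score was dropped while organizing the data.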
Another way of presenting a frequency distribution is in tabular form. A tabular form has the advantage of giving a visual representation of the data, and this kind of presentation is more appealing to a general audience.
[Figure: frequency of occurrence (vertical axis) against raw scores (horizontal axis)]
Another way of showing the data in graphical form is by using Microsoft Excel, as illustrated in the graph below. It is the frequency polygon of the scores in the example cited above.
(Do Activity 1)
Specific Objectives :
A. To know the different measures of central tendency.
B. To comprehend the limitations of the three measures.
C. To realize the effect of the measures in the distribution.
D. To critically know how to select appropriate measure to describe a certain distribution.
Discussion
As we venture into the realm of descriptive statistics, let us now focus on describing the nature of quantitative data. By using an appropriate descriptive technique, we can organize and neatly summarize both small and large data distributions. The procedure, which uses measures of central tendency, allows us to describe precisely the centrality of a data distribution.
The Mean
The most widely used measure of central tendency is the mean ($\bar{X}$). It is the arithmetic average of all the scores. The mean can be determined by adding all the scores together and then dividing by the total number of scores. The basic formula for the mean is as follows:
$$\bar{X} = \frac{\sum X}{N}$$
where $\bar{X}$ is the mean, $\sum X$ is the sum of all the scores, and $N$ is the entire number of observations being dealt with.
In the example below concerning the annual income of 12 workers, the mean can be found by calculating
the average score of the distribution.
X
===========================
Php 200,000.00
200,000.00
195,000.00
194,000.00
194,000.00
194,000.00
193,000.00
190,000.00
185,000.00
180,000.00
180,000.00
176,000.00
===========================
∑X = Php 2,281,000.00
Mean = ∑X / N = 2,281,000.00 / 12 ≈ Php 190,083.33
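The computation of the mean for the twelve annual incomes can be sketched in Python as follows. The variable names are illustrative; the data are the incomes listed above.

```python
# Annual incomes of the 12 workers, in pesos.
incomes = [200_000, 200_000, 195_000, 194_000, 194_000, 194_000,
           193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

total = sum(incomes)   # sum of all the scores
n = len(incomes)       # N, the number of observations
mean = total / n       # arithmetic average

print(f"Sum = {total:,}  N = {n}  Mean = {mean:,.2f}")
```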
Mean of Skewed Distribution. There are situations wherein the mean cannot be trusted as a measure of central tendency because it portrays an extremely distorted picture of the average value of a distribution of scores. For instance, let us again consider our example of annual incomes, but this time with one adjustment. Let us introduce another score: the annual income of an affluent new neighbor who moved to this town just recently. This new neighbor has an annual income far above all the others.
X
===========================
Php 2,500,000.00 (new neighbor)
200,000.00
200,000.00
195,000.00
194,000.00
194,000.00
194,000.00
193,000.00
190,000.00
185,000.00
180,000.00
180,000.00
176,000.00
===========================
∑X = Php 4,781,000.00
Mean = ∑X / N = 4,781,000.00 / 13 ≈ Php 367,769.23
When the tail goes to the right, the curve is positively skewed; when it goes to the left, it is negatively
skewed. The skew is in the direction of the tail-off of scores, not of the majority of scores. The mean is
always pulled toward the extreme score in a skewed distribution. When the extreme score is at the low
end, then the mean is too low to reflect centrality. When the extreme score is at the high end, the mean is
too high.
The Median
The median is the point that separates the upper half from the lower half of the distribution. It is the
middle point or midpoint of any distribution. If the distribution is made up of an even number of scores, the
median can be found by determining the point that lies halfway between the two middlemost scores.
193,000.00
190,000.00
----- Median = 187,500.00
185,000.00
180,000.00
Arranging scores to form a distribution means listing them sequentially either highest to lowest or lowest to
highest. Unlike the mean, the median is not affected by skewed distribution. Whenever the mean cannot
provide centrality because of extreme scores present, the median can be used to provide a more accurate
representation.
X
===========================
➔➔➔ Php 2,500,000.00
200,000.00
200,000.00
195,000.00
194,000.00
194,000.00
194,000.00 ----- 194,000.00 Median
193,000.00
190,000.00
185,000.00
180,000.00
180,000.00
176,000.00
===========================
As you can observe, even with an extreme score at the high end of the distribution, the median (194,000.00) stays in the middle of the bulk of the scores, essentially undisturbed.
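To see how differently the two measures react to the extreme score, the following sketch computes the mean and the median of the annual incomes with and without the new neighbor. It uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Annual incomes of the original 12 workers, in pesos.
incomes = [200_000, 200_000, 195_000, 194_000, 194_000, 194_000,
           193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

# The same distribution after the affluent new neighbor moves in.
with_neighbor = incomes + [2_500_000]

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_neighbor), median(with_neighbor)

print(f"Mean:   {mean_before:,.2f} -> {mean_after:,.2f}")
print(f"Median: {median_before:,.2f} -> {median_after:,.2f}")
```

The mean is pulled far toward the extreme score, while the median moves only by the distance between two neighboring middle scores.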
The Mode
Another measure of central tendency is called the mode. It is the most frequently occurring score in a
distribution. In a histogram, the mode is always located beneath the tallest bar.
X
===========================
Php 2,500,000.00
200,000.00
200,000.00
195,000.00
194,000.00
194,000.00 Mode
194,000.00
193,000.00
190,000.00
185,000.00
180,000.00
180,000.00
176,000.00
===========================
The mode provides an extremely fast way of gauging the centrality of the distribution. You can immediately spot the mode by simply looking at the data and finding the dominant constant, that is, the most frequently occurring score.
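A short sketch of finding the mode with Python's standard `statistics` module is shown below. `multimode` is used rather than `mode` so that a tie between scores, if there were one, would be reported instead of hidden.

```python
from statistics import multimode

# Annual incomes including the new neighbor, in pesos.
incomes = [2_500_000, 200_000, 200_000, 195_000, 194_000, 194_000,
           194_000, 193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

# multimode returns every score that shares the highest frequency.
modes = multimode(incomes)
print(modes)
```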
The best way to illustrate the comparative applicability of the mean, median and mode is to look again at
the skewed distribution.
[Figure: positively skewed distribution, with frequency of occurrence on the vertical axis; the mode is at 10,000, the median at 20,000, and the mean at 100,000]
The scale of measurement on which the data are based oftentimes dictates the measure of central tendency to be used. Interval data can accommodate the calculation of all three measures of central tendency. Nominal and ordinal data cannot be used to calculate the mean; a mean computed from ordinal data can give an extremely confusing and wrong result. Since the median is about ranking (it is the point above which half of the ranked scores fall and below which the other half fall), an ordinal arrangement is all that is needed to find the median. For nominal data, however, neither the mean nor the median can be used. Nominal data are restricted to using a number as a label for a category, and the only measure of central tendency permissible for nominal data is the mode.
In summary, if the interval data distribution is fairly well balanced, it is appropriate to use the mean to measure the central tendency. If the distribution of the interval data is skewed, you may either remove the outlier or adopt the median. If the interval data distribution manifests a significant clustering of scores, then consider visually analyzing the scores to find the dominant constant, which is the mode.
(Do Activity 2)
B. Measures of Dispersion
Specific Objectives
Discussion
Measures of Variability
There are three measures of variability: the range, the standard deviation, and the variance. These three measures give information about the spread of the scores in a distribution. Metaphorically, variability asserts that a glass that is half-full is also half-empty: being half-full is about centrality and being half-empty is about variability.
The Range. The range, symbolized by R, describes the variability of scores by merely providing the width
of the entire distribution. The range can be found by simply determining the difference between the
highest score and the lowest score. This difference always has a single value answer.
The example below shows the calculation of the range from a distribution of annual incomes:
X
===========================
The range gives information about the scattering of the scores by using only the two extreme points. On the other hand, this makes the range severely limited in reporting how the scores deviate. If you add new scores within the distribution, the range will not report any change. Also, adding just one extreme score to an otherwise ordinary distribution can change the range drastically.
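Both limitations of the range can be demonstrated with a brief sketch. It assumes the twelve annual incomes from the discussion of the mean as the distribution; the added score of 190,500 is a hypothetical value placed inside the distribution, and 2,500,000 is the new neighbor's income used earlier.

```python
def value_range(scores):
    """Range R: the highest score minus the lowest score."""
    return max(scores) - min(scores)

# Annual incomes of the 12 workers, in pesos (from the section on the mean).
incomes = [200_000, 200_000, 195_000, 194_000, 194_000, 194_000,
           193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

r_original = value_range(incomes)

# A hypothetical new score inside the distribution leaves R unchanged.
r_inner = value_range(incomes + [190_500])

# One extreme score changes R drastically.
r_extreme = value_range(incomes + [2_500_000])

print(r_original, r_inner, r_extreme)
```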
The Standard Deviation. The standard deviation (SD) is the life-blood of the variability concept. It
provides measurement about how much all of the scores in the distribution normally differ from the mean
of the distribution. Unlike the range, which utilizes only two extreme scores, SD employs every score in
the distribution. It is computed with reference to the mean (not the median or the mode) and it requires
that the scores must be in interval form.
A distribution with small standard deviation shows that the trait being measured is homogenous. While a
distribution with a large standard deviation is indicative that the trait being measured is heterogeneous. A
distribution with zero standard deviation implies that scores are all the same (i.e. 10, 10, 10, 10, 10).
Although it may seem like stating the obvious, it is important to note that if all the scores are the same,
there is no dispersion, no deviation, and no scattering of scores in the distribution --- so much so that
there can never be less than zero variability.
In calculating the standard deviation, we can either use the computational method or the deviation
method. Both methods provide the same answer. However, in this lesson, we will use the computational
method because it is designed for electronic calculators.
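The symbols used in the formula are as follows:
$X$ is the symbol for a raw score in a distribution.
$\bar{X}$ is the symbol for the mean of a distribution.
$N$ is the symbol for the number of scores in a distribution.
$$SD = \sqrt{\frac{\sum X^2}{N} - \bar{X}^2}$$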
The formula simply states that the standard deviation (SD) is equal to the square root of the difference between the sum of the squared raw scores divided by the number of cases, and the squared mean (Sprinthall, 1994). Below is an example of how to obtain the standard deviation using the computational method.
SD = 7,653.521
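The computational method can be sketched in Python as shown below. The sketch assumes that the example refers to the twelve annual incomes used in the section on the mean; applying the formula to those scores gives a value that agrees with the SD reported above. The call to `pstdev` from the standard `statistics` module is included only as a cross-check.

```python
from math import sqrt
from statistics import pstdev

# Annual incomes of the 12 workers, in pesos (assumed to be the example data).
incomes = [200_000, 200_000, 195_000, 194_000, 194_000, 194_000,
           193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

n = len(incomes)                               # N
mean = sum(incomes) / n                        # mean of the distribution
sum_of_squares = sum(x * x for x in incomes)   # sum of the squared raw scores

# Computational method: SD = sqrt( (sum of X squared) / N - mean squared )
sd = sqrt(sum_of_squares / n - mean ** 2)

print(f"SD (computational method) = {sd:,.3f}")
print(f"SD (statistics.pstdev)    = {pstdev(incomes):,.3f}")
```

Both lines should print the same value, since `pstdev` also treats the scores as a complete population and divides by N.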
The concept of standard deviation can be clarified further by an illustration of the score distributions of students in Section A and in Section B, assuming that both distributions have precisely the same measures of central tendency and the same range. The only difference between these two distributions is their standard deviations, with Section A having a greater value than Section B. The data are shown in the figure below.
[Figure: Two Frequency Distributions of Scores. Two panels, Section A and Section B, each plotting frequency of occurrence against scores, with the horizontal axis marked at 70, 100, and 130]
As can be noticed in the figure above, there is just a slight bulge in the middle of the distribution of Section A. This means that many of its scores deviate widely from the mean (100), which is the result of a large standard deviation (10). In Section B, which has a smaller standard deviation (2), most of the scores gather closely around the mean (100), thereby creating a towering lump. Comparing the two distributions reveals the disparity in standard deviation between the two sections. Section A, having a large standard deviation, behaves in a heterogeneous manner, while Section B, having a small standard deviation, behaves in a homogeneous way.
The Variance. Variance is another technique for assessing disparity in a distribution. In the simplest
sense, variance is the square of the standard deviation. The formula is illustrated below:
$$V = SD^2 = \frac{\sum X^2}{N} - \bar{X}^2 = \frac{\sum x^2}{N}$$
where $X$ is any raw score in the distribution and $x$ is the deviation score, which is equal to the raw score minus the mean: $x = X - \bar{X}$.
While the standard deviation, as the square root of the variance, measures how spread out the scores are from the mean in the original units of the scores, the variance calculates the average squared amount by which each score differs from the mean, i.e. from the average of all the scores in the distribution. It may appear unnecessary to study the variance when the standard deviation already seems complete. But there are situations wherein it is more efficient to work directly with variances than to keep returning to the standard deviation. In fact, the F ratio makes full use of this special property of variability.
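The two forms of the variance formula can be checked against each other with a short sketch. It assumes the twelve annual incomes from the section on the mean as the distribution; the variable names are illustrative.

```python
from math import sqrt

# Annual incomes of the 12 workers, in pesos (from the section on the mean).
incomes = [200_000, 200_000, 195_000, 194_000, 194_000, 194_000,
           193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

n = len(incomes)
mean = sum(incomes) / n

# Computational form: (sum of squared raw scores) / N - mean squared
variance_computational = sum(x * x for x in incomes) / n - mean ** 2

# Deviation form: (sum of squared deviation scores) / N
variance_deviation = sum((x - mean) ** 2 for x in incomes) / n

# The standard deviation is the square root of the variance.
sd = sqrt(variance_deviation)

print(f"V (computational) = {variance_computational:,.2f}")
print(f"V (deviation)     = {variance_deviation:,.2f}")
print(f"SD                = {sd:,.3f}")
```

The two forms agree up to rounding error. Note that the variance is expressed in squared units (here, squared pesos), which is why the standard deviation is usually the easier figure to interpret.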
(Do Activity 3)
C. Measure of Relative Position
Specific Objectives
In the previous lesson, we have demonstrated two separate but related measures that can show the
characteristics of the scores in a distribution. These are the measures of central tendency and the
measures of variability. In this lesson, we can further explore all the possibilities that might occur in the
relationship of centrality and variability (i.e., mean and standard deviation). Let us consider having two
sets of distribution and different case scenarios that might occur in comparing their respective means and
standard deviations.
Discussion
The z- Score
Case A
𝜇1 = 𝜇2 𝜎1 = 𝜎2
As shown in Case A, it is possible that two distributions can generate almost the same means (𝜇) and
almost the same standard deviations (𝜎).
Case B
𝜇1 ≠ 𝜇2
𝜎1 = 𝜎2
It is also possible that two distributions have different means (𝜇) but similar standard deviations (𝜎).
Case C
𝜇1 = 𝜇2
𝜎1 ≠ 𝜎2
Here in Case C, the two distributions have the same means (𝜇) but they differ in standard deviation
(𝜎).
Case D
𝜇1 ≠ 𝜇2
𝜎1 ≠ 𝜎2
In Case D, the distributions differ in terms of means (𝜇) and in terms of standard deviations (𝜎).
This preliminary discussion shows that comparing two distributions is complex. Different case scenarios must be considered. Sometimes two distributions differ in terms of means and sometimes they differ in terms of standard deviations. Groups usually differ in terms of centrality as well as in terms of disparity. Thus, in order to compare two different groups, there must be a common scale that can reconcile both means and standard deviations in a single standard form. It is only when we convert scores
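Population: $z = \dfrac{X - \mu}{\sigma}$          Sample: $z = \dfrac{X - \bar{x}}{s}$
In the population formula, $X$ refers to a raw score from the population, $\mu$ pertains to the mean of the population, and $\sigma$ is the population standard deviation.
In the sample formula, $X$ refers to a raw score from the sample, $\bar{x}$ pertains to the mean of the sample, and $s$ is the estimated standard deviation.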
Both formulas express the same relationship among the raw score, the mean, and the standard deviation. The only distinction between the two formulas is whether the distribution was generated from the population or from the sample. The formula on the left refers to z-scores from the population, while the formula on the right refers to z-scores from the sample.
$$z = \frac{X - \mu}{\sigma}$$
The formula explains that the values of the mean and standard deviation can be combined to transform a raw score ($X$) into a standard score ($z$). The z-score equation, $z = \frac{X - \mu}{\sigma}$, can convert the raw score of any group into a common value, and it enables comparison between scores coming from different group distributions. Below is an illustration of a standardized scale. As you may notice in this z-scaling, the mean is always zero and the standard deviation is always one unit.
[Figure: standardized z-scale with $\mu = 0$ and $\sigma = 1$]
It seems obvious, based on the face value of the scores, that you did better in physics than in biology. But to make a serious comparison of your scores on the two tests, we must take into consideration how well your classmates performed as a whole group. This requires additional information about the mean and standard deviation of both the physics and biology groups. Let us assume that we can obtain that information right away. As such:
            μ (population mean)    σ (population SD)
Physics     85                     10
Biology     75                     5
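Now, let us substitute that information into the z-score formula, together with your raw scores of 95 in physics and 85 in biology, and compute the z-score values.
Physics: $Z_p = \dfrac{X_p - \mu_p}{\sigma_p} = \dfrac{95 - 85}{10} = 1.0$
Biology: $Z_b = \dfrac{X_b - \mu_b}{\sigma_b} = \dfrac{85 - 75}{5} = 2.0$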
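The conversion can be sketched as a small Python function. The raw scores of 95 in physics and 85 in biology are the final examination scores used in this example; the function name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Convert a raw score x into a standard score, given the group's
    mean (mu) and standard deviation (sigma)."""
    return (x - mu) / sigma

# Physics: raw score 95, group mean 85, group SD 10
z_physics = z_score(95, 85, 10)

# Biology: raw score 85, group mean 75, group SD 5
z_biology = z_score(85, 75, 5)

print(f"Physics z = {z_physics}, Biology z = {z_biology}")
```

Although the raw score in physics is higher, the biology score lies two standard deviations above its group mean while the physics score lies only one, so relative to the class the biology performance is the stronger one.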
Finally, let us place these z-score values into a z-scale to clearly illustrate the measures.
                                             Zp=1.0 Zb=2.0
               |------|------|------|------|------|------|------|------|
Z             -4     -3     -2     -1      0     +1     +2     +3     +4
Physics       45     55     65     75     85     95    105    115    125
Biology       55     60     65     70     75     80     85     90     95
Notice that in the illustration, we can clearly compare the relative position of scores in one standardized
scale. Notice also that the means of both subjects reconcile to adopt a common mean of 0 (𝜇 = 0).
Likewise, both subjects agree to calibrate their standard deviations into a unit of one (𝜎 = 1). Thus,
comparison can now be made on your final examination scores. As displayed, your score of 95 in physics
falls directly below 1.0 on the z-scale. Your score of 85 in biology falls directly below 2.0 on the z-scale. It
Percentile
To locate a specific point in any distribution, percentiles, quartiles, and deciles are the tools that can be used. The relative position of a raw score can be described precisely by converting it into a percentile. A percentile refers to a point in the distribution below which a given percentage of scores fall.
Based on the figure above, a score at the 97th percentile (P97) is at the very high end of the distribution because an enormous share (97%) of the scores are below that point. A score at the 3rd percentile (P3), however, is an extremely low score because only 3% of the scores are below that point. The figure above also shows that the 50th percentile divides the distribution exactly in half. The position of the 50th percentile is also the location of the median.
To provide a better understanding of the role of the percentile, let us assume that your College Admission Test result reflected a 97th percentile score. This does not indicate that, out of 100 questions, you made only about three mistakes. Instead, it means that 97% of those who took the exam did not perform better than you. However, 3% did perform better than you.
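The percentile of any given data value (x) can be determined by dividing the number of data values less than x by the total number of data values, and then multiplying the result by 100:
$$\text{percentile of } x = \frac{\text{number of data values less than } x}{\text{total number of data values}} \times 100$$
For instance, consider a College Admission Test administered to 5000 students, in which your score of 800 was higher than the scores of 4000 of them:
$$\frac{4000}{5000} \times 100 = 80$$
Your score of 800 places you at the 80th percentile.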
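The percentile rule stated above can be sketched as a small Python function. The ten test scores below are hypothetical and serve only to illustrate the counting; the function name `percentile_of` is likewise illustrative.

```python
def percentile_of(x, scores):
    """Percentile of x: the number of scores below x, divided by the
    total number of scores, multiplied by 100."""
    below = sum(1 for s in scores if s < x)
    return below * 100 / len(scores)

# Hypothetical admission test scores of ten examinees.
scores = [500, 550, 600, 650, 700, 750, 780, 790, 800, 900]

# Eight of the ten scores are below 800.
p = percentile_of(800, scores)
print(f"A score of 800 is at percentile {p:.0f}")
```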
Quartiles. As the name implies, quartiles divide the distribution into quarters, marked by the points Q1, Q2, and Q3.
The first quartile, Q1, is at the 25th percentile. The second quartile, Q2, coincides with the median, which is at the 50th percentile. The third quartile, Q3, is at the 75th percentile. The quartiles can be determined by using the following procedure:
X
===========================
Php 200,000.00
200,000.00
195,000.00
194,000.00
193,000.00
192,000.00
191,000.00
190,000.00
185,000.00
181,000.00
180,000.00
176,000.00
===========================
First, make sure that the scores are arranged from highest to lowest. The position of each quartile is counted from the lowest score.
Position of Q1 = .25 (n + 1) = .25 (12 + 1) = 3.25
The value of x corresponding to position 3.25 is 181,000 + .25 (185,000 - 181,000). Thus, Q1 = 182,000.
Position of Q2 = .50 (n + 1) = .50 (12 + 1) = 6.5
The value of x corresponding to position 6.5 is 191,000 + .50 (192,000 - 191,000). Thus, Q2 = 191,500.
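The procedure can be sketched in Python as follows. The function `quartile` is a hypothetical helper written for this illustration; it counts positions from the lowest score and interpolates between the two neighboring scores. It is meant for cases like this one, where the computed position falls inside the list of scores.

```python
def quartile(scores, k):
    """k-th quartile (k = 1, 2, or 3) located at position k/4 * (n + 1),
    counted from the lowest score, with linear interpolation."""
    ordered = sorted(scores)                 # lowest to highest
    position = k * (len(ordered) + 1) / 4    # e.g. 3.25 for Q1 when n = 12
    rank = int(position)                     # whole part of the position
    fraction = position - rank               # decimal part of the position
    low_value = ordered[rank - 1]            # score at that rank (1-based)
    high_value = ordered[rank]               # next higher score
    return low_value + fraction * (high_value - low_value)

# Annual incomes used in the quartile example, in pesos.
incomes = [200_000, 200_000, 195_000, 194_000, 193_000, 192_000,
           191_000, 190_000, 185_000, 181_000, 180_000, 176_000]

q1 = quartile(incomes, 1)
q2 = quartile(incomes, 2)
q3 = quartile(incomes, 3)
print(q1, q2, q3)
```

The same steps applied to the third quartile place it at position .75 (12 + 1) = 9.75, between 194,000 and 195,000.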
Box-and-Whisker Plots
A box-and-whisker plot displays a graphical summary of a set of data. It provides information about the minimum and maximum scores in the distribution, the 1st quartile and 3rd quartile, and the 2nd quartile or median. Observe the figure below.
X
===============
Php 200,000.00 HS (highest score)
200,000.00
195,000.00 Q3
194,000.00
193,000.00
192,000.00 Median
191,000.00
190,000.00
185,000.00 Q1
181,000.00
180,000.00
176,000.00 LS (lowest score)
================
Box-and-whisker plots are easy to construct, and they show important information about the distribution of scores at a glance in a simple diagram. Also, it is not necessary to label the final product.
[Figure: box-and-whisker plot of the annual incomes drawn above a number line]
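The five values that a box-and-whisker plot displays can be computed with Python's standard `statistics` module, as sketched below. The `"exclusive"` method of `statistics.quantiles` locates each quartile at a position based on n + 1, which matches the positioning rule used in the quartile procedure of this lesson.

```python
from statistics import quantiles

# Annual incomes used in the quartile example, in pesos.
incomes = [200_000, 200_000, 195_000, 194_000, 193_000, 192_000,
           191_000, 190_000, 185_000, 181_000, 180_000, 176_000]

# n=4 cuts the distribution into quarters and returns Q1, Q2, and Q3.
q1, q2, q3 = quantiles(incomes, n=4, method="exclusive")

# Lowest score, Q1, median, Q3, highest score.
five_number_summary = (min(incomes), q1, q2, q3, max(incomes))
print(five_number_summary)
```

The box of the plot spans Q1 to Q3 with a line at the median, and the whiskers extend to the lowest and highest scores.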
(Do Activity 4)
Discussions
1. The majority of the scores cluster around the middle of the distribution, with fewer scores scattered toward the two extremes or tail ends of the curve.
2. It is always symmetrical and perfectly balanced.
3. Being a theoretical distribution, its mean, median, and mode are all equal.
5. The normal curve is asymptotic to the abscissa, and the total area under the curve is 1.0 or 100%.
6. In standardized (z-score) form, the normal curve has a mean of zero and a standard deviation of 1 unit.
The table we will be using is a right-tail z-table. This table is used to find the area between z = 0 and any positive z value, and it refers to the area on the right side of the standard normal curve. The z-score table gives the percentage for only one half of the curve. But since the normal curve is symmetrical, a z-score to the right of the mean yields the same percentage as the same z-score to the left of the mean.
[Figure: normal curve with a vertical line at the mean]
For example, to look up a z-score of 0.68 using the z-score table, look for 0.6 in the far left column, then look for the second decimal, 0.08, in the top row. The table value is 0.25175. It represents a percentage of 25.175%, which is the percentage of cases falling between the z-score and the mean.
[Figure: normal curve showing that 25.175% of the cases fall between the mean and the z-score of 0.68]
Now, let us consider some situations that might occur in using the z-table.
Case 1. Finding the percentage of cases falling between the z-score and the mean.
[Figure: two normal curves, each shading the area between a z-score and the mean; each shaded area is labeled 24.215%]
Case 2. Finding the percentage of cases above the given z-score. It is important to remember for this case that the total area of the normal curve is 1.0 or 100%. It is also essential to keep in mind that the right half of the normal curve is 50%, as is the left half (50%). You also need to consider that the z-table always provides a percentage value in relation to the mean.
[Figure: (a) area above a positive z-score of +0.75; (b) area above a negative z-score of -0.75]
For Case 2(a), to find the area above the given z-score, determine the equivalent z-table value and then subtract it from the total area of the right half, which is 50%. For example, to find the percentage of cases above the z-score of +0.75, find the z-table value of 0.75, which is 0.27337 (27.337%), then subtract it from the total area of the right half of the normal curve: 50% - 27.337% = 22.663%.
For Case 2(b), to determine the area above a negative z-score (the z-score is negative
because it is situated on the left side of the normal curve),
simply find the equivalent z-table value, then add 50%. Again, always keep in mind that the z-table only
provides the percentage of cases between the z-score and the mean, not the entire right side of the curve. To
cite another example, let us find the percentage of cases above the z-score of -0.75. The z-table value of
0.75 is 0.27337. This is equivalent to 27.337%. To this number, add the percentage area of the
entire right side, which is 50%. So this is 27.337% + 50% = 77.337%.
Case 3. Finding the percentage of cases below the given z-score. The principle used in Case 2 also
applies in Case 3.
[Figure: (a) area below a negative z-score of -0.75; (b) area below a positive z-score of +0.75.]
For Case 3(a), try to determine the percentage of cases below the z-score of -0.75. Using an analysis
similar to Case 2(a), subtract the z-table value from the total area of the left side (50%). If your
computation is correct, your answer is 22.663%.
For Case 3(b), determine the percentage of cases below the z-score of +0.75. The z-table value
only covers the percentage of cases between the z-score and the mean, so you need to add 50%, which is
the percentage of cases on the left side of the normal curve. Your computation must generate an answer
of 77.337%.
[Figure: area between a negative z-score of -0.75 and a positive z-score of +0.75.]
Case 4. Finding the percentage of cases between two z-scores. To illustrate, let us determine the
percentage of cases between the z-scores of -0.75 and +0.75. The -0.75 z-score has a z-table value of
27.337%, and the +0.75 z-score has the same z-table value of 27.337%. Thus, the percentage of cases
between -0.75 and +0.75 is the sum of the two percentages: 27.337% + 27.337% = 54.674%.
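The four cases reduce to simple arithmetic on areas under the standard normal curve. The sketch below computes each area directly from the cumulative area rather than from a printed table, so small rounding differences from table values are possible. The function names are illustrative only.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # standard normal curve: mean 0, SD 1

def between_mean_and(z: float) -> float:
    """Case 1: area between the mean and the z-score."""
    return abs(STANDARD.cdf(z) - 0.5)

def above(z: float) -> float:
    """Case 2: area above (to the right of) the z-score."""
    return 1.0 - STANDARD.cdf(z)

def below(z: float) -> float:
    """Case 3: area below (to the left of) the z-score."""
    return STANDARD.cdf(z)

def between(z_low: float, z_high: float) -> float:
    """Case 4: area between two z-scores, with z_low <= z_high."""
    return STANDARD.cdf(z_high) - STANDARD.cdf(z_low)

results = {
    "between mean and +0.75": between_mean_and(0.75),
    "above +0.75": above(0.75),
    "above -0.75": above(-0.75),
    "below -0.75": below(-0.75),
    "below +0.75": below(0.75),
    "between -0.75 and +0.75": between(-0.75, 0.75),
}
for label, value in results.items():
    print(f"{label}: {value * 100:.3f}%")
```

Note that "above +0.75" and "below -0.75" are equal, as are "above -0.75" and "below +0.75", because the curve is symmetrical.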
We are now familiar with the z-score concepts and have some knowledge about percentages of area above,
below and between z-scores. Likewise, we are also equipped with certain knowledge regarding the
z-table. The z-score reveals the location of the raw score from the mean in standard deviation units. The
z-score accounts for both the mean of the distribution and the amount of variability. Now, let us determine the
practical use of the z-score in the context of a normal distribution of raw scores.
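Case A. When the percentage of cases is between the raw score and the mean. A normal
distribution of physics scores has a mean of 85 and a standard deviation of 10. What percentage of scores
will fall between the physics score of 95 and the mean?
Initially, we need to convert the raw score of 95 into its equivalent z-score.
$$ z = \frac{x - \bar{x}}{SD} = \frac{95 - 85}{10} = 1.0 $$
[Figure: normal curve with the mean (85) and the raw score of 95 (z = 1.00) marked.]
Then look up the z-score of 1.00 in the z-table. The table value is 0.34134, so 34.13% of the physics
scores fall between the score of 95 and the mean.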
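The conversion in Case A can be written as a small function. This is a sketch under the assumption that the distribution is normal with the stated mean and standard deviation; `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_score(x: float, mean: float, sd: float) -> float:
    """Convert a raw score to a z-score: distance from the mean in SD units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

z = z_score(95, mean=85, sd=10)
# Area between the mean and the z-score on the standard normal curve.
area = abs(NormalDist().cdf(z) - 0.5)

print(z)                     # 1.0
print(f"{area * 100:.2f}%")  # 34.13%
```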
Case B. When the percentage of cases falls below a raw score. Using the same example, on a normal
distribution of scores in a physics class, with a mean of 85 and a standard deviation of 10, what percentage
of physics scores fall below a score of 95?
As in Case A, the raw score of 95 converts to a z-score of 1.0.
[Figure: normal curve with the mean (85) and the raw score of 95 (z = 1.00) marked; 50% of the area lies
below the mean and 34.13% lies between the mean and 95.]
Finally, look up the z-score in the z-table (https://www.calculator.net/z-score-calculator.html) and take the
table value. It is 0.34134 or 34.13%. Lastly, add the 50% for the left half of the curve to 34.13% to get
84.13%. The percentage of physics scores that fall below a score of 95 is 84.13%. This means that if 100
students took the examination and your score is 95, then your physics grade surpassed the grades of
about 84 students.
Case C. When the percentage of cases is above a raw score. On a normal distribution of scores in a
physics class, with a mean of 85 and a standard deviation of 10, what percentage of physics scores fall
above a score of 95?
Again, we need to convert the raw score of 95 into its equivalent z-score.
$$ z = \frac{x - \bar{x}}{SD} = \frac{95 - 85}{10} = 1.0 $$
[Figure: normal curve with the mean (85) and the raw score of 95 (z = 1.00) marked; 34.13% of the area lies
between the mean and 95.]
We look up the z-score in the z-table (https://www.calculator.net/z-score-calculator.html) and take the table
value. It is 0.34134 or 34.13%. Then subtract 34.13% from 50%. The answer is 15.87%. This is the
percentage of cases above the score of 95. This means that if 100 students took the examination and
your score is 95, then about 16 students surpassed your physics grade of 95.
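Case D. When the percentage of cases is between two raw scores. On a normal distribution of physics
scores, the mean is 85 and the standard deviation is 10. Your physics score is 95 and your friend's score
is 80. You want to determine what percentage of students got a score between your friend's score of 80
and your score of 95.
Again, convert the raw scores of 95 and 80 into their equivalent z-scores.
$$ z = \frac{x - \bar{x}}{SD} = \frac{95 - 85}{10} = 1.0 \qquad z = \frac{x - \bar{x}}{SD} = \frac{80 - 85}{10} = -0.5 $$
[Figure: normal curve with the raw scores of 80 (z = -0.5) and 95 (z = 1.00) marked on either side of the
mean (85).]
The z-table value for 1.0 is 0.34134 (34.13%) and the z-table value for 0.5 is 0.19146 (19.15%). Because
the two scores lie on opposite sides of the mean, add the two areas: 34.13% + 19.15% = 53.28%. This means
that if 100 students took the examination, about 53 of them scored between 80 and 95.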
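Cases B to D can also be worked directly on the raw-score scale. The sketch below assumes the physics scores follow a normal distribution with mean 85 and standard deviation 10, as stated in the examples, and uses `statistics.NormalDist` so that the conversion to z-scores happens internally. The variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model from the examples: physics scores ~ Normal(mean=85, SD=10).
physics = NormalDist(mu=85, sigma=10)

# Case B: percentage of scores below 95.
below_95 = physics.cdf(95)

# Case C: percentage of scores above 95.
above_95 = 1.0 - physics.cdf(95)

# Case D: percentage of scores between 80 and 95.
between_80_and_95 = physics.cdf(95) - physics.cdf(80)

print(f"below 95:          {below_95 * 100:.2f}%")
print(f"above 95:          {above_95 * 100:.2f}%")
print(f"between 80 and 95: {between_80_and_95 * 100:.2f}%")
```

The areas below and above the same score always add up to 100%, which is a quick check on any hand computation.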
At this point, we have already made a significantly long journey, from the measures of central tendency to
the measures of variability and finally to the measures of relative position. We are now in a position of no
longer seeking answers to questions but seeking questions beyond the conventions established by the answers.
(Do Activity 5)
Specific Objective
At the beginning of this course, we defined mathematics as the science of patterns. We realized that
nature follows a certain kind of mathematical structure as we observed some patterns and irregularities.
Whenever we see patterns, irregularities also beg to be noticed. Likewise, whenever we see
irregularities, some patterns suddenly wave for attention.
Linear correlation is not about patterns as such; it is about looking at irregularities and patiently waiting for
the patterns to manifest. This lesson deals with determining the connections between things that seem
unrelated and with declaring whether some correlations are indeed significant.
Discussions
The Product-Moment Correlation Coefficient or Pearson r is a statistical tool that can determine the linear
association between two distributions or groups. This tool can only establish the strength of association or
correlation; it can never justify any causal relation that may appear or seem obvious.
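$$ r = \frac{\dfrac{\sum XY}{N} - (\bar{x})(\bar{y})}{SD_x \, SD_y} $$
where N is the number of subjects, $\bar{x}$ and $\bar{y}$ are the means, and $SD_x$ and $SD_y$ are the
standard deviations of the two distributions.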
The Pearson r value may fall into three possible scenarios. If the value of r is positive, then it is a positive
correlation. If it is negative, then it is a negative correlation. If the value of r is around 0, then it means that
almost no linear correlation is found.
[Figure: three scatter plots illustrating r = +1, r = -1, and r = 0.]
An example of positive correlation is the height and weight of a person. Under normal circumstances,
whenever a person gains height, he or she also gains weight. An example of negative correlation is the
relationship between length of employment and degree of attractiveness. As you may observe, the physical
attractiveness of an employee is affected by the chronological advancement of his or her age. An
example of zero correlation might be the relationship between the grades of students living in highland areas
and the study habits of students living in lowland areas. You should also remember that Pearson r does
not generate a value less than -1 or more than +1. Any answer below -1 or above +1 can be
attributed to a wrong computation.
We will explain the nature of linear correlation by using an example. Assume that we want to determine
if there is a correlation between hours of study and grades of students last semester. Initially, we
need to randomly select students (let us say 10) and ask them about their average grade last semester as
well as the number of hours they spent studying per week in that semester. Let us presume that they
provided us these two pieces of information right away.
But before we can use the Pearson r formula, we need to ensure that this is the correct
statistical tool for determining the correlation between hours of study and grades. Let us check some basic
Pearson r requirements:
================================================================================
Student   Hours of Study (x)   x²       Grade (y)   y²      xy
================================================================================
A         15                   225      2.75        7.56    41.25
B         35                   1225     1.25        1.56    43.75
C         5                    25       3.00        9.00    15.00
D         20                   400      2.50        6.25    50.00
E         30                   900      1.50        2.25    45.00
F         40                   1600     1.00        1.00    40.00
G         20                   400      2.25        5.06    45.00
H         25                   625      1.75        3.06    43.75
I         25                   625      2.00        4.00    50.00
J         8                    64       3.00        9.00    24.00
================================================================================
Total     Σx = 223             Σx² = 6089   Σy = 21   Σy² = 48.75   Σxy = 397.75
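The columns and totals of the table can be rebuilt from the two raw columns. The sketch below is only an illustration of the bookkeeping; the list and variable names are assumptions, and the data are the ten (hours, grade) pairs from the table.

```python
# Raw data from the table: hours of study (x) and grade (y) for students A to J.
hours = [15, 35, 5, 20, 30, 40, 20, 25, 25, 8]
grades = [2.75, 1.25, 3.00, 2.50, 1.50, 1.00, 2.25, 1.75, 2.00, 3.00]

# Derived columns: x squared, y squared, and the product xy.
x_squared = [x * x for x in hours]
y_squared = [y * y for y in grades]
products = [x * y for x, y in zip(hours, grades)]

# Column totals used by the Pearson r and regression formulas.
sum_x = sum(hours)
sum_x2 = sum(x_squared)
sum_y = sum(grades)
sum_y2 = sum(y_squared)
sum_xy = sum(products)

print(sum_x, sum_x2, sum_y, sum_y2, sum_xy)
```

The printed y² entries in the table are rounded to two decimals (for example 7.5625 appears as 7.56), but the total of 48.75 comes from the unrounded values.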
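$$ \bar{x} = \frac{\sum x}{N} = \frac{223}{10} = 22.3 \qquad \bar{y} = \frac{\sum y}{N} = \frac{21}{10} = 2.1 $$
$$ SD_x = \sqrt{\frac{\sum x^2}{N} - \bar{x}^2} = \sqrt{608.9 - 497.29} \approx 10.56 \qquad SD_y = \sqrt{\frac{\sum y^2}{N} - \bar{y}^2} = \sqrt{4.875 - 4.41} \approx 0.682 $$
$$ r = \frac{\dfrac{397.75}{10} - (22.3)(2.1)}{(10.56)(0.682)} = -0.979 $$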
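The computation above can be reproduced in a few lines. This sketch follows the formula given in this lesson (mean of the products minus the product of the means, divided by the product of the standard deviations) and assumes the standard deviations are population standard deviations, which is what matches the values 10.56 and 0.682. The function name `pearson_r` is illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r as (sum(xy)/N - mean_x * mean_y) / (SD_x * SD_y).

    SD_x and SD_y are population standard deviations (divide by N).
    """
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    sd_y = sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

hours = [15, 35, 5, 20, 30, 40, 20, 25, 25, 8]
grades = [2.75, 1.25, 3.00, 2.50, 1.50, 1.00, 2.25, 1.75, 2.00, 3.00]

r = pearson_r(hours, grades)
print(f"r = {r:.3f}")  # r = -0.979
```

A result outside the range from -1 to +1 would signal a computation error, so checking that range is a cheap safeguard.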
Thus, we could say that the correlation between hours of study and grades of students has a
Pearson r value of -0.979. Do not be confused by the negative sign in our final answer. This
sign indicates the direction of the correlation line. Take into consideration that a grade of
1.0 is the highest mark in our grading system, but once plugged into the computation it is
treated by the formula as a small number. The negative sign therefore means that more hours of study
go with numerically lower, that is, better grades. With full knowledge of the concept, you can
always come up with the right interpretation.
Since the distribution concerns only these 10 students and is not treated as a sample of a larger
population, Guilford's suggested interpretation for the values of r can be used without hindrance.
Does it mean that better grades can be achieved by spending more time studying?
Does it mean that spending more time studying is a by-product of better grades? Does it mean
that another factor influenced both better grades and study habits?
All three of these explanations are possible. But the point is that correlation alone is not enough to
identify which is the real explanation. Pearson r is not a tool for establishing causation. It is only a
tool to describe the linear correlation between two observed traits.
Specific Objective
In the previous lesson, we discussed Pearson r as a powerful tool in determining linear correlation. It is an
important tool for investigating associations, considering that different mathematical patterns are all around
us, such as the connection of high tide and low tide with human behavior, the association between height
and weight, and the correlation between the metaphoric flap of a butterfly in Japan and a weather disturbance
in South America a year after.
But correlation confines us within the limits of merely associating. Correlation in and
of itself cannot establish causation to warrant prediction. In this lesson on regression analysis, not only
can we connect and associate some observable patterns, but we can also finally make basic
predictions.
Discussion
Bivariate data simply means that we can graphically represent two variables (x and y) in a scatter plot,
wherein each point in the scatter plot represents a pair of scores. A scatter plot is necessary in order to
determine the regression line. The regression line is a generated straight line that lies closest to all the points
in the scatter plot.
Our example below illustrates the construction of a scatter plot based on the data from
our previous example on hours of study and grades.
The straight line drawn through the scatter plot is called the least-squares regression line. This
generated line minimizes the sum of the squares of the vertical deviations from each data point to the line.
This means that of all the possible lines that can be drawn through the points, this
generated line has the best fit. Each $d_n$ represents the vertical distance from a point (x, y) to the line.
$$ d_1^2 + d_2^2 + d_3^2 + d_4^2 + d_5^2 + d_6^2 + d_7^2 + d_8^2 + d_9^2 + d_{10}^2 $$
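To make the "best fit" idea concrete, the sketch below computes the sum of squared vertical deviations for the hours-of-study data under three candidate lines. The first uses the slope and intercept derived later in this lesson (m = -0.06321, b = 3.509); the other two lines are arbitrary alternatives chosen only for comparison. The function name is illustrative.

```python
def sum_squared_deviations(xs, ys, m, b):
    """Sum of squared vertical distances from each point to the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

hours = [15, 35, 5, 20, 30, 40, 20, 25, 25, 8]
grades = [2.75, 1.25, 3.00, 2.50, 1.50, 1.00, 2.25, 1.75, 2.00, 3.00]

# Least-squares line from this lesson versus two arbitrary alternatives.
best = sum_squared_deviations(hours, grades, m=-0.06321, b=3.509)
flatter = sum_squared_deviations(hours, grades, m=-0.05, b=3.2)
steeper = sum_squared_deviations(hours, grades, m=-0.07, b=3.7)

print(f"least-squares line: {best:.4f}")
print(f"flatter line:       {flatter:.4f}")
print(f"steeper line:       {steeper:.4f}")
```

Any other straight line through these points gives a larger sum than the least-squares line, which is exactly the property that defines it.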
In the least-squares line, the correlation that can be established around the regression line is the basis for
the resulting prediction. But in order to make predictions, three important ingredients must be on hand:
1. The equation of the best fit line,
2. The slope of the line, and
3. The y-intercept of the line.
There must be n ordered pairs: $(x_1, y_1), (x_2, y_2), (x_3, y_3), \dots, (x_n, y_n)$
$$ y = mx + b $$
$$ m = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} \qquad b = \frac{(\sum y) - m(\sum x)}{n} $$
To apply this formula to our given data, we need to find the value of each summation.
$$ m = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{10(397.75) - (223)(21)}{10(6089) - 49729} = -0.06321 $$
$$ b = \frac{(\sum y) - m(\sum x)}{n} = \frac{(21) - (-0.06321)(223)}{10} = 3.509 $$
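The slope and intercept formulas translate directly into code. The following is a minimal sketch that applies them to the hours-of-study data; `least_squares_line` is a hypothetical helper name, and the result may differ from the hand computation in the last decimal place because no intermediate rounding is done.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b.

    m = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
    b = (sum(y) - m*sum(x)) / n
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y - m * sum_x) / n
    return m, b

hours = [15, 35, 5, 20, 30, 40, 20, 25, 25, 8]
grades = [2.75, 1.25, 3.00, 2.50, 1.50, 1.00, 2.25, 1.75, 2.00, 3.00]

m, b = least_squares_line(hours, grades)
print(f"m = {m:.5f}")  # m = -0.06321
print(f"b = {b:.4f}")
```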
Since we have already determined the regression line, we simply plug in the values of x
and solve for the predicted y.
================================================================
y_pred = mx + b
y_pred = -0.06321x + 3.509
================================================================
x      -0.06321x     -0.06321x + 3.509 = y_pred
37     -2.33877      1.17023
22     -1.39062      2.11838
8      -0.50568      3.00332
================================================================
The predicted grade is around 1.17 for the student who spends 37 hours studying, 2.12 for the
one who spends 22 hours studying, and just a passing grade of 3.0 for the one who studies for eight
hours.
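The prediction step can be sketched as a one-line function. The constants below are the rounded slope and intercept from this lesson, and the name `predict_grade` is illustrative. As an assumption worth stating, predictions are most trustworthy for study hours inside the range of the original data (5 to 40 hours).

```python
# Rounded regression coefficients from this lesson.
SLOPE = -0.06321
INTERCEPT = 3.509

def predict_grade(hours_of_study: float) -> float:
    """Predicted grade from the regression line y_pred = m*x + b."""
    return SLOPE * hours_of_study + INTERCEPT

for hours_of_study in (37, 22, 8):
    print(f"{hours_of_study} hours -> predicted grade {predict_grade(hours_of_study):.2f}")
```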
(Do Activity 7)
V. ACTIVITIES
Activity 1:
Direction: Indicate which scale of measurement (nominal, ordinal, or interval) is being used.
1. The Globe and Smart phone number prefixes 0917 and 0923 serve 1 million and 2.5 million subscribers,
respectively.
2. The Philippine Statistics Office announces that the average height of Filipino males is 156.41 cm.
3. The Postal Office shows that 4,231 individuals have a zip code of 4231.
4. The Sportsfest committee posted the names of individuals with their order of finish for the first 50
runners to reach the finish line.
Activity 2:
Direction: Answer the following questions.
1. A class of 13 students takes a 20-item quiz on Science 101. Their scores were as follows: 11, 11, 13, 14,
15, 18, 19, 9, 6, 4, 1, 2, 2.
2. A day after, the class of 13 students mentioned in problem 1 takes the same test a second time. This time
their scores were: 10, 10, 10, 10, 11, 13, 19, 9, 9, 8, 1, 7, 8.
3. For the set of scores: 1000, 50, 120, 170, 120, 90, 30, 120.
Activity 3:
Direction: Analyze and answer the questions below.
1. At ABC University, a group of students was selected and asked how much of their weekly
allowance they spent in buying mobile phone load. The following is the list of amounts spent: Php
120, 110, 100, 200, 10, 90, 100, 100. Calculate the mean, the range, and the standard deviation.
2. At XYZ University, another group of students was selected and asked how much of their weekly
allowance they spent in buying mobile phone load. The following is the list of amounts spent: Php
200, 180, 30, 20, 10, 160, 150, 80. Calculate the mean, the range, and the standard deviation.
3. Consider the data in problems 1 and 2. In what way do the two distributions differ? Which group is
more homogeneous?
Activity 4:
1. You have taken final exams. Your score in Science 101 was 80. Your score in Math 101 was 95.
                n       Σx      Σx²
Science 101     120     7120    2800
Math 101        75      2275    325
2. The scores of all students at ABC school were obtained. The highest score was 140, and the lowest
score was 110. The following scores were identified as to their percentile:
--------------------------------------------------
X       Percentile
--------------------------------------------------
112     10th
119     25th
123     50th
127     75th
134     90th
3. The data given are the calories per 200 milliliters of popular sodas.
21,18,21,20,26,31,18,16,25,27,13,27,36,24,25
Activity 5:
Direction: Choose the letter of the correct answer.
1. Road tests of the MG5 Sedan compact car show a mean fuel rating of 20 kilometers per liter on
highways, with a standard deviation of 1.5 kilometers per liter. What percentage of these cars
(MG5) will achieve results of
Activity 6:
Seven randomly selected participants were given both math and music tests. Their scores are as follows:
====================
Math Music
====================
16 14
6 7
17 15
11 14
Activity 7:
The research office is interested in the possible relationship between the anxiety and aptitude scores of
eight randomly selected students, who are given both the anxiety test and the aptitude test. Their weighted,
scaled scores are as follows:
============================================
Subjects Aptitude Test Anxiety Test
============================================
A 10 12
B 7 9
C 13 14
D 8 7
E 11 11
F 6 7
G 10 12
H 11 10
============================================
VI. OUTPUT
Submit your output through LMS or send it to the following email address of your respective instructor.
For this culminating requirement in Module Five, you need to work together in groups of 3 or 4.
1. Your task is to prepare a proposal study that can contribute to a solution to any social problem.
2. You must use statistical methods for your data processing and analyses.
3. Your final output must be no more than 8 pages that details your project proposal.
4. Please follow the outline provided below:
a. Title page (not included in the page count)
-An example of problem to be addressed: In this COVID-19 pandemic, how can we
reduce human traffic in wet market places.
b. Background and Statement of the Problem
c. Literature Review
d. Proposed Study with emphasis on how statistics will be used
i. Data to be collected
ii. Methods of data collection and data gathering instrument
iii. Data gathering procedure
iv. Method of Analyses
e. Discussion of how your project proposal can address the identified problem.
f. References (APA or MLA)
__________ 2. It is a branch of statistics that has the ability to “infer” and to generalize.
It is also the right tool to predict values that are not really known.
a. Variable c. Data
b. Measurement d. Constant
__________ 4. If the data are labelled 1st, 2nd, 3rd, and so on, in what kind of scale do
they fall?
__________ 8. It is the middle point or midpoint of any distribution. It separates the upper
half from the lower half of the distribution.
a. Mean c. Mode
b. Median d. Range
__________ 9. The following is a list of retirement ages for the workers in a production
plant: 65, 64, 65, 61, 62, 64, 65, 63, 63, 65, 64. What is the median?
a. 64 c. 63
b. 65 d. 62
a. Mean c. Mode
b. Median d. Range
__________ 11. These three measures can provide the information about spread of the
scores in the distribution.
The grade-point averages for the selected university students were computed. The data are
as follows:
Student GPA
1 3.75
2 3.00
3 3.00
4 1.75
5 2.00
6 2.25
7 3.25
a. 1.00 c. 1.75
b. 2.25 d. 2.00
a. 3.00 c. 1.75
b. 2.25 d. 2.00
a. 8 c. 32
b. 16 d. 64
__________ 16. In comparing different groups, there must be a standard scale that can
reconcile both means and standard deviation in single standard form. It is only then that
direct comparison is possible because transformed scores from different distributions will
share common scores and these common scores are called ____.
a. Percentile c. T-score
b. Quartile d. Z-score
__________ 17. Jerry took the College Admission Test and scored at the 89th percentile. This means that
a. 89% of those who took the exam scored lower than Jerry.
b. Out of 100 items of questions, Jerry had 11 mistakes.
c. Jerry answered 89 questions correctly.
d. 89% of those who took the exam scored higher than Jerry.
__________ 18. It divides the distribution into quarters.
a. Percentile c. T-score
b. Quartile d. Z-score
__________ 21. It is a unimodal frequency distribution where the scores are scattered on
the X-axis while the frequency of occurrence is defined by the Y-axis.
a. Z-distribution curve c. Normal curve
b. Distribution curve d. X-axis and Y-axis curve
__________ 22. This table only gives the percentage for the half curve but both the right
and the left of the mean yield the same percentage since the said curve is symmetrical.
a. Z-table c. T-table
b. Percentage Table d. Normal table
__________ 23. Scores on an English test have an average of 80 with a standard deviation
of 6. What is the z-score of the student who earned a 75 on the test?
a. -0.97 c. -0.88
b. -0.76 d. -0.83
__________ 24. A group of children compared what they received while trick or treating. The average
number of pieces of candy received is 43 with a standard deviation of 2. What is the
z-score corresponding to 20 pieces of candy?
a. -11.5 c. -9.5
b. -10.5 d. -12.5
__________ 25. The mean growth of the thickness of trees in a forest is found to be 0.5
cm per year with a standard deviation of 0.1 cm per year. What is the z-score
corresponding to 1 cm per year?
a. 4 c. 6
b. 5 d. 7
a. -1 c. 2
b. +1 d. 0
__________ 28. The linear correlation is said to be a substantial relationship if the value of
r is ___.
Given the data chart of selected persons with their ages and daily incomes, calculate the
Pearson’s correlation coefficient.
a. 1 c. 0.99
b. 0.89 d. 0
__________ 30. What is the interpretation for the r value?
The data below pertains to the experience of some workers in a company (number of years)
and their performance rating. Estimate the performance rating for a worker with 20 years of
experience.
a. 1.13 c. 1.15
b. 1.14 d. 1.16
a. 69.6 c. 69.8
b. 69.7 d. 69.9
__________ 2. What is the value of regression line?
a. 92.0 c. 92.2
b. 92.1 d. 92.3
1. C
2. A
3. B
4. D
5. A
6. A
7. C
8. B
9. A
10. C
11. C
12. D
13. B
14. A
15. B
16. D
17. A
18. B
19. D
20. B
21. C
22. A
23. D
24. A
25. B
26. C
27. D
28. B
29. C
30. A
31. C
32. A
33. A
34. B
35. D
Coolman, R. (2015, June 5). What is Symmetry? Live Science. Retrieved
from https://www.livescience.com/51100-what-is-symmetry.html
Discovery Cube. (2018, February 2). Moment of Science: Patterns in Nature. Retrieved
from https://www.discoverycube.org/blog/momentscience-patterns-in-nature/
Irish Times. (2018, October 18). Who Uses Maths? Almost Everyone! Retrieved
from https://thatsmaths.com/2018/10/18/who-uses-maths-almost-everyone/
Eniscuola. (2016, June 27). The numbers of nature: the Fibonacci sequence. Retrieved
from http://www.eniscuola.net/en/2016/06/27/the-numbers-of-nature-the-fibonacci-sequence/
Sprinthall, R. C. (1992). Basic Statistical Analysis (4th ed.). Massachusetts: Allyn and Bacon.
Guilford, J. P. (1956). Fundamental statistics in psychology and education (3rd ed.). New York:
McGraw-Hill, p. 145.
Yount, W. R. (2006). Research Design & Statistical Analysis in Christian Ministry (4th ed.). USA.
Calingasan, Martin, & Yambao. (2018). Mathematics in the Modern World. Quezon City: C & E Publishing, Inc.
Z-score calculator. https://www.calculator.net/z-score-calculator.html
Turing, A. M. (1952). The chemical basis of morphogenesis. Philos. Trans. R. Soc. Lond. Ser. B, 237.
KIMBERLY L. SORUILA
Instructor I
BERLYN A. FAMILARAN
Instructor I
PEDRO B. KATIGBAK
Associate Professor I
Checked by:
Approved by:
Noted by: