Chapter 4 Describing Educational Data Libmanan Group
Professor
Chapter 4
DESCRIBING EDUCATIONAL DATA
Objectives:
At the end of this chapter, the students should be able to:
➢ 1. present and correctly interpret data in tabular or graphic form.
➢ 2. arrange data in tables and graphs in a correct fashion.
➢ 3. compute the mean, median and mode of a set of test scores.
➢ 4. describe the relationship between the shape of a distribution and the relative
positions of the measures of central tendency.
➢ 5. explain how the measures of central tendency differ and the significance of those
differences.
➢ 6. determine the standard deviation and semi-interquartile range of a set of test
scores.
➢ 7. express the relationship between standard deviation units and area under a normal
curve.
➢ 8. compute the coefficient of correlation using different methods.
➢ 9. interpret the Pearson and Spearman measures of relationship.
➢ 10. show appreciation for the value of the information presented in the chapter to an
educator who wishes to describe and interpret data.
A Math Test is given to a class. Here are the scores of the 50 pupils. Let's make a
frequency distribution and tally the frequencies.
Graphic Representation
Here we have plotted a point at the mid-point of each of our score intervals. The
height at which we have plotted the point corresponds to the number of cases, or frequency
(f), in the interval. These points have been connected, and the jagged line provides a
somewhat different picture of the same set of data illustrated in Fig. 1.
MEASURES OF CENTRAL TENDENCY
A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data. As such, measures of central tendency
are sometimes called measures of central location.
MEAN
➢ The Mean (x̄) is the arithmetic average of a set of scores. Its computation involves the
value of every score in the distribution.
➢ It is the most dependable measure of central tendency.
➢ It is the most reliable since all scores are taken into account.
Median is used:
A. When one wants the exact midpoint or 50% of the distribution.
B. When there are extreme scores which would markedly affect the mean.
C. When it is desired that certain scores should influence the central tendency but all that
is known about them is that they are above or below the median.
Mode is used:
A. When a quick/approximate measure of central tendency is all that is wanted.
B. When the measure of central tendency should be the most typical value.
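The three measures can be compared on a small set of scores. The sketch below uses Python's standard `statistics` module; the scores are hypothetical and are not the class data from this chapter. It also shows why the median is preferred when there are extreme scores: replacing the highest score with an extreme value moves the mean but leaves the median and the mode unchanged.

```python
from statistics import mean, median, multimode

# Hypothetical test scores (assumed for illustration only).
scores = [12, 15, 15, 17, 18, 20, 22]

print(mean(scores))       # arithmetic average of all scores
print(median(scores))     # middle score of the ordered list
print(multimode(scores))  # most frequent score(s), returned as a list

# Replace the highest score with an extreme value.
with_extreme = scores[:-1] + [95]
print(mean(with_extreme))       # the mean is pulled upward
print(median(with_extreme))     # the median stays the same
print(multimode(with_extreme))  # the mode stays the same
```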
MEASURES OF VARIABILITY
Measures of Variation or Dispersion
-indicate the degree or extent to which numerical values are dispersed or spread out
about the average value in a distribution.
Range
-the difference between the highest and the lowest score.
Formula:
R = HS – LS
Example:
In a reading test, the highest score is 59 and the lowest score is 17. What is the
range?
R = 59 – 17 = 42
*FOR GROUPED DATA
Range
-is determined by subtracting the lower class boundary of the lowest class interval from
the upper class boundary of the highest class interval of the frequency distribution.
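Both versions of the range can be sketched in a few lines of Python. The function names and the class intervals below are hypothetical. For grouped data, the sketch assumes whole-number scores, so that each class boundary lies 0.5 below the lower limit or 0.5 above the upper limit of the interval.

```python
def score_range(scores):
    """Range of ungrouped scores: highest score minus lowest score."""
    return max(scores) - min(scores)


def grouped_range(class_intervals):
    """Range of grouped scores, computed from class boundaries.

    class_intervals is a list of (lower_limit, upper_limit) pairs.
    Assumption: whole-number scores, so boundaries are limits -/+ 0.5.
    """
    lowest_boundary = min(lower for lower, _ in class_intervals) - 0.5
    highest_boundary = max(upper for _, upper in class_intervals) + 0.5
    return highest_boundary - lowest_boundary


# Reading-test example from the text: HS = 59, LS = 17.
print(score_range([59, 17]))  # 42

# Hypothetical class intervals that cover the same scores.
intervals = [(15, 19), (20, 24), (55, 59)]
print(grouped_range(intervals))  # 45.0
```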
Percentile rank
-tells what percent of the cases fall below a given score.
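A minimal sketch of this definition follows. The function name and the scores are hypothetical. The sketch counts only the cases strictly below the given score, as the definition above states; some textbooks also count half of the cases tied at the score, which this sketch does not do.

```python
def percentile_rank(scores, score):
    """Percent of cases that fall strictly below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


# Hypothetical scores of ten pupils.
scores = [10, 12, 15, 15, 18, 20, 22, 25, 27, 30]
print(percentile_rank(scores, 22))  # 60.0
```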
Here are the steps followed in computing the S.D. of the Ungrouped Scores.
a. Find the Mean.
b. Subtract the Mean from each score to get the deviation (d).
c. Square each deviation.
d. Find the sum of the squared deviations (∑d²).
e. Divide the sum of the squared deviations by the number of cases (N).
f. Find the square root of the answer in e.
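Steps a to f can be sketched in Python as follows. The function name and the scores are hypothetical. Because step e divides by the number of cases, the result is the standard deviation of the group itself (N in the denominator, not N - 1).

```python
from math import sqrt


def sd_ungrouped(scores):
    """Standard deviation of ungrouped scores, following steps a to f."""
    n = len(scores)
    mean = sum(scores) / n                      # a. find the mean
    deviations = [x - mean for x in scores]     # b. d = score - mean
    squared = [d * d for d in deviations]       # c. square each deviation
    sum_squared = sum(squared)                  # d. sum of squared deviations
    variance = sum_squared / n                  # e. divide by number of cases
    return sqrt(variance)                       # f. square root


# Hypothetical scores: mean = 5, sum of squared deviations = 32, N = 8.
print(sd_ungrouped([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```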
Here are the steps followed in computing the S.D. of the Grouped Scores.
1. Do steps 1-5 in finding the Mean.
2. Add a column for fd². To get fd², multiply d by fd.
3. Get the sum of fd² (∑fd²).
4. Divide the sum of fd² by the number of cases (N).
5. Divide the sum of fd (∑fd) by the number of cases.
6. Square the result in step no. 5.
7. Subtract the result in step no. 6 from the result in step no. 4.
8. Extract the square root of the difference found in step no. 7.
9. Multiply the size of the class interval by the result in step no. 8.
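The grouped-score steps amount to SD = i × √(∑fd²/N − (∑fd/N)²), where i is the size of the class interval. The sketch below follows the steps one by one. It assumes that d is the deviation of each class interval from the interval chosen as the assumed mean, counted in whole class-interval units (…, -2, -1, 0, 1, 2, …); this section does not spell out the steps for the mean, so that reading is an assumption. The function name and the frequencies are hypothetical.

```python
from math import sqrt


def sd_grouped(frequencies, deviations, interval_size):
    """Standard deviation of grouped scores by the coded-deviation steps.

    frequencies:   f for each class interval
    deviations:    d for each class interval, in class-interval units
                   from the assumed-mean interval (an assumption here)
    interval_size: size of the class interval (i)
    """
    n = sum(frequencies)
    fd = [f * d for f, d in zip(frequencies, deviations)]
    fd2 = [d * x for d, x in zip(deviations, fd)]   # step 2: fd² = d × fd
    sum_fd2 = sum(fd2)                              # step 3
    mean_fd2 = sum_fd2 / n                          # step 4
    mean_fd = sum(fd) / n                           # step 5
    squared_mean_fd = mean_fd ** 2                  # step 6
    difference = mean_fd2 - squared_mean_fd         # step 7
    root = sqrt(difference)                         # step 8
    return interval_size * root                     # step 9


# Hypothetical distribution: five class intervals of size 5, N = 20.
sd = sd_grouped([2, 5, 8, 4, 1], [-2, -1, 0, 1, 2], 5)
print(round(sd, 2))
```

For these hypothetical figures, ∑fd = -3 and ∑fd² = 21, so the value under the square root is 21/20 − (-3/20)² = 1.0275.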
INTERPRETING THE STANDARD DEVIATION
Thorndike and Hagen believe that it is almost impossible to say in any simple terms
what the standard deviation is or what it corresponds to in pictorial or geometric terms.
Primarily, it is a statistic that characterizes a distribution of scores. It increases in direct
proportion as the scores spread out more widely. The larger the standard deviation, the wider
the spread of scores.
The standard deviation gets its most clear-cut meaning for one particular
type of distribution of scores. This distribution is called the “normal” distribution. It is defined
by a particular mathematical equation, but to the everyday user it is defined approximately
by its pictorial qualities. The “normal” curve is a symmetrical curve having a bell-like shape.
That is, most scores pile up in the middle score values; as one goes away from the middle in
either direction the pile drops off, first slowly and then more rapidly, and the cases tail out to
relatively long tails on either end.
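For a normal distribution, the standard deviation marks off fixed proportions of the area under the curve. The sketch below computes the proportion of cases lying within 1, 2 and 3 standard deviations of the mean, using the error function from Python's standard `math` module. The function name is hypothetical; the proportions are properties of the normal curve, not results from this chapter's data.

```python
from math import erf, sqrt


def area_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {area_within(k):.4f}")
# within 1 SD of the mean: 0.6827
# within 2 SD of the mean: 0.9545
# within 3 SD of the mean: 0.9973
```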
Pearson Product-Moment Correlation Coefficient
CORRELATION
➢ is a measure of relationship between two variables.
Most measures of correlation indicate two things:
➢ the magnitude or size of relationship
➢ the direction of relationship between two sets of measurements
Note:
➢ For instance, correlations of +.85 and -.85 are of the same size. The sign does not have
anything to do with the size of the relationship; rather, it indicates the direction of the
relationship.
➢ When two variables are positively related, one increases as the other increases.
➢ On the other hand, when two variables are negatively related, one increases as
the other decreases.
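\[
r = \frac{\dfrac{\sum XY}{N} - \left(\dfrac{\sum X}{N}\right)\left(\dfrac{\sum Y}{N}\right)}
         {\sqrt{\dfrac{\sum X^2}{N} - \left(\dfrac{\sum X}{N}\right)^2}\;
          \sqrt{\dfrac{\sum Y^2}{N} - \left(\dfrac{\sum Y}{N}\right)^2}}
\]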
Directions for computing a product-moment correlation coefficient (r) from ungrouped
data.
1. Write the pairs of scores to be studied in two columns. Be sure that the pair of scores for
each pupil is in the same row. Label one set of scores X and the other Y.
2. Square each score in the X column and write the result in the X² column.
3. Square each score in the Y column and enter each result in the Y² column.
4. Multiply each score in the X column by its pair in the Y column. Enter the product in
the XY column.
5. Add all the entries in each column to get the sum (∑) for each column.
6. Note the number (N) of pairs of scores.
7. Substitute the values obtained in the formula.
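The seven steps can be sketched in Python as follows. The function name and the pairs of scores are hypothetical; the computation itself follows the raw-score formula for r given above.

```python
from math import sqrt


def pearson_r_raw(x_scores, y_scores):
    """Pearson r from raw scores, following the ungrouped-data steps."""
    n = len(x_scores)                                        # step 6
    sum_x = sum(x_scores)                                    # step 5
    sum_y = sum(y_scores)
    sum_x2 = sum(x * x for x in x_scores)                    # steps 2 and 5
    sum_y2 = sum(y * y for y in y_scores)                    # steps 3 and 5
    sum_xy = sum(x * y for x, y in zip(x_scores, y_scores))  # steps 4 and 5

    # Step 7: substitute the sums in the formula.
    numerator = sum_xy / n - (sum_x / n) * (sum_y / n)
    root_x = sqrt(sum_x2 / n - (sum_x / n) ** 2)
    root_y = sqrt(sum_y2 / n - (sum_y / n) ** 2)
    return numerator / (root_x * root_y)


# Hypothetical scores of five pupils on two tests.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r_raw(x, y), 2))  # 0.77
```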
The computation of the coefficient of correlation actually involves the means and standard
deviations of each set of scores (X and Y), although this is not readily apparent in the above
formula (Gronlund 1981). Thus, the formula can also be written as:
\[
r = \frac{\dfrac{\sum XY}{N} - (M_X)(M_Y)}{(SD_X)(SD_Y)}
\]
Where:
M_X = mean of scores in column X
M_Y = mean of scores in column Y
SD_X = standard deviation of scores in column X
SD_Y = standard deviation of scores in column Y
Thus for these data:
\[
r = \frac{\dfrac{1824}{10} - (15)(11)}{(4.65)(4.31)} = 0.87
\]
Directions for computing the Pearson r from the deviations from the means:
1. Begin by writing the pairs of scores to be studied in two columns. Be sure that the pair
of scores for each pupil is in the same row. Label one set of scores X, the other Y.
2. Get the sum (∑) of the scores in each column. Divide each sum by the number of scores
(N) in the column to get the mean (M).
3. Subtract the mean of X from each score in column X to get its deviation from the mean.
Enter the result under column x. Be sure to write the sign.
4. Subtract the mean of Y from each score in column Y to get its deviation from the mean.
Enter the result under column y.
5. Square each entry in x. Write the result in column x².
6. Square each entry in y. Write the result in column y².
7. Multiply each entry in column x by its paired entry in column y.
Enter the product in column xy.
8. Get the sum (∑) of all entries in xy, x², and y².
9. Apply the formula.
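The deviation method can be sketched as follows. The formula itself is not reproduced in this section, so the sketch assumes the standard deviation-score form, r = ∑xy / √(∑x² · ∑y²). The function name and the scores are hypothetical.

```python
from math import sqrt


def pearson_r_deviation(x_scores, y_scores):
    """Pearson r from deviations from the means.

    Assumes the standard form r = sum(xy) / sqrt(sum(x^2) * sum(y^2)).
    """
    n = len(x_scores)
    mean_x = sum(x_scores) / n                           # step 2
    mean_y = sum(y_scores) / n
    dev_x = [score - mean_x for score in x_scores]       # step 3
    dev_y = [score - mean_y for score in y_scores]       # step 4
    sum_x2 = sum(d * d for d in dev_x)                   # steps 5 and 8
    sum_y2 = sum(d * d for d in dev_y)                   # steps 6 and 8
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))    # steps 7 and 8
    return sum_xy / sqrt(sum_x2 * sum_y2)                # step 9


# Hypothetical scores of five pupils on two tests.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r_deviation(x, y), 2))  # 0.77
```

For these scores, ∑xy = 6, ∑x² = 10 and ∑y² = 6, so r = 6 / √60.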
Directions for computing the Pearson r from Standard Scores:
1. Begin by writing the pairs of scores to be studied in two columns. Be sure that the pair
of scores for each pupil is in the same row. Label one set of scores X, the other Y.
2. Get the sum (∑) of the scores for each column. Divide each sum by the number of scores
(N) in the column to get the mean (M).
3. Subtract the mean of X from each score in column X. Write the difference in column x. Be
sure to put the algebraic signs.
4. Subtract the mean of Y from each score in column Y. Write the difference in column y.
Don't forget the signs.
5. Steps 5 and 6 may be omitted if the standard deviation for each set of scores has been
previously computed. Square each score in column X. Enter each result under X². Then
apply the formula for finding the standard deviation to find SDx.
6. Square each score in column Y. Enter each result under Y². Then apply the formula for
finding the standard deviation to find SDy.
7. Divide each entry in column x by the standard deviation SDx and enter the result
under Zx (standard score).
8. Divide each entry in column y by the standard deviation SDy to get the standard
scores. Enter the result under Zy.
9. Multiply each Z score in Zx by its pair in Zy and enter the results under ZxZy.
10. Get the sum (∑) of ZxZy and divide it by the number of pairs (N) to get r.
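The standard-score method can be sketched as follows. The sketch assumes that r is the sum of the ZxZy products divided by the number of pairs (N), and that each standard deviation is computed with N in the denominator, as in the steps for ungrouped scores. The function name and the scores are hypothetical.

```python
from math import sqrt


def pearson_r_zscores(x_scores, y_scores):
    """Pearson r as the mean of the products of paired standard scores."""
    n = len(x_scores)
    mean_x = sum(x_scores) / n                          # step 2
    mean_y = sum(y_scores) / n
    dev_x = [score - mean_x for score in x_scores]      # step 3
    dev_y = [score - mean_y for score in y_scores]      # step 4
    sd_x = sqrt(sum(d * d for d in dev_x) / n)          # step 5 (N in denominator)
    sd_y = sqrt(sum(d * d for d in dev_y) / n)          # step 6
    z_x = [d / sd_x for d in dev_x]                     # step 7
    z_y = [d / sd_y for d in dev_y]                     # step 8
    products = [a * b for a, b in zip(z_x, z_y)]        # step 9
    return sum(products) / n                            # step 10, divided by N


# Hypothetical scores of five pupils on two tests.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r_zscores(x, y), 2))  # 0.77
```

The same hypothetical scores give the same r under the raw-score, deviation and standard-score methods, since the three are algebraically equivalent.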
r value             Interpretation
+.70 or higher      Very strong positive relationship
+.40 to +.69        Strong positive relationship
+.30 to +.39        Moderate positive relationship
+.20 to +.29        Weak positive relationship
+.01 to +.19        No or negligible relationship
0                   No relationship [zero correlation]
-.01 to -.19        No or negligible relationship
-.20 to -.29        Weak negative relationship
-.30 to -.39        Moderate negative relationship
-.40 to -.69        Strong negative relationship
-.70 or lower       Very strong negative relationship
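The table can be turned into a small lookup function. The sketch below is one possible reading of the table: the function name is hypothetical, and values that fall between two printed bands (for example .195) are assigned to the lower band, which is an assumption the table does not settle.

```python
def interpret_r(r):
    """Verbal interpretation of a correlation coefficient, per the table."""
    size = abs(r)  # the size of the relationship ignores the sign
    if size == 0:
        return "No relationship (zero correlation)"
    if size < 0.20:
        return "No or negligible relationship"
    if size < 0.30:
        strength = "Weak"
    elif size < 0.40:
        strength = "Moderate"
    elif size < 0.70:
        strength = "Strong"
    else:
        strength = "Very strong"
    direction = "positive" if r > 0 else "negative"  # the sign gives direction
    return f"{strength} {direction} relationship"


print(interpret_r(0.87))   # Very strong positive relationship
print(interpret_r(-0.25))  # Weak negative relationship
```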
Like the split-half method, the Kuder-Richardson formula also yields a coefficient of internal
consistency. This formula is easy to apply when an item analysis of a test has been made.
Since the item analysis provides a difficulty measure of each test item (the percentage of
examinees who answer each test item correctly), a worksheet for a Kuder-Richardson
solution can easily be prepared. Directions on how to prepare it follow.
Directions for computing the reliability coefficient using the Kuder-Richardson Formula 20:
1. Begin by identifying test items through numbers.
2. Determine the percentage of examinees who answered each item correctly, expressed as a
proportion (for example, .60 rather than 60%). Enter each proportion under column p.
3. Subtract each entry in column p from 1. Enter the result under column q.
4. Multiply each entry in column p by its paired entry in column q. Enter the result under
column pq.
5. Add all the entries in column pq to obtain the sum of pq (∑pq).
6. Substitute the obtained values in the formula.
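The worksheet can be sketched in Python as follows. The formula is not reproduced in this section, so the sketch assumes the standard form of Kuder-Richardson Formula 20, r = (k / (k − 1)) × (1 − ∑pq / SD²), where k is the number of items and SD² is the variance of the examinees' total scores, computed here with N in the denominator. The function name and the item responses are hypothetical.

```python
def kr20(item_responses):
    """Kuder-Richardson Formula 20 reliability (standard form, assumed).

    item_responses: one list per examinee, with 1 for a correct answer
    and 0 for a wrong answer on each item.
    """
    n_examinees = len(item_responses)
    k = len(item_responses[0])                       # step 1: number of items

    # Step 2: proportion answering each item correctly (column p).
    p = [sum(person[i] for person in item_responses) / n_examinees
         for i in range(k)]
    q = [1 - value for value in p]                   # step 3: column q
    sum_pq = sum(a * b for a, b in zip(p, q))        # steps 4 and 5

    # Variance of total scores, with N in the denominator (an assumption).
    totals = [sum(person) for person in item_responses]
    mean_total = sum(totals) / n_examinees
    variance = sum((t - mean_total) ** 2 for t in totals) / n_examinees

    # Step 6: substitute in the formula.
    return (k / (k - 1)) * (1 - sum_pq / variance)


# Hypothetical responses of five examinees to a four-item test.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))  # 0.8
```

For these responses, ∑pq = 0.80 and the variance of the total scores is 2, so r = (4/3)(1 − 0.40) = 0.80.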
Test scores are useless unless given meaning. The following can be done to give
meaning to a set of scores.
a. Scores can be arranged into a frequency distribution or plotted in a histogram.
b. To represent the middle of the group, the median (the 50th percentile), the
arithmetic mean (common average) and the mode can be computed.
c. To represent the spread of scores, statisticians have developed the semi-interquartile
range, half the distance between the 25th and 75th percentiles, and the standard deviation, a
type of average of the deviations of the scores away from the average.
d. The individual score takes on meaning as it is translated into a percentile rank, i.e.,
the percentage of the group the pupil surpassed, or into a standard score, i.e., the pupil's
position in the group in terms of the number of standard deviations above or below the mean.
e. A measure of relationship is given by the correlation coefficient, a numerical index of
"going togetherness". This index is important in describing the precision or reliability of a
test and in describing the accuracy with which a test score predicts some other factor such as
school grades or job success.