EDUC 203 - Assessment in Learning 2 Module (Midterm)
Assessment in Learning 2
EDUC 203
Course Outline
I. Basic Concepts in Assessment
II. Principles of High Quality Assessment
III. Measures of Central Tendency and Variability
IV. Performance-based Assessment
V. Assessment in the Affective Domain
VI. Portfolio Assessment Methods
VII. Educational Evaluation
VIII. Grading and Reporting
Chapter 1 – Basic Concepts in Assessment
At the end of this chapter, the students will be able to:
b. Assessment OF learning
It is usually given at the end of a unit, grading period or a term like a semester. It
is meant to assess learning for grading purposes, thus the term assessment of
learning.
c. Assessment AS learning
It is associated with self-assessment. As the term implies, assessment by itself is
already a form of learning for the students.
As students assess their own work (e.g., a paragraph), alone or with their peers, using
scoring rubrics, they learn on their own what a good paragraph is. At the same time, as
they engage in self-assessment, they learn about themselves as learners and become
aware of how they learn. In short, in assessment AS learning, students set their own
targets and actively monitor and evaluate their learning in relation to those targets. As a
consequence, they become self-directed or independent learners. By assessing their own
learning, they are learning at the same time.
[Concept map: ASSESSMENT branches into three forms. Assessment FOR learning covers
placement, diagnostic and formative assessment; Assessment OF learning covers
summative assessment; Assessment AS learning covers self-assessment.]
Exercises
A. Determine whether each of the following statements refers to a test, measurement,
assessment or evaluation.
1. Over-all goal is to provide information regarding the extent of attainment of
student learning outcomes.
2. Uses such instruments as ruler, scale, or thermometer.
3. Process designed to aid educators in making judgments and indicating solutions to
academic situations.
4. Results show the more permanent learning and give a clear picture of the student's
ability.
5. Instrument to gather data.
B. "All tests are forms of assessment, but not all assessments are tests." Do you
agree? Why or why not?
C. Assessment for learning is "when the cook tastes the food" while assessment of
learning is "when the guest tastes the food." Do you agree? Why or why not?
D. List down three (3) activities or processes involved in each of the following:
1. Measurement
(a)
(b)
(c)
2. Assessment
(a)
(b)
(c)
3. Evaluation
(a)
(b)
(c)
Chapter 2 – Principles of High Quality Assessment
At the end of this chapter, the students will be able to:
2.1.1 Cognitive Targets

Bloom's taxonomy identifies six levels of cognitive targets:

Level 1. Knowledge refers to the recall of previously learned material such as facts, terms
and basic concepts.
Level 2. Comprehension refers to grasping the meaning of material, for instance by
explaining or summarizing a concept in one's own words.
Level 3. Application refers to the transfer of knowledge from one field of study to another or
from one concept to another concept in the same discipline.
Level 4. Analysis refers to the breaking down of a concept or idea into its components and
explaining the concept as a composition of these components.
Level 5. Synthesis refers to the opposite of analysis and entails putting together the
components in order to summarize the concept.
Level 6. Evaluation refers to valuing and judgment or putting worth to a concept or principle.
2.1.2 Skills, Competencies and Abilities Targets
Skills refer to specific activities or tasks that a student can proficiently do. Skills can be
clustered together to form specific competencies. Related competencies characterize a
student's ability. It is important to recognize a student's abilities so that the program of
study can be designed to optimize his/her innate abilities.
Abilities can be roughly categorized into: cognitive, psychomotor and affective abilities. For
instance, the ability to work well with others and to be trusted by every classmate (affective
ability) is an indication that the student can most likely succeed in work that requires
leadership abilities. On the other hand, other students are better at doing things alone like
programming and web designing (cognitive ability) and, therefore, they would be good at
highly technical individualized work.
A self-checklist is a list of several characteristics or activities presented to the subjects of
a study. The individuals are asked to study the list and then to place a mark opposite the
characteristics which they possess or the activities which they have engaged in for a
particular length of time. Self-checklists are often employed by teachers when they want to
diagnose or to appraise the performance of students from the point of view of the students
themselves.
Observation and self-reports are useful supplementary assessment methods when used in
conjunction with oral questioning and performance tests. Such methods can offset the
negative impact on the students brought about by their fears and anxieties during oral
questioning or when performing an actual task under observation. However, since there is a
tendency to overestimate one's capabilities, it may be useful to consider weighing self-
assessment and observational reports against the results of oral questioning and
performance tests.
2.3.1 Validity
2.3.2 Reliability
Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield
valid outcomes. As reliability improves, validity may also improve (or it may not); however,
if an instrument is shown scientifically to be valid, then it is almost certain that it is also reliable.
Something reliable is something that works well and that you can trust.
A reliable test is a consistent measure of what it is supposed to measure.
The following table is a standard followed almost universally in educational test and
measurement.
Reliability        Interpretation
0.90 and above     Excellent reliability; at the level of the best standardized tests
0.80 – 0.90        Very good for a classroom test
0.70 – 0.80        Good for a classroom test; in the range of most classroom tests.
                   There are probably a few items which could be improved
0.60 – 0.70        Somewhat low. This test needs to be supplemented by other
                   measures (more tests) to determine grades. There are probably
                   some items which could be improved
0.50 – 0.60        Suggests need for revision of the test, unless it is quite short (ten
                   or fewer items). The test definitely needs to be supplemented by
                   other measures (more tests) for grading
Below 0.50         Questionable reliability. This test should not contribute heavily to
                   the course grade and it needs revision
a. Test-Retest Reliability

To assess a test's test-retest reliability, the same test is administered twice to the same
group of examinees and the two sets of scores are correlated. This method is appropriate
for a test that measures a stable characteristic, but not for a test of mood, since mood
fluctuates over time, or a test of creativity, which might be affected by previous exposure
to test items.
b. Alternate (Equivalent, Parallel) Forms Reliability
To assess a test's alternate forms reliability, two equivalent forms of the test are
administered to the same group of examinees and the two sets of scores are correlated.
Alternate forms reliability indicates the consistency of responding to different item samples
(the two test forms) and, when the forms are administered at different times, the
consistency of responding over time.
The alternate forms reliability coefficient is also called the coefficient of equivalence when
the two forms are administered at about the same time, and the coefficient of equivalence
and stability when a relatively long period of time separates administration of the two forms.
The primary source of measurement error for alternate forms reliability is content sampling,
or error introduced by an interaction between different examinees' knowledge and the
different content assessed by the items included in the two forms (e.g., Form A and Form
B). The items in Form A might be a better match of one examinee's knowledge than items
in Form B, while the opposite is true for another examinee.
In this situation, the two scores obtained by each examinee will differ, which will lower the
alternate forms reliability coefficient. When administration of the two forms is separated by
a period of time, time sampling factors also contribute to error.
Like test-retest reliability, alternate forms reliability is not appropriate when the attribute
measured by the test is likely to fluctuate over time (and the forms will be administered at
different times) or when scores are likely to be affected by repeated measurement.
If the same strategies required to solve problems on Form A are used to solve problems
on Form B, even if the problems on the two forms are not identical, there are likely to be
practice effects. When these effects differ for different examinees (i.e., are random),
practice will serve as a source of measurement error.
Although alternate forms reliability is considered by some experts to be the most rigorous
(and best) method for estimating reliability, it is not often assessed due to the difficulty in
developing forms that are truly equivalent.
c. Internal Consistency Reliability
Reliability can also be estimated by measuring the internal consistency of a test.
Split-half reliability and coefficient alpha are two methods for evaluating internal
consistency. Both involve administering the test once to a single group of examinees, and
both yield a reliability coefficient that is also known as the coefficient of internal
consistency.
To determine a test's split-half reliability, the test is split into equal halves so that each
examinee has two scores (one for each half of the test). Scores on the two halves are then
correlated. Tests can be split in several ways, but probably the most common way is to
divide the test on the basis of odd- versus even-numbered items.
A problem with the split-half method is that it produces a reliability coefficient that is based
on test scores that were derived from one-half of the entire length of the test. If a test
contains 30 items, each score is based on 15 items. Because reliability tends to decrease
as the length of a test decreases, the split-half reliability coefficient usually underestimates
a test's true reliability.
For this reason, the split-half reliability coefficient is ordinarily corrected using the
Spearman-Brown prophecy formula, which provides an estimate of what the reliability
coefficient would have been had it been based on the full length of the test.
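These computations are easy to script. The following is a minimal Python sketch of the
split-half procedure with the Spearman-Brown correction; the function names and the
sample score matrix are our own illustration, not part of the module.

    # Minimal sketch: split-half reliability with the Spearman-Brown
    # correction. Data and names are illustrative only.

    def pearson_r(x, y):
        # Pearson correlation between two equal-length lists of scores.
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    def split_half_reliability(item_scores):
        # item_scores: one list of 0/1 item scores per examinee.
        # Split into odd- vs even-numbered items, correlate the half
        # scores, then step the estimate up to the full test length.
        odd = [sum(row[0::2]) for row in item_scores]
        even = [sum(row[1::2]) for row in item_scores]
        r_half = pearson_r(odd, even)
        return 2 * r_half / (1 + r_half)

    scores = [[1, 0, 1, 1], [1, 1, 1, 0], [0, 0, 1, 0],
              [1, 1, 1, 1], [0, 1, 0, 0]]
    print(round(split_half_reliability(scores), 2))  # 0.78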
Cronbach's coefficient alpha also involves administering the test once to a single group
of examinees. However, rather than splitting the test in half, a special formula is used to
determine the average degree of inter-item consistency.
One way to interpret coefficient alpha is as the average reliability that would be obtained
from all possible splits of the test. Coefficient alpha tends to be conservative and can be
considered the lower boundary of a test's reliability (Novick and Lewis, 1967).
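In computational form, coefficient alpha compares the sum of the individual item variances
with the variance of the total scores. Here is a minimal Python sketch; the function name
and the item-score matrix are hypothetical.

    # Minimal sketch of Cronbach's coefficient alpha.
    def cronbach_alpha(item_scores):
        # item_scores: one list of item scores per examinee.
        k = len(item_scores[0])  # number of items

        def variance(values):
            m = sum(values) / len(values)
            return sum((v - m) ** 2 for v in values) / len(values)

        item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
        total_var = variance([sum(row) for row in item_scores])
        return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

    items = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 1], [0, 0, 0]]
    print(round(cronbach_alpha(items), 2))  # 0.75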
When test items are scored dichotomously (right or wrong), a variation of coefficient alpha
known as the Kuder-Richardson Formula 20 (KR-20) can be used.
The Kuder-Richardson formulas, particularly KR-20 and KR-21, are the more frequently
employed for determining internal consistency. We present the latter formula (KR-21)
since KR-20 is more difficult to calculate by hand and usually requires a computer program:

KR21 = (K / (K − 1)) × [1 − M(K − M) / (K × s²)]

where K is the number of items, M is the mean score, and s is the standard deviation of
the scores.

Example: A 30-item test was administered to a group of 30 students. The mean score was
25 while the standard deviation was 3. Compute the KR-21 index of reliability.

So,

KR21 = (30 / 29) × [1 − 25(30 − 25) / (30 × 3²)]
KR21 = 1.0345 × (1 − 125/270)
KR21 ≈ 0.56
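The same computation can be scripted; the short Python sketch below (the function name
is ours) reproduces the result.

    # Minimal sketch of the KR-21 computation from the example above.
    def kr21(k, mean, sd):
        # k = number of items, mean = mean score, sd = standard deviation
        return (k / (k - 1)) * (1 - mean * (k - mean) / (k * sd ** 2))

    print(round(kr21(30, 25, 3), 2))  # 0.56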
Content sampling is a source of error for both split-half reliability and coefficient alpha.
For split-half reliability, content sampling refers to the error resulting from differences
between the content of the two halves of the test (i.e., the items included in one half may
better fit the knowledge of some examinees than items in the other half);
For coefficient alpha, content (item) sampling refers to differences between individual test
items rather than between test halves. Coefficient alpha also has as a source of error, the
heterogeneity of the content domain. A test is heterogeneous with regard to content
domain when its items measure several different domains of knowledge or behavior.
The greater the heterogeneity of the content domain, the lower the inter-item correlations
and the lower the magnitude of coefficient alpha.
Coefficient alpha could be expected to be smaller for a 200-item test that contains items
assessing knowledge of test construction, statistics, ethics, epidemiology, environmental
health, social and behavioral sciences, rehabilitation counseling, etc. than for a 200-item
test that contains questions on test construction only.
The methods for assessing internal consistency reliability are useful when a test is
designed to measure a single characteristic, when the characteristic measured by the test
fluctuates over time, or when scores are likely to be affected by repeated exposure to the
test. They are not appropriate for assessing the reliability of speed tests because, for these
tests, they tend to produce spuriously high coefficients. (For speed tests, alternate forms
reliability is usually the best choice.)
d. Inter-Rater (Inter-scorer, Inter-Observer) Reliability
Inter-rater reliability is of concern whenever test scores depend on a rater's judgment.
A test constructor would want to make sure that an essay test, a behavioral observation
scale, or a projective personality test has adequate inter-rater reliability. This type of
reliability is assessed either by calculating a correlation coefficient (e.g., a kappa
coefficient or coefficient of concordance) or by determining the percent agreement
between two or more raters.
Although the latter technique is frequently used, it can lead to erroneous conclusions since
it does not take into account the level of agreement that would have occurred by chance
alone. This is a particular problem for behavioral observation scales that require raters to
record the frequency of a specific behavior.
In this situation, the degree of chance agreement is high whenever the behavior has a high
rate of occurrence, and percent agreement will provide an inflated estimate of the
measure's reliability.
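A short Python sketch (the raters and ratings are hypothetical) makes the point concrete:
when one behavior dominates, percent agreement looks respectable while Cohen's kappa,
which subtracts chance agreement, can even fall below zero.

    # Minimal sketch: percent agreement vs. Cohen's kappa.
    def percent_agreement(r1, r2):
        return sum(a == b for a, b in zip(r1, r2)) / len(r1)

    def cohens_kappa(r1, r2):
        n = len(r1)
        po = percent_agreement(r1, r2)  # observed agreement
        # pe = agreement expected by chance, from each rater's base rates
        pe = sum((r1.count(c) / n) * (r2.count(c) / n)
                 for c in set(r1) | set(r2))
        return (po - pe) / (1 - pe)

    # A high-rate behavior: both raters say "on task" almost every interval.
    rater1 = ["on", "on", "on", "off", "on", "on", "on", "on"]
    rater2 = ["on", "on", "on", "on", "on", "on", "off", "on"]
    print(percent_agreement(rater1, rater2))       # 0.75
    print(round(cohens_kappa(rater1, rater2), 2))  # -0.14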
Sources of error for inter-rater reliability include factors related to the raters such as lack of
motivation and rater biases and characteristics of the measuring device.
An inter-rater reliability coefficient is likely to be low, for instance, when rating categories
are not exhaustive (i.e., don't include all possible responses or behaviors) and/or are not
mutually exclusive.
The inter-rater reliability of a behavioral rating scale can also be affected by consensual
observer drift, which occurs when two (or more) observers working together influence each
other's ratings so that they both assign ratings in a similarly idiosyncratic way. (Observer
drift can also affect a single observer's ratings when he or she assigns ratings in a
consistently deviant way.) Unlike other sources of error, consensual observer drift tends to
artificially inflate inter-rater reliability.
The reliability (and validity) of ratings can be improved in several ways:
Consensual observer drift can be eliminated by having raters work
independently or by alternating raters.
Rating accuracy is also improved when raters are told that their ratings will
be checked.
Overall, the best way to improve both inter- and intra-rater accuracy is to
provide raters with training that emphasizes the distinction between
observation and interpretation.
Factors that affect the Reliability Coefficient
The magnitude of the reliability coefficient is affected not only by the sources of error
discussed earlier, but also by the length of the test, the range of the test scores, and the
probability that the correct response to items can be selected by guessing.
a. Test Length - The larger the sample of the attribute being measured by a test, the less
the relative effects of measurement error and the more likely the sample will provide
dependable, consistent information.
Consequently, a general rule is that the longer the test, the larger the test's reliability
coefficient.
The Spearman-Brown prophecy formula is most associated with split-half reliability but can
actually be used whenever a test developer wants to estimate the effects of lengthening or
shortening a test on its reliability coefficient.
For instance, if a 100-item test has a reliability coefficient of .84, the Spearman-Brown
formula could be used to estimate the effects of increasing the number of items to 150 or
reducing the number to 50. A problem with the Spearman-Brown formula is that it does not
always yield an accurate estimate of reliability: In general, it tends to overestimate a test's
true reliability. This is most likely to be the case when the added items do not measure the
same content domain as the original items and/or are more susceptible to the effects of
measurement error.
Note that, when used to correct the split-half reliability coefficient, the situation is more
complex, and this generalization does not always apply: When the two halves are not
equivalent in terms of their means and standard deviations, the Spearman-Brown formula
may either over- or underestimate the test's actual reliability.
r_kk = (k × r11) / (1 + (k − 1) × r11)

where
r_kk = reliability of a test "k" times as long as the original test
r11 = reliability of the original test
k = factor by which the length of the test is changed. To find k, divide the number
of items on the new test by the number of items on the original. If you had 10 items
on the original and 20 on the new, k would be 20 / 10 = 2.
Example:

A test made up of 12 items has reliability (r11) of 0.68. If the number of items is doubled to
24, will the reliability of the test improve?

Solution:

r11 = 0.68
k = 24 / 12 = 2

So,

r_kk = (2 × 0.68) / (1 + (2 − 1) × 0.68) = 1.36 / 1.68 ≈ 0.81

Yes, doubling the length of the test raises the estimated reliability from 0.68 to about 0.81.
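The same computation as a short Python sketch (the function name is ours):

    # Minimal sketch of the Spearman-Brown prophecy formula.
    def spearman_brown(r11, k):
        # r11 = reliability of the original test
        # k = new length divided by original length
        return (k * r11) / (1 + (k - 1) * r11)

    print(round(spearman_brown(0.68, 24 / 12), 2))  # 0.81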
2.3.3 Fairness
An assessment procedure needs to be fair. This means many things:
First, students need to know exactly what the learning targets are and what method of
assessment will be used. If students do not know what they are supposed to be achieving,
then they could get lost in the maze of concepts being discussed in class. Likewise,
students have to be informed how their progress will be assessed in order to allow them to
strategize and optimize their performance.
Second, test results and assessment results are confidential. They should be known only
by the student concerned and the teacher. Results should be communicated to the
students in such a way that other students would not be in possession of information
pertaining to any specific member of the class.
The third ethical issue in assessment is deception. Should students be deceived? There
are instances in which it is necessary to conceal the objective of the assessment from the
students in order to ensure fair and impartial results. When this is the case, the teacher
has a special responsibility to determine whether the use of such techniques is justified by
the educational value of the assessment, to determine whether alternative procedures that
do not make use of concealment are available, and to ensure that students are provided
with a sufficient explanation as soon as possible.
Finally, the temptation to assist certain individuals in class during assessment or testing is
ever present. In this case, it is best if the teacher does not administer the test himself if he
believes that such a concern may, at a later time, be considered unethical.
Exercises
A. Classify the cognitive objectives below in terms of Bloom’s taxonomy.
1. Identify the parts of a flower.
2. Enumerate the characteristics of a good test.
3. Determine the function of a predicate in a sentence.
4. Summarize the salient features of a good essay.
5. Use the concept of ratio and proportion in finding the height of a building.
6. Name the past presidents of the Philippines.
7. Determine the sufficiency of information given to solve a problem.
8. Identify the resulting product of a chemical reaction.
9. Select a course of action to be taken in the light of possible consequences.
10. Enumerate the parts of a cell.
B. A test may be reliable but not necessarily valid. Is it possible for a test to be valid
but not reliable? Discuss.
C. A 50-item test was administered to a group of 20 students. The mean score was 35
while the standard deviation was 5.5. Compute the KR-21 index of reliability.
D. Answer the following questions:
1. Ms. Plantilla developed an Achievement Test in Math for her grade three pupils. Before
she finalized the test she examined carefully if the test items were constructed based on
the competencies that have to be tested. What test of validity was she trying to establish?
a. Content-validity
b. Concurrent validity
c. Predictive validity
d. Construct validity
2. What type of validity does the Pre-board examination possess if its results can
explain how the students will likely perform in their licensure examination?
a. Concurrent
b. Predictive
c. Construct
d. Content
3. The students of Mrs. Valino are very noisy. To keep them busy, they were
given any test available in the classroom and then the results were graded as a
way to punish them. Which statement best explains if the practice is acceptable or
not?
a. The practice is acceptable because the students behaved well when
they were given a test.
b. The practice is not acceptable because it violates the principle of
reliability.
c. The practice is not acceptable because it violates the principle of
validity.
d. The practice is acceptable since the test results are graded.
4. Mr. Gringo tried to correlate the scores of his pupils in the Social studies test
with their grades in the same subject last 3rd quarter. What test validity is he trying
to establish?
a. Content validity
b. Construct validity
c. Concurrent validity
d. Criterion related validity
5. Which of the following situations may lower the validity of test?
a. Mrs. Josea increases the number of items measuring each specific skill
from three to five.
b. Mr. Santosa simplifies the language in the directions for the test.
c. Ms. Lopeza removes the items in the achievement test that everyone
would be able to answer correctly.
d. None of the above.
Chapter 3 – Measures of Central Tendency and Variability
At the end of this chapter, the students will be able to:
1. Explain the meaning and function of the measures of central tendency and
measures of dispersion/variability.
2. Distinguish among the measures of central tendency and measures of
variability/dispersion.
3. Explain the meaning of normal and skewed score distributions.
4. Compute the values of the different measures of central tendency and
measures of variability.
3.1 Introduction
A measure of central tendency is a single value that attempts to describe a set of data
(like scores) by identifying the central position within that set of data or scores. As such,
measures of central tendency are sometimes called measures of central location.
Central Tendency refers to the center of a distribution of observations. Where do scores
tend to congregate? In a test of 100 items, where are most of the scores? Do they tend to
group around a score of 50 or around 80?
There are three measures of central tendency – the mean, median and the mode.
Perhaps you are most familiar with the mean (often called the average). But there are two
other measures of central tendency, namely the median and the mode. Is there such a
thing as the best measure of central tendency?
If the measures of central tendency indicate where scores congregate, the measures of
variability indicate how spread out a group of scores is, how varied the scores are, or
how far they are from the mean. Common measures of dispersion or variability are the
range, the variance and the standard deviation.
The mean, median and mode are all valid measures of central tendency but, under
different conditions, one measure becomes more appropriate than the others. For
example, if the distribution contains extremely high or extremely low scores, the median is
a better measure of central tendency since the mean is affected by such extreme scores.
Mean
The mean or average or arithmetic mean is the most popular and most well-known
measure of central tendency. The mean is equal to the sum of all the values in the data set
divided by the number of values in the data set. For example, 10 students in a Graduate
School class got the following scores in a 100-item test: 70, 72, 75, 77, 78, 80, 84, 87, 90
and 92.
The mean score of the group of 10 students is the sum of all their scores divided by 10.
The mean, therefore, is 805/10 = 80.5.
80.5 is the average score of the group. There are 6 scores below the average score
(mean) of the group (70, 72, 75, 77, 78 and 80) and there are 4 scores above the mean of
the group (84, 87, 90 and 92).
The mean has one main disadvantage. It is particularly susceptible to the influence of
outliers. These are values that are unusual compared to the rest of the data set by being
especially small or large in numerical value.
For example, consider the scores of 10 Grade 12 students in a 100-item Statistics test
below:
5 38 56 60 67 70 73 78 79 95
The mean score for these ten Grade 12 students is 62.1. However, inspecting the raw data
suggests that this mean may not accurately reflect the score of the typical Grade 12
student: most students scored 56 or higher, and the mean is being pulled down by the
extremely low score of 5. Therefore, in this situation, we would like to have a better
measure of central tendency. As we will find out later, the median would be a better
measure of central tendency here.
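As a quick check, Python's statistics module reproduces both figures for these scores:

    # Quick check of the example above.
    from statistics import mean, median

    scores = [5, 38, 56, 60, 67, 70, 73, 78, 79, 95]
    print(mean(scores))    # 62.1 - pulled down by the outlier score of 5
    print(median(scores))  # 68.5 - closer to where most scores cluster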
Median
The median is the middle score for a set of scores arranged from lowest to highest. The
median is less affected by extremely low and extremely high scores. Consider the eleven
scores below:
65 55 89 56 35 14 56 55 87 45 92
To determine the median, first we have to rearrange the scores into order of magnitude
(from smallest to largest).
14 35 45 55 55 56 56 65 87 89 92
Our median is the score at the middle of the distribution. In this case 56 is the middle
score. There are 5 scores before it and 5 scores after it. This works fine when you have an
odd number of scores, but what happens when you have an even number of scores? What
if you have 10 scores like the scores below?
65 55 89 56 35 14 56 55 87 45
Arrange the data in order of magnitude (from smallest to largest), then take the two middle
scores (55 and 56) and compute their average. The median is 55.5. This gives us a more
reliable picture of the central tendency of the scores.
Mode
This is the simplest measure both in concept and in application. By definition, the mode is
the most frequent value in the distribution. We shall use the symbol x̂ (read as "x-hat") to
represent the mode. Consider the scores below:

14 35 45 55 55 56 56 65 84 89
There are two most frequent scores, 55 and 56, so we have a score distribution with two
modes; hence, a bimodal distribution.
Mean

For grouped data, the measures of central tendency are computed from the frequency
distribution. The mean is given by the formula:

x̄ = Σ(fX) / n

where
f – frequency of each class
X – midpoint of each class interval
n – sample size

To be able to apply the formula for the mean of grouped data, we shall follow the steps
below:

Step 1. Get the midpoint of each class interval.
Step 2. Multiply each midpoint by its corresponding frequency.
Step 3. Add the products obtained in Step 2.
Step 4. Divide the result in Step 3 by the sample size.
Median

Just like the mean, the computation of the value of the median is done through
interpolation. The procedure requires the construction of the less than cumulative
frequency column (<cf). The first step in finding the value of the median is to divide the
total number of frequencies by 2. This is consistent with the definition of the median. The
value n/2 shall be used to determine the cumulative frequency before the median class,
denoted by cf_b. cf_b refers to the highest value under the <cf column that is less than
n/2. The median class refers to the interval that contains the median, that is, where the
(n/2)-th value is located. Hence, among the entries under the <cf column which are
greater than cf_b, the smallest belongs to the median class. If a distribution contains an
interval where the cumulative frequency is exactly n/2, the upper boundary of that class
will be the median and no interpolation is needed.

After identifying the median class, we shall approximate the position of the median within
the median class. This approximation is done by subtracting the value of cf_b from n/2,
dividing the difference by the frequency of the median class, and multiplying the result by
the size of the class interval. The result is then added to the lower boundary of the median
class to get the median of the distribution.

The computing formula for the median for grouped data is given below.

x̃ = LB + ((n/2 − cf_b) / f_m) × i

where
LB – lower boundary of the median class
n – sample size
cf_b – cumulative frequency before the median class
f_m – frequency of the median class
i – size of the class interval

To be able to apply the formula for the median for grouped data, we shall follow the steps
below:

Step 1. Get n/2.
Step 2. Determine the value of cf_b.
Step 3. Determine the median class.
Step 4. Determine the lower boundary and the frequency of the median class and the size
of the class interval.
Step 5. Substitute the values obtained in Steps 1 – 4 into the formula. Round off the final
result to two decimal places.
Mode

In the computation of the value of the mode for grouped data, it is necessary to identify the
class interval that contains the mode. This interval, called the modal class, contains the
highest frequency in the distribution. The next step after getting the modal class is to
determine the mode within the class. This value may be approximated by getting the
differences of the frequency of the modal class to the frequency before and to the
frequency after the modal class. If we let d1 be the difference between the frequency of
the modal class and the frequency of the interval preceding the modal class, and d2 be the
difference between the frequency of the modal class and the frequency of the interval after
the modal class, then the mode within the class shall be approximated using the expression:

(d1 / (d1 + d2)) × i

If this expression is added to the lower boundary of the modal class, then we can come up
with the computing formula for the value of the mode for grouped data. The formula is:

x̂ = LB + (d1 / (d1 + d2)) × i

where LB is the lower boundary of the modal class and i is the size of the class interval.

To be able to apply the formula for the mode for grouped data, we shall consider the
following steps:
Step 1. Determine the modal class.
Step 2. Get the value of d1.
Step 3. Get the value of d2.
Step 4. Get the lower boundary of the modal class.
Step 5. Apply the formula by substituting the values obtained in the preceding steps.
Try this!

Consider the frequency distribution of scores below.

Scores      f
11–22       3
23–34       5
35–46      11
47–58      19
59–70      14
71–82       6
83–94       2

To be able to compute the value of the mean, we shall follow the steps discussed earlier.

Step 1. Get the midpoint of each class. The midpoints are shown in the 3rd column.

Scores      f      X
11–22       3    16.5
23–34       5    28.5
35–46      11    40.5
47–58      19    52.5
59–70      14    64.5
71–82       6    76.5
83–94       2    88.5
Step 2. Multiply each midpoint by its corresponding frequency. The products are shown in
the 4th column.

Scores      f      X      fX
11–22       3    16.5    49.5
23–34       5    28.5   142.5
35–46      11    40.5   445.5
47–58      19    52.5   997.5
59–70      14    64.5   903.0
71–82       6    76.5   459.0
83–94       2    88.5   177.0

Step 3. Add the products obtained in Step 2.

ΣfX = 49.5 + 142.5 + 445.5 + 997.5 + 903 + 459 + 177 = 3,174
Step 4. Divide the result in Step 3 by the sample size. The result is the mean of the
distribution. Hence,

x̄ = ΣfX / n = 3,174 / 60 = 52.9

To compute for the median, we shall construct the less than cumulative frequency (<cf)
column. We can use the existing table from when we solved for the mean.
Scores      f      X      fX    <cf
11–22       3    16.5    49.5     3
23–34       5    28.5   142.5     8
35–46      11    40.5   445.5    19
47–58      19    52.5   997.5    38   ← Median class
59–70      14    64.5   903.0    52
71–82       6    76.5   459.0    58
83–94       2    88.5   177.0    60

Step 1. n/2 = 60/2 = 30.
Step 2. cf_b = 19, the highest <cf entry that is less than 30.
Step 3. The median class is 47–58, since its <cf (38) is the smallest entry greater than 30.
Step 4. LB = 46.5, f_m = 19, and i = 12.
Step 5. Substitute the values into the formula:

x̃ = 46.5 + ((30 − 19) / 19) × 12
x̃ = 46.5 + 6.95
x̃ = 53.45
To compute for the mode, we can still use the existing table.

Scores      f      X      fX    <cf
11–22       3    16.5    49.5     3
23–34       5    28.5   142.5     8
35–46      11    40.5   445.5    19
47–58      19    52.5   997.5    38   ← Modal class
59–70      14    64.5   903.0    52
71–82       6    76.5   459.0    58
83–94       2    88.5   177.0    60

The modal class is 47–58, the interval with the highest frequency. Hence, d1 = 19 − 11 = 8,
d2 = 19 − 14 = 5, and LB = 46.5.

x̂ = 46.5 + (8 / (8 + 5)) × 12
x̂ = 46.5 + 7.38
x̂ = 53.88
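All three grouped-data results can be verified with a short Python sketch. The variable
names are our own, and the code assumes the modal class is neither the first nor the last
interval.

    # Minimal sketch: grouped mean, median and mode.
    # Each class is (lower limit, upper limit, frequency).
    classes = [(11, 22, 3), (23, 34, 5), (35, 46, 11), (47, 58, 19),
               (59, 70, 14), (71, 82, 6), (83, 94, 2)]
    n = sum(f for _, _, f in classes)   # 60
    i = 12                              # size of the class interval

    # Mean: sum of midpoint times frequency, divided by n.
    mean = sum(((lo + hi) / 2) * f for lo, hi, f in classes) / n

    # Median: interpolate within the class containing the (n/2)-th score.
    cum = 0
    for lo, hi, f in classes:
        if cum + f >= n / 2:
            median = (lo - 0.5) + ((n / 2 - cum) / f) * i
            break
        cum += f

    # Mode: interpolate within the class with the highest frequency.
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))
    d1 = freqs[m] - freqs[m - 1]
    d2 = freqs[m] - freqs[m + 1]
    mode = (classes[m][0] - 0.5) + (d1 / (d1 + d2)) * i

    print(round(mean, 2), round(median, 2), round(mode, 2))  # 52.9 53.45 53.88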
3.2.3 Comparison
Although there are many types of averages, the three measures that were discussed are
considered the simplest and the most important of all.
In the case of the mean, the following are some of the observations that can be made.
a) The mean always exists in any distribution. This implies that for any set of data,
the mean can always be computed.
b) The value of the mean in any distribution is unique. This implies that for any
distribution, there is only one possible value of the mean.
c) The computation of the mean takes into consideration all the values in the
distribution.

In the case of the median, we have the following observations.

a) Like the mean, the median also exists in any distribution.
b) The value of the median is also unique.
c) The median is a positional measure.

For the third measure, the mode has the following characteristics.

a) It does not always exist.
b) If the mode exists, it is not always unique.
c) The determination of the mode does not take into account all the values in the
distribution.
Skewness
Of the three measures of central tendency, the mean is considered the most important.
Since all values are considered in the computation, it can be used in higher statistical
treatment.
There are some instances, however, when the mean is not a good representative of a set
of data. This happens when a set of data contains extreme values either to the left or to
the right of the average. In this situation, the value of the mean is pulled to the direction of
these extreme values. Thus, the median should be used instead.
When a set of data is symmetric or normally distributed, the three measures are
identical or approximately equal. When the distribution is skewed, that is, either negatively
or positively skewed, the three averages diverge. In any case, however, the value of the
median will always be between the mode and the mean.
A set of data is said to be positively skewed when the graph of the distribution has a
longer tail to the right. The data is said to be negatively skewed when the longer tail is
at the left.
3.3 Measures of Variability
The measures of central tendency discussed earlier simply approximate the central value
of the distribution but such descriptions are not enough to be able to adequately describe
the characteristics of a set of data. Hence, there is a need to consider how the values are
scattered on either side of the center. Values used to determine the scatter of values in a
distribution are called measures of variation. We will discuss in this part the range, the
variance and the standard deviation.
3.3.1 Range
Among the measures of variation, the range is considered the simplest. Earlier, we defined
the range as the difference between the highest and the lowest value in the distribution.
For example, if the lowest value in the distribution is 12 and the highest value is 125, then
the range is the difference between 125 and 12 which is 113. In symbols, if we let R be the
range, then
R = H – L

where H represents the highest value and L represents the lowest value.
In the case of grouped data, the difference between the highest upper class boundary and
the lowest lower class boundary is considered the range. The rationale is that the class
boundaries are considered the true limits.
The range, of course, has some disadvantages. First, this value is always affected by
extreme values. Second, in the process of computing the value of the range, not all values
are considered. Thus, the range does not consider the variation of the items relative to the
central value of the distribution.
3.3.2 Variance
Variability can also be defined in terms of how close the scores in the distribution are to the
middle of the distribution. Using the mean as the measure of the middle of the distribution,
the variance is defined as the average squared difference of the scores from the mean.
The formula for the variance of grouped data is given below:

σ² = Σf(X − x̄)² / n

where
f – frequency of each class
X – midpoint of each class interval
x̄ – mean
n – sample size
To be able to apply the formula for the variance, we shall consider the steps below:

Step 1. Compute the value of the mean.
Step 2. Determine the deviations (X − x̄) by subtracting the mean from the midpoint
of each class interval.
Step 3. Square the deviations obtained in Step 2.
Step 4. Multiply the frequencies by their corresponding squared deviations.
Step 5. Add the results in Step 4.
Step 6. Divide the result in Step 5 by the sample size.

3.3.3 Standard Deviation

σ = √(Σf(X − x̄)² / n)

or simply, the standard deviation is just the square root of the variance.
Try this!
Compute the Range, Variance and Standard Deviation of the example given earlier
(Computation of Measures of Central Tendency).
Range
R = H – L = 94 – 11 = 83
Variance
First, we will reproduce the frequency distribution. Applying the steps stated before, we
have

Scores      f      X      fX     X − x̄    (X − x̄)²   f(X − x̄)²
11–22       3    16.5    49.5    -36.4    1324.96     3974.88
23–34       5    28.5   142.5    -24.4     595.36     2976.80
35–46      11    40.5   445.5    -12.4     153.76     1691.36
47–58      19    52.5   997.5     -0.4       0.16        3.04
59–70      14    64.5   903.0     11.6     134.56     1883.84
71–82       6    76.5   459.0     23.6     556.96     3341.76
83–94       2    88.5   177.0     35.6    1267.36     2534.72

Σf(X − x̄)² = 16,406.40

σ² = Σf(X − x̄)² / n = 16,406.40 / 60 = 273.44
Standard Deviation

It is just the square root of the variance, so

σ = √273.44
σ = 16.54
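A short Python sketch (variable names are ours) reproduces both results from the
frequency distribution:

    # Minimal sketch: grouped (population) variance and standard deviation.
    classes = [(11, 22, 3), (23, 34, 5), (35, 46, 11), (47, 58, 19),
               (59, 70, 14), (71, 82, 6), (83, 94, 2)]
    n = sum(f for _, _, f in classes)
    mean = sum(((lo + hi) / 2) * f for lo, hi, f in classes) / n  # 52.9

    # Average squared deviation of the class midpoints from the mean.
    variance = sum(f * ((lo + hi) / 2 - mean) ** 2
                   for lo, hi, f in classes) / n
    sd = variance ** 0.5
    print(round(variance, 2), round(sd, 2))  # 273.44 16.54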
3.3.4 Sample Variance and Sample Standard Deviation
Sometimes, our data are only a sample of the whole population.
Example: Sam has 20 rose bushes, but only counted the flowers on 6 of them.
The population is all 20 rose bushes, and the sample is the 6 bushes that Sam counted
among the 20. If Sam's flower counts are 9, 4, 6, 13, 18 and 13, we can still estimate the
variance and standard deviation of the whole population from the sample.
When we use the sample as an estimate of the whole population, the formula for the
variance changes to:

s² = Σ(x − x̄)² / (n − 1)

and the standard deviation formula is

s = √(Σ(x − x̄)² / (n − 1))

Just remember that the standard deviation will always be the square root of the variance.
The important change in the formula is dividing by "n − 1" instead of "n" (which is called
Bessel's correction); this slightly enlarges the result to correct for the tendency of a sample
to underestimate the spread of the whole population. The symbol also changes to reflect
that we are working on a sample instead of the whole population (σ is changed to s when
using the sample standard deviation).
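Python's statistics module has both versions built in, which makes the effect of Bessel's
correction easy to see on Sam's counts:

    # Population vs. sample standard deviation for Sam's flower counts.
    from statistics import pstdev, stdev

    counts = [9, 4, 6, 13, 18, 13]
    print(round(pstdev(counts), 2))  # 4.72 - divides by n (population formula)
    print(round(stdev(counts), 2))   # 5.17 - divides by n - 1 (sample formula)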
Why take a sample?
Mostly because it is easier and cheaper. Imagine you want to know what the whole
university thinks. You cannot ask thousands of people, so instead you might ask only 300
or so. As Samuel Johnson once said, "You don't have to eat the whole ox to know that the
meat is tough."
More notes on Standard Deviation
The Standard Deviation is simply the square root of the variance. It is an especially useful
measure of variability when the distribution is normal or approximately normal because the
proportion of the distribution within a given number of standard deviations from the mean
can be calculated.
For example, about 68% of the distribution is within one standard deviation of the mean and
approximately 95% of the distribution is within two standard deviations of the mean.
Therefore, if you have a normal distribution with a mean of 50 and a standard deviation of
10, then about 68% of the distribution would be between 50 − 10 = 40 and 50 + 10 = 60.
Similarly, about 95% of the distribution would be between 50 – (2 x 10) = 30 and 50 + (2 x
10) = 70. The symbol for the population standard deviation is σ.
Standard deviation is a measure of dispersion: the more dispersed the data, the less
consistent the data are. A lower standard deviation means that the data are more clustered
around the mean and hence the data set is more consistent.
Exercises
Find the mean, median, mode, range and standard deviation of the distribution below.
Determine also whether it is normally distributed, positively skewed or negatively skewed.
Scores
Chapter 4 – Performance-based Assessment
At the end of this chapter, the students will be able to:
Simpson (1972) built this taxonomy on the work of Bloom and others:
Imitation - Observing and patterning behavior after someone else. Performance
may be of low quality.
Manipulation - Being able to perform certain actions by following instructions and
practicing.
Precision - Refining and becoming more exact. The skill is performed with a high
degree of precision.
Articulation - Two or more skills combined, sequenced, and performed
consistently.
Naturalization - Two or more skills combined, sequenced, and performed
consistently and with ease. The performance is automatic with little physical or
mental exertion.
Information about outcomes is of high importance; where students "end up" matters
greatly. But to improve outcomes, we need to know about the student experience along the
way – about the curricula, teaching, and kind of student effort that lead to particular
outcomes.
Assessment can help us understand which students learn best under what conditions; with
such knowledge comes the capacity to improve the whole of their learning. Process-
oriented performance-based assessment is concerned with the actual task
performance rather than the output or product of the activity.
Learning Competencies
Objectives: The activity aims to enable the students to recite a poem entitled "The Raven"
by Edgar Allan Poe. Specifically:
Notice that the objective starts with a general statement of what is expected of the student
from the task and then breaks down the general objective into easily observable behaviors
when reciting a poem. The specific objectives identified constitute the learning
competencies for this particular task. As in the statement of objectives using Bloom's
taxonomy, the specific objectives also range from simple observable processes to more
complex observable processes, e.g., creating the ambiance of the poem through
appropriate rising and falling intonation. A competency is said to be more complex when it
consists of two or more skills. Examples of competencies:
Recite a poem with feeling using appropriate voice quality, facial expressions and
hand gestures;
Construct an equilateral triangle given three non-collinear points;
Draw and color a leaf with green crayon.
Learning tasks need to be carefully planned. In particular, the teacher must ensure that the
particular learning process to be observed contributes to the overall understanding of the
subject or course. Some generally accepted standards for designing a task include:
Identifying an activity that would entail more or less the same sets of
competencies. If an activity would result in too many possible competencies then
the teacher would have difficulty assessing the student's competency on the task.
Finding a task that would be interesting and enjoyable for the students. Tasks
such as writing an essay are often boring and cumbersome for the students.
For example:
Bring the students to a pond or creek. Ask them to find as many living organisms as they
can living near the pond or creek. Then bring them to the school playground to find as
many living organisms as they can there. Observe how the students develop a system for
finding such organisms, classifying the organisms, and drawing conclusions about the
differences in the biological diversity of the two sites.
A rubric is a scoring scale used to assess student performance along a task-specific set of
criteria. Authentic assessments are typically criterion-referenced measures; that is, a
student's aptitude on a task is determined by matching the student's performance against
a set of criteria to determine the degree to which the student's performance meets the
criteria for the task. To measure student performance against a pre-determined set of
criteria, a rubric, or scoring scale, which contains the essential criteria for the task and
appropriate levels of performance for each criterion, is typically created. For example, the
following rubric covers the recitation portion of a task in English.
Recitation Rubric

[Rubric table: criteria such as "Number of Appropriate Hand Gestures" and "Appropriate
Ambiance (feelings and tone in the voice)" are listed in the left column, each with a weight
and three levels of performance.]
As in the above example, a rubric is composed of two components: criteria and levels of
performance. Each rubric has at least two criteria and at least two levels of performance.
The criteria, characteristics of good performance on a task, are listed in the left column of
the rubric above. As is common in rubrics, a shorthand is used for each criterion to make it
fit easily into the table. The full criteria are statements of performance such as "includes a
sufficient number of hand gestures" and "recitation captures the ambiance through
appropriate feelings and tone in the voice".
For each criterion, the evaluator applying the rubric can determine to what degree the
student has met the criterion, i.e., the level of performance. In the above rubric, there are
three levels of performance for each criterion. For example, the recitation can contain lots
of inappropriate, few inappropriate or no inappropriate hand gestures.
Finally, the rubric above contains a mechanism for assigning a score to each performance.
In the second-to-left column a weight is assigned to each criterion. Students can receive 1,
2, or 3 points for "number of appropriate hand gestures." But appropriate ambiance, more
important in the teacher's mind, is weighted three times (x3) as heavily. So, students can
receive 3, 6, or 9 points (i.e., 1, 2, or 3 times 3) for the level of appropriateness in this task.
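A tiny Python sketch shows how such a weighted total could be computed; the criteria,
ratings and weights here are hypothetical stand-ins for the rubric above.

    # Minimal sketch of a weighted analytic rubric score.
    ratings = {"hand gestures": 2, "voice quality": 3, "ambiance": 3}  # 1-3 each
    weights = {"hand gestures": 1, "voice quality": 2, "ambiance": 3}

    total = sum(ratings[c] * weights[c] for c in ratings)
    maximum = sum(3 * w for w in weights.values())
    print(f"{total}/{maximum}")  # 17/18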
Descriptors
The above rubric includes another common, but not a necessary, component of rubrics:
descriptors. Descriptors spell out what is expected of students at each level of
performance for each criterion. In the above example, "lots of inappropriate hand
gestures" and "monotone voice used" are descriptors. A descriptor tells students more
precisely what performance looks like at each level and how their work may be
distinguished from the work of others for each criterion. Similarly, the descriptors help the
teacher more precisely and consistently distinguish between student works. Well-designed
rubrics thus provide:

1. Clearer expectations
2. More consistent and objective assessment
3. Better feedback
For a particular task you assign students, do you want to be able to assess how well the
students perform on each criterion, or do you want to get a more global picture of the
students' performance on the entire task? The answer to that question is likely to
determine the type of rubric you choose to create or use: analytic or holistic.
[Examples: an analytic rubric scores the recitation on each criterion separately, while a
holistic rubric describes overall levels of performance such as "3 – Excellent Speaker".]
How many levels of performance should a teacher include in his/her rubric?

There is no specific number of levels a rubric should or should not possess. It will vary
depending on the task and your needs. A rubric can have as few as two levels of
performance as long as that is appropriate. Also, it is not true that there must be an even
number or an odd number of levels. Again, that will depend on the situation.
Generally, it is better to start with a smaller number of levels of performance for a criterion
and then expand, if necessary. Making distinctions in student performance across two or
three broad categories is difficult enough. As the number of levels increases and those
judgments become finer and finer, the likelihood of error increases. Thus, start small. For
example, in an oral presentation rubric, amount of eye contact might be an important
criterion. Performance on that criterion could be judged along three levels of performance:
never, sometimes, always.
Although these three levels may not capture all the variation in student performance on the
criterion, it may be sufficient discrimination for your purposes. Or, at the least, it is a place
to start. Upon applying the three levels of performance, you might discover that you can
effectively group your students' performance in these three categories. Furthermore, you
might discover that the labels of never, sometimes and always sufficiently communicate to
your students the degree to which they can improve on making eye contact.
On the other hand, after applying the rubric you might discover that you cannot effectively
discriminate among student performance with just three levels of performance. Perhaps, in
your view, many students fall in between never and sometimes, or between sometimes
and always, or neither label accurately captures their performance. So, at this point, you
may decide to expand the number of levels of performance to include never, rarely,
sometimes, usually and always.
Makes eye contact:   never | rarely | sometimes | usually | always
There is no "right" answer as to how many levels of performance there should be for a
criterion in an analytic rubric; that will depend on the nature of the task assigned, the
criteria being evaluated, the students involved, and your purposes and preferences. For
example, another teacher might decide to leave off the "always" level in the above rubric
because "usually" is as much as normally can be expected or even wanted in some
instances. Thus, the "makes eye contact" portion of the rubric for that teacher might be:

Makes eye contact:   never | rarely | sometimes | usually
Exercises 4.2
A. For each of the following tasks, identify at least three (3) process-oriented learning
competencies:
1. Constructing an angle bisector using a straight edge and a compass
2. Constructing three-dimensional models of solids from cardboards
3. Role playing to illustrate the concept of Filipino family values
B. Choose any five activities below and then construct your own scoring rubrics.
1. Use evidence to solve a mystery.
2. Devise a game.
3. Participate in a debate.
4. Infer the main idea of a written piece.
5. Draw a picture that illustrates what's described in a story or article. Explain
what you have drawn, using details from the story or article.
6. Write a research paper.
7. Apply a scoring rubric to a real or simulated piece of student work.
8. Write an outline of a text or oral report.
9. Propose and justify a way to resolve a problem.
10. Design a museum exhibit.
11. Develop a classification scheme for something and explain and justify the
categories.
12. Justify one point of view on an issue and then justify the opposing view.
13. Given background information, predict what will happen if ____________.
14. Evaluate the quality of a writer‘s arguments.
15. Draw conclusions from a text.
Product-oriented performance-based assessment focuses on the finished product rather
than on the performance of making that product. It is concerned with the product and not
with the process. It also focuses on the achievement of the learner.
The learning competencies associated with products or outputs are linked with an
assessment of the level of "expertise" manifested by the product. Thus, product-oriented
learning competencies target at least three (3) levels: the novice or beginner's level, the
skilled level, and the expert level. Such levels correspond to Bloom's taxonomy in the
cognitive domain in that they represent progressively higher levels of complexity in the
thinking processes.
Level 1: Does the finished product or project illustrate the minimum expected parts
or functions? (Beginner)
Level 2: Does the finished product or project contain additional parts and functions
on top of the minimum requirements which tend to enhance the final output?
(Skilled level)
Level 3: Does the finished product contain the basic minimum parts and functions,
have additional features on top of the minimum, and is aesthetically pleasing?
(Expert level)
Examples:
The desired product is a scrapbook illustrating the historical event called EDSA I People
Power. The scrapbook must:
1. Contain pictures, newspaper clippings and other illustrations for the main
characters of EDSA I People Power namely: Corazon C. Aquino, Fidel V. Ramos,
Juan Ponce Enrile, Ferdinand E. Marcos, Cardinal Sin. – (minimum specifications)
2. Contain remarks and captions for the illustrations made by the student himself for
the roles played by the characters of EDSA 1 People Power – (skilled level)
3. Be presentable, complete, informative and pleasing to the reader of the
scrapbook. – (expert level)
Performance-based assessment for products and projects can also be used for
assessing outputs of short-term tasks such as the one illustrated below for outputs in a
typing class:
a. Complexity. The level of complexity of the project needs to be within the range of
ability of the students. Projects that are too simple tend to be uninteresting for the
students while projects that are too complicated will most likely frustrate them.
b. Appeal. The project or activity must be appealing to the students. It should be
interesting enough so that students are encouraged to pursue the task to
completion. It should lead to self-discovery of information by the students.
c. Creativity. The project needs to encourage students to exercise creativity and
divergent thinking. Given the same set of materials and project inputs, how does
one best present the project? It should lead the students into exploring the various
possible ways of presenting the final output.
d. Goal-Based. Finally, the teacher must bear in mind that the project is produced in
order to attain a learning objective. Thus, projects are assigned to students not just
for the sake of producing something but for the purpose of reinforcing learning.
Scoring rubrics are descriptive scoring schemes that are developed by teachers
or other evaluators to guide the analysis of the products or processes of students' efforts
(Brookhart, 1999). Scoring rubrics are typically employed when a judgment of quality is
required and may be used to evaluate a broad range of subjects and activities. For
instance, scoring rubrics can be most useful in grading essays or in evaluating projects
such as scrapbooks. Judgments concerning the quality of a given writing sample may vary
depending upon the criteria established by the individual evaluator. One evaluator may
heavily weigh the evaluation process upon the linguistic structure, while another evaluator
may be more interested in the persuasiveness of the argument. A high-quality essay is
likely to have a combination of these and other factors. By developing a pre-defined
scheme for the evaluation process, the subjectivity involved in evaluating an essay is
reduced and the process becomes more objective.
Criteria Setting
The criteria for a scoring rubric are statements which identify "what really counts" in
the final output. The following are the most often used major criteria for product
assessment:
Quality
Creativity
Comprehensiveness
Accuracy
Aesthetics
From the major criteria, the next task is to identify substatements that would make the
major criteria more focused and objective. For instance, if we were scoring an essay on
"Three Hundred Years of Spanish Rule in the Philippines", the major criterion "Quality"
may possess the following substatements:
The example below displays a scoring rubric that was developed to aid in the evaluation of
essays written by college students in the classroom (based loosely on Leydens &
Thompson, 1997). The scoring rubric in this particular example exemplifies what is called a
"holistic scoring rubric". It will be noted that each score category describes the
characteristics of a response that would receive the respective score. Describing the
characteristics of responses increases the likelihood that different evaluators will assign
the same score to a given response. In effect, this increases the objectivity of the
assessment procedure using rubrics. In the language of test and measurement, we are
actually increasing the "inter-rater reliability".
The document can be easily followed. A combination of the following is apparent
in the document:
1. Effective transitions are used throughout.
2. A professional format is used.
3. The graphics are descriptive and clearly support the document's purpose.
The document is clear and concise and appropriate grammar is used throughout.
*Adequate
The document can be easily followed. A combination of the following is apparent
in the document:
1. Basic transitions are used.
2. A structured format is used.
3. Some supporting graphics are provided, but are not clearly explained.
The document contains minimal distractions that appear in a combination of the
following forms:
1. Flow in thought
2. Graphical presentations
3. Grammar/mechanics
*Needs Improvement
Grading essays is just one example of performances that may be evaluated using
scoring rubrics. There are many other instances in which scoring rubrics may be used
successfully: to evaluate group activities, extended projects and oral presentations. Also,
rubric scoring cuts across disciplines and subject matter, for it is equally appropriate to the
English, Mathematics and Science classrooms. Where and when a scoring rubric is used
does not depend on the grade level or subject, but rather on the purpose of the
assessment.
Other Methods
Authentic assessment schemes apart from scoring rubrics exist in the arsenal of a
teacher. For example, checklists may be used rather than scoring rubrics in the evaluation
of essays. Checklists enumerate a set of desirable characteristics, and the evaluator
simply marks those which are actually observed. As such, checklists are an appropriate
choice for evaluation when the
information that is sought is limited to the determination of whether specific criteria have
been met. On the other hand, scoring rubrics are based on descriptive scales and support
the evaluation of the extent to which criteria have been met.
The ultimate consideration in using a scoring rubric for assessment is really the
"purpose of the assessment." Scoring rubrics provide at least two benefits in the evaluation
process. First, they support the examination of the extent to which the specified criteria
have been reached. Second, they provide feedback to students concerning how to
improve their performances. If these benefits are consistent with the purpose of the
assessment, then a scoring rubric is likely to be an appropriate evaluation technique.
In the development of scoring rubrics, it is well to bear in mind that they can be used to assess or evaluate either specific tasks or a general or broad category of tasks. For instance, suppose that we are interested in assessing a student's oral communication skills. Then a general scoring rubric may be developed and used to evaluate each of the oral presentations given by that student. After each oral presentation, the general scoring rubric is shown to the students, which then allows them to improve on their previous performances. Scoring rubrics thus have the advantage of providing a mechanism for immediate feedback.
In contrast, suppose now that the main purpose of the oral presentation is to determine the students' knowledge of the facts surrounding the EDSA I revolution; then perhaps a specific scoring rubric would be necessary. A general scoring rubric for evaluating a sequence of presentations may not be adequate since, in general, events such as EDSA I (and EDSA II) differ in their surrounding factors (what caused the revolutions) and in the ultimate outcomes of these events. Thus, to evaluate the students' knowledge of these events, it will be necessary to develop a specific scoring rubric for each presentation.
The development of scoring rubrics goes through a process. The first step in the process entails the identification of the qualities and attributes that the teacher wishes to observe in the students' outputs that would demonstrate their level of proficiency (Brookhart, 1999). These qualities and attributes form the top level of the scoring criteria for the rubric. Once done, a decision has to be made whether a holistic or an analytic rubric would be more appropriate. In an analytic scoring rubric, each criterion is considered one by one and the descriptions of the scoring levels are made separately; this results in a separate descriptive scoring scheme for each criterion or scoring factor. For a holistic scoring rubric, on the other hand, the collection of criteria is considered throughout the construction of each level of the scoring rubric, and the result is a single descriptive scoring scheme.
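The structural difference between the two designs can be made concrete as data. The following Python sketch uses hypothetical criteria and descriptors (they are not drawn from this module): the analytic rubric keeps one descriptive scale per criterion, while the holistic rubric keeps a single overall scale.

    # Analytic: one descriptive scoring scheme per criterion, so a
    # response receives one score for each criterion.
    analytic_rubric = {
        "Quality": {
            3: "Claims are fully supported with evidence",
            2: "Claims are only partly supported",
            1: "Claims are unsupported",
        },
        "Accuracy": {
            3: "No factual errors",
            2: "Minor factual errors",
            1: "Major factual errors",
        },
    }

    # Holistic: the criteria are folded into one descriptive scale, so a
    # response receives a single overall score.
    holistic_rubric = {
        3: "Well supported and factually accurate throughout",
        2: "Partly supported, with minor factual errors",
        1: "Unsupported claims and major factual errors",
    }

    def analytic_total(scores):
        """Combine per-criterion analytic scores into one mark."""
        return sum(scores.values())

    print(analytic_total({"Quality": 3, "Accuracy": 2}))  # prints 5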
The next step after defining the criteria for the top level of performance is the identification and definition of the criteria for the lowest level of performance. In other words, the teacher is asked to determine the type of performance that would constitute the worst performance, or a performance which would indicate a lack of understanding of the concepts being measured. With the top and bottom levels anchored, the teacher can then capture the criteria that would suit a middle-level performance for the concept being measured. This approach will therefore result in at least three levels of performance.
A note of caution: it is suggested that each score category be defined using descriptors of the work rather than value judgments about the work (Brookhart, 1999). For example, "Student's sentences contain no errors in subject-verb agreement" is preferable to "Student's sentences are good." The phrase "are good" requires the evaluator to make a judgment, whereas the phrase "no errors" is quantifiable. Finally, we can test whether our scoring rubric is "reliable" by asking two or more teachers to score the same set of projects or outputs and correlating their individual assessments. High correlations between the raters imply high inter-rater reliability. If the scores assigned by the teachers differ greatly, this suggests that the scoring rubric should be refined so that it means the same thing to different scorers.
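As an illustration of this reliability check, here is a minimal Python sketch that correlates the ratings of two teachers. The scores are hypothetical (ten essays rated on a 1-4 holistic rubric), and any standard correlation routine could be used in place of the hand-written Pearson formula.

    from math import sqrt

    def pearson(x, y):
        """Pearson correlation coefficient of two equal-length score lists."""
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
        sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
        return cov / (sd_x * sd_y)

    teacher_a = [4, 3, 3, 2, 4, 1, 3, 2, 4, 3]  # hypothetical ratings
    teacher_b = [4, 3, 2, 2, 4, 1, 3, 3, 4, 3]

    print(f"Inter-rater correlation: r = {pearson(teacher_a, teacher_b):.2f}")
    # r close to 1.0 suggests the rubric means the same thing to both
    # scorers; a low r signals that the descriptors need refinement.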
Exercises 4.3
Performance tasks give students the opportunity to work more independently and encourage them to pay attention to the quality of their work. They also enable the teacher to efficiently provide students with information on the strengths and weaknesses of their work. According to McTighe and Wiggins (2004), a performance task is authentic if it "reflects the way in which people in the world outside of school must use knowledge and skills to address various situations where expertise is challenged."
Designing and constructing authentic performance tasks can be tricky, but Wiggins and McTighe's GRASPS model is an excellent starting point.
GRASPS Model
The GRASPS Model is an authentic assessment design model that helps you develop authentic performance tasks, project units and/or inquiry lessons.
a. Goal – the goal provides the student with the outcome of the learning experience
and the contextual purpose of the experience and product creation.
b. Role – the role is meant to provide the student with the position or individual
persona that they will become to accomplish the goal of the performance task. The
majority of roles found within the tasks provide opportunities for students to
complete real-world applications of standards-based content.
c. Audience – the audience is the individual(s) who are interested in the findings and
products that have been created. These people will make a decision based upon
the products and presentations created by the individual(s) assuming the role
within the performance task.
d. Situation – the situation provides the participants with a contextual background for the task. Students will learn about the real-world application for the performance task.
e. Performance or Product – the products within each task are designed using the
multiple intelligences. The products provide various opportunities for students to
demonstrate understanding. Based upon each individual learner and/or individual
class, the educator can make appropriate instructional decisions for product
development.
f. Standard or Expectation – provides the student with a clear picture of success and identifies specific standards for success. The teacher issues rubrics to the students or develops them with the students.
These six parts come together to form an authentic assessment that includes an essential question to share with the students.
Example:
You are a member of a team of scientists investigating deforestation of the Papua New
Guinean rainforests. You are responsible for gathering scientific data (including visual
evidence such as photos) and producing a scientific report in which you summarize current
conditions, possible future trends, and the implications both for Papua New Guinea and for the planet more broadly. Your report, which you will present to a United Nations
subcommittee, should include detailed and fully supported recommendations for an action
plan that are clear and complete.
G – The goal (within the scenario) is to determine current deforestation conditions and possible future trends.
R – The role: the student is a member of a team of investigative scientists.
A – The audience: the United Nations subcommittee to which the report will be presented.
S – The situation: inform the United Nations subcommittee of the effects of deforestation on the Papua New Guinean rainforest and convince them to follow the recommended action plan.
P – The product: a scientific report, with visual evidence, that summarizes current conditions and future trends and recommends an action plan.
S – The standards by which the product will be judged: detailed and fully supported recommendations in an action plan that is both clear and complete.
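For teachers who want to reuse the model as a template, the six GRASPS parts can also be kept as a simple structure. The Python sketch below encodes the deforestation example; the class name, field names and assembled wording are my own illustration and not part of the GRASPS literature.

    from dataclasses import dataclass

    @dataclass
    class GRASPSTask:
        goal: str
        role: str
        audience: str
        situation: str
        product: str
        standards: str

        def task_statement(self):
            """Assemble the six parts into a student-facing task prompt."""
            return (
                f"You are {self.role}. {self.situation} "
                f"Your goal is to {self.goal}. You will produce {self.product} "
                f"for {self.audience}. Your work will be judged on {self.standards}."
            )

    deforestation = GRASPSTask(
        goal="determine current deforestation conditions and possible future trends",
        role="a member of a team of scientists investigating deforestation",
        audience="a United Nations subcommittee",
        situation="The rainforests of Papua New Guinea are being deforested.",
        product="a scientific report with visual evidence and a recommended action plan",
        standards="detailed, fully supported recommendations that are clear and complete",
    )
    print(deforestation.task_statement())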
Exercises 4.4
A. Explain the GRASPS model.
B. Use one of the sentence starters from each letter to help you write your task. Once you have your sentences, write them up as a task.
References:
Bolaños, A. B. (1997). Probability and Statistical Concepts: An Introduction. Manila: REX Book Store.
Brookhart, S. M. (1999). The Art and Science of Classroom Assessment: The Missing Part of Pedagogy. ASHE-ERIC Higher Education Report (Vol. 27, No. 1). Washington, DC: The George Washington University, Graduate School of Education and Human Development.
Navarro, R. L., et al. (2017). Assessment of Learning 1. Quezon City, Metro Manila: Lorimar Publishing, Inc.