Assignment 01: Hafiz Noor-ul-Amin BP615652


Hafiz Noor-ul-Amin BP615652

B.Ed (Science Edu.) 4-years


Semester: Aut-2020

ASSIGNMENT 01
Course: Assessment in Science Education
Code: 6436
Department of Science Education
Allama Iqbal Open University, Islamabad
Q.NO.01
Criterion-referenced test
A criterion-referenced test is a style of test which uses test scores to generate a
statement about the behavior that can be expected of a person with that score. Most
tests and quizzes that are written by school teachers can be considered criterion-
referenced tests. In this case, the objective is simply to see whether the student has
learned the material. Criterion-referenced assessment can be contrasted with norm-
referenced assessment and ipsative assessment.
A common misunderstanding regarding the term is the meaning of criterion. Many,
if not most, criterion-referenced tests involve a cutscore, where the examinee
passes if their score exceeds the cutscore and fails if it does not (often called a
mastery test). The criterion is not the cutscore; the criterion is the domain of
subject matter that the test is designed to assess. For example, the criterion may be
"Students should be able to correctly add two single-digit numbers," and the
cutscore may be that students should correctly answer a minimum of 80% of the
questions to pass.
The criterion-referenced interpretation of a test score identifies the relationship to
the subject matter. In the case of a mastery test, this does mean identifying whether
the examinee has "mastered" a specified level of the subject matter by comparing
their score to the cutscore. However, not all criterion-referenced tests have a
cutscore, and the score can simply refer to a person's standing on the subject
domain. The ACT is an example of this: there is no cutscore; it is simply an
assessment of the student's knowledge of high-school-level subject matter.
Because of this common misunderstanding, criterion-referenced tests have also
been called standards-based assessments by some education agencies, as students
are assessed with regard to standards that define what they "should" know, as
defined by the state.

Comparison of criterion-referenced and norm-referenced tests
Both terms criterion-referenced and norm-referenced were originally coined
by Robert Glaser. Unlike a criterion-referenced test, a norm-referenced test
indicates whether the test-taker did better or worse than other people who took
the test. For example, if the criterion is "Students should be able to correctly
add two single-digit numbers," then reasonable test questions might ask the
student to add two single-digit numbers (for instance, 2 + 3 or 6 + 1). A
criterion-referenced test would report the student's performance strictly
according to whether the individual student correctly answered these questions.
A norm-referenced test would report primarily whether this student correctly
answered more questions compared to other students in the group. Even when testing similar
topics, a test which is designed to accurately assess mastery may use different
questions than one which is intended to show relative ranking. This is because
some questions are better at reflecting actual achievement of students, and some
test questions are better at differentiating between the best students and the worst
students. (Many questions will do both.) A criterion-referenced test will use
questions which were correctly answered by students who know the specific
material. A norm-referenced test will use questions which were correctly answered
by the "best" students and not correctly answered by the "worst" students (e.g.
Cambridge University's pre-entry 'S' paper). Some tests can provide useful
information about both actual achievement and relative ranking. The ACT provides
both a ranking and an indication of what level is considered necessary for
likely success in college. Some argue that the term "criterion-referenced test" is a
misnomer, since it can refer to the interpretation of the score as well as the test
itself. In the previous example, the same score on the ACT can be interpreted in a
norm-referenced or criterion-referenced manner.
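As an illustrative sketch, the two interpretations of the same set of scores can be contrasted in a few lines of Python; the student names, scores, and the 80% cutscore below are all invented for the example:

```python
# Hypothetical sketch: interpreting the same scores two ways.
# Names, scores, and the 80% cutscore are illustrative, not from any real test.

def criterion_referenced(score, total, cutscore=0.80):
    """Pass/fail against a fixed mastery cutscore (criterion-referenced)."""
    return "pass" if score / total >= cutscore else "fail"

def norm_referenced(score, all_scores):
    """Percentile rank relative to the other test-takers (norm-referenced)."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

scores = {"Amin": 9, "Sara": 7, "Bilal": 5}   # out of 10 questions
for name, s in scores.items():
    print(name,
          criterion_referenced(s, 10),
          f"{norm_referenced(s, list(scores.values())):.0f}th percentile")
```

Note that the norm-referenced figure changes if the group changes, while the criterion-referenced result depends only on the cutscore.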
Many high-profile criterion-referenced tests are also high-stakes tests, where the
results of the test have important implications for the individual examinee.
Examples of this include high school graduation examinations and licensure testing
where the test must be passed to work in a profession, such as to become a
physician or attorney. However, being high-stakes is not a defining feature of a
criterion-referenced test; it is a feature of how an educational or government
agency chooses to use the results, not a property of the test itself.

Examples
 Driving tests are criterion-referenced tests, because their goal is to see whether
the test taker is skilled enough to be granted a driver's license, not to see
whether one test taker is more skilled than another test taker.
 Citizenship tests are usually criterion-referenced tests, because their goal is to
see whether the test taker is sufficiently familiar with the new country's history
and government, not to see whether one test taker is more knowledgeable than
another test taker.
Sample scoring for the history question: "What caused World War II?"

Student #1's answer: "World War II was caused by Hitler and Germany invading
Poland."
Criterion-referenced assessment: This answer is correct.
Norm-referenced assessment: This answer is worse than Student #2's answer, but
better than Student #3's answer.

Student #2's answer: "World War II was caused by multiple factors, including
the Great Depression and the general economic situation, the rise of
nationalism, fascism, and imperialist expansionism, and unresolved resentments
related to World War I. The war in Europe began with the German invasion of
Poland."
Criterion-referenced assessment: This answer is correct.
Norm-referenced assessment: This answer is better than Student #1's and
Student #3's answers.

Student #3's answer: "World War II was caused by the assassination of Archduke
Ferdinand."
Criterion-referenced assessment: This answer is wrong.
Norm-referenced assessment: This answer is worse than Student #1's and
Student #2's answers.

Q.NO.02
Validity
Research validity in surveys relates to the extent to which the survey measures
the elements that need to be measured. In simple terms, validity refers to how
well an instrument measures what it is intended to measure. Reliability alone is
not enough; measures need to be reliable as well as valid. For example, if a
weighing scale is consistently wrong by 4 kg (it deducts 4 kg from the actual
weight), it can be described as reliable, because the scale displays the same
weight every time we measure a specific item. However, the scale is not valid
because it does not display the actual weight of the item. Research validity can
be divided into two groups: internal and external. It can be specified that
"internal validity refers to how the research findings match reality, while
external validity refers to the extent to which the research findings can be
replicated to other environments"
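The weighing-scale example can be made concrete with a short Python sketch; the 70 kg true weight is an invented figure for illustration:

```python
# Illustrative sketch of the weighing-scale example: a scale that always
# reads 4 kg low is reliable (consistent) but not valid (biased).
from statistics import mean, pstdev

TRUE_WEIGHT = 70.0                                 # actual weight, in kg
readings = [TRUE_WEIGHT - 4.0 for _ in range(5)]   # same reading every time

spread = pstdev(readings)             # reliability: zero spread across repeats
bias = mean(readings) - TRUE_WEIGHT   # validity: systematic error of -4 kg

print(f"spread = {spread} kg (reliable: readings agree with each other)")
print(f"bias   = {bias} kg (not valid: readings disagree with reality)")
```

Low spread indicates reliability; the nonzero bias is the validity problem that reliability alone cannot detect.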

Types of Validity
1. Face Validity is the most basic type of validity and is associated with the
highest level of subjectivity because it is not based on any scientific
approach. In other words, a test may be judged valid by a researcher simply
because it seems valid, without an in-depth scientific justification.

Example: a questionnaire designed for a study that analyses employee
performance can be assessed as valid because each individual question seems to
address specific and relevant aspects of employee performance.

2. Construct Validity relates to the assessment of the suitability of a
measurement tool to measure the phenomenon being studied. Application of
construct validity can be effectively facilitated with the involvement of a
panel of 'experts' closely familiar with the measure and the phenomenon.

Example: with the application of construct validity, the level of leadership
competency in a given organisation can be assessed by devising a questionnaire
to be answered by operational-level employees, asking questions about the level
of their motivation to perform their duties on a daily basis.

3. Criterion-Related Validity involves comparing test results with an outcome.
This specific type of validity correlates the results of an assessment with
another criterion of assessment.

Example: the nature of customer perception of the brand image of a specific
company can be assessed by organising a focus group. The same issue can also be
assessed through a questionnaire answered by current and potential customers of
the brand. The higher the correlation between the focus-group and questionnaire
findings, the higher the level of criterion-related validity.
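As a hedged sketch of this idea, the correlation between the two sets of findings can be computed with Pearson's r; the ratings below are invented for illustration:

```python
# Hypothetical sketch: correlating two assessments of the same construct.
# The ratings are invented; a high Pearson r indicates high
# criterion-related validity between the two methods.
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

focus_group = [3.0, 4.5, 2.0, 5.0, 3.5]     # brand-image ratings, method 1
questionnaire = [3.2, 4.4, 2.5, 4.8, 3.6]   # same customers, method 2

print(f"criterion-related validity r = {pearson_r(focus_group, questionnaire):.2f}")
```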

4. Formative Validity refers to the assessment of the effectiveness of the
measure in terms of providing information that can be used to improve specific
aspects of the phenomenon.

Example: when developing initiatives to increase the effectiveness of
organisational culture, if the measure is able to identify specific weaknesses
of the organisational culture, such as employee-manager communication barriers,
then the formative validity of the measure can be assessed as adequate.

5. Sampling Validity (similar to content validity) ensures that the measure
covers a broad area within the research topic. No measure can cover all items
and elements within the phenomenon; therefore, important items and elements are
selected using a specific sampling method, depending on the aims and objectives
of the study.

Example: when assessing the leadership style exercised in a specific
organisation, assessment of the decision-making style alone would not suffice;
other issues related to leadership style, such as organisational culture, the
personality of leaders, and the nature of the industry, need to be taken into
account as well.

Q.NO.03
A Table of Specification
A table of specification (ToS) is the technical term given to the plan for
writing items for a test. A table of specification should reflect what has been
taught in the instructional sequence; in other words, the testing mode is a
mirror of the instructional mode. Since the instructional mode has basically two
dimensions - content matter and intellectual process - the ToS should likewise
reflect both content and process. By process we mean the intellectual level at
which the student engages a specific content or unit of information. We can use
the categories of Bloom's taxonomy to help define the process.
In developing the ToS, proceed with a plan which reflects not only what has
been taught, but also the intellectual level at which the students are
functioning. Furthermore, the test is designed to test achievement, and all
achievement tests should be content-process valid.
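A minimal sketch of how such a plan can be drawn up, assuming invented content and process weights: teaching-time shares are crossed with Bloom-level weights to decide how many items fall in each cell of the grid.

```python
# Sketch of building a table of specification: content units are weighted
# by teaching time and crossed with Bloom-level weights to allocate the
# planned items to each cell. All topics and weights are invented examples.

def table_of_specification(content_weights, process_weights, total_items):
    """Allocate test items across a content x process grid."""
    tos = {}
    for topic, cw in content_weights.items():
        for level, pw in process_weights.items():
            tos[(topic, level)] = round(total_items * cw * pw)
    return tos

content = {"Motion": 0.5, "Optics": 0.3, "Heat": 0.2}   # share of teaching time
process = {"Knowledge": 0.4, "Comprehension": 0.4, "Application": 0.2}

tos = table_of_specification(content, process, total_items=50)
for cell, n in tos.items():
    print(cell, n)
```

Rounding can make the cells sum to slightly more or less than the planned total, so in practice the allocation is adjusted by hand afterwards.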

A table of specification can help in the construction of a science test
Test construction strategies are the various ways that items in a psychological
measure are created and decided upon. They are most often associated
with personality tests, but can also be applied to other psychological constructs
such as mood or psychopathology. There are three commonly used general
strategies: Inductive, Deductive, and Empirical. Scales created today will often
incorporate elements of all three methods.
The inductive method, also known as the itemetric or internal consistency
method, begins by constructing a wide variety of items with little or no relation to an
established theory or previous measure. The group of items is then answered by a
large number of participants and analyzed using various statistical methods, such
as exploratory factor analysis or principal component analysis. These methods
allow researchers to analyze natural relationships among the questions and then
label components of the scale based on how the questions group together. The Five
Factor Model of personality was developed using this method.
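One simple itemetric statistic used in such internal-consistency analyses is Cronbach's alpha; the sketch below computes it over an invented respondents-by-items score matrix (a full factor analysis would go further and group the items into components):

```python
# Hedged sketch: Cronbach's alpha over a (respondents x items) score matrix.
# The data are invented; alpha near 1 suggests the items hang together.

def cronbach_alpha(scores):
    """scores: list of per-respondent rows, one score per item."""
    k = len(scores[0])   # number of items

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

data = [  # five respondents answering three related items
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 3, 2],
    [1, 2, 1],
]
print(f"alpha = {cronbach_alpha(data):.2f}")
```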
Advantages of this method include the opportunity to discover previously
unidentified or unexpected relationships between items or constructs. It also may
allow for the development of subtle items that prevent test takers from knowing
what is being measured and may represent the actual structure of a construct better
than a pre-developed theory. Criticisms include a vulnerability to finding item
relationships that do not apply to a broader population, difficulty identifying what
may be measured in each component because of confusing item relationships, or
constructs that were not fully addressed by the originally created questions.
The deductive method, also known as the rational or intuitive method, begins by
developing a theory for the construct of interest. This may include the
use of a previously established theory. After this, items are created that are
believed to measure each facet of the construct of interest. After item creation,
initial items are selected or eliminated based upon which will result in the strongest
internal validity for each scale.
Advantages of this method include clearly defined and face valid questions for
each measure. Measures are also more likely to apply across populations.
Additionally, it requires less statistical methodology for initial development, and
will often outperform other methods while requiring fewer items. However, the
construct of interest must be well understood to create a thorough measure, and it
may be difficult to prevent or determine if individuals are faking on the measure.
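The item-selection step can be sketched as keeping items whose corrected item-total correlation (each item against the sum of the other items) exceeds a threshold; the data and the 0.3 threshold below are illustrative assumptions, not a standard:

```python
# Hedged sketch of deductive item selection: keep items whose corrected
# item-total correlation exceeds a threshold. Data and threshold invented.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_items(scores, threshold=0.3):
    """scores: respondents x items; returns indices of items to keep."""
    k = len(scores[0])
    keep = []
    for i in range(k):
        item = [row[i] for row in scores]
        rest = [sum(row) - row[i] for row in scores]  # total without item i
        if pearson_r(item, rest) >= threshold:
            keep.append(i)
    return keep

data = [  # four respondents; items 0 and 1 agree, item 2 is noise
    [5, 5, 1],
    [4, 4, 3],
    [2, 2, 5],
    [1, 1, 2],
]
print("items kept:", select_items(data))
```

Item 2 correlates negatively with the rest of the scale, so it would be eliminated in this sketch.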

Q.NO.04
Importance of Psychomotor objectives
This domain is characterized by progressive levels of behaviors from observation
to mastery of a physical skill. Several different taxonomies exist.
Simpson (1972) built this taxonomy on the work of Bloom and others:
 Perception - Sensory cues guide motor activity.
 Set - Mental, physical, and emotional dispositions that make one respond in
a certain way to a situation.
 Guided Response - First attempts at a physical skill. Trial and error coupled
with practice lead to better performance.
 Mechanism - The intermediate stage in learning a physical skill. Responses
are habitual with a medium level of assurance and proficiency.
 Complex Overt Response - Complex movements are possible with a
minimum of wasted effort and a high level of assurance they will be
successful.
 Adaptation - Movements can be modified for special situations.
 Origination - New movements can be created for special situations.
Dave (1970) developed this taxonomy:
 Imitation - Observing and copying someone else.
 Manipulation - Guided via instruction to perform a skill.
 Precision - Accuracy, proportion and exactness exist in the skill performance
without the presence of the original source.
 Articulation - Two or more skills combined, sequenced, and performed
consistently.
 Naturalization - Two or more skills combined, sequenced, and performed
consistently and with ease. The performance is automatic with little physical
or mental exertion.
Harrow (1972) developed this taxonomy. It is organized according to the
degree of coordination including involuntary responses and learned
capabilities:
 Reflex movements - Automatic reactions.
 Basic fundamental movement - Simple movements that can build to more
complex sets of movements.
 Perceptual - Environmental cues that allow one to adjust movements.
 Physical activities - Things requiring endurance, strength, vigor, and agility.
 Skilled movements - Activities where a level of efficiency is achieved.
Levels of Learning Objectives in the Affective Domain
1. Receiving
This objective expects students to be aware of, be willing to receive, accept,
and pay attention to various stimuli. At this level students are still passive;
they merely listen or pay attention to school activities.
Examples of operational verbs: listen, attend, view, pay attention, show
interest, choose, follow, give, embrace, hold.
Examples of affective-domain learning objectives in physics:
 Students are willing to listen to the teacher's explanation of the concept
of uniform rectilinear motion.
 Students are willing to follow the practical on the convex lens.
 Students pay close attention to a presentation on the dangers of erosion.
2. Giving Response (Responding)
The objective expects students to have the desire to do something in reaction
to an idea, object, or system of values - more than mere awareness. In this
case students are asked to demonstrate the behaviour requested.
Examples of operational verbs: discuss, participate, answer, help, practice,
ask, select, approve, report.
Examples of affective-domain learning objectives in physics:
 Students are willing to discuss lab results on determining the specific heat
of substances.
 Students are willing to participate actively in extracurricular activities.
 Students are willing to practice using the oscilloscope.
3. Valuing (Appreciation of Value)
The objective expects students to appreciate the value of a belief or to feel
that an idea, object, or particular way of thinking has worth. In this case the
student consistently behaves in accordance with a value even when no teacher
requests or requires it.
Examples of operational verbs: select, convince, act, argue, support, believe,
have faith, donate, resolve, consider.
Examples of affective-domain learning objectives in physics:
 Students show an attitude of support for the use of computers in teaching
physics.
 Students participate voluntarily in collecting used goods to be made into
simple physics experiment equipment.
4. Organizing
The objective expects students to be able to organize values by showing the
interconnections between particular values within a value system and
determining which values have higher priority than others. In this case the
student is consistent with a value system: students are required to organize a
variety of chosen values into a value system and determine the relationships
between these values.
Examples of operational verbs: decide, formulate, adopt, change, maintain,
compare, systematize, build, discuss, integrate.
Examples of affective-domain learning objectives in physics:
 Students are able to formulate a variety of alternative ways to raise funds
and choose the alternative most acceptable to the public, according to its
value system, to overcome coastal erosion.
 Students are able to defend the position that the study of physics should be
given starting from the first grade of primary school.
5. Practice (Characterization)
The objective expects students to show, in practice, that they have organized
and integrated values into a personal value system. This is demonstrated
through behaviour at the lower levels, but with these values integrated into a
complete and convincing philosophy of life, so that behaviour is always
consistent with that philosophy. A philosophy of life is part of character.
Examples of operational verbs: demonstrate an attitude, refuse, adjust, show
good morals, change behaviour, study, qualify, verify, actualize.
Examples of affective-domain learning objectives in physics:
 Students show a scientific attitude by stating and testing a hypothesis
before accepting it.
 Students reject authoritarian attitudes in the lab working group.

Q.NO.05
Higher ability skills
Higher-order thinking skills (HOTS) are a concept popular in American
education. The concept distinguishes critical-thinking skills from lower-order
learning outcomes, such as those attained by rote memorization. HOTS include
synthesizing, analyzing, reasoning, comprehending, applying, and evaluating.
The concept is based on various taxonomies of learning, particularly the one
created by Benjamin Bloom in his 1956 book, Taxonomy of Educational Objectives:
The Classification of Educational Goals. Higher-order thinking skills are
reflected by the top three levels of Bloom's taxonomy: analysis, synthesis, and
evaluation.
Higher-ability skills that can be assessed in science
learning
Observing
This is the most fundamental of science skills. That’s because most students are
born with five senses, which inform how they experience the world.

Classifying

This skill builds upon observation. Students can learn to separate and sort objects
based on properties. Younger students can learn to sort using a single factor (e.g.,
number of legs: spiders have eight and insects have six), while older students can
classify using several factors at once.
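The single-factor sorting described here can be sketched in a few lines; the animals and leg counts are just the example's data (spiders have eight legs, insects six):

```python
# Toy sketch of single-factor classification from the leg-count example:
# group animals by number of legs. The animal list is invented sample data.
from collections import defaultdict

animals = {"ant": 6, "spider": 8, "beetle": 6, "tarantula": 8, "fly": 6}

groups = defaultdict(list)
for name, legs in animals.items():
    groups[legs].append(name)

for legs in sorted(groups):
    print(f"{legs} legs: {sorted(groups[legs])}")
```

Classifying on several factors at once would simply use a tuple of properties as the grouping key.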

Quantifying

One of the most valuable skills needed for science study is the ability to measure
accurately.

Predicting

This skill derives from your students being able to spot patterns in past
experiments or existing evidence (i.e., from the natural world).

Controlling variables

Many different factors can affect the outcome of an experiment. You can help
students understand this by discussing potential factors before starting. This
provides context.

Interpreting

This skill is closely related to inferring, which means coming to a conclusion
after analyzing information. Interpreting is inferring from a point of view:
two students may interpret an experiment's results differently.

Communicating

This skill touches every other one. Students must be able to transmit information
through words, charts, diagrams, and other mediums.
Forming conclusions

This skill is connected to interpreting. Students cannot draw conclusions
hastily; conclusions must be reached through careful reasoning.

When forming conclusions, have your students look back at their predictions and
compare them with the actual results. Make sure they take all the information they
gathered into account as they draw a conclusion.

The End
