Quantitative Analysis - Sir Audrey
1. Relevance – the extent to which the task represents a real situation. Ex. an earthquake drill teaches students how to respond to real-life situations.
2. Representativeness – a small sample is tested from a larger group. Ex. in a class of 60 students, only 15 are tested; those 15 represent the whole class, so their scores must be very similar to those of the class as a whole.
3. Authenticity – the extent to which the situation and the interaction are meaningful and representative in the world of the individual user. Ex. authentic assessments or authentic tasks that let students read, listen, and write, by giving them tasks connected to real-life situations.
Authenticity means that the language response students give in the test is appropriate to the language of communication. The test items should be related to the usage of the target language.
Other definitions of authenticity are rather similar. The Dictionary of Language Testing, for instance, states that "a language test is said to be authentic when it mirrors as exactly as possible the content and skills under test". It defines authenticity as "the degree to which test materials and test conditions succeed in replicating those in the target situation".
Authentic tests are an attempt to duplicate as closely as possible the circumstances of real-life situations. A
growing commitment to a proficiency-based view of language learning and teaching makes authenticity in
language assessment necessary.
4. Balance – the teacher gives equal importance to each topic or skill being discussed; attention to the different topics is balanced. Avoid biased treatment of any concept or topic: do not focus on a topic just because you like it.
5. Reliability – the consistency of test results. It can be established in three ways (a short computational sketch follows this list):
Measure of stability – the same test tool is administered to the same group at different times and yields the same results; if the same test tool is given over the years with consistent results, you are measuring stability. Use the test-retest method with Pearson r.
Measure of equivalence – the correlation between scores on two similar forms of the same test taken by the same individuals: two test tools covering the same competency but with different items and time frames. The results must be highly correlated, not significantly different.
Measure of internal consistency – how well the items on the test measure the same construct or idea; it also examines the choices or options in the test, identifying item discrimination. For multiple-choice items, use the split-half method with the Spearman-Brown prophecy formula.
Factors that affect reliability:
- Specificity
- Difficulty
- Length
- Time
- Item construction
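A minimal sketch of how these coefficients might be computed, assuming hypothetical score lists and Python 3.10+; the data, variable names, and stdlib-only approach are illustrative assumptions, not a prescribed procedure:

import statistics

# Hypothetical scores for the same 8 students (illustrative data)
test1 = [12, 15, 9, 20, 14, 11, 18, 16]   # first administration
test2 = [13, 14, 10, 19, 15, 11, 17, 16]  # second administration (retest)

# Measure of stability: Pearson r between the two administrations.
# statistics.correlation (Python 3.10+) computes Pearson's r.
r_stability = statistics.correlation(test1, test2)

# Split-half internal consistency: each student's score on the
# odd-numbered items vs. the even-numbered items (hypothetical).
odd_half  = [6, 8, 4, 10, 7, 5, 9, 8]
even_half = [6, 7, 5, 10, 7, 6, 9, 8]
r_half = statistics.correlation(odd_half, even_half)

# Spearman-Brown prophecy formula: estimates the reliability of the
# full-length test from the half-test correlation.
r_full = 2 * r_half / (1 + r_half)

print(f"test-retest r (stability):   {r_stability:.2f}")
print(f"split-half r:                {r_half:.2f}")
print(f"Spearman-Brown full-test r:  {r_full:.2f}")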
6. Usefulness/Practicality – has practical value from the time, economy, and administration points of view.
Practicality – cost-efficient and time-efficient.
Useful to the student – it develops what needs to be developed; it targets what it is aimed at and serves its specific purpose.
Practicality refers to the economy of time, effort, and money in testing. A practical test should be easy to design, easy to administer, easy to mark, and easy to interpret.
Traditionally, test practicality has referred to whether we have the resources to deliver the test that we design.
7. Washback – the effect a test has on instruction, in terms of how students prepare for the test.
8. Transparency – it is clear to the students how they will be assessed: the manner in which assignments need to be submitted, deadlines, assessment procedures, how the final mark will be calculated, and clear instructions.
9. Security – the quality or state of being secure, especially in large-scale testing such as the NAT. The BRE of the Central Office, for example, must assure the security of the NAT: security protocols govern test papers that will be recycled or reused in the coming years, booklets are checked to confirm they are sealed, intellectual property is protected, and cheating is not allowed.
10. Validity – refers to how well a test measures what it is purported to measure.
Why is it necessary?
While reliability is necessary, it alone is not sufficient: a test can be reliable without being valid. For example, if your scale is off by 5 lbs, it reads your weight every day with an excess of 5 lbs. The scale is reliable because it consistently reports the same weight every day, but it is not valid because it adds 5 lbs to your true weight. It is not a valid measure of your weight.
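A tiny simulation of this idea, assuming a true weight of 150 lbs and made-up daily readings with a constant 5 lb bias (all values are illustrative):

import statistics

true_weight = 150.0                                     # assumed true weight (lbs)
readings = [155.1, 154.9, 155.0, 155.0, 154.8, 155.2]   # hypothetical daily readings

# Reliability: the readings barely vary from day to day.
spread = statistics.stdev(readings)

# Validity: every reading is about 5 lbs too high, so the scale
# does not measure the true weight despite being consistent.
bias = statistics.mean(readings) - true_weight

print(f"spread of readings (reliable):  {spread:.2f} lbs")
print(f"average bias (not valid):       {bias:.2f} lbs")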
Construct validity is used to ensure that the measure actually measures what it is intended to measure (i.e., the construct) and not other variables. Using a panel of experts familiar with the construct is one way this type of validity can be assessed: the experts examine the items and decide what each specific item is intended to measure. Students can be involved in this process to obtain their feedback.
Content validity – whether the items or tasks that make up the test constitute a representative sample of items or tasks for the area of knowledge; the test is aligned with the syllabus or curriculum.
Face validity ascertains that the measure appears to be assessing the intended construct under study. Stakeholders can easily assess face validity. Although this is not a very "scientific" type of validity, it may be an essential component in enlisting the motivation of stakeholders: if the stakeholders do not believe the measure is an accurate assessment of the ability, they may become disengaged from the task.
Example: if a measure of art appreciation is created, all of the items should be related to the different components and types of art. If the questions concern historical time periods, with no reference to any artistic movement, stakeholders may not be motivated to give their best effort or invest in this measure because they do not believe it is a true assessment of art appreciation.
Aesthetics also matter: the test should look like a test. A set of guidelines for designing a test follows.
1. Make sure your goals and objectives are clearly defined and operationalized. Expectations of
students should be written down.
2. Match your assessment measure to your goals and objectives. Additionally, have the test reviewed
by faculty at other schools to obtain feedback from an outside party who is less invested in the
instrument.
3. Get students involved; have the students look over the assessment for troublesome wording, or
other difficulties.
4. If possible, compare your measure with other measures, or data that may be available.
1. A good test item is relevant. It should test the learning objective(s) being measured; nothing more and nothing
less. This may sound obvious, but when a student who is highly skilled at taking tests scores better on an item than
one who is less skilled, even though he has no more knowledge on the subject, this principle is probably being
violated.
2. A good test item is important. Items must clearly address learning objectives, not trivia. Memorization of obscure
facts is much less important than comprehension of the concepts being taught. Trivia, on the other hand, should not
be confused with "core" knowledge that is the foundation of a successful education. Examples of "core", nontrivial
knowledge include multiplication facts, common formulas, and common geographic names.
3. A good test item is comprehensible. Reading difficulty and choice of vocabulary should be as simple as possible relative to the grade level being tested. This is a corollary of Characteristic #1. If you are not testing reading skills
with an item, then do not make reading the item part of the problem. A good author is invisible; that is, you can read
his story without being distracted by the style or skills of the storyteller. In the same way, the wording of a good test
item should be "invisible". It should be simple, clear, and not a distraction from the concept at hand. In addition,
because of this principle, there should be no objection to an item being read verbally to reading impaired students.
This, of course, assumes that the item is not intended to evaluate reading skills.
4. A good test item is unambiguous. If a word has more than one possible definition, the context in which it is used
should leave no reasonable doubt as to which definition is intended. Directions also should contain no ambiguity. If
the student is to circle the correct answer, he should not be instructed to mark the correct answer.
5. A good test item is straightforward. There should be no trick questions. Tricky items often turn on the meaning of
a single word that is not the focus of the item. This is often a flaw in true/false items. Use of the
words always and never, and opinions stated as facts are often an unneeded source of confusion to test-takers. If the
correct response hinges on a single word, that word should be clearly emphasized. Humor should be used with care
as well. The personality of an individual teacher may shine through in the tests he gives his students, but for serious
or high-stakes tests, any attempt at humor can be confusing and distracting.
6. A good test item is uncontroversial. Items should be supportable facts or qualified opinions, not unqualified
opinions. This principle is closely related to Characteristic #5. For selected-response items, there should be an
unarguably correct answer. If more than one option could possibly be correct, the directions should call for
the best answer, rather than the correct answer.
7. A good test item is independent. Items should not provide clues to the answers of other items. Sometimes a
series of comprehension items all relate to a single reading passage, or multiple math problems are taken from a
single scenario. This approach simplifies item-writing and can be effective, as long as the individual items are still
independent of each other. On the other hand, if getting the correct answer on Item #2 depends on getting the correct
answer on Item #1, then Item #2 tells you absolutely nothing about the skills of the student who missed Item #1.
Furthermore, this student is being penalized twice, in effect, for one mistake.
DESCRIPTIVE ANALYSIS
1. Mean – the average of a data set: the sum of all values divided by the number of values.
2. Median – the middle value when a data set is ordered from least to greatest.
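A quick illustration using Python's standard library, with made-up scores:

import statistics

scores = [7, 3, 9, 5, 8]            # illustrative, unordered data
print(statistics.mean(scores))      # (7+3+9+5+8)/5 = 6.4
print(statistics.median(scores))    # ordered: 3 5 7 8 9 -> middle value 7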
CORRELATION
1. Pearson r
2. Spearman Rank
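A short sketch of both coefficients on hypothetical paired scores; the data and the choice of scipy are assumptions for illustration:

from scipy.stats import pearsonr, spearmanr

# Hypothetical paired scores for 6 students
math_scores    = [70, 85, 62, 90, 78, 88]
science_scores = [65, 80, 70, 92, 75, 85]

# Pearson r: linear correlation on interval data
r, p = pearsonr(math_scores, science_scores)

# Spearman rank: correlation of the rank orders, used for ordinal
# data or monotonic but not necessarily linear relationships
rho, p_rho = spearmanr(math_scores, science_scores)

print(f"Pearson r:    {r:.2f}")
print(f"Spearman rho: {rho:.2f}")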