Quantitative Analysis - Sir Audrey


Quantitative Analysis

- Shown and described using numerical symbols or numbers
- A mainstream component of language testing
- A descriptive analysis

Criteria of a Good Test

1. Relevance – the extent to which the task represents a real situation. Ex. an earthquake drill teaches students
how to respond to real-life situations.

2. Representativity – a small sample is tested from a larger group. Ex. in a class of 60 students, only 15 are
tested; those 15 represent the whole class, so their scores must be very similar to the group's.

3. Authenticity – the extent to which the situation and the interaction are meaningful and representative in the
world of the individual user. Ex. authentic test assessments or authentic tasks that allow students to read, listen,
and write, by giving them tasks connected to real-life situations.

Authenticity means that the language response that students give in the test is appropriate to the language
of communication.  The test items should be related to the usage of the target language.

Other definitions of authenticity are rather similar. The Dictionary of language testing, for instance, states
that “a language test is said to be authentic when it mirrors as exactly as possible the content and skills
under test”.  It defines authenticity as “the degree to which test materials and test conditions succeed in
replicating those in the target situation”.

Authentic tests are an attempt to duplicate as closely as possible the circumstances of real-life situations. A
growing commitment to a proficiency-based view of language learning and teaching makes authenticity in
language assessment necessary.

4. Balance – the teacher gives equal importance to each topic or skill discussed. Attention to the different
topics is balanced, avoiding biased treatment of any concept or topic: do not focus on a topic just because you
like it.

5. Reliability – the results are consistent.

Measure of stability – the same test tool is given to different classes and yields the same results; if the same
test tool is administered for years, you are measuring stability. Use the test-retest method with Pearson r.
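
As a rough illustration of the test-retest idea, the Python sketch below correlates two administrations of the same test with Pearson r. The score lists are invented for the example, and scipy is assumed to be available.

# Hypothetical illustration: test-retest (stability) reliability via Pearson r.
# The two score lists are made-up data, not taken from these notes.
from scipy.stats import pearsonr

first_administration = [12, 15, 20, 18, 9, 14, 17, 11]
second_administration = [13, 14, 19, 18, 10, 15, 16, 12]

r, p_value = pearsonr(first_administration, second_administration)
print(f"Test-retest reliability (Pearson r) = {r:.2f}")  # values near 1.0 suggest stable scores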

Measure of equivalence – the correlation between scores on two similar forms of the same test taken by the same
individuals. Two test tools are used covering the same competency but with different items and time frames; the
results must not differ significantly.

Use the Parallel or Alternate Forms Method – Pearson r

Measure of Internal Consistency – how well the items on the test measure the same construct or idea
- also examines the choices or options in the test (item discrimination)

For multiple-choice items – use the Split-Half Method with the Spearman-Brown Prophecy Formula

Dichotomous answers – only two possible responses – use Kuder-Richardson Estimates
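
A minimal sketch of both internal-consistency estimates, assuming a small invented matrix of dichotomous item responses (rows = students, columns = items) and that numpy and scipy are installed:

# Hypothetical illustration of split-half reliability (with the Spearman-Brown
# prophecy formula) and KR-20. The response matrix is invented: 1 = correct, 0 = incorrect.
import numpy as np
from scipy.stats import pearsonr

items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 1, 0],
])

# Split-Half Method: correlate odd-item and even-item half scores,
# then step the half-test r up with the Spearman-Brown prophecy formula.
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)
r_half, _ = pearsonr(odd_half, even_half)
spearman_brown = 2 * r_half / (1 + r_half)
print(f"Split-half r = {r_half:.2f}, Spearman-Brown estimate = {spearman_brown:.2f}")

# Kuder-Richardson 20 (KR-20) for dichotomous items.
k = items.shape[1]                              # number of items
p = items.mean(axis=0)                          # proportion answering each item correctly
q = 1 - p
total_variance = items.sum(axis=1).var(ddof=1)  # variance of students' total scores
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_variance)
print(f"KR-20 = {kr20:.2f}")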

Variables that may affect reliability:

- Specificity
- Difficulty
- Length
- Time
- Item construction

 Reliable = Stable = Consistent


 Reliability means Consistency, dependability, or stability in measurements over observations or time.
Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result
repeatedly. For example, if a test is designed to measure a trait, then each time the test is administered to a
subject, the results should be approximately the same. Unfortunately, it is impossible to calculate reliability
exactly, but it can be estimated in a number of different ways.
 
A reliable test is one that produces stable or consistent scores.
A reliable test should be demonstrated by score consistency within raters (intra-rater), between raters
(inter-rater), and across time and place.
We can say that a test has high reliability if the scores demonstrate consistency no matter who
administers the test, when the test is administered, and where the test is administered.

6. Usefulness/Practicality – has practical value from the time, economy, and administration points of view.
Practicality – cost-efficient, time-efficient

Useful to the student – develops what needs to be developed; it has to target what it is
aimed at and serve its specific purpose.

Practicality refers to the economy of time, effort, and money in testing. A practical test should be easy to
design, easy to administer, easy to mark, and easy to interpret.

Traditionally, test practicality has referred to whether we have the resources to deliver the test that
we design.

7. Practicality

A test is practical when it:

- is not too expensive,
- stays within appropriate time constraints,
- is relatively easy to administer, and
- has a scoring/evaluation procedure that is specific and time efficient.

Administration – how many people are needed to administer the test?

How much will it cost?

8. Washback – the effect the test has on instruction, in terms of how students prepare for the test

9. Transparency – it is clear to the students how they will be assessed: the manner in which assignments need to
be submitted, deadlines, assessment procedures, and how the final mark will be calculated. Clear instructions.

10. Security – the quality or state of being secure, especially for larger-scale tests like the NAT. Ex. how the
BRE of the Central Office assures the security of the NAT: security protocols for test papers that will be recycled
or reused in the coming years, checking whether the booklets are sealed, protecting intellectual property, and not
allowing cheating.

11. Validity – refers to how well a test measures what it is purported to measure.

Why is it necessary?

While reliability is necessary, it alone is not sufficient. For a test to be valid, it also needs to
be reliable. For example, if your scale is off by 5 lbs, it reads your weight every day with an
excess of 5 lbs. The scale is reliable because it consistently reports the same weight every
day, but it is not valid because it adds 5 lbs to your true weight. It is not a valid measure of your
weight.

Construct Validity is used to ensure that the measure is actually measuring what it is intended
to measure (i.e. the construct), and not other variables. Using a panel of “experts” familiar with
the construct is one way in which this type of validity can be assessed. The experts can examine
the items and decide what each specific item is intended to measure. Students can be involved
in this process to obtain their feedback.

Example: A women’s studies program may design a cumulative assessment of learning
throughout the major. The questions are written with complicated wording and phrasing. This
can cause the test to inadvertently become a test of reading comprehension rather than a test
of women’s studies. It is important that the measure actually assesses the intended
construct, rather than an extraneous factor.

Construct Validity – the fit of the test with the theory.

- theory (objectives) – questions should be congruent (coherent) with the objectives or competencies
(TOS)

Content Validity – the items or tasks of which the test is made up constitute a representative sample of items
or tasks for the area of knowledge. Aligned with the syllabus or curriculum.

A test has content validity if it measures knowledge of the content domain it was designed to
measure. Another way of saying this is that content validity concerns, primarily, the adequacy
with which the test items representatively sample the content area to be measured.
For example, a comprehensive math achievement test would lack content validity if
good scores depended primarily on knowledge of English, or if it only had
questions about one aspect of math (e.g., algebra). Content validity is
primarily an issue for educational tests, certain industrial tests, and other tests
of content knowledge like the Psychology Licensing Exam.
Expert judgement (not statistics) is the primary method used to determine
whether a test has content validity. Nevertheless, the test should have a high
correlation with other tests that purport to sample the same content domain.
This is different from face validity: face validity is when a test appears valid
to examinees who take it, personnel who administer it, and other untrained
observers. Face validity is not a technical sense of test validity; i.e., just because a
test has face validity does not mean it will be valid in the technical sense of
the word. Just because it looks valid does not mean it is.

Face Validity ascertains that the measure appears to be assessing the intended construct under study.
The stakeholders can easily assess face validity. Although this is not a very “scientific” type of validity, it
may be an essential component in enlisting the motivation of stakeholders. If the stakeholders do not believe
the measure is an accurate assessment of the ability, they may become disengaged from the task.

 
Example: If a measure of art appreciation is created, all of the items should be related to the
different components and types of art. If the questions are about historical time periods,
with no reference to any artistic movement, stakeholders may not be motivated to give their
best effort or invest in this measure because they do not believe it is a true assessment of art
appreciation.

The aesthetics: the test should look like a test. There is a set of guidelines for designing a test.

- Arrange the choices from A to D according to increasing or decreasing length
- Format, font, style, and arrangement of the choices
- Parallelism of the answers – if you started with a verb, then all choices must be verbs

What are some ways to improve validity?

1. Make sure your goals and objectives are clearly defined and operationalized.  Expectations of
students should be written down.
2. Match your assessment measure to your goals and objectives. Additionally, have the test reviewed
by faculty at other schools to obtain feedback from an outside party who is less invested in the
instrument.
3. Get students involved; have the students look over the assessment for troublesome wording, or
other difficulties.
4. If possible, compare your measure with other measures, or data that may be available.

1. A good test item is relevant. It should test the learning objective(s) being measured; nothing more and nothing
less. This may sound obvious, but when a student who is highly skilled at taking tests scores better on an item than
one who is less skilled, even though he has no more knowledge on the subject, this principle is probably being
violated.
2. A good test item is important. Items must clearly address learning objectives, not trivia. Memorization of obscure
facts is much less important than comprehension of the concepts being taught. Trivia, on the other hand, should not
be confused with "core" knowledge that is the foundation of a successful education. Examples of "core", nontrivial
knowledge include multiplication facts, common formulas, and common geographic names.
3. A good test item is comprehensible. Reading difficulty and choice of vocabulary should be as simple as possible
relative to the grade level being tested. This is a corollary of Characteristic #1. If you are not testing reading skills
with an item, then do not make reading the item part of the problem. A good author is invisible; that is, you can read
his story without being distracted by the style or skills of the storyteller. In the same way, the wording of a good test
item should be "invisible". It should be simple, clear, and not a distraction from the concept at hand. In addition,
because of this principle, there should be no objection to an item being read verbally to reading impaired students.
This, of course, assumes that the item is not intended to evaluate reading skills.
4. A good test item is unambiguous. If a word has more than one possible definition, the context in which it is used
should leave no reasonable doubt as to which definition is intended. Directions also should contain no ambiguity. If
the student is to circle the correct answer, he should not be instructed to mark the correct answer.
5. A good test item is straightforward. There should be no trick questions. Tricky items often turn on the meaning of
a single word that is not the focus of the item. This is often a flaw in true/false items. Use of the
words always and never, and opinions stated as facts are often an unneeded source of confusion to test-takers. If the
correct response hinges on a single word, that word should be clearly emphasized. Humor should be used with care
as well. The personality of an individual teacher may shine through in the tests he gives his students, but for serious
or high-stakes tests, any attempt at humor can be confusing and distracting.
6. A good test item is uncontroversial. Items should be supportable facts or qualified opinions, not unqualified
opinions. This principle is closely related to Characteristic #5. For selected-response items, there should be an
unarguably correct answer. If more than one option could possibly be correct, the directions should call for
the best answer, rather than the correct answer.
7. A good test item is independent. Items should not provide clues to the answers of other items. Sometimes a
series of comprehension items all relate to a single reading passage, or multiple math problems are taken from a
single scenario. This approach simplifies item-writing and can be effective, as long as the individual items are still
independent of each other. On the other hand, if getting the correct answer on Item #2 depends on getting the correct
answer on Item #1, then item #2 tells you absolutely nothing about the skills of the student who missed Item #1.
Furthermore, this student is being penalized twice, in effect, for one mistake.

DESCRIPTIVE ANALYSIS

Measures of Central Tendency

1. Mean – average of data set

2. Median – the middle value when a data set is ordered from least to greatest.

Unlike the mean, the median is not affected by outliers.

3. Mode – number that occurs most often in a data set.
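
A quick sketch with Python's standard statistics module; the score list is invented and includes one outlier to show why the median resists it:

# Hypothetical example data: nine test scores, with 98 acting as an outlier.
import statistics

scores = [70, 72, 75, 75, 78, 80, 81, 85, 98]

print("Mean:", statistics.mean(scores))     # average; pulled upward by the outlier
print("Median:", statistics.median(scores)) # middle value of the ordered data; resistant to the outlier
print("Mode:", statistics.mode(scores))     # most frequent value (75)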

CORRELATION

1. Pearson r

2. Spearman Rank
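
For illustration only, the sketch below computes both coefficients on invented paired scores using scipy: Pearson r describes the linear relationship between the raw scores, while Spearman rank correlation describes the monotonic relationship between their ranks.

# Hypothetical comparison of Pearson r and Spearman rank correlation on made-up data.
from scipy.stats import pearsonr, spearmanr

test_a = [10, 12, 15, 18, 20, 25, 30]
test_b = [11, 13, 14, 20, 22, 24, 33]

r, _ = pearsonr(test_a, test_b)      # correlation of the raw scores
rho, _ = spearmanr(test_a, test_b)   # correlation of the ranked scores

print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")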

Assignment in 204

Reasons why learners have difficulty in listening.

Name three and give a contextualized example based on your experience.
