Unit 9


The Research Design

[Framework: the problem statement drives the design decisions below, which culminate in data analysis.]

PROBLEM STATEMENT

DETAILS OF STUDY
- Purpose of the study: exploration, description, hypothesis testing
- Types of investigation: establishing causal relationships, correlations, group differences, ranks, etc.
- Extent of researcher interference: minimal (studying events as they normally occur); manipulation and/or control and/or simulation
- Study setting: contrived, noncontrived
- Unit of analysis (population to be studied): individuals, dyads, groups, organizations, machines, etc.
- Time horizon: one-shot (cross-sectional), longitudinal

MEASUREMENT
- Measurement and measures: operational definition, scaling
- Data-collection method: observation, interview, questionnaire, physical measurement
- Sampling design: probability/nonprobability, sample size (n)

DATA ANALYSIS
1. Feel for data
2. Goodness of data
3. Hypothesis testing
Goodness of Measures: Reliability and Validity

Goodness of data
- Reliability (accuracy in measurement)
  - Stability: test-retest reliability, parallel forms reliability
  - Consistency: Cronbach's α, split-half reliability
- Validity (are we measuring the right thing?)
  - Content validity
  - Criterion-related validity
  - Construct validity
Reliability

Reliability applies to a measure when similar results are obtained over time and across situations.
Theory of Reliability
In research, the term reliability means "repeatability" or
"consistency". A measure is considered reliable if it would
give us the same result over and over again (assuming that
what we are measuring isn't changing!).
In more detail, here is what it means to say that a measure is "repeatable" or "consistent".
Consider a measure that we'll arbitrarily label X. It
might be a person's score on a math achievement test
or a measure of severity of illness. It is the value
(numerical or otherwise) that we observe in our study.
Now, to see how repeatable or consistent an
observation is, we can measure it twice.
If we assume that what we're measuring doesn't
change between the time of our first and second
observation, we can begin to understand how we get
at reliability.
While we observe a score for what we're measuring,
we usually think of that score as consisting of two
parts, the 'true' score or actual level for the person on
that measure, and the 'error' in measuring it.
It's important to keep in mind that we observe
the X score -- we never actually see the true
(T) or error (e) scores.
Reliability = Var(T) / Var(X) = Var(T) / [Var(T) + Var(e)]

where Var(T) = variance of the true score, Var(X) = variance of the observed measure, and Var(e) = variance of the measurement error.

If a measure is perfectly reliable, there is no error in measurement -- everything we observe is true score. Therefore, for a perfectly reliable measure, Var(e) = 0 and reliability = 1.
If we have a perfectly unreliable measure, there is no true score -- the measure is entirely error. In this case, Var(T) = 0 and reliability = 0.
The value of a reliability estimate tells us the proportion of variability in the measure attributable to the true score.
A reliability of .5 means that about half of the variance of the observed score is attributable to truth and half is attributable to error. A reliability of .8 means the variability is about 80% true ability and 20% error. And so on.
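To make the variance decomposition concrete, here is a minimal simulation sketch in Python (NumPy assumed; the distributions and numbers are illustrative, not from the source):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000                          # simulated respondents
true = rng.normal(50, 10, n)         # T: true scores, Var(T) = 100
error = rng.normal(0, 5, n)          # e: measurement error, Var(e) = 25
observed = true + error              # X = T + e; we only ever see X

reliability = true.var() / observed.var()
print(round(reliability, 2))         # ~= 100 / (100 + 25) = 0.8
```

With these settings, about 80% of the observed variance is true-score variance, matching the Var(T) / [Var(T) + Var(e)] definition.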
Stability of Measures
The ability of a measure to remain the same over time, despite uncontrollable testing conditions or the state of the respondents themselves, is indicative of its stability and low vulnerability to changes in the situation.
Two tests of stability are
Test-Retest reliability
Parallel-Form reliability
Test-Retest reliability
Test-retest reliability estimates are used to evaluate the error associated with administering a test at two different times to the same sample.
Used to test characteristics that do not change over time.
I.e. when a questionnaire containing some items that are supposed to measure a concept is administered to a set of respondents now, and again to the same respondents several weeks or months later, the correlation between the scores obtained at the two different times from the same set of respondents is called the test-retest coefficient.
The shorter the time gap, the higher the correlation; the
longer the time gap, the lower the correlation.
Parallel forms reliability
Compares two equivalent forms of a test
The two forms have similar items and the same response format; the only changes are the order or sequence of the questions.
Here we try to establish the error variability resulting from the wording and ordering of the questions.
Two forms, different items, same difficulty level.
The correlation between the two parallel forms is the
estimate of reliability.
One major problem with this approach is that you have
to be able to generate lots of items that reflect the same
construct.
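Both the test-retest and parallel forms coefficients reduce to a correlation between two sets of scores from the same respondents. A minimal sketch in Python (NumPy assumed; the scores are invented for illustration):

```python
import numpy as np

def reliability_coefficient(scores_a, scores_b):
    """Pearson correlation between two administrations of the same test
    (test-retest) or two equivalent forms (parallel forms)."""
    return np.corrcoef(scores_a, scores_b)[0, 1]

# Hypothetical scores for five respondents, measured twice.
first  = [12, 15, 11, 18, 14]   # first administration (or form A)
second = [13, 14, 12, 17, 15]   # weeks later (or form B)

print(reliability_coefficient(first, second))  # near 1 => stable/equivalent
```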
Internal Consistency of Measures

Indicates the homogeneity of the items in the measure that tap the construct,
i.e. the items should be capable of independently measuring the same concept so that the respondents attach the same overall meaning to each of the items.
Consistency can be examined through
Inter-item Consistency Reliability
Split-Half Reliability
Inter-item Consistency Reliability
In internal consistency reliability estimation we
use our single measurement instrument
administered to a group of people on one occasion
to estimate reliability.
In effect we judge the reliability of the instrument
by estimating how well the items that reflect the
same construct yield similar results.
We are looking at how consistent the results are
for different items for the same construct within
the measure.
Split-Half Method
The test is divided into halves that are scored separately.
In split-half reliability we randomly divide all items that measure the same construct into two sets.
We administer the entire instrument to a sample of people and calculate the total score for each randomly divided half.
The results of one half of the test are then compared with the results of the other.
Formula
Cronbach's Alpha is mathematically equivalent to the average of all possible split-half estimates, although that's not how we compute it.

Alpha = 2[Sx² - (Sy1² + Sy2²)] / Sx²

Sx² = the variance of the scores on the whole test
Sy1², Sy2² = the variances of the two separate halves of the test
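A sketch of both consistency estimates in Python (NumPy assumed; the item matrix is hypothetical). cronbach_alpha uses the usual computational form, while split_half implements the formula above for one random split:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)         # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def split_half(items, seed=0):
    """One split-half estimate: Alpha = 2[Sx^2 - (Sy1^2 + Sy2^2)] / Sx^2."""
    items = np.asarray(items, dtype=float)
    idx = np.random.default_rng(seed).permutation(items.shape[1])
    half = len(idx) // 2
    y1 = items[:, idx[:half]].sum(axis=1)         # total score, first half
    y2 = items[:, idx[half:]].sum(axis=1)         # total score, second half
    sx2 = (y1 + y2).var(ddof=1)                   # variance of whole-test score
    return 2 * (sx2 - (y1.var(ddof=1) + y2.var(ddof=1))) / sx2

# Hypothetical responses: 6 respondents x 4 items on a 1-5 scale.
scores = [[4, 5, 4, 5], [2, 3, 2, 2], [3, 3, 4, 3],
          [5, 4, 5, 5], [1, 2, 1, 2], [3, 4, 3, 3]]
print(cronbach_alpha(scores), split_half(scores))
```

Averaging split_half over all possible splits would reproduce alpha, which is the equivalence the slide states.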
Factors affecting reliability
Method of estimating reliability
Individual differences among respondents
Length of measure
Test question difficulty
Homogeneity of a measure’s content
Response format
Administration of a measure
Validity

The ability of a scale or measuring instrument to measure what it is intended to measure.
Distinguish Reliability & Validity
E.g. an archer's bow and target as an analogy:
High reliability means that repeated arrows shot from the same bow would hit the target in essentially the same place, although not necessarily the intended place.
High validity means that the bow would shoot true every time. Arrows shot from a high-validity bow will be centered around the intended point, even when they are dispersed by reduced reliability.
Validity and Reliability
Think in terms of 'the purpose of tests' and the 'consistency' with which the purpose is fulfilled/met.

[Target diagram: four patterns of arrow hits]
- Neither valid nor reliable
- Reliable but not valid
- Fairly valid but not very reliable
- Valid and reliable
Content validity
Content validity ensures that the measure includes
an adequate and representative set of items that tap
the concept
In other words, it is a function of how well the
dimensions and elements of a concept have been
delineated
Subjective agreement among professionals
The measure logically appears to reflect what it purports to measure
Concept clarity
Face Validity
Face validity indicates whether the items that are intended to measure a concept do, on the face of it, look like they measure the concept.
I.e. does it appear to measure what it is supposed to measure?

Example: Let's say you are interested in measuring 'Propensity towards violence and aggression'. By simply looking at the following items, state which ones qualify to measure the variable of interest:
Have you been arrested?
Have you been involved in physical fighting?
Do you get angry easily?
Do you sleep with your socks on?
Is it hard to control your anger?
Do you enjoy playing sports?
Criterion validity
It can be studied by comparing test or scale scores with one or more external variables.
The degree to which content on a test (predictor) correlates with performance on relevant criterion measures (a concrete criterion in the "real" world).
If they do correlate highly, it means that the test (predictor) is a valid one!
E.g. if you taught skills relating to 'public speaking' and had students do a test on it, the test can be validated by looking at how it relates to the actual performance (public speaking) of students inside or outside of the classroom.
2 types: Concurrent and Predictive validity
Concurrent validity
A type of criterion validity whereby a new measure
correlates with a criterion measure taken at the same
time
It indicates how well performance on a test estimates current performance on some valued measure (criterion)
(e.g. a test of dictionary skills can estimate students' current skills in the actual use of a dictionary, as checked by observation).
I.e. the scale should discriminate among individuals based on the test.
Predictive validity
A type of criterion validity whereby a new measure predicts
a future event or correlates with a criterion measure
administered at a later time
Indicates how well performance on a test predicts future performance on some valued measure (criterion)
(e.g. a reading readiness test might be used to predict students' achievement in reading).
I.e. the scale should differentiate among individuals with reference to a future criterion.
Both are only possible IF the predictors are VALID.
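As a sketch, both forms of criterion validity come down to correlating predictor scores with criterion scores observed now (concurrent) or later (predictive). A hypothetical Python example (SciPy assumed; the data are invented):

```python
from scipy.stats import pearsonr

# Hypothetical data for eight students (e.g. a public-speaking test).
test_scores     = [62, 75, 58, 90, 70, 85, 66, 78]  # predictor
criterion_now   = [60, 72, 55, 88, 74, 80, 64, 76]  # performance today -> concurrent
criterion_later = [65, 70, 50, 92, 71, 84, 60, 79]  # performance later -> predictive

r_con, _ = pearsonr(test_scores, criterion_now)
r_pred, _ = pearsonr(test_scores, criterion_later)
print(f"concurrent r = {r_con:.2f}, predictive r = {r_pred:.2f}")
```

A high correlation supports treating the test as a valid predictor; a low one suggests the test does not track the criterion.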
Factors affecting validity
Reliability of criterion and predictor
Violation of statistical assumptions
Small sample size
Typographical error
