MODULE 2 Handout
Prepared by:
ELIZABETH S. SUBA, Ph.D., RPsy, RPm, RGC
ANGELO R. DULLAS, MA Clinical Psych
Overview
This module introduces the Principles of Psychological Assessment and Psychological Testing: their definitions and basic concepts, with emphasis on the statistical foundation of modern psychometrics. By the end of the module, you are expected to be able to define these principles and explain their basic concepts. This chapter is outlined as follows:
1. Scales of Measurement
2. Statistical Interpretation of Test Scores (Raw and Derived Scores)
3. Measures of Central Tendency
4. Measures of Variability
5. Norms
5.1 Linear and Non-Linear Transformation
5.2 Types of Norms
6. Test Reliability
6.1 General Model of Reliability
6.2 Test-Retest
6.3 Alternate Form
6.4 Split-Half Reliability
6.5 Kuder-Richardson
6.6 Standard Error of Measurement
7. Test Validity
7.1 Content Validity
7.2 Criterion-Related Validity
7.3 Construct Validity
8. Item Analysis
9. Item Response Theory
I. Objectives:
Scales of Measurement
A measurement scale differentiates people from each other on any one variable.
Image: https://www.graphpad.com/support/faq/what-is-the-difference-between-ordinal-interval-and-ratio-variables-why-should-i-care/
SCALES OF MEASUREMENT

1. Nominal
Description: Numbers are used to classify and identify people or objects according to category labels.
Examples:
A. Gender can be categorized as “male” or “female”. We can choose to give all females a “score” of 1 and all males a score of 2.
B. We can administer an IQ test to a group of people and reclassify their scores as “below average”, “average”, or “above average”.
Limitations/Application: Nominal scales do not provide very precise information about individual differences and do not really quantify a test-taker’s performance; they indicate the presence or absence of a property, but not the extent or amount of that property.
Compare: IQ = 102, IQ = 108 vs. IQ = Average.
Note: when we transform scores to a nominal scale, our information becomes more general and less precise.

2. Ordinal
Description: We classify people or objects by ranking them on some dimension or in terms of the attribute being measured. An ordinal scale provides information about where group members fall relative to each other (e.g., 1st, 2nd, 3rd, . . .).
Limitations/Application: It does not indicate the precise extent by which the group members differ. Example: ranks simply tell us that one child is taller than another, but not exactly how much taller. Ordinal scales therefore do not provide the kind of individual-differences information that we want.

3. Interval
Description: We classify people or objects by ranking them on an equal-unit scale. We need to establish that a difference of 1 or 3 or 5 units is equivalent at any place along the scale. Example: height.
Application: Assume that three people, A, B, and C, receive scores of 65, 55, and 45, respectively, on a standardized test of anxiety. If this is an interval-level test, we can draw three conclusions: (1) Person A demonstrates a higher level of anxiety than Person B, who in turn is more anxious than Person C; (2) because the units are equal, the 10-point difference between A and B represents the same amount of anxiety as the 10-point difference between B and C; and (3) because the scale has no true zero point, we cannot say that Person A is a certain number of times as anxious as Person C.

4. Ratio
Description: An equal-unit scale that also has a true zero point, which permits ratio statements (e.g., a measurement of 60 units represents twice as much of the property as 30 units). Examples: height, weight, reaction time.
NOTE:
As we move from the nominal scale to the interval and ratio scales, we increase the precision of the measurement process. Interval and ratio scales, with their equal units, are most appropriate for comparing people and for the study of individual differences.
Types of Scores
1. Measures of Central Tendency- refer to a single value that describes the center of a distribution of test scores.
Mean- the arithmetic average of a set of test scores, obtained by summing all the scores and dividing by the number of scores.
Median- the middlemost score, or the score above and below which 50% of the scores fall. It is sometimes referred to as the 50th percentile, the 5th decile, and the second quartile.
Mode- the score that occurs most frequently in a set of test scores, or the score obtained by the greatest number of people. When test scores are grouped into intervals, the mode is the midpoint of the interval containing the largest number of scores.
2. Measures of Dispersion or Variation- refer to the extent of clustering about a central value, or the dispersion of scores around a given point. If all scores are close to the central value, their variation will be less than if they tend to depart more markedly from the central value.
Range- the simplest measure; the difference between the largest and smallest scores.
Variance- a measure of the total amount of variability in a set of test scores.
Standard deviation- the square root of the variance; the larger the standard deviation, the more widely scattered the scores.
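The measures defined above can be sketched with Python's standard `statistics` module; the seven test scores are made up for illustration.

```python
# Illustrative only: computing the central-tendency and dispersion
# measures defined above for a small, hypothetical set of test scores.
import statistics

scores = [45, 50, 55, 55, 60, 65, 70]

mean = statistics.mean(scores)            # arithmetic average
median = statistics.median(scores)        # middlemost score (50th percentile)
mode = statistics.mode(scores)            # most frequent score
score_range = max(scores) - min(scores)   # largest minus smallest score
variance = statistics.pvariance(scores)   # total variability (population)
sd = statistics.pstdev(scores)            # square root of the variance

print(median, mode, score_range)          # 55 55 25
```

Note that `pvariance`/`pstdev` treat the scores as the whole group; `variance`/`stdev` would be used if the scores were a sample from a larger population.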
Image: https://www.analyticsvidhya.com/blog/2021/05/shape-of-data-skewness-and-kurtosis/
Negatively skewed- the larger frequencies are concentrated toward the high end of the scale and the smaller frequencies toward the low end: many high scores and few low scores. The median is larger than the mean.
Example: if a test is easy, the scores cluster at the high end of the scale and tail off toward the low end.
Positively skewed- the larger frequencies are concentrated toward the low end of the scale: many low scores and few high scores. The mean is larger than the median.
Example: if a test is difficult, the scores cluster at the low end of the scale and tail off toward the high end.
Normal curve- a distribution that is symmetrical and bell-shaped, with the larger frequencies clustered around the average. The mean, median, and mode coincide.
Image: https://www.researchgate.net/figure/Value-of-kurtosis-for-different-Gaussian-distribution-compared-with-normal-distribution_fig16_318491600
Criterion-Referenced Test
A test whose scores are interpreted by comparing them with a predetermined standard or criterion of performance, rather than with the scores of other examinees.
Example: academic performance where you need to score 90% or better on a test for a grade of 1.00, 80% or better for 1.50, and so forth. Professional licensing examinations are examples that include a mastery component.
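The grading rule above can be sketched as a simple lookup. Only the 90% → 1.00 and 80% → 1.50 cutoffs come from the handout; the grade returned below those cutoffs is a placeholder standing in for the handout's "and so forth".

```python
# A minimal sketch of criterion-referenced scoring using the cutoffs
# stated above (90% -> 1.00, 80% -> 1.50). The 3.00 fallback is an
# illustrative placeholder, not a cutoff from the handout.
def grade(percent_correct: float) -> float:
    if percent_correct >= 90:
        return 1.00
    if percent_correct >= 80:
        return 1.50
    return 3.00  # placeholder for the remaining ("and so forth") grades

print(grade(92))  # 1.0
print(grade(85))  # 1.5
```

The key point is that each grade depends only on the fixed criterion, never on how other examinees performed.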
Norm-Referenced Test
A test whose scores are interpreted by comparing them with the scores of the other individuals who have taken the test, often called the standardization sample or normative group.
Examples: IQ tests, aptitude tests
KINDS OF NORMS
Developmental Norms- indicate how far along the normal developmental path the individual has progressed (Anastasi & Urbina, 1997).
1. Age norms- An age equivalent is the median score on a test obtained by persons in the standardization sample of a given chronological age. The Mental Age score of an examinee corresponds to the chronological age of the subgroup in the standardization group whose median score is the same as the examinee's.
Percentile- scores expressed in terms of the percentage of persons in the standardization sample who fall below a given raw score; also called percentile rank.
Percentage scores, by contrast, are raw scores expressed in terms of the percentage of correct items; they should not be confused with percentiles.
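The percentile-rank definition above can be sketched directly: count how many scores in the standardization sample fall below the raw score. The ten sample scores are hypothetical.

```python
# Sketch: percentile rank as defined above -- the percentage of persons
# in a (hypothetical) standardization sample who fall below a raw score.
def percentile_rank(raw_score, sample_scores):
    below = sum(1 for s in sample_scores if s < raw_score)
    return 100.0 * below / len(sample_scores)

sample = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
print(percentile_rank(62, sample))  # 50.0 -> 62 exceeds 5 of the 10 scores
```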
Standard Scores
A standard score is a raw score that has been converted from one scale to another scale on which it can be interpreted more easily; it expresses the distance of the raw score from the mean in standard deviation units.
Linear transformation- the transformed scores retain the exact numerical relations of the original raw scores, because they are computed by subtracting a constant (the mean) from each raw score and then dividing the result by another constant (the standard deviation). Linearly derived standard scores are often designated simply as standard scores or “z scores”.
z Scores
- The z score is considered the base of the standard scores, since it is used for conversion to other types of standard scores.
- A z score is computed by subtracting the mean of the instrument from the client's raw score and dividing by the standard deviation of the instrument.
- Aside from providing an easy context for comparing scores on the same test, standard scores also provide an easy way to compare scores on different tests.
Example:
Suppose Marites' raw score on the Psychological Assessment test was 24 and her raw score on the Abnormal Psychology test was 42. Knowing nothing other than these raw scores, we might conclude that Marites did better on the Abnormal Psychology test than on the other test. But if the two raw scores are converted to z scores, the comparison becomes more informative.
Converting Marites' raw scores to z scores based on the performance of her classmates, assume that we find her z score on Psychological Assessment was 1.32 and her z score on Abnormal Psychology was -0.75. Thus, although her raw score on the Abnormal Psychology test was the higher of the two, the z scores tell a different story: compared with her classmates (assuming a normal distribution of scores), Marites performed above average on the Psychological Assessment test and below average on the Abnormal Psychology test. (This interpretation is based on tables detailing distances under the normal curve and the percentage of cases expected to fall above or below a particular standard deviation point, i.e., z score.)
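The conversion described in the example can be sketched as follows. Only Marites' raw scores (24 and 42) and resulting z scores come from the text; the class means and standard deviations are hypothetical values chosen to reproduce them.

```python
# Sketch of the z-score conversion: (raw score - mean) / standard deviation.
# The class statistics below are hypothetical, chosen so the z scores
# match the 1.32 and -0.75 in the worked example.
def z_score(raw, mean, sd):
    return (raw - mean) / sd

z_assessment = z_score(24, mean=17.4, sd=5.0)    # hypothetical class stats
z_abnormal = z_score(42, mean=45.75, sd=5.0)     # hypothetical class stats
print(round(z_assessment, 2), round(z_abnormal, 2))  # 1.32 -0.75
```

Despite the lower raw score, the positive z score shows the better relative standing.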
T Scores
A T score is a standard score with a fixed mean of 50 and a standard deviation of 10; it is obtained by multiplying the normalized standard score by 10 and adding the result to 50 (T = 50 + 10z).
A score of 50 corresponds to the mean, a score of 60 to 1 SD above the mean, and so forth. Some test developers prefer T scores because they eliminate the decimals and the positive and negative signs of z scores.
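The z-to-T transformation above is a one-line linear conversion:

```python
# Sketch of the T-score transformation: T = 50 + 10z.
def t_score(z: float) -> float:
    return 50 + 10 * z

print(t_score(0.0))    # 50.0 (the mean)
print(t_score(1.0))    # 60.0 (1 SD above the mean)
print(t_score(-0.75))  # 42.5; reported T scores are typically rounded,
                       # removing the decimals and negative signs of z
```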
Stanines
Stanines (“standard nines”) are single-digit standard scores from 1 to 9, with a mean of 5 and a standard deviation of 1.96 (approximately 2). Each stanine covers a fixed band of the distribution, except for the open-ended stanines of 1 and 9 at the tails.
Scores are assigned by ranking the individuals: the lowest 4 percent receive a stanine of 1, the next 7 percent receive a stanine of 2, the next 12 percent receive a stanine of 3, and so on through the group (the successive percentages are 4, 7, 12, 17, 20, 17, 12, 7, and 4).
A limitation is that each stanine represents a range of raw scores, and sometimes people do not understand that one number stands for various raw scores.
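The assignment rule above (lowest 4 percent → stanine 1, next 7 percent → stanine 2, next 12 percent → stanine 3, …) can be sketched by mapping a percentile rank onto the cumulative band ceilings; the bands beyond the first three extend the stated pattern with the standard stanine percentages (17, 20, 17, 12, 7, 4).

```python
# Sketch: assigning a stanine from a percentile rank using the cumulative
# ceilings of the 4-7-12-17-20-17-12-7-4 percentage bands.
STANINE_CEILINGS = [4, 11, 23, 40, 60, 77, 89, 96, 100]

def stanine(percentile_rank: float) -> int:
    for s, ceiling in enumerate(STANINE_CEILINGS, start=1):
        if percentile_rank <= ceiling:
            return s
    return 9  # top of the distribution

print(stanine(3))   # 1 (within the lowest 4 percent)
print(stanine(50))  # 5 (middle of the distribution)
print(stanine(98))  # 9
```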
Example: The Otis-Lennon School Ability Test is one example that shows how raw scores are converted to different scales to obtain a logical interpretation of the scores.
Source: https://www.psi-services.net/services/educational-assessment/
Deviation IQs
Deviation IQs are standard scores with an SD that approximates the SD of the Stanford-Binet IQ distribution. The scale resembles an IQ scale because of the use of 100 as the mean: deviations from the mean are converted into standard scores, which typically have a mean of 100 and a standard deviation of 15.
Deviation IQs replaced the ratio IQ (the intelligence quotient: mental age divided by chronological age, multiplied by 100) used in early intelligence tests, and are now preferred over the ratio IQ.
Other instruments, such as the Graduate Record Examination (GRE), likewise report performance on standard-score scales.
CORRELATIONAL STATISTICS
Correlation is concerned with determining the extent to which some things (such as traits, abilities, or interests) are related to other things (such as behavior or intelligence).
For example, if the correlation between tests X and Y is close to +1.00, it can be predicted with confidence that a person who makes a high score on variable X will also make a high score on variable Y, and a person who makes a low score on X will also obtain a low score on Y. On the other hand, if the correlation is close to -1.00, what would your prediction be?
Simple Linear Regression- a procedure for determining the algebraic equation of the best-fitting line for predicting scores on a dependent variable from an independent variable. The product-moment correlation coefficient, which is a measure of the linear relationship between two variables, is actually a by-product of the statistical procedure for finding the equation of the straight line that best fits the set of points representing the paired X-Y values.
TEST RELIABILITY
Reliability refers to the consistency of scores obtained by the same person when retested with the same test or with an equivalent form of the test on different occasions.
(Item analysis, taken up later in this module, is the process of statistically examining the qualities of each item of the test; it includes the item difficulty index and the discrimination index.)
a. Test-Retest- The same test is given to the same group on two occasions separated by a time interval. It yields a coefficient of stability. Source of error: time sampling.
b. Alternate-Form or Parallel-Form- Equivalent tests are given with a time interval between testings: one form of the test on the first testing and another, comparable form on the second. It yields a coefficient of equivalence and a coefficient of stability, reflecting consistency of response to different item samples. Limitations: it is hard to develop two truly equivalent tests, and in developing alternate forms we need to ensure that they are truly parallel; the coefficient may reflect change in behavior over time; and practice effects may tend to reduce the correlation between the two test forms, to the degree that the nature of the test changes with repetition. Source of error: item sampling (and time sampling).
c. Inter-Rater or Inter-Scorer Reliability- Different scorers or observers rate the items or responses independently; used for free responses. It reflects consistency of ratings. Source of error: observer differences.
d. Internal Consistency (e.g., Split-Half)- One test is given at a single session; the test is split into comparable halves (in effect, shortened forms of the test), and scores on the halves are correlated. It yields a coefficient of internal consistency. Source of error: item (content) sampling.
Other things being equal, the longer the test, the more reliable it will be.
Lengthening a test, however, will only increase its consistency in terms of content
sampling, not its stability over time. The effect that lengthening or shortening a test will
have on its coefficient can be estimated by means of the Spearman-Brown formula.
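The Spearman-Brown estimate mentioned above can be sketched as follows; the reliability of .70 is an illustrative value, and k is the factor by which the test is lengthened (k = 2 doubles it, k = 0.5 halves it).

```python
# Sketch of the Spearman-Brown formula: estimated reliability of a test
# lengthened (or shortened) by a factor k, given current reliability r.
def spearman_brown(r: float, k: float) -> float:
    return (k * r) / (1 + (k - 1) * r)

# Doubling a test (k = 2) whose reliability coefficient is .70:
print(round(spearman_brown(0.70, 2), 2))  # 0.82
```

As the text notes, the formula addresses only consistency in content sampling; it says nothing about stability over time.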
If a client took a test many (say, 100) times, the obtained scores would form a hypothetical distribution around the client's true score. The mean of this hypothetical score distribution is the person's true score on the test; no single obtained score can be assumed to equal it, because each obtained score is the true score plus some measurement error.
The formula for calculating the standard error of measurement (SEM) is:
SEM = s√(1 − r)
where s represents the standard deviation of the test scores and r is the reliability coefficient.
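The SEM formula above can be sketched directly; the standard deviation of 15 and reliability of .91 are illustrative values.

```python
# Sketch of the SEM formula given above: SEM = s * sqrt(1 - r).
import math

def sem(s: float, r: float) -> float:
    return s * math.sqrt(1 - r)

# Illustrative values: SD of 15, reliability coefficient of .91:
print(round(sem(15, 0.91), 2))  # 4.5
```

Note that as reliability r approaches 1.00, the SEM shrinks toward zero: a perfectly reliable test would measure without error.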
Question:
Given this information, how would you help Anne, if you are the counsellor?
If Anne is applying to a graduate program that only admits students with GRE-V scores
of 600 or higher, what are her chances of being admitted?
As a Psychologist or Counselor, one might assist Anne in examining her GRE scores and
considering other options or other graduate programs.
TEST VALIDITY
The degree to which a test measures what it purports (i.e., what it is supposed) to measure, when compared with accepted criteria (Anastasi & Urbina, 1997).
TYPES OF VALIDITY

CONTENT VALIDITY
Purpose: to determine whether the test items adequately represent the domain of content the test is intended to cover.
Procedure: compare the test content with the content domain; utilize systematic observation of behavior (observe the skills and competencies needed to perform a given task).
Types of tests: surveys, achievement tests.

CRITERION-RELATED VALIDITY
Purpose: to predict performance on another measure, or to predict an individual's behavior in specified situations, using a rating, observation, or another test as the criterion.
Concurrent- the criterion measure (e.g., ratings of a worker's performance) is obtained at the same time as the test scores. Types of tests: aptitude tests, ability tests.
Predictive- the criterion measure is to be obtained in the future: test scores are correlated with a criterion measure obtained after a period of time. The goal is to have test scores accurately predict the identified criterion performance (e.g., the predictive validities of admission tests). Types of tests: scholastic aptitude tests, general aptitude batteries, prognostic tests, readiness tests, intelligence tests.

CONSTRUCT VALIDITY
A construct is not directly observable but is usually derived from theory, research, or observation.
Purpose: to determine whether a construct exists and to understand the traits or concepts that make up the set of scores or items; the extent to which a test measures a theoretical construct or trait, such as intelligence, mechanical comprehension, or anxiety. It involves the gradual accumulation of evidence.
Procedure: conduct multivariate statistical analyses such as factor analysis, discriminant analysis, and multivariate analysis of variance. It requires evidence that supports the interpretation of test scores in line with the theoretical implications associated with the construct label. The authors should precisely define each construct and distinguish it from other constructs.
Types of tests: intelligence tests, aptitude tests, personality tests.
Validity Coefficient- the correlation between the scores on an instrument and the criterion measure.
ITEM ANALYSIS
A general term for procedures designed to assess the utility or validity of a set of
test items.
Validity concerns the entire instrument, while item analysis examines the
qualities of each item.
It is done during test construction and revision; provides information that can be
used to revise or edit problematic items or eliminate faulty items.
Item Difficulty Index
• It reflects the proportion of people getting the item correct, calculated by dividing the number of individuals who answered the item correctly by the total number of people.
• The item difficulty index can range from .00 (meaning no one got the item correct) to 1.00 (meaning everyone got the item correct).
• Item difficulty actually indicates how easy the item is, because it provides the proportion of individuals who got the item correct.
Example: in a test where 15 of the students in a class of 25 got the first item on the test correct:
p = 15/25 = .60
• The desired item difficulty depends on the purpose of the assessment, the group taking the instrument, and the format of the item.
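The worked example above (15 of 25 students correct) can be sketched as:

```python
# Sketch: item difficulty index p = number correct / total examinees,
# using the worked example above (15 of 25 students correct).
def item_difficulty(num_correct: int, num_examinees: int) -> float:
    return num_correct / num_examinees

p = item_difficulty(15, 25)
print(p)  # 0.6
```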
Item Discrimination Index
• Calculated by subtracting the proportion of examinees in the lower group who got the item correct (or endorsed it in the expected manner) from the proportion of examinees in the upper group who did so.
• Item discrimination indices can range from +1.00 (all of the upper group got the item right and none of the lower group did) to −1.00 (none of the upper group got it right and all of the lower group did).
• The determination of the upper and lower groups depends on the distribution of scores. For a normal distribution, use the upper 27% for the upper group and the lower 27% for the lower group (Kelley, 1939). For small groups, Anastasi and Urbina (1997) suggest upper and lower groups in the range of 25% to 33%.
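The subtraction described above can be sketched as follows; the upper- and lower-group proportions are hypothetical.

```python
# Sketch: item discrimination index D = proportion correct in the upper
# group minus proportion correct in the lower group.
def discrimination_index(p_upper: float, p_lower: float) -> float:
    return p_upper - p_lower

# Hypothetical item: 80% of the upper group and 30% of the lower group
# answered correctly.
print(round(discrimination_index(0.80, 0.30), 2))  # 0.5
```

A positive D means the item separates high scorers from low scorers in the expected direction; a negative D flags a faulty item.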
ITEM RESPONSE THEORY (IRT)
• A theory of testing in which item scores are expressed in terms of estimated scores on a latent-ability continuum.
• It rests on the assumption that an examinee's performance on a test item can be predicted by a set of factors called traits, latent traits, or abilities.
• Using IRT, we get an indication of an individual's performance based not on the total score, but on the precise items the person answers correctly.
• It suggests that the relationship between examinees' item performance and the underlying trait being measured can be described by an item characteristic curve.
Rasch Model- a one-parameter (item difficulty) model for scaling test items for purposes of item analysis and test standardization. The model is based on the assumption that guessing and item discrimination are negligible parameters. As with other latent-trait models, the Rasch model relates examinees' performance on test items (percentage passing) to their estimated standings on a hypothetical latent-ability trait or continuum.
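One common logistic form of the one-parameter (Rasch) model can be sketched as follows: the probability of a correct response depends only on the difference between the examinee's latent ability (theta) and the item's difficulty (b); the example theta and b values are hypothetical.

```python
# Sketch of the Rasch (one-parameter logistic) model: the probability
# that an examinee with latent ability theta answers an item of
# difficulty b correctly.
import math

def rasch_probability(theta: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-(theta - b)))

print(rasch_probability(0.0, 0.0))            # 0.5 (ability equals difficulty)
print(round(rasch_probability(1.0, 0.0), 2))  # 0.73
```

Plotting this probability against theta for a fixed b traces the item characteristic curve referred to above.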
References
Anastasi, A., & Urbina, S. (1997). Psychological Testing (7th ed.). New York: Macmillan Publishing.
Aiken, L. R. (2000). Psychological Testing and Assessment. Boston: Allyn and Bacon.
Cohen, R. J., & Swerdlik, M. E. (2018). Psychological Testing and Assessment (9th ed.). New York: McGraw-Hill.
Del Pilar, G. H. (2015). Scale Construction: Principles and Procedures. Workshop PowerPoint presentation, AASP-PAP, Cebu City.
Kaplan, R. M., & Saccuzzo, D. P. (1997). Psychological Testing: Principles, Applications, and Issues (4th ed.). California: Brooks/Cole Publishing.
Orense, C., & Parena, J. (2014). Lecture in Psychological Assessment. Review Manual in RGC Licensure Examination, Assumption College, Makati.
Walsh, W. B., & Betz, N. E. (1995). Tests and Assessment. New Jersey: Prentice Hall.
Morrison, J. (2014). DSM-5 Made Easy: The Clinician's Guide to Diagnosis. New York: The Guilford Press.
Others:
Manual of psychological tests
Psychological Resources Center – test brochures and test descriptions.
www.AssessmentPsychology.com
Ethical Guidelines of the Psychological Association of the Philippines (pap.ph)