The art of psychological assessment is an infrequent description in the current era. (For
the purposes of this article, "assessment" refers to the administration of multiple psycho-
logical tests, instruments, or techniques, as well as behavioral observation, to obtain a
pool of data.) Contributing factors include the increasingly cost-conscious control of
managed care (Marlowe, Wetzler, & Gibbings, 1992; Moreland, Fowler, & Honaker,
1994), which encourages a results-oriented, empirical approach; the increasing use of
highly reliable computer programs in testing (Butcher, 1994; Schlosser, 1991; Tallent,
1987); the increasing scrutiny of testing in the courtroom (Matarazzo, 1990; Skidmore,
1992); and the increasing sophistication of the field itself (Ritz, 1992; Watkins, 1992;
Wetzler, 1989). Both professional texts and journal articles expound the science of assess-
ment. The role of inference, intuition, and creativity is a peripheral observation, at best.
Still, the task of interpreting data in a reliable and valid manner and assembling the data
and their interpretation into a useable format for the client remains an art.
In giving expert testimony, I have sometimes mused on my response if an astute
attorney asked, "Doctor, would you please describe to the court the reliability, validity.
The American Psychological Association has for over a decade moved toward increasingly
specific standards of conduct. The Standards for Educational and Psychological
Measurement (American Educational Research Association, American Psychological
Association, and National Council on Measurement in Education, 1985) were closely
followed by the Guidelines for Computer-Based Tests and Interpretations (American
Psychological Association, 1986). By 1991, issues in forensic psychology had become
sufficiently complex that a special committee published the Specialty Guidelines for
Forensic Psychologists (Committee on Ethical Guidelines for Forensic Psychologists,
1991). In 1992, the American Psychological Association released the revised Ethical
Principles of Psychologists and Code of Conduct. Following this expansion of the code,
the American Psychological Association published Record-Keeping Guidelines (1993)
and Guidelines for Child Custody Evaluations in Divorce Proceedings (1994).
The increasing specificity of these codes serves to comprehensively address the ethical
dilemmas that routinely confront psychologists. A simultaneous disadvantage to such
specificity is the potential for psychologists to overlook ethical dilemmas that are not
directly addressed. Aspirational goals, because of their idealistic and global format, may
be ignored or deemed inappropriate to the current situation.
Available information routinely addresses issues of test ethics. Adequacy, administration,
bias, and security are all concerns that authors (Keith-Spiegel & Koocher, 1985;
Koocher, 1993) and ethical codes and principles delineate. However, a more global ethical
concern, which moves into the area of assessment, is competence. As Weiner (1989)
noted, "To sustain ethical practice of psychodiagnosis, psychologists need to combine
good judgment with competence sustained by constant attention to newly emerging
information concerning what tests can and cannot do" (p. 830). Demonstrated competence or
expertise using a single test does not translate to competence in integrating data from
several assessment instruments, and psychologists are ethically compelled to practice
within the limits of the scope of their knowledge. Each evaluator, then, must answer the
question at a personal level: What principle(s) guide the selection of instruments for an
assessment? Further, by the combination of these instruments, what data can be obtained
or clarified that would otherwise be missing or vague?
Acting in the best interest of the client, indeed at times acting as an advocate for the
client, involves careful thought as to the potential long-term consequences of the data to
be obtained. This is a point of particular concern. For example, even if a computer-
generated narrative report is not considered a permanent part of the record and is destroyed,
an answer sheet remains as raw data, and can be obtained by a non-mental health pro-
fessional under certain conditions (Skidmore, 1992). Therefore, in the context of the
assessment, the psychologist must decide how much information regarding the client
needs to be available. Is the MMPI-2 (Butcher & Williams, 1992), with its plethora of
scales and interpretive possibilities, the most appropriate instrument if the needed
information can be obtained from the Brief Symptom Inventory (Derogatis, 1993)? The Hare
Psychopathy Checklist-Revised (Hare, 1991) may provide useful information but also
highlights possible psychopathic behaviors. Would the MMPI-2 in this case serve equally
well, with less risk of a negative label for the client, should the report or raw data be
obtained by others less familiar with assessment techniques?
The question of amount of raw data obtained raises an ethical question in itself. For
psychologists working in outpatient settings, third-party payors demand increasing cost-
effectiveness via the minimal assessment necessary. The psychologist must carefully con-
sider a rationale not only to include the individual instrument in the battery, but to determine
what information that instrument uniquely and conjointly contributes to the assessment.
The obvious detriment of such an approach is the potential to omit much-needed data
because of the financial constraints imposed by a disinterested third party. On the other
hand, such limitations may serve as an impetus for the psychologist to increase the range
of assessment techniques used, and to better understand the efficacy of an individual tool.
The ethical issues surrounding an assessment have yet to be clearly codified, as the
aforementioned examples suggest. In no way does the lack of specific code expectations
exempt the psychologist from the need to consider the assessment in the broad context of
ethical behavior, advocating both in the present and potentially in the future for the best
interests of the client.
Psychological assessment itself remains a hotly debated pursuit, with little consensus in
the field. Although the majority of experts consider the assessment enterprise more com-
plex than testing (Beutler & Rosner, 1995; Matarazzo, 1990; Tallent, 1987; Zeidner &
Most, 1992), the view also prevails that testing and assessment are interchangeable (Hood
& Johnson, 1991; Sugarman, 1991) or that assessment is an ongoing activity in any
interaction with the client (Spengler, Strohmer, Dixon, & Shivy, 1995). Projective tech-
niques are enjoying a renaissance (Bellak, 1992; Watkins, 1994); projective techniques
are in decline (Goldstein & Hersen, 1990) and may even be unethical due to the deception
perpetrated on the client (Schweighofer & Coles, 1994). Despite the emphasis on more
focused and goal-oriented assessments (Wetzler, 1989), the call for a broader,
psychoanalytic perspective remains strong (Jaffe, 1990, 1992).
This difference of opinion may well reflect the healthy state of psychological assessment
(Masling, 1992). Surveys of practicing psychologists repeatedly document the
routine use of assessment instruments (Archer, Maruish, Imhof, & Piotrowski, 1991;
Piotrowski & Keller, 1989; Piotrowski & Lubin, 1990; Watkins, Campbell, Nieberding,
& Hallmark, 1995) and even suggest that a significant amount of clinical time is devoted to the
administration and interpretation of these tests (Ball, Archer, & Imhof, 1994). These
surveys indicate that a broad range of assessments (using both projective and objective
techniques) is common.
The student or novice practitioner of assessment finds numerous articles and texts
that describe the use of the psychological test. Important initial steps include the
development of the assessment question (Hood & Johnson, 1991) and outcome goals (Karoly,
1993). The practitioner is also reminded of the importance of using reliable and valid test
instruments (Beutler & Rosner, 1995; Smith & McCarthy, 1995; Zeidner & Most, 1992)
with appropriate normative standards (Nelson, 1994). (More advanced literature addresses
the potential for clinical observation from techniques designed for other purposes as
well—e.g., Kaufman, 1994.) The scientific literature carefully addresses appropriate stan-
dards for development, review, and use of the individual assessment instrument. The
combination of data from these instruments remains in the realm of clinical judgment.
Theoretical orientation has a minimal influence on instrument choice; the psychol-
ogist is likely to use a prescribed series of tests learned in graduate training (Fischer,
1992; Marlowe et al., 1992; Stout, 1992; Watkins, 1991). This reliance on a handful of
instruments is met with tempered skepticism. Wetzler (1989) noted, "No matter what the
referral question, they [psychologists] administer the standard battery. Their loyalty to
the standard battery is based on 40 years of clinical experience from which there now
exists a large body of knowledge on intensive personality assessment" (p. 7). Others are
more critical of current practices: "By insisting that we confront such perennial problems
as overinterpretation, descriptor fallacy, and pseudoparallelism, our goal is the
presentation of clinical data that is useful and not misleading" (Rogers, 1995, p. 295).
Problems arise because no empirical approach is available to determine the appro-
priateness of interpretations gleaned from a battery of assessment techniques. The psy-
chologist aspiring to a valid assessment battery can apply rigorous empirical standards to
the selection of individual tests. However, at the next level—combining these tests—the
psychologist must rely on personal experience or turn to another psychologist in a mentor
role for guidance. The common opinion suggests that the majority of psychologists fail to
adequately perform this task (e.g., Spengler et al., 1995).
This system of ensuring accuracy or reliability in the assessment is relatively weak.
And yet, the very reliance on a handful of techniques, which is heavily criticized, may
serve as a stabilizing force to ensure reliability. If, for example, the Wechsler scales (e.g.,
Wechsler, 1981), the MMPI-2, and the Rorschach Inkblots are repeatedly administered as
a standard battery, the psychologist develops an expectation for the normative performance.
Marked deviations from that performance may reflect important clinical issues,
similar to Exner's (1993) hypothesis that marked deviations in form quality on the Rorschach
reflect symbolically significant distortions of reality. The assessment is held to a
level of reliability for the individual psychologist, if not at a more global level.
As noted earlier, psychologists often fail to consider the importance of theoretical
orientation in the integration and interpretation of assessment data. Psychoanalytic/
dynamic theories provide the most comprehensive perspective (Jaffe, 1990, 1992; Sug-
arman, 1991). However, psychologists with other orientations can and routinely do make
equally valid use of the information gleaned in an assessment. Problems arise as a psy-
chologist fails to report assessment results within an overarching theoretical framework.
For example, behavioral observations of the client may be based on observable, clearly
reported behaviors (e.g., "The client was early for the appointment, sat quietly, smiled
and made appropriate eye contact on greeting, and was cooperative throughout the bat-
tery of tests"). At the same time, an objective self-report inventory, with a strong empha-
sis on internally reported state, may suggest the presence of a severe depression and be
duly noted. If the behavioral observations reflected greater inference of the internal state
(e.g., "The client was early for the appointment and appeared somewhat anxious. She
was soft-spoken and seemed somewhat fatigued even before the testing began, despite
cooperation with all tasks"), the behavioral observation and computer-generated inter-
pretations might more readily match.
Critics of assessment techniques as practiced today also address the issue as if there
are two types of assessments: good assessments that accurately portray the client, the
client's issues, or both, and bad assessments that paint an inaccurate portrayal. Few cri-
tiques address the possibility that the reliability of assessments forms a distribution. It is
to be hoped that the distribution is normal, with the majority of assessments falling in an
acceptable midrange and not a positive skew, suggesting a preponderance of assessments
with questionable accuracy (a negative skew seems too hopeful!). Variables that influence
the effort to determine a reliable assessment include definition of an appropriate
assessment, the changing goal of the assessment based on the task at hand, and operational
definitions of accuracy. As Masling (1992) noted:
We are all agreed, Psy.D. and Ph.D., practicing psychologist and academician, that the ability
to use psychological assessment methods is a unique and valuable skill in clinical psychology.
Beyond that there is considerable controversy. It is something psychologists are uniquely
qualified to do, but what it is intended to do and how well we do it remain unspecified. (p. 53)
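The distributional view raised before the Masling quotation can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that each assessment could somehow be given an accuracy rating on a hypothetical 1-to-10 scale (no such rating procedure is proposed in this article), and it computes the moment-based skewness of the resulting distribution. The ratings and the function name `skewness` are invented for the example.

```python
def skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three ratings")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all ratings are equal")
    return m3 / m2 ** 1.5


# Hypothetical accuracy ratings (1 = poor, 10 = excellent); not real data.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
# Most ratings cluster at the low end, with a thin tail of accurate assessments.
mostly_questionable = [1, 1, 1, 2, 2, 2, 3, 3, 5, 9]
# Mirror image of the previous list: most ratings cluster at the high end.
mostly_accurate = [11 - v for v in mostly_questionable]

for label, ratings in [
    ("symmetric", symmetric),
    ("mostly questionable", mostly_questionable),
    ("mostly accurate", mostly_accurate),
]:
    print(f"{label}: skewness = {skewness(ratings):+.2f}")
```

A value near zero corresponds to the symmetric case hoped for in the text, a positive value to a preponderance of questionable assessments, and a negative value to the "too hopeful" case in which most assessments are highly accurate.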
The validity of a psychological test refers to its usefulness in a number of domains (for an
excellent review, see Cichetti, 1994). Does the content of the test adequately sample the
state or trait to be measured? Does the test appear to the client to measure what it purports
to measure; that is, does it exhibit face validity? Can the proposed factors or variables be
demonstrated; that is, does the test exhibit construct validity? Compared to a similar
measure or behavioral sample, does the correlation indicate a robust construct (does it
exhibit criterion-oriented validity)? Compared to a later measure or behavioral sample,
does the correlation indicate predictive power (does it exhibit predictive validity)? Does
the construct differ from unrelated constructs (discriminant validity), yet correlate with
related constructs (convergent validity)? These are the long-tested trials of validity applied
to the psychological test (Cichetti, 1994; Foster & Cone, 1995; Haynes, Richard, & Kubany,
1995; Messick, 1994).
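The convergent and discriminant validity questions above are usually answered with correlation coefficients, and a brief sketch may make the logic concrete. The example below assumes eight clients who completed a new depression scale, an established depression scale (a related construct), and a vocabulary test (an unrelated construct). All scores, variable names, and the helper `pearson_r` are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for eight clients (illustrative only, not real data).
new_scale = [12, 18, 25, 9, 30, 22, 15, 27]
established_scale = [14, 20, 27, 8, 33, 21, 13, 29]  # related construct
vocabulary_test = [50, 42, 47, 45, 48, 52, 44, 40]   # unrelated construct

# Convergent validity: the new scale should correlate strongly with a related measure.
convergent_r = pearson_r(new_scale, established_scale)
# Discriminant validity: it should correlate weakly with an unrelated measure.
discriminant_r = pearson_r(new_scale, vocabulary_test)

print(f"convergent r = {convergent_r:.2f}")
print(f"discriminant r = {discriminant_r:.2f}")
```

With these invented scores the first coefficient is high and the second is close to zero, which is the pattern expected of a measure with adequate convergent and discriminant validity. Real validation work would of course require far larger samples and appropriate significance testing.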
In recent years, validity has grown beyond the individual test instrument. Although
not directly addressing the validity of a psychological assessment or battery of tests or
techniques, the arguments put forth in favor of a broader interpretation do hold promise
for this neglected issue.
Foster and Cone (1995) referred indirectly to the broader consequences of administering
and interpreting a battery of tests: "Consequential validity goes beyond whether the mea-
sure fulfills its intended purposes and asks the larger question of whether it is consistent
with other social values" (p. 248). Thus, the validity of a measure extends beyond its
power to sample a unitary construct in an appropriately representative manner; the mea-
sure must also conform to expected social values that respect the rights of the client.
Messick (1994, 1995) approached the same concern from a different angle, arguing for
unified validity. The traditional evidence of validity is supplemented by consideration of
the interpretive or applied outcome of the test (Messick, 1994, 1995).
These views reflect the expanding domain of validity, particularly toward the appropriate
use of an instrument. Such views have less to do with the instrument itself than with the
application of the data obtained from the instrument. Application of those data within a
battery of tests then refers to their use in the context of a larger goal or purpose.
Cross-cultural psychological testing expands validity concerns in still another direction
directly applicable to the assessment issue (e.g., Kehoe & Tenopyr, 1994). Tests
using norms developed with Caucasian Americans may be inappropriately used with
African Americans due to a lack of understanding of African American culture, insuffi-
cient rapport, and subtle differences in item interpretations between African Americans
and Caucasians (Bryan, 1989). Too often, adaptation of tests for persons of another culture
and language refers to translation of test items, with little concern for the integrity of
meaning in the translation, context of items, and appropriate standardization (Geisinger,
1994). Further, cross-cultural assessment must consider the continuum of acculturation,
from extremely traditional, with few ties to the dominant culture, to largely acculturated,
with few ties to traditions (Dana, 1995, 1996).
Computer-generated psychological tests have created yet another area of concern,
which again relates to the validity of the assessment battery. Computers may lend tests
administered or interpreted through them greater authority than they actually possess,
because of the sophisticated scoring and reports generated (Butcher, 1994). Yet much of
computer-generated data relies on clinical interpretation and inference (Tallent, 1987).
Critics point to the need for criterion validity for computer measures and to the need to
establish a level of similarity with expert clinical opinion for interpretive programs (Butcher,
1987; Honaker & Fowler, 1990). Indeed, Butcher (1987) has noted, "The development of
valid psychological measures has lagged behind the rapid innovations in computer tech-
nology, and research on combining various psychological measures is rudimentary at
best" (p. 11).
Computers, with the narrative descriptions so frequently provided, give the impression
of a tremendous increase in the information available from an individual instrument.
The temptation to administer an assessment of relatively brief duration that yields a rich
set of data (or perhaps more appropriately, a well-constructed narrative report) can be
strong. Whether the information obtained is most appropriate in the context of the assessment
goals and whether the information is over- or underutilized is another matter entirely.
There is no empirical measure of the validity of a battery of tests. Indeed, as the
aforementioned discussions suggest, there is no uniform definition for the validity of a
battery of tests. As a first step, considering the context of the assessment and, subsequently,
the context of the tests to be used in that assessment serves as a means of maintaining
appropriate consequential (Foster & Cone, 1995) and unified (Messick, 1994) validity. At
a fundamental level, psychologists practice this routinely. Few would administer a bat-
tery of personality tests to a client referred to determine level of intelligence and aca-
demic skills, for example. At a more complex level, however, psychologists are asked to
make determinations of much finer discrimination. For example, what is the age of the
client referred for intelligence testing? Has the client personally requested testing, or has
a third party referred the assessment? What is actually needed: a breakdown of cognitive
strengths and weaknesses, or a more global measure of intelligence? Likewise, what
level of discrimination is needed in determining academic skills? Emotional and person-
ality functioning? These questions serve to place the assessment in context, lending valid-
ity to the subsequently obtained data and interpretation.
1. Art rests on science. Choosing reliable, valid, and appropriate assessment tools is
fundamental to adequate assessment. Even in the process of collecting nonstan-
dardized data, particularly behavioral observations, the psychologist applies the
principles of science. This includes careful consideration of the validity of obser-
vations, testing of hypotheses, and clarification of ambiguous data.
2. An assessment is a snapshot, not a film. No matter how exhaustive the battery of
assessment techniques, no matter how many corroborative sources, and no matter
how lengthy the assessment procedure, the assessment describes a moment frozen
in time, described from the viewpoint of the psychologist. Although results may
be indicative of long-term functioning, such results are nevertheless tentative and
should be treated with this awareness.
3. The appropriate assessment is tailored to the needs of the client, the referral
source, or both. A clothing store that offered the customer an expensive one-size-
fits-all garment would not long be in business. The temptation to remain with the
familiar is an easy one to rationalize but may serve the client poorly. Maintaining
familiarity with a variety of assessment techniques allows greater freedom in
tailoring an assessment to the requisite goals.
4. The psychologist should be responsible to the client, not the computer. In an age
of computer scoring, replete with well-developed narrative descriptions, the temp-
tation to take these interpretations at face value can be overwhelming. Psycholo-
gists must consider the validity of the behavioral data in addition to the validity
scales of the test and must also consider the validity of the narrative statements
generated by the scoring program.
5. Information is power. Assessment information is life-impacting power. The psy-
chologist does well to remember the significance clients place on the written
word of the report. If the client is an individual, the psychologist may be approached
with fear, anger, or awe, depending on the client's interpretation of the report. If
the client is a third party, the psychologist may be asked to perform increasingly
difficult or even unrealistic feats with psychological assessment techniques (e.g.,
Did he sexually harass his coworkers? Was she really sexually abused? Should we
place him in a residential facility?). The psychologist bears a continuing respon-
sibility to educate consumers on the appropriate uses and limitations of psycho-
logical assessment techniques.
In beginning this article, I mused on the question of an astute attorney. The response was
a brief apologia. Perhaps a better response would state, "An assessment is an art, but an
art grounded in psychological science. The expertise of the psychologist, the care to
validate the findings, and the care to ensure the ethical treatment of both these findings
and the client combine to create a rigorous and exacting standard, albeit one that has yet
to be statistically testable."
