PowerPoint Presentation Module 2


Personality Assessment

Psychology of Personality
Module 2
Structure

• Why do we need assessment?


• How can we measure personality?
• Self-report
• Observer-report
• Projective / Implicit tests
• (Digital) behavioral measures
• What makes a good assessment?
• Validity
• Reliability
• Other criteria (e.g., fakeability)
Why do we need
assessment?
Theory: “latent” (unobservable) traits
• True Sociability

Measurement: “manifest” (observable) behaviors
• I feel good around people
• I talk a lot
• I have many friends
• I like to go to parties
• …
Assessed via self-report, observation, tests, … → Assessed Sociability
Types of measures

Self-report

Observer-report

Projective Test

Behavioral measures
Self-report questionnaires

Some information is only known to us

• Questionnaires are the most frequently used form of personality assessment


• Easy and cheap to administer and score!
• Length typically varies from 10 (TIPI) to 240 (NEO-PI-R)
statements/items
Example self-report
I see myself as someone who …
1. …is talkative
2. …is full of energy
3. …evokes enthusiasm
4. …is moody
5. …can be tense
6. …worries a lot
7. …is original, full of new ideas
8. …is curious about many new things
9. …is intelligent, a thinker
10. …works in a thorough fashion
11. …is a reliable worker
12. …persists until the job is done
13. …is cooperative
14. …is forgiving
15. …trusts people in general

Write down your responses
1 = Strongly disagree
2 = Disagree
3 = Undecided
4 = Agree
5 = Strongly Agree
Example self-report
Scores (sum of the responses to the listed items):
Extraversion = 1+2+3
Neuroticism = 4+5+6
Openness = 7+8+9
Conscientiousness = 10+11+12
Agreeableness = 13+14+15
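The scoring key above can be sketched in a few lines of Python. This is only an illustration: the function name `score_questionnaire` and the example answers are made up, while the item-to-trait mapping is taken from the slide.

```python
# Scoring key from the slide: each trait is the sum of three items (1-5 scale).
SCORING_KEY = {
    "Extraversion": (1, 2, 3),
    "Neuroticism": (4, 5, 6),
    "Openness": (7, 8, 9),
    "Conscientiousness": (10, 11, 12),
    "Agreeableness": (13, 14, 15),
}


def score_questionnaire(responses):
    """Sum the item responses for each trait.

    `responses` maps item number (1-15) to a rating from 1 to 5.
    """
    for item, rating in responses.items():
        if not 1 <= rating <= 5:
            raise ValueError(f"item {item}: rating {rating} is outside 1-5")
    return {
        trait: sum(responses[item] for item in items)
        for trait, items in SCORING_KEY.items()
    }


# Hypothetical respondent (made-up answers, for illustration only).
example_responses = {
    1: 4, 2: 5, 3: 3,
    4: 2, 5: 2, 6: 1,
    7: 4, 8: 4, 9: 5,
    10: 3, 11: 4, 12: 4,
    13: 5, 14: 4, 15: 3,
}
scores = score_questionnaire(example_responses)
print(scores)
```

Each trait score can range from 3 to 15, which matches the range of the norm table on the next slide.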
Scores and Norms

Trait               Very low   Low      Average   High      Very high
Extraversion        3-7        7-9      9-11      11-13     13-15
Neuroticism         3-5        5-7      7-10      10-12     12-15
Openness            3-8        8-10     10-12     12-14     14-15
Conscientiousness   3-7        7-9      9-11      11-13     13-15
Agreeableness       3-8        8-10     10-12     12-13     13-15
Scores and Norms

• Raw scores in
themselves are not very
informative
• Scores are thus often
compared to a norm
sample
• The quality (e.g., size;
representativeness) of
the norm sample is
important
Scores and Norms

How to report scores:


• Z-scores (distance from the norm mean in SD units):
• very low: z < -2
• low: z between -2 and -1
• average: z between -1 and 1
• high: z between 1 and 2
• very high: z > 2
• T-scores (transformed z-scores with mean = 50 and SD = 10)
• Note: IQ scores are like z/T-scores with a mean = 100 and SD = 15
• Percentiles from 0 to 100%
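A minimal sketch of these conversions, assuming a norm sample with a known mean and SD. T-scores conventionally use mean 50 and SD 10. The percentile conversion additionally assumes normally distributed scores, and the handling of scores that fall exactly on a category boundary is an assumption, since the slide does not specify it. All function names and the example numbers are hypothetical.

```python
from statistics import NormalDist


def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the norm mean, in SD units."""
    return (raw - norm_mean) / norm_sd


def t_score(z):
    """T-score: z rescaled to mean 50 and SD 10."""
    return 50 + 10 * z


def iq_style_score(z):
    """IQ-style score: z rescaled to mean 100 and SD 15."""
    return 100 + 15 * z


def percentile_from_z(z):
    """Percentile rank, assuming normally distributed scores."""
    return 100 * NormalDist().cdf(z)


def label(z):
    """Verbal category; boundary values are assigned by assumption."""
    if z < -2:
        return "very low"
    if z < -1:
        return "low"
    if z <= 1:
        return "average"
    if z <= 2:
        return "high"
    return "very high"


# Hypothetical norm sample: mean 10, SD 2; the respondent scored 13.
z = z_score(13, norm_mean=10, norm_sd=2)
print(z, t_score(z), iq_style_score(z), round(percentile_from_z(z), 1), label(z))
```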
The Big Five Inventory 2

Big Five measure


• 60 items
• 15 facets
• Item keys balanced
• Available for free
• Good reliability &
validity
Scores and Norms: BFI-2

Percentile       0%  10%  20%  30%  40%  50%  60%  70%  80%  90%  100%
Extraversion     12   29   33   35   37   39   41   43   46   49    60
E Sociability     4    8   10   11   12   13   14   14   16   17    20
E Assertiveness   4    9   10   11   12   13   14   14   16   17    20
E Energy Level    4   10   12   12   13   14   15   15   16   17    20
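Looking up a raw score in such a norm table can be sketched as follows. The cut-off values are copied from the table above. The function name `percentile_band` and the rule that a score equal to a cut-off counts as having reached that percentile are assumptions made for illustration.

```python
from bisect import bisect_right

PERCENTILES = list(range(0, 101, 10))  # 0%, 10%, ..., 100%

# Raw-score cut-offs per percentile, copied from the BFI-2 norm table above.
BFI2_NORMS = {
    "Extraversion": [12, 29, 33, 35, 37, 39, 41, 43, 46, 49, 60],
    "Sociability": [4, 8, 10, 11, 12, 13, 14, 14, 16, 17, 20],
    "Assertiveness": [4, 9, 10, 11, 12, 13, 14, 14, 16, 17, 20],
    "Energy Level": [4, 10, 12, 12, 13, 14, 15, 15, 16, 17, 20],
}


def percentile_band(scale, raw_score):
    """Return the (lower, upper) percentile band that a raw score falls into."""
    cutoffs = BFI2_NORMS[scale]
    if not cutoffs[0] <= raw_score <= cutoffs[-1]:
        raise ValueError(f"{raw_score} is outside the possible range for {scale}")
    # Index of the highest cut-off that the score has reached.
    idx = bisect_right(cutoffs, raw_score) - 1
    lower = PERCENTILES[idx]
    upper = PERCENTILES[min(idx + 1, len(PERCENTILES) - 1)]
    return lower, upper


band = percentile_band("Extraversion", 40)
print(band)  # (50, 60): between the 50th and 60th percentile
```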
Issues of self-report questionnaires

• High cognitive demand (e.g., for children, seniors)


• Participants’ lack of introspection / self-reflection
• Participants’ lack of motivation (careless responding)

I am…   Strongly disagree | Disagree | Neutral | Agree | Strongly agree
friendly O O O O O
kind O O O O O
helpful O O O O O
hard-working O O O O O
Issues of self-report questionnaires

Response styles
• Acquiescence: Agreeing to statements in general (choosing agree or strongly
agree) – can be addressed by using negatively keyed (reverse coded) items
• Extreme response style: Preferring extreme categories (strongly disagree or
agree)
• Middle response style: Preferring middle categories (disagree, neutral, agree)
I am…   Strongly disagree | Disagree | Neutral | Agree | Strongly agree
Friendly O O O O O
Rude (neg. keyed) O O O O O
Helpful O O O O O
Quarrelsome (neg. keyed) O O O O O
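How negatively keyed items counter acquiescence can be sketched as follows. The four items are the ones from the table above. The helper names and the two example respondents are hypothetical.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # 1 = strongly disagree ... 5 = strongly agree

NEGATIVELY_KEYED = {"rude", "quarrelsome"}


def reverse_key(rating):
    """Flip a rating on the 1-5 scale (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    return SCALE_MAX + SCALE_MIN - rating


def scale_score(responses, negatively_keyed=NEGATIVELY_KEYED):
    """Sum the ratings after reverse-coding the negatively keyed items."""
    return sum(
        reverse_key(rating) if item in negatively_keyed else rating
        for item, rating in responses.items()
    )


# Hypothetical acquiescent respondent: chooses "agree" for every statement.
acquiescent = {"friendly": 4, "rude": 4, "helpful": 4, "quarrelsome": 4}
# Hypothetical genuinely agreeable respondent.
agreeable = {"friendly": 5, "rude": 1, "helpful": 5, "quarrelsome": 1}

print(scale_score(acquiescent))  # 12, the scale midpoint (4 items x 3)
print(scale_score(agreeable))    # 20, the scale maximum
```

With balanced keying, agreeing with everything produces a score at the midpoint of the scale rather than a high trait score.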
Issues of self-report questionnaires

• Socially desirable responding / positive self-evaluation tendencies


• Higher agreement to desirable items like “helpful”, “fair”, “hard-working”
(generally increases A & C scores, decreases N scores)
• Faking: Self-report in high-stakes assessment (e.g., job selection) is
easy to fake (ability to fake good in this context is related to job
knowledge and intelligence)
I am…   Strongly disagree | Disagree | Neutral | Agree | Strongly agree
Friendly O O O O O
Organized O O O O O
Hard-working O O O O O
Stressed out easily O O O O O
Situational Judgment Test

• Harder to fake
• Allows for situational
embeddedness
• Can help with introspection
But:
• More time consuming
• Complex to score
• Difficult to write for all Big
Five
Situational Judgment Test

Options & scoring (usually) done by experts
Extraversion scores of the response options: 0, 1, 2, 3
Forced Choice

Ideally: pairs of equally desirable statements representing different traits
Very hard to fake!
Eliminates response tendencies
But:
(Usually) only allows for intra-individual ranking of traits (I am more agreeable than extraverted), but no comparison between people (I am more agreeable than Peter)

Pick the statement that describes you best
I am…
… social. O O … diligent.
… organized. O O … helpful.
… kind. O O … stress resistant.
… calm. O O … curious.
… creative. O O … assertive.
Forced choice scoring
I am…
EX … social. −1 1 … diligent. CO
CO … organized. 1 −1 … helpful. AG
AG … kind. −1 1 … stress resistant. ES (NE-)
ES (NE-) … calm. −1 1 … curious. OP
OP … creative. 1 −1 … assertive. EX

Personality profile
[Figure: scores for EX, CO, AG, ES, OP on an axis from -2 to 2]
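The scoring scheme above (+1 for the trait of the chosen statement, -1 for the trait of the rejected one) can be sketched as follows. The pairs and the example choices are taken from the slide. The function name and the "left"/"right" encoding are assumptions.

```python
# Statement pairs from the slide: (left statement, trait), (right statement, trait).
PAIRS = [
    (("social", "EX"), ("diligent", "CO")),
    (("organized", "CO"), ("helpful", "AG")),
    (("kind", "AG"), ("stress resistant", "ES")),
    (("calm", "ES"), ("curious", "OP")),
    (("creative", "OP"), ("assertive", "EX")),
]


def score_forced_choice(pairs, choices):
    """+1 for the trait of each chosen statement, -1 for the rejected one."""
    if len(pairs) != len(choices):
        raise ValueError("one choice per pair is required")
    profile = {}
    for (left, right), choice in zip(pairs, choices):
        if choice not in ("left", "right"):
            raise ValueError(f"invalid choice: {choice!r}")
        chosen, rejected = (left, right) if choice == "left" else (right, left)
        profile[chosen[1]] = profile.get(chosen[1], 0) + 1
        profile[rejected[1]] = profile.get(rejected[1], 0) - 1
    return profile


# Choices as marked on the slide: diligent, organized, stress resistant,
# curious, creative.
choices = ["right", "left", "right", "right", "left"]
profile = score_forced_choice(PAIRS, choices)
print(profile)
```

The scores always sum to zero across traits. This is why the profile only ranks traits within a person and does not support comparisons between people.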
Ranked response format (Q-sort)

• Statements/adjectives have
to be ranked from least to
most characteristic
• Number of statements per
category is limited
• Similar strengths and issues
as forced-choice
• Makes respondents read the
statements more than once
Rank your fundamental life goals
Experience sampling

• Short self-report questionnaires administered via the smartphone several times per day (or once per day; daily diary) for several days/weeks
• Can be used to measure current states:
• I am sad right now
• I am talkative right now
• I am curious right now
• But also to measure current situation to study
personality vs. situation effects
Experience sampling

• Very useful approach to study personality (and situation effects) in day-to-day ABCDs
• Experience sampling can be combined with smartphone tracking
(e.g., gps; used apps) to get more information about situation /
behavior
• Similar weaknesses as (short) self-report questionnaires (e.g.,
response styles; desirable responding)
• Further issues: Participant drop-out / non-response to questionnaires
/ limitations to number of items that can be asked
Other-report questionnaires

People constantly judge and rate others

• Self-report questionnaires can easily be administered as other-report (he/she is…)
• Can address some issues with response biases and social
desirability/faking
• The more raters (e.g., friends, partner, parents, colleagues), the
better!
Example observer-report
I see the person sitting next to me as someone who …
1. …is talkative
2. …is full of energy
3. …evokes enthusiasm
4. …is moody
5. …can be tense
6. …worries a lot
7. …is original, full of new ideas
8. …is curious about many new things
9. …is intelligent, a thinker
10. …works in a thorough fashion
11. …is a reliable worker
12. …persists until the job is done
13. …is cooperative
14. …is forgiving
15. …trusts people in general

Write down your responses
1 = Strongly disagree
2 = Disagree
3 = Undecided
4 = Agree
5 = Strongly Agree
Scores for the person sitting next to me (sum of the responses to the listed items):
Extraversion = 1+2+3
Neuroticism = 4+5+6
Openness = 7+8+9
Conscientiousness = 10+11+12
Agreeableness = 13+14+15
Scores and Norms

Trait               Very low   Low      Average   High      Very high
Extraversion        3-7        7-9      9-11      11-13     13-15
Neuroticism         3-5        5-7      7-10      10-12     12-15
Openness            3-8        8-10     10-12     12-14     14-15
Conscientiousness   3-7        7-9      9-11      11-13     13-15
Agreeableness       3-8        8-10     10-12     12-13     13-15
Self-other correlations by rater

Connelly & Ones, 2010


• Correlations between self and others
are around r = .40 to .50 for all traits (0
= no agreement; 1 = perfect
agreement)
• Family, close friends & cohabitants are usually the best raters (in terms of agreement with self)
Table not exam relevant
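Self-other agreement of this kind is usually expressed as a Pearson correlation between self-rated and other-rated scores across people. A minimal sketch with made-up data follows. The six score pairs are hypothetical and were chosen only so that the result lands near the range reported on the slide.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical Extraversion scores (3-15) for six people:
# their own rating and the rating given by a friend.
self_report = [12, 9, 14, 7, 10, 11]
other_report = [10, 11, 12, 9, 8, 13]

r = pearson_r(self_report, other_report)
print(round(r, 2))
```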
Self- vs. Other-report

• Do self- and other-reports measure the same thing?


• Only partly: Self- and other-reports in combination predict behavior or academic/work success better than either alone (Back et al., 2009; Connelly & Ones, 2010)
• Some things are only known to us (e.g., how worried we really are)
• But we may not be aware of some things we do / how we are
perceived by others
Self- vs. Other-report

• Other-report can also address/balance out self-report response style, self-evaluation and social desirability issues
• But has its own set of biases (response bias of rater; faking of rater;
negative vs. positive relationship between rater and ratee;
honeymoon effect)
• A combination of self-report and several other-reports is the gold
standard in personality measurement!
Projective and Implicit
Tests
“Objective” Tests
Raven Matrix to measure fluid intelligence
• Tests are the gold standard in
intelligence research
• “Objective” test of intelligence –
not possible to pretend to be
smarter than one is
• For self-report it’s easy to pretend
to be more Conscientious etc.
(e.g., job application)
Projective and Implicit Tests

How can we build “objective” tests for personality?

Some ideas:
• Implicit association test (from social psych.)
• Rorschach inkblots (from clinical psych.)
• Thematic Apperception Test (from social psych.)
Implicit Association Test (IAT)

Reaction time test


Used in social psychology to
measure “undesirable” attitudes
(e.g., racism)
Idea:
When “opposing (i.e., in your own
perception)” words are paired
(good & black), reactions to the
stimuli are slower than for
“congruent” pairs (white & good)
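The reaction-time logic can be illustrated with a simplified effect score: the mean slowdown in the incongruent block, divided by the standard deviation of all trials. This is a sketch only. Published IAT scoring procedures include further steps (e.g., handling of errors and of very fast or slow trials) that are omitted here, and the reaction times below are made up.

```python
from statistics import mean, stdev


def iat_effect(congruent_ms, incongruent_ms):
    """Return (mean slowdown in ms, slowdown divided by the SD of all trials)."""
    diff = mean(incongruent_ms) - mean(congruent_ms)
    sd_all = stdev(list(congruent_ms) + list(incongruent_ms))
    return diff, diff / sd_all


# Hypothetical reaction times in milliseconds for one participant.
congruent = [620, 580, 640, 600, 610]
incongruent = [760, 720, 800, 690, 730]

slowdown_ms, effect = iat_effect(congruent, incongruent)
print(slowdown_ms, round(effect, 2))
```

A positive value means slower responses for the pairing the participant perceives as incongruent. This is interpreted as a stronger implicit association for the congruent pairing.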
IAT idea:

Implicit (racist) connections:
• Black ↔ Bad
• White ↔ Good
Personality IAT idea:

Implicit self-concept:
• I/me ↔ sociable, outgoing
• They/them ↔ shy, passive
Personality Implicit Association Test

• Back et al., 2009: Neuroticism and Extraversion IATs associated with self-report scales and predicted some behaviors
• Schmukle et al., 2008: Only Extraversion & Conscientiousness IAT associated with self-report
• Falk et al., 2013: Generally no associations between different IATs

-> Evidence is rather weak


Projective Tests

Rorschach / Thematic Apperception Test

What do you see here?
What is happening in the picture?

The interpretation of ambiguous stimuli will indicate hidden motives / needs / desires / conflicts
E.g., is the person brooding (Neu), thinking
about work (Con), reflecting on life (Ope),
or being lonely (Ext)?
Rorschach

Responses need to be coded (e.g., Morey & McCredie, 2019)

Rorschach is usually used to identify personality disorders

Morey & McCredie, 2019:


Some weak associations of
“pathological responses” with
low openness, agreeableness and
high neuroticism self-report
But findings weak in general…
Thematic Apperception Test

Serfass & Sherman, 2013:


Some response patterns associated
with self-reported Neuroticism &
Openness in a “meaningful” manner

But findings weak…


Issues of projective tests

• What do they really measure? Can they really explain real behaviors?
• Interpretations of same stimulus can vary across repeated
measurement occasions (not ideal for “stable” personality)
• Time consuming and costly to administer
• Very difficult to score (good coding sheets required)
Behavioral Measures

Assessing personality where it matters – in the actual behaviors


• Instead of measuring “self-concept” or “reputation”, the idea
is to measure real personality-related behavior
• Often done as lab observations (e.g., let people do a group-
task and observe whether someone emerges as leader ->
high Extraversion)
• Issue: Very narrow behaviors, lab setting not natural
• New advancements in Big Data: smartphone tracking, social
media usage, spending data
Electronically Activated Recorder (EAR)

• Audio recording device (now app)


• Records 30-50s of audio every 12 minutes
• Also tracks GPS position at time of audio
• GPS & Audio used to identify a wide set of behaviors:
• Being alone vs. in a group
• Talking to someone
• Being outside / at work / in a restaurant / at home
• Watching TV / listening to music (and what kind)
• …
Electronically Activated Recorder (EAR)

Example: Daily (audible) behavior of Narcissists (Holtzman et al., 2010)
80 students were tracked over 4 days with 30s
recordings every 12 min
Students with higher Narcissism scores …
• Were more often socializing / in groups
• Were more often arguing
• Were less often in class
• Used sexual language more often
EAR
Slide not exam relevant

Example: EAR and Extraversion (Tackman et al., 2020)

Despite only audible behavior, many different possibilities!

But very work-intensive coding…
Smartphone usage (Stachl et al., 2017)

Conscientiousness:
Uses news apps, keeps battery > 60%,
starts phone at same time in the morning,
sleeps at same time, many contacts during
week, checks weather app frequently
Extraversion:
Many calls (also at night), listens to dance
music, battery not fully charged, uses social
apps frequently, several email programs

https://www.zeit.de/2021/07/smartphone-apps-
musik-akku-persoenlichkeit-psychologie-studie
Social media language use (Schwartz et al., 2013)
[Figure: social media language use based on trait levels; panels: High Neuroticism, Low Neuroticism, High Extraversion, Low Extraversion]
Personality and Facebook likes

Slide not exam relevant

Kosinski et al., 2013


Predict personality (the “dark side”)
Cambridge Analytica:
• Predicting Big Five levels based on Facebook profiles, cookies, demographics, etc.
• Targeted advertisements for predicted personality profiles
• Goal: Changing voting behavior of voters with high vote likelihood but unclear party allocation
https://www.youtube.com/watch?v=n8Dd5aVXLCc
Not exam relevant
Comparison of accuracy (from 0 to 1)

• The prediction/
measurement is still far
from perfect ( = 1)
• Some studies have shown that Facebook-based measures have similar or higher agreement with self-report than other-report!
• With more data and more
approaches it can get
better

Gladstone et al. (2019)


Behavioral measures

• Despite Big Data, still a narrow focus on a specific set of behaviors:


• Smartphone related behaviors
• Social media related behaviors
• Financial spending behavior
• Location
• “hearable” behaviors (EAR)
• Focus on observable behaviors – feelings and thoughts only indirectly
assessed
• While tracking can be implemented without burden to participant,
analyzing data is very complex (e.g., machine learning; audio coding)
• Legal and ethical issues!!
Summary

• Self-report is still the most widely used personality assessment tool


• A combination of self- and other-report (of close raters) is the gold
standard
• Projective and implicit tests are still not able to replace
questionnaires
• Big Data (social media presence, smartphone usage, tracking,
spending data) personality measurement is still in its infancy, but
shows strong potential
• But: Complexity of Big Data still difficult to transform into
interpretable personality scores
What makes a good
assessment?
Example: Self-report questionnaire

I am…   Strongly disagree | Disagree | Neutral | Agree | Strongly agree
friendly O O O O O
kind O O O O O
helpful O O O O O
hard-working O O O O O
diligent O O O O O
organized O O O O O
Validity & Reliability

• Validity: How well the measure captures the trait (and not something
else)
• Reliability: How precise the measure is in measuring the trait
I am…   Strongly disagree | Disagree | Neutral | Agree | Strongly agree
friendly O O O O O
kind O O O O O
helpful O O O O O
hard-working O O O O O
diligent O O O O O
organized O O O O O
Validity & Reliability

Theory: “latent” (unobservable) traits
• True Sociability

Measurement: “manifest” (observable) behaviors
• I feel good around people
• I talk a lot
• I have many friends
• I like to go to parties
→ Self-reported Sociability

Validity: Does it measure sociability?
Validity

Degree to which test measures what it claims to measure


Ways to determine validity:
• Construct validity: Does it fit the theory? (judged by experts)
• Face validity: Does it look like it’s measuring what it should? (judged by participants / laypersons)
• Predictive validity: Does it predict associated behaviors / life outcomes?
• Convergent validity: Does it correlate with other assessments of the trait?
• Discriminant validity: Does it not correlate with assessments of different traits?
Test-evaluation criteria

Construct validity: Does it fit the theory?

My opinion:
• Agreeableness is lacking trust components
• Curious is a measure of Openness

I am…   Strongly disagree | Disagree | Neutral | Agree | Strongly agree
Agreeableness
friendly O O O O O
kind O O O O O
helpful O O O O O
Conscientiousness
curious O O O O O
diligent O O O O O
organized O O O O O
Test-evaluation criteria

Face validity: Does it look like it’s measuring the trait to participants?
What do you think?

I am…   Strongly disagree | Disagree | Neutral | Agree | Strongly agree
Agreeableness
trusting O O O O O
kind O O O O O
helpful O O O O O
Conscientiousness
controlled O O O O O
diligent O O O O O
organized O O O O O
Test-evaluation criteria

Predictive validity: Does it predict behaviors / life outcomes?

Generally moderate for personality self-reports (Lecture 4 & 5)

I am… SD D N A SA
Agreeableness
trusting O O O O O
kind O O O O O
helpful O O O O O
Conscientiousness
controlled O O O O O
diligent O O O O O
organized O O O O O

Narrow behaviors:
• Helps other people
• Is on time
• Keeps promises
Broad outcomes:
• Income
• Job performance
• Relationship satisfaction
Test-evaluation criteria

Convergent validity: Does it correlate with other assessments of the trait?

[Figure: analogy with Covid test 1 and Covid test 2; panels: Good test, Bad test. Applied to personality: Agreeableness measured here vs. Agreeableness measured with the NEO-PI-R]

I am… SD D N A SA
Agreeableness
trusting O O O O O
kind O O O O O
helpful O O O O O
Conscientiousness
controlled O O O O O
diligent O O O O O
organized O O O O O
Test-evaluation criteria

Discriminant validity: Does it not correlate with assessments of different traits?

[Figure: analogy with a Covid test and a Flu test; panels: Good test (Covid test 2 vs. Flu test), Bad test (Covid test 1 vs. Flu test). Applied to personality: Agreeableness vs. Conscientiousness]

I am… SD D N A SA
Agreeableness
trusting O O O O O
kind O O O O O
helpful O O O O O
Conscientiousness
controlled O O O O O
diligent O O O O O
organized O O O O O
Reliability

Degree to which an obtained measure represents the ‘true’ level of the trait being measured (i.e., the precision of determining the level)
Ways to determine reliability:
• Test-retest reliability: Do I get the same score if I use the measure
again in one week?
• Internal consistency: Do items of the same trait correlate with each
other?
• Inter-rater reliability: Do X people administering the same measure
(e.g., three interviewers) arrive at the same conclusion?
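Internal consistency is commonly summarized with Cronbach's alpha, which compares the summed item variances with the variance of the total score. The sketch below is a minimal illustration with two hypothetical three-item scales answered by five made-up respondents. Test-retest and inter-rater reliability are typically expressed as correlations between the two occasions or raters and are not shown here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("every respondent needs the same k >= 2 items")
    columns = list(zip(*item_scores))  # one tuple of ratings per item
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical data (rows = respondents; columns = friendly, kind, helpful).
consistent_scale = [[5, 5, 4], [4, 4, 4], [2, 1, 2], [3, 3, 3], [1, 2, 1]]
# Hypothetical data where the three items do not go together.
inconsistent_scale = [[5, 1, 3], [1, 5, 3], [3, 3, 5], [4, 2, 1], [2, 4, 3]]

alpha_good = cronbach_alpha(consistent_scale)
alpha_bad = cronbach_alpha(inconsistent_scale)
print(round(alpha_good, 2), round(alpha_bad, 2))
```

In the second data set the items correlate negatively with each other, so alpha drops below zero. For a real scale this would signal that the items do not measure one common trait.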
Test-evaluation criteria

Test-retest reliability: Do I get the same score if I use the measure again after X weeks?

[Figure: the same items (“I am… friendly / kind / helpful” and “I am… hard-working / diligent / organized”; scale SD, D, N, A, SA) answered twice, 2 weeks apart; panels: Good test, Bad test]
Test-evaluation criteria

Internal consistency: Do items of the same trait correlate with each other?

[Figure: example responses to “I am… friendly / kind / helpful” and “I am… hard-working / diligent / organized” (scale SD, D, N, A, SA); panels: Good Agreeableness test, Bad A test, Good Conscientiousness test, Bad C test]
Test-evaluation criteria

Inter-rater reliability: Do X people administering the same measure (e.g., three interviewers) arrive at the same conclusion?

[Figure: Rater 1 and Rater 2 rate the same person on “He/She is… friendly / kind / helpful” (scale SD, D, N, A, SA); panels: Good test, Bad test]
Myers-Briggs vs. NEO-PI-R / BFI-2 / HEXACO-PI-R / SAPA test
Personality Types - Issues

[Figure: Assumption vs. Reality]
Distinction into types is artificial; nearly all psychological constructs are normally distributed.

Low retest reliability:
• Your type can easily change across measurement occasions
Low validity:
• High face validity
• Low construct validity: No good theoretical foundation
• Low predictive validity: Types are weak at predicting relevant outcomes
Other relevant criteria

• Generalizability/Fairness: Can the measure be applied to different genders, ages, languages, cultures, educational levels? Is the test fair or does it discriminate against specific groups (e.g., females get lower scores despite having the same Conscientiousness level as males)?
• Scaling/norms/standardization: Does the test have good comparison
samples and standardized scores to compare test takers?
• Fakeability: Is the test safe against the possibility to “fake” better
scores?
• Efficiency: Is the test cheap and fast to administer?
• Usability: Is the physical/cognitive/mental burden on participants
reasonably low?
Thank you for your attention!
