Force Choice
To cite this article: Jesús F. Salgado & Gabriel Táuriz (2014) The Five-Factor Model, forced-choice personality inventories
and performance: A comprehensive meta-analysis of academic and occupational validity studies, European Journal of Work
and Organizational Psychology, 23:1, 3-30, DOI: 10.1080/1359432X.2012.716198
European Journal of Work and Organizational Psychology, 2014
Vol. 23, No. 1, 3–30, http://dx.doi.org/10.1080/1359432X.2012.716198
This article reports a comprehensive meta-analysis of the criterion-oriented validity of the Big Five personality dimensions assessed with forced-choice (FC) inventories. Six criteria (i.e., performance ratings, training proficiency, productivity, grade-point average, global occupational performance, and global academic performance) and three types of FC scores (i.e., normative, quasi-ipsative, and ipsative) served for grouping the validity coefficients. Overall, the results showed that the Big Five assessed with FC measures have similar or slightly higher validity than the Big Five assessed with single-stimulus (SS) personality inventories. Quasi-ipsative measures of conscientiousness (K = 44, N = 8794, r = .40) were found to be better predictors of job performance than normative and ipsative measures. FC inventories also showed reliability coefficients similar to those of SS inventories. Implications of the findings for theory and practice in academic and personnel decisions are discussed, and future research is suggested.
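The abstract groups validity coefficients by three types of FC scores. As a minimal sketch of the property that separates purely ipsative from normative scores, the Python below uses simulated data (not data from this meta-analysis) and hypothetical helper names (`ipsatize`, `covariance_matrix`, `mean_intercorrelation`). It applies the normative-to-ipsative transformation (subtracting each respondent's mean across scales from each of his or her scale scores) and then checks three things: Clemans' (1966) defining condition that every respondent's scores sum to the same constant (here zero), the zero row sums of the resulting covariance matrix, and an average intercorrelation close to −1/(m − 1) when the m scales have equal variances.

```python
import math
import random


def ipsatize(scores):
    """Convert normative scores (respondents x scales) into purely ipsative scores.

    Each respondent's mean across scales is subtracted from each of his or her
    scale scores, so every row of the result sums to a constant (zero).
    """
    ipsative = []
    for row in scores:
        person_mean = math.fsum(row) / len(row)
        ipsative.append([x - person_mean for x in row])
    return ipsative


def covariance_matrix(scores):
    """Sample covariance matrix (n - 1 denominator) among the scales (columns)."""
    n, m = len(scores), len(scores[0])
    cols = [[row[j] for row in scores] for j in range(m)]
    means = [math.fsum(col) / n for col in cols]
    return [
        [
            math.fsum((a - means[i]) * (b - means[j]) for a, b in zip(cols[i], cols[j])) / (n - 1)
            for j in range(m)
        ]
        for i in range(m)
    ]


def mean_intercorrelation(scores):
    """Average of the off-diagonal correlations among the scales."""
    cov = covariance_matrix(scores)
    m = len(cov)
    rs = [
        cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])
        for i in range(m)
        for j in range(i + 1, m)
    ]
    return math.fsum(rs) / len(rs)


if __name__ == "__main__":
    rng = random.Random(2014)  # fixed seed; simulated, independent scales
    m = 5                      # five scales, e.g., one per Big Five dimension
    normative = [[rng.gauss(50, 10) for _ in range(m)] for _ in range(2000)]
    ipsative = ipsatize(normative)

    # Clemans' condition: every respondent's ipsative scores sum to the same constant (0).
    print("largest |row sum|:", max(abs(math.fsum(row)) for row in ipsative))
    # Rows of the ipsative covariance matrix (column-centred scores) sum to zero.
    print("largest |covariance row sum|:",
          max(abs(math.fsum(row)) for row in covariance_matrix(ipsative)))
    # With (approximately) equal variances the mean intercorrelation is close to -1/(m - 1).
    print("mean intercorrelation:", round(mean_intercorrelation(ipsative), 3))
    print("-1/(m - 1):", -1 / (m - 1))
```

With m = 5 scales, −1/(m − 1) = −.25, and the simulated average should fall close to that value even though the underlying scales were generated independently; this is why purely ipsative scales are negatively intercorrelated by construction. The sketch does not model quasi-ipsative scoring rules (e.g., those of the GPP-I or IPIP-MFC), whose score sums are not fixed.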
Correspondence should be addressed to Jesús F. Salgado, Department of Social Psychology, University of Santiago de Compostela, Campus Vida, 15782 Santiago de Compostela, Spain. E-mail: [email protected]

We thank Neil Anderson, Dave Bartram, Lew Goldberg, Robert Hogan, Tim Judge, Kevin Murphy, Frank Schmidt, and two anonymous reviewers for their comments on a previous version of this manuscript. The research reported in this article was partially supported by Grant SEJ-2008-3070 from the Ministry of Science and Innovation (Spain) and Grant PSI2011-27947 from the Ministry of Economy and Competitiveness to Jesús F. Salgado.

Since at least the beginning of the 1990s, a large number of meta-analyses have found that personality measures predict academic and job performance, training proficiency, counterproductive behaviours, accidents, productivity data, salary, promotions and progress, grade point average, and other relevant educational and organizational criteria (Barrick & Mount, 1991; Barrick, Mount, & Judge, 2001; Bartram, 2005; Clarke & Robertson, 2005; Hogan & Holland, 2003; Hough, 1992; Hurtz & Donovan, 2000; Mount & Barrick, 1995; Ng, Eby, Sorensen, & Feldman, 2005; O’Connor & Paunonen, 2007; Ones, Viswesvaran, & Schmidt, 1993; Poropat, 2009; Salgado, 1997, 1998a, 2000, 2002, 2003; Tett, Rothstein, & Jackson, 1991; Trapmann, Hell, Hirn, & Schuler, 2007). Consequently, personality measures can be useful for personnel selection and academic decisions.

Among personality measures, the Big Five personality dimensions (i.e., emotional stability, extraversion, openness to experience, agreeableness, and conscientiousness) have received more attention than any other personality construct. The meta-analytic evidence mentioned above has demonstrated that conscientiousness and emotional stability generalized validity across samples, criteria, and occupations, and that the other three personality dimensions were valid predictors for specific criteria and specific occupations. For example, openness to experience predicted training proficiency, and extraversion and agreeableness predicted performance in occupations characterized by a high level of interpersonal relationships. This was found not only in American meta-analytic integrations (e.g., Barrick & Mount, 1991; Hogan & Holland, 2003; Hough, 1992; Hurtz & Donovan, 2000; Tett et al., 1991), but also in European meta-analyses (Salgado, 1997, 1998a), South-African meta-analyses (Rothmann, Meiring, Van der Walt, & Barrick, 2002; Van der Walt,
Meiring, Rothmann, & Barrick, 2002), Korean meta-analyses (Yoo & Min, 2002), and meta-analyses of the studies conducted in several Asian countries (Oh et al., 2011).

Additionally, Ones and her colleagues showed that criterion-focused occupational personality scales (COPS), such as integrity tests and managerial potential, drug abuse, stress tolerance, and service orientation scales, were also valid predictors of job performance and that they generalized validity across samples (Ones & Viswesvaran, 2001a; Ones, Viswesvaran, & Schmidt, 1993, 2003; Viswesvaran, Ones, & Hough, 2001; see also Hogan & Brinkmeyer, 1997). The series of meta-analyses conducted by Judge and colleagues (Judge & Bono, 2001; Judge, Bono, Ilies, & Gerhardt, 2002; Judge, Bono, & Locke, 2000; Judge, Heller, & Mount, 2002) showed that core self-evaluations predicted performance, leadership, and job satisfaction. Ng and colleagues’ meta-analysis showed the validity generalization of locus of control for predicting job performance (Ng, Sorensen, & Eby, 2006). Meta-analyses by Salgado and Moscoso (2000) and Judge, Jackson, Shaw, Scott, and Rich (2007) showed that self-efficacy was a predictor of various criteria, including job performance, satisfaction, and absenteeism. More recently, meta-analyses by Dalal (2005), Hershcovis et al. (2007), and Kaplan, Bradley, Luchman, and Haynes (2010) showed that positive and negative affect predicted performance and deviant behaviours at work. It was also demonstrated that the facets of the Big Five were predictors of job performance (e.g., Dudley, Orvis, Lebiecki, & Cortina, 2006; Hurtz & Donovan, 2000; Salgado, 2004). Finally, there is also evidence that “dark side” personality measures predict task and contextual performance and counterproductive behaviours (Hogan & Hogan, 2001; Rolland & De Fruyt, 2003; Moscoso & Salgado, 2004). The validity coefficients ranged from .20 to .45 when the personality measures showed evidence of validity generalization.

Despite the meta-analytic evidence mentioned, there is still no unanimous agreement that personality measures are relevant for making personnel decisions. For example, Murphy and Dzieweczynski (2005) made three general criticisms: (1) theories linking personality constructs and job performance are often vague and unconvincing, (2) little is known about how to match personality dimensions and occupations, and (3) the most valid personality-related measures have involved measures of poorly defined constructs, such as integrity.

More recently, a panel of past and current editors of the top-tier journals in industrial, work, and organizational psychology discussed the evidence on various aspects of the validity of personality inventories for personnel selection and concluded with a rather pessimistic view (Morgeson et al., 2007a, 2007b). Their main points were that: (1) the validity of personality measures is small, even when corrections for criterion reliability and range restriction are made; (2) they can show some incremental validity over general mental ability; (3) measures based on self-reports can be faked and this can change the rank orders of individuals and, consequently, affect hiring decisions; (4) corrections for faking, mainly based on the scores in social desirability scales, do not seem to improve validity; and (5) the faking scales do not work well for identifying distorted responses. The criticisms by Murphy and Dzieweczynski (2005) and Morgeson et al. (2007a) were answered by Barrick (2005), Hogan (2005a, 2005b), Hough and Oswald (2005), Ones, Viswesvaran, and Dilchert (2005), Ones, Dilchert, Viswesvaran, and Judge (2007), and Tett and Christiansen (2007), among others, who provided a large amount of evidence supporting personality measures at work. With regard to faking, Hogan, Barrett, and Hogan (2007) tested a large sample of job applicants twice, 6 months apart. They found that only 5.25% improved their scores on the second occasion. Equally important, the Hogan, Barrett, and Hogan paper shows that the same number of applicants reduced their scores when trying to fake. The two groups cancel each other out, suggesting that faking may be a random process.

Overall, these findings suggest that faking on personality measures in the context of personnel selection may be less important than was previously thought. Furthermore, personality measures remain very popular in the US, and even more so elsewhere, for example in the European Union (see Tett, Christiansen, Robie, & Simonet, 2011; Zibarras & Woods, 2010).

The members of the panel also made two interesting recommendations: (1) that research should be done to look for alternatives to typical self-report inventories, among which forced-choice (FC) inventories and conditional reasoning tests were suggested; and (2) that the criterion domain should be expanded and other measures should be used in addition to job performance ratings. In this article, we focus on the relationship between FC personality inventories and four types of work and educational criteria: job performance ratings, training proficiency, objective performance (e.g., sales), and educational success (i.e., grade point average).

FORCED-CHOICE MEASURES IN PERSONALITY ASSESSMENT AND PERSONNEL AND EDUCATIONAL DECISIONS

According to Travers (1951), it was Paul Horst and Robert Wherry who, independently of each other,
developed the seminal ideas about FC measures. Horst applied the concept to the development of personality inventories and Wherry to the problem of rating army officer performance. In this sense, FC is simply a specific format of rating and assessment procedures. Typically, the FC method gives the individual (e.g., the applicant, the rater) a number of words or phrases, along with instructions to select the ones that he or she considers most (or, in some cases, least) applicable to the person being evaluated. The words or phrases may be presented, for instance, in pairs, triads, or tetrads, which are matched in terms of an index of preference and discrimination (e.g., social desirability). Thus, FC formats can be distinguished from SS formats (e.g., Likert, yes/no, true/false) in that a choice must be made among the alternatives, rather than each statement being rated on its own as in single-stimulus formats.

Since the 1950s, a number of popular personality inventories have used FC modalities, including the Edwards Personal Preference Schedule (EPPS; Edwards, 1973), the Gordon Personal Profile-Inventory (GPP-I; Gordon, 1993), the Myers-Briggs Type Indicator (MBTI; Myers, McCaulley, Quenk, & Hammer, 1998), and the family of Occupational Personality Questionnaires (OPQ; SHL, 2006). FC inventories have been used not only in personality assessment, but also for assessing learning styles (Kolb, 1985), vocational interests (Kuder, 1975), social power bases (French & Raven, 1959), team conflict (Thomas & Kilmann, 1974), and team roles (Belbin, 1981). FC formats have also been used for assessing job performance (e.g., Bartram, 2005; King, Hunter, & Schmidt, 1980). FC formats have been used in personnel and student selection mainly because they reduce or eliminate uniform biases such as acquiescence responding and faking, and can reduce “halo” effects (Bartram, 2005, 2007; Cheung & Chan, 2002).

A relevant characteristic of FC formats is that they can result in several different scoring methods with specific statistical and psychometric particularities. In his seminal review of FC personality measures, Hicks (1970) drew up a classification that remains a classic to this day. The basis of Hicks’ taxonomy is the difference between normative and ipsative scores, as suggested by Cattell (1944) and Clemans (1966). According to Clemans, “any score matrix is said to be ipsative when the sum of the scores obtained over the attributes measured for each respondent is constant” (p. 4). Following this definition, Hicks (1970) suggested that three different types of FC measures can be distinguished: (1) purely ipsative measures, (2) quasi-ipsative (or partially ipsative) measures, and (3) normative FC measures. The first type refers to those measures that totally meet Clemans’ criterion of ipsativity. The second type includes measures that do not totally meet the criterion of pure ipsativity suggested by Clemans because, for example, not all alternatives ranked by respondents are scored or the scales have different numbers of items. The main characteristic of the third type of measures is that items representing a given bipolar scale are never paired with items representing another bipolar scale. For example, items assessing extraversion are never paired with items of conscientiousness. It is important to point out that the SS format produces normative scores only, in contrast with the FC format, which can produce all three types of scores.

As Cattell (1948), Clemans (1966), and Hicks (1970) have shown, each score type (i.e., normative, quasi-ipsative, and purely ipsative) has important characteristics. In the case of normative scoring, the scores of an individual are statistically dependent on other individuals in the population and independent of the other scores of the assessed individual (e.g., scores on other attributes). This kind of score allows individuals or groups to be compared on each measured variable (i.e., they are interindividual scores). In the case of purely ipsative measurement, the score on a variable depends on the level of the individual on the other variables assessed. Therefore, ipsative scores allow the level of the individual to be compared across variables (i.e., they are intraindividual scores). Normative scores can be transformed into ipsative scores by a simple mathematical transformation (e.g., by subtracting each individual’s average scale score from each of his or her scale scores), but the reverse is not possible. Quasi-ipsative scores share some psychometric characteristics with both normative and purely ipsative scores. For example, they allow the comparison of individuals and groups, which is a characteristic of normative scores (Bartram, 1996; Cattell & Brennan, 1994; Clemans, 1966; Heggestad, Morrison, Reeve, & McCloy, 2006; Horn, 1971). At the same time, some degree of dependence can generally be found among the quasi-ipsative scales of the questionnaire, which is a characteristic of ipsative scores (Clemans, 1966; Gordon, 1993). However, as Horn (1971) pointed out, quasi-ipsative scoring does not always introduce algebraic dependence.

In Hicks’ (1970) view, ipsativity can be quantified in terms of deviation from the conditions of ipsativity. According to the mathematical examination carried out by Clemans (1966) and Radcliffe (1963), ipsative scores show five properties: (1) once the scores obtained in ipsative form are converted to deviation scores by columns, the sums of the columns or rows of an ipsative covariance matrix must be zero; (2) if the ipsative variances are equal, the sums of the columns or rows of the ipsative intercorrelation matrices are equal to zero; (3) when the ipsative variances are equal, the average intercorrelation value will be limited by −1/(m − 1), where m is the number of scales; (4) the sum of the covariance terms
obtained between a specific criterion and a set of ipsative scores is zero; and (5) when the ipsative variances are equal, the sum of the ipsative validity coefficients is zero. Therefore, comparing the empirical means and standard deviations of the summed validities, and the empirical mean intercorrelation, with their predicted values can serve to quantify ipsativity. The more the data deviate from the predicted values, the less pure the ipsativity found in the data. Hicks demonstrated that this procedure works.

There are also different strategies to produce a quasi-ipsative score or successfully reduce ipsativity (see Hicks, 1970). For example, the Gordon Personal Profile-Inventory (GPP-I; Gordon, 1993) and the IPIP-MFC (Heggestad et al., 2006) avoid full ipsativity by introducing some changes in the way FC inventories are scored. Instead of having a fixed sum, the GPP-I and the IPIP-MFC allow a range of possible values within each tetrad. This allows people to be high on all the scales, low on all of them, or at an intermediate level on all of them. Consequently, the method does not result in a fixed sum of scale scores, and normative arrangements of the individuals are possible. This scoring system is quasi-ipsative in the sense that very high scores on all scales are not possible. However, from the empirical point of view, this limitation is not very important because the number of people in the population who would score very highly on, for example, emotional stability, conscientiousness, openness, agreeableness, and extraversion, all at the same time, is minimal.

A number of researchers have suggested using FC measures in personnel selection because they are valid predictors of job performance and resistant to faking (Bartram, 2005, 2007; Brogden, 1954; Christiansen, Burns, & Montgomery, 2005; Jackson, Wroblewski, & Ashton, 2000; Norman, 1964). Nevertheless, if FC inventories produce ipsative scores, they have some characteristics that some researchers consider problematic from the psychometric point of view. Among the limitations, it has been mentioned that: (1) ipsative measures produce scores that correlate negatively with each other, which limits the types of statistical analysis that are possible (Meglino & Ravlin, 1998); (2) they can affect the size of reliability coefficients (Bartram, 1996; Horn, 1971; Johnson, Wood, & Blinkhorn, 1988; Tenopyr, 1988; Thompson et al., 1982); (3) they are not appropriate for factor analysis (Cornwell & Dunlap, 1994; Dunlap & Cornwell, 1994; Meade, 2004); (4) they are inappropriate for comparisons among individuals (Cattell & Brennan, 1994; Hicks, 1970); (5) the average correlation between m scales of a purely ipsative questionnaire is bounded below by −1/(m − 1) and above by (m − 4)/m, whereas for nonipsative questionnaires the average correlation ranges between −1/(m − 1) and 1 (Gleser, 1972); (6) they appear to correlate with cognitive ability when individuals respond as job applicants (Vasilopoulos, Cucina, Dyomina, Morewitz, & Reilly, 2006); (7) ipsative and quasi-ipsative measures may produce gender differences in some cases and, consequently, equal opportunities may also be negatively affected (Anderson & Sleap, 2004); and (8) according to Ones et al. (2007), they do not improve the criterion-related validity of personality measures and, therefore, are useless for most purposes in organizational decision making.

Despite these criticisms, FC inventories continue to be used in personnel selection. For example, Tett, Christiansen, Robie, and Simonet (2011) found that 30% of companies used FC inventories, although the percentages of ipsative, quasi-ipsative, and normative measures are unknown. Furthermore, several researchers have questioned the psychometric criticisms mentioned above. For example, Baron (1996) suggested that ipsative measures are appropriate for factor analysis if the number of items or scales is large (> 30) because the intercorrelations become practically zero, although this result may also be substantially artifactual (Cattell & Brennan, 1994; Dunlap & Cornwell, 1994; Meade, 2004). Bartram (1996) found that ipsative measures could show similar or even larger internal consistency coefficients than SS measures when the appropriate formula is used. Furthermore, confirmatory factor analysis (CFA) can be done using Jackson and Alwin’s (1980) procedure as modified by Chan and Bentler (1993), and ipsative scores can also be transformed into a normal distribution using the formula suggested by Hayes (1967; see also Chapman, Blackburn, Austin, & Hutcheson, 1983; Feather, 1973), allowing comparisons of individuals. More recently, developments in IRT technology have produced methods for analysing multidimensional forced-choice items (Brown & Maydeu-Olivares, 2011; Chernyshenko et al., 2007; Heggestad et al., 2006; Maydeu-Olivares & Brown, 2010; McCloy, Heggestad, & Reeve, 2005; Stark, Chernyshenko, & Drasgow, 2005; Stark, Chernyshenko, Drasgow, & Williams, 2006). Although the number of studies is at present too small to be conclusive about the usefulness of IRT approaches in personnel and academic selection, recent findings show that the predictive validity of FC inventories is similar to or larger than that of their SS counterparts (Brown & Bartram, 2009).

With regard to quasi-ipsative measures, Hicks (1970) hypothesized that validity increases as an inverse function of ipsativity. Therefore, quasi-ipsative measures would be more valid predictors than purely ipsative ones. Furthermore, he suggested that there may be cases in which quasi-ipsative measures are also more valid than SS ones, because some of the advantages of the FC format (e.g., more
resistance to faking) can be successfully exploited without the statistical limitations of ipsativity. Moreover, Cattell and Brennan (1994) demonstrated that factor analysis is not affected if quasi-ipsative data are used.

An important implication of Hicks’ (1970) review has to do with the conditions that ipsative measures should fulfil to be used in psychology and, specifically, in personnel selection and academic decisions. Hicks suggested that the value of ipsative measures depends on three conditions being fulfilled: (1) that SS measures are affected by faking, (2) that faking diminishes the validity of SS measures, and (3) that ipsative measures control faking better than other SS controls of faking, and simultaneously increase validity. With regard to the third condition, research has shown that FC measures are more resistant to faking than their SS counterparts (see Nguyen & McDaniel, 2000). However, it has not been conclusively demonstrated that quasi-ipsative and purely ipsative measures are as valid as, or more valid than, SS inventories as predictors of job and academic performance. Some researchers have reported high validities for FC measures. For example, Brogden (1954) found validity coefficients of .33 and .42, and Villanova, Bernardin, Johnson, and Dahmus (1994) found similar coefficients for an FC job compatibility questionnaire. However, according to Ones et al. (2007), the precise reasons for the higher correlations remain unclear.

It is surprising that, despite the years that have passed since Hicks’ (1970) review and the large number of meta-analyses of the criterion-oriented validity of personality measures, to date no meta-analysis has been conducted to examine the validity of FC personality inventories for predicting job and academic performance. For example, there is no meta-analytic research examining whether the different types of scores (e.g., normative, purely ipsative, and quasi-ipsative) yielded by FC inventories have similar or different validity, or whether the validity of FC measures is moderated by variables such as criterion type. Also, previous meta-analyses did not distinguish between SS and FC personality measures. In this sense, based on the reference lists of the previously published meta-analyses, it can confidently be affirmed that those meta-analyses were mainly conducted with data obtained with SS inventories. For example, an inspection of Table 1 in the meta-analysis by Tett et al. (1991) shows that more than 85% of studies used SS inventories. Similar or even larger figures were found in the meta-analyses conducted by Hurtz and Donovan (2000), O’Connor and Paunonen (2007), Poropat (2009), Salgado (1997, 2003), and Trapmann et al. (2007) on the validity of the Big Five personality dimensions. Barrick and Mount (1991) and Hough (1992) did not include the list of studies in their meta-analyses, but it is likely that they shared the majority of the studies with those of Tett et al.

TABLE 1
Distributions of reliability of the Big Five personality dimensions assessed with forced-choice questionnaires

Internal consistency
    Mean                            .73       .75       .81       .80       .72
    SD                              .09       .13       .12       .08       .12
    Square root of reliabilities    .85       .87       .90       .90       .86
    K                               10        6         4         8         11
    Range                           .52–.81   .47–.84   .60–.88   .70–.89   .53–.92
    Ones & Viswesvaran (1999)       .78       .78       .73       .75       .78
    Square root of reliabilities    .88       .88       .85       .86       .88
Test–retest
    Mean                            .76       .75       .75       .71       .77
    SD                              .13       .06       .04       .14       .11
    Square root of reliabilities    .87       .87       .87       .84       .88
    K                               13        13        8         4         15
    Range                           .50–.92   .65–.86   .67–.81   .53–.85   .61–.96
    Ones & Viswesvaran (1999)       .77       .76       .71       .69       .72
    Square root of reliabilities    .87       .87       .84       .83       .84

AIMS OF THE STUDY

In summary, this research has five goals. The main objective of this article is to conduct a meta-analytic study of the validity of the Big Five personality dimensions as assessed with FC personality measures for predicting work and educational criteria (e.g., job performance, productivity, GPA). The second goal is to ascertain whether the type of score (i.e., purely ipsative, quasi-ipsative, and normative) moderates the validity of FC questionnaires. The third objective is to compare the results with the previous meta-analytic findings (e.g., Barrick & Mount, 1991; see
also Schmidt, Shaffer, & Oh, 2008), which were conducted mainly with SS personality measures. The fourth objective is to examine whether the criterion type moderates the validity of FC measures. The fifth goal is to examine the reliability of the Big Five when assessed with FC inventories. In order to achieve these objectives, we decided to use the Five-Factor Model as the framework for classifying the validity coefficients of the FC personality measures, for both theoretical and practical reasons. From the theoretical point of view, classifying FC measures into the Big Five will allow us to compare our results with the findings of previous meta-analyses, mostly conducted with SS inventories, and to respond to the question posed by Hicks (1970) about the smaller, equal, or larger validity of ipsative versus normative data (based on SS inventories). From the practical point of view, the Big Five is a useful taxonomy that is unrivalled at present, although it may not be exhaustive (Hogan, 2005a).

This research refers to the same variables used in previous meta-analyses of personality inventories (e.g., Barrick & Mount, 1991; Hurtz & Donovan, 2000; Salgado, 1997) and, therefore, some of the seminal hypotheses stated by Barrick and Mount (1991) are appropriate in the present case. Consequently, we advance the following hypotheses:

Hypothesis 1: Emotional stability and conscientiousness are valid predictors for all academic and job performance criteria and they generalize validity across samples, criteria, and FC personality measures.

Hypothesis 2: Openness to experience is a valid predictor of training proficiency and it generalizes validity across samples and FC personality measures.

On the other hand, Hicks’ (1970) analysis suggested that criterion-oriented validity depends on the score type (e.g., normative scores would be more valid than ipsative scores) and that validity also depends on the assessment conditions (e.g., personnel selection vs. laboratory conditions). According to Hicks, quasi-ipsative measures may be more valid than SS ones in personnel selection conditions, because some of the advantages of the FC format (e.g., more resistance to faking) can be successfully exploited without the statistical limitations of ipsativity. Consequently, we also state a hypothesis concerning the type of scores (i.e., purely ipsative, quasi-ipsative) derived from FC inventories:

Hypothesis 3: Quasi-ipsative personality inventories will show larger criterion-oriented validity than purely ipsative inventories for predicting job performance.

METHOD

Literature search and coding of studies

Computer-based and manual literature searches were conducted in order to identify published and unpublished studies carried out up to and including September 2011. To cover the literature on FC personality measures as exhaustively as possible, and to prevent any bias in the inclusion of studies, we adopted a series of search strategies. First, we identified the most popular FC inventories. They included, for example, the Occupational Personality Questionnaire (OPQ), Edwards Personal Preference Schedule (EPPS), Myers-Briggs Type Indicator (MBTI), Description en Cinq Dimensions (D5D), Survey of Interpersonal Values Inventory (SIV), and the Gordon Personal Profile-Inventory (GPP-I). Second, the PsycInfo, Social Sciences Citation Index, and ABI/Inform databases were searched to identify studies on the relationship between FC measures and organizational criteria. Several keywords were used for the computer-based literature search (e.g., ipsative, forced-choice, ipsativity, job performance), as well as the acronyms of the most popular FC personality inventories. Third, electronic searches using Google were carried out systematically in order to look for articles, unpublished manuscripts, and master’s and doctoral dissertations not included in the most common databases. Fourth, a manual article-by-article search was carried out in a number of top-tier journals (e.g., Educational and Psychological Measurement, International Journal of Selection and Assessment, Journal of Applied Psychology, Journal of Occupational and Organizational Psychology, Personnel Psychology). Fifth, the reference sections of several published meta-analyses (e.g., Barrick & Mount, 1991; Bartram, 2005, 2007; Dudley et al., 2006; Hurtz & Donovan, 2000; Salgado, 1997, 2003; Tett et al., 1991) were reviewed to identify articles not covered in our computer-based search. Sixth, we contacted a number of researchers and asked for both published articles and unpublished papers on the topic in order to avoid or reduce file-drawer effects and publication bias. Seventh, the technical manuals of the most popular FC personality questionnaires (e.g., EPPS, GPP-I, MBTI, OPQ) were examined in order to find validity coefficients. By means of these search strategies, a preliminary database of over 180 documents (i.e., articles, manuals, technical reports, unpublished papers, dissertations, and so on) was established for further inspection. Fifty-eight studies were excluded from the total pool for various reasons: (1) some studies reported only the significant correlations, (2) a number of studies only reported
multiple correlation results, (3) several of them did not report correlations or enough information to calculate the effect size, and (4) several studies reported findings for the same data set. As a result, the meta-analysis was conducted with 122 independent samples. With regard to these studies, we are not able to ascertain how much overlap, if any, exists between our meta-analysis and Barrick and Mount’s (1991) meta-analysis because these researchers did not report the list of studies included in their meta-analysis. The same is true for Hough’s (1992) meta-analysis. With regard to Tett et al.’s (1991) and Salgado’s (1997) meta-analyses, the degree of overlap is practically negligible: we share six of our 122 studies with Tett et al.’s meta-analysis and three studies with Salgado’s meta-analysis.

The next step was to classify the scales from the inventories into the Big Five personality dimensions. This task was relatively easy because a number of studies used a Big Five measure or estimates of the Big Five (e.g., Bartram, 2007; McDaniel et al., 2004; Nyfield et al., 1995; Robertson, Baron, Gibbons, MacIver, & Nyfield, 2000; SHL, 2006; Warr, Bartram, & Brown, 2005). For the rest of the studies, we used the following method. First, an exhaustive description of the Big Five was written and given to the coders (based on the definitions of the Big Five given by Barrick & Mount, 1991; Hough, 1992; McCrae & Costa, 1990; Hough & Ones, 2001; and Salgado, 1997; among other sources). A list and the definition of the personality scales from each inventory were then provided to each coder with instructions to assign each scale to the most appropriate factor. Furthermore, some studies reporting factor analyses of the inventories were also used as a basis for the decision (e.g., Matthews, Stanton, Graham, & Brimelow, 1990; McCrae & Costa, 1990; Piedmont, Costa, & McCrae, 1992; SHL, 2006) because these factor analyses were informative about the Five-Factor structure of some FC inventories (e.g., OPQ, EPPS). Finally, we also checked the coding lists used by Hough and Ones (2001), Ones (1993), and Salgado (2003) in order to compare our classification with that of these researchers when the same personality measures were used. If the coders agreed on a dimension, the scale was coded in that dimension. Disagreements (less than 10%) were resolved through discussion until the coders agreed on a dimension. All the scales were assigned to a single dimension. Two researchers served as coders, working independently to code every study. Appendix A includes a complete list of the scales that were assigned to each Big Five personality dimension.

For each study, the following information was recorded, if available: (1) sample characteristics, such as gender, age, education, and so forth; (2) occupation and related information; (3) personality measures used; (4) criterion type; (5) reliability of personality measures; (6) criterion reliability; (7) range restriction value or data for calculating this value; (8) statistics concerning the relation between personality measures and the criterion; and (9) correlation among the personality measures when more than one was used. This complies with the American Psychological Association guidelines on meta-analysis reporting standards (APA, 2009). When a study contained conceptual replications (i.e., two or more measures of the same construct were used in the same sample), linear composites with unit weights for the components were formed. Linear composites provide estimates that are more construct-valid than the average correlation. Nunnally (1978, pp. 166–168) and Hunter and Schmidt (1990, pp. 457–463; see also Hunter & Schmidt, 2004) provided Mosier’s formula for the correlation of variables with composites. As demonstrated by Warr, Bartram, and Brown (2005), the average validity, corrected with Mosier’s formula for composite reliability, produces very accurate estimates when the appropriate intercorrelations are used.

An important difference between this meta-analysis and previous meta-analyses of the relationship between personality and job and academic performance is that we used the type of FC measure for grouping the validity coefficients. This difference is especially relevant because different degrees of ipsativity can result in different validity levels (Clemans, 1966; Hicks, 1970; Radcliffe, 1963). Because the number of studies was large enough to treat each type separately, we classified the personality inventories into three categories: purely ipsative, quasi-ipsative, and normative FC questionnaires. In order to classify the inventories, each one was inspected in terms of its scoring method and item format. Furthermore, we used the technical manuals of the inventories when available, and other articles that included relevant information about the inventory characteristics and scoring system. The initial agreement level of the coders was 95%, and disagreements were resolved through discussion until the coders agreed on a questionnaire category. We classified the questionnaires as purely ipsative if the sum of the scores obtained over the scales was constant. Examples of purely ipsative inventories in our database are the Edwards Personal Preference Schedule (EPPS; Edwards, 1957, 1973), the Occupational Personality Questionnaire 32i (OPQ32i; SHL, 2006), the Occupational Personality Questionnaire CM4.2 (OPQ CM4.2; Saville, Holdsworth, Nyfield, Cramp, & Mabey, 1984), and the Description en Cinq Dimensions (D5D; Rolland & Mogenet, 2001). Based on Hicks’ (1970) criteria of ipsativity, we classified inventories as quasi-ipsative if any of the following alternatives applied: (1)
Knapp, Heggestad, & Young, 2004; White, 2002). Finally, we classified inventories as normative if they yielded scores that possess the empirical properties of absolute measures. This is the case for inventories in which items representing different degrees of a personality dimension are never paired with items representing another personality dimension. The MBTI and the “Need of Achievement” questionnaire (Fineman, 1975) are representative examples of normative FC questionnaires. Following the standards of the American Psychological Association for reporting meta-analytic research (APA Publications and Communications Board Working Group, 2008) and the recent suggestion by Sackett and Schmitt (2012) on data reporting in meta-analysis, we have included an appendix with all the studies and validity coefficients used in our meta-analysis (see Appendix B).

After the studies were collated and their characteristics recorded, the next step was to apply the psychometric meta-analysis of Hunter and Schmidt (1990, 2004). Psychometric meta-analysis estimates how much of the observed variance of findings across studies is due to artifactual errors. The artifacts considered here were sampling error, criterion and predictor reliability, and indirect range restriction in personality scores. To correct the observed validity for the last three artifacts, we followed the most common strategy, which is to develop a specific distribution for each of them. Some of these artifacts reduce the correlations below their operational value (e.g., criterion reliability and range restriction), and all of them produce artifactual variability in the observed validity (Carretta & Ree, 2000, 2001). In our analysis, we corrected the observed mean validity for criterion reliability and range restriction in the predictor in order to obtain the operational validity (which is of interest for personnel selection and academic decisions), and we corrected the operational validity for predictor reliability in order to obtain the true validity (which is of interest for modelling the theoretical relationship between personality and performance).

for emotional stability, extraversion, openness, agreeableness, and conscientiousness, respectively. We also examined whether the three types of FC scoring systems showed different levels of reliability, but they proved to be very similar. Therefore, we used the average reliabilities in the meta-analytic calculations. Table 1 presents a summary of these artifact distributions. In Table 1, we have also included the reliabilities found by Ones and Viswesvaran (1999) in their meta-analysis of Big Five reliability. As can be seen, our estimates are very similar to those found by Ones and Viswesvaran and also very similar to the estimates used in previous meta-analyses (e.g., Barrick & Mount, 1991; Hurtz & Donovan, 2000; Salgado, 1997, 2002, 2003).

Although we used internal consistency coefficients in our meta-analysis, this decision may be controversial for some researchers. For example, Heggestad et al. (2006) pointed out that Cronbach’s alpha is not an appropriate coefficient for estimating the reliability of FC inventories because the responses to items are not independent and, therefore, the observed item covariances are not accurate estimates of the true item covariances. They suggested that test–retest coefficients are the appropriate estimates. For this reason, we also developed an empirical distribution of test–retest coefficients for the Big Five, using the coefficients reported in the studies included in our database. The average test–retest reliability was .76, .75, .75, .71, and .77 for emotional stability, extraversion, openness, agreeableness, and conscientiousness, respectively. These estimates are very similar to the test–retest estimates reported by Viswesvaran and Ones (1999) and are also similar to the average internal consistency estimates found here. Thus, the choice between Cronbach’s alpha and test–retest coefficients has no differential effect in the current meta-analysis.

In summary, the empirical evidence suggests that when the Big Five personality dimensions are measured with FC personality inventories, the reliability is practically the same as that obtained
with SS inventories. This is an important finding, as some researchers had suggested that FC inventories would show a different level of reliability than SS questionnaires.

Criterion reliability. In this research, six types of criteria were used: (1) supervisory ratings of job performance, (2) productivity data (e.g., sales), (3) training proficiency, (4) grade point average, (5) global academic performance, and (6) global job performance. This choice was made for two reasons: (1) previous meta-analyses of personality measures used some of these types of criteria, and one of our objectives was to provide a comparison with those meta-analyses; consequently, it was important to retain the same criteria; (2) other criteria, such as tenure, turnover, or accidents, were used in a very small number of studies or were totally absent from our database; therefore, we would not have been able to carry out meta-analyses in these cases.

Not all studies provided information regarding criterion reliability; consequently, we developed empirical distributions for the six criteria. Fortunately, a number of studies provided reliability coefficients for estimating criterion reliability. For job performance ratings, the coefficient of interest is interrater reliability when a random-effects meta-analysis is carried out (Hunter, 1986; Sackett, 2003; Schmidt & Hunter, 1996). This is because if this type of reliability is used in the correction for attenuation, it will correct most of the unsystematic errors in supervisor ratings (Hunter & Hirsh, 1987), although not all researchers agree with this point of view (e.g., Murphy & DeShon, 2000). We found 11 studies reporting interrater coefficients of job performance ratings (see Table 2). The average coefficient was .52 (SD = .05). This average coefficient is the same as that found by Viswesvaran, Ones, and Schmidt (1996; see also Salgado & Moscoso, 1996) in their meta-analysis of the interrater reliability of job performance ratings, and independently by Salgado et al. (2003) in European criterion-oriented validity studies of cognitive tests. In the case of training proficiency, we found two studies reporting reliability (see Table 2). The average coefficient was .80 (SD = .09). This figure is the same as that used by Hunter (1986; see also Hunter & Hunter, 1984) and many meta-analyses of the Big Five (e.g., Barrick & Mount, 1991). The average reliability for the objective productivity measures was .83, based on seven studies. This value is similar to the one found by Schmidt and Zimmerman (2004) and the same as that found by Schmitt, Gooding, Noe, and Kirsch (1984). We found six coefficients of GPA reliability, which produced a mean reliability of .80 (SD = .10), and two coefficients of class attendance reliability. Pooling the reliability coefficients and weighting by the number of studies using each criterion type, we calculated the average reliability coefficient for global job performance and for global academic performance, which gave us two additional composite criteria. The respective average coefficients were .61 (SD = .13) and .81 (SD = .09). These last two distributions were used for the meta-analyses carried out with the pooled sample of validity studies. Of the six criterion reliability estimates (four criterion categories plus two overall composites), four were .80 or slightly higher, one was .61, and another was .52. Considered globally, this artifact led to an underestimation of the observed validity and also contributed artifactual error variance.

TABLE 2
Distributions of criteria reliability
Criterion                               Mean    SD
Job performance ratings                  .52   .05
Training proficiency                     .80   .09
Productivity data                        .83   .07
Grade point average                      .80   .10
Occupational performance (average)       .61   .13
Academic performance (average)           .81   .09

Range restriction distributions

The distributions for range restriction were based on the following three strategies: (1) some range restriction values were obtained from the studies that reported both restricted and unrestricted standard deviation data, (2) another group of range restriction values was obtained using the population values reported in the manuals of the various inventories, and (3) a third group of range restriction values was obtained using the reported selection ratio. To use the reported selection ratio, we applied the formula derived by Schmidt, Hunter, and Urry (1976). This triple strategy produced a large number of range restriction estimates, which we grouped according to the personality dimensions. The average range restriction values (u) were .87 for emotional stability, .90 for extraversion, .92 for openness to experience, .90 for agreeableness, and .88 for conscientiousness. These u values were used when no more specific ones were available. We also found u = .93 (SD = .10) and u = .81 (SD = .19) for the ipsative and quasi-ipsative measures, respectively, combined with job performance ratings; u = .82 for ipsative measures combined with training proficiency; and u = .86 (SD = .17) when the criterion was GPA. These range restriction values are very similar to the figures used in previous meta-analyses (e.g., Barrick & Mount, 1991; Hurtz & Donovan, 2000; Salgado, 1997, 2003), and they are in accordance with the observation by Schmidt et al. (2008) that the range restriction of personality measures is smaller than the range restriction found
in the validity studies of cognitive ability tests. A summary of these distributions appears in Table 3.

Meta-analytic method

We used the psychometric meta-analysis methods developed by Hunter and Schmidt (1990, 2004) and implemented in a software program by Schmidt and Le (2004). This software includes some recent advances to correct for indirect range restriction (IRR). According to Hunter, Schmidt, and Le (2006), IRR is the most common case of range restriction in validity studies. It is present in all concurrent validity studies and in practically all predictive validity studies conducted in personnel selection (some Schmidt et al., 1993). We are interested in the relationship between the Big Five and performance, both as theoretical constructs and as operational predictors. Therefore, we will report both the operational validity and the true correlation. In summary, we will correct the observed validity for criterion reliability and IRR to obtain the operational validity, and additionally for predictor unreliability to obtain the true correlation. The observed variance will be corrected for four artifactual errors: sampling error, criterion and predictor reliability, and IRR.

RESULTS

Meta-analysis of the personality factors
TABLE 4
Results of the meta-analyses of the Big Five validity for occupational and academic performance
Dimension                   K      N      rw    SDr    rc     ri    SDri   %VE   90CV
Occupational performance
Emotional stability        82   16436   .06    .13    .10    .11    .19    30    −.12
Extraversion               80   17692   .06    .13    .09    .10    .18    29    −.11
Openness to experience     63   13539   .09    .12    .13    .14    .15    33    −.05
Agreeableness              65   14740   .04    .13    .05    .06    .17    28    −.15
Conscientiousness          96   20307   .14    .13    .21    .24    .18    36     .01
Academic performance
Emotional stability        14    3916   .03    .07    .05    .05    .06    73    −.01
Extraversion               20    6884  −.02    .11   −.02   −.03    .14    25     .14
Openness to experience     16    6299   .15    .19    .19    .21    .24    11    −.09
Agreeableness              17    6560  −.06    .14   −.08   −.09    .20    13     .15
Conscientiousness          25    6314   .12    .10    .17    .19    .10    52     .05
K = number of independent samples; N = total sample size; rw = observed validity; SDr = standard deviation of observed validity; rc = operational validity (validity corrected for criterion reliability and indirect range restriction in predictor); ri = validity corrected for criterion reliability, predictor reliability, and indirect range restriction in predictor; SDri = standard deviation of ri; %VE = percentage of variance accounted for by artifactual errors; 90CV = 90% credibility value based on the operational validity (rc).
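The correction chain described under Meta-analytic method (criterion reliability and IRR for rc, then predictor reliability for ri) can be illustrated for a single coefficient. The sketch below assumes the Case IV indirect range restriction steps of Hunter, Schmidt, and Le (2006) applied with point values for each artifact. The authors worked with full artifact distributions in the Schmidt and Le (2004) program, so this is not their computation, and the function and parameter names are hypothetical.

```python
import math


def correct_case_iv(r_obs, u_x, rxx_a, ryy_i):
    """Correct one observed validity for criterion unreliability and IRR.

    Hypothetical helper sketching Case IV with point estimates.
      r_obs : observed predictor-criterion correlation (restricted sample)
      u_x   : restricted / unrestricted SD ratio of the observed predictor
      rxx_a : predictor reliability in the unrestricted population
      ryy_i : criterion reliability in the restricted sample
    Returns (operational_validity, true_validity).
    """
    # Predictor reliability in the restricted sample (error variance is
    # assumed to be the same in both populations).
    rxx_i = 1 - (1 - rxx_a) / u_x ** 2
    # Disattenuate for measurement error in predictor and criterion.
    rho_i = r_obs / math.sqrt(rxx_i * ryy_i)
    # Convert u for observed scores into u for predictor true scores.
    u_t = math.sqrt((u_x ** 2 - (1 - rxx_a)) / rxx_a)
    # Thorndike Case II formula applied on the true-score scale.
    rho_a = rho_i / math.sqrt(u_t ** 2 + rho_i ** 2 * (1 - u_t ** 2))
    # Reintroduce predictor unreliability to obtain operational validity.
    operational = rho_a * math.sqrt(rxx_a)
    return operational, rho_a


op, true_v = correct_case_iv(r_obs=0.14, u_x=0.88, rxx_a=0.77, ryy_i=0.61)
print(f"operational = {op:.2f}, true = {true_v:.2f}")
```

With r = .14 (the observed conscientiousness validity for occupational performance in Table 4), u = .88, a criterion reliability of .61, and the conscientiousness test–retest value of .77 treated as the unrestricted predictor reliability (an assumption), the sketch prints about .22 and .25. These values are close to, but not the same as, the tabled rc = .21 and ri = .24, as expected when single point values replace artifact distributions.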
correction instead of the IRR correction used here. In order to check whether the type of range restriction correction makes a large difference to the operational and true validity, we also conducted the analysis with the direct range restriction (DRR) correction. The DRR correction yielded slightly smaller validities, but the difference is so small that it does not affect the conclusions. Schmidt et al. (2008) also found that the IRR correction has only a slightly larger effect on the validity of personality dimensions than the DRR correction. Consequently, FC and SS personality inventories (the latter as represented, for instance, in the meta-analyses by Barrick & Mount, 1991; Hurtz & Donovan, 2000; and Salgado, 1997) show similar validity coefficients for conscientiousness. Based on this finding, overall validity size should not be the deciding factor when choosing one type of inventory over the other for personnel decisions.

The percentage of variance explained by artifactual errors is small for all personality dimensions, ranging from 28% for agreeableness to 36% for conscientiousness, which suggests that a search for possible moderators is appropriate. Criterion type and FC scoring type are possible moderators of validity. We report the meta-analytic results for these moderators in the following sections of the article.

With regard to academic performance, as predicted in Hypothesis 1, conscientiousness proved to be a valid and generalizable predictor. The true validity (ri) was .19, which is slightly smaller than the validity found in previous meta-analyses with SS personality inventories. For example, O’Connor and Paunonen (2007) found a coefficient of .24, Poropat (2009) found a validity of .23, Salgado (2000) found a true validity of .28, and Trapmann et al. (2007) found a true validity coefficient of .27. The results of the current meta-analysis showed that conscientiousness generalized validity across studies, although the percentage of unexplained variance is large (48%), which suggests that moderators can affect validity (e.g., university type, country, criterion measures). For example, Trapmann et al. found that the reliability of university grades differs between oral and written examinations and that the use of one type or the other depends on the specific university or even the academic subject. They also found that the validity of the Big Five for predicting grades is slightly different in Australia, Europe, and North America, and very different in East Asia. With our dataset, we cannot conduct a meta-analysis for these kinds of possible moderators. We also found that no other personality dimension proved to be a generalizable predictor of academic performance. This last result is similar to previous meta-analytic findings (O’Connor & Paunonen, 2007; Poropat, 2009; Salgado, 2000; Trapmann et al., 2007).

Meta-analysis by personality dimensions–criterion combinations

We hypothesized that conscientiousness and emotional stability are valid predictors for all criterion types, and that openness is a valid predictor for training proficiency. The results of the meta-analysis of the personality dimensions and three criterion types are shown in Table 5.

TABLE 5
Results of the meta-analyses of the Big Five personality dimensions–criteria combinations
K = number of independent samples; N = total sample size; rw = observed validity; SDr = standard deviation of observed validity; rc = operational validity (validity corrected for criterion reliability and indirect range restriction in predictor); ri = validity corrected for criterion reliability, predictor reliability, and indirect range restriction in predictor; SDri = standard deviation of ri; %VE = percentage of variance accounted for by artifactual errors; 90CV = 90% credibility value based on the operational validity (rc).

For job performance ratings, only conscientiousness showed a relevant validity of .20. This coefficient is very similar to the validity coefficients found by Barrick and Mount (1991; Barrick et al., 2001), Hurtz and Donovan (2000), and Salgado (1997). It is also similar to the coefficients found by Salgado (2003) when conscientiousness is not assessed with
measures based on the Five-Factor Model. As hypothesized, extraversion, openness to experience, and agreeableness were not generalizable predictors of job performance ratings.

With regard to training proficiency, conscientiousness showed a true validity (ri) of .16 and the 90% CV was positive. This validity estimate is lower than the coefficients found by Barrick and Mount (1991) and Salgado (1997), but larger than the value found by Hurtz and Donovan (2000). Emotional stability did not prove to be a valid predictor of training proficiency, but the coefficient found was similar to the values found by Barrick and Mount, and by Hurtz and Donovan. Openness to experience proved to be a relevant and generalizable predictor of training, with a validity of .22 (90% CV = .12), as hypothesized. This value is similar to the figures of .25, .14, and .26 found by Barrick and Mount, Hurtz and Donovan, and Salgado, respectively. Agreeableness showed a validity coefficient similar to that for conscientiousness, but the 90% CV included 0; therefore, it did not show evidence of validity generalization. Extraversion did not predict training proficiency.

The third criterion examined was productivity data (e.g., sales). The hypothesis that conscientiousness and emotional stability would predict this criterion was fulfilled only for conscientiousness, which showed a validity of .27 and also showed validity generalization, as all of the observed variability was explained by artifactual errors. Another personality dimension proved to be a predictor of productivity, although this finding was not anticipated: agreeableness showed a validity coefficient of −.31, similar to but slightly larger in magnitude than the validity for conscientiousness, and it also generalized. Only the meta-analysis by Barrick and Mount (1991) examined the validity of the Big Five for predicting productivity data. They found that conscientiousness was the only generalizable predictor of productivity data, with a coefficient of .17. Consequently, FC measures of conscientiousness and agreeableness were shown to be better predictors of productivity than their SS counterparts (based on Barrick and Mount’s findings).

Meta-analysis by personality dimensions–FC measure combinations

As mentioned in the introduction to this article, FC inventories are not a single type of measure. In fact, at least three different score types can be obtained from FC inventories (i.e., normative, quasi-ipsative, and purely ipsative), which clearly contrasts with SS inventories, which directly produce only normative scores (although these can be transformed into ipsative ones). Therefore, it is crucial to examine whether the different types of measures show similar or different validity sizes. Table 6 shows the results for the combinations of the Big Five and the three types of FC. According to Hypothesis 3, quasi-ipsative measures would show larger validity than normative FC and ipsative measures.

As can be seen in Table 6, conscientiousness was a predictor for the three types of FC measures and
showed validity generalization, as the 90% CV was positive and did not contain zero in the three cases. However, the most important finding is that quasi-ipsative measures, as hypothesized, showed much larger validity than ipsative measures (and also than normative FC measures). The true validity coefficients were .40, .16, and .16 for quasi-ipsative, normative FC, and ipsative measures, respectively. In order to check whether the difference in validity could be due to differences in job type or in study authorship (e.g., independent research vs. consultancy firms), we examined the equivalence of samples between quasi-ipsative and ipsative inventories. We found that 88.9% of samples in ipsative studies and 85.7% of samples in quasi-ipsative studies consisted of the same seven occupational categories, and similar results were obtained for study authorship. Therefore, we reject the idea that the results could be due to these two potentially confounding sources. The validity of quasi-ipsative measures of conscientiousness is 2.5 times larger than that of the normative FC and ipsative measures (.40 vs. .16). Therefore, Hypothesis 3 was fully supported by this finding. The importance of the finding can be better assessed if we take into account that no previous meta-analysis of the relationship between conscientiousness and job performance had found values larger than .30. For example, comparing our results with previous meta-analyses of the Big Five, the quasi-ipsative measures showed larger validity than the values found by Barrick and Mount (1991), Barrick et al. (2001), Hough (1992), Hurtz and Donovan (2000), O’Connor and Paunonen (2007), Poropat (2009), Salgado (1997, 2003), Tett et al. (1991), and Trapmann et al. (2007). We found no differences between the validity of normative FC and ipsative measures for conscientiousness, which is small in both cases. The results for these two types of measures are slightly lower than the values found in the previous meta-analyses of SS personality inventories mentioned earlier. No other dimension proved to be a predictor of job and academic performance.

Meta-analysis by personality dimension–criterion–FC measure combinations

The findings presented in the previous section suggest that a more finely tuned examination of the validity of the three types of FC measures could advance our knowledge of this issue. Therefore, our last meta-analysis shows the results of the triple combination of personality dimension, criterion type, and type of FC measure. Due to the limitations of our database, we were not able to conduct the analysis for all possible combinations (e.g., we have no studies for emotional stability). Table 7 shows the meta-analyses conducted for the combination of personality dimension, FC type, and GPA. Table 8 shows the meta-analyses for the combinations in which job performance ratings and training proficiency were used as criteria. Nevertheless, the findings should be taken with caution, as the number of primary studies is small for normative FC and ipsative inventories.
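The caution about small numbers of primary studies can be quantified roughly. Under the simplifying assumption that the studies are homogeneous (no true variance), the sampling error variance of a sample-size-weighted mean correlation is (1 − r̄²)² / (N − K), where N is the total sample size and K the number of samples. The helper below is hypothetical and ignores the additional uncertainty that heterogeneity across studies would add.

```python
import math


def se_mean_r(r_bar, n_total, k):
    """Approximate standard error of a sample-size-weighted mean r.

    Assumes homogeneous studies, so only sampling error contributes:
    Var(mean r) = (1 - r_bar**2)**2 / (n_total - k).
    """
    if n_total <= k:
        raise ValueError("n_total must exceed the number of samples k")
    return (1 - r_bar ** 2) / math.sqrt(n_total - k)


# Observed conscientiousness validities (rw) used as illustrative inputs.
few = se_mean_r(0.12, n_total=1100, k=3)    # normative FC-GPA, Table 7
many = se_mean_r(0.23, n_total=8794, k=44)  # quasi-ipsative, Table 6
print(f"SE with K = 3: {few:.3f}; SE with K = 44: {many:.3f}")
```

The standard error for the three-study normative FC estimate comes out roughly three times that for the 44-sample quasi-ipsative estimate, which is one concrete reason to read the smaller cells with caution.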
TABLE 6
Results of the meta-analyses of the Big Five–forced-choice type combinations
Dimension                   K      N      rw    SDr    rc     ri    SDri   %VE   90CV
Normative forced-choice
Extraversion                5    2122  −.02    .11   −.03   −.04    .18    20     .17
Openness to experience      4    2059   .05    .03    .08    .09    .00   100     .08
Agreeableness               4    2059   .04    .06    .07    .07    .07    54    −.02
Conscientiousness           8    2732   .09    .08    .14    .16    .10    50     .02
Quasi-ipsative
Emotional stability        35    6992   .11    .17    .18    .20    .26    20    −.13
Extraversion               34    7120   .06    .16    .09    .10    .24    19    −.19
Openness to experience     21    3677   .16    .16    .23    .25    .21    25    −.02
Agreeableness              24    4738   .10    .16    .15    .16    .22    22    −.11
Conscientiousness          44    8794   .23    .16    .36    .40    .21    33     .11
Pure ipsative
Emotional stability        39    8055   .03    .08    .05    .05    .05    86    −.01
Extraversion               39    8180   .07    .08    .11    .12    .06    82     .04
Openness to experience     37    7555   .06    .09    .08    .09    .09    61    −.02
Agreeableness              36    7695  −.01    .10   −.01   −.02    .12    48     .12
Conscientiousness          40    8669   .09    .09    .14    .16    .09    64     .03
K = number of independent samples; N = total sample size; rw = observed validity; SDr = standard deviation of observed validity; rc = operational validity (validity corrected for criterion reliability and indirect range restriction in predictor); ri = validity corrected for criterion reliability, predictor reliability, and indirect range restriction in predictor; SDri = standard deviation of ri; %VE = percentage of variance accounted for by artifactual errors; 90CV = 90% credibility value based on the operational validity (rc).
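The 90CV column can be approximated from the other columns. The sketch below rests on two assumptions that the table note does not state: the SD of true validities is rescaled to the operational metric by rc/ri, and the 90% point is taken on the side nearer zero (rc minus 1.28 SD for a positive mean, plus for a negative one). The note confirms only that the value is based on rc, so this is a reconstruction, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.90)  # one-tailed 90% point, about 1.2816


def credibility_value_90(rc, ri, sd_ri):
    """Approximate 90% credibility value on the operational-validity scale.

    Assumption: the SD of true validities (sd_ri) is rescaled by |rc / ri|.
    For a negative mean validity, the bound closest to zero is returned.
    """
    if ri == 0:
        raise ValueError("ri must be non-zero to rescale the SD")
    sd_rc = sd_ri * abs(rc / ri)
    return rc - math.copysign(Z90 * sd_rc, rc)


# Quasi-ipsative conscientiousness row of Table 6: rc = .36, ri = .40, SD = .21
print(round(credibility_value_90(0.36, 0.40, 0.21), 2))
```

The result, about .12, differs slightly from the reported .11, a gap that may reflect rounding of the tabled inputs. Under the same rule, the academic agreeableness row of Table 4 (rc = −.08, ri = −.09, SD = .20) gives about .15, as reported there.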
TABLE 7
Results of the meta-analyses of the Big Five–criterion–forced-choice type combinations in academic studies
                           K      N     rw   SDr     rc     ri   SDρ   %VE   90CV
Normative forced-choice–GPA
  Extraversion             4   1163   -.06   .08   -.09   -.10   .09    53    .01
  Openness to experience   3   1100    .00   .07    .01    .01   .07    55   -.08
  Agreeableness            3   1100    .06   .02    .08    .09   .00   100    .08
  Conscientiousness        3   1100    .12   .07    .16    .18   .04    83    .12
Quasi-ipsative–GPA
  Emotional stability      5   1813    .06   .07    .09    .10   .07    60    .00
  Extraversion             5   1813   -.13   .04   -.19   -.21   .00   100   -.19
  Openness to experience   4   1681    .19   .04    .28    .31   .00   100    .28
  Agreeableness            4   1681    .01   .04    .02    .02   .00   100    .02
  Conscientiousness        6   2140    .13   .08    .19    .21   .07    67    .11
Pure ipsative–GPA
  Emotional stability      9   2103    .01   .06    .02    .02   .00   100    .02
  Extraversion            11   2493   -.01   .09   -.02   -.03   .10    52    .09
K = number of independent samples; N = total sample size; rw = observed validity; SDr = standard deviation of observed validity;
rc = operational validity (validity corrected for criterion reliability and indirect range restriction in predictor); ri = validity corrected for
criterion reliability, predictor reliability, and indirect range restriction in predictor; SDρ = standard deviation of ρ (corrected validity); %VE = percentage of
variance accounted for by artifactual errors; 90CV = 90% credibility value based on the operational validity (rc).
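The rc and ri columns differ only by the correction for predictor unreliability. Assuming the standard disattenuation step ri = rc / √rxx, where rxx is the predictor's (unrestricted) reliability, the ratio of the two columns implies the reliability that was applied. This relation is an assumption about how the columns connect, not a formula given in the note, and the helper names are hypothetical. The sketch uses Table 7 rows; since both columns are rounded to two decimals, the implied reliabilities are only approximate.

```python
import math


def true_validity(rc: float, rxx: float) -> float:
    """Correct an operational validity for predictor unreliability (disattenuation)."""
    return rc / math.sqrt(rxx)


def implied_reliability(rc: float, ri: float) -> float:
    """Invert the correction: the predictor reliability implied by rc and ri."""
    return (rc / ri) ** 2


# Table 7 rows: label -> (rc, ri)
table7 = {
    "Conscientiousness, normative FC-GPA": (0.16, 0.18),
    "Conscientiousness, quasi-ipsative-GPA": (0.19, 0.21),
    "Openness, quasi-ipsative-GPA": (0.28, 0.31),
}

for label, (rc, ri) in table7.items():
    rxx = implied_reliability(rc, ri)
    # Round-trip check: correcting rc with the implied rxx should return ri.
    print(f"{label}: implied rxx = {rxx:.2f}, ri recovered = {true_validity(rc, rxx):.2f}")
```

The implied values cluster around .8, which is in the range of the FC reliability estimates discussed later in this section.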
TABLE 8
Results of the meta-analyses of the Big Five–forced-choice type–criteria combinations in occupational studies
K = number of independent samples; N = total sample size; rw = observed validity; SDr = standard deviation of observed validity;
rc = operational validity (validity corrected for criterion reliability and indirect range restriction in predictor); ri = validity corrected for
criterion reliability, predictor reliability, and indirect range restriction in predictor; SDρ = standard deviation of ρ (corrected validity); %VE = percentage of
variance accounted for by artifactual errors; 90CV = 90% credibility value based on the operational validity (rc).
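%VE is the share of the observed variance of validities (SDr squared) that artifacts can explain. Usually the largest single artifact is sampling error, whose expected variance in the Hunter and Schmidt bare-bones approach is (1 - r̄²)² / (N̄ - 1), with N̄ the average sample size per study. The sketch below computes only this sampling-error share (function names are hypothetical), so it will generally fall below the printed %VE, which also credits differences in reliability and range restriction across studies. The inputs are two quasi-ipsative–GPA rows from Table 7.

```python
def sampling_error_variance(r_bar: float, total_n: int, k: int) -> float:
    """Expected sampling-error variance of r (Hunter and Schmidt bare-bones formula)."""
    n_bar = total_n / k  # average sample size per study
    return (1 - r_bar ** 2) ** 2 / (n_bar - 1)


def percent_variance_sampling(r_bar: float, sd_r: float, total_n: int, k: int) -> float:
    """Share of observed variance explained by sampling error alone, capped at 100%."""
    share = 100 * sampling_error_variance(r_bar, total_n, k) / sd_r ** 2
    return min(share, 100.0)


# (label, K, N, rw, SDr, printed %VE for all artifacts combined)
rows = [
    ("Conscientiousness, quasi-ipsative-GPA", 6, 2140, 0.13, 0.08, 67),
    ("Openness, quasi-ipsative-GPA", 4, 1681, 0.19, 0.04, 100),
]

for label, k, n, r_bar, sd_r, printed in rows:
    pct = percent_variance_sampling(r_bar, sd_r, n, k)
    print(f"{label}: sampling error alone {pct:.0f}%, printed %VE {printed}%")
```

For the conscientiousness row, sampling error alone accounts for roughly 42% of the observed variance, against 67% once all artifacts are included; for openness, sampling error already exceeds the observed variance, which is reported as 100%.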
As can be seen in Table 7, conscientiousness predicted GPA with all three types of FC methods, and it showed validity generalization in all three cases. Quasi-ipsative inventories were slightly more valid than normative FC and ipsative measures (.21 vs. .18), although the validity is small in all three cases. The validity size of conscientiousness is similar to that found in previous meta-analyses (e.g., O’Connor & Paunonen, 2007; Poropat, 2009; Salgado, 2000). Another relevant result is that openness to experience, as assessed with quasi-ipsative questionnaires, proved to be the most important predictor of GPA, with a true validity coefficient of .31, and it also showed validity generalization (90%CV = .28). A further relevant finding is that extraversion negatively predicts GPA and generalizes its validity.

In Table 8, the results of the meta-analyses for the triple combination (personality dimension, type of FC measure, and occupational criterion type) can be observed. We used two occupational criteria: job performance ratings and training proficiency. In the case of job performance ratings, quasi-ipsative measures proved to be excellent predictors of this criterion. Conscientiousness showed a true validity of .39, which is practically identical to the validity found when the criterion type was not used as a moderator (see Table 6). The main difference between that finding and the present one is that artifactual errors now explain much more of the observed variance (60% vs. 30%) and, consequently, the 90%CV is larger now than in Table 6. Conscientiousness, as assessed by normative FC and ipsative questionnaires, showed smaller validity coefficients (.13 and .12, respectively). A relevant, but unexpected, result was found for extraversion when assessed with quasi-ipsative measures: its true validity was .30, and it also showed validity generalization (90%CV = .08). Until now, a validity coefficient of this magnitude had never been found for extraversion. Our speculation is that this validity arises because a number of primary studies dealt with occupations in which interpersonal relationships are important (e.g., sales and managerial jobs). Future research should examine this finding with a more microanalytic orientation. Limitations of the database and the objectives of this research preclude following the analysis beyond this point.

With regard to training proficiency, the limitations of our database only allow us to conduct meta-analyses for quasi-ipsative and ipsative measures. Furthermore, the number of studies and the sample sizes are small in various combinations. Therefore, the findings should be considered with caution. For quasi-ipsative measures, conscientiousness, agreeableness, emotional stability, and openness to experience showed small validities, ranging from .14 to .24. Openness to experience showed the largest coefficient in this case. For ipsative measures, conscientiousness showed a validity of .36, and all observed variance was explained by artifactual errors. However, as mentioned earlier, the meta-analysis was conducted with three studies and the total sample size is only 262 individuals. Consequently, this validity coefficient may change as the number of studies and the sample grow.

DISCUSSION

Over 20 years of meta-analyses have demonstrated that personality measures are valid predictors of academic and occupational performance. Empirical evidence has also been found that conscientiousness and emotional stability show validity generalization across criteria, samples, and occupations. The past meta-analytic evidence also showed that several variables moderated personality validity, including criterion type, inventory type (e.g., Big Five, COPS), occupation type, and complexity level. Additionally, the Five-Factor Model of personality proved to be a useful framework for classifying the various measures designed under different conceptual models (see Salgado & De Fruyt, 2005, for a general review).

Nevertheless, a number of researchers have questioned the utility of personality measures for academic and personnel selection decisions (e.g., Morgeson et al., 2007a; Murphy & Dzieweczynski, 2005). Among other things, personality measures were criticized because their validity was generally small, and because individuals can fake their responses, giving whatever answer seems most likely to get them hired. On the other hand, previous meta-analyses did not distinguish between SS and FC personality measures, and they did not estimate the validity of FC personality questionnaires. Furthermore, some critics of personality measures for making academic and occupational decisions have also suggested that the criterion domain should be expanded beyond traditional job performance ratings when the validity of personality variables is examined (e.g., K. R. Murphy, in Morgeson et al., 2007a). In line with this point of view, in the present meta-analytical effort we tried to include, as far as possible, other, less frequently used occupational criteria that reflect individual behaviours and outputs.

In connection with Hypothesis 1, our findings clearly supported the idea that conscientiousness is a valid predictor for all criteria and that it generalizes its validity. The hypothesis that emotional stability would be a predictor of all criteria was not supported, as we found that it did not predict job performance ratings, training proficiency, or productivity data. Therefore, the findings suggest the role of conscientiousness as a key variable for a theory of work and academic performance. Our results indicate that
around 16% of job performance variance is due to the effects of conscientiousness. This clearly contrasts with the roughly 5% of variance found in the meta-analyses by Barrick and Mount (1991), Hurtz and Donovan (2000), and Salgado (1997), among others (see also the reanalysis by Schmidt et al., 2008). Hypothesis 2 was also supported, as openness to experience was shown to be a predictor of training proficiency and it generalized its validity. Hypothesis 3 was also confirmed, as quasi-ipsative inventories proved to be more valid than ipsative ones.

In view of the objectives of this research, this meta-analysis makes some unique contributions. The most important contribution of this study is to show that when conscientiousness is assessed with quasi-ipsative FC questionnaires, this personality dimension is the best single personality predictor of academic and job performance. The validity size for predicting job performance is almost double that found in classic meta-analyses of personality and job performance relationships (e.g., Barrick & Mount, 1991; Hough, 1992; Salgado, 1997; see also Schmidt et al., 2008, for the results using IRR correction). Moreover, the validity of the quasi-ipsative measures of conscientiousness is slightly larger than the validity of other common predictors used for personnel and academic decisions, including assessment centres (Glauger, Rosenthal, Thornton, & Bentson, 1987), structured interviews (Huffcutt & Arthur, 1994; McDaniel, Whetzel, Schmidt, & Maurer, 1994), and situational judgement tests (McDaniel, Morgeson, Finnegan, Campion, & Braverman, 2001), and similar to or slightly larger than the validity found for integrity tests and other COPS (Ones & Viswesvaran, 2001b; Ones et al., 1993).

A second contribution was to show that other personality dimensions, such as extraversion, agreeableness, and openness to experience, were also predictors of specific criteria, with validity sizes larger than the values found for SS inventories (e.g., Barrick & Mount, 1991; Barrick et al., 2001; Hurtz & Donovan, 2000; Salgado, 1997). These two contributions suggest that quasi-ipsative questionnaires are very robust procedures for assessing personality in academic and occupational settings. Until now, no previous meta-analysis had examined this issue. It was therefore unknown that quasi-ipsative questionnaires were such excellent methods for making personnel decisions.

The comparison between normative FC and ipsative inventories suggests that the latter are similarly or slightly more valid, although the number of studies is too small to reach a firm conclusion. Comparing the results of the previous meta-analyses, mainly conducted with SS inventories, with those of the normative and ipsative FC inventories, it cannot be concluded that SS questionnaires are superior in validity.

Therefore, taking these findings as a whole, it can be said that FC measures of the Big Five, especially in the quasi-ipsative form, are useful tools for making academic and occupational decisions, because their validity is similar to or even larger than the validity of SS inventories, and because they present some singularities (e.g., the relationship between agreeableness and productivity) that can contribute to improving the validity of a composite of cognitive and personality measures.

Additionally, the findings reported in the section on methodology (see Table 1) demonstrated that the reliability coefficients of the FC inventories are similar to the coefficients found for SS inventories in previous personality validity meta-analyses (e.g., Barrick & Mount, 1991; Hough, 1992; Salgado, 1997) and in meta-analyses of personality reliability coefficients (e.g., Viswesvaran & Ones, 1999). This finding is relevant because some researchers had claimed that FC inventories showed inflated coefficients (e.g., Johnson et al., 1988; Tenopyr, 1988), whereas other researchers had suggested that their reliability coefficients could be slightly smaller than those of normative inventories (e.g., Bartram, 1996). Our results fully agree with Bartram’s perspective. The finding is also important in connection with the possible effects on the size of validity coefficients: the effects of measurement error are practically the same for SS and FC inventories.

It is also interesting to compare the validity of quasi-ipsative measures of conscientiousness with the validity of general mental ability (GMA). Different meta-analyses have shown that the observed average validity of GMA is around .22–.25 (e.g., Hartigan & Wigdor, 1989; Hunter & Hirsh, 1987; Hunter & Hunter, 1984; Salgado et al., 2003; Schmitt et al., 1984). The observed average validity of the quasi-ipsative measures of conscientiousness was .23, a value very similar to the one found for GMA. However, as Schmidt et al. (2008) showed, an important difference is that range restriction is much more severe for GMA than for personality measures. This last characteristic probably contributes to producing the larger operational validity of GMA tests. Furthermore, personality measures are usually less reliable.

Our findings as a whole, considered together with the findings by Nguyen and McDaniel (2000), according to which FC inventories are less fakable than SS questionnaires, have important implications for the conditions that FC measures should fulfil to be used in practical situations (see Hicks, 1970). First, SS inventories are more affected by faking than FC inventories. Second, the validity of FC measures of conscientiousness (especially quasi-ipsative inventories) is similar to or larger than the validity of SS inventories in some cases. Thus, it can be
concluded that FC inventories can be a good alternative to SS questionnaires for making academic and personnel decisions.

Study limitations and strengths

This research also has some limitations that should be mentioned. First, some criteria were not examined (i.e., job absenteeism or tenure) because no studies reported validity for these criteria. Second, the number of studies or the total sample size is small in some cases (see Salgado, 1998b). Consequently, the estimates reported here could change as the number of studies or the sample size grows. Third, we did not examine whether FC inventories based on the FFM have larger validity than FC inventories not based on the FFM. Previous research found that SS inventories based on the FFM showed larger criterion validities (Salgado, 2003; see also Warr, Bartram, & Brown, 2005a), and the same can be expected for FC inventories. Fourth, the current meta-analysis did not examine the effect of using FC methodology in both predictor and criteria, and some studies showed that validity is larger when both measures use FC formats (e.g., Bartram, 2005, 2007). Fifth, readers should note that a potential limitation is that personality instruments differ in design and in terms of the variables and constructs measured. An anonymous reviewer suggested that instrument design issues are possibly confounded with FC scoring formats. The present meta-analysis cannot control these issues with the current database. Future studies should examine these potential confounds, using appropriate inventories and scoring systems. As the anonymous reviewer noted, the new scoring model for the OPQ32r provides the possibility of comparing the validities of FC ipsative and FC normative scores from the same instrument and the same set of data (see the OPQ32r manual for examples of rescoring; SHL, 2009). This could be the basis for future studies.

Many of the strengths of this research have already been mentioned, but we wish to emphasize here that we used a very comprehensive database of primary studies, that this is the first meta-analysis which differentiates the various types of FC measures, and that the combination of personality dimensions, criterion type, and FC type gives important clarification on the role of personality dimensions at work.

Ipsative, quasi-ipsative, and normative scores in personality inventories

This meta-analysis has shown that FC measures predict academic and occupational performance. However, these findings do not mean that FC measures can be unreservedly used for making academic and personnel decisions, as a number of issues were not answered in the present study and future research should be conducted to address them. For example, it will be necessary to investigate whether there are any gains in using SS and FC inventories simultaneously. Additional research should also be done on the effects of faking on FC inventories, as suggested by Nguyen and McDaniel (2000). More importantly, the problem that FC measures may produce ipsative scores, and that these scores can pose difficulties for comparing individuals, is not solved by this meta-analysis. In other words, this meta-analysis does not allow us to resolve the problem of recovering normative scores from ipsative or quasi-ipsative scores. Of course, if normative information can be recovered from ipsative scores, the FC format would have an advantage over the SS format and, therefore, it would be the format of preference. However, methodological advances in this direction have been made only very recently, mainly through the use of IRT. In the last few years, two IRT approaches for creating and scoring FC inventories have been suggested, and both of them are very promising. The first approach relies on an ideal-point response process (Chernyshenko, Stark, Drasgow, & Roberts, 2007; McCloy et al., 2005; Stark et al., 2005). The second approach deals with dominance items (Brown & Bartram, 2009; Brown & Maydeu-Olivares, 2011; Maydeu-Olivares & Brown, 2010; Maydeu-Olivares & Böckenholt, 2005). Both approaches are based on the assumptions posited by Thurstone (1928) for the measurement of attitudes.

These recent IRT techniques applied to ipsative and quasi-ipsative formats can also have implications for the assessment of global dimensions rather than facets. In this regard, Brown and Maydeu-Olivares (2011) pointed out that “given the same number of traits, the lower the average correlation between them the better the true scores are recovered”. This may have an important consequence for the assessment of the Big Five and their facets using ipsative measures. At the construct level, the Big Five are orthogonal or relatively independent of each other. Therefore, the condition pointed out by Brown and Maydeu-Olivares can be fulfilled with measures at the construct level. However, by definition and empirically, the facets of each personality dimension are largely correlated with one another as a consequence of the latent variable (i.e., the Big Five). Consequently, ipsative measures may be less appropriate for recovering the true scores of the facets.

Future research should also be devoted to the important issue of whether SS, quasi-ipsative, and ipsative measures of the same factor are really assessing the same content. The question of whether
FC and SS personality measures assess different aspects or elements of the same personality construct, or essentially assess the same elements, is an issue that is not examined in this meta-analysis. Consequently, based on our results, we cannot draw firm conclusions about this question. Previous research did not answer it either. Therefore, this is an open matter, relevant from both the theoretical and the practical points of view. From the theoretical point of view, if SS and FC measures cover different aspects of a specific personality dimension, then it makes sense to combine the two types of measures in order to obtain a better assessment of personality dimensions. On the other hand, from a practical point of view, if FC and SS measures are correlated

conscientiousness are more valid than any other type of measure, including SS, normative FC, and ipsative ones. The validity size of quasi-ipsative measures of conscientiousness is almost twice that of SS inventories, and equal to or larger than the validity of assessment centres, structured interviews, situational judgement tests, and personality composites (i.e., integrity, COPS). The results of the reliability estimates indicated that FC measures neither inflate nor deflate reliability coefficients. Therefore, as a whole, quasi-ipsative FC measures can be seen as useful tools for making academic and personnel decisions, and as a robust alternative to SS inventories because they are more resistant to faking.
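Several comparisons in this discussion (about 16% versus 5% of job performance variance; a validity almost twice that of SS inventories) rest on squaring validity coefficients. The sketch below shows that arithmetic under stated assumptions: it uses the quasi-ipsative conscientiousness true validity of .39 for job performance ratings, and it stands in for the classic SS estimates with the validity implied by “around 5% of variance” (√.05 ≈ .22), not with a value taken from the tables. Variable names are illustrative.

```python
import math


def variance_explained(validity: float) -> float:
    """Proportion of criterion variance accounted for by a predictor (squared validity)."""
    return validity ** 2


# Quasi-ipsative conscientiousness, job performance ratings (true validity)
rho_quasi_ipsative_c = 0.39
# Validity implied by "around 5% of variance" in classic SS meta-analyses (about .22)
rho_ss_classic = math.sqrt(0.05)

print(f"quasi-ipsative C: {variance_explained(rho_quasi_ipsative_c):.1%} of variance")
print(f"classic SS estimate: {variance_explained(rho_ss_classic):.1%} of variance")
print(f"validity ratio: {rho_quasi_ipsative_c / rho_ss_classic:.2f}")
```

Squaring .39 gives about 15%, close to the “around 16%” quoted above. The validity ratio of roughly 1.7 corresponds to the “almost double” wording, while the ratio of explained variances is about three to one.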
Bartram, D. (1996). The relationship between ipsatized and normative measures of personality. Journal of Occupational and Organizational Psychology, 69, 25–39.
*Bartram, D. (2005). The great eight competencies: A criterion-centric approach to validation. Journal of Applied Psychology, 90, 1185–1203. doi:10.1037/0021-9010.90.6.1185
*Bartram, D. (2007). Increasing validity with forced-choice criterion measurement formats. International Journal of Selection and Assessment, 15, 263–272.
Belbin, M. (1981). Management teams: Why they succeed or fail. London, UK: Heinemann.
Bennett, M. (1977). Testing management theories cross-culturally. Journal of Applied Psychology, 62, 578–581.
*Bhatnagar, R. P. (1969). A study of some EPPS variables as factors of academic achievement. Journal of Applied Psychology, 53, 107–111.
Brogden, H. E. (1954). A rationale for minimizing distortion in personality questionnaire keys. Psychometrika, 19, 151–148.
*Brown, A., & Bartram, D. (2009, April). Doing less but getting more: Improving forced-choice measures with IRT. Paper presented at the 24th annual conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.
Brown, A., & Maydeu-Olivares, A. (2011). Item response modeling of forced-choice questionnaires. Educational and Psychological Measurement, 71, 460–502. doi:10.1177/0013164410375112
*Buel, W. D., & Bachner, V. M. (1961). The assessment of creativity in a research setting. Journal of Applied Psychology, 45, 353–358.
Carretta, T., & Ree, M. J. (2000). General and specific cognitive and psychomotor abilities in personnel selection: The prediction of training and job performance. International Journal of Selection and Assessment, 8, 227–236.
Carreta, T., & Ree, M. J. (2001). Pitfalls of ability research. International Journal of Selection and Assessment, 9, 325–335.
Cattell, R. B. (1944). Psychological measurement: Ipsative, normative and interactive. Psychological Review, 51, 292–303.
Cattell, R. B., & Brennan, J. (1994). Finding personality structure when ipsative measurements are the unavoidable basis of the variables. American Journal of Psychology, 107, 261–274.
Chan, W., & Bentler, P. M. (1993). The covariance structure analysis of ipsatives. Sociological Methods and Research, 22, 214–247.
Chapman, D. W., Blackburn, R. W., Austin, A. E., & Hutcheson, S. M. (1983). Expanding analytic possibilities of Rokeach values data. Educational and Psychological Measurement, 43, 419–421. doi:10.1177/001316448304300211
Chernyshenko, O., Stark, S., Drasgow, F., & Roberts, B. W. (2007). Constructing personality scales under the assumption of an ideal point response process: Towards increasing the flexibility of personality measures. Psychological Assessment, 19, 88–106.
Cheung, M. W.-L., & Chan, W. (2002). Reducing uniform response bias with ipsative measurement in multiple-group confirmatory factor analysis. Structural Equation Modeling, 9, 55–77.
*Christiansen, N. D., Burns, G. N., & Montgomery, G. E. (2005). Reconsidering forced-choice item formats for applicant personality assessment. Human Performance, 18, 267–307.
Clarke, S., & Robertson, I. T. (2005). A meta-analytic review of the Big Five personality factors and accidents involvement in occupational and non-occupational settings. Journal of Occupational and Organizational Psychology, 78, 355–36.
Clemans, W. V. (1966). An analytical and empirical examination of some properties of ipsative measures. Psychometric Monographs, 14, 1–56.
*Clevenger, J., Pereira, G. M., Wiechman, D., Schmitt, N., & Schmidt-Harvey, V. (2001). Incremental validity of situational judgment tests. Journal of Applied Psychology, 86, 410–417. doi:10.1037/0021-9010.86.3.410
*Cocolas, G. H., Sleath, B., & Hanson-Drivers, C. (1997). Use of the Gordon Personal Profile-Inventory of pharmacists and pharmacy students. American Journal of Pharmaceutical Education, 61, 257–265.
*Converse, P. D., Oswald, F. L., Imus, A., Hedricks, C., Roy, R., & Butera, H. (2008). Comparing personality tests and warnings: Effects on criterion-related validity and test-taker reactions. International Journal of Selection and Assessment, 16, 155–169.
*Conway, J. M. (2000). Managerial performance development constructs and personality correlates. Human Performance, 13, 23–46.
Cornwell, J. M., & Dunlap, W. P. (1994). On the questionable soundness of factoring ipsative data: A response to Saville and Wilson (1991). Journal of Occupational and Organizational Psychology, 67, 89–100.
Dalal, R. S. (2005). A meta-analysis of the relationship between organizational citizenship behavior and counterproductive work behavior. Journal of Applied Psychology, 90, 1241–1255. doi:10.1037/0021-9010.90.6.1241
*Davis, K. R., & Banken, J. A. (2005). Personality type and clinical evaluations in an obstetrics/gynecology medical student clerkship. American Journal of Obstetrics and Gynecology, 193, 1807–1810.
*Dodd, W. E., Wollowick, H. B., & McNamara, W. J. (1970). Task difficulty as a moderator of long-range prediction. Journal of Applied Psychology, 34, 265–270.
Dudley, N. M., Orvis, K. A., Liebicki, J. E., & Cortina, J. M. (2006). A meta-analytic investigation of conscientiousness in the prediction of job performance: Examining the intercorrelations and the incremental validity of narrow traits. Journal of Applied Psychology, 91, 40–57. doi:10.1037/0021-9010.91.1.40
Dunlap, W. P., & Cornwell, J. M. (1994). Factor analysis of ipsative measures. Multivariate Behavioral Research, 29, 115–126.
Edwards, A. L. (1957). The social desirability variable in personality assessment and research. New York, NY: Dryden.
Edwards, A. L. (1973). Edwards personal preference schedule manual. New York, NY: Psychological Corporation.
Feather, N. T. (1973). The measurement of values: Effects of different assessment procedures. Australian Journal of Psychology, 25, 221–231.
*Fine, S., & Dover, S. (2005). Cognitive ability, personality, and low fidelity simulation measures in predicting training performance among customer service representatives. Applied HRM Research, 10, 103–106.
*Fineman, S. (1975). The Work Preference Questionnaire: A measure of managerial need for achievement. Journal of Occupational Psychology, 48, 11–32.
*Francis-Smythe, J., Tinline, G., & Allender, C. (2002). Identifying high potential police officers and role characteristics. Paper presented at the Division of Occupational Psychology conference, British Psychological Society, Bournemouth, UK.
French, J., & Raven, B. H. (1959). The basis of social power. In D. Cartwright (Ed.), Studies in social power (pp. 150–167). Ann Arbor, MI: University of Michigan, Institute for Social Research.
*Furnham, A. (1994). The validity of the SHL Customer Service Questionnaire (CSQ). International Journal of Selection and Assessment, 2, 157–165.
*Furnham, A., & Stringfield, P. (1993). Personality and work performance: Myers-Briggs Type Indicator correlates of managerial performance in two cultures. Personality and Individual Differences, 14, 145–153.
*Gallessich, J. (1970). An investigation of correlates of academic success of freshmen engineering students. Journal of Counseling Psychology, 17, 173–178.
*García-Izquierdo, A. L. (2001). Validación orientada al criterio de procedimientos de selección de personal de oficio para el pronóstico del rendimiento laboral y formativo en el sector de la construcción [Criterion-oriented validation of selection procedures for trade personnel for predicting job and training performance in the construction sector]. Unpublished doctoral dissertation, University of Murcia, Murcia, Spain.
*Gebhart, C. G., & Hoyt, D. T. (1958). Personality needs of under- and over-achieving freshman. Journal of Applied Psychology, 42, 125–128.
Ghiselli, E. E. (1954). The forced-choice technique in self-description. Personnel Psychology, 7, 201–208. doi:10.1111/j.1744-6570.1954.tb01593.x
Glauger, B. B., Rosenthal, D. B., Thornton, G. C., & Bentson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72, 493–511. doi:10.1037/0021-9010.72.3.493
Gleser, L. J. (1972). On bounds for the average correlation between subtest scores in ipsatively scored tests. Educational and Psychological Measurement, 32, 759–765.
*Goffin, R. D., Jan, I., & Skinner, E. (2011). Forced-choice and conventional personality assessment: Each may have unique value in pre-employment testing. Personality and Individual Differences, 5, 840–844.
*Goodstein, L. D., & Heilbrum, A. B. (1962). Prediction of college achievement from the Edwards Personal Preference Schedule at three levels of intellectual ability. Journal of Applied Psychology, 46, 317–320.
*Gordon, L. V. (1993). Gordon Personal Profile-Inventory: Manual 1993 revision. San Antonio, TX: Pearson-TalentLens.
*Graham, W. K., & Calendo, J. T. (1969). Personality correlates of supervisory ratings. Personnel Psychology, 22, 483–487.
*Grimsley, G., & Jarret, H. F. (1973). The relation of past managerial achievement to test measures obtained in the employment situation: Methodology and results. Personnel Psychology, 26, 31–48.
*Guller, M. (2003). Predicting performance of law enforcement personnel using the candidate and officer personnel survey and other psychological measures. Unpublished doctoral dissertation, Seton Hall University, South Orange, NJ.
*Hakel, M. (1966). Prediction of college achievement from the Edwards Personal Preference Schedule using intellectual ability as a moderator. Journal of Applied Psychology, 50, 336–340.
Hartigan, J. A., & Wigdor, A. K. (1989). Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery. Washington, DC: National Academies Press.
Hayes, W. O. (1967). Quantification in psychology. Belmont, CA: Brooks/Cole.
*Helton, K. T., & Street, D. R. (1992). The five-factor model and naval aviation candidates. Naval Aerospace Medical Research Laboratory, Naval Air Station, Pensacola, FL.
Heggestad, E. D., Morrison, M., Reeve, C. L., & McCloy, R. A. (2006). Forced-choice assessments of personality for selection: Evaluating issues of normative assessment and faking resistance. Journal of Applied Psychology, 91, 9–24.
Hershcovis, M. S., Turner, N., Barling, J., Arnold, K. A., Dupré, K. E., Inness, M., et al. (2007). Predicting workplace aggression: A meta-analysis. Journal of Applied Psychology, 92, 228–238. doi:10.1037/0021-9010.92.2.228
Hicks, L. E. (1970). Some properties of ipsative, normative, and forced-choice normative measures. Psychological Bulletin, 74, 167–184.
*Hirsh, J. B., & Peterson, J. B. (2008). Predicting creativity and academic success with “fake-proof” measure of the Big Five. Journal of Research in Personality, 42, 1323–1333. doi:10.1016/j.jrp.2008.04.006
Hogan, J., Barrett, P., & Hogan, R. (2007). Personality measurement, faking, and employment selection. Journal of Applied Psychology, 92, 1270–1285.
Hogan, J., & Brinkmeyer, K. (1997). Bridging the gap between
Hogan, J., & Holland, B. (2003). Using theory to evaluate personality and job performance relations: A socioanalytic perspective. Journal of Applied Psychology, 88, 100–112. doi:10.1037/0021-9010.88.1.100
Hogan, R. T. (2001). Personality and industrial and organizational psychology. In B. W. Roberts & R. Hogan (Eds.), Personality psychology in the workplace (pp. 3–16). Washington, DC: American Psychological Association.
Hogan, R. (2005a). Comments. Human Performance, 16, 405–407. doi:10.1207/s15327043hup1804_6
Hogan, R. (2005b). In defense of personality measurement: New wine for old whiners. Human Performance, 18, 331–341. doi:10.1207/s15327043hup1804_1
Hogan, R., & Hogan, J. (2001). Assessing leadership: A view of the dark side. International Journal of Selection and Assessment, 9, 40–51.
Horn, J. L. (1971). Motivation and dynamic calculus concepts from multivariate experiment. In R. B. Cattell (Ed.), Handbook of multivariate experimental psychology (2nd printing, pp. 611–641). Chicago, IL: Rand McNally.
Hough, L. M. (1992). The “Big Five” personality variables-construct confusion: Description versus prediction. Human Performance, 5, 139–155.
Hough, L. M., & Ones, D. S. (2001). The structure, measurement, validity, and use of personality variables in industrial, work and organizational psychology. In N. R. Anderson, D. S. Ones, H. K. Sinangil, & C. Viswesvaran (Eds.), Handbook of industrial, work, and organizational psychology: Vol. 1. Personnel psychology (pp. 233–276). London, UK: Sage.
Hough, L. M., & Oswald, F. L. (2005). They’re right, well... mostly right: Research evidence and an agenda to rescue personality testing from 1960s insights. Human Performance, 18(4), 373–387. doi:10.1207/S15327043hup1804_4
Huffcutt, A. I., & Arthur, W. (1994). Hunter and Hunter (1984) revisited: Interview validity for entry-level jobs. Journal of Applied Psychology, 79(2), 184–190. doi:10.1037/0021-9010.79.2.184
*Hughes, G. L., & Prien, E. P. (1986). An evaluation of alternate scoring methods for the mixed standard. Personnel Psychology, 39, 839–847.
*Hughes, J. L., & Dood, W. E. (1961). Validity versus stereotype: Predicting sales performance by ipsative scoring of a personality test. Personnel Psychology, 15, 343–355.
Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Intelligence, 29, 340–362.
Hunter, J. E., & Hirsh, H. R. (1987). Applications of meta-analysis. International review of industrial and organizational psychology (Vol. 2, pp. 321–357). Chichester, UK: Wiley.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72–98.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Newbury Park, CA: Sage.
Hunter, J. E., Schmidt, F. L., & Le, H. (2006). Implications of direct and indirect range restriction for meta-analysis methods and findings. Journal of Applied Psychology, 91, 594–612. doi:10.1037/0021-9010.91.3.594
Hurtz, G. M., & Donovan, J. J. (2000). Personality and job performance: The big five revisited. Journal of Applied Psychology, 85, 869–879. doi:10.1037/0021-9010.85.6.869
*Illiescu, D., Ilie, A., & Aspas, D. G. (2011). Examining the criterion-related validity of the Employee Screening Question-
overt and personality-based integrity tests. Personnel Psychol- naire: A three-sample investigation. International Journal of
ogy, 50, 587–599. doi:10.1111/j.1744-6570.1997.tb00704.x Selection and Assessment, 19, 222–228.
*Izard, C. E. (1962). Personality characteristics (EPPS), level of expectation, and performance. Journal of Consulting Psychology, 26, 394.
Jackson, D. J., & Alwin, D. F. (1980). The factor analysis of ipsative measures. Sociological Methods and Research, 9, 218–238.
Jackson, D. N. (2002). Employee screening questionnaire manual. Port Huron, MI: Sigma Assessment Systems, ESQ.
*Jackson, D. N., Wroblewski, V. R., & Ashton, M. C. (2000). The impact of faking on employment tests: Does forced choice offer a solution? Human Performance, 13, 371–388.
Johnson, C. E., Wood, R., & Blinkhorn, S. F. (1988). Spuriouser and spuriouser: The use of ipsative personality tests. Journal of Occupational Psychology, 61, 153–162.
Johnson, J. A., & Hogan, R. T. (2006). A socioanalytic view of faking. In R. L. Griffith & M. H. Peterson (Eds.), A closer examination of applicant faking behavior (pp. 209–231). Charlotte, NC: IAP.
Judge, T., & Bono, J. E. (2001). Relations of core self-evaluations traits – self-esteem, generalized self-efficacy, locus of control, and emotional stability – with job satisfaction and job performance: A meta-analysis. Journal of Applied Psychology, 86, 80–92. doi:10.1037/0021-9010.86.1.80
Judge, T., Bono, J. E., Ilies, R., & Gerhardt, M. W. (2002). Personality and leadership: A qualitative and quantitative review. Journal of Applied Psychology, 87, 765–780. doi:10.1037/0021-9010.87.6.765
Judge, T. A., Bono, J. E., & Locke, E. A. (2000). Personality and job satisfaction: The mediating role of job characteristics. Journal of Applied Psychology, 85, 237–249. doi:10.1037/0021-9010.85.2.237
Judge, T., Heller, D., & Mount, M. K. (2002). Five-factor model of personality and job satisfaction: A meta-analysis. Journal of Applied Psychology, 87, 530–541. doi:10.1037/0021-9010.87.3.530
Judge, T. A., Jackson, C. L., Shaw, J. C., Scott, B. A., & Rich, B. L. (2007). Self-efficacy and work-related performance: The integral roles of individual differences. Journal of Applied Psychology, 92, 107–127. doi:10.1037/0021-9010.92.1.107
*Kahn, J. H., Nauta, M. M., Gailbreath, R. D., Tipps, J., & Chartrand, J. M. (2002). The utility of career and personality assessment in predicting academic progress. Journal of Career Assessment, 10, 3–23. doi:10.1177/1069072702010001001
Kaplan, S., Bradley, J. C., Luschman, J. N., & Haynes, D. (2010). On the role of positive and negative affectivity in job performance: A meta-analytic investigation. Journal of Applied Psychology, 94, 162–176. doi:10.1037/0021-9010.94.1.162
*Kazmier, L. J. (1961). Cross-validation groups, extreme groups, and the prediction of academic achievement. Journal of Educational Psychology, 52, 195–198.
King Hunter, J. E., & Schmidt, F. L. (1980). Halo in a multidimensional forced-choice performance evaluation scale. Journal of Applied Psychology, 65, 507–516.
Knapp, D., Heggestad, E., & Young, M. (2004). Understanding and improving the Assessment of Individual Motivation (AIM) in the Army’s GED Plus program (Study Note 2004-03). Alexandria, VA: US Army Research Institute for the Behavioral and Social Sciences.
*Knauft, E. B. (1955). Test validity over seventeen-year period. Journal of Applied Psychology, 39, 382–383.
Kolb, D. A. (1985). Learning style inventory. Boston, MA: McBer & Co.
*Kriedt, P. H., & Dawson, R. I. (1961). Response set and the prediction of clerical job performance. Journal of Applied Psychology, 45, 175–178.
*Krug, R. D. (1959). Over- and under-achievement and the Edwards Personal Preference Schedule. Journal of Applied Psychology, 43, 133–136.
Kuder, F. (1975). Manual: Kuder E general interest survey. Chicago, IL: Science Research Associates.
*Kusch, R. I., Deller, J., & Albrecht, A. G. (2008, July). Predicting expatriate job performance: Using the normative NEO-PI-R or the ipsative OPQ32i? Paper presented at the 29th International Congress of Psychology, Berlin, Germany.
Law, K. S., Schmidt, F. L., & Hunter, J. E. (1994a). Nonlinearity of range corrections in meta-analysis: Test of an improved procedure. Journal of Applied Psychology, 79, 425–438.
Law, K. S., Schmidt, F. L., & Hunter, J. E. (1994b). A test of two refinements in procedures for meta-analysis. Journal of Applied Psychology, 79, 978–986.
Le, H., & Schmidt, F. L. (2006). Correcting for indirect range restriction in meta-analysis: Testing a new analytic procedure. Psychological Methods, 11, 416–438.
*Lievens, F., Harris, M. M., Keer, E. V., & Bisqueret, C. (2003). Predicting cross-cultural training performance: The validity of personality, cognitive ability, and dimensions measured by an assessment center and a behavior description interview. Journal of Applied Psychology, 88, 476–489. doi:10.1037/0021-9010.88.3.476
*Lunneborg, P. W. (1970). EPPS patterns and academic achievement in counseling clients. Educational and Psychological Measurement, 30, 393–398. doi:10.1177/0013164470030000223
*Maher, H. (1959). Follow-up on the validity of a forced-choice study activity questionnaire in another setting. Journal of Applied Psychology, 43, 293–295.
Matthews, G., Stanton, N., Graham, N. C., & Brimelow, C. (1990). A factor analysis of the scales of the occupational personality questionnaire. Personality and Individual Differences, 11(6), 591–596. doi:10.1016/0191-8869(90)90042-P
Maydeu-Olivares, A., & Brown, A. (2010). Item response modeling of paired comparison and ranking data. Multivariate Behavioral Research, 45, 935–974. doi:10.1080/00273171.2010.531231
Maydeu-Olivares, A., & Böckenholt, U. (2005). Structural equation modeling of paired comparison and ranking data. Psychological Methods, 10, 285–304.
McCloy, R., Heggestad, E., & Reeve, C. (2005). A silk purse from the sow’s ear: Retrieving normative information from multidimensional forced-choice items. Organizational Research Methods, 8, 222–248.
McCrae, R. R., & Costa, P. T. (1990). Personality in adulthood. New York, NY: Guilford Press.
McDaniel, M. A., Morgeson, F. P., Finnegan, E. B., Campion, M. A., & Braverman, E. P. (2001). Predicting job performance using situational judgment tests: A clarification of the literature. Journal of Applied Psychology, 86, 730–740.
McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). The validity of employment interviews: A comprehensive review and meta-analysis. Journal of Applied Psychology, 79(4), 599–616.
*McDaniel, M. A., Yost, A. P., Ludwick, M. H., Hense, R. L., & Hartman, N. S. (2004, April). Incremental validity of a situational judgment test. Paper presented at the 19th annual conference of the Society for Industrial and Organizational Psychology, Chicago, IL.
Meade, A. W. (2004). Psychometric problems and issues involved with creating and using ipsative measures for selection. Journal of Occupational and Organizational Psychology, 77, 531–552.
Meglino, B. M., & Ravlin, E. C. (1998). Individual values in organizations: Concepts, controversies, and research. Journal of Management, 24, 351–389.
*Morgan, R. R. (1975). Prediction of college achievement using the need achievement scale from the Edwards Personal Preference Schedule. Educational and Psychological Measurement, 35, 387–392. doi:10.1177/001316447503500217
*Morgan, R. R. (1976). Utilization of levels of intellectual ability as a control variable in studies of non-intellectual factors in academic achievement. Educational and Psychological Measurement, 36, 465–472. doi:10.1177/001316447603600229
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007a). Reconsidering the use of personality tests in personnel contexts. Personnel Psychology, 60, 683–729.
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007b). Are we getting fooled again? Coming to terms with limitations in the use of personality tests for personnel selection. Personnel Psychology, 60, 1029–1049.
Moscoso, S., & Salgado, J. F. (2004). “Dark side” personality styles as predictors of task, contextual, and job performance. International Journal of Selection and Assessment, 12, 356–362.
Mount, M. K., & Barrick, M. R. (1995). The Big Five personality dimensions: Implications for research and practice in human resources management. Research in Personnel and Human Resources Management, 13, 153–200.
*Muchinsky, P. M., & Hoyt, D. P. (1973). Predicting college grades of engineering graduates from selected personality and aptitude variables. Educational and Psychological Measurement, 33, 935–937. doi:10.1177/001316447303300425
*Mukherjee, R. N. (1968). Achievement values and scientific productivity. Journal of Applied Psychology, 52, 145–147.
Murphy, K. R., & De Shon, R. (2000). Inter-rater correlations do not estimate the reliability of job performance ratings. Personnel Psychology, 53, 873–900.
Murphy, K. R., & Dzieweczynski, J. L. (2005). Why don’t measures of broad dimensions of personality perform better as predictors of job performance? Human Performance, 18, 343–357.
Myers, I. B., McCaulley, M. H., Quenk, N., & Hammer, A. (1998). MBTI handbook: A guide to the development and use of the Myers-Briggs Type Indicator (3rd ed.). Palo Alto, CA: Consulting Psychologists Press.
*Nelson, C. A. (2008). Job type as a moderator of the relationship between situational judgment and personality. Unpublished doctoral dissertation, Capella University, Minneapolis, MN.
*Neuman, G. A. (1991). Autonomous work group selection. Journal of Business and Psychology, 6, 283–291.
*Neuman, G. A., & Kickul, J. R. (1998). Organizational citizenship behaviors: Achievement orientation and personality. Journal of Business and Psychology, 13, 263–279.
Ng, T. W. H., Eby, L. T., Sorensen, K. L., & Feldman, D. C. (2005). Predictors of objective and subjective career success: A meta-analysis. Personnel Psychology, 58, 367–408.
Ng, T. W. H., Sorensen, K. L., & Eby, L. T. (2006). Locus of control at work: A meta-analysis. Journal of Organizational Behavior, 27, 1057–1087.
Nguyen, N. T., & McDaniel, M. A. (2000, April). Faking and forced-choice scales in applicant screening: A meta-analysis. Paper presented at the 15th annual conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.
Norman, W. T. (1964). Personality measurement, faking, and detection: An assessment method for use in personnel selection. Journal of Applied Psychology, 47, 225–241.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York, NY: McGraw-Hill.
*Nyfield, G., Gibbons, P. J., Baron, H., & Robertson, I. (1995). The cross-cultural validity of management assessment methods. Surrey, UK: Saville & Holdsworth.
O’Connor, M. C., & Paunonen, S. V. (2007). Big Five personality predictors of post-secondary academic performance. Personality and Individual Differences, 43, 971–990. doi:10.1016/j-paid.2007.3.017
Oh, I.-S., Schmidt, F. L., Mount, M. K., Le, H., Guay, R. P., Takahashi, K., et al. (2011, April). The five-factor model of personality and performance in East Asia. Paper presented at the annual conference of the Society for Industrial and Organizational Psychology, Chicago, IL.
*Olson, D. A., Shultz, K. S., & Scott, G. (2000, April). The association between personality preferences and behavioral ratings for physician leaders. Paper presented at the annual conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.
Ones, D. S. (1993). The construct of integrity tests. Unpublished doctoral dissertation, University of Iowa, Iowa City, IA.
Ones, D. S., Dilchert, S., Viswesvaran, C., & Judge, T. (2007). In support of personality assessment in organizational settings. Personnel Psychology, 60, 995–1027. doi:10.1111/j.1744-6570
Ones, D. S., & Viswesvaran, C. (1999). Job-specific applicant pools and national norms for personality scales: Implications for range restriction correction in validation research. Paper presented at the 14th annual conference of the SIOP, Atlanta, GA.
Ones, D. S., & Viswesvaran, C. (2001a). Integrity tests and other criterion-focused occupational personality scales (COPS) used in personnel selection. International Journal of Selection and Assessment, 9, 31–39.
Ones, D. S., & Viswesvaran, C. (2001b). Personality at work: Criterion-focused occupational personality scales (COPS) used in personnel selection. In B. Roberts & R. T. Hogan (Eds.), Applied personality psychology (pp. 63–92). Washington, DC: American Psychological Association.
Ones, D. S., Viswesvaran, C., & Dilchert, S. (2005). Personality at work: Raising awareness and correcting misconceptions. Human Performance, 18, 389–404. doi:10.1207/S15327043hup1804_5
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance. Journal of Applied Psychology, 78, 679–703.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (2003). Personality and absenteeism: A meta-analysis of integrity tests. European Journal of Personality, 17, 19–38.
*Perkins, A. M., & Corr, P. J. (2005). Can worriers be winners? The association between worrying and job performance. Personality and Individual Differences, 38, 25–31.
Piedmont, R. L., Costa, P. T., & McCrae, R. R. (1992). An assessment of the Edwards Personal Preference Schedule from the perspective of the Five-Factor Model. Journal of Personality Assessment, 58, 67–78.
Poropat, A. E. (2009). A meta-analysis of the Five-Factor Model of personality and academic performance. Psychological Bulletin, 135, 322–338. doi:10.1037/a0014996
Radcliffe, J. A. (1963). Some properties of ipsative score matrices and their relevance for some current interest tests. Australian Journal of Psychology, 15, 1–11.
*Robertson, I., Gibbons, P., Baron, H., MacIver, R., & Nyfield, G. (1999). Understanding management performance. British Journal of Management, 10, 5–12.
*Robertson, I. T., Baron, H., Gibbons, P., MacIver, R., & Nyfield, G. (2000). Conscientiousness and managerial performance. Journal of Occupational and Organizational Psychology, 73, 171–180.
Rolland, J. P., & De Fruyt, F. (2003). The validity of FFM personality dimensions and maladaptive traits to predict negative affects at work: A six month prospective study in a military sample. European Journal of Personality, 17, 101–121.
*Rolland, J. P., & Mogenet, J. L. (2001). Système de description en cinq dimensions (D5D). Manuel réservé aux psychologues [Five-dimension description system (D5D). Manual reserved for psychologists]. Paris, France: Les Editions du Centre de Psychologie Appliquée.
*Rothe, H. F. (1946). Output rates among butter wrappers. Journal of Applied Psychology, 30, 320–327.
*Rothe, H. F. (1947). Output rates among machine operators. Journal of Applied Psychology, 31, 484–489.
*Rothe, H. F. (1951). Output rates among chocolate dippers. Journal of Applied Psychology, 35, 94–97.
*Rothe, H., & Nye, C. T. (1958). Output rates among coil winders. Journal of Applied Psychology, 42, 182–186.
Rothmann, S., Meiring, D., Van der Walt, H. S., & Barrick, M. R. (2002). Predicting job performance using personality measures in South Africa. Paper presented at the 7th Annual Conference of the Society for Industrial and Organizational Psychology, Toronto, Canada.
*Rose, R. M., Fogg, L. F., Helmreich, R. L., & McFadden, T. J. (1994). Psychological predictors of astronaut effectiveness. Aviation, Space, and Environmental Medicine, 65, 910–915.
*Rozelle, R. (1968). The relationship between absenteeism and grades. Educational and Psychological Measurement, 28, 1151–
Salgado, J. F., & De Fruyt, F. (2005). Personality in personnel selection. In A. Evers, O. Schmit-Voskuyl, & N. Anderson (Eds.), Handbook of personnel selection (pp. 174–197). Oxford, UK: Blackwell.
Salgado, J. F., & Moscoso, S. (1996). Meta-analysis of the interrater reliability of job performance ratings in validity studies of personnel selection. Perceptual and Motor Skills, 83, 1195–1201.
Salgado, J. F., & Moscoso, S. (2000). Autoeficacia y criterios organizacionales de desempeño [Self-efficacy and organizational criteria of performance]. Apuntes de Psicología, 18, 179–191.
Saville, P., Holdsworth, R., Nyfield, G., Cramp, L., & Mabey, W. (1984). The Occupational Personality Questionnaires (OPQ). London, UK: Saville & Holdsworth.
*Saville, P., Sik, G., Nyfield, G., Hackston, J., & MacIver, R. (1996). A demonstration of the validity of the Occupational Personality Questionnaire (OPQ) in the measurement of job
*Small, R. J., & Rosenberg, L. J. (1977). Determining job performance in the industrial sales force. Industrial Marketing Management, 6, 99–102.
*Sommerfeld, D. (1997). Maintenance worker selection: High validity and low adverse impact. Michigan Municipal League. Ann Arbor, MI: Employment Testing Consortium Project.
Stark, S., Chernyshenko, O. S., & Drasgow, F. (2005). An IRT approach to constructing and scoring pairwise preference items involving stimuli on different dimensions: The multi-unidimensional pairwise-preference model. Applied Psychological Measurement, 29, 184–203.
Stark, S., Chernyshenko, O., Drasgow, F., & Williams, B. (2006). Examining assumptions about item responding in personality assessment: Should ideal point methods be considered for scale development and scoring? Journal of Applied Psychology, 91, 25–39. doi:10.1037/0021-9010.91.1.25
*Striker, L. J., Schiffman, H., & Ross, J. (1965). Prediction of college performance with the Myers-Briggs Type Indicator. Educational and Psychological Measurement, 25, 1081–1095.
Tenopyr, M. (1988). Artifactual reliability of forced-choice scales. Journal of Applied Psychology, 73, 749–751.
Tett, R., Rothstein, M. G., & Jackson, D. J. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44, 703–742.
Tett, R. P., & Christiansen, N. D. (2007). Personality tests at the crossroads: A response to Morgeson, Campion, Dipboye, Hollenbeck, Murphy, and Schmitt (2007). Personnel Psychology, 60, 967–993.
Tett, R. P., Christiansen, N. D., Robie, C., & Simonet, D. V. (2011, May). International survey of personality test use: An American baseline. Paper presented at the 15th conference of the European Association of Work and Organizational Psychology, Maastricht, The Netherlands.
Thomas, K. W., & Kilmann, R. H. (1974). Thomas-Kilmann Conflict Mode Instrument. Mountain View, CA: Xicom.
Thompson, B., Levitov, J. E., & Miederhoff, P. A. (1982). Validity of the Rokeach Value Survey. Educational and Psychological Measurement, 42, 899–905.
Thurstone, L. L. (1928). Attitudes can be measured. American Journal of Sociology, 33, 529–554.
Trapmann, S., Hell, B., Hirn, J.-O. W., & Schuler, H. (2007). Meta-analysis of the relationship between the Big Five and academic success at university. Zeitschrift für Psychologie, 215, 132–151.
Travers, R. M. W. (1951). A critical review of the validity and rationale of forced-choice technique. Psychological Bulletin, 48, 62–70.
Van der Walt, H. S., Meiring, D., Rothmann, S., & Barrick, M. R. (2002, June). Meta-analysis of the relationship between personality measurements and job performance in South Africa. Paper presented at the conference of the South-Africa Society for Industrial and Organizational Psychology, Pretoria, South Africa.
*Vasilopoulos, N. L., Cucina, J. M., Dyomina, N. V., Morewitz, C. L., & Reilly, R. R. (2006). Forced-choice personality tests: A measure of personality and cognitive ability? Human Performance, 19, 175–199.
Villanova, P., Bernardin, H. J., Johnson, D. L., & Dahmus, S. A. (1994). The validity of a measure of job compatibility in the prediction of job performance and turnover of motion picture theater personnel. Personnel Psychology, 60, 224–235.
Viswesvaran, C., & Ones, D. S. (1999). Measurement error in “Big Five factors” personality assessment: Reliability generalization across studies and measures. Educational and Psychological Measurement, 60, 224–235.
Viswesvaran, C., Ones, D. S., & Hough, L. M. (2001). Do impression management scales in personality inventories predict managerial job performance ratings? International Journal of Selection and Assessment, 9, 277–289.
Viswesvaran, C., Ones, D., & Schmidt, F. L. (1996). Comparative analysis of the reliability of job performance ratings. Journal of Applied Psychology, 81, 557–574. doi:10.1037/0021-9010.81.4.557
*Warr, P., Bartram, D., & Brown, A. (2005). Big Five validity: Aggregation method matters. Journal of Occupational and Organizational Psychology, 78, 377–386. doi:10.1348/096317905X53868
*Warr, P., Bartram, D., & Martin, T. (2005b). Personality and sales performance: Situational variation and interactions between traits. International Journal of Selection and Assessment, 13, 87–91.
*Weiss, P., Wertheimer, M., & Groesbeck, B. (1959). Achievement motivation, academic aptitude, and college grades. Educational and Psychological Measurement, 19, 663–666.
*Whetzel, D. L., McDaniel, M. A., Yost, A. P., & Kim, N. (2010). Linearity of personality-performance relationships: A large-scale examination. International Journal of Selection and Assessment, 18, 310–320.
*White, L. A. (2002, October). A quasi-ipsative temperament measure for assessing future leaders. Paper presented at the 14th Annual Conference of the International Military Testing Association, Ottawa, Canada.
*Willingham, W. W., & Ambler, R. K. (1963). The relation of the Gordon Personal Inventory to several external criteria. Journal of Consulting Psychology, 27, 460.
*Willingham, W. W., Nelson, P., & O’Connor, W. (1958). A note on the behavioral validity of the Gordon Personal Profile. Journal of Consulting Psychology, 22, 378.
*Witt, L. A., & Jones, J. W. (1999). Very particular people quit first. Paper presented at the annual conference of the Society for Industrial and Organizational Psychology, Atlanta, GA.
*Wollowick, H. B., & McNamara, W. J. (1969). Relationship of the components of an assessment center to management success. Journal of Applied Psychology, 53, 348–352.
Yoo, T.-Y., & Min, B.-M. (2002). A meta-analysis of the Big Five and performance in Korea. Paper presented at the 17th annual conference of the Society for Industrial and Organizational Psychology, Toronto, Canada.
*Young, M., & Dulewicz, V. (2007). Relationship between emotional and congruent self-awareness and performance in the British Royal Navy. Journal of Managerial Psychology, 22, 465–478.
*Zagar, R., Arbit, J., & Wengel, W. (1982). Personality factors as predictors of grade point average and graduation from nursing school. Educational and Psychological Measurement, 42, 1169–1175.
Zibarras, L. D., & Woods, S. A. (2010). A survey of UK selection practices across different organization sizes and industry sectors. Journal of Occupational and Organizational Psychology, 83, 499–511.
Original manuscript received May 2012
Revised manuscript received June 2012
First published online October 2012
APPENDIX A
Classification of the personality scales according to the five-factor model
APPENDIX B (Continued)
APPENDIX B
Main codes and input values for the primary studies included in the meta-analysis
Adkins and Naumann (2001) 264 CES I SLS – – – – .18 .77 .94
Antler, Zaretsky, and Ritter (1967) 30 GPI Q JPR .58 – – – – – –
Balch (1977) 100 EPPS I TRA −.10 .12 −.10 −.01 .20 – –
Bartram (2007) Korea 366 OPQ I JPR .17 .21 .11 .13 .22 .52 .99
Bartram (2007) S. Africa 68 OPQ I JPR .02 −.07 .02 −.09 .04 .52 –
Bartram (2007) USA 86 OPQ I JPR −.04 .25 .09 .18 −.05 .52 –
Bennett (1977) 45 SDI Q JPR .40 – – – −.05 – –
Bennett (1977) 49 SDI Q JPR .46 – – – .07 – –
Bhatnagar (1969) 261 EPPS I GPA – −.11 – .10 .13 – –
Brown and Bartram (2009) 835 OPQ I SFR .12 .13 .08 −.01 .15 – –
Christiansen et al. (2005) a 60 IPIP Q JPR – – – – .46 – .62
Christiansen et al. (2005) b 62 IPIP Q JPR – – – – .17 – .63
Clevenger, Pereira, Wiechman, 207 OPQ I JPR – – – – .16 .76 –
APPENDIX B
(Continued )
Gordon (1993) T4.42 b 292 GPP Q GPA .22 −.16 .17 .07 .30 – –
Gordon (1993) T4.42 c 1078 GPP Q GPA .02 −.14 .18 −.02 .08 – –
Gordon (1993) T4.44 95 GPP Q GPA .06 −.22 .36 .08 .22 – –
Graham and Calendo (1969) 69 SDI Q JPR – – – – .02 – –
Grimsley and Jarret (1973) 100 GPP Q PRG .13 .18 .12 −.27 .18 – .48
Guller (2003) 375 EPPS I JPR .06 −.04 .00 .04 −.01 – –
Hakel (1966) 102 EPPS I GPA .16 −.04 −.06 −.01 .25 – –
Hirsh and Peterson (2008) 196 IPIP I GPA −.02 −.14 −.02 −.24 .32 – –
Hughes and Dood (1961) 90 GPI I SLS .22 −.17 – – .14 – –
Hughes and Dood (1961) 90 GPI Q SLS .06 −.08 – – .08 – –
Hughes and Prien (1986) 49 GPI Q JPR – – – .15 −.03 .42 –
Iliescu, Ilie, and Aspas (2011) a 833 ESQ Q CWB – – – – −.43 – –
Iliescu et al. (2011) b 224 ESQ Q CWB – – – – −.37 – –
Izard (1962) a 3 EPPS I GPA – – – – .40 – –
APPENDIX B
(Continued )
Slocum and Hand (1971) 37 EPPS I JPR −.05 .17 .05 .20 .02 – –
Sommerfeld (1997) 332 GPI Q JPR .01 – – – .32 – –
Striker, Schiffman, and Ross (1965) a 225 MBTI F GPA – −.18 −.07 .01 .24 – –
Striker et al. (1965) b 201 MBTI F GPA – −.07 −.10 .07 .13 – –
Striker et al. (1965) c 201 MBTI F CAB – −.02 .07 .09 .09 – –
Vasilopoulos et al. (2006) 327 IPIP Q GPA – – – – .10 – –
Warr et al. (2005a) 119 CCSQ I SLS −.03 .10 −.10 −.19 .26 .89 .90
Warr et al. (2005a) 78 CCSQ I SLS .04 .05 −.17 −.15 .20 .81 .90
Warr et al. (2005a) 90 CCSQ I SLS −.04 .08 .25 −.32 .21 .89 .90
Whetzel, McDaniel, Yost, and Kim (2010) 1152 OPQ I JPR −.02 .07 .08 −.04 .08 – 1.00
White (2002) 613 AIM Q JPR .06 .22 – .01 .20 .53 .70
White (2002) 399 AIM Q JPR .07 .06 – −.01 .04 .59 .71
Willingham and Ambler (1963) 208 GPI Q CWB – .17 – – −.19 .33 1.00
Willingham, Nelson, and O’Connor (1958) 1039 GPI Q TRA .05 .00 – – .03 – 1.00
Witt and Jones (1999) 168 OPQ I JPR .01 .07 .01 .01 .03 – –
Young and Dulewicz (2007) 261 OPQ I JPR .16 .14 .11 .17 .20 – –
Zagar, Arbit, and Wengel (1982) 570 EPPS I GPA −.01 .07 .02 .07 .02 – –
CAB = counterproductive academic behaviour; CWB = counterproductive work behaviour; GPA = grade point average; JPR = job performance rating; PRG = progress; PRR = peer rating; SAL = salary; SFR = self-rating; SLS = sales; TRA = training; ryy = criterion reliability; u = range restriction value; I = ipsative; N = normative; Q = quasi-ipsative.
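For many studies, Appendix B reports a criterion reliability (ryy) and a range restriction value (u) next to the observed validities. These are the artifact values that corrections in the Hunter and Schmidt (2004) tradition use. The Python below is a minimal sketch of the two classic corrections applied to a single coefficient: disattenuation for criterion unreliability, followed by the direct (Thorndike Case II) range restriction correction. It rests on several assumptions. First, u is taken to be the ratio of restricted to unrestricted predictor standard deviations. Second, ryy is taken to come from the restricted sample. Third, a dash in the table is read as "not reported", and the matching correction is simply skipped. This is not necessarily the authors' procedure. The reference list also cites Hunter, Schmidt, and Le (2006) on indirect range restriction, which uses a different formula, and meta-analyses usually correct through artifact distributions rather than one study at a time. All function names are hypothetical.

```python
import math
from typing import Optional


def parse_cell(cell: str) -> Optional[float]:
    """Parse one table cell; a dash (assumed to mean 'not reported') -> None.

    Accepts either an ASCII hyphen-minus or the Unicode minus sign (U+2212).
    """
    cell = cell.strip()
    if cell in {"–", "-", ""}:
        return None
    return float(cell.replace("\u2212", "-"))


def correct_criterion_unreliability(r: float, ryy: float) -> float:
    """Disattenuate r for measurement error in the criterion only."""
    if not 0.0 < ryy <= 1.0:
        raise ValueError("ryy must lie in (0, 1]")
    return r / math.sqrt(ryy)


def correct_direct_range_restriction(r: float, u: float) -> float:
    """Thorndike Case II correction, assuming u = restricted SD / unrestricted SD."""
    if not 0.0 < u <= 1.0:
        raise ValueError("u must lie in (0, 1]")
    big_u = 1.0 / u
    return big_u * r / math.sqrt(1.0 + r * r * (big_u * big_u - 1.0))


def operational_validity(r_obs: float, ryy: Optional[float] = None,
                         u: Optional[float] = None) -> float:
    """Correct for criterion unreliability first, then direct range restriction,
    skipping any correction whose artifact value is missing (None)."""
    r = r_obs
    if ryy is not None:
        r = correct_criterion_unreliability(r, ryy)
    if u is not None:
        r = correct_direct_range_restriction(r, u)
    return r


if __name__ == "__main__":
    # Hypothetical inputs, not taken from any particular row of Appendix B.
    r, ryy, u = parse_cell(".20"), parse_cell(".52"), parse_cell(".90")
    print(round(operational_validity(r, ryy, u), 3))  # prints 0.305
```

Each correction is kept in its own function so that the order of corrections is explicit. The order matters: under direct range restriction, the criterion reliability from the restricted sample is applied before the range correction.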