Personality Outcomes Across Samples Final MS
Research in (mostly) Western samples has demonstrated that associations between personality traits and life outcomes are replicable and often driven by facets or nuances. Using three culturally different samples (English-speaking, N = 1,257; Russian-speaking, N = 1,616; and Mandarin-speaking, N = 1,234), we investigated the within- and cross-sample predictive accuracies of five domains, thirty facets and ninety nuances. Cross-sample associations were strongest for domains and weakest for nuances. However, nuances best predicted outcomes both within and across samples, although cross-sample predictions were weaker than within-sample ones. Traits’ predictive accuracy was stronger for English-speakers than for Mandarin- and Russian-speakers. These findings suggest that trait-outcome associations moderately generalise across diverse samples and that nuances often contain extra information about outcomes which partly generalises across samples.
The ways of the world? Cross-sample replicability of personality trait-life outcome associations
Personality traits have been linked to many life outcomes, including academic (Mammadov, 2022;
Trapmann et al., 2007) and socio-economic (Jonassaint et al., 2011) achievement, relationship
quality (O'Meara & South, 2019) and treatment success (Bucher et al., 2019). This highlights traits’
population-level implications that can inform policy creation and implementation (Bleidorn et al.,
2019). For example, identifying risk factors for negative outcomes such as substance abuse (Lackner
et al., 2013) or disregard for the environment (Soutter & Mõttus, 2021) may facilitate designing
interventions that address common psychological barriers to behaviour change.
However, the theoretical and practical relevance of personality trait-outcome research depends
on how well its findings generalise across people and circumstances. Soto (2019) observed that 87%
of previously demonstrated trait-outcome associations could be replicated in a large sample of US
adults, although the associations’ strengths were often weaker in the replication than in the original
studies. Less clear, however, is whether the associations would replicate in samples with different
cultural backgrounds. For example, only with evidence of the associations’ replicability across
relevant cultural circumstances could we start to think that a Western-based intervention may
succeed in other populations. Likewise, the associations’ replicability across culturally diverse
samples would indirectly support claims that personality traits have similar roles in many cultures
(Allik et al., 2013). Here, we investigated personality trait-life outcome associations in three different
samples, representing three diverse cultural backgrounds, speaking distinct languages: English-
speakers (mostly UK residents), Russian-speakers from Russia or countries with substantial Russian-
speaking minorities (e.g., Ukraine) and Mandarin-speakers (mostly Chinese residents).
Besides replicability, the theoretical relevance of using personality traits to predict and intervene
on life outcomes depends upon how strongly they track the outcomes. The Big Five domains – one
of the broadest levels of the personality hierarchy – provide broad summaries of individuals’
personality traits and may each predict many life outcomes (Ozer & Benet-Martínez, 2006; Roberts
et al., 2007). This offers a parsimonious approach to the problem, but may not be optimal for every
purpose. This is because the domains split into narrower traits like facets that often track with
outcomes to a greater degree, partly explaining why individuals scoring similarly on a domain may
experience different outcomes. For example, Conscientiousness facets explained 24% more variance
in job performance than their domain (Dudley et al., 2006), and a combination of multiple facets
explained over 400% more variance in body mass index (BMI) than a combination of their domains
(Vainik et al., 2019). In the latter study, only two domains (Conscientiousness and Neuroticism)
showed small associations with BMI, but several comparatively stronger associations were found at
the facet level, with some (but never all) facets from each of the five domains correlating with BMI.
Moreover, outcomes might be even more strongly correlated with personality nuances. These are
traits even narrower than facets, usually represented by a single item that captures partly unique
but valid information about some aspect of individual differences (Condon et al., 2020; McCrae,
2015). For instance, many items have unique variances that show strong cross-rater agreement
(Mõttus et al., 2014), stability over time and heritability (Mõttus et al., 2017; Mõttus et al., 2019),
and distinct developmental trends (Mõttus & Rozgonjuk, 2021). Indeed, nuances often provide
stronger outcome predictions than domains and facets (Seeboth & Mõttus, 2018; Stewart et al.,
2022), and often (although not always) this is not because items’ content directly overlaps with the
outcomes. For example, the outcome “criminal behaviour” was most strongly (r ≈ .20 to .30)
associated with nuances from different domains, such as behaving irresponsibly, starting arguments,
being forgiving, being cold and uncaring, and not cleaning up after oneself, whereas not all nuances of
the same domains had similar links with the outcome (Stewart et al., 2022). Likewise, nuances such
as those referring to being lazy, disorganised, talkative and full of energy out-predicted the Big Five
domains for future BMI (Arumäe et al., 2023).
However, it is not clear yet whether the finding that facets and nuances are more strongly linked
with the outcomes is specific to European and North-American populations or generalises more
widely. Moreover, even if facets and nuances do out-predict domains in a range of backgrounds, it is
not clear how well particular facet- and nuance-specific outcome-correlations replicate across
samples.
Present study
The replicability of life outcomes’ associations with domains, facets, and nuances remains uncertain,
particularly across samples from diverse cultures. To address this, we investigated the degrees to
which personality-outcome associations were consistent across three culturally distinct samples at
three levels of the personality hierarchy – domains, facets, and nuances. We concentrated on two
aspects of these associations: a) the overall predictability of outcomes from personality traits and b)
the correlations between individual traits and outcomes.¹ We anticipated that trait-outcome associations might vary across samples in both respects – traits’ overall predictive power and individual correlations – but we could not predict the extent of this variation due to the lack of previous research
on this topic. Based on prior studies, we also expected that items would outperform domains and
facets in predicting outcomes both within samples and across samples (i.e., using models created in
one sample to predict outcomes in another). Specifically, we expected that items would explain
between 20% and 40% more variance for each outcome within each sample, on average (Seeboth &
Mõttus, 2018; Stewart et al., 2022), but we had no basis to hypothesize about the extent to which
this predictive advantage would be retained for predictions across samples.
Before creating and testing the prediction models, we also checked for domain and facet scales’
measurement invariance (MI) across the three samples. Sufficient levels of MI would support the case that domain and facet scales can be used in research spanning languages and cultural backgrounds. Conversely, a lack of sufficient MI would be consistent with the possibilities that (a) personality scales behave differently across contexts and, (b) by implication, their nuance-specific correlations with outcomes may also vary from sample to sample, at least to some degree. To date, no study has shown full MI across samples from different cultures (Dong & Dumas, 2020), but the extent and implications of MI violations may vary with scales and samples.

¹ Importantly, throughout the discussion of this study, we use the term ‘predict’ and its variants in the statistical sense of extrapolation rather than the literal sense of foretelling the future, because the study is cross-sectional.
Methods
Participants
We recruited a total of 4,105 participants into three samples: Russian speakers (N = 1,616; 69% female; age range = 18-86), Mandarin speakers (N = 1,234; 74% female; age range = 18-60), and English speakers (N = 1,256; 58% female; age range = 18-75). The English-speaking (EN) participants
were recruited through social media platforms and the Prolific participant sourcing platform. The
Mandarin-speaking Chinese (CH) sample was recruited entirely through social media and
participated without compensation. Finally, the Russian-speaking (RU) sample was recruited via
Google ads, targeting individuals in Russia and other countries with Russian-speaking minorities
(e.g., former Soviet Union members); these participants were not monetarily compensated. All
completed the same survey in their preferred language on the formr.org platform (Arslan et al.,
2020).
Measures
Personality traits
We developed a 90-item personality trait assessment scale for this study, using existing data from
other ongoing research projects (Henry & Mõttus, 2023 [https://osf.io/tcfgz/]) that aim to create a
comprehensive personality item pool prioritizing items’ retest reliability, variance, cross-rater
agreement, low social desirability and low redundancy (Condon et al., 2020). This pool’s 198 items
were mostly selected from the International Personality Item Pool (IPIP; Goldberg et al., 2006) and
the Synthetic Aperture Personality Assessment (SAPA; Condon, 2018), but some new items were
generated to cover all domains and facets of the Five Factor Model (FFM) and HEXACO, otherwise
known as “the Big Few” (Mõttus et al., 2020), and some traits beyond them (e.g., competitiveness,
envy, religiosity, sexuality, humour). For details on item selection steps and their psychometric
properties, see Henry and Mõttus (2022). For this study, we focused on the FFM domains and facets
as assessed by the Revised NEO Personality Inventory (NEO-PI-R; Costa Jr & McCrae, 2008), because it is often considered the ‘gold standard’ of personality trait assessment. From the 198 items, we created
a shorter, 90-item questionnaire paralleling the NEO-PI-R scales, with 18 items per domain, and 3
per facet. To do so, we used data from an independent sample of mostly UK residents and English-
speaking residents of other European countries who had previously completed the survey (N =
1,436, 59% female).
In this test development sample, using Jamovi (The Jamovi Project, 2022), we ran an exploratory
factor analysis (EFA) on the 198 items using oblimin rotation, forcing the items into five domain-
factors, and selecting 18 items for each domain-factor that would cover the domain’s six facets. Two
of the authors (RS and WJ) allocated items into the facets through a two-step process. For each
domain, they ran the 18 items with the highest loadings on one of the five factors in the initial EFA
through another EFA and extracted six correlated factors, generating a basis for each facet. Then,
they ran a Maximum Likelihood confirmatory factor analysis (CFA) on each identified facet, using the
UK sample gathered for this cross-sample study as a replication sample independent of the initial
test development sample. For the CFA, latent trait variances were fixed to one and models were
considered well-fitting if the loading of each item was greater than .30, and the Comparative Fit
Index (CFI) was greater than .85. RS and WJ then read each item at face value, and assessed whether
its content was consistent with the facet’s definition. If items were deemed to, by definition, fit
better elsewhere, they were moved into the better fitting facet, and replaced by another item from
the 198-item pool, based on their loadings in the original EFA in the test development
sample. To check that new items ‘worked’ in their new facets, we ran new CFAs on each facet and
domain, with the same fit criteria. We repeated this process until all items in the new, 90-item
measure had minimum loadings of .30 on both facets and domains, CFI > .85, and both RS and WJ
were satisfied that they had done as well as possible with the available items.
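To make the two-step procedure concrete, the following is a minimal R sketch of the kind of analyses described above. It is not our actual code (the EFAs were run in Jamovi); `dev_data`, `rep_data` and the item names are placeholders, and the psych and lavaan packages stand in for the software we used.

```r
# Illustrative sketch only; data objects and item names are placeholders.
library(psych)    # exploratory factor analysis
library(lavaan)   # confirmatory factor analysis

# Step 1: five-factor EFA with oblimin rotation on the 198-item pool
# (run in the test development sample)
efa5 <- fa(dev_data, nfactors = 5, rotate = "oblimin", fm = "ml")
print(efa5$loadings, cutoff = 0.30)   # inspect which items load on which factor

# Step 2: CFA of a candidate three-item facet in the replication sample,
# with the latent variance fixed to 1 (std.lv = TRUE), checking that each
# standardised loading exceeds .30; the CFI > .85 criterion mainly bites for
# the larger facet- and domain-level models
facet_model <- 'Anxiety =~ item_12 + item_47 + item_85'
facet_fit   <- cfa(facet_model, data = rep_data, std.lv = TRUE)
standardizedSolution(facet_fit)
fitMeasures(facet_fit, "cfi")
```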
Outcomes
We assessed 34 wide-ranging outcomes, some with pre-existing scales and others through items we
wrote. We translated and back-translated these items and the existing scales for which translations were lacking; no changes were required after back-translation. XL and XH provided Mandarin translations, and RM and UV
provided Russian translations. More information on the project and the full list of items can be
found at https://osf.io/xeqch/.
Pre-existing scales
Life-satisfaction. For this, we used the Satisfaction with Life Scale (Diener et al., 1985). Each item is
measured on a 7-point Likert scale. XH and XL translated the scale to Mandarin, whilst a Russian
translation was available for download at https://eddiener.com/scales/.
Health. We used the 12-item Short-Form Health Survey (version 2; SF-12v2; Ware et al., 1996) to
measure health outcomes. This survey assesses eight health aspects: Physical Functioning, Role-
Physical, Bodily Pain, General Health, Vitality, Social Functioning, Role-Emotional and Mental Health
with 12 items. The first four aspects are considered features of physical health, and the rest mental
health. Items are scored on either 3- or 5-point Likert scales. There was some uncertainty about
existing translated versions (e.g. Hoffmann et al., 2005), so XH and XL back-translated the English
version to give us a useable Mandarin version, and UV did this for the Russian translation.
Perceived Social Support. To measure social support outcomes, we used the Multidimensional Scale
of Perceived Social Support (MSPSS; Zimet et al., 1988), which has three
subscales: perceived support from Family, Friends and Significant Others. All three scales have good
test-retest reliability and internal consistency, as well as strong factorial validity (Cartwright et al.,
2022; Wang et al., 2021). The scale has 12 items, with four items measuring each subscale. Each item
is scored on a 7-point Likert scale. For the Mandarin version, Zimet et al. provided a translated copy,
and we located a Russian version online (Pushkarev et al., 2020).
Other outcomes
We measured the following outcomes with single items rated on Likert scales of varying lengths. The items and scoring keys can be found in the Supplementary Material (https://osf.io/xeqch/):
Crime(s) committed, donating to causes, driving licence status, previous driving fines, duration of
holding driving licence, educational attainment, frequency of exercise, previous fight history,
number of hobbies, number of holidays, income, romantic relationship history, relationship
duration, career satisfaction, financial satisfaction, home satisfaction, living area satisfaction, work
satisfaction, smoker status, time spent with others, volunteering history, weight, bodily pain, general
health, vitality, social functioning.
Analyses
Measurement invariance
To compare personality trait-outcome associations robustly among samples, the assessments must
show measurement invariance (MI; Meade & Lautenschlager, 2004). A standard procedure exists for testing MI in multi-item scales (i.e., the domain and facet scales, and several outcome scales). Using
this procedure, we tested MI (a) across the English-speaking sample used for item development and
our English-speaking sample to be compared to the samples in other languages, and then (b) across
the three samples tested in different languages. Currently, no procedure exists for testing MI for
single items, so we could not test for it for personality nuances and single-item outcomes.
We carried out the MI tests in JASP (JASP Team, 2023), using its
SEM feature, looking at invariance at four cumulative levels: configural (consistent baseline factor
structures), metric (items’/facets’ factor loadings could be constrained equal across groups without
substantial deterioration in model fit), residual (items/facets’ residual variances could also be
constrained equal across groups) and strict (items’/facets’ intercepts could be constrained equal
across groups). The most widely accepted indicator of measurement invariance is the difference in
CFI between less and more constrained models, with a change of less than .01 considered acceptable (Chen,
2007; Cheung & Rensvold, 2002). To date, we are unaware of any cross-cultural studies that have
shown evidence of strict MI for personality scales (Dong & Dumas, 2020). Although not all
participants in the test development sample spoke English as a first language, they had good
command of the language, so we expected to see little non-invariance there.
We first estimated item parameters freely in each sample, then added factor loading, residual
variance, and intercept constraints across samples, one constraint type at a time. Metric invariance
(equality of factor loadings) shows that constituents’ relative contributions to the construct are
similar and allows comparing latent traits’ structural relations, such as their correlations among
themselves or with other variables. Residual invariance shows that constituents’ absolute contributions are similar and also allows comparing structural relations at the observed trait-score
level (e.g., items’ sum-scores), but failure to attain it indicates differences in measurement reliability
and/or validity. Strict invariance additionally allows comparing traits’ observed mean scores among
groups. This was not of interest in our study, but failure to attain it implies group differences in item responses beyond the latent traits.
Given that our outcome data was also measured in the three samples, we ran the same MI tests for
outcomes (Supplementary Table S10). The test development sample had not been assessed for
these outcomes.
The R code for all four MI stages is available at the Open Science Framework (https://osf.io/xeqch/).
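As an illustration of this sequence (the actual tests were run in JASP’s SEM module, which is built on lavaan), a minimal sketch using one facet scale might look as follows; the model, the item names and the grouping variable are placeholders.

```r
# Illustrative sketch of the four cumulative MI levels for one scale;
# "n1"-"n3" and the grouping variable "sample" are placeholders.
library(lavaan)

model <- 'Neuroticism =~ n1 + n2 + n3'

fit_configural <- cfa(model, data = dat, group = "sample")
fit_metric     <- cfa(model, data = dat, group = "sample",
                      group.equal = "loadings")
fit_residual   <- cfa(model, data = dat, group = "sample",
                      group.equal = c("loadings", "residuals"))
fit_strict     <- cfa(model, data = dat, group = "sample",
                      group.equal = c("loadings", "residuals", "intercepts"))

# A drop in CFI of less than .01 between adjacent levels is taken as
# acceptable for the more constrained level (Chen, 2007)
sapply(list(configural = fit_configural, metric = fit_metric,
            residual = fit_residual, strict = fit_strict),
       fitMeasures, fit.measures = "cfi")
```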
Within-sample analyses
Our main analyses focused on the extents to which trait-outcome associations replicated across the
three samples with diverse cultural backgrounds. For this, however, we first needed to examine the
associations within each sample. We focused on two kinds of trait-outcome associations: first,
outcomes’ overall predictability from personality traits (domains, facets and nuance), best assessed
with a prediction-oriented modelling strategy; second, individual traits’ correlations with outcomes.
To estimate each outcome’s overall predictability within each sample, we ran a series of elastic net
regressions (ENR; Zou & Hastie, 2005), with domains, facets and items in turn as predictors. The
ENR shrinks coefficients towards 0, mitigating the possibility of inflated predictive accuracy due to
over-fitting (Yarkoni & Westfall, 2017). The ENR includes or excludes highly correlated variables from
the models by either co-shrinking their coefficients to 0 or keeping them non-zero, and estimates
which coefficient combination provides the greatest predictive power across different folds of the
data (Waldmann et al., 2013). Before running each ENR, we split the sample randomly into two
subsamples, for model training (67%) and validation (33%). Within the training sample, we ran an
ENR with a 10-fold cross-validation and chose a shrinkage parameter that minimized prediction error
across the folds. We then transferred the model to the independent validation sample to predict the
outcome from personality traits and Pearson-correlate its predicted values with their observed
values (our measure of predictive accuracy). Such complete separation of model training and validation prevented over-fitting from inflating the accuracy estimates, because any sample idiosyncrasies the model capitalized on in the training sample would not be present in the validation sample. We repeated this
training-cross-validation procedure 10 times for every outcome-predictor combination with different
random sample splits, averaging the predictive accuracies across the repeats. We controlled for age and gender in these regressions.
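A minimal sketch of one training/validation repeat for a single outcome and the item-level predictors is shown below. This is not the exact analysis code: the data objects, the outcome name and the elastic net mixing value (alpha = 0.5) are assumptions for illustration.

```r
# Illustrative sketch only: `preds` is assumed to be a numeric matrix holding
# the 90 items plus age and gender, and `life_satisfaction` a numeric outcome.
library(glmnet)

set.seed(1)
n         <- nrow(preds)
train_idx <- sample(n, size = round(0.67 * n))   # 67% training, 33% validation

# 10-fold cross-validated elastic net in the training subsample; lambda is
# chosen to minimise cross-validated prediction error
fit <- cv.glmnet(preds[train_idx, ], life_satisfaction[train_idx],
                 alpha = 0.5, nfolds = 10)

# Apply the trained model to the held-out validation subsample and correlate
# predicted with observed values (the predictive-accuracy measure); squaring
# it gives a percentage of explained variance of the kind reported in Table 1
pred     <- as.numeric(predict(fit, newx = preds[-train_idx, ], s = "lambda.min"))
accuracy <- cor(pred, life_satisfaction[-train_idx])
accuracy^2 * 100
```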
Next, to test individual associations, we Pearson-correlated each outcome with personality domains,
facets and items within each sample.
Cross-sample analyses
Our next step was to test the cross-sample replicability of the trait-outcome associations. To do so,
we first compared the extent to which personality traits predicted outcomes in each sample. For
example, did personality traits predict outcomes better among Mandarin-speakers than among
English-speakers? Next, separately for domains, facets and nuances and each outcome, we predicted
outcomes in one (“target”) sample from models which were trained in the combined data of the
other two samples, going through all three combinations. For example, we used a stratified
combination of the UK and RU data to predict outcomes in the CH data, using an even split of UK and
RU samples to create one combined sample the size of the CH test sample. Equally sized subsamples were taken
to minimise the possibility of sample similarity influencing the results (e.g. if the less “WEIRD”
cultures were more similar, then this could influence cross-sample predictions). We constrained the
combined sample size to avoid comparisons being confounded by sample size (larger samples may
allow training more predictive models, thus giving the cross-sample prediction an advantage over
within-sample predictions). We used the same procedures for the rest of this step as outlined in the
within-sample analysis, including averaging the predictive accuracies across 10 random
training/validation sample splits.
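For illustration, one such cross-sample prediction (CH as the target, trained on a stratified UK + RU combination) could be sketched as follows. The data objects are placeholders and the sketch is simplified relative to the repeated-splits routine described above.

```r
# Illustrative sketch only; x_uk, x_ru, x_ch are item (or facet/domain)
# matrices and y_uk, y_ru, y_ch the corresponding outcome vectors.
library(glmnet)

set.seed(1)
half_n  <- round(nrow(x_ch) / 2)   # even UK/RU split, sized to the target sample
uk_rows <- sample(nrow(x_uk), half_n)
ru_rows <- sample(nrow(x_ru), half_n)

x_train <- rbind(x_uk[uk_rows, ], x_ru[ru_rows, ])
y_train <- c(y_uk[uk_rows], y_ru[ru_rows])

fit  <- cv.glmnet(x_train, y_train, alpha = 0.5, nfolds = 10)
pred <- as.numeric(predict(fit, newx = x_ch, s = "lambda.min"))

cor(pred, y_ch)^2 * 100   # cross-sample predictive accuracy as % of explained variance
```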
Comparing the extents to which models trained in combined samples predicted outcomes in the
target samples to the within-sample prediction accuracies indicated the cross-sample
generalizabilities of the models’ parameters (i.e., the trait-outcome associations). Cross-sample and
within-sample predictions being equally strong would indicate perfect replicability; the former being
much lower would indicate poor replicability. Often, it may be the cross-sample predictive accuracy
that interests researchers most. For example, this is a standard approach in genome-wide
association studies (GWAS) where meta-analytic allele-phenotype associations trained across many
samples (prediction models) are used to create polygenic scores (predictions) in an independent
sample and the predictive accuracy of these scores is tested against the phenotype’s observed
values in this sample.
To further compare the degrees to which trait-outcome associations replicated, we correlated
(among samples) the correlation profiles of respective outcomes with items, facets, and domains in
turns. Next, we calculated single-profile absolute intra-class correlations (ICCs) among these
correlation profiles for each outcome, at each trait level among the three samples to quantify their
consistencies. High ICCs would indicate that the same traits predicted the outcomes among the
samples.
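A sketch of this consistency index for one outcome at the item level is given below. The data objects and the outcome name are placeholders, and the exact ICC variant (here a two-way, single-measure, absolute-agreement ICC from the irr package) is our reading of “single-profile absolute ICC”.

```r
# Illustrative sketch only; uk_data, ru_data, ch_data are data frames,
# `item_names` holds the 90 item column names, and `life_satisfaction`
# stands for one outcome.
library(irr)

# Item-outcome correlation profile in each sample (a 90 x 3 matrix)
profiles <- sapply(list(UK = uk_data, RU = ru_data, CH = ch_data), function(d)
  cor(d[, item_names], d$life_satisfaction, use = "pairwise.complete.obs"))

# Single-measure, absolute-agreement ICC across the three profiles
icc(profiles, model = "twoway", type = "agreement", unit = "single")
```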
Results
Measurement invariance
Within the two English-speaking samples (test development sample and the English-speaking sample
to be used for cross-sample comparisons), at the domain level, we observed strict invariance (ΔCFI
between more and less constrained models < .01) for Openness, Neuroticism and Extraversion, as
well as residual invariance for Agreeableness and Conscientiousness (see Supplementary Table S1).
We observed residual invariance for Competence, Order, Achievement Striving, Deliberation, Values,
Modesty, Trust, Altruism, Tender-Mindedness, Depression, Anxiety, Self-Consciousness, Impulsivity,
Positive Emotion and Warmth, as well as strict invariance for Assertiveness, Activity, Angry Hostility,
Vulnerability, Aesthetics and Self–discipline. We also observed metric invariance for Duty, Fantasy,
Feelings, Straightforwardness, Compliance and Gregariousness, and only configural invariance for
Excitement Seeking, Actions and Ideas (see Supplementary Table S2). We considered this not ideal but
sufficient to justify using the same items we had selected to reflect the FFM domains and facets in
the development sample to reflect them in the cross-sample comparisons, especially because in
most cases where at least residual invariance was not met, the ΔCFI values were not large (max ΔCFI
= .06, but usually smaller).
However, when we investigated MI among our three different-language samples, all domains but
Neuroticism and most facets only met the criteria for configural MI, with ΔCFI > .01 after imposing
cross-sample equality constraints on factor loadings (Supplementary Tables S3 to S4). Only the
Neuroticism domain and the facets of Deliberation, Ideas, Modesty, Compliance, Depression, Self-
Consciousness, Positive Emotion, and Activity met metric MI criteria, and no scale met the criteria
for more stringent MI levels. Items’ loadings on their domains and facets in all three data sets are shown in Supplementary Table S5. Likewise, all outcome scales met only configural MI, suggesting that the same items were relevant to each outcome in all samples, but to differing degrees (Supplementary Table S6).
Though we acknowledge the possible interpretative limitations that such a pervasive lack of MI
placed on the merits of our cross-sample comparisons, we proceeded with our analyses.
Within-sample outcome analysis
Table 1 shows the percentages of each outcome’s variance for which personality domains, facets and items (markers of nuances) accounted, as well as the mean and median percentages among the outcomes. In all samples, there was wide variation across outcomes in their predictability from personality traits, but items almost always out-predicted facets and domains.
Within the UK sample, domains, facets, and items accounted for 7.3% (Mdn = 4.4%), 11.2% (Mdn =
7.3%) and 16.4% (Mdn = 11.6%) of variance across the outcomes, respectively. Therefore, based on
these means, facets accounted for 53.4% more variance than domains, whilst items accounted for
46.4% more variance than facets and 124.7% more variance than domains. For the RU sample,
domains, facets, and items accounted for 5.4% (Mdn = 2.6%), 7.8% (Mdn = 4.4%) and 10.2% (Mdn =
7.3%). Facets therefore accounted for 47.2% more variance than domains, with items accounting for
92.5% more variance than domains, and 30.8% more than facets. Finally, for the CH sample,
domains, facets and items accounted for 7.0% (Mdn = 3.4%), 8.4% (Mdn = 4.4%) and 11.1% (Mdn =
7.3%) respectively. Thus, items accounted for 32.1% more variance than facets, and 58.6% more
variance than domains. Similarly, facets accounted for 20.0% more variance than domains. Overall,
outcomes were thus more predictable in the UK sample than in the Russian- and Mandarin-speaking
samples at all levels. Likewise, narrower traits tended to out-predict broader traits in all samples,
consistent with previous observations (e.g., Seeboth & Mõttus, 2018; Stewart et al., 2022).
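For clarity, the “X% more variance” figures above are simple relative differences between the mean explained variances; for example, for the UK sample (small rounding differences are possible):

```r
# Relative predictive advantages computed from the UK means reported above
(11.2 - 7.3)  / 7.3  * 100   # facets vs. domains: ~53.4%
(16.4 - 11.2) / 11.2 * 100   # items vs. facets:  ~46.4%
(16.4 - 7.3)  / 7.3  * 100   # items vs. domains: ~124.7%
```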
Domains’, facets’ and items’ correlations with each outcome in each sample are shown in
Supplementary Tables S7 to S15.
Table 1
Outcomes’ Predictability Within Samples (% of Explained Variance)
UK RU CH
Outcome Domains Facets Items Domains Facets Items Domains Facets Items
Crime 0.00 1.00 4.00 0.00 0.00 0.00 0.00 0.00 0.00
Donating 2.25 3.61 5.29 4.41 5.29 7.29 1.96 0.81 1.21
Driving Status 2.46 3.24 6.76 0.81 1.44 4.84 0.00 0.00 0.00
Driving Fines 0.00 0.00 1.00 0.09 0.49 0.81 0.00 0.00 0.00
Driving Time 1.44 2.89 4.41 1.96 1.69 4.00 0.00 0.00 0.00
Education 3.61 6.25 10.24 2.25 4.00 7.29 2.25 4.84 7.84
Exercise 3.61 4.41 9.00 4.84 4.84 5.29 6.25 6.76 7.29
Fight History 0.00 2.89 7.29 0.00 1.69 2.56 1.21 3.61 4.84
Number of Hobbies 9.00 7.84 12.25 8.41 8.41 11.56 13.69 16.00 15.21
Holidays 4.00 8.41 12.25 1.44 3.24 4.00 3.24 2.89 1.00
Income 5.29 8.41 10.89 1.21 3.24 6.25 1.69 2.25 3.61
Relationship History 0.81 1.96 4.41 0.00 0.00 0.00 2.25 3.24 4.00
Relationship Time 0.00 0.00 0.36 0.00 0.00 0.00 0.00 0.00 0.00
Career Satisfaction 13.69 21.16 31.36 2.56 5.76 8.41 17.64 16.00 25.00
Financial Satisfaction 9.61 16.81 32.49 1.69 4.84 10.89 7.29 8.41 14.44
Home Satisfaction 11.56 17.64 24.01 4.84 4.00 7.84 7.84 8.41 10.89
Life Satisfaction 14.44 28.09 59.29 14.44 27.04 39.69 17.64 22.09 37.21
Living Area Satisfaction 4.84 6.76 10.89 1.69 2.89 9.61 4.00 5.76 9.61
Work Satisfaction 10.89 18.49 21.16 5.76 5.76 7.29 10.89 12.96 16.81
Smoke 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.49 2.25
Time with others 6.76 12.96 16.00 1.44 3.24 5.76 3.61 3.24 6.76
Volunteering 3.24 4.00 4.84 4.84 4.00 7.29 1.21 1.44 1.69
Weight 0.00 2.25 6.25 0.00 0.00 0.49 0.81 3.61 1.96
General Health 10.24 20.25 31.36 12.96 17.64 18.49 7.29 9.00 10.89
Physical Functioning 2.25 4.84 9.61 2.56 5.76 6.76 2.89 2.56 5.29
Role Physical 6.76 8.41 12.96 9.00 12.25 16.00 4.84 5.76 9.00
Role Emotional 27.04 30.25 33.64 15.21 20.25 25.00 19.36 24.01 25.00
Bodily Pain 4.00 6.76 14.44 7.84 11.56 14.44 2.25 4.00 7.29
Mental Health 1.96 3.24 3.61 0.64 1.00 0.49 0.00 0.00 0.00
Vitality 24.01 37.21 44.89 22.09 29.16 33.64 21.16 29.16 33.64
Social Functioning 21.16 31.36 32.49 17.64 26.01 26.01 15.21 17.64 21.16
Significant Other Social Support 5.76 17.64 23.04 10.89 16.00 14.44 20.25 23.04 30.25
Family Social Support 11.56 12.96 16.81 7.29 13.69 17.64 14.44 15.21 25.00
Friends Social Support 25.00 29.16 39.69 12.96 19.36 23.04 27.04 31.36 38.44
Median 4.42 7.30 11.57 2.56 4.42 7.29 3.43 4.42 7.29
Mean 7.27 11.21 16.38 5.35 7.78 10.21 7.01 8.37 11.11
NOTE: UK = English-speaking sample, RU = Russian-speaking sample, CH = Mandarin-speaking sample.
Cross-sample outcome analysis
Table 2 shows the percentages of each outcome’s variance for which personality domains, facets
and items accounted when predicted across samples, as well as the mean and median percentages
among the outcomes.
When predicting UK outcomes using models trained in the combined Russian and Chinese (RUCH)
data, the models tended to account for less outcome variance than did the within-sample prediction
models, with domains predicting 17.8%, facets 44.6% and items 50.0% less. However, items still out-
predicted domains and facets. Specifically, domains, facets, and items now accounted for 6.0% (Mdn
= 2.9%), 6.2% (Mdn = 1.7%) and 8.2% (Mdn = 3.6%) of variance, on average.
The same was true for predicting RU outcomes, from models based on domains, facets and items in
the combined UK and Chinese data (UKCH). Specifically, these models accounted for 5.9% (Mdn
=2.9%), 5.1% (Mdn = 2.1%) and 6.3% (Mdn = 2.9%) of variance across the outcomes, respectively,
amounting to differences of 11.3%, 34.6% and 38.4% compared to within-sample predictions (cross-
sample prediction being lower). Similarly, models based on domains, facets, and items from the
combined UK and Russian data predicted outcomes less well in the CH sample than did the models trained
within CH data: domains predicted 32.9% less (accounting for 4.7% (Mdn = 1.4%) of variance), facets
46.4% less (4.5%, Mdn = 1.3%) and items 48.6% less (5.7%, Mdn = 1.2%). As in the RU data, cross-
sample facet models predicted the least variance in the CH data, on average.
Table 2
Outcomes’ Predictability Across Samples (% of Explained Variance)
 UK RU CH
Outcome Domains Facets Items Domains Facets Items Domains Facets Items
Median 2.89 1.70 3.62 2.89 2.11 2.90 1.44 1.33 1.21
Mean 6.02 6.19 8.24 5.88 5.13 6.25 4.66 4.53 5.72
NOTE: UK = English-speaking sample, RU = Russian-speaking sample, CH = Mandarin-speaking sample.
Trait-outcome association agreement
The single-profile absolute ICCs (see Table 3) indicated greater cross-sample similarity in trait-
outcome correlations at the domain level (.70) than at the facet (.47) and item (.38) levels, suggesting that domains’ correlations with outcomes were the most replicable among samples, although far from perfectly so. This explains why items’ predictive advantage over domains was smaller in cross-sample than in within-sample predictions, by 70.6%, 92.7%, and 63.7% for outcomes in the UK, RU and CH samples, respectively. It also explains why facets lost their predictive advantage over domains: unlike items’ advantage, facets’ within-sample advantage was not strong enough to withstand the cross-sample differences in how outcomes correlated with facets.
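For transparency, these reductions can be reproduced (to rounding) from the mean explained variances in Tables 1 and 2; a worked example for the UK sample:

```r
# Items' advantage over domains within the UK sample (Table 1 means) versus
# across samples (Table 2 means), and the resulting percentage reduction
within_adv <- (16.38 - 7.27) / 7.27 * 100   # ~125%
cross_adv  <- (8.24  - 6.02) / 6.02 * 100   # ~37%
(within_adv - cross_adv) / within_adv * 100 # ~70.6% reduction
```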
Table 3
Cross-Sample Consistency in How Outcomes Correlated with Traits
Domain Facet Item
Outcome ICC ICC ICC
Crime 0.14 0.12 0.13
Donate 0.77 0.52 0.48
Driving Status 0.85 0.40 0.31
Driving Fines 0.58 0.42 0.48
Driving Time 0.54 0.42 0.40
Education 0.71 0.32 0.33
Exercise 0.90 0.67 0.51
Fight History 0.38 0.09 0.22
Number of Hobbies 0.85 0.72 0.56
Holidays 0.84 0.48 0.32
Income 0.75 0.42 0.29
Relationship History 0.69 0.34 0.30
Relationship Time 0.45 0.51 0.57
Career Satisfaction 0.72 0.46 0.31
Financial Satisfaction 0.69 0.31 0.17
Home Satisfaction 0.77 0.68 0.69
Life Satisfaction 0.91 0.55 0.39
Living Area Satisfaction 0.67 0.60 0.63
Work Satisfaction 0.80 0.75 0.75
Smoke 0.24 0.24 0.09
Time with others 0.72 0.34 0.25
Volunteering 0.80 0.54 0.37
Weight 0.01 0.03 0.00
General Health 0.97 0.69 0.45
Physical Functioning 0.81 0.46 0.31
Role Physical 0.89 0.63 0.46
Role Emotional 0.87 0.58 0.40
Bodily Pain 0.89 0.52 0.39
Mental Health 0.49 0.29 0.22
Vitality 0.96 0.75 0.49
Social Functioning 0.90 0.61 0.42
Significant Other Social Support 0.82 0.57 0.45
Family Social Support 0.92 0.59 0.49
Friends Social Support 0.90 0.70 0.47
Discussion
We set out to investigate three main features of trait-outcome associations. First, we investigated the extent to which outcomes were similarly predictable across samples with diverse cultural backgrounds; second, the extent to which specific trait-outcome associations replicated across the samples. Third, we asked whether the pattern of lower-level traits (nuances and facets) out-predicting domains would replicate not only within but also across samples.
Our results suggest that outcomes were generally more predictable from domains, facets, and
nuances in the English-speaking sample of UK residents than among Russian-speakers from mostly
Eastern Europe and Mandarin-speakers mostly from China. Next, among samples, trait-outcome
associations tended to moderately replicate, although the replication was generally much better for
domains than for facets and nuances. That is, domains associated with an outcome in one sample
were more likely to be similarly associated with the same outcome in another, compared to facets
and nuances. Yet, nuances generally predicted outcomes most accurately not only within but also
among samples. So, nuances’ general predictive advantage over domains (and facets) was sufficiently
strong to be present in cross-sample predictions even when individual nuance-outcome associations
varied substantially among the samples.
As we compared culturally diverse samples, we checked our scales’ MI to examine the degrees to
which the constructs were assessed similarly in the three groups (see Supplementary Tables S3, S4
and S5). The scales rarely met even minimal MI criteria often considered sufficient for cross-sample
comparisons (Van De Schoot et al., 2015). For our analyses, it would have been desirable if the scales
met at least cross-sample metric and preferably residual MI, suggesting that items defined the
constructs with similar relative and absolute strengths; since we did not plan to compare trait levels
between samples, strict invariance indicating intercept equality was not necessary. Unfortunately,
pervasive lack of MI is almost invariably observed in cross-cultural personality studies (Dong &
Dumas, 2020). As a result, rather than concluding that our results – and thereby the results of
virtually all studies using self-report scales to make comparisons among culturally diverse samples
tested in different languages – are meaningless, we will next interpret our observations in more
nuanced ways.
It is plausible that translation difficulties that we were unable to address despite our best efforts
contributed to both (a) inconsistencies in trait-outcome correlations, which were greater at lower levels of the trait hierarchy (aggregation can smooth them over), and (b) poor measurement
invariance. Also, sampling differences may have been involved, including possible differences in data
quality (e.g., UK participants were “professional” participants used to being compensated for high-
quality work, whereas the RU and CH participants were internet volunteers). If so, our observations
may underestimate the extent to which personality traits relate to life outcomes similarly among
culturally diverse samples. That is, trait-outcome associations may in fact be more replicable than we
showed, especially for narrower traits, rather than less replicable.
Besides translation difficulties and possible data quality differences, however, both poor MI and
inconsistencies in trait-outcome correlations, especially at lower levels of the trait hierarchy, can be
interpreted substantively as personality being a nuanced phenomenon. That is, trait constructs
consist of partly, and often even largely (McCrae & Mõttus, 2019), autonomous narrow nuance-traits
that often relate to each other and other variables in distinct ways, explaining why – to researchers’
despair – personality items rarely if ever neatly coalesce into scales (Hopwood et al., 2011) and
items usually out-predict domains for many outcomes (Mõttus et al., 2020). Likewise, these nuance-
traits are likely to vary slightly in meaning and patterns of associations with each other and other
variables across diverse contexts, explaining pervasively poor MI among cultures (Dong & Dumas,
2020). In aggregate scale scores, however, these nuances’ uniquenesses are often smoothed over,
making the aggregates more robust albeit blunter research tools. This, of course, is another
instantiation of the widely-recognised bandwidth-fidelity dilemma (Ones & Viswesvaran, 1996). If so,
causes underlying poor MI and, relatedly, greater cross-sample variability in how lower-level traits
are linked with life outcomes may be understood as a fact of nature rather than a methodological
problem and researchers may always have to choose whether they want greater accuracy in their
findings, having to tolerate greater complexity in return, or whether they prefer simpler and
(somewhat) more replicable but less accurate findings. One cannot have their cake and eat it.
One notable inconsistency among samples was that personality traits generally predicted outcomes
better in the WEIRDest sample, the UK, than in the arguably less WEIRD RU and CH samples. This
may have been because its professional participants provided better data. That said, this may also
have been because outcomes were generally more dependent on people’s own behaviour than on
circumstances less directly under their control in the UK than in the other samples. For example,
many outcomes (e.g. the ease of accessing a car, or of obtaining a driver’s license, volunteering
traditions or divorce rate) may have been differentially prevalent across samples, due to traditions,
accessibility, or any other reason. If so, however, we should have seen cross-sample differences in
the outcomes’ means and/or variances, but there was no systematic pattern in the outcomes having
higher standard deviations in the UK sample than in both the RU and CH samples (the outcomes’
median standard deviation was higher in the CH sample but similar in the UK and RU samples, whereas the median of the outcomes’ means was lowest in RU but similar for the UK and CH; Supplementary Table
S16). So, it remains an open question why outcomes appear to be more predictable in comparatively
more Western-like samples.
Our observations indicated moderate inconsistencies in how personality traits were linked with
outcomes in culturally diverse samples. These inconsistencies may provide valuable information for
better understanding the kinds of behaviours that contribute to specific outcomes in particular
circumstances. For example, the ability to resist impulses may be a comparatively stronger
contributor to exercising in the UK-like samples than in other samples (see Tables S7 to S15) and
interventions aiming to promote exercising may do well considering this personality nuance,
depending on where the interventions are meant to take place. For instance, exercising
interventions that help people work around their self-control limitations (e.g., by imposing a
systematic exercising regime and incentives structure) may work better in the UK-like circumstances,
whereas interventions that make exercising more available and/or socially normative, or simply help
people find time for exercising, may work better in some other countries.
On the flip side, however, our observations suggest that there is substantial robustness in how
personality traits are linked with outcomes, especially at the domain level. We had no a priori reason
to preclude the possibility that trait-outcome associations could be very different among diverse
samples, preventing one from using observations from one sample to make any useful guesses
about people in different circumstances. Fortunately for those who expect at least some generalities
in human psychology and behaviour (e.g., Allik et al., 2013), this was not the case. To an extent, our
observations suggest that the Western studies that dominate the personality traits-life outcomes
literature have at least moderate relevance in other parts of the world. This conclusion is consistent
with other studies indicating cross-cultural consistency in diverse psychological research findings
(e.g., Klein et al., 2018).
Future research
Psychological research’s often poor replicability has been well documented (Open Science Collaboration, 2015; Wiggins & Christopherson, 2019). Although initial evidence (e.g., Soto, 2019, and ours)
suggests that personality trait-outcome associations may be at least moderately replicable, far more
research is needed. First, this research should address broader ranges of cultural backgrounds than
we were able to cover. Possibly, observations may replicate worse among more diverse
circumstances, although this is by no means a given. Second, future research should consider broader
ranges of outcomes, especially more “objective” life-course variables such as formal records of
grades, income, health, antisocial behaviour, among others. Third, this research should include
assessments of personality traits that go beyond self-reports (McCrae & Mõttus, 2019). For example,
combining self-reports with informant-ratings may help to (a) overcome single-method biases (e.g.,
socially desirable response styles) that could operate differently in different circumstances and (b)
measure nuances more reliably. Fourth, research could be based on alternative personality
questionnaires, especially those covering broader ranges of traits than the NEO-PI-R facets (Mõttus
et al., 2020). Moreover, such work could use assessments developed in less WEIRD cultural
circumstances (e.g., Cheung, 2020; Fetvadjiev et al., 2015).
Conclusion
In conclusion, our results suggest that there is moderate cross-sample agreement for trait-outcome
associations, more at the domain level and less at lower levels of the trait hierarchy. So, it appears
that individuals with similar domain scores often tend to experience similar outcomes in different
circumstances, whereas the specific affects, behaviours, thoughts, and motivations that define the
domains and often more distinctively relate to these outcomes vary more across cultures. We also
add to growing evidence that nuances predict outcomes better than facets and domains. So far,
this has been shown for models created and tested for predictive powers in largely similar samples;
we showed that nuances’ incremental predictive power decreases but remains present even when
the models are created and tested in samples with diverse backgrounds, tested in different
languages.
References
Allik, J., Realo, A., & McCrae, R. R. (2013). Universality of the five-factor model of personality. In
Personality disorders and the five-factor model of personality, 3rd ed. (pp. 61-74). American
Psychological Association. https://doi.org/10.1037/13939-005
Arslan, R. C., Walther, M. P., & Tata, C. S. (2020). formr: A study framework allowing for automated
feedback generation and complex longitudinal experience-sampling studies using R.
Behavior Research Methods, 52(1), 376-387. https://doi.org/10.3758/s13428-019-01236-y
Arumäe, K., Mõttus, R., & Vainik, U. (2023). Body mass predicts personality development across 18 years in middle to older adulthood. Journal of Personality, 00, 1-15. https://doi.org/10.1111/jopy.12816
Bleidorn, W., Hill, P. L., Back, M. D., Denissen, J. J. A., Hennecke, M., Hopwood, C. J., Jokela, M.,
Kandler, C., Lucas, R. E., Luhmann, M., Orth, U., Wagner, J., Wrzus, C., Zimmermann, J., &
Roberts, B. (2019). The policy relevance of personality traits. Am Psychol, 74(9), 1056-1067.
https://doi.org/10.1037/amp0000503
Bucher, M. A., Suzuki, T., & Samuel, D. B. (2019). A meta-analytic review of personality traits and
their associations with mental health treatment outcomes. Clin Psychol Rev, 70, 51-63.
https://doi.org/10.1016/j.cpr.2019.04.002
Cartwright, A. V., Pione, R. D., Stoner, C. R., & Spector, A. (2022). Validation of the multidimensional
scale of perceived social support (MSPSS) for family caregivers of people with dementia.
Aging Ment Health, 26(2), 286-293. https://doi.org/10.1080/13607863.2020.1857699
Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance.
Structural Equation Modeling, 14, 464-504. https://doi.org/10.1080/10705510701301834
Cheung, F. M. (2020). Chinese Personality Assessment Inventory. In V. Zeigler-Hill & T. K. Shackelford (Eds.), Encyclopedia of Personality and Individual Differences. Springer. https://doi.org/10.1007/978-3-319-24612-3_17
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement
invariance. Structural Equation Modeling, 9, 233-255.
https://doi.org/10.1207/S15328007SEM0902_5
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
Condon, D. M. (2018). The SAPA Personality Inventory: An empirically-derived, hierarchically-
organized self-report personality assessment model. https://doi.org/10.31234/osf.io/sc4p9
Condon, D. M., Wood, D., Mõttus, R., Booth, T., Costantini, G., Greiff, S., Johnson, W., Lukaszewski,
A., Murray, A., Revelle, W., Wright, A. G. C., Ziegler, M., & Zimmermann, J. (2020). Bottom
Up Construction of a Personality Taxonomy. European Journal of Psychological Assessment,
36(6), 923-934. https://doi.org/10.1027/1015-5759/a000626
Costa, P. T., Jr., & McCrae, R. R. (2008). The Revised NEO Personality Inventory (NEO-PI-R). In G. J.
Boyle, G. Matthews, & D. H. Saklofske (Eds.), The SAGE handbook of personality theory and
assessment, Vol. 2. Personality measurement and testing (pp. 179–198). Sage Publications,
Inc. https://doi.org/10.4135/9781849200479.n9
Diener, E., Emmons, R. A., Larsen, R. J., & Griffin, S. (1985). The Satisfaction With Life Scale. Journal
of Personality Assessment, 49(1), 71-75. https://doi.org/10.1207/s15327752jpa4901_13
Dong, Y., & Dumas, D. (2020). Are personality measures valid for different populations? A systematic
review of measurement invariance across cultures, gender, and age. Personality and
Individual Differences, 160, 109956.
https://doi.org/10.1016/j.paid.2020.109956
Dudley, N. M., Orvis, K. A., Lebiecki, J. E., & Cortina, J. M. (2006). A meta-analytic investigation of
conscientiousness in the prediction of job performance: Examining the intercorrelations and
the incremental validity of narrow traits. Journal of Applied Psychology, 91, 40-57.
https://doi.org/10.1037/0021-9010.91.1.40
Fetvadjiev, V. H., Meiring, D., van de Vijver, F. J. R., Nel, J. A., & Hill, C. (2015). The South African
Personality Inventory (SAPI): A culture-informed instrument for the country’s main
ethnocultural groups. Psychological Assessment, 27(3), 827–
837. https://doi.org/10.1037/pas0000078
Goldberg, L. R., Johnson, J. A., Eber, H. W., Hogan, R., Ashton, M. C., Cloninger, C. R., & Gough, H. G.
(2006). The international personality item pool and the future of public-domain personality
measures. Journal of Research in Personality, 40(1), 84–
96. https://doi.org/10.1016/j.jrp.2005.08.007
Henry, S., & Mõttus, R. (2023, May 24). The 100 Nuances of Personality: Development of a
Comprehensive, Non-Redundant Personality Item Pool.
https://doi.org/10.17605/OSF.IO/TCFGZ
Hoffmann, C., McFarland, B. H., Kinzie, J. D., Bresler, L., Rakhlin, D., Wolf, S., & Kovas, A. E. (2005).
Psychometric properties of a Russian version of the SF-12 Health Survey in a refugee population.
Comprehensive Psychiatry, 46(5), 390-397.
https://doi.org/10.1016/j.comppsych.2004.12.002
Hopwood, C. J., Malone, J. C., Ansell, E. B., Sanislow, C. A., Grilo, C. M., McGlashan, T. H., Pinto, A.,
Markowitz, J. C., Shea, M. T., & Skodol, A. E. (2011). Personality assessment in DSM-5:
Empirical support for rating severity, style, and traits. Journal of personality disorders, 25(3),
305-320.
JASP Team (2023). JASP (Version 0.17.2)[Computer software].
Jonassaint, C. R., Siegler, I. C., Barefoot, J. C., Edwards, C. L., & Williams, R. B. (2011). Low life course
socioeconomic status (SES) is associated with negative NEO PI-R personality patterns. Int J
Behav Med, 18(1), 13-21. https://doi.org/10.1007/s12529-009-9069-x
Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams, R. B., Alper, S., Aveyard, M., Axt, J. R.,
Babalola, M. T., Bahník, Š., Batra, R., Berkics, M., Bernstein, M. J., Berry, D. R., Bialobrzeska, O.,
Binan, E. D., Bocian, K., Brandt, M. J., Busching, R., . . . Nosek, B. A. (2018). Many Labs 2: Investigating
Variation in Replicability Across Samples and Settings. Advances in Methods and Practices in
Psychological Science, 1(4), 443-490. https://doi.org/10.1177/2515245918810225
Lackner, N., Unterrainer, H.-F., & Neubauer, A. C. (2013). Differences in Big Five Personality Traits
Between Alcohol and Polydrug Abusers: Implications for Treatment in the Therapeutic
Community. International Journal of Mental Health and Addiction, 11(6), 682-692.
https://doi.org/10.1007/s11469-013-9445-2
Mammadov, S. (2022). Big Five personality traits and academic performance: A meta-
analysis. Journal of Personality, 90, 222– 255. https://doi.org/10.1111/jopy.12663
Meade, A. W., & Lautenschlager, G. J. (2004). A Comparison of Item Response Theory and
Confirmatory Factor Analytic Methodologies for Establishing Measurement
Equivalence/lnvariance. Organizational Research Methods, 7(4), 361-
388. https://doi.org/10.1177/1094428104268027
McCrae, R. R. (2015). A more nuanced view of reliability: Specificity in the trait hierarchy. Personality
and Social Psychology Review, 19, 97-112. https://doi.org/10.1177/1088868314541857
McCrae, R. R., & Mõttus, R. (2019). What Personality Scales Measure: A New Psychometrics and Its
Implications for Theory and Assessment. Current Directions in Psychological Science, 28(4),
415-420. https://doi.org/10.1177/0963721419849559
Mõttus, R., Kandler, C., Bleidorn, W., Riemann, R., & McCrae, R. R. (2017). Personality traits below
facets: The consensual validity, longitudinal stability, heritability, and utility of personality
nuances. J Pers Soc Psychol, 112(3), 474-490. https://doi.org/10.1037/pspp0000100
Mõttus, R., McCrae, R. R., Allik, J., & Realo, A. (2014). Cross-rater agreement on common and specific
variance of personality scales and items. Journal of Research in Personality, 52, 47-54.
https://doi.org/10.1016/j.jrp.2014.07.005
Mõttus, R., & Rozgonjuk, D. (2021). Development is in the details: Age differences in the Big Five
domains, facets, and nuances. Journal of Personality and Social Psychology, 120, 1035-1048.
https://doi.org/10.1037/pspp0000276
Mõttus, R., Sinick, J., Terracciano, A., Hřebíčková, M., Kandler, C., Ando, J., Mortensen, E. L., Colodro-
Conde, L., & Jang, K. L. (2019). Personality characteristics below facets: A replication and
meta-analysis of cross-rater agreement, rank-order stability, heritability, and utility of
personality nuances. J Pers Soc Psychol, 117(4), e35-e50.
https://doi.org/10.1037/pspp0000202
Mõttus, R., Wood, D., Condon, D. M., Back, M. D., Baumert, A., Costantini, G., Epskamp, S., Greiff, S.,
Johnson, W., Lukaszewski, A., Murray, A., Revelle, W., Wright, A. G. C., Yarkoni, T., Ziegler,
M., & Zimmermann, J. (2020). Descriptive, Predictive and Explanatory Personality Research:
Different Goals, Different Approaches, but a Shared Need to Move Beyond the Big Few
Traits. European Journal of Personality, 34(6), 1175-1201.
https://doi.org/10.1002/per.2311
O'Meara, M. S., & South, S. C. (2019). Big five personality domains and relationship satisfaction:
Direct effects and correlated change over time. Journal of Personality, 87, 1206-1220.
https://doi.org/10.1111/jopy.12468
Olaru, G., Stieger, M., Rüegger, D., Kowatsch, T., Flückiger, C., Roberts, B. W., & Allemand, M.
Personality change through a digital-coaching intervention: Using measurement invariance
testing to distinguish between trait domain, facet, and nuance change. European Journal of
Personality, 0(0), 08902070221145088. https://doi.org/10.1177/08902070221145088
Ones, D. S., & Viswesvaran, C. (1996). Bandwidth-fidelity dilemma in personality measurement for
personnel selection. Journal of Organizational Behavior, 17, 609-626.
https://doi.org/10.1002/(SICI)1099-1379(199611)17:6<609::AID-JOB1828>3.0.CO;2-K
Ozer, D. J., & Benet-Martínez, V. (2006). Personality and the prediction of consequential outcomes.
Annu Rev Psychol, 57, 401-421. https://doi.org/10.1146/annurev.psych.57.102904.190127
Pushkarev, G. S., Zimet, G. D., Kuznetsov, V. A., & Yaroslavskaya, E. I. (2020). The Multidimensional
Scale of Perceived Social Support (MSPSS): Reliability and Validity of Russian Version. Clin
Gerontol, 43(3), 331-339. https://doi.org/10.1080/07317115.2018.1558325
Roberts, B. W., Kuncel, N. R., Shiner, R., Caspi, A., & Goldberg, L. R. (2007). The Power of Personality:
The Comparative Validity of Personality Traits, Socioeconomic Status, and Cognitive Ability
for Predicting Important Life Outcomes. Perspect Psychol Sci, 2(4), 313-345.
https://doi.org/10.1111/j.1745-6916.2007.00047.x
Seeboth, A., & Mõttus, R. (2018). Successful Explanations Start with Accurate Descriptions:
Questionnaire Items as Personality Markers for More Accurate Predictions. European
Journal of Personality, 32(3), 186-201. https://doi.org/10.1002/per.2147
Soto, C. J. (2019). How Replicable Are Links Between Personality Traits and Consequential Life
Outcomes? The Life Outcomes of Personality Replication Project. Psychological Science,
30(5), 711–727. https://doi.org/10.1177/0956797619831612
Soutter, A. R. B., & Mõttus, R. (2021). Big Five facets' associations with pro-environmental attitudes
and behaviors. Journal of Personality, 89(2), 203-215.
https://doi.org/10.1111/jopy.12576
Stewart, R. D., Mõttus, R., Seeboth, A., Soto, C. J., & Johnson, W. (2022). The finer details? The
predictability of life outcomes from Big Five domains, facets, and nuances. Journal of
Personality, 90(2), 167-182. https://doi.org/10.1111/jopy.12660
The jamovi project (2022). jamovi (Version 2.3) [Computer Software]. Retrieved from
https://www.jamovi.org
Trapmann, S., Hell, B., Hirn, J.-O., & Schuler, H. (2007). Meta-Analysis of the Relationship Between
the Big Five and Academic Success at University. Zeitschrift für Psychologie, 215, 132-151.
https://doi.org/10.1027/0044-3409.215.2.132
Vainik, U., Dagher, A., Realo, A., Colodro-Conde, L., Mortensen, E. L., Jang, K., Juko, A., Kandler, C.,
Sørensen, T. I. A., & Mõttus, R. (2019). Personality-obesity associations are driven by narrow
traits: A meta-analysis. Obes Rev, 20(8), 1121-1131. https://doi.org/10.1111/obr.12856
Van De Schoot, R., Schmidt, P., De Beuckelaer, A., Lek, K., & Zondervan-Zwijnenburg, M. (2015).
Editorial: Measurement Invariance [Editorial]. Frontiers in Psychology, 6.
https://doi.org/10.3389/fpsyg.2015.01064
Waldmann, P., Mészáros, G., Gredler, B., Fürst, C., & Sölkner, J. (2013). Evaluation of the lasso and
the elastic net in genome-wide association studies [Original Research]. Frontiers in Genetics,
4. https://doi.org/10.3389/fgene.2013.00270
Wang, D., Zhu, F., Xi, S., Niu, L., Tebes, J. K., Xiao, S., & Yu, Y. (2021). Psychometric Properties of the
Multidimensional Scale of Perceived Social Support (MSPSS) Among Family Caregivers of
People with Schizophrenia in China. Psychol Res Behav Manag, 14, 1201-1209.
https://doi.org/10.2147/prbm.S320126
Ware, J., Jr, Kosinski, M., & Keller, S. D. (1996). A 12-Item Short-Form Health Survey: construction of
scales and preliminary tests of reliability and validity. Medical care, 34(3), 220–233.
https://doi.org/10.1097/00005650-199603000-00003
Wiggins, B. J., & Christopherson, C. D. (2019). The replication crisis in psychology: An overview for
theoretical and philosophical psychology. Journal of Theoretical and Philosophical Psychology, 39,
202-217. https://doi.org/10.1037/teo0000137
Yarkoni, T., & Westfall, J. (2017). Choosing Prediction Over Explanation in Psychology: Lessons From
Machine Learning. Perspect Psychol Sci, 12(6), 1100-1122.
https://doi.org/10.1177/1745691617693393
Zimet, G. D., Dahlem, N. W., Zimet, S. G., & Farley, G. K. (1988). The Multidimensional Scale of
Perceived Social Support. Journal of Personality Assessment, 52(1), 30–
41. https://doi.org/10.1207/s15327752jpa5201_2
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 301-320. https://doi.org/10.1111/j.1467-9868.2005.00503.x