Comprehensive Exam - Test Translation & Validation
Item 2
The Standards for Educational and Psychological Testing state that mere translation of a test
from one language into another cannot guarantee that the translated version is comparable in
content to the original, or that the scores produced by the translated test are as reliable,
precise, and valid as those from the original test (AERA et al., 2014, p. 60). The Standards
also require detailed documentation of the translation process and of “empirical or logical
evidence for the validity of test score interpretations for intended use” (AERA et al., 2014,
p. 69). To adapt the Critical Thinking Motivation Scale into English, producing the new scale
(CTMS-E hereafter), I will follow the International Test Commission Guidelines (2017), ITC hereafter.
The pre-condition stage covers the decisions that have to be made before the translation
begins (ITC, 2017). The first step is to obtain permission to adapt and translate the Critical
Thinking Motivation Scale (CTMS) from the holders of its intellectual property rights,
J. Valenzuela, A. M. Nieto, and C. Saiz. A signed agreement will specify which modifications
of the original test’s characteristics are acceptable in the adapted version and will state who
owns the intellectual property rights to the adapted version.
The second step lays the foundation for valid cross-cultural comparisons (ITC, 2017, p. 9): I
will have to verify that the definition and content of the construct measured by CTMS, as well
as its item content, overlap sufficiently with how the construct is understood in the population
of American undergraduate students. In other words, I have to ensure that CTMS-E measures
the same construct as CTMS does (AERA et al., 2014, p. 68). To that end, I have to ascertain
that 1) the construct of Motivation for Critical Thinking (MfCT hereafter) is understood in the
same way by scholars in English-speaking and Spanish-speaking academia, and 2) American
undergraduate students understand it in the same way as their Spanish and Chilean
counterparts did when filling out the CTMS. According to Hambleton (2005, p. 7), ensuring
construct equivalence between two cultures is a very subjective task and “involves primarily
judgmental strategies. A researcher must begin by using his or her common sense.” By my
own judgement, the two concepts, CT and MfCT, are equivalent in the two academic cultures.
The expert consensus definition of CT was developed by a panel of 46 experts in the
humanities, sciences, social sciences, and education within the US and Canada (Facione,
1990). In the following decades, Spanish academics actively engaged with research on CT
(Evans, 2006; Noveck et al., 2007; Saiz, 2002; Saiz & Rivas, 2008; Valenzuela et al., 2010).
Moreover, Valenzuela et al. (2011) developed CTMS based on the expectancy-value model of
motivation (…, 2002). The model holds that the motivation to perform a particular task is the
product of the Expectation a person has about performing the task adequately and the Value
the person assigns to the task. The value of a task incorporates four sub-components:
attainment, interest, utility, and cost. Valenzuela et al. (2011) gave a clear picture of the
international collaboration among French-, Spanish-, and English-speaking scholars in
defining MfCT (Carré & Fenouillet, 2008; Mateos et al., 2002; Neuville et al., 2004; Wigfield &
Eccles, 2000). Hence, I assume that CTMS measures the same concepts of CT, expectancy,
value, attainment, interest, utility, and cost that are defined by the international academic
community.
To ascertain that the construct of CT is operationalized in the same way in the two cultures, I
will interview 12 bilingual students. It is difficult to find bilinguals with equal fluency in two
languages. Hence, of the 12, six students will be native English speakers and the other six will
be native Spanish speakers (at least two males and two females per native language). Because
the concept of CT is seldom used in non-academic contexts, the main requirement for these 12
and all other bilingual students involved in my study will be having graduated from high school
in one of the two languages and
currently studying at a university in the other language. In this way I will exclude those individuals
“Critical Thinking” and “Pensamiento Crítico” have a similar meaning for them, and how
different they sound if they do. I will ask the six native Spanish speakers about Spanish words for
critically, using one’s intellectual skills correctly): does this phrase sound fully synonymous
with CT, or is there some degree of difference? Which word or phrase would you use in
English to reflect that difference? I will ask all participants to peruse CTMS and to translate
its questions orally into English as closely as possible. I will not ask the students about their
understanding of the concepts of motivation and MfCT because these terms are not used in
CTMS; instead, they are operationalized through other, frequently used verbs and phrases: like
to, be prepared to, be worth, be important to, feel capable, and be able to learn. Professionally
trained translators will work with those words. After confirming that “Critical Thinking” and
“Pensamiento Crítico” have a substantial overlap of meaning in the two languages, I will
proceed to the next step.
The Test Development stage starts with choosing translators. Ideally, I would hire four
trained translators (two native English speakers and two native Spanish speakers) with
qualifications beyond knowledge of the two languages (Grisay, 2003). The translators should
be demographically close to the target population. Hence, they should be university students
(seniors or graduates) below the age of 30, with one male and one female translator per native
language. Translators must be familiar with the general principles of testing. To that end, I will
train them using Hambleton and Zenisky’s empirically validated Review Form, which lists the
features of a translated test that should be checked during the translation process (2010,
p. 49). I selected 16 features from the Form, such as “Are there
cultural differences that would have an effect on the likelihood of a response being chosen
when the item is presented in the source or target language version?” and others (Appendix).
item equivalence between CTMS and CTMS-E (Hambleton & Patsula, 1999; ITC, 2017).
Two native English-speaking translators will independently translate the CTMS into English
(forward translation). They will have access to the audio recordings of the 12 interviews to
assist with translation. Then, each of the two native Spanish-speaking translators will translate
one of the two English versions back into Spanish without seeing the original CTMS or
listening to the interviews (backward translation). English items whose back-translations
reproduce the original Spanish wording are the best candidates for the new scale. To achieve
this, the translators should be skilled in functional rather than literal translation, rendered in
natural and acceptable language (ITC, 2017). Finally, the two native Spanish-speaking
translators will listen to the 12 interviews, and all four translators, as a panel, will review and
improve all items that did not pass the backward translation check (Geisinger, 1994).
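The backward-translation check above is a judgment made by people, but a simple surface comparison can help the panel decide which items to discuss first. The sketch below is a hypothetical screening aid, not part of the ITC (2017) procedure: it assumes each CTMS item and its back-translation are available as plain strings, uses Python’s standard difflib.SequenceMatcher to score how closely the back-translation reproduces the original Spanish wording, and flags items below an arbitrary threshold (0.8 here) for panel review. A high score does not prove equivalent meaning, and a low score does not prove a poor translation; the score only orders items for discussion.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.casefold().split())


def back_translation_similarity(original: str, back_translated: str) -> float:
    """Surface similarity in [0, 1] between an original item and its back-translation."""
    return SequenceMatcher(None, normalize(original), normalize(back_translated)).ratio()


def flag_for_panel(items, threshold=0.8):
    """Return (item_id, score) pairs below the hypothetical threshold, lowest first.

    items: iterable of (item_id, original_spanish, back_translated_spanish).
    """
    scored = [(item_id, back_translation_similarity(orig, back))
              for item_id, orig, back in items]
    return sorted((pair for pair in scored if pair[1] < threshold), key=lambda p: p[1])


if __name__ == "__main__":
    # Hypothetical example items, not actual CTMS wording.
    items = [
        ("item_01", "Me siento capaz de aprender", "Me siento capaz de aprender"),
        ("item_02", "Me siento capaz de aprender", "Puedo aprender esto"),
    ]
    for item_id, score in flag_for_panel(items):
        print(f"{item_id}: similarity {score:.2f}, review with the panel")
```

A character-level ratio was chosen only because it needs no external resources; it cannot see synonyms, so every flagged and unflagged item still goes through the panel as described above.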
If funds allow, I will provide three types of evidence that the test instructions and item
content have nearly identical meaning in the two languages. Firstly, I will send both CTMS and
CTMS-E to the following experts in MfCT: Jorge Valenzuela, Universidad Católica del
Maule, Chile; Carlos Saiz and Ana Maria Nieto, Universidad de Salamanca, Spain; and Peter
Facione, Measured Reasons LLC, California. I will ask the experts to judge, at face value,
1) the equivalence of the construct of CT across the two versions (ITC, 2017); 2) how equally
each item relates to and measures the concepts of Expectancy and Value and the subordinate
concepts of attainment, utility, interest, and cost; and 3) whether the favorability of each item
is comparable across the two versions. I will also ask them 4) to give open-ended feedback
with suggestions on the wording of CTMS-E, which will also provide validity evidence based
on test content.
Secondly, I will conduct a think-aloud protocol (Trenor et al., 2011) with four bilingual
students who were not involved in the interviews: two native Spanish speakers and two native
English speakers (one male and one female per language). I will ask them to complete both
versions of the test side by side while talking aloud about 1) the differences in their cognitive
and emotional reactions to the two versions of each item and 2) how well they understand the
instructions and response scale, with a focus on the words describing the “degree of
agreement or disagreement” in the Likert scale: “Does this word mean exactly the same thing
to you in both languages?” With the help of the translators, I will adjust CTMS-E following the
suggestions from the experts and the think-aloud participants about both the item and content
equivalence of the two versions of the test (ITC, 2017).
Finally, I will conduct cognitive interviews with four monolingual students (two males and
two females) about their experience of completing a pilot survey of CTMS-E. This will help to
assess content equivalence and provide validity evidence based on response processes
(AERA et al., 2014). Through a pilot online survey, I will collect data from 100 monolingual
English-speaking undergraduate students. The survey will include all 19 items of CTMS-E and
the 5 items of the critical thinking subscale of the Motivated Strategies for Learning
Questionnaire, CT-MSLQ (Duncan & McKeachie, 2005). Following the example of Valenzuela
et al. (2011), I will obtain validity evidence based on convergent relations to other variables by
testing the correlation between CT-MSLQ and CTMS-E scores from the same sample:
motivated strategies for CT should be significantly associated with MfCT (García & Pintrich, 1992).
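The convergent-validity check above reduces to a correlation between two total scores per respondent. A minimal sketch, assuming each instrument is scored by summing its items (the actual scoring rules of CTMS-E and CT-MSLQ may differ, for example by averaging or reverse-coding), using the textbook Pearson formula and the usual t test with n - 2 degrees of freedom. The function names and the example scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two score lists of equal length (n >= 3)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a score list with zero variance has no defined correlation")
    return sxy / sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Hypothetical total scores for five respondents (not real data).
    ctms_e_totals = [62, 75, 58, 90, 81]
    ct_mslq_totals = [18, 24, 17, 30, 25]
    n = len(ctms_e_totals)
    r = pearson_r(ctms_e_totals, ct_mslq_totals)
    print(f"r = {r:.2f}, t({n - 2}) = {t_for_r(r, n):.2f}")
```

With the planned pilot sample of 100 students (98 degrees of freedom), |t| above roughly 1.98 corresponds to p < .05 (two-tailed); in practice a statistics package would report the exact p-value.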
The pilot survey will provide data to assess the psychometric qualities of CTMS-E (ITC,
2017). Using reliability analysis (Traub & Rowley, 1991), I will estimate internal consistency for
both subscales, value and expectancy, and for the four sub-scales of value: attainment, utility,
interest, and cost. If the translation is successful and CTMS-E measures every construct with
high precision, the Cronbach’s alpha coefficients will be similar to Valenzuela et al.’s results,
between 0.7 and 1. I will also test the discriminatory capacity of
the translated items in every subscale using the discrimination index D (Findley, 1956).
Sensitive items should distinguish between respondents in the upper and lower 27% of the
scale (highly motivated and low-motivated groups). Lastly, through two factor-structure
analyses, I will test for two principal components, value and expectancy, in the construct of
MfCT, and for four principal components in the construct of value: attainment, utility, interest,
and cost. The nineteen items of CTMS-E have to load on the principal components in the same
way as the items of the original CTMS.
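The reliability and item-discrimination checks described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: responses are stored as one list of item scores per respondent and subscale; alpha uses the usual formula alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores); and, because Findley’s (1956) D was defined for right/wrong items, the Likert version below is an assumed adaptation: the difference between the item means of the upper and lower 27% groups (ranked by total subscale score, item included), divided by the width of the response scale. For 0/1 items it reduces to the familiar difference in proportions. The helper names and example data are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for one (sub)scale.

    scores: list of respondents, each a list of k item scores.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*scores))
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


def discrimination_index(scores, item, scale_min, scale_max, fraction=0.27):
    """Upper-lower discrimination index for one item, scaled to [-1, 1].

    Respondents are ranked by total subscale score; the item mean of the
    lowest `fraction` is subtracted from that of the highest `fraction`,
    and the difference is divided by the width of the response scale.
    """
    totals = [sum(row) for row in scores]
    ranked = sorted(range(len(scores)), key=lambda i: totals[i])
    n_group = max(1, round(fraction * len(scores)))
    lower = [scores[i][item] for i in ranked[:n_group]]
    upper = [scores[i][item] for i in ranked[-n_group:]]
    return (mean(upper) - mean(lower)) / (scale_max - scale_min)


if __name__ == "__main__":
    # Hypothetical 1-5 Likert responses: 10 respondents x 2 items (not real data).
    data = [[1, 1], [1, 1], [1, 2], [2, 2], [3, 3],
            [3, 3], [4, 4], [5, 4], [5, 5], [5, 5]]
    print(f"alpha = {cronbach_alpha(data):.2f}")
    for item in range(2):
        print(f"item {item}: D = {discrimination_index(data, item, 1, 5):.2f}")
```

With the planned 100 pilot respondents, each comparison group would contain 27 students. Items with D near zero or negative are the ones the next paragraph proposes to revise.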
If the cognitive interviews and statistical tests reveal that some items are confusing, too
“easy” or “hard” compared with other items, or show low or negative discriminating power, I
will review them with the translators to correct possible flaws. Then, CTMS-E will be ready for
assessing motivation for critical thinking in U.S. college students. This administration will also
serve as a large-scale validation study for CTMS-E. To obtain convergent validity evidence, I
will include the Truth-seeking subscale of the California Critical Thinking Disposition Inventory
(CCTDI) in the survey. I will run item discrimination, factor, and reliability analyses on the
survey results. A strong correlation between students’ scores on the Truth-seeking subscale
and on CTMS-E will confirm the convergent validity of CTMS-E. Since my goal is to measure
the MfCT construct rather than to compare test-taker performance across the two language
versions of CTMS, careful examination of the validity of CTMS-E is essential. However, for
future use of CTMS-E in cross-cultural analyses, confirming the construct equivalence of the
two forms is important too (ITC, 2017). To that end, I will carry out a comparative online study
using structural equation modeling with 1000 monolingual college students: 500 from Spain
and Chile and 500 from the U.S. (Byrne, 2006). I will match the two groups by age, gender,
socioeconomic status, and college major. Additionally, a non-significant correlation between
students’ majors and CTMS-E sub-scale scores will confirm the discriminant validity of CTMS-E.
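The factor analyses planned for both the pilot and the large-scale study usually begin by checking how many components the item correlation matrix supports. The sketch below is a standard-library illustration only, under two assumptions: it uses the Kaiser criterion (retain components with eigenvalue greater than 1) as a first screen, which is a common heuristic and not necessarily the rule Valenzuela et al. (2011) applied, and it computes eigenvalues with the cyclic Jacobi method for symmetric matrices. It does not compute loadings; checking that each item loads on its intended component would additionally require the eigenvectors (or a dedicated statistics package) and a rotation. The function names and the example matrix are hypothetical.

```python
import math


def correlation_matrix(scores):
    """Pearson correlations between item columns; scores is respondents x items."""
    columns = list(zip(*scores))

    def r(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)

    k = len(columns)
    return [[1.0 if i == j else r(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix (descending), via cyclic Jacobi rotations."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_count(eigenvalues):
    """Number of components with eigenvalue > 1 (Kaiser criterion)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


if __name__ == "__main__":
    # Hypothetical correlation matrix: items 1-2 and items 3-4 form two clusters.
    R = [[1.0, 0.8, 0.0, 0.0],
         [0.8, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.8],
         [0.0, 0.0, 0.8, 1.0]]
    eigenvalues = jacobi_eigenvalues(R)
    print([round(ev, 3) for ev in eigenvalues], "components retained:", kaiser_count(eigenvalues))
```

For a correlation matrix the eigenvalues sum to the number of items, so each eigenvalue can be read as the number of items’ worth of variance a component explains. With the 19 CTMS-E items, the expectancy-value structure would lead one to expect a small number of eigenvalues above 1, to be confirmed by the loadings.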
References
American Educational Research Association, American Psychological Association, & National
Council on Measurement in Education. (2014). Standards for educational and psychological
testing. Washington, DC: American Educational Research Association.
Hambleton, R. K., & Zenisky, A. (2010). Translating and adapting tests for cross-cultural
assessment. In D. Matsumoto & F. van de Vijver (Eds.), Cross-cultural research
methods (pp. 46-74). New York, NY: Cambridge University Press.
International Test Commission. (2017). The ITC guidelines for translating and adapting
tests (2nd ed.). www.InTestCom.org
Mateos, M., Palmero, F., Fernández-Abascal, E., Martínez, F., & Choliz, M. (2002). Teorías
motivacionales. In Psicología de la motivación y la emoción (pp. 155-186). Madrid:
McGraw-Hill.
Neuville, S., Bourgeois, É., & Frenay, M. (2004). The subjective task value: Clarification of a
construct. In S. Neuville (Ed.), La perception de la valeur des activités
d'apprentissage: étude des déterminants et effets (Unpublished doctoral thesis).
Université Catholique de Louvain, Louvain-la-Neuve.
Noveck, I., Mercier, H., Rossi, S., & Van der Henst, J. B. (2007). Psychologie cognitive du
raisonnement. In S. Rossi & J. B. Van der Henst (Eds.), Psychologies du
raisonnement. Bruxelles: De Boeck.
Saiz, C. (Ed.). (2002). Pensamiento crítico: conceptos básicos y actividades prácticas.
Madrid: Pirámide.
Saiz, C., & Rivas, S. (2008). Intervenir para transferir en pensamiento crítico. Revista
Praxis, 10(13), 129-149.
Traub, R. E., & Rowley, G. L. (1991). Understanding reliability. Educational Measurement:
Issues and Practice, 10(1), 37-45.
Trenor, J. M., Miller, M. K., & Gipson, K. G. (2011). Utilization of a think-aloud protocol to
cognitively validate a survey instrument identifying social capital resources of
engineering undergraduates (Report No. AC 2011-925). Clemson, SC: Clemson
University.
Valenzuela, J., Nieto, A. M., & Saiz, C. (2010). Percepción del coste de utilización Pensamiento
crítico en universitarios chilenos y españoles. Electronic Journal of Research in
Educational Psychology, 8(2), 689-706.
Valenzuela, J., Nieto, A. M., & Saiz, C. (2011). Critical thinking motivational scale: A
contribution to the study of relationship between critical thinking and motivation.
Electronic Journal of Research in Educational Psychology, 9(2), 823-848.
Wigfield, A., & Eccles, J. (2000). Expectancy-value theory of achievement motivation.
Contemporary Educational Psychology, 25(1), 68-81.
Appendix
1. Does the item have the same or highly similar meaning in the two languages?
2. Is the language of the translated item of comparable difficulty and commonality with
respect to the words in the item in the source language version?
3. Does the translation introduce changes in the text (omissions, substitutions, or
additions) that might make the respondent answer this item differently in the two
language versions?
4. Are there differences between the target and source language versions of the item
related to the use of metaphors, idioms, or colloquialisms?
5. Is the item format, including physical layout, the same in the two language versions?
6. Is the length of the item about the same in the two language versions?
7. Will the format of the item and task required of the examinee be equally familiar in the
two language versions?
8. Is there any modification of the item’s structure such as the placement of clauses or
other word order changes that might make this item more or less complex in the target
language version?
9. Are there any grammatical clues in this item that might make the respondent answer
differently in the two language versions?
10. Are there any grammatical structures in the source language version of the item that
do not have parallels in the target language?
11. Are there any gender or other references that might make this item be cued in the
target language version? Are there any words in the item that, when translated, change
from having one meaning to having more than one common meaning?
12. Are there any changes in punctuation between the source and target versions of the
item that may make the item easier or harder in the translated version?
13. Have terms in the item in one language been suitably adapted to the cultural
environment of the second language version?
14. Are there cultural differences that would have an effect on the likelihood of a
response being chosen when the item is presented in the source or target language
version?
15. Are the concepts covered in the item at about the same level of abstraction in the two
language versions?
16. Does the concept or construct of the item have about the same familiarity and
meaning in both the source and target language versions?