Comprehensive Exam - Test Translation & Validation

Comprehensive Examination

Item 2

The Standards for Educational and Psychological Testing state that mere translation of a test
from one language into another cannot guarantee that the translated version is comparable in
content to the original, or that the scores it produces are as reliable, precise, and valid as
those from the original test (AERA et al., 2014, p. 60). The Standards also require detailed
documentation of the translation process and of “empirical or logical evidence for the validity
of test score interpretations for intended use” (AERA et al., 2014, p. 69). To adapt the
Critical Thinking Motivation Scale for English-speaking respondents (CTMS-E) and to document
the psychometric properties of the new scale, I will follow the Guidelines of the International
Test Commission (ITC, 2017).

The pre-condition stage covers the decisions that must be made before translation of the test
begins (ITC, 2017). The first step is to obtain permission from the holders of the intellectual
property rights, J. Valenzuela, A. M. Nieto, and C. Saiz, to translate and adapt the Critical
Thinking Motivation Scale (CTMS). A signed agreement will specify which modifications of the
original test are acceptable in the adapted version and who will own the intellectual property
rights to the adapted version.

The second step lays the foundation for valid cross-cultural comparisons (ITC, 2017, p. 9): I
will have to establish that the definition and content of the construct measured by the CTMS,
as well as its item content, overlap sufficiently with the way the construct is understood in
the population of American undergraduate students. In other words, I have to ensure that the
CTMS-E measures the same concept as the CTMS (AERA et al., 2014, p. 68). To that end, I have to
ascertain that 1) the construct of Motivation for Critical Thinking (hereafter MfCT) is
understood in the same way by scientists in both academic cultures, English-speaking and
Spanish-speaking, and 2) the concept of Critical Thinking (hereafter CT) is understood by the
target population of American undergraduate students in the same way as their Spanish and
Chilean counterparts understood it when completing the CTMS. According to Hambleton (2005, p. 7),
ensuring construct equivalence between two cultures is a highly subjective task that “involves
primarily judgmental strategies. A researcher must begin by using his or her common sense.” In
my judgment, both concepts, CT and MfCT, are clearly equivalent across the two academic
cultures. Valenzuela et al. (2011) studied an internationally recognized concept of CT developed
by a panel of 46 experts in the humanities, sciences, social sciences, and education from the US
and Canada (Facione, 1990). In the following decades, Spanish academics actively participated in
the international discourse on MfCT and in the further development of the concept of CT (Evans,
2006; Noveck et al., 2007; Saiz, 2002; Saiz & Rivas, 2008; Valenzuela et al., 2010). Moreover,
Valenzuela et al. (2011) based the CTMS on the expectancy/value model of motivation proposed by
the English-speaking scientists Eccles and Wigfield (2002). The model holds that the motivation
to perform a particular task is the product of a person's expectancy of performing the task
adequately and the value he or she assigns to the task. Task value, in turn, has four
sub-components: attainment, interest, utility, and cost. Valenzuela et al. (2011) also presented
a clear picture of international collaboration among French-, Spanish-, and English-speaking
research teams working on the expectancy/value model of MfCT (Carré & Fenouillet, 2008; Mateos
et al., 2002; Neuville et al., 2004; Wigfield & Eccles, 2000). Hence, I assume that the CTMS
measures the concepts of CT, expectancy, value, attainment, interest, utility, and cost as they
are defined by the international academic community.
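In compact form, the multiplicative relation described above can be written as follows. The
symbols are introduced here only for illustration: M is motivation for the task, E is
expectancy, and V is task value. The text does not say how the four value sub-components are
combined, so their aggregation is left as an unspecified function f.

```latex
M = E \times V, \qquad V = f(\mathrm{attainment},\ \mathrm{interest},\ \mathrm{utility},\ \mathrm{cost})
```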

To ascertain that the construct of CT is operationalized in the same way in the two cultures, I
will interview 12 bilingual (English-Spanish) undergraduate students by videoconference. Because
it is difficult to find bilinguals equally fluent in both languages, six of the 12 students will
be native English speakers and the other six will be native Spanish speakers (at least two males
and two females per native language). Because the concept of CT is seldom used in non-academic
contexts, the main requirement for these 12 students and for all other bilingual students in my
study will be that they graduated from high school in one of the two languages and are studying
at a university in the other. In this way, I will exclude individuals who use one of the two
languages only in everyday, non-academic contexts.

During the interviews (recorded with the interviewees' permission), I will ask the 12 students
whether “Critical Thinking” and “Pensamiento Crítico” have a similar meaning for them and, if
not, how they differ. I will ask the six native Spanish speakers about the Spanish expressions
for “thinking critically” used in the CTMS (reasoning properly, reasoning appropriately,
reasoning critically, using one’s intellectual skills correctly): Does each phrase sound fully
synonymous with CT, or is there a degree of difference? Which English word or phrase would you
use to reflect that difference? I will ask all participants to read through the CTMS and to
translate the items orally into English as closely as possible. I will not ask the students
about their understanding of the concepts of motivation and MfCT, because these terms do not
appear in the CTMS; instead, they are operationalized through common verbs and phrases such as
like to, be prepared to, be worth, be important to, feel capable, and be able to learn.
Professionally trained translators will handle these expressions. Once I have confirmed that
“Critical Thinking” and “Pensamiento Crítico” overlap substantially in meaning in the two
languages, I will proceed to the next step.

The Test Development stage starts with choosing translators. Ideally, I would hire four trained
translators (two native English speakers and two native Spanish speakers) with qualifications
beyond knowledge of the two languages (Grisay, 2003). The translators should resemble the target
population demographically. Hence, they should be university students (seniors or graduates)
under 30 years of age, with one male and one female translator per native language. The
translators must also be familiar with the general principles of testing. To that end, I will
train them using Hambleton and Zenisky’s (2010, p. 49) empirically validated Review Form, which
lists the features of a translated test that should be checked during the translation process.
I selected 16 features from the Form, such as “Are there cultural differences that would have an
effect on the likelihood of a response being chosen when the item is presented in the source or
target language version?” (see Appendix).

I will use a double-translation design with reconciliation by a panel of translators to ensure
item equivalence between the CTMS and the CTMS-E (Hambleton & Patsula, 1999; ITC, 2017). Two
native English-speaking translators will independently translate the CTMS into English (forward
translation). They will have access to the audio recordings of the 12 interviews to assist with
the translation. Then each of the two native Spanish-speaking translators will translate one of
the two English versions back into Spanish (backward translation) without seeing the original
CTMS or listening to the interviews. English items whose back-translation reproduces the
original Spanish wording are the best candidates for the new scale. To achieve this, the
translators should be skilled in functional rather than literal translation, producing natural
and acceptable language (ITC, 2017). Finally, the two native Spanish-speaking translators will
listen to the 12 interviews, and all four translators, working as a panel, will review and
improve all items that did not pass the back-translation check (Geisinger, 1994).

If funds allow, I will provide three types of evidence that the test instructions and item
content have nearly identical meanings in the two languages. First, I will send both the CTMS
and the CTMS-E to the following experts in MfCT: Jorge Valenzuela, Universidad Católica del
Maule, Chile; Carlos Saiz and Ana Maria Nieto, Universidad de Salamanca, Spain; and Peter
Facione, Measured Reasons LLC, California. I will ask the experts 1) to judge, at face value,
the equivalence of the CT construct across the two versions (ITC, 2017); 2) to judge whether
each item relates to and measures the concepts of expectancy and value, and the subordinate
concepts of attainment, utility, interest, and cost, equally well in both versions; 3) to judge
whether the favorability of each item is comparable across the two versions; and 4) to give
open-ended feedback with suggestions on the wording of the CTMS-E. Their judgments will also
provide validity evidence based on test content.

Second, I will conduct think-aloud protocols (Trenor et al., 2011) with four bilingual students
who were not involved in the interviews: two native Spanish speakers and two native English
speakers (one male and one female per language). I will ask them to complete both versions of
the test side by side while talking aloud about 1) the differences in their cognitive and
emotional reactions to the two versions of each item, and 2) how well they understand the
instructions and the response scale, with a focus on the words describing the “degree of
agreement or disagreement” on the Likert scale: “Does this word mean exactly the same thing to
you in both languages?” With the translators' help, I will adjust the CTMS-E according to the
suggestions from the experts and the think-aloud participants regarding both the item and
content equivalence of the two versions of the test (ITC, 2017).

Finally, I will conduct cognitive interviews with four monolingual students (two males and two
females) about their experience of completing a pilot version of the CTMS-E. This will help to
assess content equivalence and will provide validity evidence based on response processes (AERA
et al., 2014). Through an online pilot survey, I will collect data from 100 monolingual
English-speaking undergraduate students. The survey will include all 19 items of the CTMS-E and
the 5 items of the critical thinking subscale of the Motivated Strategies for Learning
Questionnaire (CT-MSLQ; Duncan & McKeachie, 2005). Following the example of Valenzuela et al.
(2011), I will obtain validity evidence based on convergent relations to other variables by
testing the correlation between CT-MSLQ and CTMS-E scores in the same sample: motivated
strategies for CT should be significantly associated with MfCT (Garcia & Pintrich, 1992).
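The convergent-validity check described above reduces to correlating two scale scores per
respondent. The following is a minimal Python sketch under several assumptions: each
respondent's CTMS-E and CT-MSLQ responses are available as numeric lists, scale scores are
simple item sums, and the function names and toy data are hypothetical. The p-value uses the
Fisher z approximation; a statistics package such as SciPy (scipy.stats.pearsonr) would instead
report an exact t-based p-value.

```python
import math
from statistics import NormalDist, fmean


def scale_scores(responses):
    """Sum each respondent's item responses into one scale score (assumed scoring rule)."""
    return [sum(row) for row in responses]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_p_value(r, n):
    """Approximate two-tailed p-value for H0: rho = 0 via the Fisher z transform.
    Requires n > 3 and |r| < 1."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical toy data: 5 respondents, 3 CTMS-E items and 2 CT-MSLQ items each.
ctms_e = [[1, 1, 1], [2, 2, 1], [3, 3, 3], [4, 5, 4], [5, 5, 5]]
ct_mslq = [[2, 2], [1, 2], [4, 3], [3, 4], [5, 5]]

r = pearson_r(scale_scores(ctms_e), scale_scores(ct_mslq))
p = fisher_z_p_value(r, len(ctms_e))
print(f"r = {r:.2f}, approximate two-tailed p = {p:.3f}")
```

With the planned pilot sample of 100 students, the same functions would take 100 rows per
instrument. A positive, significant r would be read as convergent evidence.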

The pilot survey will provide data to assess the psychometric qualities of the CTMS-E (ITC,
2017). Using reliability analysis (Traub & Rowley, 1991), I will estimate internal consistency
for the expectancy and value subscales and for the four sub-scales of value: attainment,
utility, interest, and cost. If the translation is successful and the CTMS-E measures each
construct with high precision, the Cronbach's alpha values will be similar to those reported by
Valenzuela et al. (2011), that is, between 0.7 and 1. I will also test the discriminating power
of the translated items in every subscale by means of the discrimination index D (Findley,
1956). Sensitive items should distinguish between respondents in the upper and lower 27% of the
scale (high-motivation and low-motivation groups). Lastly, through two factor-structure
analyses, I will test for two principal components, value and expectancy, in the construct of
MfCT and for four principal components in the construct of value: attainment, utility,
interest, and cost. The 19 items of the CTMS-E should load on the principal components in the
same way as Valenzuela et al. (2011) report in their Tables 3 and 4.
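The two item-level statistics named above can be computed directly from a respondents-by-items
score matrix. The sketch below is a hedged illustration, not the procedure used by Valenzuela et
al. (2011). Cronbach's alpha follows the standard formula alpha = k/(k - 1) * (1 - sum of item
variances / variance of total scores). Findley's D was defined for right/wrong items, so the
sketch assumes a Likert adaptation: the difference between the upper-27% and lower-27% group
means on an item, divided by the width of the response scale. The function names, the 1-to-5
scale bounds, and the toy data are hypothetical.

```python
from statistics import fmean, pvariance


def cronbach_alpha(matrix):
    """Cronbach's alpha for a respondents x items matrix of numeric scores.
    Population variances are used throughout; the n factor cancels in the ratio."""
    k = len(matrix[0])
    item_variances = [pvariance([row[j] for row in matrix]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in matrix])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def discrimination_index(matrix, item, scale_min, scale_max, fraction=0.27):
    """Upper-lower discrimination index for one item (assumed Likert adaptation).

    Respondents are ranked by total subscale score; the top and bottom `fraction`
    form the high- and low-motivation groups. The difference between the two group
    means on the item is rescaled to [-1, 1] by the response-scale width."""
    totals = [sum(row) for row in matrix]
    ranked = sorted(range(len(matrix)), key=lambda i: totals[i])
    g = max(1, round(fraction * len(matrix)))  # size of each extreme group
    lower = [matrix[i][item] for i in ranked[:g]]
    upper = [matrix[i][item] for i in ranked[-g:]]
    return (fmean(upper) - fmean(lower)) / (scale_max - scale_min)


# Hypothetical 1-to-5 responses from 5 respondents to a 3-item subscale.
subscale = [[1, 1, 1], [2, 2, 1], [3, 3, 3], [4, 5, 4], [5, 5, 5]]
alpha = cronbach_alpha(subscale)
d_values = [discrimination_index(subscale, j, 1, 5) for j in range(3)]
print(f"alpha = {alpha:.3f}, D per item = {d_values}")
```

Under this adaptation, items with D near zero or negative would be flagged for review with the
translators. Alpha would be computed separately for each subscale.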

If the cognitive interviews and statistical tests reveal that some items are confusing, too
“easy” or too “hard” compared with other items, or show low or negative discriminating power, I
will review those items with the translators to correct possible flaws. The CTMS-E will then be
ready for assessing MfCT in U.S. college students, and this first large-scale administration
will also serve as a validation study for the CTMS-E. To obtain convergent validity evidence, I
will include the Truth-seeking sub-scale of the California Critical Thinking Disposition
Inventory (CCTDI) in the survey. I will run item discrimination analyses, factor analyses, and
reliability tests on the survey results. A strong correlation between students’ scores on the
Truth-seeking sub-scale and on the CTMS-E will support the convergent validity of the CTMS-E.
Because the main goal of creating the CTMS-E is to assess test-takers in a different language
group on the MfCT construct, rather than to compare test-taker performance across the two
language versions of the CTMS, careful examination of the validity of the CTMS-E is essential.
However, for future use of the CTMS-E in cross-cultural analyses, it is also important to
confirm the construct equivalence of the two forms (ITC, 2017). To that end, I will carry out a
comparative online study of 1,000 monolingual college students, 500 from Spain and Chile and 500
from the U.S., analyzed with structural equation modeling (Byrne, 2006). I will match the two
groups by age, gender, socioeconomic status, and college major. Additionally, a nonsignificant
correlation between students’ majors and their CTMS-E sub-scale scores will support the
discriminant validity of the CTMS-E.

References

American Educational Research Association, American Psychological Association, National
Council on Measurement in Education, Joint Committee on Standards for Educational
& Psychological Testing (US). (2014). Standards for educational and psychological
testing. American Educational Research Association.
Byrne, B. (2006). Structural equation modeling with EQS: Basic concepts, applications, and
programming (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Publishers.
Carré, P., & Fenouillet, F. (Eds.). (2008). Traité de psychologie de la motivation - Théorie et
Pratiques. Paris: Dunod.
Duncan, T. G., & McKeachie, W. J. (2005). The making of the motivated strategies for
learning questionnaire. Educational Psychologist, 40(2), 117-128.
Eccles, J., & Wigfield, A. (2002). Motivational beliefs, values, and goals. In S. T. Fiske, D.
L. Schacter, & C. Zahn-Waxler (Eds.), Annual Review of Psychology (pp. 109-132).
Palo Alto, CA: Annual Reviews.
Evans, J. S. B. T. (2006). The heuristic-analytic theory of reasoning: Extension and
evaluation. Psychonomic Bulletin & Review, 13(3), 378-395.
Facione, P. A. (1990). APA Delphi Research Report, Critical Thinking: A Statement of
Expert Consensus for Purposes of Educational Assessment and Instruction (ERIC
Document No. ED 315 423).
Findley, W. G. (1956). A rationale for evaluation of item discrimination statistics.
Educational and Psychological Measurement, 16, 175-180.
Garcia, T., & Pintrich, P. R. (1992). Critical Thinking and Its Relationship to Motivation,
Learning Strategies, and Classroom Experience. Paper presented at the Annual
Meeting of the American Psychological Association. Washington, DC.
Geisinger, K. F. (1994). Cross-cultural normative assessment: Translation and adaptation
issues influencing the normative interpretation of assessment instruments.
Psychological Assessment, 6, 304-312.
Grisay, A. (2003). Translation procedures in OECD/PISA 2000 international assessment.
Language Testing, 20(2), 225-240.
Hambleton, R. K. (2005). Issues, Designs, and Technical Guidelines for Adapting Tests into
Multiple Languages and Cultures. In R. K. Hambleton, P. F. Merenda, & C. D.
Spielberger (Eds.), Adapting Educational and Psychological Tests for Cross-Cultural
Assessment (pp. 3-39). Psychology Press.
Hambleton, R. K., & Patsula, L. (1999). Increasing the validity of adapted tests: Myths to be
avoided and guidelines for improving test adaptation practices. Applied Testing
Technology, 1(1), 1-16.
Hambleton, R. K., & Zenisky, A. (2010). Translating and adapting tests for cross-cultural
assessment. In D. Matsumoto & F. van de Vijver (Eds.), Cross-cultural research
methods (pp. 46-74). New York, NY: Cambridge University Press.
International Test Commission. (2017). The ITC Guidelines for Translating and Adapting
Tests (Second edition). [www.InTestCom.org]
Mateos, M., Palmero, F., Fernández-Abascal, E., Martínez, F., & Choliz, M. (2002). Teorías
motivacionales. In Psicología de la motivación y la emoción (pp. 155-186).
Madrid: McGraw-Hill.
Neuville, S., Bourgeois, É., & Frenay, M. (2004). The subjective task value: clarification of a
construct. In S. Neuville (Ed.), La perception de la valeur des activités
d'apprentissage : étude des déterminants et effets. Louvain-la-Neuve: Unpublished
Doctoral Thesis. Université Catholique de Louvain.
Noveck, I., Mercier, H., Rossi, S., & Van der Henst, J. B. (2007). Psychologie cognitive du
raisonnement. In S. Rossi & J. B. Van der Henst (Eds.), Psychologies du
raisonnement. Bruxelles: De Boeck.
Saiz, C. (Ed.). (2002). Pensamiento crítico: conceptos básicos y actividades prácticas.
Madrid: Pirámide.
Saiz, C., & Rivas, S. (2008). Intervenir para transferir en Pensamiento Crítico. Revista
Praxis, 10(13), 129-149.
Traub, R. E., & Rowley, G. L. (1991). Understanding reliability. Educational Measurement:
Issues and Practice, 10(1), 37-45.
Trenor, J. M., Miller, M. K., & Gipson, K. G. (2011). Utilization of a think-aloud protocol to
cognitively validate a survey instrument identifying social capital resources of
engineering undergraduates. (Report No. AC 2011-925). Clemson University:
Clemson, SC.
Valenzuela, J., Nieto, A., & Saiz, C. (2010). Percepción del coste de utilización Pensamiento
crítico en universitarios chilenos y españoles. Electronic Journal of Research in
Educational Psychology, 8(2), 689-706.
Valenzuela, J., Nieto, A. M., & Saiz, C. (2011). Critical thinking motivational scale: A
contribution to the study of relationship between critical thinking and motivation.
Electronic Journal of Research in Educational Psychology, 9(2), 823-848.
Wigfield, A., & Eccles, J. (2000). Expectancy-Value Theory of Achievement Motivation.
Contemporary Educational Psychology, 25(1), 68-81.

Appendix

Translation and Adaptation Review Form

Adapted from Hambleton, R. K., & Zenisky, A. (2010)

General Translation Questions

1. Does the item have the same or highly similar meaning in the two languages?
2. Is the language of the translated item of comparable difficulty and commonality with
respect to the words in the item in the source language version?
3. Does the translation introduce changes in the text (omissions, substitutions, or
additions) that might make the respondent answer this item differently in the two
language versions?
4. Are there differences between the target and source language versions of the item
related to the use of metaphors, idioms, or colloquialisms?

Item Format and Appearance

5. Is the item format, including physical layout, the same in the two language versions?
6. Is the length of the item about the same in the two language versions?
7. Will the format of the item and task required of the examinee be equally familiar in the
two language versions?

Grammar and Phrasing

8. Is there any modification of the item’s structure such as the placement of clauses or
other word order changes that might make this item more or less complex in the target
language version?
9. Are there any grammatical clues in this item that might make the respondent answer
differently in the two language versions?
10. Are there any grammatical structures in the source language version of the item that
do not have parallels in the target language?
11. Are there any gender or other references that might make this item be cued in the
target language version? Are there any words in the item that, when translated, change
from having one meaning to having more than one common meaning?
12. Are there any changes in punctuation between the source and target versions of the
item that may make the item easier or harder in the translated version?

Cultural Relevance and Specificity

13. Have terms in the item in one language been suitably adapted to the cultural
environment of the second language version?
14. Are there cultural differences that would have an effect on the likelihood of a
response being chosen when the item is presented in the source or target language
version?
15. Are the concepts covered in the item at about the same level of abstraction in the two
language versions?
16. Does the concept or construct of the item have about the same familiarity and
meaning in both the source and target language versions?
