Papers by Micheline Chalhoub-Deville
Assessing English Language Proficiency in U.S. K–12 Schools, 2020
A discussion of second language testing focuses on the need for collaboration among researchers in second language learning, teaching, and testing concerning development of context-appropriate language tests. It is argued that the nature of the proficiency construct in language is not constant, but that different linguistic, functional, and creative proficiency components are at work in different instructional and social contexts. Inadequacies of traditional and commercial tests for assessing contextualized language skills or determining instructional needs that are found frequently by teacher-researchers are examined. It is proposed that in both teaching and research, the validity of test score interpretation and use will be enhanced by use of tests constructed specifically for the instructional context in question, rather than generic, externally-produced proficiency measures. Broad criteria for construction of such measures are offered. Contains 100 references. (MSE)
Performance testing, cognition and assessment: Selected papers from the 15th Language Testing Research Colloquium, Cambridge and Arnhem, edited by Michael Milanovic and Nick Saville; The development of IELTS: A study of the effect of background knowledge on reading comprehension, by Caroline Clapham; Verbal protocol analysis in language testing research: A handbook, by Alison Green; Multilingual glossary of language testing terms, prepared by ALTE members; Dictionary of language testing
The ITC International Handbook of Testing and Assessment, 2016
Testing culturally and linguistically diverse populations challenges test developers and test users across the world. In the context of educational and psychological tests, validity and fairness issues associated with developing these tests have increased substantially due to significant changes in the test takers’ demographics. This chapter focuses on four interrelated issues pertaining to language and culture in testing (LCT): test purposes, constructs to be measured, target populations, and test construction and data analysis. All issues are discussed in the light of fairness and validity for test takers from diverse linguistic and cultural backgrounds. Special attention is given to language-minority test takers. The chapter combines research results from different perspectives with a special emphasis on work done in the fields of language assessment and psychometrics. Examples of language and educational content assessments are provided and discussed from different perspectives.
The Modern Language Journal, 1995
Language Learning, 2005
Readers' Credits for Volume 55. Language Learning thanks the following people for one or more reviews of manuscripts for Volume 55 (2005): Niclas Abrahamsson, Stockholm University; Nobuhiko Akamatsu, Doshisha University; Bruce Anderson, University of California, Davis; Harald Baayen, Max Planck Institute for Psycholinguistics; Lyle Bachman, University of California, Los Angeles; Joe Barcroft, Washington University in St. Louis; Inge Bartning, Stockholm University; Robert Bayley, University of Texas at San Antonio ...
This study investigated whether different groups of native speakers assess second language learners' language skills differently for three elicitation techniques. Subjects were six learners of college-level Arabic as a second language, tape-recorded performing three tasks: participating in a modified oral proficiency interview, narrating a picture depicting a story, and reading a text aloud. The recordings were rated by three groups: 15 native Arabic-speakers teaching in the United States, 31 non-teaching native Arabic-speakers living in the United States, and 36 non-teaching native Arabic-speakers living in Lebanon. Ratings were given both holistically and on a nine-point scale of proficiency. Three response dimensions were assessed specifically: grammar/pronunciation; creativity in presenting information; and amount of detail provided. Results indicated variability of performance across tasks as well as between individuals. In sum, it was found that oral ability, tasks, and raters all affected students' scores. Further analysis of the effects of different tasks and of different raters on assessment of second-language performance is recommended. A 23-item bibliography and analysis data are appended. (MSE)
In Central Europe, education has undergone considerable upheaval since the change of political systems at the end of the 1980s. One of the most radical changes is the revision and reform of school-leaving examination systems. From school- and teacher-based subjective ...
... Assessment team at the Center for Advanced Research on Language Acquisition (CARLA) has been involved for the last two years in evaluating the quality of the existing proficiency tests in French, German, and Spanish (see Chalhoub-Deville, Alcaya, Klein, Lozier, and ...
Newer statistical procedures are typically introduced to help address the limitations of those already in practice or to deal with emerging research needs. Quantile regression (QR) is introduced in this paper as a relatively new methodology, which is intended to overcome some of the limitations of least squares mean regression (LMR). QR is more appropriate when assumptions of normality and homoscedasticity are violated. QR has also been recommended as a good alternative when the research literature suggests that explorations of the relationship between variables need to move from a focus on average performance, that is, the central tendency, to exploring various locations along the entire distribution. Although QR has long been used in other fields, it has only recently gained popularity in educational statistics. For example, in the ongoing push for accountability and the need to document student improvement, the calculation of student growth percentiles (SGP) utilizes QR to document the amount of growth a student has made. Despite its proven advantages and its utility, QR has not been utilized in areas such as language testing research. This paper seeks to introduce the field to basic QR concepts, procedures, and interpretations. Researchers familiar with LMR will find the comparisons made between the two methodologies helpful to anchor the new information. Finally, an application with real data is employed to demonstrate the various analyses (the code is also appended) and to explicate the interpretations of results.
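The contrast between LMR and QR described in this abstract can be illustrated with a minimal sketch. This is not the code appended to the paper; it is an intercept-only toy case, written under the assumption that QR is fit by minimizing the check (pinball) loss. The data and the helper names (pinball_loss, total_loss, best_constant) are hypothetical and exist only for illustration.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss for a quantile level tau in (0, 1)."""
    # Positive residuals are weighted by tau, negative ones by (1 - tau).
    return tau * residual if residual >= 0 else (tau - 1) * residual


def total_loss(values, candidate, tau):
    """Summed pinball loss when every value is predicted by one constant."""
    return sum(pinball_loss(v - candidate, tau) for v in values)


def best_constant(values, tau):
    """Intercept-only quantile fit.

    The summed loss is piecewise linear in the candidate, so a minimiser
    always lies at one of the observed values; searching them is enough.
    """
    return min(sorted(values), key=lambda c: total_loss(values, c, tau))


# Hypothetical scores with one extreme value (assumption, not real data).
scores = [1, 2, 3, 4, 100]

mean_fit = sum(scores) / len(scores)     # least squares answer: the mean
median_fit = best_constant(scores, 0.5)  # QR at tau = 0.5: the median
upper_fit = best_constant(scores, 0.9)   # QR at tau = 0.9: an upper quantile

print(mean_fit, median_fit, upper_fit)
```

With these toy scores the mean is pulled toward the extreme value while the tau = 0.5 fit stays at the middle observation, and changing tau moves the fit to a different location in the distribution. A full QR with predictors minimizes the same loss over regression coefficients rather than over a single constant.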
Non-abstract Style. We will particularly focus on problems that test developers encountered during the creation of prototype tasks, including mistakes in content, inappropriate content, content requiring background knowledge from previous classes, and a paucity of testable content. We will conclude by discussing how the corpus may be used by test development in the future. SYMPOSIA 2. Contributions of Corpus Analysis to Vocabulary Assessment. Since the frequency and range of word forms are so easily counted by computer, corpus analysis has obvious potential for the production of modern vocabulary lists to replace such venerable sources as the General Service List and the Teacher's Word Book. However, contemporary vocabulary assessment requires not only general inventories of word forms but also information about the lexical dimension of language use in particular social and educational contexts. This presentation will explore how analysis of corpora may contribute to an expan...
Language Learning & Technology, 2001
The L2 field's first concerted effort in terms of computer-based testing (CBT) emerged in the mid-80s with the 1985 LTRC. The conference proceedings were published under the title Technology and Language Testing (Stansfield, 1986). The proceedings indicate that several papers presented at the conference dealt with CBT and the application of latent trait models to item-bank construction, item selection, and computer adaptive testing (CAT). The general measurement profession had been working with CBT and, more specifically, with CAT since the early 70s. The first conference on CAT was held in 1975. Perhaps the main reason the L2 field has lagged behind in this area is that it has long promoted performance-based assessment, a form of assessment that does not lend itself as readily to computerized administration as do more traditional test formats. In fact, the second section of the Stansfield volume deals primarily with performance-based assessment. So, whereas general measureme...
Abstracts are listed in the following order: Workshops, Plenary, Symposia, Papers, Works in Progress, and Posters.
Abstract: Many researchers and practitioners maintain that ACTFL's efforts to improve instructional practices and promote proficiency assessments tied to descriptors of what learners can do in real life have contributed significantly to second language teaching and testing. Similar endeavors in the area of research, however, are critically needed. Focusing on the oral proficiency interview (OPI), this article argues that ACTFL has a responsibility to its stakeholders to initiate a research program that generates a coherent combination of logical and empirical evidence to support its OPI interpretations and practices. The article highlights a number of high-priority areas, including delimiting purposes, examining interview discourse, documenting rater/interlocutor behavior, explicating the native speaker criterion, and investigating the OPI's impact on language pedagogy, that should be incorporated into the research agenda.
1. If reading is reader-based, can there be a computer-adaptive test of reading? 2. Developments in reading research and their implications for computer-adaptive reading assessment 3. Reading constructs and reading assessment 4. Considerations for testing reading proficiency via computer-adaptive testing 5. Research and development of a computer-adaptive test of listening comprehension in the less-commonly taught language Hausa 6. The development of an adaptive test for placement in French 7. Computer-adaptive testing: a view from outside 8. From reading theory to testing practice 9. Selected technical issues in the creation of computer-adaptive tests of second language reading proficiency 10. A measurement approach to computer-adaptive testing of reading comprehension 11. The practical utility of Rasch measurement models 12. An overview and some observations on the psychometric models used in computer-adaptive language testing.
Language Assessment Quarterly
The present issue provides a much-needed space for key issues not visible in our discourse in language testing and research. The various articles delve into research, policy, test development, and validity considerations for migrants who are increasingly mandated to take language and literacy tests. The papers point to issues of “test misuse,” bias, negative impact, and altogether different test taker populations, which tend to have low literacy in their first languages. While many of the concerns raised in this special issue relate specifically to the testing of language learners with low print literacy, there are lessons here for test development and validation theory across the board. In our commentary, we will focus primarily on issues of validation. This seems to be a critical theme in all the papers included in the present issue. We and the authors in this special issue argue that the language testing community needs to revisit validity theory considering the intricate connections between language testing and migration policies. Validation, as clearly shown in this issue, needs to be co-constructed by key stakeholder groups at the design, development, administration, research, and use levels.