I am an applied linguist, Professor Emeritus at The University of Melbourne and a former President of the American Association for Applied Linguistics. My main research areas are language assessment and language and identity.
All educational testing is intended to have consequences, which are assumed to be beneficial, but tests may also have unintended, negative consequences (Messick, 1989). The issue is particularly important in the case of large-scale standardised tests, such as Australia's National Assessment Program–Literacy and Numeracy (NAPLAN), the intended benefits of which are increased accountability and improved educational outcomes. The NAPLAN purpose is comparable to that of other state and national 'core skills' testing programs, which evaluate cross-sections of populations in order to compare results between population sub-groupings. Such comparisons underpin 'accountability' in the era of population-level testing. This study investigates the impact of NAPLAN testing on one population grouping that is prominent in the NAPLAN results comparisons and public reporting: children in remote Indigenous communities. A series of interviews with principals and teachers documents informants' first-hand experiences of the use and effects of NAPLAN in schools. In the views of most participants, the language and content of the test instruments, the nature of the test engagement and the test washback have negative impacts on students and staff, with little benefit in terms of the usefulness of the test data. The primary issue is that meaningful participation in the tests depends critically on proficiency in Standard Australian English (SAE) as a first language. This study contributes to the broader discussion of how reform-targeted standardised testing for national populations affects subgroups who are not treated equitably by the test instrument or reporting for accountability purposes. It highlights a conflict between consequential validity and the notion of accountability that drives reform-targeted testing.
This report documents two coordinated exploratory studies into the nature of oral English-for-academic-purposes (EAP) proficiency. Study I used verbal-report methodology to examine field experts' rating orientations, and Study II investigated the quality of test-taker discourse on two different Test of English as a Foreign Language (TOEFL®) task types (independent and integrated) at different levels of proficiency. Study I showed that, with no guidance, domain experts distinguished and described qualitatively different performances using a common set of criteria very similar to those included in draft rating scales developed for the tasks at ETS. Study II provided empirical support for the criteria applied by the judges. The findings indicate that raters take a range of performance features into account within each conceptual category, and that holistic ratings are driven by all of the assessment categories rather than, as has been suggested in earlier studies, predominantly by grammar.
The distinctiveness of Applied Linguistics in Australia: A historical perspective. Tim McNamara and Joseph Lo Bianco, University of Melbourne/Language Australia. Introduction: In this paper an attempt is made to identify ...
a priori: an artificial language composed entirely of invented elements. aboriginal: one indigenous to a country, whose ancestors have lived there during recorded history. accent: features of pronunciation that identify where a person is from, regionally or socially. ...
The use of common tasks and rating procedures when assessing the communicative skills of students from highly diverse linguistic and cultural backgrounds poses particular measurement challenges, which have thus far received little research attention. If assessment tasks or criteria are found to function differentially for particular subpopulations within a test candidature with the same or a similar level of criterion ability, then the test is open to charges of bias in favour of one or other group. While there have been numerous studies involving dichotomous language test items (see e.g. Chen and Henning, 1985, and more recently Elder, 1996), few studies have considered the issue of bias in relation to performance-based tasks which are assessed subjectively, via analytic and holistic rating scales. The paper demonstrates how Rasch analytic procedures can be applied to the investigation of item bias or differential item functioning (DIF) in both dichotomous and scalar items on a test of English for academic purposes. The data were gathered from a pilot English language test administered to a representative sample of undergraduate students (N = 139) enrolled in their first year of study at an English-medium university. The sample included native speakers of English who had completed up to 12 years of secondary schooling in their first language (L1) and immigrant students, mainly from Asian language backgrounds, with varying degrees of prior English language instruction and exposure. The purpose of the test was to diagnose the academic English needs of incoming undergraduates so that additional support could be offered to those deemed at risk of failure in their university study.
Some of the tasks included in the assessment procedure involved objectively scored items (measuring vocabulary knowledge, text-editing skills, and reading and listening comprehension), whereas others (a report and an argumentative writing task) were subjectively scored. The study models a methodology for estimating bias with both dichotomous and scalar items using the programs Quest (Adams and Khoo, 1993) for the former and ConQuest (Wu, Adams and Wilson, 1998) for the latter. It also offers answers to the practical questions of whether a common set of assessment criteria can, in an academic context such as this one, be meaningfully applied to all subgroups within the candidature, and whether analytic criteria are more susceptible to biased ratings than holistic ones. Implications for test fairness and test validity are discussed.
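The core idea behind DIF screening for a dichotomous item can be illustrated with a minimal sketch. This is not the Rasch-based Quest/ConQuest procedure the study actually used; it is a simpler, widely taught alternative (the Mantel-Haenszel common odds ratio), shown here only to make the logic concrete: match test takers on their remaining score, then ask whether one group still has higher odds of answering the studied item correctly. All data and names below are hypothetical.

```python
# Illustrative DIF screening via the Mantel-Haenszel common odds ratio.
# A stand-in for the Rasch-based analysis described above, not a replica of it.
from collections import defaultdict

def mantel_haenszel_or(scores, groups, item):
    """Common odds ratio for `item`, stratified by rest-score.

    scores: list of 0/1 response vectors, one per test taker
    groups: list of 0/1 group labels (e.g. 0 = L1 English, 1 = ESL)
    item:   index of the studied item
    """
    # stratum (rest-score) -> 2x2 table[group][0=wrong, 1=right]
    strata = defaultdict(lambda: [[0, 0], [0, 0]])
    for resp, g in zip(scores, groups):
        rest = sum(resp) - resp[item]      # matching variable: score on other items
        strata[rest][g][resp[item]] += 1
    num = den = 0.0
    for (a_wrong, a_right), (b_wrong, b_right) in strata.values():
        n = a_wrong + a_right + b_wrong + b_right
        if n == 0:
            continue
        num += a_right * b_wrong / n       # evidence of group-0 advantage
        den += a_wrong * b_right / n       # evidence of group-1 advantage
    return num / den if den else float("inf")

# Synthetic example: everyone has rest-score 1, but group 0 answers
# item 0 correctly more often than the matched group 1 takers.
scores = [[1, 1, 0]] * 10 + [[0, 1, 0]] * 10 + [[1, 1, 0]] * 5 + [[0, 1, 0]] * 15
groups = [0] * 20 + [1] * 20
print(mantel_haenszel_or(scores, groups, 0))  # -> 3.0 (odds ratio far from 1 flags DIF)
```

An odds ratio near 1 means the item behaves equivalently for the two matched groups; values well above or below 1 flag the item for closer inspection, which is the same question the Rasch DIF analysis answers on a latent-trait scale.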
Papers by Tim McNamara