International papers with Impact factor by Stuart McLean
Research Methods in Applied Linguistics, 2023
Meaning-recognition and meaning-recall are two commonly used test modalities to assess second language vocabulary knowledge for the purpose of reading. Although considerable variation in item format exists within each modality, previous research has examined this variation almost exclusively among meaning-recognition item types. This article reports on two exploratory studies, each comparing a fully contextualized and a non-contextualized meaning-recall variant for one specific testing purpose: coverage-comprehension research. The fully contextualized test used the same 622-word passage in each study. In the non-contextualized tests, target words appeared in short, non-defining sentences; in Study A, the elicited response was a translation of only the target item, while in Study B, it was the entire prompt sentence. Scores on the compared tests differed significantly only in Study A. In both studies, the consistency with which the compared item formats yielded the same outcome (correct or incorrect) when the same learner encountered the same target word was rather low. The provision of relatively authentic context sometimes seemed to aid lexical inferencing, but at other times it increased task difficulty relative to the limited-context formats. These findings suggest that different meaning-recall formats could lead to different conclusions regarding knowledge of specific words, which could in turn affect coverage-comprehension research findings.
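As a rough illustration of the consistency measure described above, the sketch below computes the proportion of learner-word encounters on which two item formats gave the same outcome. The function name and the 0/1 coding of paired outcomes are assumptions made for illustration; the studies' own scoring and analysis may differ.

```python
def format_agreement(outcomes_a, outcomes_b):
    """Proportion of paired learner-word encounters on which two item
    formats yielded the same outcome (1 = correct, 0 = incorrect)."""
    if len(outcomes_a) != len(outcomes_b) or not outcomes_a:
        raise ValueError("need two non-empty, equally long outcome lists")
    same = sum(a == b for a, b in zip(outcomes_a, outcomes_b))
    return same / len(outcomes_a)


# Hypothetical outcomes for the same learner-word pairs under two formats.
contextualized = [1, 1, 0, 0, 1, 0, 1, 0]
non_contextualized = [1, 0, 0, 1, 1, 1, 0, 0]
print(format_agreement(contextualized, non_contextualized))  # 0.5
```

Raw agreement of this kind is not corrected for chance; a chance-corrected index such as Cohen's kappa is a common alternative when the two formats differ in overall difficulty.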
Modern Language Journal, 2023
In second language (L2) research, the lexical unit is often defined as a base word plus inflectional and derivational forms through Level 6 of Bauer and Nation’s framework (WF6). WF6 use has been justified by the assumption that once a form is known, recognition of other WF6 members requires little extra effort. A more lenient view holds that an incomplete understanding of derivational forms is permissible if words containing the most frequent derivational affixes are known. This study assessed the validity of these views for L2 listening. Participants (N = 120) provided translations of 27 base words and 43 related affixational forms when listening. When participants knew one form (either the base word or an affixed form), they also knew the other just 25.1% of the time. For target words containing the most frequent derivational affixes, this figure was just 26.5%. Logistic regression found that learners’ overall vocabulary level, several aspects of word frequency, and base word knowledge were all significant predictors of knowing affixed forms. However, when other variables were held constant, base word knowledge was a weak predictor of affixational form knowledge. These findings support neither the strict assumption nor the more lenient view of WF6 use for L2 listening among study participants.
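One plausible reading of the 25.1% figure is a conditional proportion: among base-word/affixed-form pairs where a learner knew at least one form, the share where both forms were known. The sketch below computes that quantity from hypothetical paired 0/1 data; the function name is invented for illustration, and the study may have operationalized the figure differently.

```python
def p_both_given_either(knows_base, knows_affixed):
    """Among pairs where at least one form is known, the proportion in
    which both the base word and the affixed form are known."""
    pairs = [(b, a) for b, a in zip(knows_base, knows_affixed) if b or a]
    if not pairs:
        return float("nan")  # undefined when neither form is ever known
    return sum(1 for b, a in pairs if b and a) / len(pairs)


# Hypothetical 0/1 knowledge of six base-word/affixed-form pairs.
base = [1, 1, 1, 0, 0, 1]
affixed = [1, 0, 0, 1, 0, 0]
print(p_both_given_either(base, affixed))  # 0.2
```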
Language Testing, 2023
The purpose of this paper is to (a) establish whether meaning-recall and meaning-recognition item formats test psychometrically distinct constructs of vocabulary knowledge which measure separate skills, and, if so, (b) determine whether each construct possesses unique properties predictive of L2 reading proficiency. Factor analyses and hierarchical regression were conducted on results derived from the two vocabulary item formats to address these questions. The results indicated that although the two-factor model had better fit and meaning recall and meaning recognition can be considered psychometrically distinct, discriminant validity between the two factors is questionable. In hierarchical regression models, meaning-recognition knowledge did not make a statistically significant contribution to explaining reading proficiency over meaning-recall knowledge. However, when the roles were reversed, meaning recall did make a significant contribution to the model beyond the variance explained by meaning recognition alone. The results suggest that meaning recognition does not tap into unique aspects of vocabulary knowledge, and they provide empirical support for meaning recall as a superior predictor of reading proficiency for research purposes.
Studies in Second Language Acquisition, 2023
Proper nouns constitute a lexical class with special properties and are thus treated differently from other words by second language acquisition researchers. An assumption exists that even low-proficiency learners will find them unproblematic, yet research suggests this assumption might be misplaced. The present study involved two self-paced reading experiments designed to investigate proper nouns’ influence on Japanese university students’ reading fluency. In Experiment 1, participants were presented with 60 decontextualized sentences containing 30 proper nouns and 30 common nouns to determine whether they are processed in a similar manner. In Experiment 2, participants read another 60 sentences comprising a book chapter to explore the effects of repeated exposure to a set of proper nouns. The results indicated that proper nouns are processed in a similar manner to common nouns in terms of disrupting reading fluency. The implications for language learning pedagogy, in particular extensive reading, are discussed.
TESOL Quarterly, 2023
Aural lexical knowledge (ALK) is crucial for second language (L2) listening. Despite its importance, little research has validly explored the relationship between ALK and L2 listening across different English as a Foreign Language (EFL) contexts. To broaden this research base, the current study closely replicates a previous study, Cheng et al. (2022), which measured single-word ALK, phrasal verb ALK, and L2 listening comprehension among participants with Chinese as a first language (L1). The current study administered the same instruments to 147 Japanese-speaking and 131 Arabic-speaking English language learners. Results indicated that the capacity of ALK to predict variance in L2 listening for the Japanese group (R² = .38) was similar to that observed in the original study (R² = .42). However, the results for the Arabic-speaking group differed markedly from those of the original study and showed an unexpectedly strong relationship between ALK and L2 listening (R² = .92). Future research directions and pedagogical implications are discussed.
Aural single-word and aural phrasal verb knowledge and their relationships to L2 listening comprehension, 2022
This study quantifies second language (L2) knowledge of aural single words and aural phrasal verbs (PVs) and investigates their relationship with L2 listening comprehension. An aural first language (L1) meaning-recall test format was used to measure knowledge of 81 single-word and 81 PV target items (with equivalent frequencies of occurrence) among 224 Chinese tertiary-level learners of English as a Foreign Language (EFL). Participants’ L2 listening was measured with a version of the Test of English for International Communication (TOEIC). Participants’ aural single-word and aural PV knowledge were compared, and their relationships with L2 listening were examined using correlation and multiple regression analysis. These analyses also included a comparison between participants of relatively high (Independent Users) and relatively low (Basic Users) L2 listening proficiency. Although regression modelling showed that single-word test scores were most predictive of L2 listening comprehension, it also showed that PV test scores made a substantial contribution to the model’s predictive capacity. In combination, single-word and PV test scores predicted 42.7% of the variance observed in the listening scores. The theoretical and practical implications of these results are discussed.
System, 2022
While word-frequency lists have been commonly used as indexes of word usefulness, their role as a proxy for learner word knowledge is unclear. Word knowledge in a structured sample (N = 625) of Japanese university-level EFL learners, operationalized using dichotomous Rasch modeling of test-item data, was used as an external reference criterion to investigate two issues germane to the development of word lists representing learner knowledge in EFL contexts: (1) the definition of a word and (2) the choice of reference corpus. On the former, corpus-derived word-frequency lists based on word orthographic forms, flemmas, or word families were generated from 18 different corpora. Word-frequency lists using flemma-based word groupings resulted in higher correlations with learner population word knowledge than those using word-family-based groupings across all 18 sets of word lists tested. On the latter, lists derived from corpora of spontaneous speech, fictional TV/movies for younger viewers, and narrative written texts consistently showed higher correlations with word knowledge than those derived from non-conversational speech or any non-fiction written text genre. These results suggest that mega-corpora compiled from conveniently available electronic written texts may not be ideal as scales for diagnostic vocabulary testing or as indexes used in readability formulae.
Studies in Second Language Acquisition, 2021
In this focused methodological synthesis, the sample construction procedures of 110 second language (L2) instructed vocabulary interventions were assessed in relation to effect size–driven sample-size planning, randomization, and multisite usage. These three areas were investigated because inferential testing supports better generalizations when researchers consider them during the sample construction process. Only nine reports used effect sizes to plan or justify sample sizes in any fashion, and only one engaged in an a priori power procedure referencing vocabulary-centric effect sizes from previous research. Randomized assignment was observed in 56% of the reports, while no report involved randomized sampling. Approximately 15% of the samples observed were constructed from multiple sites, and none of these empirically investigated the effect of site clustering. Leveraging the synthesized findings, we conclude by suggesting that future L2 instructed vocabulary researchers consider a priori effect size–driven sample planning, randomization, and multisite usage when constructing samples.
Studies in Second Language Acquisition, 2021
In response to our State-of-the-Scholarship critical commentary (Stoeckel et al., 2021), Stuart Webb (2021) asserts that there is no research supporting our suggestions for improving tests of written receptive vocabulary knowledge by (a) using meaning-recall items, (b) making fewer presumptions about learner knowledge of word families, and (c) using appropriate test lengths. As we will show, this is not the case.
Language Testing, 2020
The last three decades have seen an increase in tests aimed at measuring an individual’s vocabulary level or size. The target words used in these tests are typically sampled from word frequency lists, which are in turn based on language corpora. Conventionally, test developers sample items from frequency bands of 1000 words; different tests employ different sampling ratios. Some have as few as 5 or 10 items representing the underlying population of words, whereas others feature a larger number of items, such as 24, 30, or 40. However, these sample-size choices are very rarely supported by clear empirical evidence. Here, using a bootstrapping approach, we illustrate the effect that a sample-size increase has on confidence intervals of individual learner vocabulary knowledge estimates, and on the inferences that can safely be made from test scores. We draw on a unique dataset consisting of adult L1 Japanese test takers’ performance on two English vocabulary test formats, each featuring 1000 words. Our analysis shows that there are few purposes and settings where as few as 5 to 10 sampled items from a 1000-word frequency band (1K) are sufficient. The use of 30 or more items per 1000-word frequency band, and of tests consisting of fewer bands, is recommended.
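The general bootstrapping idea can be sketched under stated assumptions: a hypothetical learner's known/unknown status on every word in a 1000-word band is available, k items are resampled with replacement many times, and a 95% percentile interval is taken for the estimated number of known words. The function name, the 2000 replications, and the synthetic learner are assumptions for illustration; this is not the paper's code or data.

```python
import random


def bootstrap_interval(responses, k, reps=2000, band_size=1000, seed=0):
    """Approximate 95% percentile interval for the number of known words in
    a frequency band, estimated from k items resampled with replacement
    from one learner's full set of 0/1 responses to that band."""
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = rng.choices(responses, k=k)  # one simulated k-item test
        estimates.append(sum(sample) / k * band_size)
    estimates.sort()
    lo = estimates[round(0.025 * (reps - 1))]
    hi = estimates[round(0.975 * (reps - 1))]
    return lo, hi


# Hypothetical learner who knows 600 of the 1000 words in a band.
learner = [1] * 600 + [0] * 400
for k in (5, 10, 30, 100):
    lo, hi = bootstrap_interval(learner, k)
    print(f"{k:>3} items: {lo:.0f} to {hi:.0f} known words")
```

For this synthetic learner the interval narrows steadily as k grows, which is the kind of pattern the abstract uses to argue that 5 to 10 items per 1K band rarely support confident statements about an individual.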
Applied Linguistics, 2020
The choice of lexical unit is a significant issue in L2 vocabulary research and pedagogy. This brief review examines two important questions bearing on this issue: (i) How encompassing a lexical unit can learners deal with receptively? and (ii) How much difference does the choice of lexical unit make in practice? Regarding the former, empirical evidence from studies with L2-English learners shows that the broad ‘word family’ unit, requiring considerable knowledge of affixes and the ability to apply this knowledge, cannot be supported. Regarding the latter, estimates of the proportion of English text consisting of derivational forms vary due to differences in approach and text type examined. However, even the smallest estimate is of a magnitude sufficient to have a meaningful impact on text comprehension. Accordingly, this review suggests that the most appropriate lexical unit may be the lemma or flemma. This conclusion has major implications for L2 vocabulary research, with regard to vocabulary testing and estimates of learning needs, and for L2 vocabulary pedagogy, in respect of curriculum planning and the use of word lists.
Studies in Second Language Acquisition, 2020
Two commonly used test types to assess vocabulary knowledge for the purpose of reading are size and levels tests. This article first reviews several frequently stated purposes of such tests (e.g., materials selection, tracking vocabulary growth) and provides a reasoned argument for the precision needed to serve such purposes. Then three sources of inaccuracy in existing tests are examined: the overestimation of lexical knowledge from guessing or use of test strategies under meaning-recognition item formats; the overestimation of vocabulary knowledge when receptive understanding of all word family members is assumed from a correct response to an item assessing knowledge of just one family member; and the limited precision that a small, random sample of target words has in representing the population of words from which it is drawn. The paper concludes that existing tests lack the accuracy needed for many specified testing purposes and discusses possible improvements going forward.
Language Testing, 2020
Vocabulary’s relationship to reading proficiency is frequently cited as a justification for the assessment of L2 written receptive vocabulary knowledge. However, to date, there has been relatively little research regarding which modalities of vocabulary knowledge have the strongest correlations with reading proficiency, and observed differences have often been statistically non-significant. The present research employs a bootstrapping approach to reach a clearer understanding of the relationships between various modalities of vocabulary knowledge and reading proficiency. Test-takers (N = 103) answered 1000 vocabulary test items spanning the third 1000 most frequent English words in the New General Service List corpus (Browne, Culligan, & Phillips, 2013). Items were answered under four modalities: Yes/No checklists, form recall, meaning recall, and meaning recognition. These pools of test items were then sampled with replacement to create 1000 simulated tests ranging in length from five to 200 items, and the results were correlated with Test of English for International Communication (TOEIC®) Reading scores. For all examined test lengths, meaning-recall vocabulary tests had the highest average correlations with reading proficiency, followed by form-recall vocabulary tests. The results indicated that tests of vocabulary recall are stronger predictors of reading proficiency than tests of vocabulary recognition, despite the theoretically closer relationship of vocabulary recognition to reading.
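The resampling design described above can be sketched roughly as follows: draw k item columns with replacement from a person-by-item response matrix, sum each person's responses into a simulated test score, correlate those scores with a criterion, and average the correlation over many simulated tests. Because the study's data are not reproduced here, the response matrix is generated from a hypothetical Rasch-type model and the criterion is a synthetic stand-in for a reading score; all function names and parameter values are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def simulate_data(n_persons=103, n_items=300, seed=1):
    """Hypothetical 0/1 person-by-item matrix from a Rasch-type model, plus a
    noisy criterion score standing in for a reading test."""
    rng = random.Random(seed)
    abilities = [rng.gauss(0, 1) for _ in range(n_persons)]
    difficulties = [rng.gauss(0, 1) for _ in range(n_items)]
    responses = [
        [1 if rng.random() < 1 / (1 + math.exp(d - a)) else 0 for d in difficulties]
        for a in abilities
    ]
    criterion = [a + rng.gauss(0, 0.7) for a in abilities]
    return responses, criterion


def mean_simulated_correlation(responses, criterion, k, reps=100, seed=0):
    """Average correlation between k-item simulated test scores and the
    criterion, where each simulated test draws k items with replacement."""
    rng = random.Random(seed)
    n_items = len(responses[0])
    rs = []
    for _ in range(reps):
        items = [rng.randrange(n_items) for _ in range(k)]
        scores = [sum(row[i] for i in items) for row in responses]
        rs.append(pearson(scores, criterion))
    return sum(rs) / reps


if __name__ == "__main__":
    responses, criterion = simulate_data()
    for k in (5, 20, 50, 200):
        r = mean_simulated_correlation(responses, criterion, k)
        print(f"{k:>3} items: mean r = {r:.2f}")
```

In the study, one such matrix per modality would be resampled and the resulting average correlations compared across modalities; the sketch shows only the resampling-and-correlation step for a single synthetic pool.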
Stoeckel, T., Stewart, J., McLean, S., Ishii, T., Kramer, B., & Matsumoto, Y. (2019). The relationship of four variants of the Vocabulary Size Test to a criterion measure of meaning recall vocabulary knowledge. System. Advance online publication. https://doi.org/10.1016/j.system.2019.102161 System, 2019
(This paper can be accessed until December 13, 2019 at https://authors.elsevier.com/a/1ZyJQ,7tt9xxGe.)
The Vocabulary Size Test (VST) was designed to measure the vocabulary needed for reading. Recent research, however, has questioned the “meaning-recognition” construct measured by the VST, arguing that “meaning-recall” is a more accurate estimate of reading vocabulary. The present study compared four variants of the VST to determine which, if any, could be used as an expedient proxy for estimating meaning-recall knowledge. Two hundred Japanese university students completed a criterion meaning-recall measure of VST target words and one of four randomly assigned VST variants: monolingual, monolingual with an “I don’t know” option (IDK), bilingual, or bilingual with IDK. The bilingual+IDK variant (r = .77) had a significantly lower correlation with the meaning-recall measure than the other three versions (r = .88 to .91). The lower r-value for the bilingual+IDK version appears to have been caused by pronounced differences in IDK use among learners who sat that version of the test. The study concludes that the other three variants could effectively be used to rank or group learners by meaning-recall knowledge. However, for estimates of reading vocabulary size, measures of meaning-recall should be used, or raw VST scores need to be adjusted to account for differences between VST and meaning-recall scores.
Applied Linguistics, 2018
An important gap in the field of second language vocabulary research concerns the ability of Asian learners of English as a Foreign Language (EFL) to comprehend inflectional and derivational word family members. Japanese EFL learners (N = 279) were divided into three lexical proficiency groups, and their ability to comprehend inflectional and derivational English forms was measured with an English to Japanese translation test. A significant difference among the participants' ability to comprehend 12 base forms, associated inflected forms, and associated derived forms was found across the three proficiency groups, and even among participants who demonstrated mastery of the first 4,000 or 5,000 base forms of English. The flemma, a word's base form and associated inflectional forms, was found to be an appropriate word counting unit for most participants. Results are important because corpus research findings demonstrate that in cases where the word family provides 98 per cent coverage of texts, the flemma only provides 85 per cent coverage of the same texts. Thus, considering the detrimental impact to reading comprehension from only small decreases in the percentage of known tokens within a text, the results question the inferences made in word family-based research.
System, 2017
Few studies have examined the development of foreign language learners’ reading rates through extensive reading, and those conducted to date have methodological limitations with regard to their research design or interpretation of results. To address these limitations, this study investigated the impact of extensive reading and grammar-translation on reading rate development using an experimental research design with evidence that time spent conducting the respective treatments was similar. First-year Japanese university students (N = 50) were randomly assigned to one of two treatment groups. To measure reading rate improvements over an academic year, pre- and post-treatment reading rate measurements were used where comprehension was maintained above 70%. The between-groups analysis revealed that the extensive reading group participants (n = 23) increased their reading rate significantly relative to the grammar-translation group participants (n = 27). This study provides evidence of both the effectiveness and efficiency of developing reading rates through extensive reading relative to traditional reading instruction with grammar-translation exercises. Pedagogical implications include allocating more time for extensive reading and questioning the value of the grammar-translation approach. In addressing the call for stronger evidence than quasi-experimental studies, this research demonstrates that classroom-based experimental reading studies which control for time-on-task are feasible.
Language Assessment Quarterly, 2017
Stewart questioned the vocabulary size estimation methods proposed by Beglar and Nation for the Vocabulary Size Test, further arguing that Rasch mean square (MSQ) fit statistics cannot determine the proportion of random guesses contained in the average learner’s raw score, because the average value will be near 1 by design. He illustrated this by demonstrating that it holds even for entirely random data. Holster and Lake appear to misinterpret this as a claim that Rasch analyses cannot distinguish random data from real responses. To test this, they compare real data to random data and note that, predictably, the statistic easily distinguishes the two and that reliability for random data is near zero. However, while certainly true, this fact is not relevant to Stewart’s argument that multiple-choice options inflate the test’s size estimates and that MSQ fit statistics cannot be used to detect this. We further illustrate this by showing that real data retain average MSQ values near 1, even when unknown items skipped by learners are imputed with random guesses. Furthermore, the imputed data do not exhibit “problematic guessing” under Holster and Lake’s own criteria, despite size inflation under Beglar and Nation’s suggested scoring. We conclude by discussing uses of the 3PL model.
TESOL Quarterly, 2016
The Vocabulary Size Test (VST) (Nation & Beglar, 2007) has attracted considerable attention in studies of second language acquisition. Designed to estimate the overall written receptive vocabulary size of English language learners, the test was created for a variety of pedagogical purposes—to guide syllabus design, inform decisions regarding extensive reading and vocabulary instruction, and monitor lexical growth over time (Beglar, 2010; Nation, 2012).
Because VST scores can be inflated by blindly guessing unknown words under its multiple-choice format (Stewart, 2014), the inclusion of an “I don't know” (IDK) answer choice has been explored as a means of achieving more accurate estimates of vocabulary size. Using this convention, researchers have reported both reduced guessing and improved estimates of reliability (Lucovich, 2014; Zhang, 2013). It has also been observed, however, that when faced with unknown words some learners are more likely than others to use IDK (Bennett & Stoeckel, 2012; Zhang, 2013). This means that examinees with the same vocabulary knowledge could achieve significantly different test scores, rendering the instrument ineffectual for many of the purposes listed above. Although use of IDK improves reliability, it is unclear whether the resultant scores correlate with actual vocabulary knowledge as well as scores without IDK do, as this has gone unreported in previous studies. To address this gap, the present study systematically explores the relationships between actual vocabulary knowledge, test scores, and estimates of reliability for the VST with and without the IDK answer choice.
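The inflation mechanism and the IDK problem described above can be expressed as a simple expectation. Assuming a learner answers every known item correctly, scores zero on IDK responses, and guesses blindly among n options on the remaining unknown items (no partial knowledge or elimination strategies), the expected raw score is known + (1 - idk_rate) × unknown / n. The function below is a hypothetical illustration of that arithmetic, not part of the VST or its scoring guidelines.

```python
def expected_raw_score(known, n_items, n_options, idk_rate=0.0):
    """Expected raw score on a multiple-choice test, assuming every known
    item is answered correctly, IDK responses score zero, and the learner
    guesses blindly among n_options on unknown items not marked IDK."""
    if not 0 <= known <= n_items:
        raise ValueError("known must lie between 0 and n_items")
    if not 0.0 <= idk_rate <= 1.0:
        raise ValueError("idk_rate must lie between 0 and 1")
    unknown = n_items - known
    return known + (1 - idk_rate) * unknown / n_options


# Two hypothetical learners, each knowing 40 of 100 four-option items.
print(expected_raw_score(40, 100, 4))                # 55.0 (never uses IDK)
print(expected_raw_score(40, 100, 4, idk_rate=1.0))  # 40.0 (always uses IDK)
```

Identical knowledge yields a 15-point difference in expected score depending only on IDK use, which is why differential IDK use can undermine score comparisons between examinees.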
Language Assessment Quarterly, 2016
The article investigated how the inclusion of loanwords in vocabulary size tests affected the test scores of two L1 groups of EFL learners: Hebrew and Japanese. New BNC- and COCA-based vocabulary size tests were constructed in three modalities: word form recall, word form recognition, and word meaning recall. Depending on the test modality, the tests measured the knowledge of 8,000 lemmas or word families through 80 randomly sampled items, 6 of which were loanwords in Hebrew and 13 in Japanese. Therefore, we added the same number of non-loanwords from corresponding frequencies and performed within-subject comparisons between the scores of the original tests with loanwords and their non-loanword versions in which non-loanwords replaced loanwords. The comparisons were done for each L1 group, at each test modality, and at three L2 proficiency levels, as defined by the total non-loanword test score. We also compared the two L1 groups on the degree of loanword effect. In both L1 groups, tests with loanwords yielded significantly higher scores in all test modalities and among most proficiency groups. Less able participants gained more from the presence of loanwords. However, loanwords differently influenced the size estimates of the two L1 groups. Implications are suggested for creating vocabulary size tests and making inferences from vocabulary test data.
Uploads
International papers with Impact factor by Stuart McLean
exclusively among meaning-recognition item types. This article reports on two exploratory studies, each comparing a fully-contextualized and a non-contextualized meaning-recall variant for one specific testing purpose: coverage-comprehension research. The fully-contextualized test
utilized the same 622-word passage in each study. In the non-contextualized tests, target words appeared in short, non-defining sentences; in Study A, the elicited response was a translation of only the target item while in Study B, it was the entire prompt sentence. Scores on the compared tests differed significantly only in Study A. In both studies, the consistency with which the
compared item formats yielded the same outcome (correct or incorrect) when the same target word was encountered by the same learner was rather low. The provision of relatively authentic context sometimes seemed to aid lexical inferencing, but other times it increased task difficulty relative to the limited-context formats. These findings suggest that different meaning-recall formats could lead to different conclusions regarding knowledge of specific words, and this could impact coverage-comprehension research findings.
assessment of L2 written receptive vocabulary knowledge. However, to date, there has been
relatively little research regarding which modalities of vocabulary knowledge have the strongest
correlations to reading proficiency, and observed differences have often been statistically
non-significant. The present research employs a bootstrapping approach to reach a clearer
understanding of relationships between various modalities of vocabulary knowledge to reading
proficiency. Test-takers (N = 103) answered 1000 vocabulary test items spanning the third 1000
most frequent English words in the New General Service List corpus (Browne, Culligan, & Phillips,
2013). Items were answered under four modalities: Yes/No checklists, form recall, meaning recall,
and meaning recognition. These pools of test items were then sampled with replacement to create
1000 simulated tests ranging in length from five to 200 items and the results were correlated to
the Test of English for International Communication (TOEIC®) Reading scores. For all examined
test lengths, meaning-recall vocabulary tests had the highest average correlations to reading
proficiency, followed by form-recall vocabulary tests. The results indicated that tests of vocabulary
recall are stronger predictors of reading proficiency than tests of vocabulary recognition, despite
the theoretically closer relationship of vocabulary recognition to reading.
The Vocabulary Size Test (VST) was designed to measure the vocabulary needed for reading. Recent research, however, has questioned the “meaning-recognition” construct measured by the VST, arguing that “meaning-recall” is a more accurate estimate of reading vocabulary. The present study compared four variants of the VST to determine which, if any, could be used as an expedient proxy for estimating meaning-recall knowledge. Two hundred Japanese university students completed a criterion meaning-recall measure of VST target words and one of four randomly-assigned VST variants: monolingual, mono-lingual with an “I don’t know” option (IDK), bilingual, or bilingual with IDK. The bilingual+IDK variant (r = .77) had a significantly lower correlation with the meaning-recall measure than the other three versions (r = .88 to .91). The lower r-value for the bilingual+IDK version appears to have been caused by pronounced differences in IDK use among learners who sat that version of the test. The study concludes that other variants could effectively be used to rank or group learners by meaning-recall knowledge. However, for estimates of reading vocabulary size, measures of meaning-recall should be used, or raw VST scores need to be adjusted to account for differences between VST and meaning-recall scores.
Because VST scores can be inflated by blindly guessing unknown words under its multiple‐choice format (Stewart, 2014), the inclusion of an I don't know (IDK) answer choice has been explored as a means of achieving more accurate estimates of vocabulary size. Using this convention, researchers have reported both reduced guessing and improved estimates of reliability (Lucovich, 2014; Zhang, 2013). It has also been observed, however, that when faced with unknown words some learners are more likely than others to use IDK (Bennett & Stoeckel, 2012; Zhang, 2013). This means that examinees with the same vocabulary knowledge could achieve significantly different test scores, rendering the instrument ineffectual for many of the purposes listed above. Although use of IDK improves reliability, it is unclear whether the resultant scores correlate to actual vocabulary knowledge as well as scores without IDK, as this has gone unreported in previous studies. To address this gap, the goals of the present study are to systematically explore the relationships between actual vocabulary knowledge, test scores, and estimates of reliability for the VST with and without the IDK answer choice.
assessment of L2 written receptive vocabulary knowledge. However, to date, there has been relatively little research on which modalities of vocabulary knowledge correlate most strongly with reading proficiency, and observed differences have often been statistically non-significant. The present research employs a bootstrapping approach to reach a clearer understanding of the relationships between various modalities of vocabulary knowledge and reading proficiency. Test-takers (N = 103) answered 1,000 vocabulary test items spanning the third 1,000 most frequent English words in the New General Service List corpus (Browne, Culligan, & Phillips, 2013). Items were answered under four modalities: Yes/No checklists, form recall, meaning recall, and meaning recognition. These pools of test items were then sampled with replacement to create 1,000 simulated tests ranging in length from five to 200 items, and the results were correlated with Test of English for International Communication (TOEIC®) Reading scores. For all examined test lengths, meaning-recall vocabulary tests had the highest average correlations with reading proficiency, followed by form-recall vocabulary tests. The results indicate that tests of vocabulary recall are stronger predictors of reading proficiency than tests of vocabulary recognition, despite the theoretically closer relationship of vocabulary recognition to reading.
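The resampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's analysis code: responses are assumed to be scored dichotomously (1 = correct) in a hypothetical test-taker-by-item matrix for one modality, a simulated test score is the number of sampled items answered correctly, and correlations from the simulated tests are averaged directly. The function names `pearson` and `bootstrap_mean_r` are hypothetical.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_mean_r(responses, criterion, test_length, n_tests=1000, seed=0):
    """Average correlation between simulated-test scores and a criterion.

    responses[p][i] is 1 if test-taker p answered item i correctly, else 0
    (hypothetical layout, one modality at a time). criterion[p] is that
    test-taker's criterion score, such as a reading score.
    """
    rng = random.Random(seed)          # seeded so results are reproducible
    n_items = len(responses[0])
    rs = []
    for _ in range(n_tests):
        # Draw item indices with replacement to build one simulated test.
        items = [rng.randrange(n_items) for _ in range(test_length)]
        scores = [sum(row[i] for i in items) for row in responses]
        if len(set(scores)) < 2:       # no score variance: r is undefined, skip
            continue
        rs.append(pearson(scores, criterion))
    return mean(rs) if rs else float("nan")
```

Calling `bootstrap_mean_r` once per modality and per test length (for example, from 5 to 200 items) and comparing the returned averages would give a comparison of the kind summarized above. Averaging Fisher z-transformed correlations instead of raw r values is a common alternative design choice.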
summer vacation represents a threat to instructed language acquisition remains unclear. In a previous study, Kramer et al. (2019) examined the attrition of receptive vocabulary knowledge over summer vacation, found no evidence of attrition using these receptive measures, and called for future research to instead use tests of productive vocabulary knowledge, which is more likely to be forgotten. Therefore, in this study, we investigate the amount of summer attrition among Japanese university students (N = 81) and the extent to which that attrition is mediated by digital paired-associate vocabulary studying, extensive reading, or experience traveling abroad. The results indicate that although there was no significant group difference in pre- and post-test productive vocabulary scores, a small but significant relationship was found between digital paired-associate vocabulary studying and vocabulary test score gains.
supported by research or are based on studies with important limitations. The first is that a vocabulary size, rather than a level, can be used to match learners with lexically appropriate materials, even though neither the test creators nor the research supports this. The second is that the word family (WF6) is an appropriate definition of the lexical unit if learners know at least 5,000 WF6s. The available evidence suggests that, for such learners, knowledge of derivational forms is limited enough to result in learners being incorrectly matched to pedagogical materials (McLean, 2018). Additionally, foreign language learners who know 5,000 WF6s are rare. The third is that derivational forms are infrequent enough that knowledge of only a few affixes will support comprehension. This inference results from Laufer and Cobb’s (2020) analysis, which has major limitations.
We are sincerely thankful for Laufer’s interest in McLean’s 2021 publication and for discussing the recent commentary regarding the limitations of levels and size tests (Stewart et al., 2021; Stoeckel et al., 2021; Webb, 2021). We hope readers will carefully read all of these works and evaluate the validity of the arguments based on the evidence presented.
Japanese society has reached an unprecedented level of aging, with elderly people accounting for 25.1% of the population in October 2013. These changes have created concerns regarding deaths among the elderly. In this study, we compared recent forensic autopsy cases with cases from about 20 years ago, with the goal of understanding the context of death among the elderly within Japanese society today.
Methods
We investigated the forensic autopsy records of 297 people aged 65 years or above. In order to examine the effect of residential circumstances, we classified these cases into two groups: people who lived alone (group A) and those who lived with their family (group B). Forty-five of these autopsy cases were conducted about 20 years ago (1989 to 1993) and 252 cases were recent (2009 to 2013). The cases were limited to people who had been found dead or in a critical situation at home. We investigated the first finder, the period of time elapsed between death and discovery, and the cause of death.
Results
The incidence of the first finder being a family member was more than 20% greater in group B than in group A. The proportions of cases in which it took more than three days for someone to find the body or an abnormal situation were about 14% and 7% in groups A and B, respectively, 20 years ago, and about 48% and 19% among the recent cases. These proportions were significantly higher among the recent cases. Among recent cases, an elapsed postmortem time of more than three days occurred more often in group A than in group B (p = 0.0002). None of the older cases had an unknown cause of death in either group. However, among the recent cases from both groups, 20–30% of cases had an unknown cause of death. The incidence of unknown causes of death was significantly higher among the recent cases in both groups (p = 0.015) and in group B alone (p = 0.037). The incidence of murder was significantly lower in group B among the recent cases (p = 0.0022).
Discussion
Elderly people who live alone are not easily found or aided when in critical situations, and they may only be discovered after death. Prolongation of the postmortem interval (PMI) leads to deterioration of the body, which makes determining the cause of death problematic. The results of this study suggest that three factors isolate elderly people and increase the difficulty of determining their cause of death: reduced communication with family members, reduced communication with neighbors or the community, and the increasing prevalence of the nuclear family. In group B, the prolonged discovery time and the increased incidence of unknown causes of death suggest reduced communication with family members, even though the incidence of being found by a family member was higher than in group A. The murder rate was significantly lower in group B, which may suggest that cases of domestic murder were overlooked. Supporting a safe life and a peaceful end of life for the elderly requires a system built on three components: remote monitoring to ensure safety, the establishment of elderly groups providing mutual support, and increased visits from welfare workers. Understanding the circumstances of elderly people who die alone is beneficial to countries that face an aging society with weakened family or community structures and that hope to provide better support for the elderly.
In the field of forensic medicine, it is very difficult to know prior to autopsy what kind of virus has infected a body.
Objective
We assessed the potential of the genome profiling (GP) method, which was developed in the field of bioengineering, to identify viruses at the species level.
Method
Two species in the same family, JC and BK viruses, were used in this study. Using plasmid samples, we compared the findings of molecular phylogenetic analysis using conventional genome sequencing with the results of cluster analysis using the random PCR-based GP method, and discussed whether the GP method can be used to determine viral species.
Results
It was possible to distinguish these two viral species. In addition, in our trial we were also able to detect JC virus in a clinical sample.
Conclusion
This method does not require special reagent sets for each viral species. Although our findings are still preliminary, the GP method may become a simple, easy, and economical tool for detecting viral species in the near future.
of a second language learner’s written receptive vocabulary size, measuring knowledge of words from the fourteen most frequent 1,000-word-family levels of the spoken subsection of the British National Corpus. While some have recommended that users limit the test to levels only slightly above a student’s level, others argue that learners should take every level of the test. However, the latter practice raises concerns that correct responses on lower-frequency levels could largely be attributed to guessing rather than vocabulary knowledge. In this paper, we analyze a data set of 3,373 Japanese university students’ responses to the first eight levels of the original VST under the 3PL model in order to determine the minimum expected score on the test for learners of low ability, examine the proportion of low-level students’ scores on the lowest-frequency level tested that can be attributed to guessing under the 3PL model, and conduct a model fit comparison to determine whether the 3PL model offers a significantly better description of the data than the Rasch model. The results indicate that a substantial portion of lower-level learners’ scores on items testing low-frequency words can be attributed to guessing, and they support the position that students should not sit every level of the test. The authors recommend using the results of the 3PL analysis to determine which sections of the test learners of different proficiency levels should sit.
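Under the 3PL model, the probability of a correct response is P(θ) = c + (1 − c) / (1 + exp(−a(θ − b))), where the pseudo-guessing parameter c sets a lower asymptote, so the expected score of a very low-ability examinee approaches the sum of the c values rather than zero. The sketch below shows how a minimum expected score and a guessing share could be computed from item parameters. It is an illustration, not the authors' code: the item parameters are hypothetical, the scaling constant D = 1.7 is omitted, and the guessing share rests on one common reading of the 3PL in which an examinee either knows the item (with the 2PL probability p*) or, failing that, guesses correctly with probability c.

```python
import math


def logistic(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response (no D = 1.7 scaling constant)."""
    return c + (1.0 - c) * logistic(a * (theta - b))


def expected_score_3pl(theta, items):
    """Expected number correct at ability theta; items are (a, b, c) tuples."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)


def min_expected_score(items):
    """Limit of the expected score as theta goes to -infinity: the sum of c."""
    return sum(c for _, _, c in items)


def guessing_share(theta, items):
    """Share of the expected score attributed to guessing.

    Uses P = p* + c * (1 - p*), where p* is the 2PL 'knows the item'
    probability and c * (1 - p*) is the probability of a lucky guess.
    """
    known = guessed = 0.0
    for a, b, c in items:
        p_star = logistic(a * (theta - b))
        known += p_star
        guessed += c * (1.0 - p_star)
    return guessed / (known + guessed)


if __name__ == "__main__":
    # Ten hypothetical low-frequency items with four answer choices (c = 0.25).
    items = [(1.2, 2.0, 0.25)] * 10
    print(min_expected_score(items))
    for theta in (-2.0, 0.0, 2.0):
        print(theta, expected_score_3pl(theta, items), guessing_share(theta, items))
```

Applied to estimated parameters for one test level, the two functions give the kind of quantities the abstract describes: a floor on expected scores for low-ability learners and the fraction of their expected score that the model attributes to guessing.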
Vocabulary Size Test (VST) usually takes 40–60 minutes. As a result, teachers would benefit from being able to make reasonable estimates from commonly available information. This paper investigates two questions: (1) What are the mean vocabulary sizes of students at Japanese universities, both as a whole and by university department standardized rank score (hensachi)? (2) Is a university department’s hensachi a useful proxy for English vocabulary size? This study used a cross-sectional design in which 3,449 Japanese university students were tested using Nation and Beglar’s VST. The results showed a mean score of 3,715.20 word families, and VST scores were significantly higher for students in programs with higher department hensachi. Current department hensachi was also found to correlate more strongly with VST scores than the other covariates did when the entire sample was considered. Lastly, there appears to be a lack of consistent knowledge of the most frequent words of English, suggesting that curriculum designers at Japanese universities should focus on teaching high-frequency English words. Although the findings support the use of the VST for comparing receptive written vocabulary knowledge between learners, they may not support its use in establishing a vocabulary size for selecting lexically appropriate materials.
rates. While the rare inclusion of a delayed posttest is a strength of the research methodology, Chang (2012) incorrectly argues that participants comprehended the TR instrument despite clear evidence to the contrary. This is critical because, in any study, the validity of an author’s inferences and conclusions depends on the appropriate use of measurement instruments. It is hoped that highlighting limitations not stated in Chang (2012) and suggesting solutions will improve the reliability of future research and, in turn, strengthen the arguments for including TR activities in the classroom.
accurate results, because they avoid conflating vocabulary knowledge with ability to decode answer choices in the L2. However, they have received little scrutiny beyond initial piloting and may therefore benefit from further examination and refinement (Nguyen & Nation, 2011). This paper describes the revision of the first eight 1,000-word frequency bands of the Japanese bilingual VST with the goal of increasing the test’s unidimensionality and accuracy. The revisions (a) removed English loanwords from the answer choices to prevent examinees from correctly responding through phonological matching alone, (b) ensured that the parts of speech of each answer choice were identical, and (c) matched the lengths of answer choices.
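Revision criteria (b) and (c), along with a rough screen for (a), lend themselves to an automated check over a draft item bank. The sketch below is hypothetical throughout: the `Option` structure, the two-character length tolerance, and the use of katakana as a loanword signal are assumptions made for illustration, not the procedure the revision actually used. Katakana is only a heuristic, because not every katakana word is an English loanword, so flagged options would still need human review.

```python
from dataclasses import dataclass


@dataclass
class Option:
    text: str  # Japanese gloss shown as an answer choice
    pos: str   # part-of-speech label, e.g. "noun"


def contains_katakana(s: str) -> bool:
    """True if s contains any character from the Katakana block (U+30A0 to U+30FF)."""
    return any("\u30a0" <= ch <= "\u30ff" for ch in s)


def check_item(options: list[Option], max_len_gap: int = 2) -> list[str]:
    """Return a list of problems found among one item's answer choices."""
    problems = []
    # (b) every answer choice should share one part of speech
    if len({o.pos for o in options}) > 1:
        problems.append("mixed parts of speech")
    # (c) answer-choice lengths (in characters) should be similar
    lengths = [len(o.text) for o in options]
    if max(lengths) - min(lengths) > max_len_gap:
        problems.append("answer-choice lengths differ too much")
    # (a) rough loanword screen: flag katakana glosses for human review
    for o in options:
        if contains_katakana(o.text):
            problems.append(f"possible loanword: {o.text}")
    return problems


if __name__ == "__main__":
    draft = [Option("テーブル", "noun"), Option("つくえ", "noun"),
             Option("はしる", "verb"), Option("いす", "noun")]
    print(check_item(draft))
```

For the draft item in the usage example, the check reports mixed parts of speech and flags テーブル as a possible loanword, while the length spread (two to four characters) stays within the assumed tolerance.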
I also recommend using a meaning-recall test rather than a multiple-choice test, because meaning recall is closer to the target construct.