Applied Linguistics-2013-Jiang-1-24
Measurements of Development in
L2 Written Production: The Case of
L2 Chinese
WENYING JIANG
The University of Queensland, Australia
E-mail: [email protected]; [email protected]
INTRODUCTION
Larsen-Freeman (1977, 1978a, 1978b, 1983) demonstrated the need for an
objective and precise index of L2 development. Such an index would benefit
at least three groups of people: researchers who “could report a much more
precise, objective description of their subjects’ L2 proficiency than what the
labels ‘beginning’, ‘intermediate’, and ‘advanced’ currently allow” (1983:
287); language programme administrators who “would obtain a reliable
means of placing L2 learners in classes appropriate to the learners’ level of
proficiency” (1983: 287); and L2 teachers who “stand to gain if such an
index could be constructed, since they might then possess a way of measuring
any change in overall proficiency of their students over the course of a term”
(1983: 287). Wolfe-Quintero et al. (1998) further elucidate below the usefulness
of an independent measure for L2 development:
For research purposes, developmental measures can provide information
on developmental level that allows comparability across
2 MEASUREMENTS OF DEVELOPMENT IN L2
number of T-units and words of a discourse; and then calculating mean words
per T-unit. Being “objective”, “instrument-free”, “reliable”, and “accessible”,
T-unit length was extended to measure L2 development shortly after it was
established as an index of children’s L1 development (Harrington 1986: 49).
L2 English, the most promising measures have been found to be the average
T-unit length (W/T), average error-free T-unit length (W/EFT), and percentage
of error-free T-units (EFT/T).
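As a minimal sketch of how these three ratios relate to one another, assume that each T-unit has already been segmented by hand and annotated with a word count and an error-free judgement. The `TUnit` record and the `t_unit_measures` function are hypothetical names introduced for illustration only; they are not part of the study, and the sample values are toy input rather than data from it.

```python
from dataclasses import dataclass


@dataclass
class TUnit:
    """One hand-segmented T-unit (hypothetical record, for illustration only)."""
    words: int        # number of words in the T-unit
    error_free: bool  # rater's judgement that the T-unit contains no error


def t_unit_measures(t_units):
    """Return (W/T, W/EFT, EFT/T as a percentage) for a list of TUnit records."""
    if not t_units:
        raise ValueError("at least one T-unit is required")
    error_free = [t for t in t_units if t.error_free]
    # W/T: total words divided by the number of T-units.
    w_per_t = sum(t.words for t in t_units) / len(t_units)
    # W/EFT: words in error-free T-units divided by the number of error-free
    # T-units; undefined (None) when no T-unit is error-free.
    w_per_eft = (sum(t.words for t in error_free) / len(error_free)
                 if error_free else None)
    # EFT/T: share of T-units that are error-free, as a percentage.
    eft_per_t = 100 * len(error_free) / len(t_units)
    return w_per_t, w_per_eft, eft_per_t


# Toy input, not data from the study: four T-units, two of them error-free.
sample = [TUnit(6, True), TUnit(8, False), TUnit(5, True), TUnit(9, False)]
print(t_unit_measures(sample))  # (7.0, 5.5, 50.0)
```

Note that W/EFT is computed over the error-free subset only, whereas W/T and EFT/T both use the full set of T-units as their denominator.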
These T-unit measures are also among the most satisfactory measures identified
by Wolfe-Quintero et al. (1998) after examining 39 studies in measuring
L2 development. Having classified measures that have been used in these
studies, Wolfe-Quintero et al. (1998) identified three major categories
corresponding to different aspects of development: (i) fluency; (ii) accuracy; and
(iii) complexity (both grammatical and lexical). They defined fluency as “the
rapid production of language”, accuracy “as error-free production”, and com-
Wolfe-Quintero et al. (1998) referred to the above measures that they identified
as “best measures of ‘development’ so far” (p. 119). This author has
summarized these measures in Table 1 below.
a Based on Wolfe-Quintero et al. (1998), this table was summarized by the current author.
W. JIANG 5
Among the measures in Table 1, two basic units emerged. One is T-unit and
the other is clause. Given that measures of fluency, grammatical complexity, lexical
complexity, and accuracy all employ T-unit as a basic unit, while only measures of
fluency and grammatical complexity employ clause as a basic unit, it appears that
T-unit is used more extensively than clause as a unit of analysis for measuring
L2 development. Polio (1997) has also found that T-unit is a sound measure of
L2 development in terms of accuracy. Specifically, she has found the measure
percentage of error-free T-units a reliable one for linguistic accuracy in L2
writing development after she compared three measures: holistic scale,
error-free T-units, and an error classification system. However, as Iwashita
(2006) states ‘‘the language studied is predominantly English, and little is
known about whether the findings of such studies can be applied to languages
(b) Should pinyin and/or English words, which learners use in their writing,
be counted or not?
(c) What constitutes an error-free T-unit in Chinese?
These three questions on the practicalities of applying T-unit analysis to
Chinese need to be answered before T-unit analyses in L2 Chinese can be
conducted.
the two clauses more coherent with each other, the wo jia and wo de should be
deleted. As shown in (6b), which is the corrected form of (6a), this sentence
comprises two error-free T-units that are grammatically correct and semantically
appropriate in context.
(6a)* ,
Wo jia you si ge ren. Wo jia you wo de baba wo de mama wo de didi he wo.
I family have four M person. I family have my father my mother my
brother and I
There are four people in my family. There are my father, my mother,
my brother and myself in my family.
METHOD
Participants
Students enrolled in the Chinese language program at The University
of Queensland in Australia participated in the study. They were from three
different proficiency levels: level 1 (first year, n = 30), level 2 (second year,
n = 53), and level 3 (third year, n = 33). They were all native
English-speaking Chinese L2 learners in their late teens or early twenties
with males and females of roughly the same proportion. Data were collected in
the middle of second semester for all three levels.
The level 1 students had no previous knowledge of the Chinese language at
all (true beginners) when they started learning Chinese. They received four
hours of teaching per week for 13 weeks in the first semester and seven weeks in
the second. For the written Chinese course, the first semester concentrated on
character recognition and production (about 280 characters) and on learning to
read simple dialogues. The objectives for the second semester included using a
dictionary, constructing sentences, translating short passages, and composing
short narrative passages.
were required to write a passage inviting a friend for dinner. Please see
Supplementary Appendix A ‘The writing task for Level 1’ for task directions.
Forty-two students attended the examination. Twelve students were found
to have heritage backgrounds other than English: four from Japan, three
from Korea, two from Malaysia, two from Thailand, and one from
Indonesia. In order to control for L1 influence on learners’ L2 production,
the author decided to exclude the written samples by non-native
English-speaking learners. Therefore, 30 valid written samples were collected.
Level 2: a letter-writing task involving inviting a friend to a dinner party
with specific requirements such as stating the time, date, address, and activities
Step 3: the data were tabulated on an SPSS data sheet and a one-way
MANOVA was run to measure the effect of proficiency level on W/T, W/EFT,
and EFT/T, followed by pair-wise comparisons for any significant results.
Inter-rater reliability
The author and a trained native-speaking rater independently analyzed 20%
of the data, namely, 36 randomly selected samples (182 × 20%): six from level
1, 10 from level 2, seven from level 3, and 13 from native level. The inter-rater
reliability was calculated respectively in relation to (i) number of T-units;
Level 1 37 41 35 89.7 40
Level 2 158 172 154 93.3 167
Level 3 109 117 104 92.0 115
Native 286 300 282 96.2 294
Total 590 630 575 94.3 616
a Percentage of agreement is calculated by dividing the number of identical T-units by
half of the sum of rater 1 and rater 2 T-units in each level. For example, level 1 = 35/[(37 + 41)/2]
= 89.7%. This applies in Tables 3 and 4.
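The agreement formula in this note can be checked against the rows of the table above. The sketch below assumes that the first three numeric columns are the rater 1 count, the rater 2 count, and the number of identical T-units, which is the reading implied by the level 1 example in the note; the function name `percentage_agreement` is a hypothetical label.

```python
def percentage_agreement(rater1, rater2, identical):
    """Identical T-units divided by half the sum of both raters' counts, in percent."""
    return 100 * identical / ((rater1 + rater2) / 2)


# (rater 1, rater 2, identical) T-unit counts, read from the table rows above.
# The column order is an assumption based on the worked level 1 example.
counts = {
    "Level 1": (37, 41, 35),
    "Level 2": (158, 172, 154),
    "Level 3": (109, 117, 104),
    "Native": (286, 300, 282),
    "Total": (590, 630, 575),
}

for level, (r1, r2, same) in counts.items():
    print(f"{level}: {percentage_agreement(r1, r2, same):.1f}%")
```

Under that reading, the computed percentages match the fourth column of the table (89.7, 93.3, 92.0, 96.2, and 94.3).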
Level 1 17 20 16 86.5 19
Level 2 93 103 91 92.9 98
Level 3 68 76 64 88.9 74
Native 286 300 282 96.2 294
Total 464 479 453 96.1 485
a MD stands for mean difference. It was calculated by subtracting the mean number of words
from the mean number of characters per T-unit in each level. For example, in level 1
MD = 9.83 − 6.67 = 3.16.
RESULTS
Words per T-unit (W/T) versus characters per T-unit (Ch/T)
The average T-unit lengths of the four levels were calculated by both W/T and
Ch/T. Table 5 presents a comparison of the two measures.
As shown in Table 5, the T-unit length by W/T is 6.67, 6.31, 7.20, and 8.79
while the T-unit length by Ch/T is 9.83, 9.07, 10.33, and 12.19 at each of the
four proficiency levels respectively. This indicates that T-unit length increases
from level 1 or level 2 to level 3 and on to native level, with level 1 falling
between level 2 and level 3, whether it is measured by W/T or Ch/T. The
average mean difference (MD) between Ch/T and W/T is 3.12. Both measures
present a similar pattern of L2 Chinese writing development.
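The per-level mean differences can be recomputed from the rounded means quoted in this paragraph. The following is only a sketch based on those rounded figures; the variable names are hypothetical. Working from the rounded values gives an average of about 3.11, slightly below the reported 3.12; the gap presumably comes from rounding of the underlying means, which is an assumption rather than something the text states.

```python
# Mean T-unit lengths quoted in the text, ordered level 1, level 2, level 3, native.
w_per_t = [6.67, 6.31, 7.20, 8.79]      # words per T-unit (W/T)
ch_per_t = [9.83, 9.07, 10.33, 12.19]   # characters per T-unit (Ch/T)

# MD for each level: mean characters minus mean words per T-unit.
md = [round(ch - w, 2) for ch, w in zip(ch_per_t, w_per_t)]
average_md = sum(md) / len(md)

print(md)
print(f"{average_md:.4f}")
```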
a 1, 2, 3, and 4 stand for Level 1, Level 2, Level 3, and Native
level, respectively.
wptu stands for words per T-unit = W/T; wdpeftu stands for
words per error-free T-unit = W/EFT; and pcteftu stands for
percentage of error-free T-units = EFT/T.
among the four proficiency levels. The three dependent variables are reported
below.
W/T
The average T-unit lengths (W/T) of the four levels from level 1 to native level
are 6.67, 6.31, 7.20, and 8.79, respectively. This shows a general increase from
level 1 or level 2 to level 3 and to native level. The differences between level 2
and level 3 and between level 3 and native level are both statistically significant
(p = 0.000). However, level 1 T-unit length is not statistically different from
that of level 2 (p = 0.110).
W/EFT
The mean error-free T-unit lengths of the four levels from level 1 to native level
are 6.75, 5.84, 6.76, and 8.79, respectively. It shows a general increase from
level 2 to level 3 and to native level. The differences between level 2 and level
3 and level 3 and native level are both statistically significant (p = 0.001).
a Adjustment for multiple comparisons: least significant difference (equivalent to no adjustments).
*The mean difference is significant at the .05 level. Based on estimated marginal means.
The average error-free T-unit length of level 1 is longer than that of level 2,
even close to that of level 3.
EFT/T
The percentages of error-free T-units (EFT/T) of the four levels from level 1 to
native level are 49.10, 56.28, 63.21, and 100, respectively. The percentage increases
from level 1 through to native level. Pair-wise comparisons show that the differences
in EFT/T between the four proficiency levels are all statistically significant
(p = 0.048, 0.049, 0.000, all <0.05). Therefore, the EFT/T discriminates
DISCUSSION
Having extended T-unit analysis to measure Chinese language development,
this study has found that
1 T-unit length measured by both W/T and Ch/T presents similar patterns
in L2 Chinese writing development, although Ch/T is longer than W/T.
2 T-unit length is found to be a valid measure in adult L1 Chinese when
the language users reach university stage.
3 T-unit length (W/T) and error-free T-unit length (W/EFT) increase from
level 2 to level 3 and to native level, with level 1 being similar to level 2
for W/T and similar to level 3 for W/EFT. The percentage of error-free
T-units (EFT/T) is found to discriminate between all four proficiency
levels.
Each of the findings is discussed below.
Chinese Ch/T measure in this study is closer in length to the W/T measure in
English and Japanese. Larsen-Freeman (1977, 1978a, 1978b, 1983) shows W/T
is all above 10 in English while Iwashita (2006) shows W/T is 11.45 for low
proficiency level and 13.45 for high proficiency level in L2 Japanese. The
Chinese Ch/T measure is also closer to Hunt’s (1976) results in L1 Chinese,
which might suggest that Ch/T was used in Hunt’s (1976) study. More research
is needed in order to explore whether a Chinese character is equivalent to a
syntactic word in other languages concerning T-unit length.
measuring fluency and EFT/T in measuring accuracy (Table 1). Given that fluency
is defined as “the rapid production of language” (Wolfe-Quintero et al. 1998:
117), a measure tapping fluency should involve time; that is, fluency should be
measured by a certain amount of language production per hour or per minute.
This also suggests that W/T and W/EFT do not measure fluency; instead, they
measure syntactic complexity, as Hunt (1965, 1976), Halleck (1995), Ortega
(2003), and Iwashita (2006) all clearly state. At the same time, W/EFT and
EFT/T measure accuracy as they both deal with error-free T-units, as explained
by Polio (1997).
With the underlying measurement constructs in mind, let us come back to
the results found in terms of the three specific T-unit measures. With regard to
W/T, level 1 and level 2 are not statistically different from each other, which
would mean at face value that the syntactic complexities of level 1 and level 2
are similar. With respect to W/EFT, level 1 is statistically different from level 2,
but statistically not different from level 3, which would mean at face value that
level 1 is more accurate than level 2, and is as accurate as level 3. As to EFT/T,
level 1, level 2, level 3, and native level are statistically all different, which
would mean that native level is more accurate than level 3, level 3 is more
accurate than level 2, and level 2 is more accurate than level 1.
Although both W/EFT and EFT/T are supposed to measure essentially
the same construct of accuracy, they surprisingly tell us two different stories.
W/EFT is telling us that the level 1 learners produced writings that were
significantly superior to the writings by level 2 learners and as good as the
writings produced by level 3 learners. At the same time, EFT/T is telling us the
opposite, namely that level 1 learners wrote significantly less accurately than
level 2 learners did. Based on the results, two questions emerge: (i) Why are W/T
and W/EFT of level 1 longer than those of level 2? (ii) Why would the two
measures (W/EFT and EFT/T) that involve error-free production, and therefore
measure accuracy in essentially similar ways, yield such different results?
One explanation why level 1 learners produced ‘long’ T-units and ‘long’
error-free T-units could be this: they remembered some of the sentences
from their lessons and reproduced them by rote when required to do so, since
they had not yet reached the stage where they could use their target language
creatively. In other words, a great portion of the sentences in level 1 learners’
written samples were likely reproduced by rote instead of being composed
creatively on their own. For example, the sentence Tianqi bu leng ye bu re (The
weather is not cold and not hot either) from the level 1 textbook was found in the
To answer the question why W/EFT and EFT/T tell us two different stories
regarding the same construct of accuracy for level 1 and level 2 learners, the
author has found that W/EFT can be misleading as this measure examines
error-free T-units only, which are just a portion of the writing sample. Imagine
a piece of written production of 100 T-units in which there is only one error-free
T-unit: the measure of W/EFT will be very misleading, as this one error-free
T-unit does not represent the whole piece of writing at all. Therefore, whether
W/EFT is a valid measure depends on the proportion of error-free T-units
among all T-units, that is, on how representative the error-free
T-units are in terms of T-units in total. When the error-free T-units well rep-
CONCLUSIONS
Among the three T-unit measures (W/T, W/EFT, and EFT/T), W/T is found to
be a valid measure in tapping syntactic complexity and EFT/T is found to be a
valid measure in tapping accuracy in Chinese L2 development. Caution needs
to be taken when W/EFT is employed as error-free T-units do not always
represent T-units in total.
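The caution about W/EFT can be made concrete with a small sketch of the thought experiment from the discussion: a sample of 100 T-units of which only one is error-free. All lengths below are invented purely for illustration and are not data from the study; `mean_length` is a hypothetical helper name.

```python
def mean_length(lengths):
    """Average number of words per T-unit for a list of T-unit lengths."""
    return sum(lengths) / len(lengths)


# Hypothetical sample (invented numbers): 99 T-units containing errors,
# each 5 words long, plus a single error-free T-unit of 12 words.
erroneous = [5] * 99
error_free = [12]
all_units = erroneous + error_free

w_per_t = mean_length(all_units)                      # W/T over the whole sample
w_per_eft = mean_length(error_free)                   # W/EFT over one T-unit only
eft_per_t = 100 * len(error_free) / len(all_units)    # EFT/T in percent

print(w_per_t, w_per_eft, eft_per_t)  # 5.07 12.0 1.0
```

In this invented case W/EFT is more than twice W/T, yet it rests on 1% of the T-units, which is why reading W/EFT alongside EFT/T seems advisable.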
Building on this study, the potential for future research is vast. Most pressing
is a controlled study that compares genres of writing tasks. This would control
for the differences a certain genre can make in T-unit length, also allowing a
comparison across genres (e.g. letter, picture description, and essay) in relation
to different proficiency levels, so as to allow more solid conclusions to
be drawn.
Moreover, there is a need to employ T-unit analysis in a longitudinal
study of L2 Chinese development, which directly measures language develop-
ment, so as to allow a comparison with the results obtained from
this cross-sectional study, which only indirectly measures language growth.
Such a study would need to follow a group of learners for a period of time
and collect data periodically, in order to see whether and how T-unit length
changes.
SUPPLEMENTARY DATA
Supplementary Data are available at Applied Linguistics online.
ACKNOWLEDGEMENTS
I would like to thank Dr Guy Ramsay, Dr Michael Harrington, Dr Noriko Iwashita and six
anonymous reviewers for very useful comments on earlier drafts of this article.
NOTES
REFERENCES
Bardovi-Harlig, K. 1992. ‘A second look at T-unit analysis: Reconsidering the sentence,’ TESOL Quarterly 26/2: 390–5.
Chao, D. 2000. ‘Promoting the study of the Chinese language in the early 19th century: “The Chinese Repository” as a resource,’ Journal of the Chinese Language Teachers Association 35/2: 91–110.
Chao, Y. R. 1968. A Grammar of Spoken Chinese. University of California Press.
Chu, C. C. 1998. A Discourse Grammar of Mandarin Chinese. Lang.
Cooper, T. C. 1976. ‘Measuring written syntactic patterns of second language learners of German,’ Journal of Educational Research 69/5: 176–83.
Dvorak, T. R. 1987. ‘Is written FL like oral FL?’ in B. VanPatten, T. R. Dvorak, and J. F. Lee (eds): Foreign Language Learning: A Research Perspective. Newbury House, pp. 79–91.
Ellis, R. 1994. The study of second language acquisition. Oxford: Oxford University Press.
Ellis, R. 2005. ‘Principles of instructed language learning,’ System 33: 209–224.
FDMC. 1986. Xiandai Hanyu Pinlü Cidian (Frequency Dictionary of Modern Chinese). Beijing Language Institute Press.
Gaies, S. J. 1980. ‘T-unit analysis in second language acquisition: Applications, problems and limitations,’ TESOL Quarterly 14/1: 53–60.
Ge, B. 2001. Xian dai han yu ci hui xue (Modern Chinese Lexicology). Shandong ren min chu ban she. Shandong People’s Press.
Goh, Y.-S. 1999. ‘Challenges of the rise of global Mandarin,’ Journal of the Chinese Language Teachers Association 34/3: 41–8.
Guo, Z. 2000. A Concise Chinese Grammar. Sinolingua.
Halleck, G. B. 1995. ‘Assessing oral proficiency: A comparison of holistic and objective measures,’ The Modern Language Journal 79: 223–34.
Harrington, M. 1986. ‘The T-unit as a measure of JSL oral proficiency,’ Descriptive and Applied Linguistics, Bulletin of the ICU Summer Institute in Linguistics 19: 49–56.
Henry, K. 1996. ‘Early L2 writing development: A study of autobiographical essays by university-level students of Russian,’ The Modern Language Journal 80: 309–26.
Ho, Y. 1993. Aspects of Discourse Structure in Mandarin Chinese. Mellen University Press.
Huang, J. and E. Hatch. 1978. ‘A Chinese child’s acquisition of English’ in E. Hatch (ed.): Second Language Acquisition: A Book of Readings. Newbury House, pp. 118–31.
Hunt, K. W. 1965. ‘Grammatical structures written at three grade levels’. NCTE Research Report No. 3. National Council of Teachers of English.
Hunt, K. W. 1970. ‘Recent measures in syntactic development’ in M. Lester (ed.): Readings in Applied Transformational Grammar. Holt, Rinehart and Winston, Inc., pp. 187–200.
Hunt, K. W. 1976. ‘Study correlates age with grammatical complexity,’ Linguistic Reporter 18/7: 3.
Hunt, K. W. 1977. ‘Early blooming and late blooming syntactic structures’ in C. R. Cooper and L. Odell (eds): Evaluative Writing: Describing, Measuring, Judging. National Council of Teachers of English.
Monroe, J. H. 1975. ‘Measuring and enhancing syntactic fluency in French,’ The French Review XLVIII/6: 1023–31.
Myles, F. 2004. ‘From data to theory: the over-representation of linguistic knowledge in SLA,’ Transactions of the Philological Society 102: 139–168.
Myles, F., J. Hooper, and R. Mitchell. 1998. ‘Rote or rule? Exploring the role of formulaic language in classroom foreign language learning,’ Language Learning 48/3: 323–363.
Myles, F., R. Mitchell, and J. Hooper. 1999.