Applied Linguistics-2013-Jiang-1-24


Applied Linguistics 2013: 34/1: 1–24 © Oxford University Press 2012

doi:10.1093/applin/ams019 Advance Access published on 6 July 2012

Measurements of Development in
L2 Written Production: The Case of
L2 Chinese

WENYING JIANG
The University of Queensland, Australia
E-mail: [email protected]; [email protected]

Downloaded from http://applij.oxfordjournals.org/ at University of Ottawa on May 9, 2014


This study investigates measures for second language (L2) writing development.
A T-unit, which has been found to be the most satisfactory unit of analysis for
measuring L2 development in English, is extended to measure L2 Chinese writing
development through a cross-sectional design in this study. Data were collected
from three L2 Chinese learner groups (n = 116) at different proficiency levels
determined by institutional status, namely year of study, and from a native control
group (n = 66). A T-unit in Chinese is first defined, and then solutions for questions
of practicality faced in extending T-unit analysis to Chinese are provided.
In order to confirm the reliability of T-unit length as a measure for Chinese,
T-unit analysis is applied to L1 Chinese before it is used to measure L2 Chinese
development. With T-unit length being established as a reliable measure in L1
Chinese, three specific T-unit measures, namely T-unit length, error-free T-unit
length, and percentage of error-free T-units, are extended to measure L2
Chinese writing development. Percentage of error-free T-units is found to
be the only measure that discriminates between all levels of this learner
cohort. Significance of the findings and relevance to measurements of L2 writing
development in general are discussed.

INTRODUCTION
Larsen-Freeman (1977, 1978a, 1978b, 1983) demonstrated the need for an
objective and precise index of L2 development. Such an index would benefit
at least three groups of people: researchers who ‘‘could report a much more
precise, objective description of their subjects’ L2 proficiency than what the
labels ‘beginning’, ‘intermediate’, and ‘advanced’ currently allow’’ (1983:
287); language programme administrators who ‘‘would obtain a reliable
means of placing L2 learners in classes appropriate to the learners’ level of
proficiency’’ (1983: 287); and L2 teachers who ‘‘stand to gain if such an
index could be constructed, since they might then possess a way of measuring
any change in overall proficiency of their students over the course of a term’’
(1983: 287). Wolfe-Quintero et al. (1998) further elucidate below the useful-
ness of an independent measure for L2 development:
For research purposes, developmental measures can provide information
on developmental level that allows comparability across
studies and target languages. Program levels are not comparable
across programs, and standardized test scores exclude the possibility
of comparisons when studies don’t have them readily available
(p. 126).
An objective measure for L2 Chinese development is particularly called for
because the number of L2 Chinese learners has been increasing over recent
years (Goh 1999). In the field of second language acquisition (SLA), the ma-
jority of empirical studies have been focused on English and some European
languages. Despite a large body of literature in this field, there remains a paucity
of studies on the acquisition of Chinese as a second/foreign language (L2) (Ko 1997). Few
studies on how to measure Chinese L2 development have been found. At
present, research in Chinese L2 acquisition does not match the increasing
demand to learn Chinese as an L2 and the need for understanding and charting
L2 Chinese development. It is therefore imperative that the most satisfactory L2
developmental measures in the literature be examined and extended to
measure Chinese L2 development.

THE ORIGIN AND DEFINITION OF A T-UNIT


Studies show that a T-unit is an objective unit of analysis in charting language
development. The notion of ‘T-unit’ was proposed by Hunt (1965, 1970) when
he was charting the growth of children’s syntactic complexity in English.
He defined a ‘T-unit’, namely ‘minimal terminal unit’ as
the shortest units into which a piece of discourse can be cut without
leaving any sentence fragments as residue. Thus a T-unit always
contains just one independent clause plus however many subordin-
ate clauses there are attached to the independent clause (1970:
188).
With this definition, Hunt (1965) also set some criteria to help demarcate
T-units in a discourse. According to him, coordinating conjunctions ‘and’,
‘but’, and ‘or’ in a compound sentence would go with the clause that follows
them, and punctuation errors or failure to use punctuation would be ignored if
the writing is intelligible. For example, the compound sentence My sister is a
nurse and she works in London is considered to be two T-units with five words
each, while the complex sentence, My sister who works in London is a nurse, is
considered to be one T-unit with nine words. Therefore, the average number
of words, that is, the length of a T-unit with embedded clause(s), is usually
greater than that of a T-unit of a simple sentence. As a result, T-unit length is
highly correlated with the syntactic complexity of the T-unit in question.
Specifically, the longer the mean T-unit length, the more syntactically
mature/complex the discourse is. T-unit length was thus established as an
index of children’s L1 development. The process of calculating mean T-unit
length is named T-unit analysis. As a way of measuring language development,
T-unit analysis involves segmenting T-units in a discourse; counting the
number of T-units and words of a discourse; and then calculating mean words
per T-unit. Being ‘‘objective’’, ‘‘instrument-free’’, ‘‘reliable’’, and ‘‘accessible’’,
T-unit length was extended to measure L2 development shortly after it was
established as an index of children’s L1 development (Harrington 1986: 49).
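
The calculation described above can be illustrated with a minimal sketch in Python. It assumes that T-units have already been demarcated by hand and that words are separated by spaces; the function name `mean_t_unit_length` is introduced here purely for illustration and is not part of Hunt's procedure.

```python
def mean_t_unit_length(t_units):
    """Return the mean number of words per T-unit.

    `t_units` is a list of strings, one per hand-segmented T-unit
    (an assumption of this sketch: segmentation is not automated).
    """
    if not t_units:
        raise ValueError("at least one T-unit is required")
    # Count whitespace-separated words in each T-unit.
    word_counts = [len(t_unit.split()) for t_unit in t_units]
    return sum(word_counts) / len(word_counts)


# The compound sentence from the text: two T-units of five words each.
compound = ["My sister is a nurse", "and she works in London"]
# The complex sentence from the text: one T-unit of nine words.
complex_sentence = ["My sister who works in London is a nurse"]

print(mean_t_unit_length(compound))          # 5.0
print(mean_t_unit_length(complex_sentence))  # 9.0
```

The longer mean for the complex sentence reflects the embedded clause, which is the sense in which T-unit length tracks syntactic complexity.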

REVIEW OF T-UNIT ANALYSIS FOR MEASURING L2 DEVELOPMENT
In L1 acquisition as mentioned above, T-unit length has been established as a
highly satisfactory index of oral as well as written language development
(Hunt 1965, 1970, 1977; O’Donnell et al. 1967; Mellon 1969; O’Hare 1973;
Loban 1976). In L2 acquisition, Cooper (1976) for German, Monroe (1975)
and Kern and Schultz (1992) for French, Dvorak (1987) for Spanish, Henry
(1996) for Russian, Harrington (1987) and Iwashita (2006) for Japanese, and a
number of studies including Gaies (1980), Halleck (1995), and
Larsen-Freeman and colleagues (1977, 1978a, 1978b, 1983) for English have
all found that T-unit length generally discriminates among L2 learners at dif-
ferent proficiency levels. Thornhill’s (1969) longitudinal study of four adult
Spanish ESL learners over a nine-week period also found ‘‘the mean length of
T-units a usable measure of development toward maturity in second language
production’’ (p. 37).
However, errors frequently occur in L2 learners’ language performance. As a
consequence, Scott and Tucker (1974), apart from measuring T-unit length,
employed slightly modified measures, error-free T-unit length and percentage of
error-free T-units in studying language performance of 22 Arabic ESL learners.
They found that the percentage of error-free T-units in written production
increased from 49.4% to 53.9%, while the percentage of error-free T-units
in oral production increased from 34.2% to 62.7% after 12 weeks of intensive
training. Taking account of errors in L2 acquisition, Larsen-Freeman (1978a)
also found that T-unit length alone ‘‘might not suffice for L2 acquisition’’
(p. 445) and that the error-free T-unit length and percentage of error-free
T-units were more powerful measures in L2 acquisition, which were thus
considered as more accurate indices of L2 development.
Larsen-Freeman conducted a series of studies (1977, 1978a, 1978b, 1983)
employing T-unit analysis for measuring L2 English development. Her 1977
study in cooperation with colleague Virginia Strom examined 48 compositions
written by UCLA ESL students representing different L1 backgrounds and
English proficiencies. Encouraged by the findings, Larsen-Freeman (1978a)
undertook a large-scale study of 212 compositions written by ESL students.
Encouraged again by the potential indices of L2 development investigated,
Larsen-Freeman extended her T-unit analysis in 1983 to analyzing oral data,
comparing controlled versus free writing samples, and examining the effect of
time by longitudinal data. Although the results from her studies are mixed and
no single T-unit measure has been established as the developmental index of
L2 English, the most promising measures have been found to be the average
T-unit length (W/T), average error-free T-unit length (W/EFT), and percentage
of error-free T-units (EFT/T).
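
As a rough illustration of how these three measures relate to one another, the following Python sketch computes W/T, W/EFT, and EFT/T for one writing sample. It assumes each T-unit has already been coded by a rater for its word count and for whether it is error-free; the `TUnit` class, the function name, and the sample values are hypothetical and are not taken from any of the studies cited.

```python
from dataclasses import dataclass


@dataclass
class TUnit:
    words: int        # number of words in the T-unit (hand-counted)
    error_free: bool  # rater's judgment (hypothetical coding)


def t_unit_measures(t_units):
    """Return (W/T, W/EFT, EFT/T) for one writing sample."""
    if not t_units:
        raise ValueError("at least one T-unit is required")
    error_free = [t for t in t_units if t.error_free]
    # W/T: mean words per T-unit.
    w_t = sum(t.words for t in t_units) / len(t_units)
    # W/EFT: mean words per error-free T-unit (undefined if there are none).
    w_eft = (sum(t.words for t in error_free) / len(error_free)
             if error_free else None)
    # EFT/T: proportion of T-units that are error-free.
    eft_t = len(error_free) / len(t_units)
    return w_t, w_eft, eft_t


# Invented sample: four T-units, two of them error-free.
sample = [TUnit(5, True), TUnit(9, False), TUnit(7, True), TUnit(11, False)]
print(t_unit_measures(sample))  # (8.0, 6.0, 0.5)
```

In this invented sample the error-free T-units are shorter than average, so W/EFT falls below W/T; the two length measures need not move together.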
These T-unit measures are also among the most satisfactory measures iden-
tified by Wolfe-Quintero et al. (1998) after examining 39 studies in measuring
L2 development. Having classified measures that have been used in these
studies, Wolfe-Quintero et al. (1998) identified three major categories corres-
ponding to different aspects of development: (i) fluency; (ii) accuracy; and
(iii) complexity (both grammatical and lexical). They defined fluency as ‘‘the
rapid production of language’’, accuracy ‘‘as error-free production’’, and complexity
‘‘as the use of varied and sophisticated structures and vocabulary’’
(p. 117). They concluded
The fluency measures that were consistently linear and significantly
related to program or school levels were words per T-unit (W/T),
words per clause (W/C) and words per error-free T-unit (W/EFT).
The grammatical complexity measures that were consistently linear
and significantly related to program or school levels were clauses
per T-unit (C/T) and dependent clauses per clause (DC/C). In add-
ition, two lexical complexity measures that were significantly
related to short-term change were a word type measure (WT/√2W)
and a sophisticated word type measure (SWT/WT), but
these measures were not investigated across program levels. The
accuracy measures that were significantly related to short-term
change and holistic judgments across a range of levels and within
intact classes were error-free T-units per T-unit (EFT/T) and errors
per T-unit (E/T). These measures were only consistently linear in
these contexts, but not across program or school levels (p. 119).

Wolfe-Quintero et al. (1998) referred to the above measures that they identified
as ‘‘best measures of ‘development’ so far’’ (p. 119). This author has
summarized these measures in Table 1 below.

Table 1: Most satisfactory measures of L2 developmentᵃ

Fluency                   W/T, W/C, W/EFT
Grammatical complexity    C/T, DC/C
Lexical complexity        WT/√2W, SWT/WT
Accuracy                  EFT/T, E/T

ᵃ Based on Wolfe-Quintero et al. (1998), this table was summarized by the current author.

Among the measures in Table 1, two basic units emerged. One is T-unit and
the other is clause. Given that measures of fluency, grammatical complexity, lexical
complexity, and accuracy all employ T-unit as a basic unit, while only measures of
fluency and grammatical complexity employ clause as a basic unit, it appears that
T-unit is used more extensively than clause as a unit of analysis for measuring
L2 development. Polio (1997) has also found that T-unit is a sound measure of
L2 development in terms of accuracy. Specifically, she has found the measure
percentage of error-free T-units a reliable one for linguistic accuracy in L2
writing development after she compared three measures: holistic scale,
error-free T-units, and an error classification system. However, as Iwashita
(2006) states, ‘‘the language studied is predominantly English, and little is
known about whether the findings of such studies can be applied to languages
that are typologically different from English’’ (p. 151). The potential utility of
T-unit analysis in measuring L2 Chinese needs to be explored. This study will
examine whether or not measures using T-units are reliable and valid meas-
urements of L2 Chinese development.

APPLYING T-UNIT ANALYSIS TO L2 CHINESE


This section firstly defines a Chinese T-unit with the units available for analyz-
ing languages. Then, solutions for practicalities in applying T-unit analysis to
L2 Chinese are provided. Finally, the specific goals of the T-unit analysis for
measuring L2 Chinese development in this study are stated explicitly.

Working definition of a Chinese T-unit


Hunt (1976) employed T-unit analysis in measuring L1 Chinese development.
However, no definition of a Chinese T-unit was found in his study, nor did he
indicate how to demarcate T-units in Chinese. No other studies applying T-unit
analysis in relation to the Chinese language have been found in the literature.
Deciding on a T-unit in Chinese discourse can be problematic because the
Chinese language persistently resists the boundaries of Hunt’s T-unit as
defined in English, due to the features of Chinese such as ‘‘the fuzziness of
sentence boundaries, thematic prominence, the frequent deletion of major
sentence elements, and the easiness with which forms are compromised to
accommodate meaning’’ (Ho 1993: v). Thus, it is necessary to establish some
rules as to what constitutes a T-unit in Chinese, which also do not distort the
basic definition of the T-unit in English by Hunt (1965).
The working definition of a T-unit in Chinese discourse given below draws
on Hunt’s definition of T-unit in English and Chu’s definition of clause in
Chinese. Chu (1998) defines a Chinese clause as ‘‘minimally consisting of a
predicate of various forms’’ (p. 354). Therefore, combining the two notions,
that is, T-unit in English and clause in Chinese, a Chinese T-unit is defined as
follows:
A single main clause that contains one independent predicate plus
whatever other subordinate clauses or non-clauses are attached to,
or embedded within, that one main clause.
According to this definition, a subject is not an essential component of a
Chinese T-unit (nor of a Chinese clause or sentence). This does justice to
the Chinese language because: (i) a subject tends to be deleted in Chinese
when it is predictable from the context and (ii) Chinese is a topic-prominent
language in that it is often the topic, not the subject, that is important in
constructing a Chinese sentence or discourse. Thus, a Chinese subjectless
simple sentence is considered as one Chinese T-unit. A compound sentence
with two self-standing clauses either connected by conjunction words such as
‘and’ or by a comma is considered as two T-units. Embedded clauses in complex
sentences are not counted as independent T-units. Examples (1), (2), and
(3) below demonstrate how T-units are demarcated in Chinese:
1
(1) (1 T-unit)
Wo jiejie jiao mali.
I elder sister call Mary.
My elder sister is called Mary.
(2) / / (3 T-units)
Wo jiejie jiao mali, / jin nian ershi sui, / zai Beijing shang daxue.
I elder sister call Mary, / this year twenty year, / at Beijing study
university.
My elder sister is called Mary, / (she is) twenty this year / and (she is)
studying at her university in Beijing.
(3) (1 T-unit)
Wo jiao mali de jiejie jin nian ershi sui.
I call Mary DE elder sister this year twenty year.
My elder sister who is called Mary is twenty this year.
Example (1) is a simple sentence. It comprises one T-unit. Example (2)
comprises three clauses. All three clauses could stand on their own if a
subject were added, despite the fact that there is no overt verb in the second
clause. So, Example (2) comprises three T-units. Example (3), a complex sen-
tence that comprises an embedded subject attributive clause, comprises only
one T-unit since embedded clauses are not considered independent T-units.

Practicalities in applying T-unit analysis to Chinese L2


With a T-unit having been defined in Chinese, ways of measuring T-unit
length need to be specified. The existing studies involving T-unit analysis
have employed the average number of words per T-unit to measure T-unit
length. In English, contractions such as he’d and isn’t were regarded as one
word by some researchers and as two words by others; compound nouns such
as snowball were also counted as one word by some researchers and as two
words by others (Vavra 2000). For the Chinese language, these issues are not
relevant. However, new issues arise:
(a) Should words or characters per T-unit be used to measure T-unit length
in Chinese?
(b) Should pinyin and/or English words, which learners use in their writing,
be counted or not?
(c) What constitutes an error-free T-unit in Chinese?
These three questions on the practicalities of applying T-unit analysis to
Chinese need to be answered before T-unit analyses in L2 Chinese can be
conducted.

Words per T-unit (W/T) versus characters per T-unit (Ch/T)


Studies of other languages employing T-unit analysis have all employed W/T
as a measure of T-unit length. Should words per T-unit also be used in Chinese?
Considering the uniqueness of the Chinese language, W/T is only one of the
options to indicate T-unit length. Pan (2002) and Pan et al. (1993) strongly
argue that the basic unit in Chinese is zi (character), just as a word is the basic
unit in English and other European languages. Chao (1968) recognized the
difference almost half a century ago. He called Chinese zi ‘‘sociological word’’
(p. 136), which he regarded as the Chinese sociological equivalent of the word
in English. Chao (1968) described zi or ‘sociological word’ as follows:
By the ‘sociological word’ I mean that type of unit, intermediate in
size between a phoneme and a sentence, which the general, non-
linguistic public is conscious of, talks about, has an everyday term
for, and is practically concerned with in various ways. It is the kind of
thing which a child learns to say, which a teacher teaches children to
read and write in school, which a writer is paid for so much per
thousand, which a clerk in a telegraph office counts and charges so
much per [sic], the kind of thing one makes slips of the tongue on,
and for the right or wrong use of which one is praised or blamed.
Thus it has all the social features of the common small change of
every day speech one would call a ‘word’ in English (p. 136).
Besides, Chao (1968) has also employed the following two examples to show
how zi (character) is deeply conceptualized and frequently used as a basic unit
in Chinese:
An illiterate person can, just as naturally as a literate person, say:
(4) ‘ ’
Ni gan shuo yi ge ‘bu’ zi.
You dare say one M2 ‘bu’ character.
(Don’t) you dare say the word ‘no’!
(5)
Ta dui na jian shi yi ge zi mei ti.
He towards that M matter one M character not mention.
He did not mention a single word about that matter. (p. 137)
One would tend to count zi (characters) in any written discourse in Chinese,
not only because zi is the basic unit in Chinese language but also because the
boundary of a Chinese syntactic word is rather unclear, since Chinese is written
without using any spaces or other word delimiters (except for punctuation
marks). A Chinese syntactic word is usually a bigram (two characters), but
may also be a unigram, a trigram, or a four-gram. According to the Frequency
Dictionary of Modern Chinese (FDMC 1986), among the top 9,000 most frequently
used words, 26.7% are unigrams, 69.8% are bigrams, 2.7% are trigrams,
0.007% are four-grams, and 0.0002 are five-grams. Apart from Chinese linguists,
common people cannot always tell (or do not find it necessary to do so)
the difference between a syntactic word and a phrase in Chinese. Therefore,
more justification will be needed if syntactic words are to be counted for
measuring T-unit length in Chinese.
To explore how W/T and Ch/T affect the average T-unit length, both measures
will be calculated and compared in this study to provide empirical evidence
for future reference. As to demarcating Chinese words, Jiang (2009) has
established the definition of a Chinese syntactic word, and detailed
information on identifying a Chinese word can be found there.
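
The difference between the two counts can be sketched in Python as follows. The sketch assumes that each T-unit has already been segmented into syntactic words by hand (the segmentation criteria of Jiang (2009) are not reproduced here), so that W/T is the number of words and Ch/T is the number of characters across those words. The Chinese characters and the segmentation in the sample are an illustrative reconstruction, not data from this study, and the function name is hypothetical.

```python
def t_unit_lengths(t_units):
    """Return (mean words per T-unit, mean characters per T-unit).

    `t_units` is a list of T-units, each given as a list of
    hand-segmented words (an assumption of this sketch).
    """
    n = len(t_units)
    if n == 0:
        raise ValueError("at least one T-unit is required")
    total_words = sum(len(words) for words in t_units)
    # Each Chinese character is one element of a Python string.
    total_chars = sum(len(word) for words in t_units for word in words)
    return total_words / n, total_chars / n


# Illustrative segmentation (hypothetical, for demonstration only).
sample = [
    ["我", "姐姐", "叫", "玛丽"],  # 4 words, 6 characters
    ["今年", "二十", "岁"],        # 3 words, 5 characters
]
w_per_t, ch_per_t = t_unit_lengths(sample)
print(w_per_t, ch_per_t)  # 3.5 5.5
```

Because most syntactic words contain more than one character, Ch/T will normally exceed W/T for the same text.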

Pinyin and/or English words in L2 Chinese discourse


The second issue is that L2 learners often produce some pinyin and/or
English words in their Chinese writing. Learners probably use these
non-character items to replace Chinese characters they
have difficulty with. Should these items be counted in Chinese T-unit analysis?
Given the fact that learners obviously know the syntactic structures of these
particular T-units and they use pinyin and/or English words as a communica-
tion strategy to convey the intended meaning, these non-character items
should be counted since they contribute to the T-unit length. However,
these T-units that contain pinyin and/or English words are considered as
T-units with errors as native speakers would not use them. In addition,
these items do not look natural among Chinese characters. What constitutes
an error-free T-unit in L2 Chinese then?
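
The counting rule just described (non-character items add to T-unit length, but any T-unit containing them is treated as erroneous) could be sketched in Python as below. This is only an illustration under stated assumptions: words are hand-segmented, a "Chinese character" is approximated by the basic CJK Unified Ideographs range U+4E00 to U+9FFF, and the function names and sample T-units are hypothetical. A T-unit that passes this check may still contain other errors, which only a rater can judge.

```python
def is_chinese_character(ch):
    # Basic CJK Unified Ideographs block only; rarer extension
    # blocks are ignored in this sketch.
    return "\u4e00" <= ch <= "\u9fff"


def contains_non_character_item(words):
    """True if any word contains a letter that is not a Chinese character
    (for example pinyin or an English word)."""
    return any(
        ch.isalpha() and not is_chinese_character(ch)
        for word in words
        for ch in word
    )


def code_t_unit(words):
    """Return (length in words, may still be error-free?) for one T-unit."""
    # Pinyin/English items count toward length but rule out error-free status.
    return len(words), not contains_non_character_item(words)


# Invented examples: the first T-unit mixes in an English word.
print(code_t_unit(["我", "喜欢", "吃", "pizza"]))  # (4, False)
print(code_t_unit(["我", "喜欢", "吃", "米饭"]))   # (4, True)
```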

Criteria for an error-free T-unit in Chinese


Whether a T-unit is error-free is judged both on its syntactic structure and on
its meaning in context in this study. Sometimes a T-unit can be grammatically
correct on its own, but becomes odd or incoherent in a discourse, namely
semantically inappropriate. Therefore, error-free T-units are the T-units that
are both grammatically correct and semantically appropriate in their contexts.
For example, failure to write a character correctly or failure to use the correct
word order constitutes an error due to ungrammaticality. Errors due to in-
appropriateness in context are usually referred to as discourse-related errors.
Example (6a) below constitutes one such error. In the context of (6a), the
items in bold, wo jia ‘my family’ and wo de ‘my’, are redundant. In order to make
the two clauses more coherent with each other, wo jia and wo de should be
deleted. As shown in (6b), which is the corrected form of (6a), this sentence
comprises two error-free T-units that are grammatically correct and semantic-
ally appropriate in context.
(6a)* ,
Wo jia you si ge ren. Wo jia you wo de baba wo de mama wo de didi he wo.
I family have four M person. I family have my father my mother my
brother and I
There are four people in my family. There are my father, my mother,
my brother and myself in my family.
(6b) / (2 T-units)
Wo jia you si ge ren, / you baba mama didi he wo.
I family have four M person, / have father mother brother and I
There are four people in my family, / they are (my) father, (my)
mother, (my) brother and myself.
With solutions provided for the three practical questions faced when applying
T-unit analysis to Chinese, T-unit analyses can now be conducted. Before that,
the specific goals of the T-unit analysis need to be explicitly set.

Specific goals of the T-unit analysis


Three basic goals are to be achieved in this study. Firstly, as an exploratory
study, W/T and Ch/T for measuring T-unit length are to be compared to pro-
vide reference for future studies. Secondly, whether T-unit length is a valid
measure in adult L1 Chinese needs to be examined. T-unit length in L1 should
remain comparatively stable when the language users reach their university
stage. Otherwise, T-unit length cannot be a valid measure for L2 development.
In other words, the validity of T-unit length in adult L1 serves as a prerequisite
for applying the measure to L2. Thirdly, the three most promising T-unit meas-
ures, namely average T-unit length (W/T), average error-free T-unit length
(W/EFT), and percentage of error-free T-units (EFT/T), are to be calculated
on a corpus of L2 Chinese written data in order to ascertain which will
prove to be the best measure.

METHOD
Participants
Students enrolled in the Chinese language program at The University
of Queensland in Australia participated in the study. They were from three
different proficiency levels: level 1 (first year, n = 30), level 2 (second year,
n = 53), and level 3 (third year, n = 33). They were all native
English-speaking Chinese L2 learners in their late teens or early twenties
with males and females of roughly the same proportion. Data were collected in
the middle of second semester for all three levels.
The level 1 students had no previous knowledge of Chinese language at
all (true beginners) when they started learning Chinese. They received four
hours of teaching per week for 13 weeks in the first semester and seven weeks in
the second. In the written Chinese course, the first semester concentrated on
character recognition and production (about 280 characters), and learning to read simple
dialogues. The objectives for second semester included using a dictionary, con-
structing sentences, translating short passages, and composing short narrative
passages.

The level 2 students had completed one year of Chinese instruction or
achieved equivalent proficiency as judged by the course convener. In their
second-year written Chinese course, they received three hours of teaching per
week for 13 weeks in the first semester and seven weeks in the second. The
main objectives included employing contextual and associational strategies to
manage unfamiliar language, translating passages from Chinese into English,
and composing narratives and letters in colloquial language.
The level 3 students had completed two years of Chinese instruction or
achieved equivalent proficiency as judged by the course convener. In their
third-year written Chinese course, they received three hours of teaching per
week for 13 weeks in the first semester and six weeks in the second. The
main objectives included using dictionaries skillfully, learning meanings of
radicals, using Chinese punctuation marks, and composing different genres
of writing such as essays and reports.
A native Chinese speaker control group (n = 66) was also included in this
study. They were university students enrolled in September 2003 at two in-
stitutions in Beijing, China: one being Beijing Normal University (BNU) and
the other being Beijing Institute of Civil Engineering and Architecture
(BICEA). Students from BNU majored in psychology while students from
BICEA majored in engineering. There was approximately the same proportion
of males and females. These two groups of native speakers passed the same
national entrance exam for higher education in China. They were assumed to be
representative native Chinese users.

Written production data


Written production data from the four proficiency levels were collected between
June and December 2004. For the three learner levels, data were collected in the middle of the
second semester. Details of the written genre from the three
learner levels and the native speaker group are described as follows:
Level 1: to ensure the comparability of the writing tasks, all levels were set a
letter-writing task. However, since level 1 students were not sufficiently com-
petent to write a full letter including date, salutation, body, complimentary
close, and signature, their task was simplified to writing only the body of a
letter, which was included in a closed book examination. The learners
were required to write a passage inviting a friend for dinner. Please see
Supplementary Appendix A ‘The writing task for Level 1’ for task directions.
Forty-two students attended the examination. Twelve students were found to
have heritage backgrounds other than English: four from Japan, three
from Korea, two from Malaysia, two from Thailand, and one from
Indonesia. In order to control learners’ L1 influence in their L2 production,
the author decided to exclude those written samples by non-native
English-speaking learners. Therefore, 30 valid written samples were collected.
Level 2: a letter-writing task involving inviting a friend to a dinner party
with specific requirements such as stating the time, date, address, and activities
was included in an examination. Please see Supplementary Appendix A ‘The
writing task for Levels 2, 3, and Native’ for task instruction. Fifty-three valid
written samples were collected.
Level 3: the same letter-writing task was also included in an examination for
level 3 students. Please see Supplementary Appendix A ‘The writing task for
Levels 2, 3, and Native’ for details. Twelve valid samples were collected from
the exam in one class. Twenty-one were collected from another class. A total of
33 valid written samples were collected.
Native speaker: in June 2004, two English teachers at the two institutions of
BNU and BICEA in Beijing, China, delivered the information sheet and consent
sheet of the research project, requesting the students to complete
the same letter-writing task of inviting a friend to a dinner party. Please
see Supplementary Appendix A ‘The writing task for Levels 2, 3, and Native’
for the task directions. A total of 66 letters were collected.
Altogether, 182 writing samples were collected for this study.
The writing tasks for the three learner groups might sound different as Level
1 wrote a passage (the body of a letter) while Levels 2 and 3 wrote a letter.
Details of the writing tasks are attached in the Supplementary Appendix A, in
which one can see that the three groups actually conducted the same writing
task, apart from the fact that Level 1 learners were not required to produce the
format of a letter such as date, salutation, complimentary close and signature.
Moreover, in the written samples of Levels 2 and 3 only the bodies of the
letters were analyzed.

Data analytic procedure


The data (182 written samples) were analyzed according to the following
three-step sequence:
Step 1: T-units were demarcated and counted, and the number of words and
the number of characters in each T-unit were counted for each written
sample;
Step 2: error-free T-units were identified and counted, and the number of
words in each error-free T-unit counted for each written sample;
Step 3: the data were tabulated on an SPSS data sheet and a one-way
MANOVA was run to measure the effect of proficiency level on W/T, W/EFT,
and EFT/T, followed by pair-wise comparisons for any significant results.

Inter-rater reliability
The author and a trained native-speaking rater independently analyzed 20%
of the data, namely, 36 randomly selected samples (182 × 20%): six from level
1, 10 from level 2, seven from level 3, and 13 from the native level. The inter-rater
reliability was calculated respectively in relation to (i) number of T-units;
(ii) number of words; and (iii) number of error-free T-units.
Based on the 36 samples, the two raters first independently demarcated
T-units, counted the number of words, and counted the number of error-free
T-units; they then counted the identical ones in each level and re-examined
the remaining ones together through discussion. Tables 2, 3, and 4 show the
results of these three steps for (i) number of T-units; (ii) number of words;
and (iii) number of error-free T-units.
Inter-rater reliability was taken as the number of identical items divided by
the number agreed after discussion. Therefore, the inter-rater reliability for
number of T-units = 575/616 = 93.3%.

Table 2: Number of T-units in 20% of data

Level     Rater 1   Rater 2   Identical   % Agreement(a)   Agreed after discussion
Level 1   37        41        35          89.7             40
Level 2   158       172       154         93.3             167
Level 3   109       117       104         92.0             115
Native    286       300       282         96.2             294
Total     590       630       575         94.3             616

(a) Percentage of agreement is calculated by dividing the number of identical
T-units by half of the sum of rater 1 and rater 2 T-units in each level. For
example, level 1 = 35/(37 + 41) × 2 = 89.7%. This applies in Tables 3 and 4.

Table 3: Number of words in 20% of data

Level     Rater 1   Rater 2   Identical   % Agreement   Agreed after discussion
Level 1   253       270       231         88.3          260
Level 2   1014      1063      998         96.1          1042
Level 3   807       836       790         96.2          826
Native    2501      2539      2477        98.3          2530
Total     4575      4708      4496        96.9          4658

Table 4: Number of error-free T-units in 20% of data

Level     Rater 1   Rater 2   Identical   % Agreement   Agreed after discussion
Level 1   17        20        16          86.5          19
Level 2   93        103       91          92.9          98
Level 3   68        76        64          88.9          74
Native    286       300       282         96.2          294
Total     464       479       453         96.1          485

Table 5: Comparison of T-unit lengths between W/T and Ch/T

Level     N     W/T Mean (SD)   Ch/T Mean (SD)   MD(a) (Ch/T − W/T)
Level 1   30    6.67 (0.87)     9.83 (1.85)      3.16
Level 2   53    6.31 (0.66)     9.07 (1.19)      2.76
Level 3   33    7.20 (0.83)     10.33 (1.14)     3.13
Native    66    8.79 (1.27)     12.19 (1.92)     3.40
Total     182   7.43 (1.45)     10.55 (2.05)     3.12

(a) MD stands for mean difference. It was calculated by subtracting the mean
number of words per T-unit from the mean number of characters per T-unit in
each level. For example, in level 1 MD = 9.83 − 6.67 = 3.16.

The inter-rater reliability for number of words = 4496/4658 = 96.5%.
The inter-rater reliability for number of error-free T-units = 453/485 = 93.4%.
In sum, the inter-rater reliabilities are all above 93% and no significant
disputes arose after discussion. This indicates that the criteria used in
demarcating T-units, segmenting words, and identifying error-free T-units are
fairly reliable.
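
The two ratios used in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only (the function names are assumptions); it implements the per-level agreement formula from the note to Table 2 and the overall reliability ratio of identical items to items agreed after discussion, using figures from Tables 2 to 4:

```python
def percent_agreement(identical: int, rater1: int, rater2: int) -> float:
    """Identical items divided by half the sum of the two raters' counts."""
    return 100.0 * identical / ((rater1 + rater2) / 2)


def reliability_after_discussion(identical: int, agreed: int) -> float:
    """Identical items divided by the number agreed after discussion."""
    return 100.0 * identical / agreed


# Level 1 row of Table 2: rater 1 = 37, rater 2 = 41, identical = 35.
level1_t_units = percent_agreement(35, 37, 41)

# Totals rows of Tables 2, 3 and 4 (identical, agreed after discussion).
overall_t_units = reliability_after_discussion(575, 616)
overall_words = reliability_after_discussion(4496, 4658)
overall_eft = reliability_after_discussion(453, 485)
```

Rounded to one decimal place, these reproduce the 89.7%, 93.3%, 96.5%, and 93.4% reported in this section.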

RESULTS
Words per T-unit (W/T) versus characters per T-unit (Ch/T)
The average T-unit lengths of the four levels were calculated by both W/T and
Ch/T. Table 5 presents a comparison of the two measures.
As shown in Table 5, T-unit length by W/T is 6.67, 6.31, 7.20, and 8.79,
while T-unit length by Ch/T is 9.83, 9.07, 10.33, and 12.19, at level 1,
level 2, level 3, and native level respectively. T-unit length thus increases
from level 2 to level 3 and on to native level, with level 1 falling between
level 2 and level 3, regardless of whether it is measured by W/T or Ch/T. The
average mean difference (MD) between Ch/T and W/T is 3.12. Both measures
present a similar pattern of L2 Chinese writing development.

Validity of T-unit length as a measure in L1 Chinese


As discussed earlier, T-unit length has been established as a highly satisfactory
index of development in L1 and L2 English. However, no previous studies have
examined the validity of this measure between different L1 or L2 groups that
are assumed to represent the same proficiency level. If T-unit length varies
across different groups at the same proficiency level, doubt would be cast on
the validity of using it to measure L2 development. Thus, the validity of
T-unit length in measuring adult L1 is a prerequisite for its use in
measuring L2.
The proficiency level of adult L1 Chinese should remain stable once language
users reach university. Thus, different adult native groups with similar
backgrounds, namely university students, should share broadly the same
proficiency level, and T-unit length at this level should not differ
significantly between groups. The data from the Chinese native control group,
collected at two institutions in Beijing, China, were specifically designed
for examining the validity of T-unit length between different groups of adult
L1 Chinese. Thus, the mean W/T and standard deviation (SD) of the two native
Chinese groups were calculated. The mean W/T of the two groups showed no
statistically significant difference (W/T: 8.78, F(2, 64) = 0.001, p = 0.973),
although psychology students (W/T 9.01, SD 1.27) seemingly produced longer
T-units than engineering students (W/T 8.54, SD 1.24).

T-unit measures for charting Chinese L2 writing development


As discussed earlier, W/T, W/EFT, and EFT/T are not only among the most
satisfactory L2 development measures identified by Wolfe-Quintero et al.
(1998); they have also been determined to be the most promising measures in
Larsen-Freeman’s research series. In this study, means and SDs of the three
T-unit measures were obtained through T-unit analysis. A one-way MANOVA
was run in SPSS, followed by pair-wise comparisons between the proficiency
levels for W/T, W/EFT, and EFT/T. MANOVA was chosen because it examines
several dependent variables (in this case W/T, W/EFT, and EFT/T)
simultaneously, which is more powerful than a series of one-at-a-time ANOVAs.
The one-way MANOVA revealed a significant multivariate main effect for level:
Wilks’ Λ = .217, F(9, 428.49) = 41.57, p < .001, partial η² = .399. For
details, see Supplementary Appendix B: output from running one-way MANOVA.
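
As a rough cross-check of the multivariate statistics above, the F value, its degrees of freedom, and partial η² can be recomputed from Wilks’ Λ alone. The sketch below assumes, without confirmation from the paper, that the output used Rao’s F approximation and the definition partial η² = 1 − Λ^(1/s) with s = min(p, q); the function name is hypothetical.

```python
import math


def wilks_summary(wilks_lambda: float, n_total: int, n_groups: int, n_dvs: int):
    """Rao's F approximation and partial eta squared for a one-way MANOVA."""
    p = n_dvs                      # number of dependent variables
    q = n_groups - 1               # hypothesis degrees of freedom
    df_error = n_total - n_groups  # error degrees of freedom

    # Rao's scaling exponent; it is set to 1 when the denominator is not positive.
    denom = p ** 2 + q ** 2 - 5
    t = math.sqrt((p ** 2 * q ** 2 - 4) / denom) if denom > 0 else 1.0
    w = df_error + q - (p + q + 1) / 2

    df1 = p * q
    df2 = w * t - (p * q - 2) / 2
    root = wilks_lambda ** (1 / t)
    f_value = (1 - root) / root * df2 / df1

    # Assumed definition of multivariate partial eta squared.
    s = min(p, q)
    partial_eta_sq = 1 - wilks_lambda ** (1 / s)
    return f_value, df1, df2, partial_eta_sq


# Reported values: Lambda = .217, N = 182, four levels, three measures.
f_value, df1, df2, eta_sq = wilks_summary(0.217, 182, 4, 3)
```

Under these assumptions the sketch gives degrees of freedom of 9 and about 428.49, an F of about 41.6, and partial η² of about .399, in line with the reported figures; the small gap in F is what one would expect from Λ being rounded to three decimals.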
Table 6 shows the descriptive statistics of W/T, W/EFT, and EFT/T for level 1,
level 2, level 3, and native level, while Table 7 shows the pair-wise
comparisons among the four proficiency levels. The three dependent variables
are reported below.

Table 6: Descriptive statistics by MANOVA

Measure   Level(a)   Mean (SD)             N
wptu      1          6.6657 (0.87172)      30
          2          6.3053 (0.65529)      53
          3          7.1979 (0.83322)      33
          4          8.7923 (1.27237)      66
          Total      7.4284 (1.45017)      182
wdpeftu   1          6.7537 (1.73196)      30
          2          5.8368 (1.06609)      53
          3          6.7630 (1.12213)      33
          4          8.7923 (1.27237)      66
          Total      7.2276 (1.77393)      182
pcteftu   1          49.1000 (20.63200)    30
          2          56.2830 (18.82958)    53
          3          63.2121 (20.50268)    33
          4          100.0000 (0.00000)    66
          Total      72.2088 (26.52447)    182

(a) 1, 2, 3, and 4 stand for Level 1, Level 2, Level 3, and Native
level, respectively.
wptu stands for words per T-unit = W/T; wdpeftu stands for
words per error-free T-unit = W/EFT; and pcteftu stands for
percentage of error-free T-units = EFT/T.

W/T
The average T-unit lengths (W/T) of the four levels from level 1 to native level
are 6.67, 6.31, 7.20, and 8.79, respectively. This shows a general increase from
level 1 or level 2 to level 3 and to native level. The differences between
level 2 and level 3 and between level 3 and native level are both statistically
significant (p < 0.001). However, level 1 T-unit length is not statistically
different from that of level 2 (p = 0.110).

W/EFT
The mean error-free T-unit lengths (W/EFT) of the four levels from level 1 to
native level are 6.75, 5.84, 6.76, and 8.79, respectively. This shows a general
increase from level 2 to level 3 and to native level. The differences between
level 2 and level 3 and between level 3 and native level are both statistically
significant (p ≤ 0.001).

Table 7: Pair-wise comparisons of W/T, W/EFT, and EFT/T

Dependent   (I)     (J)     Mean difference      Sig.(a)   95% confidence interval
variable    level   level   (I−J) (SE)                     for difference(a)

wptu        1       2       0.360 (0.224)        .110      −0.083 to 0.803
                    3       −0.532* (0.248)      .033      −1.021 to −0.043
                    4       −2.127* (0.216)      .000      −2.554 to −1.700
            2       1       −0.360 (0.224)       .110      −0.803 to 0.083
                    3       −0.893* (0.218)      .000      −1.323 to −0.463
                    4       −2.487* (0.181)      .000      −2.845 to −2.129
            3       1       0.532* (0.248)       .033      0.043 to 1.021
                    2       0.893* (0.218)       .000      0.463 to 1.323
                    4       −1.594* (0.209)      .000      −2.008 to −1.181
            4       1       2.127* (0.216)       .000      1.700 to 2.554
                    2       2.487* (0.181)       .000      2.129 to 2.845
                    3       1.594* (0.209)       .000      1.181 to 2.008
wdpeftu     1       2       0.917* (0.292)       .002      0.340 to 1.494
                    3       −0.009 (0.323)       .977      −0.647 to 0.628
                    4       −2.039* (0.282)      .000      −2.595 to −1.482
            2       1       −0.917* (0.292)      .002      −1.494 to −0.340
                    3       −0.926* (0.284)      .001      −1.486 to −0.366
                    4       −2.955* (0.236)      .000      −3.421 to −2.490
            3       1       0.009 (0.323)        .977      −0.628 to 0.647
                    2       0.926* (0.284)       .001      0.366 to 1.486
                    4       −2.029* (0.273)      .000      −2.568 to −1.491
            4       1       2.039* (0.282)       .000      1.482 to 2.595
                    2       2.955* (0.236)       .000      2.490 to 3.421
                    3       2.029* (0.273)       .000      1.491 to 2.568
pcteftu     1       2       −7.183* (3.602)      .048      −14.290 to −0.076
                    3       −14.112* (3.977)     .000      −21.960 to −6.265
                    4       −50.900* (3.471)     .000      −57.750 to −44.050
            2       1       7.183* (3.602)       .048      0.076 to 14.290
                    3       −6.929* (3.496)      .049      −13.827 to −0.031
                    4       −43.717* (2.908)     .000      −49.455 to −37.979
            3       1       14.112* (3.977)      .000      6.265 to 21.960
                    2       6.929* (3.496)       .049      0.031 to 13.827
                    4       −36.788* (3.361)     .000      −43.420 to −30.156
            4       1       50.900* (3.471)      .000      44.050 to 57.750
                    2       43.717* (2.908)      .000      37.979 to 49.455
                    3       36.788* (3.361)      .000      30.156 to 43.420

(a) Adjustment for multiple comparisons: least significant difference
(equivalent to no adjustments).
*The mean difference is significant at the .05 level. Based on estimated
marginal means.

The average error-free T-unit length of level 1 is longer than that of level 2
and even close to that of level 3.

EFT/T
The percentages of error-free T-units (EFT/T) of the four levels from level 1 to
native level are 49.10, 56.28, 63.21, and 100, respectively. EFT/T thus
increases from level 1 through to native level. Pair-wise comparisons show
that the differences in EFT/T between all pairs of proficiency levels are
statistically significant at the .05 level (adjacent levels: level 1 vs.
level 2, p = 0.048; level 2 vs. level 3, p = 0.049; level 3 vs. native level,
p < 0.001). Therefore, EFT/T discriminates between all four proficiency
levels.
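
The mean differences and standard errors in Table 7 can be approximately recovered from the descriptive statistics in Table 6. The sketch below is illustrative only: it assumes that the least-significant-difference comparisons use the error variance pooled across all four levels (the paper does not spell this out), and the function names are hypothetical.

```python
import math

# Table 6 descriptives for W/T: level -> (mean, SD, N).
wpt = {
    1: (6.6657, 0.87172, 30),
    2: (6.3053, 0.65529, 53),
    3: (7.1979, 0.83322, 33),
    4: (8.7923, 1.27237, 66),
}


def pooled_error_variance(groups):
    """Within-group (error) mean square pooled over all levels."""
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups.values())
    df_error = sum(n for _, _, n in groups.values()) - len(groups)
    return ss_within / df_error


def pairwise(groups, i, j):
    """Mean difference (I - J) and its standard error for levels i and j."""
    mse = pooled_error_variance(groups)
    mean_i, _, n_i = groups[i]
    mean_j, _, n_j = groups[j]
    diff = mean_i - mean_j
    se = math.sqrt(mse * (1 / n_i + 1 / n_j))
    return diff, se


diff_12, se_12 = pairwise(wpt, 1, 2)  # level 1 vs. level 2
diff_24, se_24 = pairwise(wpt, 2, 4)  # level 2 vs. native level
```

Under that assumption the sketch yields a difference of about 0.360 with a standard error of about 0.224 for level 1 versus level 2, and about −2.487 with a standard error of about 0.181 for level 2 versus native level, matching the W/T rows of Table 7.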

DISCUSSION
Having extended T-unit analysis to measure Chinese language development,
this study has found that:
1 T-unit length measured by both W/T and Ch/T presents similar patterns
in L2 Chinese writing development, although Ch/T is longer than W/T.
2 T-unit length is found to be a valid measure in adult L1 Chinese when
the language users reach university stage.
3 T-unit length (W/T) and error-free T-unit length (W/EFT) increase from
level 2 to level 3 and to native level, with level 1 being similar to level 2
for W/T and similar to level 3 for W/EFT. The percentage of error-free
T-units (EFT/T) is found to discriminate between all four proficiency
levels.
Each of the findings is discussed below.

W/T versus Ch/T


T-unit length measured by both W/T and Ch/T presents similar patterns in L2
Chinese writing development, with Ch/T being proportionally longer than W/T.
This suggests that it makes no difference whether words or characters are
counted if the research aims to see whether T-unit measures discriminate
between different proficiency levels in Chinese. However, it does make a
difference to the actual T-unit length, since Ch/T exceeds W/T by about 3.12
on average. For future research on L2 Chinese, Ch/T is recommended when
results do not have to be compared with those of T-unit analyses in other
languages, because Ch/T is more reliable for coding: characters can be easily
counted by computer, while words require more effort in segmenting and
counting, and their coding can sometimes be disputed.
When research results in Chinese are to be compared with those of T-unit
analyses in other languages, W/T is recommended because the previous
studies all employed the W/T measure. However, due to the uniqueness of the
Chinese language, a Chinese character or ‘sociological word’ (Chao 1968: 136)
could be equivalent to a syntactic word in other languages such as English, as
the Chinese Ch/T measure in this study is closer in length to the W/T measure
in English and Japanese. Larsen-Freeman (1977, 1978a, 1978b, 1983) shows W/T
values all above 10 in English, while Iwashita (2006) reports W/T of 11.45 for
the low proficiency level and 13.45 for the high proficiency level in L2
Japanese. The Chinese Ch/T measure is also closer to Hunt’s (1976) results in
L1 Chinese, which might suggest that Ch/T was used in Hunt’s (1976) study.
More research is needed to explore whether a Chinese character is equivalent
to a syntactic word in other languages with respect to T-unit length.
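
To illustrate why characters are easy to count by computer, the following minimal sketch counts Chinese characters per T-unit. It rests on several assumptions: T-units have already been demarcated by hand, only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is counted, and the function names are hypothetical. The first example string is an assumed character rendering of the textbook sentence Tianqi bu leng ye bu re; the second is made up for illustration.

```python
from typing import List


def count_chinese_characters(text: str) -> int:
    """Count characters in the basic CJK Unified Ideographs block.

    Punctuation, digits, Latin letters and whitespace are ignored.
    """
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")


def characters_per_t_unit(t_units: List[str]) -> float:
    """Mean number of Chinese characters per T-unit (Ch/T)."""
    if not t_units:
        raise ValueError("at least one T-unit is required")
    return sum(count_chinese_characters(t) for t in t_units) / len(t_units)


# Two hand-demarcated T-units (illustrative strings, not study data).
example = ["天气不冷也不热。", "我请你吃晚饭。"]
ch_per_t = characters_per_t_unit(example)
```

Counting words instead would first require a segmentation step, which is where the coding disputes mentioned above can arise.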

Validity of T-unit length as a measure for Chinese

T-unit length is found to be a valid measure of proficiency in adult L1 Chinese
when the language users reach university stage. This confirms the validity of
W/T in measuring the development of the Chinese language. This is very
meaningful for L2 Chinese acquisition research. With W/T being established
as a reliable and objective measure in L1 Chinese, L2 Chinese development can
be explored and measured more accurately than the current literature can
offer. It is suggested that research in other target languages also check the
validity of this measure before applying it to chart L2 development as nothing
should be taken for granted in such studies.
The L1 Chinese T-unit length obtained from this study (8.79) is slightly
different from the result of Hunt’s (1976) study, as shown in Table 8.
Among the seven languages investigated by Hunt (1976), only Mandarin
Chinese differed in that T-unit length remained roughly the same (around 10),
while it increased in the other languages from grade 4 to 8 to 12, as
expected. No explanation was given for why Chinese differed from the other
languages. Presumably, the Chinese language proficiency of grade 8 and grade
12 students did not significantly improve compared with that of grade 4
students, or the students displayed similar proficiency levels in their
writing because of the writing task chosen, or because of sampling
limitations, for example a sample that was too small or individual language
proficiency levels that varied greatly among students in each grade.
The T-unit length produced by Chinese students in grades 4, 8, and 12 (around
10) in Hunt’s study was even longer than that produced by university students
in the current study (8.79). This difference might be caused by different
genres of writing or by different criteria for demarcating T-units in the
Chinese language, as there was no indication of how T-units were demarcated in
Hunt’s study. Alternatively, characters rather than words may have been
counted in Hunt’s (1976) study, as mentioned earlier.

The three T-unit measures: W/T, W/EFT, and EFT/T


Before interpreting the results achieved through the W/T, W/EFT, and EFT/T
measures, it is necessary to know what exactly W/T, W/EFT, and EFT/T are
measuring. According to Hunt (1965, 1976), W/T measures syntactic complexity.
However, Wolfe-Quintero et al. (1998) categorized W/T and W/EFT as
measuring fluency and EFT/T as measuring accuracy (Table 1). Given that fluency
is defined as ‘‘the rapid production of language’’ (Wolfe-Quintero et al. 1998:
117), a measure tapping fluency should involve time; that is, fluency should be
measured by a certain amount of language production per hour or per minute.
This suggests that W/T and W/EFT do not measure fluency; instead, they
measure syntactic complexity, as Hunt (1965, 1976), Halleck (1995), Ortega
(2003), and Iwashita (2006) all clearly state. At the same time, W/EFT and
EFT/T measure accuracy as they both deal with error-free T-units, as explained
by Polio (1997).

Table 8: Mean T-unit lengths for various languages in Hunt (1976)

                      Grade
Language              4       8       12
English               6.7     10.2    12.0
Fijian                8.1     11.1    13.0
Indonesian            4.9     9.1     10.3
Korean                6.4     9.2     9.6
Laotian               9.1     11.6    16.0
Marshallese           6.0     7.5     9.7
Mandarin (Taiwan)     10.0    10.3    10.1

Source: Hunt (1976: 3).

With the underlying measurement constructs in mind, let us come back to
the results found in terms of the three specific T-unit measures. With regard to
W/T, level 1 and level 2 are not statistically different from each other, which
would mean at face value that the syntactic complexities of level 1 and level 2
are similar. With respect to W/EFT, level 1 is statistically different from level 2,
but statistically not different from level 3, which would mean at face value that
level 1 is more accurate than level 2, and is as accurate as level 3. As to EFT/T,
level 1, level 2, level 3, and native level are statistically all different, which
would mean that native level is more accurate than level 3, level 3 is more
accurate than level 2, and level 2 is more accurate than level 1.
Although both W/EFT and EFT/T are supposed to measure essentially the same
construct of accuracy, they surprisingly tell two different stories.
W/EFT suggests that the level 1 learners produced writing that was
significantly superior to that of level 2 learners and as good as that of
level 3 learners. At the same time, EFT/T suggests the opposite, namely that
level 1 learners wrote significantly less accurately than level 2 learners
did. Based on the results, two questions emerge: (i) Why are W/T and W/EFT of
level 1 longer than those of level 2? (ii) Why would the two measures (W/EFT
and EFT/T) that involve error-free production, and therefore measure accuracy
in essentially similar ways, yield such different results?
One explanation for why level 1 learners produced ‘long’ T-units and ‘long’
error-free T-units could be that they remembered some of the sentences
from their lessons and reproduced them by rote when required to do so, since
they had not yet reached the stage where they could use their target language
creatively. In other words, a large portion of the sentences in level 1
learners’ written samples were likely reproduced by rote rather than composed
creatively. For example, the sentence Tianqi bu leng ye bu re (The
weather is not cold and not hot either) from the level 1 textbook was found in
the majority of level 1 learners’ written production. It is not necessarily
the case that they could use the adverb ‘ye’ (which means ‘too’ or ‘as well’
in English) creatively; rather, they remembered the sentence from the textbook.
A number of studies show that the use of chunks is very common in the early
stages of L2 acquisition (e.g. Wong-Fillmore 1976; Huang and Hatch 1978;
Myles et al. 1999; Vihman 1982; Ellis 1984, 2005; Weinert 1994, 1995). In
Ellis’ (2005) own words: ‘Classroom studies by Ellis (1984), Myles et al. (1998,
1999), and Myles (2004) demonstrate that learners often internalize
rote-learned material as chunks [italics added], breaking them down for
analysis later on’ (p. 211). Similarly, Taguchi’s (2007) empirical data also
suggest ‘that memorized chunks served as a basis for the creative construction
of discourse’ (p. 433) at a later stage. Therefore, rigid use of the target
language can give the appearance of a higher level. It is thus not unusual
that level 1 learners in this study produced ‘longer’ T-units, namely
chunk-based T-units, in their written samples.
Two sources of evidence also support the claim that level 1 learners made
use of a chunk-based production strategy in producing their written
samples. One comes from the author’s dual role as language teacher and
researcher. Being familiar with the language items in the textbook, the author
is able to judge the sources of learners’ L2 performance, especially for
level 1 learners, whose potentially producible language items were limited. In
level 1 learners’ written samples, the syntactic structures of the majority of
T-units were taken verbatim from the textbook (plus various errors, possibly
due to memory failure in writing), as demonstrated earlier. Although these
level 1 students produced the sentence Tianqi bu leng ye bu re correctly, they
could not use the ‘ye’ structure in Chinese competently at all. The other
source of evidence comes from the error types. Some error types, such as
erroneous omissions, occurred frequently in level 1, less frequently in
level 2, and only occasionally in level 3. These erroneous omissions are best
explained by memory failure. In sum, the rote-remembered T-units or sentences
not only tended to be error-free but also contributed to T-unit length in
level 1, and thus to the average syntactic complexity of level 1, which
explains why the mean T-unit length and error-free T-unit length of level 1
learners were not the lowest among the three learner groups.
To answer the question of why W/EFT and EFT/T tell two different stories
regarding the same construct of accuracy for level 1 and level 2 learners, the
author has found that W/EFT can be misleading because this measure examines
error-free T-units only, which are just a portion of the writing sample.
Imagine a piece of written production of 100 T-units that contains only one
error-free T-unit: the measure of W/EFT will be very misleading, as this one
error-free T-unit does not represent the whole piece of writing at all.
Therefore, whether W/EFT is a valid measure depends on the proportion of
error-free T-units among all T-units, that is, on how representative the
error-free T-units are of the T-units in total. When the error-free T-units
represent the T-units in total well, W/EFT can be a valid measure, as it taps
both syntactic complexity and accuracy. When the error-free T-units do not
represent the T-units in total, W/EFT can be very misleading, as it does not
measure the written sample as a whole. This explains why W/EFT sometimes
discriminates between proficiency levels and sometimes does not in
Larsen-Freeman’s (1977, 1978a, 1978b, 1983) series of studies. Because of its
partial nature, when W/EFT contradicts EFT/T in measuring the same construct
of accuracy, we know that W/EFT is not reliable. Therefore, EFT/T tells the
true story in measuring the accuracy of the learners’ written production.
Thus, researchers who apply the measure of W/EFT in any L2 target language
need to exercise caution. It is recommended that EFT/T be used alone, or that
W/EFT and EFT/T be used together, to measure accuracy in L2 development.

CONCLUSIONS
Among the three T-unit measures (W/T, W/EFT, and EFT/T), W/T is found to
be a valid measure in tapping syntactic complexity and EFT/T is found to be a
valid measure in tapping accuracy in Chinese L2 development. Caution needs
to be taken when W/EFT is employed as error-free T-units do not always
represent T-units in total.
Building on this study, the potential for future research is vast. Most pressing
is a controlled study that compares genres of writing tasks. This would control
for the differences a certain genre can make in T-unit length, also allowing a
comparison across genres (e.g. letter, picture description, and essay) in relation
to different proficiency levels, so as to allow more solid conclusions to
be drawn.
Moreover, there is a need to employ T-unit analysis in a longitudinal
study of L2 Chinese development, which directly measures language develop-
ment, so as to allow a comparison with the results obtained from
this cross-sectional study, which only indirectly measures language growth.
Such a study would need to follow a group of learners for a period of time
and collect data periodically, in order to see whether and how T-unit length
changes.

SUPPLEMENTARY DATA
Supplementary Data are available at Applied Linguistics online.

ACKNOWLEDGEMENTS
I would like to thank Dr Guy Ramsay, Dr Michael Harrington, Dr Noriko Iwashita and six
anonymous reviewers for very useful comments on earlier drafts of this article.

NOTES

1 The first line of the example is written in Chinese characters. The second
line is the same sentence written in pinyin, the official Chinese phonetic
system used in the People’s Republic of China. This is followed by a
word-for-word or literal English translation in the third line. The last line
in the example provides an idiomatic English translation. All the Chinese
examples throughout this paper follow the same pattern.
2 M stands for measure word in Chinese such as ge.

REFERENCES
Bardovi-Harlig, K. 1992. ‘A second look at T-unit analysis: Reconsidering the sentence,’ TESOL Quarterly 26/2: 390–5.
Chao, D. 2000. ‘Promoting the study of the Chinese language in the early 19th century: ‘‘The Chinese Repository’’ as a resource,’ Journal of the Chinese Language Teachers Association 35/2: 91–110.
Chao, Y. R. 1968. A Grammar of Spoken Chinese. University of California Press.
Chu, C. C. 1998. A Discourse Grammar of Mandarin Chinese. Lang.
Cooper, T. C. 1976. ‘Measuring written syntactic patterns of second language learners of German,’ Journal of Educational Research 69/5: 176–83.
Dvorak, T. R. 1987. ‘Is written FL like oral FL?’ in B. VanPatten, T. R. Dvorak, and J. F. Lee (eds): Foreign Language Learning: A Research Perspective. Newbury House, pp. 79–91.
Ellis, R. 1994. The study of second language acquisition. Oxford: Oxford University Press.
Ellis, R. 2005. ‘Principles of instructed language learning,’ System 33: 209–224.
FDMC. 1986. Xiandai Hanyu Pinlü Cidian (Frequency Dictionary of Modern Chinese). Beijing Language Institute Press.
Gaies, S. J. 1980. ‘T-unit analysis in second language acquisition: Applications, problems and limitations,’ TESOL Quarterly 14/1: 53–60.
Ge, B. 2001. Xian dai han yu ci hui xue (Modern Chinese Lexicology). Shandong ren min chu ban she. Shandong People’s Press.
Goh, Y.-S. 1999. ‘Challenges of the rise of global Mandarin,’ Journal of the Chinese Language Teachers Association 34/3: 41–8.
Guo, Z. 2000. A Concise Chinese Grammar. Sinolingua.
Halleck, G. B. 1995. ‘Assessing oral proficiency: A comparison of holistic and objective measures,’ The Modern Language Journal 79: 223–34.
Harrington, M. 1986. ‘The T-unit as a measure of JSL oral proficiency,’ Descriptive and Applied Linguistics, Bulletin of the ICU Summer Institute in Linguistics 19: 49–56.
Henry, K. 1996. ‘Early L2 writing development: A study of autobiographical essays by university-level students of Russian,’ The Modern Language Journal 80: 309–26.
Ho, Y. 1993. Aspects of Discourse Structure in Mandarin Chinese. Mellen University Press.
Huang, J. and E. Hatch. 1978. ‘A Chinese child’s acquisition of English’ in E. Hatch (ed.): Second Language Acquisition: A Book of Readings. Newbury House, pp. 118–31.
Hunt, K. W. 1965. ‘Grammatical structures written at three grade levels’. NCTE Research Report No. 3. National Council of Teachers of English.
Hunt, K. W. 1970. ‘Recent measures in syntactic development’ in M. Lester (ed.): Readings in Applied Transformational Grammar. Holt, Rinehart and Winston, Inc., pp. 187–200.
Hunt, K. W. 1976. ‘Study correlates age with grammatical complexity,’ Linguistic Reporter 18/7: 3.
Hunt, K. W. 1977. ‘Early blooming and late blooming syntactic structures’ in C. R. Cooper and L. Odell (eds): Evaluative Writing: Describing, Measuring, Judging. National Council of Teachers of English.
Iwashita, N. 2006. ‘Syntactic complexity measures and their relation to oral proficiency in Japanese as a foreign language,’ Language Assessment Quarterly 3/2: 151–69.
Jiang, W. 2009. ‘Definition of a Chinese syntactic word and its identifying criteria,’ Unpublished manuscript.
Kern, R. G. and J. M. Schultz. 1992. ‘The effects of composition instruction on intermediate level French students’ writing performance: Some preliminary findings,’ The Modern Language Journal 76: 1–13.
Larsen-Freeman, D. 1978a. ‘An ESL index of development,’ TESOL Quarterly 12/4: 439–48.
Larsen-Freeman, D. 1978b. ‘Evidence of the need for a second language acquisition index of development’ in W. C. Ritchie (ed.): Second Language Acquisition Research: Issues and Implications. Academic Press.
Larsen-Freeman, D. 1983. ‘Assessing global second language proficiency’ in H. W. Seliger and M. H. Long (eds): Classroom Oriented Research in Second Language Acquisition. Newbury House Publishers, Inc., pp. 287–304.
Larsen-Freeman, D. and V. Strom. 1977. ‘The construction of a second language acquisition index of development,’ Language Learning 27/1: 123–34.
Loban, W. 1976. ‘Language development: Kindergarten through grade twelve,’ Research Report No. 18. National Council of Teachers of English.
Lü, S. and D. Zhu. 2002. Yufa Xiuci Jianghua (Remarks on Grammar and Rhetoric). Liaoning Education Press.
Mellon, J. C. 1969. ‘Transformational sentence-combining: A method for enhancing the development of syntactical fluency in English composition,’ NCTE Research Report No. 10. National Council of Teachers of English.
Monroe, J. H. 1975. ‘Measuring and enhancing syntactic fluency in French,’ The French Review XLVIII/6: 1023–31.
Myles, F. 2004. ‘From data to theory: the over-representation of linguistic knowledge in SLA,’ Transactions of the Philological Society 102: 139–168.
Myles, F., J. Hooper, and R. Mitchell. 1998. ‘Rote or rule? Exploring the role of formulaic language in classroom foreign language learning,’ Language Learning 48/3: 323–363.
Myles, F., R. Mitchell, and J. Hooper. 1999. ‘Interrogative chunks in French L2: a basis for creative construction?,’ Studies in Second Language Acquisition 21/1: 49–80.
O’Donnell, R. C. 1976. ‘A critique of some indices of syntactic maturity,’ Research in the Teaching of English 10: 31–8.
O’Donnell, R. C., W. J. Griffin, and R. C. Norris. 1967. ‘Syntax of kindergarten and elementary school children: A transformational analysis,’ NCTE Research Report No. 8. National Council of Teachers of English.
O’Hare, F. 1973. ‘Sentence combining: improving student writing without formal grammar instruction,’ NCTE Research Report No. 15. National Council of Teachers of English.
Ortega, L. 2003. ‘Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing,’ Applied Linguistics 24: 492–518.
Pan, W. 2002. Zi benwei yu hanyu yanjiu (Character as a Basic Unit and Chinese Studies). East China Normal University Press.
Pan, W., P. Yip, and Y. Han. 1993. Research on Chinese Word Formation. Student Book Co. Ltd.
Polio, C. 1997. ‘Measures of linguistic accuracy in second language writing research,’ Language Learning 47/1: 101–43.
Scott, M. and G. Tucker. 1974. ‘Error analysis and English-language strategies of Arab students,’ Language Learning 24/1: 69–97.
Taguchi, N. 2007. ‘Chunk learning and the development of spoken discourse in a Japanese as a foreign language classroom,’ Language Teaching Research 11/4: 433–57.
Thornhill, D. E. 1969. ‘A quantitative analysis of the development of syntactical fluency of four young adult Spanish speakers learning English,’ Unpublished doctoral dissertation, The Florida State University.
Vavra, E. 2000. ‘Definitions of the ‘‘T-unit’’. Dr. Ed Vavra’s KISS Approach to Sentence Structure,’ available at http://nweb.pct.edu/homepage/staff/evavra/ED498/Essay009_Def_TUnit.htm. Accessed 22 June 2004.
Vihman, M. 1982. ‘Formulas in first and second language acquisition’ in L. Obler and L. Menn (eds): Exceptional Language and Linguistics. Addison Wesley Longman.
Weinert, R. 1994. ‘Some effects of a foreign language classroom on the development of German negation,’ Applied Linguistics 15/1: 76–101.
Weinert, R. 1995. ‘Formulaic language in SLA: A review,’ Applied Linguistics 16/2: 180–205.
Wolfe-Quintero, K., S. Inagaki, and H. Kim. 1998. Second Language Development in Writing: Measures of Fluency, Accuracy, & Complexity. Second Language Teaching & Curriculum Center, University of Hawaii at Manoa.
Wong-Fillmore, L. 1976. ‘The second time around,’ Unpublished doctoral dissertation, Stanford University.
Zhu, D. 1982. Yufa Jiangyi (Lecture Series on Grammar). Commerce Press.
