Learner Corpora and Computer-Aided Error Analysis


Learner Corpora and Computer-aided Error Analysis1


Myung-Jeong Ha
Department of English Language and Literature, Sangmyung University, Korea
[email protected]

Abstract
This study focuses on an alternative approach to looking at learner language. In particular, the
present study investigates the use of English verb-noun collocations in the writing of native speakers of
Korean at intermediate levels. For this purpose, a learner corpus was compiled that consists of
19,826 words of comparison and descriptive essays. For comparison purposes, the study employed
LOCNESS, a corpus of essays by young adult native speakers of English. The main body of the current
study into verb collocational usage compares a sample of the American English sub-component of LOCNESS
with the essay writing of Korean EFL university students. The most frequently occurring common verbs
in the learner corpus were retrieved, concordances for them were created, and verb-noun collocations
were extracted. Subsequently, two types of comparison were performed: Korean EFL learners were
compared with native speakers on the variety of common verb collocations used and on the
correctness of those collocations. The data reveal that the Korean EFL learners produced
far fewer collocations than native speakers, and that errors, particularly interlingual ones, continued
to persist at intermediate levels of proficiency.

Keywords: Computer learner corpora, Computer-aided error analysis, Learner Language

1. Introduction
Recent developments in the field of small corpus studies, largely brought about by the personal
computer, have yielded remarkable insights into the nature and use of real language. Computer
corpora have played a major role in language-related fields, from lexicography to language teaching
and natural language processing. While the use of corpora has spread to the English for academic
purposes (EAP) field, EAP researchers and material writers mainly rely on native corpora. In fact,
learner corpora containing data produced by second language (L2) learners are rarely examined in
spite of their tremendous potential for EAP studies. Although L2 learners share many difficulties
with novice native writers, they have their own distinctive problems. The aim of this paper is to show
the usefulness of analyzing learner corpora as an effective way of operationalizing writing difficulties.
According to Gilquin, Granger, and Paquot [1], it used to be fairly difficult to form a picture of learner
EAP writing, but the analysis of learner corpora has the potential to offer a breakthrough since
researchers now have access to large databases of learner writing and powerful methods of analysis.
Indeed, recent developments in corpus technology have heightened the need for exploring corpora of
learner language. This study employs error-oriented approaches to learner corpora that are different
from EA (Error Analysis) studies because the approaches are computer-aided and involve a higher
degree of standardization. Although it is generally accepted that collocations are indispensable and
problematic for foreign language learners and they therefore should play an important role in second
language acquisition (SLA), L2 learners’ difficulties with collocations have not been discussed in
detail by EFL practitioners so far [2]. Collocations have been largely neglected by researchers,
material designers and EFL practitioners. In the field of computer-assisted language learning with
new corpus technology becoming available, the need for more studies on collocations is obvious.
Following the trend towards collocational competence in second language learning, the present study
investigates the use of English verb collocations in the writing of native speakers of Korean. Since
restricted collocations are frequently said to be an area where L2 learners have greater difficulties but
which remains neglected, the use of restricted collocations in a learner corpus is the main focus here. In
spite of some suggestions made on the teaching of collocations in recent years [3], it is not clear
which collocations in a second language should be taught, or how. To provide some answers to these
questions, it is important to examine the difficulties that L2 learners have in using collocations. Since
the production of collocations is more problematic than their comprehension, the present study focuses
on collocational problems in L2 learners’ writing in order to identify the difficulties they have. The
purpose of this study is to explore aspects of how high-frequency common verbs in a learner corpus
form collocations with other words. Whereas high frequency makes a word familiar to learners, there
is always a gap between receptive and productive vocabulary. Common verbs such as make and have
attracted much attention from proponents of the lexical approach to second language teaching [2].
Given that common verbs serve various grammatical functions and their meaning is tailored by the
company they keep, this study assumes that these common verbs may not be as easy to learn as
commonly believed. These verbs are worthy of investigation because they are not generally examined
in L2 vocabulary learning. An investigation into L2 learners’ knowledge of common verbs would shed
light on the importance of common verbs in L2 writing. This study also contributes to the growing
literature that draws on Contrastive Interlanguage Analysis (CIA) (Granger 1996) by comparing the
use of verb + noun combinations between native speaker corpora and nonnative speaker corpora.

1 This research was supported by a 2013 Research Grant from Sangmyung University.

International Journal of Digital Content Technology and its Applications (JDCTA), Volume 7, Number 12, August 2013

2. Types of word combinations and computer learner corpora


Corpus linguists define word combinations on the basis of frequency and co-occurrence relations [4].
Cowie [5] defines them based on semantic and syntactic criteria. It is well known that lexical and
grammatical word combinations have graded idiomaticity, contributing to the notion of a continuum. At
one end of this scale of idiomaticity are free combinations and at the other end are idioms.
Collocations are placed on the continuum between free combinations and idioms. They are fairly
frequent, complex phrases, whose elements have a strong relationship. A distinction is made between
combinations on the basis of possible restrictions on the substitutability of their elements. Three types
of word combinations can be classified in terms of the notion of restricted sense (e.g. [2][5][6]): (a) free
combinations, in which the verb and the noun can be freely combined (e.g. find an
entrance); (b) collocations, in which the sense of the verb is restricted so that it can be combined only with
certain nouns (e.g. make a suggestion but *make a determination); (c) idioms, in which the verb and the
noun are used in a restricted sense and substitution is not allowed (e.g. make up for).
In addition, collocations can be divided into two categories according to the word class of their
constituents [5]: lexical collocations and grammatical collocations. While lexical collocations combine
two open class words such as verb and noun, or adjective and noun, grammatical collocations combine
an open class word with a closed class word such as a preposition, or with a grammatical structure
such as an infinitive or a clause. In the present study, both lexical collocations and grammatical
collocations are scrutinized.
Computer learner corpora have been in existence for a relatively short time. Computer learner
corpora are electronic collections of spoken or written texts produced by foreign or second language
learners [7]. Learner corpora are seen as a new resource for foreign language teaching researchers and
educators [8-10]. Although learner corpus compilation is a new activity, a number of learner corpora
already exist. Learner here refers to somebody learning a foreign language or to a foreigner learning
the language in a country where it is spoken natively. Because learner language is characterized by a high rate of
misuse such as lexical and grammatical errors, it is hoped that learner corpora will provide researchers
with essential sources of controlled computerized data which can be analyzed with computer-aided
error analysis. In particular, studies informed by learner corpora have revealed some of the main
problems that EFL/ESL learners have when writing academic essays: errors involving the collocational
patterning of words and phraseological infelicities.
The learner corpus used in the study is a Korean EFL learner corpus of written essays. Given that
learner corpus analysis has proved a useful tool to reveal a number of distinctive features of L2 writing
discourse, the use of learner corpora will provide relevant information on the difficulty of particular
collocational features from the learners' perspective, which can help to identify features which teaching
should emphasize and to evaluate their difficulty. In addition, many studies of collocation have been
corpus-informed while exploiting the potential of a corpus to identify linguistic categories.

3. Contrastive interlanguage analysis


One of the primary contributions of computer learner corpora is that they make it possible to
explore aspects of learner language which have been difficult to investigate. These investigations are
comparative. Given that authentic learner data is compared with native speaker data, the process falls
within the domain of Contrastive Interlanguage Analysis (CIA) [11]. This study employs CIA, which is
different from contrastive analysis. Whereas contrastive analysis involves the linguistic comparison of
two or more languages, CIA involves varieties of the same language.
According to Granger [11], CIA concerns “quantitative and qualitative comparisons between L1 and
L2 and between different varieties of interlanguage.” There are two types of comparison at the
heart of studies using CIA: comparing native language with interlanguage and comparing different
types of interlanguage with each other.
The aim of the first type of comparison is to explore distinctive features of a particular
interlanguage. This type of comparison makes it possible to examine the phenomena of overuse and
underuse of linguistic items. This approach has the potential to reveal distributional patterns that differ
from those of comparable native language. These patterns can explain why a text including no overt lexical
errors gives the impression that it has not been written by a native speaker. Overuse and underuse are
intended as neutral, quantitative measures of linguistic differences [12]. This method can distinguish
between L1-related and universal features of learner language and provide a picture of advanced
interlanguage and of the role of L1 transfer for different L1 backgrounds.
The studies of overuse and underuse widen the scope of traditional error analysis because it is
difficult to identify overuse and underuse without computational methods. Computational methods
reveal areas where learner language differs from native language with respect to frequency of
distribution.
The second type of comparison includes the comparison of different non-native speaker (NNS)
varieties. NNS-NNS comparisons make it possible to observe strategies used by all learners or by
several learner groups. In addition, these comparisons are facilitated by the design of the sub-corpora
of the International Corpus of Learner English (ICLE), which controls for relevant variables [13].
For example, a comparison of French learners’ use of indeed with that of some other learner groups
such as Norwegians reveals that the French learners overuse indeed in contrast to Norwegians, who
underuse it [14]. Similarly, Norwegians are known to overuse kind of almost as much as their Swedish
neighbors, with a frequency of 44.8 occurrences per 100,000 words.
It is important to note that CIA is not restricted to ICLE or to written English only. Other
researchers have adopted this method in analyzing interlanguage corpora of German, Italian, and
Norwegian. Spoken learner discourse is also being analyzed with the Louvain International Database of
Spoken English Interlanguage (LINDSEI). LINDSEI, which comprises different L1 backgrounds and an
NS reference corpus, is a spoken counterpart of ICLE.

4. Methods

4.1. Corpora used in the study

In the methods which follow, the terms NNS and NS are used to refer to the non-native speaker and
native speaker groups respectively. The computer learner corpus compiled for this study consists of
19,826 words of comparison and descriptive essays that were produced in an English writing class
at a university in Korea. The 46 essays investigated were written by Korean-speaking university
students of English, mainly in their second or third year. The essay titles include
‘my favorite place’ and ‘shopping at stores and shopping online’, and the essays have an average
length of about 400 words. The corpus contains two essays per learner.
The comparative native-speaker (NS) corpus, the Louvain Corpus of Native English Essays
(LOCNESS), was compiled at the University of Louvain in Louvain-la-Neuve, Belgium, and comprises essays
written by young adult NSs of English. The total number of words in the corpus is 324,304. To
compare nonnative speaker use with native English use, a sample of about 170,000 words from LOCNESS
was used. Because the sample contains argumentative essays written by native-speaker American
university students, it was considered comparable to the learner corpus. Table 1 presents the exact size
of the corpora used.


Table 1. NNS and NS corpora

                  Tokens           Types               TTR                    No. of Essays
                  (total words)    (distinct words)    (type/token ratio)
Learner corpus    19,826           2,756               13.95                  46
LOCNESS           168,290          11,248              6.72                   207
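
As an illustration of the TTR column in Table 1, the ratio can be sketched in Python as below. This is a minimal sketch under stated assumptions: the tokenizer (lowercased runs of letters and apostrophes) and the function names are hypothetical and are not WordSmith's own counting rules, so values computed this way need not match the published figures exactly.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes).

    Assumption: this simple rule stands in for the tool's real tokenizer.
    """
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(n_types, n_tokens):
    """Return the type/token ratio expressed as a percentage."""
    if n_tokens == 0:
        raise ValueError("cannot compute a TTR for an empty corpus")
    return 100.0 * n_types / n_tokens


def ttr_from_text(text):
    """Compute the TTR of a raw text using the tokenizer above."""
    tokens = tokenize(text)
    return type_token_ratio(len(set(tokens)), len(tokens))


# A toy sentence: 9 tokens, 7 distinct word types.
sample_ttr = ttr_from_text("I go shopping and my friends go shopping too")

# The token and type counts reported in Table 1.
learner_ttr = type_token_ratio(2756, 19826)
locness_ttr = type_token_ratio(11248, 168290)
```

Because a TTR generally falls as a text grows longer, ratios for corpora of very different sizes should be compared with caution.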

4.2. Data retrieval and analysis procedure

For this study, WordSmith Tools, a user-friendly and powerful package, was used to
automate part of the linguistic analysis.
First, a frequency wordlist was generated with WordSmith Tools [15]. As mentioned earlier, I decided
to choose high-frequency common verbs and their collocations for analysis. According to Altenberg
and Granger [16], the following fifteen verbs appear on any corpus-based list of high-frequency
verbs: have, do, know, think, get, go, say, see, come, make, take, look, give, find, and use.
Given that the literature on high-frequency verbs points to an overuse of these verbs by EFL learners
and to the error-proneness of these verbs in L2 writing [17][18], I chose the node words from these
high-frequency verbs.
Next, I used the lemmatizing facility, which allowed for grouping all the inflectional forms
of the high-frequency verbs. An English lemma list from Someya is available at
http://www.lexically.net/downloads/BNC_wordlists/e_lemma.text. Figure 1 gives a screenshot of the
lemmatizing facility.

Figure 1. Screenshot of the Lemmatizer

In this paper, the composite set of word-forms is viewed as a lemma [4]. Figure 2 below shows an example
of lemmatized results. In this way, the verbs in the corpus were lemmatized to get the total occurrences
of each verb.
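
The grouping of inflected forms under one lemma can be sketched as follows. This is only an illustration: the two-entry lemma list and the function name are hypothetical stand-ins for a full lemma list, and the sketch does not reproduce how WordSmith Tools implements lemmatization.

```python
from collections import Counter

# Hypothetical mini lemma list (lemma -> inflected forms), standing in for a
# full lemma list loaded from a file.
LEMMA_FORMS = {
    "make": ["make", "makes", "made", "making"],
    "go": ["go", "goes", "went", "going", "gone"],
}

# Reverse index: inflected form -> lemma.
FORM_TO_LEMMA = {
    form: lemma for lemma, forms in LEMMA_FORMS.items() for form in forms
}


def lemma_frequencies(tokens):
    """Count occurrences per lemma and, within each lemma, per word-form."""
    totals = Counter()
    by_form = {}
    for token in tokens:
        form = token.lower()
        lemma = FORM_TO_LEMMA.get(form)
        if lemma is None:
            continue  # the token is not a form of any listed lemma
        totals[lemma] += 1
        by_form.setdefault(lemma, Counter())[form] += 1
    return totals, by_form


tokens = "These decorations make warm and cozy atmosphere and it makes me go there".split()
totals, by_form = lemma_frequencies(tokens)
```

Keeping the per-form counts alongside the lemma totals mirrors the bracketed figures reported for each verb, such as make[29] made[13] makes[12] making[5].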

Figure 2. Screenshot of Lemmatized Results


Table 2 below lists the frequencies of occurrence of the verbs used in the study. Of the fifteen verbs,
I selected the most frequent verbs from the word lists of the learner corpus, that is, verbs that occurred
more than 50 times in the corpus. In order to study the behavior of words in texts, it is necessary to
obtain enough observations for each verb. It was thus difficult to deal with verbs with relatively low
frequencies. All the necessary word-forms that are associated with the four key verbs (have, do, go,
and make) were extracted and counted separately.

Table 2. Frequency of the fifteen high frequency verbs in NNS corpus

Verb     Frequency   Lemmas                                               No. of Essays   Percent (%)
Have     250         have[169] d[1] had[11] has[62] having[6] 've[1]      109             87
Do       71          do[58] did[3] does[7] done[3]                        24              52
Know     45          know[39] knew[2] known[3] knows[1]                   22              48
Think    36          think[31] thinking[1] thinks[1] thought[3]           20              43
Get      39          get[32] gets[1] getting[1] got[5]                    21              46
Go       105         go[87] goes[4] going[12] gone[2]                     28              61
Say      20          say[13] said[5] saying[2]                            9               20
See      47          see[36] saw[2] seeing[3] seen[6]                     21              46
Come     15          come[11] came[1] comes[3]                            7               15
Make     59          make[29] made[13] makes[12] making[5]                19              41
Take     35          take[19] taken[1] takes[13] taking[2]                13              28
Look     17          look[10] looked[1] looking[4] looks[2]               7               15
Give     27          give[16] given[2] gives[9]                           10              22
Find     19          find[16] found[3]                                    11              24
Use      47          use[24] used[7] uses[1] using[15]                    12              26
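
The selection of node verbs by the frequency threshold (more than 50 occurrences) can be reproduced from the frequencies in Table 2. The following is a minimal sketch; the dictionary and function names are illustrative and not part of the original procedure.

```python
# Lemma frequencies in the learner corpus, copied from Table 2.
VERB_FREQ = {
    "have": 250, "do": 71, "know": 45, "think": 36, "get": 39,
    "go": 105, "say": 20, "see": 47, "come": 15, "make": 59,
    "take": 35, "look": 17, "give": 27, "find": 19, "use": 47,
}


def select_node_verbs(freqs, threshold=50):
    """Return verbs occurring more than `threshold` times, most frequent first."""
    selected = [verb for verb, count in freqs.items() if count > threshold]
    return sorted(selected, key=lambda verb: freqs[verb], reverse=True)


node_verbs = select_node_verbs(VERB_FREQ)
```

With a threshold of 50, the four verbs that remain are have, go, do, and make, which are the node words examined in the rest of the study.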

After choosing those verbs in the learner corpus, I then examined the frequencies of the verbs from
LOCNESS. The results are given in Table 3.

Table 3. Occurrences of the common verbs in NNS and NS corpora

Verb     Korean    LOCNESS
HAVE     250       1174
DO       71        274
GO       105       264
MAKE     59        408
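
Since the two corpora differ greatly in size, the raw counts in Table 3 are easier to compare once they are normalized, for example per 100,000 words. The sketch below is an added illustration, not a step reported in the study; it assumes that the LOCNESS counts refer to the 168,290-token sample in Table 1, and the names used are hypothetical.

```python
# Corpus sizes in tokens, taken from Table 1.
CORPUS_SIZE = {"NNS": 19826, "NS": 168290}

# Raw verb frequencies from Table 3: (learner corpus, LOCNESS sample).
RAW_COUNTS = {
    "have": (250, 1174),
    "do": (71, 274),
    "go": (105, 264),
    "make": (59, 408),
}


def per_100k(count, corpus_size):
    """Normalize a raw count to occurrences per 100,000 tokens."""
    return count * 100000 / corpus_size


rates = {
    verb: {
        "NNS": per_100k(nns, CORPUS_SIZE["NNS"]),
        "NS": per_100k(ns, CORPUS_SIZE["NS"]),
    }
    for verb, (nns, ns) in RAW_COUNTS.items()
}

for verb, rate in rates.items():
    print(f"{verb:5s} NNS {rate['NNS']:8.1f}  NS {rate['NS']:8.1f}")
```

Under these assumptions, each of the four verbs is relatively more frequent in the learner corpus than in the NS sample, which is in line with the overuse of high-frequency verbs reported in the literature [17][18].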

The next step was to scrutinize concordance lines to weed out irrelevant instances. Modal verbs and
noun forms were removed. I then picked out collocation errors from the KWIC (Key Word in
Context) lists with the help of the KWIC index in WordSmith Tools. Figure 3 presents the concordances of
HAVE.
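
The idea behind a KWIC concordance can be sketched in a few lines of Python. This is a simplified illustration rather than the WordSmith Tools implementation: the function name, the whitespace tokenization, and the window size are assumptions, and the sample text joins two learner sentences quoted in this paper.

```python
def kwic(tokens, node_forms, window=4):
    """Return (left context, node, right context) for every hit of a node form."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() in node_forms:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


HAVE_FORMS = {"have", "has", "had", "having"}

text = ("Today many people want to have some healing times . "
        "You only have to go the store")
concordance = kwic(text.split(), HAVE_FORMS)

for left, node, right in concordance:
    # Right-align the left context so that the node words line up in a column.
    print(f"{left:>25} [{node}] {right}")
```

The second hit (have to go) is a modal use of have, which is the kind of irrelevant instance that has to be weeded out by hand before verb-noun combinations are counted.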

Figure 3. Screenshot of Concordances of HAVE


I classified the extracted combinations into three categories, that is, free combinations, restricted
collocations, and idioms. On the basis of previous studies [2][19], I used the BBI Dictionary of English Word
Combinations (BBI) and the Oxford Collocations Dictionary for Students of English (OCDE) as the
main references, supplemented by the Collins Cobuild English Dictionary (CCED). As collocation
errors were identified in the concordances, they were copied from the text and pasted into a separate
file for later analysis.

5. Results and discussion


Altogether 485 verb-object-noun combinations were extracted from the learner essays, of which 64
were classified as collocations and 293 as free combinations. Table 4 shows their distribution by
degree of restriction. Interestingly, there is no occurrence of idioms in the learner corpus.

Table 4. Overall distribution of have/do/go/make + noun combinations in NNS writing

         Free combinations   Collocations   Idioms
HAVE     125                 36             0
DO       21                  15             0
GO       80                  22             0
MAKE     39                  19             0

First, as for the node word have, 250 verb-object-noun combinations were extracted from the
learner essays. Among them, 125 were classified as free combinations and 36 as collocations. Korean
EFL learners have a tendency to underuse have collocations. The learner data show a very limited use
of have collocations, including have an effect, have something in common, have a holiday, have friends,
etc. In contrast, the NS corpus contains a much wider range of have collocations such as have a sense of,
have a good idea of, have a greater effect on, have a greater respect for, etc. As Figure 4 shows, the NS
corpus includes a variety of have collocations.

Figure 4. Concordances of HAVE collocations from NS corpus

Some of the noun collocates were shared because they occurred in both corpora, and some were
used exclusively by one group. While shared noun collocates for the node word have were effect,
something in common, friends, exclusive noun collocates used by the NS group appeared to have a far
greater variety, including sense, idea, respect, incidence, grudge, etc. In addition, it is interesting to
note that very few mistakes (3 out of 36) were identified with respect to the use of have collocations
from the learner corpus.
Second, in the case of the node word do, 71 verb-object-noun combinations were extracted from the
learner essays, of which 15 were classified as collocations and 21 as free combinations. The
learner corpus shows a very limited use of do collocations, including do shopping, do their work, and
do the task, which points to underuse of do collocations in NNS writing. In contrast to the case of have
collocations, more than half (9 out of 15; 60%) of the do collocations produced by the Korean learners
contained one or several mistakes. As for the node word do, the NS corpus contained 274 verb-object-noun
combinations. With respect to types of do collocations, the following were identified
in NS writing: do business, do a favour, do their jobs, etc.
Third, as for the node word go, 105 verb-object-noun combinations were extracted from the learner
essays. Among them, 80 were classified as free combinations and 22 as collocations. Examples of go
collocations in the NNS writing include go shopping, go on, go their own way, etc. More than half of the
go collocations (12 out of 22) contained certain mistakes. Out of the 12 mistakes, the type occurring most
frequently involves the preposition following go, as in *go to picnic (go for/on a picnic), *go the store
(go to the store), and *going to there (going there). This seems to indicate that the Korean learners make
frequent errors in the use of prepositions or particles after common verbs. It may be suggested that one reason
for the EFL students' problems in learning English prepositions is that they usually try to learn the
meaning and use of prepositions individually without paying sufficient attention to their collocational
properties [26]. Therefore, knowing which preposition or particle should follow a verb is crucial
for Korean EFL learners. As for the node word go, the NS corpus contained 264 verb-object-noun
combinations. With respect to types of go collocations, the following were identified in
NS writing: go on strike, go home, go to bed, go into details, go on for months, go through, etc. This
indicates that NS writing shows a wider range of go collocations.
Finally, in the case of the node word make, 59 verb-object-noun combinations were extracted from
the learner essays, of which 19 were classified as collocations and 39 as free combinations. The learner
data show make collocations including make money, make atmosphere, make use of, make friends, etc.
In contrast, the NS corpus, containing 408 verb-object-noun combinations, reveals a much wider range of
make collocations such as make a point, make demand, make money, make assumptions, etc. As with the
use of have collocations, some of the noun collocates were shared and some were used exclusively by
the NS group. The exclusive noun collocates used by the NS group include claim, demand,
assumptions, statement, etc.
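
The error proportions reported above for have, do, and go collocations can be recomputed with a short sketch. The counts come from the text; the variable and function names are illustrative, and make is left out because no error count is reported for it.

```python
# (erroneous collocations, all collocations) per node verb, as reported above.
COLLOCATION_ERRORS = {
    "have": (3, 36),
    "do": (9, 15),
    "go": (12, 22),
}


def error_rate(errors, total):
    """Return the share of erroneous collocations as a percentage."""
    return 100.0 * errors / total


error_rates = {
    verb: error_rate(errors, total)
    for verb, (errors, total) in COLLOCATION_ERRORS.items()
}

for verb, rate in error_rates.items():
    print(f"{verb:5s} {rate:5.1f}% of collocations contain an error")
```

On these figures the error rate is lowest for have and above one half for both do and go.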
Table 5 below summarizes the overall distribution of major types of errors.

Table 5. Distribution of major types of errors

Errors (n), followed by examples

Preposition (verb): Omission or inaccurate use of prepositions (n = 37)
    Nowadays it is so perfect to go *to picnic. For the happine …
    People go *to shopping to get things …
    You only have to *go the store and tell the cle…
    … recommend for my good friend to *go this cafe.
    … this boat, I had difficulty in going *to there that I vomited …
    That`s why they make gimbob *from rice.
    Because Kimbob is made *by rice.

Determiner: Omission or inaccurate use of articles (n = 25)
    Effective customer? Then, you have *a potential to be a good …
    …offer some food to try before making *decision to buy it.
    These decorations make warm and cozy *atmosphere.
    Unlike Dark Knight, he can do *flight to thy sky and suit…
    … bookshelf. There, some people do *personal task using notebook.
    … the limelight as to do *announcement in the presence of others.

Verb: Wrong choice of verb (n = 18)
    to the website and they can *have a refund or *make an exchange
    However, you cannot *do actions like that when you …
    Recently, The teacher *did violence to student and …
    … surrounded by many books, I can *make an intellectual curiosity.

Structure: Syntactic structure wrong (n = 7)
    This place can make me *having healing time.
    And the department store can make me *to earn some money.
    … Partiality makes students *to hate each other.
    And it can make you *won't be sorry.

Noun: Wrong choice of noun (or non-existent noun) (n = 3)
    My friends and I mostly do *shop online. A number of …

Number: Noun used in singular instead of plural or vice versa (n = 3)
    Today many people want to have some healing *times.
    …see my school campus? You will make great *memory.

Usage 1: Combination does not exist (n = 3)
    Now we got used to *do web surfing already.
    and they can have a refund or *make an exchange. However it

Usage 2: Combination exists but is not used correctly (n = 3)
    First, a proper friend never *go back on their friend even…
    we should *go for seek items and walking
    However, the customers *go out on shopping at stores
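
The relative weight of each error type in Table 5 can be made explicit by converting the counts into shares. The sketch below assumes that the eight counts can simply be summed, although some instances may fall under more than one type; the names used are illustrative.

```python
# Error counts per type, copied from Table 5.
ERROR_COUNTS = {
    "preposition": 37,
    "determiner": 25,
    "verb": 18,
    "structure": 7,
    "noun": 3,
    "number": 3,
    "usage 1": 3,
    "usage 2": 3,
}

# Assumption: the error types are treated as non-overlapping.
total_errors = sum(ERROR_COUNTS.values())

# Percentage share per error type, largest first (ties keep table order).
shares = sorted(
    ((etype, 100.0 * n / total_errors) for etype, n in ERROR_COUNTS.items()),
    key=lambda item: item[1],
    reverse=True,
)

for etype, share in shares:
    print(f"{etype:12s} {share:5.1f}%")
```

Under this assumption, preposition and determiner errors together account for well over half of the tagged errors.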

First, prepositional errors, with 37 occurrences, were the most salient. For example, *going to
there is a literal translation of the Korean expression “geogi-e ga-da.” It seems possible that these
results are due to the fact that second language learners tend to work on the hypothesis that there is a
word-for-word translation equivalence between L1 and L2. Another possible reason is that Korean has
fewer words functioning as prepositions and they are used less frequently than in English. That is to
say, major difficulties appear to be L1 related.
Second, determiner errors were the second most frequent type, with 25 occurrences. The result may be
explained by the fact that Korean does not have articles. Moreover, the use of English articles is a
subtle and complex phenomenon, and there is no clear L2 input or formal instruction that can help
Korean learners acquire the semantics of English articles [21].
Compared with determiner and preposition errors, fewer mistakes (18
occurrences) involved a wrong choice of verb. This finding is interesting
because the verb in a collocation has a restricted sense, which makes its correct use more difficult.
Another interesting finding is that errors influenced by direct translation from the L1 seem predominant
among the wrong choices of verb. An example of a direct translation of a Korean collocation
is:
Iron Man is fond of the limelight as to *do announcement in the presence of others and enjoy the
given situation. (NNS-w2c6)

The Korean for make an announcement is seoneon-hada whose literal meaning is ‘do
announcement’. Another example is:

However, you cannot *do actions like that when you shop on the internet. (NNS-w2c6)

This is very likely to be a translation of haengdong-hada, literally ‘perform an action’. The Korean
word hada is usually translated as do in English, and there are contexts where this would be acceptable
(such as do shopping), but frequently a different word is expected as in the following example:

Unlike Dark Knight, he can *do flight to the sky and suit is a great weapon, with armor. (NNS-
w2c8)

Here do flight should be replaced with make his flight.


Third, errors influenced by lack of exposure to the target language were identified too. Limited
exposure to authentic language norms and lack of feedback may be a factor in preventing learners from
making judgments about the relevant use of collocations. According to Table 5, such inaccuracies
involve word combinations that exist but are not used accurately. For example, go back on, which means
to break a promise that one has made, is usually used with nouns such as agreement, promise, word,
and assurance as the object.

First, a proper friend never *go back on their friend even though the friend become a beggar. (NNS-
w2d12)

6. Conclusion
This study compared the use of common verb + noun combinations by Korean learners of English
with that of native speakers at university level. Its aim was to account for the quantitatively and
qualitatively different verb + noun combinations Korean learners of English produce. While the NS corpus
displayed greater variety in the use of common verb combinations, the NNS corpus showed very limited
use of verb collocations. In particular, the lexical variety of the noun collocates for each common verb
led to greater differences between the NNS and NS corpora.
Regarding sources of non-nativeness in the learner corpus, errors were attributed to three factors:
those influenced by grammatical characteristics of the L1 (Korean), those influenced by direct
translation of the L1, and those influenced by lack of exposure to the target language. However, this
study does not suggest that each error can be neatly assigned to one factor or another. Without interviewing
learners at the time of writing, judgments about the sources of collocational errors remain speculative.
In addition, collocational errors may show elements from more than one factor.
It is interesting to note that the Korean EFL learners made relatively fewer mistakes involving a
wrong choice of verb, whether in the free or the restricted collocations. It seems reasonable to suppose
that the Korean learners were more conservative and cautious in the choice of a noun collocate for a
particular verb, whereas they tended to be more subjective and even creative in the choice of a
preposition collocate for a certain verb, hence generating more mistakes in this respect. The main
conclusion that can be drawn from this study is that, to a certain degree, intermediate Korean learners
still have a problem with the use of high-frequency common verb collocations. Although students
learn high-frequency verbs very early, these verbs tend to be overlooked once they have been taught.
As discussed earlier, the students lacked knowledge with respect to the collocational possibilities of
verbs: there is a mismatch between lexical items, as in *do actions. Whereas previous studies have
mainly focused on the combinations of two lexical items, the present study reveals that verb
collocational errors are not entirely a matter of mismatch between the verb and the noun. Rather, other
types such as prepositional errors and determiner errors were relatively more frequent among the
intermediate Korean learners. It is therefore reasonable to emphasize that the non-lexical elements
belonging to word combinations should not be overlooked.
Several limitations to this study need to be acknowledged. The sample size is relatively small
because the learner corpus used in this study was compiled from essays written by Korean learners in a
composition class. As mentioned earlier, there is some overlap with respect to sources of collocational
errors as well as types of collocational errors, and these sources and types represent tendencies rather
than absolutes. In addition, this study did not employ error annotation, which is particularly relevant for
interlanguage studies because POS taggers have been trained on the basis of native speaker corpora.
Nevertheless, the results have some pedagogical implications with respect to the teaching of
collocations. According to the extensive literature on second language acquisition, native-like
L2 proficiency seems an unattainable goal. Therefore, vocabulary teaching should aim at a functional
proficiency that approximates native-likeness. Because the learning of verb + noun combinations requires
a lot of input exposure, it is relevant to teach their combinatorial possibilities and restrictions explicitly.
For example, it is necessary to teach which collocate (the arbitrary part, e.g., make) is to be combined
with the base (the non-arbitrary part, e.g., announcement). The first step towards this is awareness that
collocations differ from language to language. Learners should be provided with sufficient input
including word combinations which are necessary for their needs, and the possibilities and restrictions
on collocational uses should be pointed out to them.
Whereas studies of collocations have shown convincing results for the explicit teaching of
collocations in the classroom, there still remains the issue of which collocations should be given
priority and how they should be taught. The findings of this study shed light on this issue. As
previously discussed, not all of the errors are mismatches between the verb and the noun, that is,
errors concerning the collocational possibilities of the two lexical items in question. Other types of
errors such as
prepositional errors as in Kimbob is made *by rice (made of) and determiner errors as in before making
*decision (make a decision) are also frequent among the Korean EFL learners. These results suggest
that teaching grammatical collocations as well as lexical collocations for the frequent node words
would be of great benefit to Korean EFL learners.
A final suggestion related to research methods is in order. It is necessary to complement learner
production data with experimental data in order to capture aspects of both competence and
performance. To conduct a full CIA study, a researcher also needs to compare the NNS production
with comparable NS data selected according to explicit criteria of comparability.
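
As a rough illustration of the comparison step, the sketch below normalizes raw co-occurrence counts
to a rate per 10,000 words so that an NNS corpus and an NS corpus of different sizes can be placed on
the same scale. The function names, the list of forms of make, and the toy sentences are hypothetical
and are not drawn from the learner corpus or from LOCNESS. The naive right-hand window is only a
stand-in for proper concordancing, lemmatization, and manual checking of each candidate collocation.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def node_cooccurrences(tokens, node_forms, window=3):
    """Count words found up to `window` tokens to the right of any node form.

    Assumption: the window ignores sentence boundaries, so real data
    would still need manual inspection of the concordance lines.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in node_forms:
            for right in tokens[i + 1 : i + 1 + window]:
                counts[right] += 1
    return counts

def per_10k(count, corpus_size):
    """Normalize a raw count to a rate per 10,000 words."""
    return 10000 * count / corpus_size if corpus_size else 0.0

# Toy data, invented for illustration only.
nns_text = "I make decision quickly. We make a plan."
ns_text = "They make a decision. She made a decision and made a plan."
make_forms = {"make", "makes", "made", "making"}

for label, text in (("NNS", nns_text), ("NS", ns_text)):
    tokens = tokenize(text)
    counts = node_cooccurrences(tokens, make_forms)
    rate = per_10k(counts["decision"], len(tokens))
    print(label, counts["decision"], round(rate, 1))
```

Normalizing by corpus size matters here because the two corpora differ in length, and raw counts
alone would overstate or understate how often the learners use a given combination.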

7. References
[1] Gilquin Gaëtanelle, Sylviane Granger, Magali Paquot, “Learner corpora: The missing link in EAP
pedagogy”, Journal of English for Academic Purposes, vol. 6, no. 4, pp. 319-335, 2007.
[2] Nadja Nesselhauf, “The use of collocations by advanced learners of English and some implications
for teaching”, Applied Linguistics, vol. 24, no. 2, pp.223-242, 2003.
[3] Michael Lewis, Teaching collocation: Further developments in the Lexical Approach, Language
Teaching Publications, Hove, UK, 2000.
[4] John M. Sinclair, Corpus, concordance, collocation, Oxford University Press, Oxford, UK, 1991.
[5] Anthony Paul Cowie, “Phraseology”, In The Encyclopedia of Language and Linguistics, edited by
R.E. Asher, pp. 3168-3171, Pergamon, USA, 1994.
[6] Batia Laufer, “The development of L2 Lexis in the expression of the advanced learner”, The
Modern Language Journal, vol. 75, no. 4, pp. 440–448, 1991.
[7] Michael Lewis, “Language in the lexical approaches”, In Teaching collocation: Further
developments in the lexical approach, edited by Lewis, Michael, pp. 126-154, Language Teaching
Publications, London, UK, 2001.
[8] Victoria Hasko, “Capturing the Dynamics of Second Language Development via Learner Corpus
Research: A Very Long Engagement”, Modern Language Journal, vol. 97, S1, pp. 1-10, 2013.
[9] Thewissen, Jennifer. "Capturing L2 Accuracy Developmental Patterns: Insights From an Error-
Tagged EFL Learner Corpus", Modern Language Journal, vol. 97, S1, pp.1-25, 2013.
[10] Vyatkina, Nina. "The Development of Second Language Writing Complexity in Groups and
Individuals: A Longitudinal Learner Corpus Study", Modern Language Journal, vol. 96, no. 4, pp.
576-598, 2012.
[11] Dagneaux Estelle, Sharon Denness, Sylviane Granger, "Computer-aided error analysis", System
vol. 26, no. 2, pp. 163-174, 1998.
[12] Sylviane Granger, Learner English on computer. London: Longman, UK, 1998.
[13] Sylviane Granger, "The contribution of learner corpora to second language acquisition and foreign
language teaching." Corpora and language teaching, vol. 33, no. 13, 2009.
[14] Sylviane Granger, "Computer learner corpus research: current status and future prospects."
Language and Computers, vol. 52, no.1, pp. 123-145, 2004.
[15] Mike Scott, WordSmith Tools version 5, Lexical Analysis Software, Liverpool, UK, 2008.
[16] Bengt Altenberg, Sylviane Granger, “The grammatical and lexical patterning of MAKE in native
and non-native student writing”, Applied Linguistics, vol. 22, no. 2, pp. 173-195, 2001.
[17] Lennon, Paul. “Getting ‘easy’ verbs wrong at the advanced level”, IRAL - International Review of
Applied Linguistics in Language Teaching, vol. 34, no. 1, pp. 23-36, 1996.
[18] Howarth, Peter Andrew. Phraseology in English Academic Writing. Tubingen, Niemeyer, UK,
1996.
[19] Batia Laufer, Tina Waldman, “Verb-Noun Collocations in Second Language Writing: A Corpus
Analysis of Learners' English”, Language Learning, vol. 61, no. 2, pp. 647-672, 2011.
[20] Flowerdew, Lynne. A corpus based-analysis of referential and pragmatic errors in students’
writing. Hong Kong University of Science and Technology, China, 1999.
[21] Tania Ionin, Heejeong Ko, Kenneth Wexler, “Article semantics in L2-acquisition: the role of
specificity”, Language Acquisition, vol. 12, pp. 3-69, 2004.
