Seminar 7
Most lexical corpora today are part-of-speech-tagged (POS-tagged). However, even corpus
linguists who work with 'unannotated plain text' inevitably apply some method to isolate salient
terms. In such situations, annotation and abstraction are combined in a lexical search.
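A lexical search over unannotated plain text can be illustrated with a simple keyword-in-context (KWIC) lookup. The sketch below is only one possible method, not the one any particular corpus manager uses; the function name `kwic`, its `width` parameter, and the sample sentence are assumptions made for illustration.

```python
import re

def kwic(text, term, width=20):
    """Return (left context, match, right context) for each hit of `term`."""
    # Match the term as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(), right))
    return hits

# Hypothetical sample text, not taken from a real corpus.
sample = "Corpus data is authentic. A corpus shows patterns."
lines = kwic(sample, "corpus", width=10)
for left, match, right in lines:
    print(f"{left:>10} [{match}] {right}")
```

Choosing which term to search for and how much context to show is already an act of abstraction, even though the text itself carries no annotation.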
The advantage of publishing an annotated corpus is that other users can then perform
experiments on the corpus (through corpus managers). Linguists with interests and
perspectives different from the originators' can exploit this work. By sharing data, corpus linguists
are able to treat the corpus as a locus of linguistic debate and further study.
Corpora allow access to authentic data and show frequency patterns of words
and grammatical constructions. Such patterns can be used to improve language
teaching materials or to teach students directly.
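The idea of word frequency patterns can be sketched in a few lines of code. This is a minimal illustration, assuming a very simple tokenizer (lowercased runs of letters and apostrophes); the function name `word_frequencies` and the sample text are hypothetical, and a real corpus tool would tokenize far more carefully.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count lowercase word tokens in a plain-text sample."""
    # Simplifying assumption: a token is a run of letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Hypothetical sample text, not taken from a real corpus.
sample = "The cat sat on the mat. The dog sat too."
freq = word_frequencies(sample)
top_two = freq.most_common(2)
print(top_two)
```

A ranked list of this kind is the sort of evidence that could guide which words to prioritize in teaching materials.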
However, there are also two disadvantages. First, only 10% of the corpus is based on
spoken language, so it offers little information about speech. Second,
a corpus will never tell you what is grammatically or syntactically right or
wrong; it only shows what actually occurs.