Translationese Features as Indicators Of

Translationese Features as Indicators of Quality
in English-Russian Human Translation
Maria Kunilovskaya Ekaterina Lapshinova-Koltunski

University of Tyumen Saarland University
University of Wolverhampton [email protected]
[email protected]
Abstract non-translations may also reflect their quality. The

possible link between translationese and transla-
We use a range of morpho-syntactic fea- tion quality has been assumed in corpus-based
tures inspired by research in register stud- translation studies ever since translationese has be-
ies (e.g. Biber, 1995; Neumann, 2013) and come one of the most attractive research topics. At
translation studies (e.g. Ilisei et al., 2010; the onset of machine learning approach to trans-
Zanettin, 2013; Kunilovskaya and Kutu- lationese detection, Baroni and Bernardini (2006)
zov, 2018) to reveal the association be- suggested using machine learning techniques to
tween translationese and human transla- develop an automatic translationese spotter to be
tion quality. Translationese is understood used in translator education. Attempts has been
as any statistical deviations of translations made to correlate translation quality and statis-
from non-translations (Baker, 1993) and tical differences between translations and non-
is assumed to affect the fluency of trans- translations in the target language (TL, Scarpa,
lations, rendering them foreign-sounding 2006) and to describe translational tendencies with
and clumsy of wording and structure. This the view of using them as translation quality as-
connection is often posited or implied in sessment tools (Rabadán et al., 2009). Generally,
the studies of translationese or transla- it seems reasonable to posit that the more rigorous
tional varieties (De Sutter et al., 2017), but the translationese effects, the stronger they signal
is rarely directly tested. Our 45 features the low quality of translation. Mostly, the presence
include frequencies of selected morpho- of translationese is assumed to affect the fluency of
logical forms and categories, some types translations, hampering their readability and giv-
of syntactic structures and relations, as ing them the distinct flavour of foreignness. While
well as several overall text measures ex- it is true that fluency is one of the traditional as-
tracted from Universal Dependencies an- pects of translation quality evaluation, along with
notation. The research corpora include pragmatic acceptability and semantic accuracy (as
English-to-Russian professional and stu- set out in Koponen, 2010; Secara, 2005, for exam-
dent translations of informational or ar- ple), it is not clear whether the features that cap-
gumentative newspaper texts and a com- ture translationese can be related to the quality in
parable corpus of non-translated Russian. human translation evaluation. Therefore, we test
Our results indicate lack of direct associa- whether linguistic features responsible for transla-
tion between translationese and quality in tionese effects are also good indicators of human
our data: while our features distinguish translation quality as perceived by human experts
translations and non-translations with the in real-life educational environment. To the best of
near perfect accuracy, the performance of our knowledge, the direct application of automati-
the same algorithm on the quality classes cally retrieved translationese features for learning
barely exceeds the chance level. human translation quality has not been attempted
before. If successful, this application could be
1 Introduction: Aim and Motivation
useful for a number of translation technologies,
In the present paper, we test if the linguistic speci- especially those involving automatic quality as-
ficity of translations that makes them distinct from sessment of both human and machine translation
47
Proceedings of the 2nd Workshop on Human-Informed Translation and Interpreting Technology (HiT-IT 2019), pages 48–57
Varna, Bulgaria, September 5 - 6, 2019.
(MT). (as our features are automatically retrieved from a
We select a range of lexico-grammatical fea- corpus) to the area of translation studies and trans-
tures that have originated in register stud- lation technologies.
ies (Biber, 1995; Neumann, 2013) and are known The remainder of the paper is structured as fol-
to capture translationese, i.e. to reflect the sys- lows: In Section 2, we report on the related studies
temic differences between translated and non- and the theoretical background of the paper. Sec-
translated texts (see, for example Evert and Neu- tion 3 provides details on our methodology and the
mann, 2017, where they use a similar set to resources used. In Section 4 we explore the ability
register features to reveal asymmetry in transla- of our features to distinguish between (1) trans-
tionese effects for different translation directions lated and non-translated texts (2) good and bad
in English-German language pair). Importantly, translations. We report results in terms of accuracy
our features are designed as immediately linguisti- and f-score, and provide a feature analysis. And fi-
cally interpretable as opposed to surface features, nally, in Section 5, we conclude and describe the
such as n-grams and part-of-speech frequencies future work.
commonly used in machine translation evalua-
tion, and include manually-checked frequencies of 2 Related Work and Theoretical
less easily extractable linguistic phenomena such Background
as correlative constructions, nominalisations, by- 2.1 Specificity of Translations
passives, nouns/ proper names in the function of
core verbal arguments, modal predicates, mean Our analyses are based on the studies showing that
dependency distance, etc., along with the more tra- translations tend to share a set of lexical, syntac-
ditional and easily-extractable features like lexi- tic and/ or textual features (e.g. Gellerstam, 1986;
cal density, frequency of selected parts-of-speech Baker, 1995; Teich, 2003). The choice and num-
(e.g. subordinating conjunctions and possessive ber of features investigated in translationese stud-
pronouns). ies varies. Corpas Pastor et al. (2008) and Ili-
sei (2012) use about 20 features to demonstrate
These features are believed to reflect language
translationese effects in professional and student
conventions of the source and target languages
translations from English to Spanish. They used
(English and Russian in our data) as well as po-
supervised machine learning techniques to distin-
tential ‘translationese-prone’ areas.
guish between translated and non-translated texts
We represent English and Russian texts in this language pair. The authors use two different
as feature vectors and use these representa- groups of features – those that grasp general char-
tions to automatically learn differences be- acteristics of texts, e.g. distributions of grammati-
tween translations/non-translations and high- cal words, different part-of-speech classes and the
scoring/low-scoring translations. Assuming that proportion of grammatical words to lexical words,
a shift in the translations linguistic properties and those that reflect simplification effect (the ten-
(away from the target language norm manifested dency of translations to be less complex than non-
in non-translations) may be related to the trans- translated texts), such as average sentence length,
lation quality, we use classification techniques to sentence depth as the parse tree depth, proportion
automatically distinguish between good and bad of simple sentences and lexical richness. Our fea-
translations. However, we are not only interested ture set is inspired by the research reported in Ev-
in the performance of classifiers, but also in ert and Neumann (2017). They adopted 27 fea-
identifying discriminative linguistic features tures from the feature set developed for the con-
specific either for good or bad translations. trastive study in English-German register variation
We believe that the findings of this study will in Neumann (2013) and effectively applied it to
contribute to both translation studies and translator the study of translationese effects. This research
training. On the one hand, the knowledge about shows a remarkable similarity between the register
differences between good and bad translations is features and translationese features: the two sets
important from a didactic point of view, as it de- have a big area of intersection, including, for ex-
livers information on the potential problems of the ample, such indicators as sentence length, type-
novice translators. On the other hand, they provide to-token ratio, number of simple sentences, the
new insights and new methodological approaches distributions of some parts-of-speech and function
48
words such as conjunctions, etc. Our own feature sify translations and non-translations motivating
set (described in Section 3.2) has considerable ex- their work by the fact that automatic distinction
tensions and modifications on the one suggested between originals and machine translations was
in the works referred above. The feature selection shown to correlate with the quality of the machine
is based on the assumption that the translationese translated texts (Aharoni et al., 2014). However,
effect is immediately related to quality, and we in- their data does not contain human quality evalua-
cluded the features that are known, or expected, in- tion. Translationese as quality indicator was also
dicators of translationese, which are, incidentally, used by Rabadán et al. (2009) who claims that the
mostly lexico-grammatical features. smaller the disparity between native and translated
usage in the use of particular grammatical struc-
2.2 Translation Features and Quality tures associated with specific meanings, the higher
Estimation the translation rates for quality. De Sutter et al.
(2017) use a corpus-based statistical approach to
Automatic human translation evaluation is an measure translation quality (interpreted as target
emerging direction in Natural Language Process- language acceptability) by comparing the features
ing (NLP). For instance, Vela et al. (2014a) of translated and original texts. They believe that
and Vela et al. (2014b) used automatic metrics de- acceptability can be measured as distance to the
rived from machine translation evaluation and ap- target language conventions represented in the lin-
plied them for the evaluation of human transla- guistic behaviour of the professional translators
tions. They correlated the automatic scores with and professional writers. Their analysis is based
the human evaluations showing that these auto- on the visual estimation of the linguistic homo-
matic metrics should be used with caution. One of geneity of professional and original fiction books
the latest work in this strand of research is (Yuan that are expected to form separate clusters on the
et al., 2016). The authors use easily extractable Principal Components biplots. The acceptability
monolingual features to capture fluency and their of student translations is interpreted as the loca-
bilingual ratios as well as bilingual embeddings tion of a given translation on the plot with regard to
features to account for adequacy of content trans- these clusters. The PCA-based multivariate analy-
fer. Their models return the best predictions on sis was supported by univariate AVOVA tests. The
the embedding features for both fluency and accu- features that were used in this research include a
racy. The advantage of using other features such 25 language-independent (overwhelmingly, sim-
as part-of-speech and dependency frequencies is ple frequencies of parts-of-speech, types, tokens,
in their interpretability: the best-performing fea- n-grams, as well as sentence length, TTR, hapax)
tures selected in their experiments helped the au- and 5 language dependent features. The differ-
thors to determine grammatical features that are ences observed between professional and student
likely to be responsible for lower translation qual- translations are not clear-cut and “only seven fea-
ity scores. They show that human translations typ- tures (out of 30) exhibit a significant difference
ically contain errors beyond the lexical level, to between students and professionals” in their first
which proximity-based MT evaluation metrics are case study, for example. Their data does not con-
less sensitive. tain manual quality evaluation and it remains un-
The only study that make use of genre fea- clear how selected linguistic features relate exactly
tures for quality analysis is (Lapshinova-Koltunski to translation quality. This work is particularly
and Vela, 2015). However, the authors compare relevant to us, because it is explicitly bringing to-
English-German translation (both human and ma- gether translational quality and professionalism.
chine) with non-translated German texts that, as
the authors claim, represent target language qual- 2.3 Translation Competence
ity conventions. Their main aim is to show that
the usage of translation corpora in machine trans- A few other works, like the last one commented
lation should be treated with caution, as human above, attempted to capture the specificity of the
translations do not necessarily correspond to the two translational varieties – the professional and
quality standards that non-translated texts have. the student translations. If professionalism in
Rubino et al. (2016) use features derived from translation could be reliably linked to the linguis-
machine translation quality estimation to clas- tic properties of translations, (probably, the ones
49
associated with translationese), then professional fit’ (Chesterman, 2004). They were sampled on
translations could be used to work around the the frame limiting the extracted texts to the type
scarcity and unreliability of the data annotated for ‘article’, intended for the large adult non-specialist
translation quality. However, there is hardly any readership, created after 2003 and marked as neu-
work that has successfully completed this chal- tral of style. For our quality-related analysis, we
lenging task: professional and learners’ transla- use the total of 438 student translations from En-
tions prove to be difficult to classify. Further glish into Russian labeled for quality in real-life
product-oriented analyses of professional and stu- translation competitions, exam or routine class-
dent translations that do not exclusively focus on work settings. All translations were evaluated
the analysis of errors include works by Nakamura by the translation experts (either university teach-
(2007); Bayer-Hohenwarter (2010); Kunilovskaya ers of translation and/or professional translators),
et al. (2018). The idea to link the level of pro- who were asked to rank several translations of the
fessional expertise and the performance of a trans- same source text. Though each translation com-
lationese classifier was put to the test in Rubino petition and each institution, where translations
et al. (2016). They used a range of features to were graded, had their own descriptions of qual-
analyse German translations of the two types and ity requirements, they were not limiting transla-
non-translated comparable texts in German. Their tion quality to a specific aspect. For the pur-
feature set included features inspired by MT qual- poses of this research, we relied on the overall
ity estimation (13 surface features such as num- agreed judgment of the jury or exam board. For
ber of upper-cased letters, and over 700 surprisal the purposes of this research, we use only 1–3 top
and distortion features that were “obtained by ranking translations and/ or translations that re-
computing the negative log probability of a word ceived the highest grade and bottom translations
given its preceding context” based on regular and and/ or translations that received the lowest grade,
backward language models). Their result for the which gives us the binary labels ‘best’ and ‘worst’.
binary professional/student translation classifica- These translations and their quality labels were ex-
tion was “barely above the 50% baseline” demon- tracted from RusLTC (Kutuzov and Kunilovskaya,
strating that the MT evaluation features were not 2014), a collection of quality-annotated learner
helpful for that task. In a similar attempt, Ku- translator texts, available online (https://www.rus-
nilovskaya et al. (2018) used a set of 45 syntac- ltc.org). The English source texts for both profes-
tic features (mostly Universal Dependencies rela- sional and student translations were published in
tions) to achieve F1 = 0.761, which was lower that 2001-2016 by well-known English media like The
their baseline, based on part-of-speech trigrams. Guardian, The USA Today, The New York Times,
the Economist, Popular Mechanics. All corpus re-
3 Experimental Setup sources used in this research are made compara-
ble in terms of register and are newspaper infor-
3.1 Corpus Resources
mational or argumentative texts. The quantitative
For our translationese-related analysis, we use a parameters of the corpus resources used in this
corpus of Russian professional translations to En- research (based on the pre-processed and parsed
glish mass-media texts and a comparable subcor- data) are given in Table 1. We have different num-
pus of newspaper texts from the Russian National ber of student translations of the two classes (best,
Corpus (RNC, Plungian et al., 2005). Professional worst), which is also distinct from the number of
translations (‘pro’) are collected from a range of source texts, because we used several top-ranking
established electronic media, such as Nezavisi- translations and in some settings the worst trans-
maya Gazeta and InoSMI.RU or Russian editions lations were not determined (i.e. the ranking was
of global mass media such as BBC, Forbes and Na- done only for the top submissions).
tional Geographic (all publications either carry the
name of the translator or the endorsement of the Taking into account the small size of our data,
translation by the editorial board). Non-translated we paid attention to its pre-processing to reduce
Russian texts (reference corpus, ref) come from the number of tagging and sentence-splitting er-
a user-defined subcorpus of the RNC to represent rors that may have influence on the feature ex-
the expected target language norm for the selected traction. First, we normalised spelling and typo-
register, i.e. the current target language ‘textual graphic conventions used. Second, we split sen-
50
ref pro best worst (attrib), copula verbs (copula), nouns
words - 458k 49k or proper names used in the functions of core
EN
texts - 385 98 verbal argument (subject, direct or indirect
words 737k 439k 141k 61k object) to the total number of these relations
RU
texts 375 385 305 134 (nnargs);
Table 1: Basic statistics on the research corpora • nine syntactic features that have to do with
the sentence type and structure: simple
tences with the adjusted NLTK sentence tokeniser, sentences (simple), number of clauses
deleted by-lines, dates and short headlines (sen- per sentence (numcls), sentence length
tences shorter that 4 tokens, including punctua- (sentlength), negative sentences (neg),
tion) and corrected any sentence boundary errors. types of clauses – relative (relativ)
Finally, the corpora were tagged with UDpipe and pied-piped subtype (pied), correlative
1.2.0 (Straka and Straková, 2017). For each lan- constructions (correl), modal predicates
guage in this experiments we used the pre-trained (mpred), adverbial clause introduced by a
model that returned most accurate results for our pronominal ADV(whconj);
features and had the highest accuracy for Lemma, • two graph-based features: mean hierarchi-
Feats and UAS reported at the respective Univer- cal distance and mean dependency distance
sal Dependencies (UD) page among the available (mhd, mdd) (Jing and Liu, 2015);
releases. At the time of writing it is 2.2 for English
EWT, and 2.3 for Russian-SynTagRus treebank. • five list-based features for semantic types
of discourse markers (addit, advers,
3.2 Features caus, tempseq, epist) and the dis-
For our experiments, we use a set of 45 features course marker but1 (but). The approach
that include the following types: to classification roughly follows (Halliday
and Hasan, 1976; Biber et al., 1999; Fraser,
• eight morphological forms: two de- 2006). The search lists were initially pro-
grees of comparison (comp, sup), past duced independently from grammar refer-
tense and passive voice (pasttense, ence books, dictionaries of function words
longpassive, bypassive), two non- and relevant research papers and then verified
finite forms of verb (infs, pverbals), for comparability and consistency;
nominalisations (deverbals) and finite
verbs (finites); • two overall text measures of lexical density
and variety (lexdens, lexTTR).
• seven morphological categories: pronomi-
nal function words (ppron, demdets, Special effort was made to keep our feature set
possdet, indef), adverbial quantifiers cross-linguistically comparable. The rationale be-
(mquantif), coordinative and subordina- hind this decision is an attempt to reveal the most
tive conjunctions (cconj, sconj); notorious effect in translation, namely, ‘shining-
through’, the translational tendency to reproduce
• seven UD relations that are known trans-
source language patterns and frequencies rather
lationese indicators for the English-Russian
than follow the target language conventions. This
translation pair (Kunilovskaya and Kutu-
form of translationese can be established by com-
zov, 2018). These include adjectival
paring the distributions of a feature values across
clause, auxiliary, passive voice auxiliary,
three corpora: non-translations in the source lan-
clausal complement, subject of a passive
guage (SL), non-translations (or reference) in the
transformation, asyndeton, a predicative or
TL and in the translated texts in the TL. We use
clausal complement without its own sub-
several norms to make features comparable across
ject (acl, aux, aux:pass, ccomp,
different-size corpora, depending on the nature of
nsubj:pass, parataxis, xcomp).
the feature. Most of the features, including all
• three syntactic functions in addition to UD 1
If not followed by ‘also’ and not in the absolute sentence
relations: various PoS in attributive function end.
51
types of discourse markers, negative particles, pas- made by ANOVA. Besides, we use Principal Com-
sives, relative clauses, are normalised to the num- ponent Analysis (PCA) to visualise the distinc-
ber of sentences (30 features). Such features as tions between our classes, given our features.
personal, possessive pronouns and other noun sub- In the first task, we automatically distinguish
stitutes, nouns, adverbial quantifiers, determiners comparable Russian non-translations from profes-
are normalised to the running words (6 features). sional and student translations. In the second
Counts for syntactic relations are represented as task, we use the same algorithm and the same
probabilities, normalised to the number of sen- features to learn the difference between good and
tences (7 features). Some features use their own bad translations. The comparative outcome of
normalisation basis: comparative and superlative this two-step methodology indicates whether the
degrees are normalised to the total number of ad- features described in 3.2 capture translationese,
jectives and adverbs, nouns in the functions of sub- whether they correlate with the human evaluation
ject, object or indirect object are normalised to the of human translation quality, and whether there
total number of these roles in the text. is an association between the two. Moreover, we
analyse which features are most informative in the
3.3 Methodology two classification tasks and intersect the resulting
feature lists.
We extract the instances of the features from
our corpus relying on the automatically annotated 4 Results and their Interpretation
structures (parts-of-speech, dependency relations,
4.1 Translationese
etc.). The accuracy of feature extraction is there-
fore largely related to the accuracy of the auto- As seen in Figure 1 illustrating the results of PCA,
matic annotation. However, care has been taken our features are good indicators of translationese:
to filter out noise by using empirically-motivated we get very similar, consistent results on the dif-
lists of the closed sets of function words and typ- ferentiation between the non-translations in our
ical annotation errors where possible. Each text data and the two translational corpora that come
in the data is represented as a feature vector of from different sources and, in fact, represent two
measures for a range of linguistic properties as de- socio-linguistic translational varieties (student and
scribed in 3.2. professional translations).
For both tasks – (1) the analysis of the differ- These visual impressions are corroborated by
ences between translated and non-translated texts the results of the automatic classification. Table 2
and (2) the comparison of the highest-ranking and show that this feature set allows us to predict trans-
lowest-ranking translations, we model the differ- lations of any type with the accuracy of 92-94%.
ence between our binary text classes using ma-
precision recall f1-score
chine learning techniques. The experiments are
pro 0.91 0.94 0.93
arranged as text classification tasks, where we de-
ref 0.94 0.91 0.92
termine the utility of our features based on the
macro avg 0.92 0.92 0.92
performance of the classifier. For the consider-
stu 0.93 0.95 0.94
ation of space, we report the results of a Sup-
ref 0.94 0.92 0.93
port Vector Machine (SVM) algorithm with the
macro avg 0.94 0.94 0.94
default sklearn hyper parameters only. To account
for the generalization error of the classifier, we Table 2: Cross-validated classification between
cross-validate over 10 folds. The results of the translations and non-translations on the full fea-
same learner on the full feature set are compared ture set
to the results on the most informative features only
to reveal the comparative usefulness of our hand- As a sanity check measure, we ran a dummy
crafted features for each task. Below we report the classifier that randomly allocates labels with re-
results for the 15 best features selected with Re- spect to the training set’s class distribution to get
cursive Feature Elimination (RFE) method, which the expected overall accuracy of 48%. Most in-
seems preferable to the standard ANOVA-based formative features contributing to this distinction
SelectKBest, because some of our features do not (as selected by RFE wrapped around a Random
comply with the normal distribution assumption Forest algorithm) include possdet, whconj,
52
Figure 1: Student and professional vs. non-translations in Russian
relativ, correl, lexdens, lexTTR,

finites, deverbals, sconj, but,
comp, numcls, simple, nnargs,
ccomp. It is the stable best indicators of transla-
tionese: 2/3 of this list is reproducible on the both
translational collections, and the classification
results on just these features are only 3% inferior
to the whole 45-feature set.
4.2 Quality
Using the same feature set, we analyse differences
between the top-scoring and lowest-scoring trans-
lations labelled as ‘good’ and ‘bad’ in our data.
As seen from Figure 2 that plots the values for our
data points on the first two dimensions from PCA
(the x- and y-axis, respectively), the best and the
worst translations are evenly scattered in the two- Figure 2: Best vs. worst translations
dimensional space and, unlike the previous exper-
iment, no groupings are visible.
worse than that returned by the dummy classifier.
The cross-validated SVM classifier on the full
feature set for good/bad translations returns the precision recall f1-score
macro-averaged F1-measure of 0.64 (Table 3). bad 0.48 0.55 0.51
The overall accuracy of this classification is 68%. good 0.79 0.74 0.76
Interestingly, good translations can be more eas- macro avg 0.63 0.64 0.64
ily modelled than the bad ones (76% vs. 51%
respectively). This contradicts expectations from Table 3: Results for good/bad classification
the teaching practice where examiners commonly
better agree on what is a bad translation. But If we attempt the classification on the 15 best
given that bad translations are a minority class in translationese indicators established in the previ-
our classification and that the employed feature ous step of this research, we would see the overall
set performs worse than a dummy classifier which classification results deteriorate to F1=0.56, while
achieves 73% accuracy, these observations are un- the results for the minority class (‘bad’) plummet
reliable anyway. The result on the 20 RFE features to F1=0.36.
is the same as on the full feature set of 45, but Even though the classification result can hardly
53
be found reliable, we calculated the features that
statistically return the best differentiation between
the labeled classes according to ANOVA. They
include copula, finites, pasttense,
infs, relativ, lexdens, addit,
ccomp, but, sconj, nnargs, acl,
advers, ppron, sentlength. The inter-
section with the 15 top translationese indicators
is limited to the six list items: finites,
lexdens, but, relativ, nnargs,
sconj, ccomp.
One of the major motivation behind this re-
search was to reveal the existence and extent
of features responsible for one distinct form of
translationese, namely, shining-through. We visu-
alise the difference (distance) between good and
bad translations with a kernel density estimation Figure 3: Good and bad translations vs. non-
(KDE) plot provided in Figure 3. This plot demon- translations in the source and the target languages
strates how well the values learnt on one of the
PCA dimensions separate the text classes in our
experiment. In this way, we are able to observe dicting translation quality, at least for the data at
the extent of the shining through effects in our hand. We have to admit that these results do not
data: while it is clear that all translations are lo- align well with our expectations. One explanation
cated in the gap between the source and the tar- is that we relied on the morphology and syntax
get language, this form of translationese does not for capturing translationese, while the most im-
differentiate translations of different quality. If mediately perceptible lexical level remained unac-
shining through features were useful in discern- counted for. Another reason for the lack of corre-
ing bad translations (as we expected), the red line lation between the quality labels and the fluency
should have been more shifted towards the yel- (understood here as deviations from TL morpho-
low dashed line of the source language. Needless syntactic patterns) is that quality is not entirely
to say, the professional translations demonstrate a about fluency, of course. The quality labels in
similar shining through effect, which we do not il- our data must reflect semantic faithfulness and
lustrate here for brevity. pragmatic acceptability of translations as well. If
anything, our results support the original inter-
5 Conclusion pretation of translationese as inherent properties
of translations exempt from the value judgment:
In the present paper, we analyzed if morpho- translationese is not the result of poor transla-
syntactic features used in register studies and tion, but rather a statistical phenomenon: various
translationese studies are also useful for the anal- features distribute differently in originals than in
ysis of quality in translation. It is often as- translations (Gellerstam, 1986).
sumed that any differences of translations from To our knowledge, there are no further stud-
non-translations may affect the fluency of transla- ies pursuing direct application of translationese
tions. If so, automatically extracted translationese features for learning human translation quality.
features can also be used for human translation In (De Sutter et al., 2017), the authors tried to
evaluation, which saves time and effort of manual automatically assess translation quality of student
annotation for quality. translations measuring their deviation from the
We tested this on a dataset containing English- “normal” texts represented by professional trans-
Russian translations that were manually evaluated lations and non-translated texts in a target lan-
for quality. The results of our analysis show that guage. Although they were able to show that stu-
features that are good for predicting translationese, dent translations differ from both comparable orig-
i.e. separating translations from the comparable inals and professional translations, it is not clear
non-translations, are not necessarily good in pre- if these differences were encountered due to other
54
influencing factors, as their data does not contain Andrew Chesterman. 2004. Hypotheses about transla-
any manual evaluation. Besides that, they were tion universals. Claims, Changes and Challenges in
Translation Studies, pages 1–14.
not able to find out why certain linguistic features
were indicators of deviant student translation be- Gloria Corpas Pastor, Ruslan Mitkov, Naveed Afzal,
haviour in a given setting. and Lisette Garcia-Moya. 2008. Translation univer-
Similarly, we show that translationese, at least sals: do they exist? a corpus-based and nlp approach
the features used in our analysis, are not neces- to convergence. In Proceedings of the LREC-2008
Workshop on Building and Using Comparable Cor-
sarily good indicators of translation quality. We pora, pages 1–7.
believe that these results provide valuable insights
for both translation studies and translation tech- Gert De Sutter, Bert Cappelle, Orphée De Clercq, Rudy
nologies, especially those involving quality esti- Loock, and Koen Plevoets. 2017. Towards a corpus-
based, statistical approach to translation quality:
mation issues.
Measuring and visualizing linguistic deviance in stu-
dent translations. Linguistica Antverpiensia, New
Acknowledgments Series–Themes in Translation Studies, 16.
This work is mostly produced in the University of Stefan Evert and Stella Neumann. 2017. The impact of
Tyumen and is supported in part by a grant (Ref- translation direction on characteristics of translated
erence No. 17-06-00107) from the Russian Foun- texts : A multivariate analysis for English and Ger-
dation for Basic Research. man. Empirical Translation Studies: New Method-
ological and Theoretical Traditions, 300:47.
Bruce Fraser. 2006. Towards a Theory of Discourse

References Markers. Approaches to discourse particles, 1:189–
Roee Aharoni, Moshe Koppel, and Yoav Goldberg. 204.
2014. Automatic detection of machine translated
text and translation quality estimation. In Proceed- Martin Gellerstam. 1986. Translationese in Swedish
ings of ACL, pages 289–295. novels translated from English. In L. Wollin and
H. Lindquist, editors, Translation Studies in Scan-
Mona Baker. 1993. Corpus linguistics and translation dinavia, pages 88–95. CWK Gleerup, Lund.
studies: Implications and applications. In G. Francis
Baker M. and E. Tognini-Bonelli, editors, Text and M A K Halliday and Ruqaiya Hasan. 1976. Cohesion
Technology: in Honour of John Sinclair, pages 233– in English. Equinox.
250. Benjamins, Amsterdam.
Iustina Ilisei. 2012. A machine learning approach to
Mona Baker. 1995. Corpora in translation studies: An the identification of translational language: an in-
overview and some suggestions for future research. quiry into translationese. Doctoral thesis, Univer-
Target, 7(2):223–243. sity of Wolverhampton.
Marco Baroni and Silvia Bernardini. 2006. A new Iustina Ilisei, Diana Inkpen, Gloria Corpas Pastor,
approach to the study of translationese: Machine- and Ruslan Mitkov. 2010. Identification of trans-
learning the difference between original and translationese: a supervised learning approach. In Pro-
lated text. Literary and Linguistic Computing, ceedings of CICLing-2010, volume 6008 of LNCS,
21(3):259–274. pages 503–511, Springer, Heidelberg.
Gerrit Bayer-Hohenwarter. 2010. Comparing transla-
tional creativity scores of students and profession- Yingqi Jing and Haitao Liu. 2015. Mean Hierarchical
als: flexible problem-solving and/or fluent routine Distance Augmenting Mean Dependency Distance.
behaviour? In S. Göpferich, F. Alves, and I. Mees, In Proceedings of the Third International Confer-
editors, New Approaches in Translation Process Re- ence on Dependency Linguistics (Depling 2015),
search, Copenhagen studies in language, pages 83– pages 161–170.
111. Samfundslitteratur.
Maarit Koponen. 2010. Assessing Machine Transla-
Douglas Biber. 1995. Dimensions of Register Varia- tion Quality with Error Analysis. In Electronic pro-
tion: A Cross-Linguistic Comparison. Cambridge ceedings of the VIII KäTu symposium on translation
University Press. and interpreting studies, volume 4, pages 1–12.
Douglas Biber, Susan Conrad, Edward Finegan, Stig Maria Kunilovskaya and Andrey Kutuzov. 2018. Uni-
Johansson, Geoffrey Leech, Susan Conrad, Edward versal Dependencies-based syntactic features in de-
Finegan, and Randolph Quirk. 1999. Longman tecting human translation varieties. Proceedings
grammar of spoken and written English, volume 2. ofthe 16th International Workshop on Treebanks and
MIT Press. Linguistic Theories (TLT16), pages 27–36.
55
Maria Kunilovskaya, Natalia Morgoun, and Alexey Elke Teich. 2003. Cross-Linguistic Variation in Sys-
Pariy. 2018. Learner vs. professional translations tem and Text. A Methodology for the Investigation
into Russian: Lexical profiles. Translation & Inter- of Translations and Comparable Texts. Mouton de
preting, 10. Gruyter, Berlin.
Andrey Kutuzov and Maria Kunilovskaya. 2014. Rus- Mihaela Vela, Anne-Kathrin Schumann, and Andrea
sian learner translator corpus. In Petr Sojka, Aleš Wurm. 2014a. Beyond Linguistic Equivalence.
Horák, Ivan Kopeček, and Karel Pala, editors, Text, An Empirical Study of Translation Evaluation in a
Speech and Dialogue, volume 8655 of Lecture Notes Translation Learner Corpus. In Proceedings of the
in Computer Science, pages 315–323. Springer In- EACL 2014 Workshop on Humans and Computer-
ternational Publishing. assisted Translation, pages 47–56, Gothenburg,
Sweden. Association for Computational Linguistics.
Ekaterina Lapshinova-Koltunski and Mihaela Vela.
2015. Measuring ’registerness’ in human and ma- Mihaela Vela, Anne-Kathrin Schumann, and Andrea
chine translation: A text classification approach. In Wurm. 2014b. Human Translation Evaluation and
Proceedings of the Second Workshop on Discourse its Coverage by Automatic Scores. In Proceed-
in Machine Translation, pages 122–131, Lisbon, ings of MTE Workshop at LREC 2014, Reykjavik,
Portugal. Association for Computational Linguis- Iceland. European Language Resources Association
tics. (ELRA).
Sachiko Nakamura. 2007. Comparison of features of Yu Yuan, Serge Sharoff, and Bogdan Babych. 2016.
texts translated by professional and learner transla- MoBiL: A Hybrid Feature Set for Automatic Human
tors. In Proceedings of the 4th Corpus Linguistics Translation Quality Assessment. In Proceedings of
conference, University of Birmingham. the Tenth International Conference on Language Re-
sources and Evaluation (LREC 2016).
Stella Neumann. 2013. Contrastive register variation.
A quantitative approach to the comparison of En- Frederico Zanettin. 2013. Corpus Methods for De-
glish and German. Mouton de Gruyter, Berlin, scriptive Translation Studies. Procedia - Social and
Boston. Behavioral Sciences, 95:20–32.
Vladimir Plungian, Tatyana Reznikova, and Dmitri
Sitchinava. 2005. Russian National Corpus: Gen-
eral description [Nacional’nyj korpus russkogo
jazyka: obshhaja harakteristika]. Scientific and
technical information. Series 2: Information pro-
cesses and systems, 3:9–13.
Rosa Rabadán, Belén Labrador, and Noelia Ramón.
2009. Corpus-based contrastive analysis and trans-
lation universals A tool for translation quality as-
sessment. Babel, 55(4):303–328.
Raphael Rubino, Ekaterina Lapshinova-Koltunski, and
Josef van Genabith. 2016. Information density and
quality estimation features as translationese indica-
tors for human translation classification. In Pro-
ceedings of NAACL HT 2006, pages 960–970, San
Diego, California.
Federica Scarpa. 2006. Corpus-based quality-
assessment of specialist translation: A study using
parallel and comparable corpora in English and Ital-
ian. In Maurizio Gotti and Susan Šarčevic, editors,
Insights into specialized translation, volume 46 of
Linguistic Insights / Studies in Language and Com-
munication, pages 155–172. Peter Lang, Bern.
Alina Secara. 2005. Translation Evaluation - a State
of the Art Survey. Proceedings of the eCoL-
oRe/MeLLANGE Workshop, Leeds, pages 39–44.
Milan Straka and Jana Straková. 2017. Tokenizing,
POS Tagging, Lemmatizing and Parsing UD 2.0
with UDPipe. In Proceedings of the CoNLL 2017
Shared Task: Multilingual Parsing from Raw Text to
Universal Dependencies, pages 88–99.
56

Translationese Features as Indicators Of

Uploaded by

Copyright:

Available Formats

Translationese Features as Indicators Of

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Translationese Features as Indicators Of

Uploaded by

Copyright:

Available Formats

Translationese Features as Indicators of Quality

in English-Russian Human Translation

Maria Kunilovskaya Ekaterina Lapshinova-Koltunski

Abstract non-translations may also reflect their quality. The

relativ, correl, lexdens, lexTTR,

Bruce Fraser. 2006. Towards a Theory of Discourse

You might also like