Is Post-Editing Really Faster Than Human Translation?

Silvia Terribile
University of Manchester

This article has been published in Translation Spaces. This is the author accepted manuscript; the published version is available at https://doi.org/10.1075/ts.22044.ter.
Abstract
Time efficiency is paramount for the localisation industry, which demands ever-faster
turnaround times. However, translation speed is largely underresearched, and there is a lack
of clarity about how language service providers (LSPs) can evaluate the performance of their
post-editing (PE) and human translation (HT) services. This study constitutes the first large-
scale investigation of translation and revision speed in HT and in the PE of neural machine
translation, based on real-world data from an LSP. It uses an exploratory data analysis
approach to investigate data for 90 million words translated by 879 linguists across 11
language pairs, over 2.5 years. The results of this research indicate that (a) PE is usually but
not always faster than HT; (b) average speed values may be misleading; (c) translation speed
is highly variable; and (d) edit distance cannot be used as a proxy for post-editing productivity,
because it does not correlate strongly with speed.
1. Introduction
Despite the pressure for language service providers (LSPs) to work at an ever-faster pace,
human translation (HT) speed is largely underresearched, and previous studies on post-
editing (PE) speed have mostly focused on evaluating machine translation (MT) systems’
performance, rather than the efficiency of PE itself (e.g. Läubli et al. 2019). This lack of
research has severely affected the localisation industry’s ability to adequately evaluate
translation speed. Although numerous publications have reported significant time-savings
through PE compared to HT (e.g. Kosmaczewska and Train 2019; Sánchez-Gijón, Moorkens
and Way 2019), there are no widely accepted standards for how many words per hour (WPH)
can typically be translated through these services. Previous research has usually analysed
very limited data, and/or data collected under experimental conditions. There is large variability
in the average translation speed values reported in different studies, and revision speed has
largely been overlooked. Furthermore, some LSPs use edit distance – which measures
technical effort, by calculating the number of characters that linguists edit to transform an MT
output into a high-quality translation – to evaluate the productivity gains obtained through post-
editing, and/or to determine linguists’ remuneration (Albarino 2019; ELIA et al. 2022).
However, previous studies have reported conflicting results about the correlation between
technical and temporal PE effort (e.g. Cumbreño and Aranberri 2021; Moorkens et al. 2015).
2. Related work
Professional translation blogs have widely discussed speed at the translation stage of HT. For
example, according to PacTranz (2021), “[t]ranslator speeds and output vary enormously –
anywhere from 200 to 500 words an hour", with an average of 300 WPH. Virino (2022) suggests that
the same linguist may be able to translate from 150 to 600 WPH depending on the ‘complexity’
of the source and other contextual variables. To my knowledge, no professional publications
have discussed PE speed, except for one forum discussion on ProZ.com (2018). This is
surprising because, according to data from Memsource (2020), “post-editing has become the
dominant translation method” in 2020 and an ever-growing number of LSPs, such as
TranslateMedia1, are using full post-editing – i.e. PE that aims to deliver a final translation of the highest possible quality – as an alternative to human translation.

1 TranslateMedia was acquired by another LSP in 2022. This article refers to 'TranslateMedia', because the data analysed were created before the acquisition.
Scholars have widely discussed speed at the first round of both HT and PE, presenting
very diverse average values. For instance, in an experiment on a novel translation, Toral,
Wieling and Way (2018) reported an average of 503 WPH for HT, and 685 WPH for neural machine translation (NMT) post-
editing. In the banking and finance domain, Läubli et al. (2019) recorded average values of
585 WPH in HT and 934 WPH in NMT post-editing in the German-to-French language pair,
and 453 WPH in HT and 495 WPH in NMT post-editing in German to Italian. Numerous studies
(e.g. Moorkens et al. 2015; Parra Escartín and Arcedillo 2015; Toral, Wieling and Way 2018)
found large variance in speed among different linguists, due to their individual skills and ways
of working. Macken, Prou and Tezcan (2020) compared the variance in the speed achieved by 20 linguists from the Directorate-General for Translation of the European Commission, reporting higher variability in PE (averages of 410-910 WPH) than in HT (averages of 450-720 WPH).
To measure PE effort and evaluate any gains compared to HT, much scholarship has
considered three dimensions, defined in Krings’ (2001) seminal work as temporal, technical,
and cognitive effort. Temporal effort represents the time needed for post-editing (ibidem).
LSPs typically consider time in relation to throughput, thus focusing on speed, expressed as
the words per hour ratio. Technical effort involves all the operations needed to practically carry
out PE, namely “deletion, insertion, reordering, or a combination of these operations” (ibid.,
179). Among other methods, technical effort is frequently measured through edit distance.
Finally, cognitive effort is defined as “the type and extent of those cognitive processes”
activated during post-editing (ibid., 179). It is the most challenging to measure, because only
indirect measurements are possible (ibidem).
Despite the complexity of measuring PE effort, some LSPs use edit distance as a key
proxy for post-editing productivity, and/or to determine linguists’ remuneration (Albarino 2019;
ELIA et al. 2022). However, cognitive effort is at times only weakly correlated to temporal and
technical effort (Krings 2001; Moorkens et al. 2015). If we focus on the types of effort that are
the most feasible for LSPs to consider, i.e. temporal and technical effort, to my knowledge,
only four small-scale studies have investigated their interrelation. In the post-editing of
statistical machine translation (SMT), O’Brien (2011, 19) found that speed and edit distance
“correlate well”, and Moorkens et al. (2015) identified strong correlations between these
variables. Macken, Prou and Tezcan (2020) reported weak to moderate associations between
speed and edit distance in the PE of SMT, and moderate in the PE of NMT, and Cumbreño
and Aranberri (2021) found only weak correlations in NMT PE. No previous research has
investigated this correlation at the revision stage of PE or HT.
TranslateMedia offers a wide range of language services, including human translation and full
post-editing of adaptive NMT – here referred to simply as post-editing – in which post-editors
edit NMT outputs to achieve translations of publishable quality, and the NMT engine adapts
and learns from their corrections (Kosmaczewska and Train 2019). HT and PE are carried out
within the same computer-assisted translation (CAT) environment, a customised version of
memoQ. When available, translation memory (TM) matches are provided for all segments in
both HT and PE jobs.
Quality expectations are the same for HT and PE services, as both aim to deliver final
translations of the highest possible quality, which will be published by clients. To this aim, all
HT and PE work consists of a first round of translation, followed by a revision stage carried
out by a second human linguist, who proofreads and makes any corrections to the target text.
As such, this article refers to a ‘translation stage’ and a ‘revision stage’, in the context of both
human translation and post-editing tasks. All linguists are professional freelance translators
and translate into their native language. They have different backgrounds and levels of
experience, and work remotely across the globe. Most post-editors (92% in the analysed
dataset) work on HT too; since the volume of texts translated through HT is much higher than
that of post-edited texts, only 22% of linguists working on HT provide PE as well. The LSP’s
project and account managers liaise with their clients to determine whether HT or PE may be
the most suitable service for each project on a case-by-case basis, considering clients’ needs
and MT quality in different domains and language combinations, among other factors.
TranslateMedia gathers quantitative productivity data, which help them manage and
evaluate their performance. These data include some productivity measures (e.g. translation
speed and edit distance) as well as data related to other factors (e.g. language pair, text genre,
etc.) that can affect productivity. The LSP provided me with PE and HT data presented from
different perspectives – i.e. data by HT/PE job, by language pair, by translator/post-editor, and
by reviser – which have been useful to investigate various aspects of translation speed. These
data were saved into an Excel workbook format (.xlsx), and processed through the statistical
analysis software IBM SPSS (Statistical Package for the Social Sciences), version 25 (IBM
2021).
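By way of illustration, a workbook of this kind could be loaded for similar exploratory analysis as sketched below; the file name, sheet name, and column names are hypothetical, and the study itself used SPSS rather than Python.

```python
import pandas as pd

# Minimal sketch: load a productivity workbook for exploratory analysis.
# File name, sheet name, and column names are hypothetical; the study used SPSS.
jobs = pd.read_excel("productivity_data.xlsx", sheet_name="by_job")

# Summarise words-per-hour values by service (HT/PE) and language pair.
summary = jobs.groupby(["service", "language_pair"])["wph"].describe()
print(summary)
```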
This investigation started by considering data for all PE and HT tasks in all available
language pairs completed from January 2019 to mid-June 2021, when this research began.
2019 was chosen as the starting year because the LSP adopted NMT in 2018, but only a
limited number of texts were post-edited using NMT that year. A preliminary analysis
demonstrated that the amount of work carried out in some language pairs was not sufficient
to achieve reliable results. Therefore, further data selection was undertaken through an
empirically driven process based on visualisations of translation speed values – not discussed
here due to spatial constraints – resulting in the selection of 11 language pairs with the highest
number of translated words. The selected language pairs are English (EN) to Danish (DA) /
Dutch (NL) / Finnish (FI) / French (FR) / German (DE) / Italian (IT) / Polish (PL) / Portuguese
(PT) / Spanish (ES) / Swedish (SV), and French to English. The majority of texts translated
in all these language pairs belonged to the genres of marketing (average: 63% in HT, 86% in
PE) or technical marketing, i.e. marketing texts with a high percentage of technical terminology
(average: 22% in HT, 12% in PE). Table 1 summarises the quantities of analysed data, and
Table 2 displays the number of linguists.
Language pair | HT jobs | HT words | PE jobs | PE words
EN>DA | 1,007 | 2,194,805 | 639 | 1,043,931
EN>DE | 6,813 | 14,544,521 | 3,172 | 4,765,551
EN>ES | 4,891 | 9,869,196 | 1,349 | 3,250,709
EN>FI | 860 | 1,242,069 | 789 | 1,274,238
EN>FR | 6,021 | 11,668,452 | 2,248 | 3,936,132
EN>IT | 6,151 | 10,884,517 | 1,266 | 2,212,146
EN>NL | 2,724 | 5,168,568 | 1,167 | 2,916,506
EN>PL | 2,336 | 3,710,768 | 1,098 | 1,818,600
EN>PT | 2,075 | 4,218,121 | 237 | 587,994
EN>SV | 890 | 2,005,279 | 588 | 1,008,192
FR>EN | 492 | 975,566 | 88 | 922,288
Total | 34,260 | 66,481,862 | 12,641 | 23,736,287
Table 1. Quantities of data analysed in this research
Language pair | HT + PE | Only HT | Only PE | Total
EN>DA | 8 | 15 | 2 | 25
EN>DE | 19 | 131 | 0 | 150
EN>ES | 18 | 98 | 3 | 119
EN>FI | 14 | 14 | 3 | 31
EN>FR | 22 | 145 | 0 | 167
EN>IT | 20 | 79 | 1 | 100
EN>NL | 20 | 60 | 0 | 80
EN>PL | 23 | 28 | 0 | 51
EN>PT | 22 | 38 | 1 | 61
EN>SV | 14 | 16 | 0 | 30
FR>EN | 17 | 41 | 7 | 65
Total | 197 | 665 | 17 | 879
Table 2. Number of linguists (translation + revision)
These data have been an invaluable resource, as they captured speed in a very large number of translation tasks, in a wide range of real-world contexts.
Edit distance values have also been essential for this research. They are measured
automatically in the LSP’s CAT through the Levenshtein algorithm, which “calculates the
minimum number of character edits that are necessary to transform one string into another
string” (Kosmaczewska and Train 2019, 170). This algorithm is rather complex, and it is
beyond the scope of this article to discuss it. For clarity, the following simplified equation provides a basic way to express the edit distance derived from the Levenshtein algorithm: Edit distance = minimum number of edited characters / number of reference characters. This metric is expressed as a
percentage: the lower the percentage, the fewer the edits. This research has examined three
types of edit distance: post-editing distance (PED), PED including TM matches and repetitions
– which I will refer to as ‘PED-TMR’ – and revision distance. PED considers corrections to the
MT output only, and it is frequently used to evaluate MT systems’ performance. PED-TMR
accounts for the overall technical PE effort. Indeed, post-editors edit a combination of MT
output and TM matches; in case of repetitions, linguists edit the segment once, and their edits
are auto-propagated. Finally, revision distance is measured for both PE and HT tasks, and it
calculates the distance between the translated and revised texts.
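To make this simplified equation more concrete, the short Python sketch below computes a character-level Levenshtein distance and expresses it as a percentage of the reference length. It is an illustrative approximation only: the study's values were produced by the LSP's CAT tool, and the function, example strings, and choice of the final translation as the reference are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions
    needed to transform string a into string b (standard dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


def edit_distance_pct(machine_output: str, final_translation: str) -> float:
    """Edit distance as a percentage of the reference length:
    the lower the percentage, the fewer the edits.
    Here the final translation is assumed to be the reference."""
    if not final_translation:
        return 0.0
    return 100 * levenshtein(machine_output, final_translation) / len(final_translation)


# Hypothetical example: a small correction to an MT segment.
print(round(edit_distance_pct("The cat sat on the mat", "The cat sat on a mat"), 2))
```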
This research has applied an exploratory data analysis approach, which involved using
descriptive statistics to explore TranslateMedia’s productivity data (Cleff 2014). The
distribution of speed data was tested to understand whether parametric statistics – which
assume that the data under investigation follow a normal distribution – would be appropriate
for this study (Mellinger and Hanson 2016). The Kolmogorov-Smirnov (K-S) test of normality
was used, as it is the most suitable test for datasets with over 50 data points (Pennsylvania
State University 2022). The K-S test produces (1) a distance (D) value, which ranges from 0
to 1 and represents the maximum distance between the analysed data and the normal
distribution; and (2) a probability value (p-value): if this is lower than .05, there is sufficient
evidence to reject the null hypothesis according to which the analysed data are normally
distributed (Mellinger and Hanson 2016). Table 3 presents the results of the Kolmogorov-
Smirnov test of WPH by HT/PE job at the translation stage, and Table 4 displays the same
information from the revision stage.
As shown in Tables 3 and 4, the null hypothesis was rejected in all cases. Since normality tests may sometimes produce inaccurate results, especially when dealing with large datasets
(Mellinger and Hanson 2016), the distribution of speed values was cross-checked through
data visualisations – namely normal quantile-quantile plots, not presented here due to spatial
constraints – which confirmed that the observed data are not normally distributed. As such,
nonparametric statistical tests were used in this research.
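For readers who wish to replicate this check outside SPSS, the following sketch shows how an equivalent one-sample K-S test could be run in Python with SciPy. The WPH values are invented for illustration, and parameterising the normal distribution with the sample's own mean and standard deviation is an assumption, not a description of the SPSS procedure.

```python
import numpy as np
from scipy import stats

# Hypothetical WPH values for one language pair and service (illustrative only).
wph = np.array([410.0, 980.0, 655.0, 1200.0, 300.0, 870.0, 760.0, 2500.0, 515.0, 640.0])

# One-sample Kolmogorov-Smirnov test against a normal distribution
# with the sample's own mean and standard deviation.
d_stat, p_value = stats.kstest(wph, "norm", args=(wph.mean(), wph.std(ddof=1)))

# If p < .05, reject the null hypothesis that the data are normally distributed.
print(f"D = {d_stat:.3f}, p = {p_value:.3f}, normally distributed = {p_value >= 0.05}")
```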
To gain a general understanding of whether PE is usually faster than HT, this project
started by considering average speed by language pair. However, average values can only
provide a very partial picture of speed, especially because they are not necessarily
representative of how individual values are spread out (Cleff 2014). Firstly, outliers, i.e.
extreme cases, may significantly affect averages – as well as standard deviations and
correlations, also examined in this study. Since some WPH values may be incorrect, it was
fundamental to identify outliers, and check whether they constitute valid cases.
Previous research dealing with translation speed (e.g. Parra Escartín and Arcedillo
2015) has typically investigated small datasets and identified outliers by establishing
thresholds of possible/impossible values. However, such an approach may not be effective in
revealing trends and patterns in large datasets. Therefore, here outliers were identified
through the mathematical formula applied in SPSS, which uses quartiles to consider the
distribution of individual values in the dataset. The first quartile (Q1) is the value under which
25% of cases are found, the third quartile (Q3) is the value under which 75% of cases are
situated, and the difference between Q3 and Q1 constitutes the interquartile range (IQR) (Cleff
2014). Outliers are identified through the following rule: Outliers ≤ Q1 – (1.5 × IQR) or ≥ Q3 + (1.5 × IQR) (Forsyth 2018). Quartiles have also been helpful to identify the ranges
of the most common translation speed values. Indeed, the range between the first and third
quartile comprises the middle 50% of HT/PE jobs, which may be considered as the most
typical values.
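As an illustration, the sketch below applies the same quartile rule to a small set of hypothetical WPH values. Note that different tools (e.g. SPSS and NumPy) may compute quartiles with slightly different interpolation methods, so boundary values can differ marginally.

```python
import numpy as np

def iqr_outlier_bounds(values):
    """Return (lower, upper) outlier bounds using the quartile rule:
    outliers <= Q1 - 1.5*IQR or >= Q3 + 1.5*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical WPH values by HT/PE job (illustrative only).
wph = np.array([320, 450, 510, 560, 620, 700, 880, 950, 1200, 9700])
lower, upper = iqr_outlier_bounds(wph)
outliers = wph[(wph <= lower) | (wph >= upper)]
middle_50 = np.percentile(wph, [25, 75])   # range of the most typical values

print(f"bounds: {lower:.0f} to {upper:.0f}, outliers: {outliers}, middle 50%: {middle_50}")
```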
Additionally, this project has investigated speed variability, aiming to shed some light
on whether the speed obtained in different tasks is more variable in PE or HT, and how much
variability there is in the average speed achieved by different linguists, due to their individual
skills and ways of working. Previous research (e.g. Macken, Prou and Tezcan 2020) has
typically considered the distance between the minimum and maximum average speed values
obtained in different contexts as a proxy for speed variability. However, these ranges are not
necessarily representative of how values are distributed. Therefore, here speed variability was
explored through the interquartile range and standard deviation (SD). The IQR provides an
indication of the extent to which values in the middle 50% of the data are spread out: the higher
the IQR, the higher the dispersion in this range. It has the advantage of being resistant to
outliers (Mellinger and Hanson 2016), but the disadvantage of considering only the middle 50% of the data. To consider the distribution of all speed values, this study has also
looked at the standard deviation, which measures the average distance between individual
values and the mean: the higher the standard deviation, the higher the variability (Cleff 2014).
Since standard deviation may be highly affected by outliers, the SD values with and without
outliers were compared. Due to spatial constraints, in this article IQR values will be discussed
in greater detail than SD ones.
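The following sketch illustrates how the IQR and the standard deviation, with and without outliers, could be computed for a set of speed values; all figures are invented for illustration and do not reproduce the study's data.

```python
import numpy as np

def dispersion_summary(values):
    """IQR and standard deviation of a set of speed values,
    with SD reported both including and excluding outliers."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    inliers = values[(values > q1 - 1.5 * iqr) & (values < q3 + 1.5 * iqr)]
    return {
        "IQR": iqr,
        "SD with outliers": values.std(ddof=1),
        "SD without outliers": inliers.std(ddof=1),
        "share of data kept": len(inliers) / len(values),
    }

# Hypothetical WPH values for PE jobs and HT jobs (illustrative only).
print(dispersion_summary([380, 520, 640, 700, 910, 1150, 1400, 5200]))
print(dispersion_summary([300, 410, 480, 530, 600, 680, 760, 830]))
```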
Finally, the correlation between translation speed and edit distance was investigated,
to understand to what extent temporal and technical effort are interrelated, and whether edit
distance can be used as a proxy for productivity. A correlation can be positive or negative:
when positive, if one variable grows, the other grows too, and vice versa; when negative, if
one variable grows, the other decreases (Forsyth 2018). The strength of all correlations has
been measured through the Kendall’s tau correlation coefficient, as much research has
reported that it is more powerful than other nonparametric correlation coefficients, such as
Spearman’s rho (Mellinger and Hanson 2016). In particular, this project has used Kendall’s
tau-b, which corrects for ties in the data. Kendall's tau-b may range from -1 to +1: a value of -1 indicates a perfect negative correlation; 0 signals that there is no association between the variables; and 1 represents a perfect positive correlation (ibidem). In this research, a coefficient in the range of .06 to .25 has been interpreted as a weak correlation, a value between .26 and .48 as a moderate correlation, a value between .49 and .70 as a strong correlation, and a value equal to or larger than .71 as a very strong correlation (Wicklin 2023).
The p-value indicates how likely it would be to observe a correlation of this strength if the variables were in fact unrelated – for example, if the analysed values followed a pattern by coincidence (Forsyth 2018). A
correlation is statistically significant if the p-value is equal to or under .05 – indicating an error
probability of 5% – and it is highly significant if it is equal to or lower than .01 (error probability:
1%) (ibidem).
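The sketch below shows how such a correlation could be computed in Python; SciPy's kendalltau uses the tau-b variant by default, and the paired WPH and edit distance values are hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau

# Hypothetical paired values by PE job (illustrative only):
# post-editing speed (WPH) and edit distance (% of characters changed).
wph = np.array([1450, 1210, 980, 860, 720, 640, 590, 510, 430, 380])
ped = np.array([4.0, 9.5, 12.0, 12.0, 18.5, 22.0, 25.5, 31.0, 36.5, 44.0])

# Kendall's tau-b corrects for ties; SciPy computes the tau-b variant by default.
tau, p_value = kendalltau(wph, ped)

# Interpretation thresholds used in this article (Wicklin 2023):
# |tau| .06-.25 weak, .26-.48 moderate, .49-.70 strong, >= .71 very strong.
print(f"tau-b = {tau:.2f}, p = {p_value:.3f}, significant = {p_value <= 0.05}")
```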
When examining correlations excluding outliers, only translation speed outliers were
disregarded. Indeed, in the analysed data, all edit distance outliers appear to be valid. In
particular, (a) none of these values look ‘impossible’, as none are above 100%; (b) they are
less subject to technical or measurement issues. For example, if a linguist works offline, this
will affect the translation speed value, but not edit distance; and (c) any doubts about their
validity can be solved by manually calculating edit distance.
5. Results
This section discusses the results of this research, which aimed to understand which
translation and revision speed values are typical/atypical in PE and HT (Subsection 5.1), how
variable speed is (Subsection 5.2), and to what extent temporal and technical effort are
correlated (Subsection 5.3).
Figure 1 displays average translation and revision WPH in HT and PE (average calculated by
HT/PE job). In these and other figures in this article, the darkest colour indicates the addition
of PE to the HT values, the percentages above bars indicate the speed increase through PE,
and the value that accounts for all language pairs is highlighted in bold.
Figure 1. Stacked bar chart of average WPH by HT/PE job, at the (a) translation and (b) revision stages
These values are higher than those reported in previous publications. However, as
previously mentioned, they do not include time spent reading briefing documents, researching,
and taking breaks. At the translation stage, on average PE has greatly enhanced time
efficiency compared to HT – i.e. it has considerably increased the amount of work completed in the same period of time – with an overall speed increase of 66%. However, the impact of post-
editing on speed has varied dramatically among different language pairs, from an average
increase of +130% in English to French, to a decrease of -7% in English to Swedish. This
shows that PE may not always increase speed in all language combinations, even when a
significant amount of post-editing work has been completed in a language pair, since over a
million words were translated from English to Swedish through post-editing.
In any case, as previously mentioned, average values only provide a partial picture of
speed, and they may be affected by outliers. Figure 2 compares average translation speed,
including and excluding outliers.
Figure 2. Stacked bar chart of average translation WPH by HT/PE job, with and without outliers
When excluding outliers, the overall average WPH values decrease by 13% in HT and
by 7% in PE. However, results vary considerably among language pairs, from a difference of
just -2% in English-to-Swedish HT, to -29% in English-to-Polish PE. In the latter case, the mean values that include outliers would suggest that, on average, post-editing has been 18% faster than HT. Conversely, the mean values without outliers indicate that, on average, post-editing has been 4% slower than HT.
fundamental to investigate whether these values are invalid, or there could be some reasons
explaining their presence. To gain a general idea of whether these values are realistic, I have
investigated the highest, middle, and lowest values within all outliers for each language pair,
both in PE and HT. In most cases, these values could not be ruled out as impossible a priori.
For example, in English-to-Polish post-editing, the analysed outliers were: 9,721 WPH,
2,378 WPH, and 1,709 WPH. At the bare minimum, a post-editing task involves reading the
source text and the MT output, and average reading speed ranges from 200 to 330 words per minute (WPM) (Rayner, Slattery and Bélanger 2010). Although no previous research has investigated reading
speed in post-editing or other professional translation contexts, I would argue that linguists
are likely to read at a slower pace, because they need to be very careful to spot mistakes and
inconsistencies. Nevertheless, these ranges may help us identify unrealistically high speed
values. If a post-editor were able to read at the highest speed in this range, they could
theoretically read up to 19,800 WPH; if we halve this value as they would read both the source
and target texts (although their lengths may differ), they could read up to 9,900 words in an
hour. In the highest outlier for English-to-Polish PE – i.e. 9,721 WPH – the post-editor has not
only read the source and target texts, but they have also made some edits (PED-TMR:
19.52%). Thus, we can be confident that this WPH value is invalid, and it was likely caused
by measurement or technical issues. However, the other two outliers are much lower than
9,900 WPH, so they could potentially be valid. I searched for any possible causes that would
explain their validity, by looking into any available resources, such as the TM, the number of
edits made during translation and revision, any localisation briefs, etc. However, the results of
this analysis were inconclusive.
These outliers are likely to include a combination of valid and invalid cases. Therefore,
when examining values that may be affected by outliers, I decided to (1) consider speed values
both including and excluding outliers, and (2) if trends emerge from the data without outliers,
calculate what percentage of all cases the data without outliers comprise, to understand
whether the results obtained by disregarding extreme values relate to a considerable share of
all data. In the case of English-to-Polish PE speed, even if all outliers were valid, PE would
have been very time-efficient in these cases (average speed in the English-to-Polish PE
outliers: 2,522 WPH), which are only 11% of jobs. Conversely, on average PE has been
counterproductive in the remaining 89% of tasks. Indeed, in these cases the average PE
speed is 520 WPH, which is lower than the average speed achieved through HT, both
including outliers (622 WPH), and excluding them (544 WPH).
At the revision stage, the number of outliers is even higher than in translation, as shown
in Figure 3.
Figure 3. Stacked bar chart of average revision WPH by HT/PE job, with and without outliers
Finally, considering quartiles has enabled me to identify the ranges of the most
common speed values. In Table 5, these figures are rounded to the nearest multiple of ten for
the sake of memorability.
To compare the variability of speed values obtained at the translation and revision stages of
HT and PE, Figure 4 shows the interquartile range among WPH by HT/PE job.4 For ease of
reference and readability, the values related to all language pairs are also presented
numerically in Table 6, together with the standard deviation values related to the same data,
for the sake of increased robustness.
Figure 4. Stacked bar chart of interquartile range of WPH by HT/PE job, at the (a) translation and (b)
revision stages
4 Only values including outliers are presented, as the IQR is resistant to outliers (Mellinger and Hanson 2016).
As seen above, there was great variability in the number of words processed in one hour, with at times large differences across language pairs. At the translation stage, the
interquartile range is overall 87% higher in PE than in HT. In revision, the IQR values are
extremely high, and they are on average 33% higher in PE compared to HT. The standard
deviation values without outliers, which account for the vast majority of all data – i.e. 95% of
translation data and 93% of revision ones in HT, and 97% of post-editing data and 95% of
revision ones in PE – provide a qualitatively similar picture: SD is much higher in revision
compared to translation, it is overall 87% higher in PE than HT at the translation stage, and
35% higher in PE compared to HT in revision.
Figure 5 displays the interquartile range of hours per thousand words (H/KW) by HT/PE job, and Table 7 presents the overall IQR and SD values.5

5 It may be argued that the interquartile range and standard deviation of the H/KW values are likely to go exactly in the opposite direction of the IQR and SD of WPH. However, please note that converting speed values from WPH to H/KW involves carrying out a non-linear transformation. As such, it was necessary to verify the extent to which the IQR and SD of the H/KW values would present an opposite picture compared to their corresponding measures of dispersion related to the WPH values. Indeed, the IQR and SD of the H/KW values displayed in Figure 5 and Table 7 are considerably different from the values obtained by converting the IQR and SD values from WPH to H/KW (not presented here for the sake of conciseness).
Figure 5. Stacked bar chart of interquartile range of H/KW by HT/PE job, at the (a) translation and (b)
revision stages
These values show the other side of the coin: H/KW dispersion is overall considerably
lower in revision than in translation, and much lower in PE compared to HT. On average, IQR
values are 34% lower in PE than in HT at the translation stage, and 44% lower in PE compared
to HT in revision. At the translation stage, the standard deviation of H/KW is highly affected by
a great number of outliers, resulting in an overall SD that is 31% higher in PE than in HT.
However, standard deviations without outliers – which account for 59% of PE data and 68%
of HT ones at the translation stage – are qualitatively similar to the IQR values, as SD is 35%
lower in PE than in HT. At the revision stage, the IQR values are qualitatively confirmed by
the SD values both including and excluding outliers.
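To illustrate why the dispersion of the converted values cannot simply be read off the dispersion of the WPH values, the sketch below assumes that H/KW denotes hours per 1,000 words (so H/KW = 1000 / WPH, an assumption consistent with how the measure is discussed here) and compares the IQR and SD of the converted values with a naive conversion of the WPH dispersion measures; all numbers are invented.

```python
import numpy as np

# Hypothetical WPH values by job (illustrative only).
wph = np.array([300.0, 450.0, 600.0, 900.0, 1500.0])

# Assuming H/KW means hours per 1,000 words, the conversion is non-linear.
hkw = 1000.0 / wph

def iqr(x):
    q1, q3 = np.percentile(x, [25, 75])
    return q3 - q1

# Dispersion of the converted values vs. a naive conversion of the dispersion.
print("IQR of H/KW:", round(iqr(hkw), 2))
print("1000 / IQR of WPH:", round(1000.0 / iqr(wph), 2))
print("SD of H/KW:", round(hkw.std(ddof=1), 2))
print("1000 / SD of WPH:", round(1000.0 / wph.std(ddof=1), 2))
```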
To shed some light on the extent to which the skills and ways of working of individual
translators impact their speed, Figure 6 presents the interquartile range of average WPH and
average H/KW achieved by different linguists working at the translation stage of PE and HT.
Please see Table 8 for a summary of the overall IQR and SD values.
Figure 6. Stacked bar chart of interquartile range of average (a) WPH and (b) H/KW by linguist, at the
translation stage
In line with the results of previous research (e.g. Moorkens et al. 2015; Parra Escartín
and Arcedillo 2015; Toral, Wieling and Way 2018), there was high variability among the
average speed obtained by different linguists, indicating that linguists’ skills and practices have
had a high impact on their speed. In line with the findings reported in Macken, Prou and Tezcan
(2020), the dispersion among the average WPH values achieved by different linguists was
much higher in PE than HT. Similarly to the values by HT/PE job, the opposite was the case
when considering variability in the average H/KW values.
To consider the impact of revisers’ skills and ways of working on their speed, Figure 7
presents the interquartile range of average speed by reviser, and Table 9 summarises the
overall IQR and SD values.
Figure 7. Stacked bar chart of interquartile range of average (a) WPH and (b) H/KW by linguist, at the
revision stage
Once again, these values indicate that there was very high variability in the average
speed obtained by different revisers, with higher dispersion of WPH values in PE, and higher
variability of H/KW in HT.
All in all, high levels of variability were identified in all speed data analysed in this
research. WPH variability was considerably higher in post-editing and at the revision stage,
and H/KW variability was much higher in human translation and at the translation stage. I
would argue that these measures of dispersion can fruitfully complement one another, as it
would be highly relevant for LSPs to gain an understanding of variability not only in the number
of words translated in an hour, but also in the amount of time spent to process a text of a given
number of words. The high speed variability identified in this research appears to reflect the
translation process, which is not as standardised and uniform as a manufacturing process;
rather, it varies based on a wide range of factors, such as the features of different source texts
and the translation strategies that linguists apply to deal with them, any requirements that
linguists need to comply with, the quality of the MT output in the case of post-editing, etc.
Investigating the impact of all these factors on speed lies beyond the scope of this article,
which has focused on comparing speed variability in post-editing and human translation.
To consider the extent to which post-editing speed and edit distance values are correlated,
Table 10 displays the Kendall’s tau-b correlations between post-editing WPH and PED, or
PED-TMR, with overall values highlighted in bold.
All values above are highly statistically significant, except for the correlations between
speed and PED-TMR in English-to-Finnish and English-to-Swedish, which are not statistically
significant. Overall, there was a moderate, negative correlation between speed and PED –
which accounts for edits to the MT output only – whilst there was only a weak, negative
correlation between speed and PED-TMR – which measures the overall technical effort. These
findings indicate that there was a moderate tendency for post-editing speed to decrease as
linguists made more edits to the MT output, and that the strength of this tendency dropped to
weak when including corrections to TM matches and repetitions in the edit distance values.
These results do not support the use of edit distance by some LSPs to determine linguists’
remuneration (Albarino 2019; ELIA et al. 2022), as this metric does not correlate strongly with
temporal effort.
The higher strength of the correlation between speed and PED compared to the
interrelation between speed and PED-TMR may be affected by the fact that linguists working
at TranslateMedia cannot edit 100% and 101% TM matches,6 because these matches are
locked. Edit distance is always 0 in these segments, regardless of the time spent on them, which may lower the strength of the correlation between speed and PED-TMR for the whole text. Nevertheless, evaluating the impact of 100% and 101% TM matches
on this correlation is beyond the scope of this article. The correlations identified in this research
are partially in line with those reported in previous studies investigating the interrelation
between speed and edit distance in NMT post-editing, which identified moderate correlations
between speed and edit distance comprising changes to the MT output and TM matches (Macken, Prou and Tezcan 2020), and weak correlations when considering changes to the MT output only (Cumbreño and Aranberri 2021). However, these studies have not measured correlations between speed and edit distance both including and excluding TM matches, so we can only make limited comparisons with their findings.

6 I.e. the segment in the TM is the same as the segment that needs translating; in 101% matches, one segment below and one above the selected segment are 100% matches too.
Table 11 presents the correlations between revision speed and revision distance, in
HT and PE.
With the exception of a few cases where the correlation between revision speed and
revision distance was strong – i.e. French-to-English HT and PE, and English-to-Danish HT
with outliers – all other language pairs presented a moderate, negative, and highly statistically
significant correlation between revision speed and revision distance, both in PE and HT. This
indicates that there was a moderate tendency for revision speed to decrease as revisers made
more edits.
6. Conclusion
This article has presented the results of the first large-scale analysis of translation and
revision speed in human translation and NMT post-editing, based on real-world data from an
LSP. On average, PE has been 66% faster than HT, at the translation stage. However, the
impact of post-editing on speed has varied dramatically among different language pairs, from
an average speed increase of +130%, to a decrease of -7% – showing that PE may not always
increase speed in all language combinations. Although the revision process is not supposed
to differ significantly in HT or PE, on average PE revision has been 38% faster than HT
revision. Overall, the most typical speed values – excluding breaks and time spent researching
or reading briefing documents – have been in the ranges of 530-1,440 WPH at the translation
stage of PE, and 330-820 WPH in HT; 1,990-5,540 WPH at the revision round of PE, and
1,200-3,870 in HT. These ranges provide a first reference for LSPs to compare with their
speed values.
Besides, there was high variability in the speed obtained for different HT/PE tasks. The
average speed values achieved by different linguists presented high variability too, suggesting
that the skills and ways of working of individual linguists had a high impact on their speed.
WPH variability was considerably higher in post-editing than in human translation, and in
revision compared to the translation stage, and the opposite was the case for H/KW variability.
The high levels of speed variability identified in this research appear to reflect the translation
process, which is not standardised and uniform, but rather is influenced by a wide range of
factors.
Additionally, in the data investigated in this research there was a moderate correlation
between post-editing speed and PED (accounting for edits to the MT output), and only a weak
correlation between speed and PED-TMR (which measures the overall technical effort). These
findings indicate that there was a moderate tendency for post-editing speed to decrease as
linguists made more edits to the MT output, and that the strength of this tendency dropped to
weak when including corrections to TM matches and repetitions in the edit distance values.
Moderate correlations between revision speed and revision distance were identified, both in
HT and PE data. These results do not support the use of edit distance values by some LSPs
to determine linguists’ remuneration (ELIA et al. 2022), as this metric is not a comprehensive
measure of post-editing effort.
The exploratory data analysis approach utilised in this research has provided me with
a unique opportunity to investigate speed in TranslateMedia’s data for 90 million translated
words, yet the lack of an experimental design prevented me from controlling the variables
under investigation and utilising more powerful statistical methods. Thus, it is hoped that the
findings from this research may provide a foundation for future experimental analyses of speed
in post-editing and human translation (Mellinger and Hanson 2016). Future studies may also
seek to analyse the projects of other LSPs to confirm if revision is typically faster in PE
compared to HT and, if that is the case, to investigate the reasons behind this. Besides, since
the findings of this research suggest that linguists’ individual skills and ways of working have
a high impact on their speed, further studies could examine the skills and ways of working of
different linguists, to understand which ones tend to increase or decrease speed. It would also
be fruitful to determine to what extent the same linguist might spend varying amounts of time when processing different texts, and to examine the variables behind this variation – such as source text
features, the quality of the MT output in post-editing, etc.
Funding information
This research was funded by the Arts and Humanities Research Council (AHRC) of UK
Research and Innovation (UKRI), with grant number 2498533.
Acknowledgments
I would like to express my gratitude to Prof Maeve Olohan for her invaluable guidance and
feedback on my research. I would like to extend my thanks to TranslateMedia for their support
and for enabling me to analyse their translation projects. I would also like to thank Dr Rebecca
Tipton and Dr Henry Jones for their constructive advice.
References
Albarino, Seyma. 2019. “Reader Polls: Rev Pay Cut, Shift to Hourly Rates, Edit Distance,
Translation Regulations.” Slator. Accessed April 22, 2022.
https://slator.com/reader-polls-rev-pay-cut-shift-to-hourly-rates-edit-distance-
translation-regulations/.
Krings, Hans P. 2001. Repairing Texts: Empirical Investigations of Machine Translation Post-
Editing Processes. Ohio: Kent State University Press.
Läubli, Samuel, Chantal Amrhein, Patrick Düggelin, Beatriz Gonzalez, Alena Zwahlen, and
Martin Volk. 2019. “Post-Editing Productivity with Neural Machine Translation: An
Empirical Assessment of Speed and Quality in the Banking and Finance Domain.” In
Proceedings of Machine Translation Summit XVII Volume 1: Research Track. Dublin,
Library of Congress. 2017. Codes for the Representation of Names of Languages. Accessed
June 22, 2021. https://www.loc.gov/standards/iso639-2/php/code_list.php.
Macken, Lieve, Daniel Prou, and Arda Tezcan. 2020. “Quantifying the Effect of Machine
Translation in a High-Quality Human Translation Production Process.” Informatics
7 (2): 12. DOI 10.3390/informatics7020012.
Mellinger, Christopher D., and Thomas A. Hanson. 2016. Quantitative Research Methods
in Translation and Interpreting Studies. Milton Park, Abingdon, Oxon: Routledge.
DOI 10.4324/9781315647845.
Moorkens, Joss, Sharon O’Brien, Igor A. L. da Silva, Norma B. de Lima Fonseca, and Fabio
Alves. 2015. “Correlations of Perceived Post-Editing Effort with Measurements of
Actual Effort.” Machine Translation, 29 (3/4): 267-284. DOI 10.1007/s10590-015-
9175-2.
Moorkens, Joss. 2020. “Translation in the Neoliberal Era.” In The Routledge Handbook of
Translation and Globalization, edited by Bielsa, Esperança, and Dionysios
Kapsaskis, 323-336. London: Routledge.
Parra Escartín, Carla, and Manuel Arcedillo. 2015. “Machine Translation Evaluation Made
Fuzzier: A Study on Post-Editing Productivity and Evaluation Metrics in Commercial
Settings.” In Proceedings of the MT Summit XV Volume 1: MT Researchers’ Track.
Miami, USA, October 30 - November 3, 2015, 131-144. Accessed February 10,
2021. https://aclanthology.org/2015.mtsummit-articles.11.pdf.
Pennsylvania State University. 2022. Project 5: Explore the Data: Normality Tests and
Outlier Tests. Accessed July 14, 2023. https://www.e-
education.psu.edu/geog586/node/678.
ProZ.com. 2011. Words Revised per Hour. Accessed February 15, 2022.
https://www.proz.com/forum/proofreading_editing_reviewing/204398-
words_revised_per_hour.html.
ProZ.com. 2016a. How Many Words Can You Proofread per Hour? Accessed February 16,
2022. https://www.proz.com/forum/proofreading_editing_reviewing/302636-
how_many_words_can_you_proofread_per_hour.html.
ProZ.com. 2016b. Revision Rate: 2000 Words per Hour – Really? Accessed February 16,
2022. https://www.proz.com/forum/proofreading_editing_reviewing/307931-
revision_rate_2000_words_per_hour_really.html.
ProZ.com. 2018. What Are the Realistic Expectations for Post-Edited Work? Accessed
February 16, 2022.
https://www.proz.com/forum/post_editing_machine_translation/330430-
what_are_the_realistic_expectations_for_post_edited_work.html.
Rayner, Keith, Timothy J. Slattery, and Nathalie N. Bélanger. 2010. “Eye Movements, the
Perceptual Span, and Reading Speed.” Psychonomic Bulletin & Review, 17 (6):
834-839. DOI 10.3758/PBR.17.6.834.
Sánchez-Gijón, Pilar, Joss Moorkens, and Andy Way. 2019. “Post-Editing Neural Machine
Translation versus Translation Memory Segments.” Machine Translation, 33 (1):
31-59. DOI 10.1007/s10590-019-09232-x.
Temizöz, Özlem. 2017. “Translator Post-Editing and Subject-Matter Expert Revision versus
Subject-Matter Expert Post-Editing and Translator Revision.” Journal of Translator
Education and Translation Studies (2) 4: 3-21.
Toral, Antonio, Martijn Wieling, and Andy Way. 2018. “Post-editing Effort of a Novel with
Statistical and Neural Machine Translation.” Frontiers in Digital Humanities 5: 1-11.
DOI 10.3389/fdigh.2018.00009.
Virino, Virginia. 2022. “How Many Words Does a Professional Translator Translate per
Day?” Pangeanic. Accessed February 15, 2022.
https://blog.pangeanic.com/professional-translator-translation-words-per-day.
Wicklin, Rick. 2023. “Weak or strong? How to interpret a Spearman or Kendall correlation.”
SAS. Accessed October 5, 2023.
https://blogs.sas.com/content/iml/2023/04/05/interpret-spearman-kendall-
corr.html.
Author’s address
Silvia Terribile
University of Manchester
Oxford Road
Manchester, M13 9PL
UK
[email protected]
https://orcid.org/0000-0001-5791-926X