Ricardo Muñoz Martín / Tomás Conde Ruano
PETRA Research Group
Granada
Effects of Serial Translation Evaluation
Evaluating, in translation, is a prototypical concept with many extensions.
Readers tend to view it as a matter of quality, adequacy, or acceptability,
whereas other stakeholders conceive of it as activities, or part thereof, such as
proofreading, correcting, revising, editing, assessing, grading, and so on. Means
and goals also differ quite often. These circumstances, together with the
enormously varied personal criteria and standards of evaluators, support the
generally accepted view that evaluation cannot be studied in depth. Our aim is to
find out whether studying the way subjects actually perform evaluations might
shed light on some regularities to better understand what is at stake. That is, we
are trying to find out whether there is some order in evaluators’ subjectivities
through their observable behavior.
The overarching purpose of this research line is to map intra- and intergroup coincidences and differences in evaluating translations, and to look for correlations
with other parameters, such as age, level of expertise, group orientation, etc. To
do so, we operationally define evaluation as “the set of activities carried out by a
subject which end up with at least an overall qualitative judgment of a text, or a
pair of texts, independently of the way subjects envisioned and performed the
task”. This preliminary study is a piece of descriptive-relational research, for it
seeks to depict what already exists in a group or population, and it investigates
the connection between variables that are already present in that group or population.
1 Goals and Hypotheses
Evaluating several translations of the same original is rare in the market: it
comes up almost only in translator training and translator hiring. However, training and hiring are crucial for the industry. Can the repeated
activity teach us something about evaluating translations? Does the repetition
have an influence on the outcome of the evaluation? Those were the questions
raised in this study, which is a part of a larger effort by Tomás Conde, within
the activities of the group Expertise and Environment in Translation (PETRA).
We wanted to move away from popular approaches to evaluating translations
focusing on mistakes, which assign arbitrary values to poorly defined categories. To do so, some concepts had to be operationalized. A text segment is “any
portion of a text singled out for analysis”. Since evaluators do not only mark
mistakes, we defined phenomenon as “what motivates an evaluator to act on a
particular text segment”. Phenomena were classified into two groups: 1) normalized phenomena, which include typos, punctuation and spelling, formatting
variations, syntax, weights and measurements, and the like, where an authority
(in Spanish, normally, the RAE) sanctions a proper option; and 2) non-normalized
phenomena, such as optional word order, register and different interpretations of
the original, where appropriateness is a matter of degree.
Phenomena were taken to be more or less salient according to the number of
subjects who worked on them. Hence, a text segment where seven evaluators
perform an action is thought of as more salient than a text segment where only
three evaluators do. Here saliency is reduced to phenomena where more than
half of the evaluators coincide. Our current data couple each phenomenon with
its corresponding action.
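The salience count described above can be sketched as a simple tally; the evaluator names, segment identifiers, and the three-evaluator example below are invented for illustration, not the study's data:

```python
# Salience as defined above: a text segment (phenomenon) counts as salient
# when more than half of the evaluators acted on it. All data are invented.
def salient_segments(actions_by_evaluator, n_evaluators):
    """actions_by_evaluator maps an evaluator id to the set of segment ids
    that evaluator acted on; returns the set of salient segment ids."""
    counts = {}
    for segments in actions_by_evaluator.values():
        for seg in segments:
            counts[seg] = counts.get(seg, 0) + 1
    return {seg for seg, n in counts.items() if n > n_evaluators / 2}

actions = {
    "I02": {"s1", "s3"},
    "I03": {"s1", "s2"},
    "I04": {"s1"},
}
print(salient_segments(actions, 3))  # {'s1'}: only s1 drew more than half
```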
An action is “any mark introduced by the evaluator on the text”. In this study,
actions have been limited to those present in the evaluated translation. Therefore, phenomena which are identified but not acted upon go unnoticed. Further, on-line changes which were later modified again, though potentially very informative, have not been taken into account either, due to current data collection
procedures. Hence, in this report each phenomenon is paired with a response from
the subject. On the other hand, focusing on marked text segments is a safe criterion for identifying phenomena.
Professionals and some teachers often cite the amount of work needed to fix a
translation as a criterion for evaluating it, but their actions may entail varying quantities of work depending on their nature (e.g. conceptual vs. grammatical), systematicity (e.g. spelling vs. correct figures), and other aspects (such as purpose
and relevance, and even the skills of the subject!) which may be specific to
each phenomenon. Actions, taken as phenomena indicators and as behavioral
units, are informative. The classification of actions into types emerged from observation of the evaluated translations. Our guiding principles were to avoid
overlaps and to cover as many instances as possible with the fewest categories
in which behavior can be clearly understood to follow a certain pattern. Hence,
actions have been operationalized in their quantity and in their types. There are,
certainly, a number of cases where an action includes elements belonging to
two different types. In such cases, it has been counted as two separate actions
and is reflected in the figures accordingly.
Actions observed so far can be divided into those made in the body of the text
and those made in the margin (which also include before and after the body of
the text). The distinction is relevant because actions in the margin cannot be
thought of as aiming to improve the text, but probably to inform another reader
of the existence of phenomena in the text, or as personal reminders. This distinction might prove worthy of becoming one of the main parameters in a potential
classification of activities within translation evaluation. An effort was made to
keep categories symmetrical on both sides, and up to now both actions in the
margin and in the body of the texts can be classified into additions, suppressions, changes, marks, annotations, and comments.
An addition is “any procedure which results in adding some alphanumeric chain
of symbols to the translation”. Suppression is the opposite procedure. Changes
imply the combination of both. Marks refer to “any introduction of signs which
are not aimed to become part of the translation, and usually consist of symbols,
underlining, abbreviations, question marks, etc.” Annotations are “alphanumeric
chains which do not cross out a translation text segment but might constitute an
alternative to it”, for example alternative solutions scribbled in the margin.
Comments are annotations which deal with the action or the phenomenon at
stake, but do not (only) provide a text segment to substitute some translation
fragment. Further, some evaluators chose to code their marks so as to classify
phenomena in some way. Typically, an underlining system with a three- or four-color code was used.
This classification is neither homogeneous nor totally sharp, but it responds to
the nature of the phenomena pretty well, does not demand a strong heuristic effort, and brings about a considerable reduction of undetermined phenomena
(5.99%). Preliminary observation showed that evaluators did not seem to have
a set of clear or conscious criteria for evaluating translations, and that many of
those who professed to adhere to a more or less elaborated set of parameters
turned out to apply it rather unevenly in actual practice. However, their actual
behavior showed some interesting, statistically significant tendencies. As in the
case of action types, we chose to start with obvious quantitative parameters.
Hence, we defined demand as “the set of conscious and unconscious expectations an evaluator seems to think that a translation should meet” to try to accommodate their various standards. Demand was operationalized from two perspectives: 1) level, that is, whether evaluators seem to expect more or less from
a translation as reflected in their quality judgments; and 2) evenness, or the uniformity or lack of variation in the level of demand. The second perspective may
indicate the existence of clear and/or stable criteria for evaluating, or else an attempt to pursue an even-handedness of some sort. Finally, order effects were
defined as ‘any consistent tendency in serial evaluation behavior across evaluators which cannot be explained as a feature of the translations when considered
separately’.
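A minimal way to operationalize the two demand parameters is to read level off the mean of a subject's numeric quality judgments (1 = very bad … 4 = very good) and evenness off their dispersion. This is a sketch under that assumption; the two judgment series are invented:

```python
import statistics

def demand_profile(judgments):
    """Level of demand as the mean judgment; evenness as the sample
    standard deviation (lower s.d. = more even demand)."""
    return statistics.mean(judgments), statistics.stdev(judgments)

lenient_even = [3, 3, 2, 3, 3, 2]        # high mean, low spread
demanding_uneven = [1, 3, 1, 4, 1, 2]    # low mean, high spread
print(demand_profile(lenient_even))
print(demand_profile(demanding_uneven))
```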
2 Materials and methods
Thirty-five students in their fourth year of the translation degree at the University of
Granada were invited to “assess / correct / proofread / edit / revise four sets of
12 translations each, corresponding to four originals, according to their beliefs
and intuition, and to the best of their knowledge.” This report presents the data
of ten of these subjects, the first set to have been completely analyzed. As for
the texts, originals A and C dealt with politics, and B and D, with technical procedures for painting machinery. They had been selected from The Economist
and from an online technical list, and all four were brief but complete texts
which seemed realistic commissions. Translations had been carried out by students from an earlier course, and they were chosen amongst those which had not
been assigned a very good grade by the teacher, so as to prevent an excellent
rendering from serving as a model for the subjects when evaluating other translations from the same set. The sets were alternately sequenced, as described
above, to prompt subjects to think of them as four separate tasks. Translations
within each set were randomly ordered and coded for blind intervention by subjects and the analysts.
Originals and translations were provided as digital files, printouts were provided
upon request, and subjects were also allowed to print them out. The only constraints imposed on all subjects alike were the following: they had to (1) process
the translations in the order they were given; (2) work on a whole set of translations from the same original in a single session; and (3) classify translations into
one of four categories: very good, good, bad, and very bad.
We searched for three types of order effects: (1) between sets; (2) within each
set; and (3), within each text. Since translations were evaluated in the same order, task effects were analyzed simply by checking changes in the progression
through the sets. For set effects, translations were grouped into three subsets, so
that subset I includes translations 1-4 from each set, subset II includes translations 5-8, and subset III includes translations 9-12. For effects within translations, originals were divided into three sections (initial, middle, and final) of
roughly identical length, and translations were divided accordingly. Data were
entered in a Microsoft Access database and later analyzed with SPSS 12.0.
Bivariate correlations, frequency and descriptive statistics were the most useful
analyses.
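The grouping into subsets and the bivariate correlations can be sketched as follows; a hand-rolled Pearson's r stands in for the SPSS procedure, and the four-translation sample is invented:

```python
def subset_of(position):
    """Position 1-12 within a set -> subset I (1), II (2) or III (3)."""
    return (position - 1) // 4 + 1

def pearson_r(xs, ys):
    """Plain bivariate (Pearson) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print([subset_of(p) for p in (1, 4, 5, 8, 9, 12)])  # [1, 1, 2, 2, 3, 3]
actions = [44, 37, 57, 30]       # actions per translation (invented)
quality = [2.1, 2.4, 1.6, 2.9]   # average quality judgments (invented)
print(round(pearson_r(actions, quality), 3))  # strongly negative
```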
3 Results and discussion: subjects’ profiles
We will first need to describe subjects and their behaviors. The first parameter
was final quality judgments, which were assigned numerical values, to allow for
computing averages: very bad, 1; bad, 2; good, 3; very good, 4.
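Applied to one invented column of twelve verdicts, the coding works as follows (the mapping is the paper's; the verdicts are not):

```python
# Numeric coding of the four quality categories, as defined above.
CODES = {"very bad": 1, "bad": 2, "good": 3, "very good": 4}

verdicts = ["bad", "good", "bad", "very bad", "bad", "bad",
            "good", "bad", "bad", "very good", "bad", "good"]
values = [CODES[v] for v in verdicts]
set_average = sum(values) / len(values)
print(round(set_average, 3))  # 2.333
```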
Transl.     A       B       C       D
01          2.1     2.4     2.2     3.3
02          2.2     2.6     2.2     2.2
03          1.2     2.7     1.6     1.8
04          1.5     1.7     1.2     1.6
05          1.6     2.5     2.0     2.6
06          1.1     1.9     2.2     2.4
07          2.1     2.8     1.3     2.3
08          1.6     2.4     3.0     2.0
09          1.8     2.6     2.5     1.6
10          2.9     2.6     2.3     2.0
11          2.1     2.2     2.3     1.9
12          2.4     2.2     1.5     2.0
Set aver.   1.883   2.383   2.025   2.141

Table 1: Average quality judgments
Table 1 displays average quality judgments for the translations. In set A, translations A02 and A10 received the best grades, whereas A03 and A06 got the lowest. The median value of all translations was 2.1. Technical translations got
higher grades than “general” ones. Translations’ lengths did not correlate with
quality judgments, although I05 and I03 tended to think that long translations
are good (correlations of 0.295 and 0.291, respectively, significant at 0.05).
Graphic 1: Frequency of quality judgment averages in the task.
Graphic 1 shows the frequency of average quality judgments in the task, which
is close to a fairly typical distribution (a Gaussian bell curve), except that the
curve is displaced to the left, probably because the translations were chosen
among the worst ones. Only nine translations were deemed Good or Very good
(right columns). When the continuum (1–4) is divided into three equal intervals,
only two translations reach the highest third (darkest background).
3.1 Demand
3.1.1 Level
Table 2 tries to capture some specifics of subjects’ behavior. Correlations between quality judgments by evaluators were statistically significant between I02
and I06 (0.426), I04 and I10 (0.627), I05 and I07 (0.375), and I07 with I08
(0.384) and with I09 (0.596).
Evaluator I03 has the best opinion of the translations (general average, 2.74).
Other generous or lenient evaluators are I04 (2.62 average), I11 (2.45 average)
and I07 (2.27 average). On the other hand, I06 is the most demanding evaluator
(1.52 average), followed by I05 (1.69 average). Graphic 2 shows that subjects
can be classified into three groups: I05 and I06 are the most demanding evaluators; I03, I04, I10, and I11 are the lenient ones; and I02, I07, I08 and I09 are in between. The intermediate group is remarkably homogeneous.
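Using the overall averages from Table 2, the three groups can be recovered with two simple thresholds; the cut-off points (2.0 and 2.4) are our reading of Graphic 2, not values stated in the text:

```python
# Overall quality-judgment averages per subject (from Table 2).
AVERAGES = {"I02": 2.28, "I03": 2.74, "I04": 2.62, "I05": 1.69, "I06": 1.52,
            "I07": 2.27, "I08": 2.25, "I09": 2.23, "I10": 2.50, "I11": 2.45}

def demand_group(avg, low=2.0, high=2.4):
    """Illustrative cut-offs: below `low` demanding, above `high` lenient."""
    if avg < low:
        return "demanding"
    return "lenient" if avg > high else "medium"

groups = {s: demand_group(a) for s, a in AVERAGES.items()}
print(sorted(s for s, g in groups.items() if g == "lenient"))
# ['I03', 'I04', 'I10', 'I11']
```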
subject   A aver. (s.d.)   B aver. (s.d.)   C aver. (s.d.)   D aver. (s.d.)   Total aver. (s.d.)
I02       2.08 (0.996)     1.82 (0.603)     2.25 (1.055)     2.92 (0.669)     2.28 (0.926)
I03       2.83 (0.835)     2.64 (0.505)     2.92 (0.669)     2.58 (0.669)     2.74 (0.675)
I04       2.42 (0.793)     2.73 (0.786)     2.83 (0.835)     2.50 (0.905)     2.62 (0.822)
I05       1.08 (0.289)     2.25 (0.622)     1.67 (0.651)     1.75 (1.138)     1.69 (0.829)
I06       1.83 (0.835)     1.50 (0.674)     1.17 (0.389)     1.58 (0.996)     1.52 (0.772)
I07       2.25 (0.965)     2.92 (0.793)     2.17 (0.937)     1.75 (0.452)     2.27 (0.893)
I08       2.08 (0.900)     2.42 (0.900)     2.33 (1.155)     2.17 (1.030)     2.25 (0.978)
I09       2.00 (0.853)     2.92 (0.793)     2.42 (0.793)     1.58 (0.515)     2.23 (0.881)
I10       2.67 (0.492)     –                2.33 (0.778)     –                2.50 (0.659)
I11       2.25 (0.965)     2.58 (1.084)     2.50 (0.905)     2.45 (0.934)     2.45 (0.951)

Table 2: Quality judgment, per subject and set.
3.1.2 Evenness
Graphic 3 displays average quality judgments per set for each evaluator. I05
seems especially tough in set A (1.08 set average) when compared to their general
average, and I02 is generous in set D (2.92). On the other hand, I08 is regular
throughout the sets (2.25 subject average), followed by I11, I04 and I03, who
have better general opinions of the translations. Lenient evaluators (and medium evaluator I08) seem more even than the rest across all texts.
Graphic 2: Quality judgment, per subject
Graphic 3: Set averages of subjects’ quality judgments.
3.2 Actions
3.2.1 Quantity
The number of actions correlates significantly with quality judgments (-0.535)
when considered text by text, but not when analyzed by subject. The total
number of actions is 11909 (table 3). Within sets, C and D show the largest
variations: the number of actions may vary up to fourfold between translations.
text        A aver. (s.d.)    B aver. (s.d.)   C aver. (s.d.)    D aver. (s.d.)
01          44.90 (21.702)    24.00 (9.684)    27.00 (15.727)    8.80 (5.181)
02          37.10 (15.366)    18.00 (5.249)    18.20 (6.374)     15.80 (8.664)
03          56.80 (20.471)    16.30 (4.347)    22.60 (8.579)     22.70 (7.675)
04          56.70 (25.975)    19.90 (6.226)    29.60 (11.138)    26.00 (5.598)
05          42.40 (20.250)    16.90 (4.701)    19.80 (8.053)     10.40 (5.296)
06          62.70 (24.784)    22.10 (6.557)    14.50 (10.157)    12.50 (8.100)
07          32.90 (15.871)    14.90 (7.534)    24.10 (10.682)    12.00 (6.716)
08          47.60 (20.007)    17.60 (8.072)    23.40 (19.945)    16.80 (8.257)
09          63.30 (18.331)    18.30 (8.433)    13.10 (7.370)     11.70 (4.877)
10          30.10 (17.272)    14.60 (8.605)    13.50 (7.706)     17.50 (7.634)
11          37.20 (17.561)    15.50 (7.200)    14.50 (8.567)     17.70 (6.701)
12          35.30 (15.151)    17.30 (8.629)    22.90 (9.597)     13.40 (5.621)
Set aver.   45.58 (19.4)      17.95 (7.103)    20.27 (10.32)     15.44 (6.693)

Table 3: Quantity of actions, per translation
Subjects differ widely in the number of actions carried out (table 4). I02 performed a total of 627 actions, while I10 reached 1877, around three times as many.
set / subject   I02     I03     I04     I05     I06     I07     I08     I09     I10     I11     aver.
A               23.08   59      42.5    62.5    37.58   28.42   26.33   47.83   72.67   55.92   45.58
B               8.91    24.92   15.5    30.83   15.83   14      14.67   13.33   30.17   34.5    20.27
C               13.75   19.25   20.67   14.67   19.92   11.17   15.58   12.08   25.17   27.25   17.95
D               6.5     16.33   13.58   16.33   16.5    11.33   16.25   11.83   28.42   17.33   15.44
Total aver.     13      30      23      31      22      16      18      21      39      34
Nr. actions     627     1434    1107    1492    1078    779     874     1021    1877    1620

Table 4: Quantity of actions, per subject
Graphic 4 displays the average quantity of actions that each subject performed
for every set, ordered from left to right by decreasing total quantity of actions.
Four of the five subjects who performed the most actions were lenient evaluators,
and medium evaluators (light background) performed fewer actions than demanding ones (dark background). Most medium evaluators also show the smallest differences in the number of actions undertaken between the first and the following sets.
Graphic 4: Set averages for subjects’ quantity of actions
3.2.2 Types
Marking is the only type of action which correlates significantly at 0.01 with
quality judgments (so does changing in the margin, but there were very few
cases). Actions co-occur in certain patterns. Adding in text strongly correlates
with changing in text (0.896), suppressing (0.864) and marking (0.721). Other
correlations show emergent profiles of coherent behavior: evaluators seem either to fix the translations for later use (text-oriented), or to provide explanations
of their actions to the translator or the researcher (feedback-oriented). Graphic 5
shows the distribution of the five most common actions in the subjects. Subjects
I02 and I08 only classify phenomena, whereas I03, I04 and I09 focus on changing, adding, and suppressing in the body of texts.
type of action                total
in text:   Addition           1156
           Suppression         855
           Change             5720
           Note                273
margin:    Classification     2180
           Mark               1517
           Addition            158
           Change               23
Doubtful                        27
Per-subject totals: I02 627, I03 1434, I04 1107, I05 1492, I06 1078,
                    I07 779, I08 874, I09 1021, I10 1877, I11 1620
Grand total                   11909

Table 5: Types of actions, per subject
Evaluators I03, I04, I07, I09 and I11 tended to act on the texts by adding, suppressing, and changing text segments. At the opposite pole, I02 and I08 were
oriented to offering feedback to another reader. The rest did not seem to have a
clear pattern of behavior.
When contrasted with their level of demand (Graphic 5), demanding evaluators
preferred to just mark phenomena, medium evaluators tended to classify more,
and lenient evaluators, to change and suppress.
Graphic 5: Types of actions, per subject
Comments did not yield any clear pattern. However, I05 —one of the most demanding evaluators— was the subject who made the most comments (37.5% of
all), followed by I10 (14.4%). On the other hand, the subjects who wrote the
fewest comments were I04 (1.4%) and I03 (2.8%), the two most lenient subjects.
3.3 Summary of subjects’ profiles
Evaluators showed consistent tendencies (1) to adopt a given level of demand,
and (2) to confront different texts with a certain degree of evenness. Their actions may be (3) more or less abundant, (4) text-oriented or feedback-oriented,
and (5) supported with a few or many comments.
Table 6 displays a summary of variables. Column I shows the level of demand,
from the most lenient (1) to the most demanding (3). Column II displays the
level of evenness in two stages, uneven (1) and even (2). Column III reflects the
quantity of actions, from the fewest (1) to the most abundant (3). Column IV
ranks subjects from the most feedback-oriented (1) to the most text-oriented (4).
Finally, column V ranks subjects according to the number of comments introduced, from the fewest (1) to the most abundant (4).
            demand            actions
subject     Level    Even     Quant    Type    Comm
I02         2        1        1        1       3
I03         1        2        3        4       1
I04         1        2        2        4       1
I05         3        1        3        3       4
I06         3        1        2        2       2
I07         2        1        1        3       3
I08         2        2        1        1       1
I09         2        1        2        4       1
I10         1        2        3        1       3
I11         1        2        3        3       2

Table 6: Summary of subjects’ characteristics
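Table 6 can also be rendered as a small data structure, which makes claims such as “lenient evaluators perform many actions” easy to cross-check; the field names are ours, while the ordinal values are the table's:

```python
from collections import namedtuple

# level: 1 lenient - 3 demanding; even: 1 uneven, 2 even; quant: 1 fewest -
# 3 most actions; type: 1 feedback- to 4 text-oriented; comm: 1 fewest -
# 4 most comments (values copied from Table 6).
Profile = namedtuple("Profile", "level even quant type comm")

PROFILES = {
    "I02": Profile(2, 1, 1, 1, 3), "I03": Profile(1, 2, 3, 4, 1),
    "I04": Profile(1, 2, 2, 4, 1), "I05": Profile(3, 1, 3, 3, 4),
    "I06": Profile(3, 1, 2, 2, 2), "I07": Profile(2, 1, 1, 3, 3),
    "I08": Profile(2, 2, 1, 1, 1), "I09": Profile(2, 1, 2, 4, 1),
    "I10": Profile(1, 2, 3, 1, 3), "I11": Profile(1, 2, 3, 3, 2),
}

# Action-quantity rank of the lenient evaluators (level 1):
print({s: p.quant for s, p in PROFILES.items() if p.level == 1})
# {'I03': 3, 'I04': 2, 'I10': 3, 'I11': 3}
```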
In brief, demanding evaluators tend to be feedback-oriented and uneven in their
level of demand. Medium evaluators tend to perform few actions and to be
pretty uneven. Lenient evaluators seem more homogeneous: they are text-oriented, perform many actions, and tend to be pretty even in their judgments.
Of course, ten evaluators are too few for the data to hold any consistent
truth, but they are interesting insofar as they point to potentially consistent tendencies
and relationships between variables.
4 Results and discussion: Order effects
4.1 Order effects in the whole task
Graphic 4 above showed that the number of actions decreases dramatically from
set A to the rest in all subjects. This is the first and most obvious order effect,
and it might be due to the students’ lack of experience as evaluators: they
would start by performing many actions and progressively realize that doing so
meant too much work or was unnecessary. Graphic 6 shows that classifications were
the only type of action that increased from set A through D. The reason was that
one of the subjects stopped changing and started classifying during the task.
This supports the notion that the decrease in actions might be due to subjects’ adjusting their effort to the task.
Graphic 6: Type of actions in different sets
Graphic 7 shows that normalized phenomena account for only ca. 5% of all actions and that salient phenomena >5 —those singled out by at least 6 out of 10
evaluators— stay at around 20% in sets A, C, and D. Text B was the first technical translation, and students were not familiar with the subject matter. This might
explain the drop in coincidences. The relative increase of normalized phenomena within salient phenomena probably indicates that evaluators felt uncomfortable with the text, since they refrained a little from venturing into potentially
questionable actions.
Graphic 7: Percentage of salient phenomena >5 in each set
4.2 Order effects within sets
Table 7 shows the amount of actions in the three subsets within each set. There
is a general tendency to reduce the quantity of actions per subset, which may be
due to an improvement in efficiency, such as the one we might expect as the
product of the use of a macrostrategy. Again, this supports the notion of the
evaluators learning how to carry out the task as they were doing it. The exception is set D, where subset III has more actions than subset II, but it also has a
lower quality judgment average.
Subsets/Sets   A      B      C      D      Subset total
I              1955   974    782    733    4444
II             1856   818    715    517    3906
III            1659   640    657    603    3559

Table 7: Amount of actions per subset
While most types of action tend to appear less in subsets II
and III across sets, suppressions increase in sets B and D; additions and
changes, in set D; and classifications, in sets C and D. The increase of suppressions throughout three sets may be taken to indicate that evaluators developed a
clearer notion of the relevance of the information. The evaluators turned to
classifications throughout sets C and D, perhaps as a way to save effort.
Graphic 8: Percentage of salient >5 phenomena, in each subset
Graphic 8 shows that salient phenomena, i.e., coincidences between subjects’
actions, are higher in subset II, probably an indicator of subjects’ similar contextualization. On the other hand, the drop in normalized phenomena in subset II
might be explained as a low point in their level of self-confidence.
4.3 Order effects within the texts
Table 8 shows the relationship between quality judgments and the number of actions in different translation sections. Evaluators seem to identify phenomena
and perform actions in all sections of the translations, but the further down in
the text, the stronger the effect on their judgment of the quality of the translation
as a whole. Interestingly, this does not correspond to the percentage of salient
phenomena, which drops in central sections, mainly due to the reduction of non-normalized phenomena. Thus, after the first section, these subjects seem to have
become more assertive but also less personal, while they feel more comfortable with the task
in the third section.
The tendency toward increasing significance is evident at sentence level, since actions in the first sentences of the translations do not correlate with quality judgments. The relationship between quality judgments and actions in translation
text segments which received special typographic treatment or else stood out
due to their position in the text, such as titles, headings, captions and the like,
showed a lower significance than regular segments. Hence, visual prominence
was ruled out as an explanation for the first- and last-sentence results.
Quality judgments are independent of the quantity of actions introduced, when
considered by subject. Lenient evaluators do perform more actions than demanding evaluators, and medium evaluators perform the fewest, as shown in
graphic 10.
                          Pearson     Sig. (bil.)
Segment    outstanding    -0.324*     0.025
           regular        -0.525**    0.000
Sentence   first          -0.057      0.701
           last           -0.514**    0.000
           rest           -0.522**    0.000
Section    initial        -0.411**    0.004
           central        -0.548**    0.000
           final          -0.597**    0.000

* Correlation significant at 0.05; ** Correlation significant at 0.01

Table 8: Relationship between quality judgments and actions
Graphic 9: Percentage of salient phenomena (>5) in initial, central,
and final sections of translations
Graphic 10: Quantity of actions in initial, central, and final sections
by lenient, medium, and demanding evaluators
Graphic 11: Quantity of actions in initial, central, and final sections
of translations, per average quality judgment
Another interesting effect can be traced (graphic 11) when the number of
actions in translations’ sections —their initial, middle, and final parts— is correlated with average quality judgments. Bad and Good translations show a similar
pattern of subjects’ behavior, where initial sections contain a number of actions
which slightly decreases in central sections, only to rise again minimally in final sections. Very Bad translations, however, show a steady increase in the number of
actions across sections, and Very Good translations present a constant decrease
in the number of actions as the text progresses. This might point to an emotional
involvement of the evaluators.
5 Conclusion
Technical translations had higher quality judgments than general translations.
As expected, a higher number of actions usually corresponds to lower quality
judgments, but some order effects modify these results (see below). We found
statistically significant correlations between some subjects’ final quality judgments, usually in pairs (in a group of ten). A couple of subjects seemed to consistently value information completeness above text length.
Our framework also worked pretty well to distinguish between product-oriented and
feedback-oriented evaluators, since marking, adding, suppressing, and
changing in the body of texts strongly correlate. Intersubject comparison allowed us to distinguish lenient, medium, and demanding evaluators, who
showed consistent behavioral trends: lenient evaluators seemed more even in
their demands throughout the task and performed more actions on the translations, especially changes and suppressions, since they seemed to be product-oriented. Medium evaluators performed the fewest actions, were fairly uneven
in their demands, and tended to classify. Demanding evaluators were uneven in
their demand and tended to just perform minimal feedback-oriented actions.
As for order effects, salient phenomena increased from one translation set to the
next. A nearly constant decrease in the number of actions, with a steep difference between the first and second sets, seems to point to a learning curve whose
goal is minimizing effort while performing well in the task. However, these
drops do not correlate with final quality judgments. In fact, first-sentence actions
did not usually affect final quality judgments. Classifications rose towards the
end of the task, and so did suppressions, additions, and changes. Since these
seem closer to product-oriented evaluation, this mode might be thought of
as demanding less cognitive effort, perhaps due to the absence of additional
metalinguistic, probably conscious, demands.
The farther down the text, the stronger the correlation between the number of
actions and final quality judgments, an increase paralleled by a constant increase
in salient phenomena from one translation set to the next. This might indicate a
stronger correlation of quality judgments with salient phenomena than with
more individual phenomena. This is, perhaps, a starting point for a motivated set
of translation evaluation criteria.
At single-text level, subjects tended to use the first third of the text for contextualization purposes, as an addressee would do, but contrary to what might be expected of a professional, who knows that first impressions may have more influence on the reader’s opinion. This might be a subject of study from a pedagogical
perspective, to discern whether specific training improves subjects’ performance
at the beginning of texts.
In the second third, subjects refrain a little from acting upon non-normalized
phenomena. This could be a symptom that subjects may have just developed a
macrostrategy or macrostructure, with perhaps a number of rules or criteria, and
stick to it. Non-normalized phenomena did rise in the third section, where subjects probably felt more confident about their performance and could free mental resources thanks to sticking to their plans. If so, then reducing effort does
not seem incompatible with developing ad hoc mental structures to handle
evaluations. On the contrary, the reduction of non-normalized actions might be a
good indicator of the existence of mental constructions of some sort governing
or interacting with the evaluation process.
All these are just speculations on the results of ten subjects, but their quantity
and nature seem to support the notion that the framework is useful for studying
translation evaluation. And it does so in a way that makes it compatible
with second-generation cognitive paradigms such as situated cognition. In the
near future, we will cross-analyze these variables in larger numbers of subjects
and also across different population groups; apart from translation students,
we will study professional translators, translation teachers, and addressees.
Comments are more than welcome. Full data and colored graphics are available
upon request.