Landmeen 2023 Guidelines

[mPCP] Guidelines for multi-language appropriate cross-lingual prosodic consistency
and emotion evaluation - V8

Overview
In this task, you will listen to pairs of audio segments. Each pair will consist of
one German segment and one English segment.

Our goal is to find out how similar the two segments (utterances) are perceived to
be in terms of:
Semantics/Meaning
Emotion
Rhythm
Overall expressive intent

Different languages have distinct speech patterns related to the aspects mentioned
above. When comparing expressivity in different languages, we want to determine if
the expressive qualities in German convey similar information as in English.

By “overall expressive intent,” we mean the overall impact and manner in which the
speaker spoke the sentence. To rate similarity in expressive intent between two
audio files, consider aspects like emphasis, tone, rhythm, and the speaker's
emotional state combined.

All of the dimensions are explained in more detail below:


1. Semantics

The semantics of an utterance refers to the literal meaning of the words,
disregarding the manner in which they are spoken.

Example:
The sentence “There is a green apple” in English has a different meaning from “Hay
una manzana roja” (“There is a red apple”) in Spanish.

Question: Do the two segments have similar meaning?

Score (single choice):


1. The two segments are completely different in their meaning - they refer to
different objects, actions or concepts and the relationships between them.

2. The two segments are mostly different in their meaning but share some
similarities - there are some important differences in the meaning of the two
segments, although one or more objects, actions or concepts may appear in both
sentences.

3. The two segments are mostly similar in their meaning but have some differences -
they could be paraphrases of one another.

4. The two segments are completely similar in their meaning - they are exact
translations of one another.

2. Emotion
Emotion describes the overall feeling of the speaker while they are talking.

Example:
A speaker may sound angry, pleased, happy or confused (to name just a few emotions)
while speaking. Consider whether you could imagine the two speakers making similar
facial expressions while speaking or whether you could apply the same description
of their emotions.

Question: Do the two segments sound similar in the speaker’s emotional state?
Score (single choice):
1. The two segments sound completely different in the emotions conveyed - basically
none of the emotion aspects are shared.
For example, while one utterance might sound very happy throughout, the other
utterance might sound neutral throughout.

2. The two segments are mostly different but share some similarities in terms of
the emotion.
For example, while one utterance might sound happy throughout, the other utterance
might sound neutral throughout and happy just at the end.

3. The two segments are mostly similar in the emotion but have some differences.
For example, both utterances might share the same emotion or mix of emotions, but
the emotions are more pronounced in one compared to the other (one segment sounds
very happy while the other is subtly pleased).

4. The two segments sound completely similar in the emotions conveyed - basically
all of the emotion aspects are shared.
For example, both utterances sound very happy to the same extent, and this is
expressed similarly throughout.

3. Rhythm
The rhythm of an utterance describes its speed, pacing (i.e. changes in speed), and
pauses. A speaker pausing or elongating/shortening words can impact rhythm.
Example:
“You -- lied to me?”, with a pause after “you,” is distinct from “You lied to --
me?”, with a pause after “to.” Speaking quickly or slowly throughout the sentence,
or speeding up or slowing down at certain parts of the sentence, also affects
rhythm.

Question: Do the two segments sound similar in terms of rhythm?


Score (single choice):

1. The two segments sound completely different in their rhythm - basically none of
the rhythmic aspects are shared.
For example, one utterance may be spoken slowly at first, have a pause in the
middle, then faster at the end, while the other utterance is spoken in a normal
cadence throughout.

2. The two segments are mostly different but share some similarities in their
rhythm.
For example, one utterance may be spoken slowly at first, have a pause in the
middle, then faster at the end, while the other utterance may be spoken normally at
first and faster at the end without the pause in the middle.

3. The two segments are mostly similar in their rhythm but have some differences.
For example, one utterance may be spoken slowly at first, have a pause in the
middle, and then faster at the end, while the other utterance is spoken virtually
the same, except without the pause in the middle.

4. The two segments sound completely similar in their rhythm - basically all of the
rhythmic aspects are shared.
For example, one utterance may be spoken slowly at first, have a pause in the
middle, and then faster at the end, and the other utterance has the same pattern.

4. Overall expressive intent
The overall expressive intent of an utterance is the combined feeling of the
rhythm, emotion and any additional factors (such as emphasis and intonation) which
give rise to the utterance’s overall impact and implications. When comparing
expressive intent across languages, the idea is to assess whether the expressive
qualities of the German utterance convey information equivalent (or as similar as
possible) to that conveyed by the expressive qualities of the English utterance.

Select examples of how expressive characteristics can impact intent:

Sarcasm often includes exaggerated emphasis on specific words to express the
opposite of what is said. Each language has its own way of showing sarcasm through
tone and cues, which can differ a lot even though the underlying sarcastic intent
remains the same.

The English question "Does Amy speak French or German?" is understood as a
yes-or-no question when delivered with a single rising intonation contour. It is
understood as an alternative question when intoned with a rising contour on
"French" and a falling contour on "German." Different languages have their own
intonation patterns and cues for yes-or-no or alternative questions, and these can
vary widely even though the underlying intent remains the same.

When emphasis is placed on different words in English, the implicit meaning or
implications of the sentence change:
Emphasis on “I”: I didn't take the train on Monday. (Somebody else did.)
Emphasis on “didn't”: I didn't take the train on Monday. (I did not take it.)
Emphasis on “take”: I didn't take the train on Monday. (I did something else with
it.)
Emphasis on “the”: I didn't take the train on Monday. (I took one of several, or I
didn't take the specific train that would have been implied.)
Emphasis on “train”: I didn't take the train on Monday. (I took something else.)
Emphasis on “Monday”: I didn't take the train on Monday. (I took it some other
day.)
Different languages have their own patterns for conveying implications equivalent
to the ones above.

Question: Considering the overall expressive intent of the two utterances, how
similar are they?
Score (single choice):

1. The two segments are completely different in their overall expressive intent -
the information conveyed through the expressive features and speaker emotional
state is different.

2. The two segments are mostly different across expressive aspects but share some
similarities.

3. The two segments are mostly similar across expressive aspects but have some
differences.

4. The two segments are completely similar in their overall expressive intent.

Tip on handling languages with different prosodic characteristics:

If a hypothetical “Language A” expresses confusion by slowing down (elongating the
words and adding longer pauses) and:
Scenario 1: “Language B” also expresses confusion by slowing down, then you can
compare how similar the displayed emotion is in both languages in terms of
slowing down.
Scenario 2: “Language B” expresses confusion by changing the rhythm in some other
way, such as speeding up (rather than slowing down).
Scenario 3: “Language B” doesn’t express confusion by altering its rhythm at all,
but rather by using a different feature altogether.
In Scenarios 2 and 3, you would rate the similarity in terms of how you perceive the
speaker’s intended use of the expressive / prosodic feature.

Task description
1. Listen to audio 1 from start to finish. Then listen to audio 2 from start to
finish.
2. Provide your similarity scores on all dimensions as explained above.

Consider the following:

If either of the segments is very garbled or unclear, please check the box “audio
issues” and skip the item.

If the segments have the same or similar content, but one has additional content
relative to the other, please only consider the content shared between the two
segments. If the difference in the amount of content is greater than a few words,
please move to the question on Semantics and select a score of 1 (“Completely
Different”). Where this occurs, you will not be answering questions related to the
other expressivity dimensions.

If one or both of the segments has leading or trailing silence, please ignore this
and try to focus on spoken content only.

Two segments can be similar in the presence or absence of the aspects of interest.
That is, if two sentences are both equally neutral in any of the categories, we can
also consider them to be “similar.” For example, we would consider two segments as
being similar in emotion if they were both spoken in a “neutral” tone.

Please try to rate the similarity independently of the speakers’ voices. For
example, in some cases the source audio may be in a conventionally female voice
while the target audio may be in a conventionally male voice. Try your best to
focus on how the sentence is uttered in terms of the expressive intent, as outlined
above, irrespective of the voice differences.
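
The handling rules above can be sketched as a small decision procedure. This is
only an illustrative sketch, not part of the annotation tool: the names `Rating`,
`rate_pair`, and `MAX_EXTRA_WORDS` are hypothetical, and the threshold of 3 is an
assumption, since the guidelines say only “a few words.”

```python
from dataclasses import dataclass
from typing import Dict, Optional

# The four dimensions rated in this task, each on a 1-4 scale.
DIMENSIONS = ("semantics", "emotion", "rhythm", "expressive_intent")

# Assumption: the guidelines say only "a few words"; 3 is a placeholder.
MAX_EXTRA_WORDS = 3


@dataclass
class Rating:
    """One annotator's response for a single audio pair (hypothetical)."""
    audio_issues: bool = False
    semantics: Optional[int] = None
    emotion: Optional[int] = None
    rhythm: Optional[int] = None
    expressive_intent: Optional[int] = None


def rate_pair(audio_unclear: bool, extra_words: int,
              scores: Dict[str, int]) -> Rating:
    """Apply the skip rules first, then record the annotator's scores."""
    # Rule 1: garbled or unclear audio -> tick "audio issues" and skip.
    if audio_unclear:
        return Rating(audio_issues=True)
    # Rule 2: one segment has much more content than the other ->
    # Semantics is scored 1 and the other questions are not answered.
    if extra_words > MAX_EXTRA_WORDS:
        return Rating(semantics=1)
    # Otherwise every dimension needs a score between 1 and 4.
    for name in DIMENSIONS:
        if scores.get(name) not in (1, 2, 3, 4):
            raise ValueError(f"{name} needs a score from 1 to 4")
    return Rating(**{name: scores[name] for name in DIMENSIONS})
```

Leading or trailing silence and differences in the speakers' voices do not appear
in the sketch because they should not influence the scores at all.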

Examples

Example ratings for Question 1 (Semantics)

Score: 1 (completely different)
Utterance 1: The rocket had a coupling issue preventing atmospheric clearance.
Utterance 2: J'ai vu une pomme de terre en allant travailler aujourd'hui. / I saw a
potato on the way to work today.

Score: 2 (mostly different)
Utterance 1: The rocket had a coupling issue preventing atmospheric clearance.
Utterance 2: Le tuyau présentait une fissure provoquant un blocage. / The pipe had a
crack in it causing some blockage.

Score: 3 (mostly similar)
Utterance 1:
1. The rocket had a coupling issue preventing atmospheric clearance.
2. I really have to call my mom.
Utterance 2:
1. La fusée avait un problème avec les pièces de jonction et ne pouvait donc pas
sortir de l'atmosphère. / The rocket had an issue with the connecting parts and thus
could not breach the atmosphere.
2. I need to call my mom.

Score: 4 (completely similar)
Utterance 1: The rocket had a coupling issue preventing atmospheric clearance.
Utterance 2: La fusée avait un problème de couplage empêchant la sortie
atmosphérique. / The rocket had a coupling issue preventing atmospheric clearance.

Example ratings for Question 2 (Emotion)

Examples:
*Note: this is a simplification (basic description of emotional state) to allow for
a clearer understanding of the underlying task.

Score: 1 (completely different)
Utterance 1: A person speaking with fear (talking with haste whilst stuttering
profusely): “He’s going to set the building ablaze!”
Utterance 2: A person giving instructions calmly (talking slowly): “Il braccio
destro va nella manica destra e il braccio sinistro va nella manica sinistra.” /
“The right arm goes in the right sleeve and the left arm goes in the left sleeve.”

Score: 2 (mostly different)
Utterance 1: A person speaking with fear (talking with haste whilst stuttering
profusely): “He’s going to set the building ablaze!”
Utterance 2: A person speaking with trepidation (talking with a bit of a slowdown
and stuttering on the last word): “Stai dicendo che la barricata non è riuscita a
fermare il mostro?” / “You’re saying that the barricade failed to stop the monster?”

Score: 3 (mostly similar)
Utterance 1: A person speaking with fear (talking with haste whilst stuttering
profusely): “He’s going to set the building ablaze!”
Utterance 2: A person speaking with worry (talking with progressive haste whilst
stuttering a bit at the end): “Pensi che potrebbero esserci dei topi qui?” / “You
think that there might be rats here?”

Score: 4 (completely similar)
Utterance 1: A person speaking with fear (talking with haste whilst stuttering
profusely): “He’s going to set the building ablaze!”
Utterance 2: A person speaking with fear (talking with haste whilst stuttering
profusely): “C'è un ragno proprio lì!” / “There’s a spider right there!”

Example ratings for Question 3 (Rhythm)

Examples:
*Note: changes in speed will be demonstrated by increasing the spacing between
items.
**Note: this is a simplification (reduction in dimensions to just changes in speed)
to allow for a clearer understanding of the underlying task.

Score: 1 (completely different)
Utterance 1: I didn’t know how to get there. (Slow-normal-slow)
Utterance 2: Yo no sabía cómo llegar ahí. / I didn’t know how to get there. (Fast)

Score: 2 (mostly different)
Utterance 1: I didn’t know how to get there. (Slow-normal-slow)
Utterance 2: Yo no sabía cómo llegar ahí. / I didn’t know how to get there.
(Normal pace)

Score: 3 (mostly similar)
Utterance 1: I didn’t know how to get there. (Slow-normal-slow)
Utterance 2: Yo no sabía cómo llegar ahí. / I didn’t know how to get there.
(Slow-normal)

Score: 4 (completely similar)
Utterance 1: I didn’t know how to get there. (Slow-normal-slow)
Utterance 2: Yo no sabía cómo llegar ahí. / I didn’t know how to get there.
(Slow-normal-slow)

Example ratings for Question 4 (Overall Expressive Intent)

Examples:
*Note: this is a simplification (reduction in dimensions to simple emotional
description and semantics) to allow for a clearer understanding of the underlying
task.

Score: 1 (completely different)
Utterance 1: A person speaking with joy (talking at a slow pace and getting faster
whilst raising their voice): “I won the lottery!”
Utterance 2: A bored person giving a lecture (talking normally): “การเกี่ยวพันกันของอนุภาคนี้
เรียกว่าปฏิกิริยาร่วมในระยะห่าง” / “This entanglement of particles is referred to
as Spooky Action at a Distance.”

Score: 2 (mostly different)
Utterance 1: A person speaking with joy (talking at a slow pace and getting faster
whilst raising their voice): “I won the lottery!”
Utterance 2: A person speaking with moderate happiness (talking normally): “เธอรู้หรือเปล่าว่า
จอห์นชนะรางวัล” / “Did you know John won something?”

Score: 3 (mostly similar)
Utterance 1: A person speaking with joy (talking at a slow pace and getting faster
whilst raising their voice): “I won the lottery!”
Utterance 2: A person speaking with joy (saying things twice and raising their
voice): “ฉันชนะ
ฉันได้รางวัล!” / “I won, I won a prize!”

Score: 4 (completely similar)
Utterance 1: A person speaking with joy (talking at a slow pace and getting faster
whilst raising their voice): “I won the lottery!”
Utterance 2: A person speaking with joy (saying things twice whilst raising their
voice): “ถูก
หวย ฉันถูกหวย!” / “I won, I won the lottery!”
