Transcription (Complex) Guidelines
Term Definition
Writing
Transcription should follow the standard conventions of the target language.
To check the spelling of names such as song titles, movies, TV shows, and brands, do a
quick Google search.
Punctuation
Use of punctuation will vary depending on the specific project. You should always refer to
the project specific Transcription guidelines to see how punctuation should be used.
Special Characters
Do not use special characters or symbols such as quotation marks, dollar signs, etc.
Please transcribe all full words spoken.
Example
$ → dollar
% → percent
Example - speaker pronounces the word "slash"
You hear: it was great slash weird
You transcribe: it was great slash weird
INCORRECT: it was great/weird
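The special-character rule lends itself to a simple automated check. The sketch below is illustrative only: the function name and the DISALLOWED set are assumptions, since the guidelines explicitly name only quotation marks and dollar signs, and a real project would define its own list.

```python
# Assumed set of disallowed symbols. Only quotation marks and dollar signs are
# named in the guidelines; the other characters here are illustrative.
DISALLOWED = set('"“”$%/&@#+=')


def find_special_characters(transcript: str) -> list[str]:
    """Return the disallowed symbols found in the transcript, in order of appearance."""
    return [ch for ch in transcript if ch in DISALLOWED]


print(find_special_characters("it was great/weird"))        # flags the slash
print(find_special_characters("it was great slash weird"))  # nothing to flag
```

An empty result means none of the assumed symbols are present. It does not confirm that every symbol was transcribed as the word that was actually spoken.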
Capital letters
Named entities (e.g. person names, place names, some time words) should be spelled
with a capital letter as per usual writing conventions for the target language.
Example
George
Monday
If a business name is spelled with a capital letter in the middle of the word, this is okay.
Example
eBay
iPhone
YouTube
Do not use a capital letter if the only reason is that the word is at the start of a sentence.
Example - the first word is only capitalised if it is a proper name
Numbers
Do not use any digits (e.g. 1 2 3 4 5 ...). All numbers must be spelled out as full words in
the way they were pronounced.
Example - the number '2012' may be pronounced in many different ways:
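Whichever pronunciation was used, the finished transcript must contain no digits. A minimal sketch of a check for this, assuming transcripts are plain strings (the function name is hypothetical):

```python
import re


def find_digits(transcript: str) -> list[str]:
    """Return every run of digits left in the transcript."""
    return re.findall(r"\d+", transcript)


print(find_digits("we moved here in 2012"))           # the digits are reported
print(find_digits("we moved here in twenty twelve"))  # spelled out, nothing reported
```

The check only detects digits. Choosing the spelled-out form still requires listening to how the speaker pronounced the number.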
Abbreviations
Do not use any abbreviations. Words must be spelled out in full.
Example
Mr Johnson ==> Mister Johnson
Dr Smith ==> Doctor Smith
Elizabeth St ==> Elizabeth Street
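The three expansions above can be sketched as a lookup table. This is an illustration, not a complete list: the EXPANSIONS mapping and the function name are assumptions, and some abbreviations are ambiguous in writing (for example "St" can also stand for "Saint"), so the audio always decides which word to write.

```python
import re

# Illustrative mapping built from the examples in this section only.
EXPANSIONS = {"Mr": "Mister", "Dr": "Doctor", "St": "Street"}

# Match a listed abbreviation as a whole word, with an optional trailing full stop.
_PATTERN = re.compile(r"\b(" + "|".join(EXPANSIONS) + r")\b\.?")


def expand_abbreviations(text: str) -> str:
    """Replace each listed abbreviation with its full spoken form."""
    return _PATTERN.sub(lambda m: EXPANSIONS[m.group(1)], text)


print(expand_abbreviations("Mr Johnson"))
print(expand_abbreviations("Elizabeth St"))
```

Words that merely start with the same letters, such as "Street", are left alone because the pattern requires a word boundary after the abbreviation.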
Acronyms
An acronym is a word made up of the first letters of other words that is spoken as a word
(e.g. NASA, FIFA). Acronyms are spelled using capital letters joined with no space.
Example
NASA
FIFA
Initialisms
An initialism is an abbreviation made up of the first letters of other words where each
letter is pronounced separately (e.g. IBM, CPU, ADHD). Initialisms are spelled using
capital letters joined by underscores.
Example
I_B_M
C_P_U
A_D_H_D
Spelled Letters
Spelled letters are where a word is pronounced letter by letter (e.g. L I A I S E). Spelled
letters are transcribed using capital letters joined by underscores.
Example
my name is Jayme and it's spelled J_A_Y_M_E
For single stand-alone spelled letters, transcribe them with an underscore after the letter.
Ensure that there is a space after the underscore so that it is not linked to the following
word.
Example for single stand-alone spelled letters
my blood type is B_ positive
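The two spelled-letter rules (underscores between letters, and a trailing underscore for a single stand-alone letter) can be sketched as one small helper. The function name is hypothetical and the input is assumed to be the letters the speaker said, in any case and with or without spaces.

```python
def format_spelled_letters(letters: str) -> str:
    """Format spoken letters: capitals joined by underscores, or 'X_' for a single letter."""
    chars = [c.upper() for c in letters if c.isalpha()]
    if len(chars) == 1:
        # A stand-alone letter takes a trailing underscore.
        return chars[0] + "_"
    return "_".join(chars)


print("my name is Jayme and it's spelled " + format_spelled_letters("jayme"))
print("my blood type is " + format_spelled_letters("b") + " positive")
```

The caller adds the space after a stand-alone letter, as in the second line, so that the underscore is not linked to the following word.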
Mixed Initialisms
Mixed initialisms involve combinations of words, letters, and numbers. When a single
concept is expressed, all parts are written together with an underscore. Models like 4S
(below) are written separately from the brand name. Numbers in a proper name are
capitalised when written out.
Example
iPhone four_S
Seven_Eleven
A_B_forty-eight
M_P_three
Fragments
When a speaker pronounces only part of a word, write that part of the word and attach a
hyphen to it. Make sure there is a space after the hyphen.
Example - someone begins to say 'motorcycle' but stops after 'moto'
she came to work today by moto- I mean car
Example: someone begins to say 'onions' but stops after 'on-' and then
repeats the word in full
my eyes hurt when I cut on- onions
If it is not clear what the full word was going to be, do not transcribe the word and
instead use the unintelligible tag (see the section on using tags).
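For review purposes, fragments written this way are easy to locate, because they are the only tokens that end in a hyphen. A minimal sketch, assuming tokens are separated by spaces (the function name is hypothetical):

```python
def find_fragments(transcript: str) -> list[str]:
    """Return tokens that end in a hyphen, i.e. words the speaker did not finish."""
    return [tok for tok in transcript.split() if len(tok) > 1 and tok.endswith("-")]


print(find_fragments("she came to work today by moto- I mean car"))
print(find_fragments("my eyes hurt when I cut on- onions"))
```

Hyphenated words such as forty-eight are not reported, since the hyphen sits inside the token rather than at its end. A fragment typed without the required space after the hyphen would also go unreported, so this sketch does not replace a manual check.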
Tags
Tags are used to add additional information to transcriptions of speech. Tags can be used
to add information about the audio. These may include noise events, sections of silence,
fillers, foreign speech, and more. As each project may be different, it is important that
you follow the project specific guidelines, as tag usage may differ from what is detailed
below.
Standalone tags are inserted independently into the text box. In Ampersand, these tags
appear as images. In the examples below, these tags are represented in text format
using < > brackets.
Span tags are used to highlight transcription in the text box.
Speaker Tags
In some projects, you may hear multiple speakers in a batch and each speaker may need
to be identified with a unique Speaker ID tag throughout the batch.
When to use Speaker IDs will vary depending on the project. You should always refer to
the project specific guidelines for information on when and where to use a Speaker ID
tag.
Fillers
Fillers are the sounds people make while they are thinking of what to say next, for
example "um", "ah", "er".
Whenever you hear a filler, insert the filler tag that best represents the sound made.
Example: speaker says "um" after "was"
Interjections
Interjections are very common in spoken language, but strictly speaking they are not
'words' and would be unlikely to show up in a dictionary or a newspaper article.
Interjections should be transcribed according to the project specific guidelines. In most
transcription projects, interjections should be highlighted with a highlighting span tag.
Overlapping Speech
Overlapping speech is when two or more people are talking at the same time and at a
similar volume.
In some projects, overlapping speech may need to be tagged. How to tag and/or
transcribe overlapping speech will vary depending on the project. You should always
refer to the project specific guidelines for information on how to approach overlapping
speech.
Noises overlapping with speech do not constitute an overlap. Only mark overlaps when
two foreground speakers are speaking at once.
Foreign words
You may hear someone speaking in a foreign language. If you cannot understand the
foreign speech, just place a <foreign> tag in place of the words you cannot understand.
Example
no she said <foreign> which means goodbye in Croatian
If someone uses just the occasional foreign word and you know how to spell it, write out
the word and then highlight it using the "foreign word" highlighting tag.
Mispronounced words
When it is obvious a speaker has mispronounced a word, use the mispronounced tag to
highlight the word. When you type the mispronounced word, use the normal correct
spelling.
Example - you hear the speaker say "expresso" instead of "espresso"
YOU TRANSCRIBE: espresso
Words pronounced with a regional accent are NOT considered mispronounced. If you are
unsure, imagine asking the person after they spoke if they made a mistake. If that
person would admit they made a mistake, then the word was mispronounced.
Unintelligible Speech
If you come across a word or several words that are not clear because there is
interference, audio problems, or because the person is not talking clearly, enter the
<unintelligible> tag in place of the unintelligible speech.
Try your best to listen and determine what was said, but expect unintelligible words to
occur often in natural speech. As a guide, listen at least three times to try to
understand what was said. If it is still not clear, insert the tag and move on.
Example - speaker mumbles something after "her"
well I already told her <unintelligible> you know I told her
Thought Continues
Sometimes, a thought or sentence in the current utterance may continue into the next utterance.
In such cases, you should:
Insert the <continued> tag at the end of the first utterance where the thought is
cut off.
o If you have inserted a comma at the end of the first utterance, the
<continued> tag must be placed after the comma.
o An utterance cannot end with a comma.
Example
Note: do not use the continued tag if the sentence or thought has ended where the
utterance ends.
Example
Truncations
If a word gets cut off at the end of an utterance because the computer program has not
cut up the audio correctly, this is called a truncation. This is different from a fragment
(where the person stops talking part way through a word). In a truncation, the recording
has cut someone off while they were saying a word. Therefore, truncations only occur at
the start or end of an utterance.
When you hear a truncation at the end of an utterance, write out the truncated word in
full followed by the <truncation> tag. In the following utterance, insert the
<truncation> tag and then continue to transcribe the rest of the sentence.
Example – “probably” has been truncated and is split across two utterances.
If you can tell that a word was truncated but you don't know what the word is, simply
insert the <unintelligible> tag in place of the word and the <truncation> tag after
the <unintelligible> tag.
Example - the word at the end of the utterance has been truncated but you
couldn't make out the truncated word
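The truncation rules can be sketched as a helper that builds the two affected utterances. The function name, its parameters, and the sample sentence are invented for illustration. The sketch also assumes that the following utterance starts with the <truncation> tag even when the truncated word could not be identified, which the guidelines state only for the known-word case.

```python
from typing import Optional, Tuple


def mark_truncation(before: str, word: Optional[str], after: str) -> Tuple[str, str]:
    """Return (first utterance, following utterance) with truncation tags applied.

    before: text of the first utterance up to the truncated word
    word:   the truncated word written in full, or None if it could not be made out
    after:  the rest of the sentence in the following utterance
    """
    shown = word if word else "<unintelligible>"
    first = f"{before} {shown} <truncation>".strip()
    second = f"<truncation> {after}".strip()
    return first, second


# Invented sample sentence, used only to show the tag placement.
print(mark_truncation("he will", "probably", "be late"))
print(mark_truncation("he will", None, "be late"))
```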
No Speech
If an entire utterance contains no speech (e.g. there is only silence or noises) insert the
<no-speech> tag only and move on. The noises in such utterances should not be
tagged.
Unintelligible speech, fillers and interjections ARE considered speech. All other noises
(human and non-human) are NOT considered speech.
Pause
Whenever there is a pause in speech, insert the <pause> tag. In most transcription
projects, pauses of 1 second or more should be tagged. However, you should always
refer to the project specific guidelines for guidance on when to tag pauses.
Example - speaker takes a two second pause between
"just" and "feels"
I don't know why it just <pause> feels different now
Use the tag for pauses within speech (between words) and for silence before the
person commences speaking or after they finish.
If noises occur in the foreground during pauses of 1 second or more within speech, do
not tag these noises - simply put only a pause tag.
If there is no speech at all within an utterance, use the 'no speech' tag (see above).
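The pause rule for gaps between words can be sketched as follows, assuming word-level timings are available as (text, start, end) tuples in seconds. The function name, the data layout, and the timings are assumptions, and the 1 second default reflects the "most projects" threshold mentioned above. Silence before the first word or after the last word is not handled in this sketch.

```python
from typing import List, Tuple


def insert_pauses(words: List[Tuple[str, float, float]], min_pause: float = 1.0) -> str:
    """Join words into a transcript, adding <pause> where the gap is min_pause or longer."""
    out = []
    prev_end = None
    for text, start, end in words:
        if prev_end is not None and start - prev_end >= min_pause:
            out.append("<pause>")
        out.append(text)
        prev_end = end
    return " ".join(out)


# Invented timings: a two second gap between "just" and "feels".
timed = [("it", 0.0, 0.2), ("just", 0.3, 0.6), ("feels", 2.6, 3.0), ("different", 3.1, 3.6)]
print(insert_pauses(timed))
```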
Speaker noises
All noises made by the main speaker should be tagged with the appropriate noise tag.
Common speaker noise tags that you may see in a transcription project are shown in the
table below. You should always refer to the project specific guidelines for information on
the noise tags used for the project and when to use them.
Insert the tag exactly where the noise first occurs.
If it occurs at the same time as a word, put the tag BEFORE the word.
If the noise occurs more than once in sequence, you only need a single tag.
<cough> coughing, throat clearing, sneezing
<laugh> laughing, chuckling
Other noises
Insert the relevant tag when you hear a noise that is not made by the speaker, and which
is at a comparable volume to the speech. Common noise tags that you may see in a
transcription project are shown in the table below. You should always refer to the project
specific guidelines for information on the noise tags used for the project and when to use
them.
<click> Any interference from the phone line (e.g. crackling sounds).
<short_noise> Any other short noises that do not continue over several words
(generally lasting less than one second), for example: door
slams, a loud cough by a person in the background, car horns.
<long_noise> Any other long noises that continue over longer periods of
time and perhaps multiple words (generally lasting more than
one second), for example: wind, rain, background speech or
music. This tag is used when the noise begins. The point at
which the stationary noise ends is not marked. Low level
background sounds are expected and do not need to be
tagged.
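When tags are represented in text format with < > brackets, a transcript can be checked for tag names that do not exist. The sketch below is a rough illustration: KNOWN_TAGS contains only the standalone tags named in this document, the function name is hypothetical, and project specific tags (for example filler or speaker tags) would need to be added.

```python
import re

# Standalone tags named in this document; extend per project.
KNOWN_TAGS = {
    "foreign", "unintelligible", "continued", "truncation", "no-speech",
    "pause", "cough", "laugh", "click", "short_noise", "long_noise",
}


def unknown_tags(transcript: str) -> list[str]:
    """Return the names of tags in the transcript that are not in KNOWN_TAGS."""
    found = re.findall(r"<([^<>\s]+)>", transcript)
    return [name for name in found if name not in KNOWN_TAGS]


print(unknown_tags("well I already told her <unintelligible> you know I told her"))
print(unknown_tags("I told <cuogh> her"))  # the misspelled tag is reported
```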
Timestamping
In most transcription projects, you'll see a waveform in Ampersand for each utterance.
Timestamps are placed on the waveform to divide the audio into segments.
Timestamps are generally used for two purposes:
Segment periods of non-speech from speech.
Segment speech based on where the speaker changes.
However, you should always refer to the project specific guidelines for information on
when and where to use timestamps.
Please also refer to 'How to use Ampersand - Timestamping Projects' for generic
guidelines on how to place and manipulate timestamps in Ampersand.
The table below shows examples of when a timestamp may be required. However, you
should always refer to the project specific guidelines for information on when and where
to use timestamps.
Event | What to do
When there is a change in speaker | Place the timestamp on the waveform to indicate that there is a change in speaker. The timestamp needs to be placed before the <speaker_ID> tag, just before the beginning of speech.
Before speech at the beginning of an utterance | If an utterance starts with a pause (as defined by the project guidelines), place a timestamp on the waveform to show where the speech begins. Place the timestamp after the <pause> tag.
After speech at the end of an utterance | If an utterance ends with a pause (as defined by the project guidelines), place a timestamp on the waveform to show where the speech ends. Place the timestamp before the <pause> tag.
At the start and end of utterance | If an utterance starts and ends with a pause (as defined by the project guidelines), use a combination of the approaches above according to duration of silence.
Pause between speech | If an utterance contains a pause (as defined by the project guidelines) between speech segments, place a timestamp when speech finishes, as well as before the speech begins again. Always place the timestamps directly around the pause tag.
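The "pause between speech" case can be sketched as a small calculation. It assumes speech segments are known as (start, end) times in seconds and that a gap of 1 second or more counts as a pause. The function name, the data layout, and the threshold default are all assumptions, and in practice timestamps are placed by hand on the waveform in Ampersand.

```python
from typing import List, Tuple


def pause_timestamps(speech: List[Tuple[float, float]], min_pause: float = 1.0) -> List[float]:
    """Return timestamp positions directly around each pause between speech segments."""
    stamps = []
    for (_, prev_end), (next_start, _) in zip(speech, speech[1:]):
        if next_start - prev_end >= min_pause:
            # One timestamp where speech finishes, one where it begins again.
            stamps.extend([prev_end, next_start])
    return stamps


# Invented segment times: a 1.5 second gap, then a 0.2 second gap.
print(pause_timestamps([(0.0, 2.0), (3.5, 5.0), (5.2, 6.0)]))
```

Only the first gap is long enough to produce timestamps. The shorter gap is left unsegmented.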