Below is a description of the workflow I follow for transcribing and translating oral
texts in ELAN.
The workflow assumes that the media files (audio and video) have been uploaded to
ELAN and synchronized (using Media Synchronization Mode). It also assumes that
the oral text has been segmented in ELAN. I generally work in Annotation Mode,
so I can easily change segment boundaries. But I also skip back and forth to
Transcription Mode, which has some nice features. For example, the view in
Transcription Mode gives a very clear overview of what has been transcribed and
translated already, and what remains to be done.
My ELAN project is set up with three tiers: a Sasi tier (with no parent tier) and
two translation tiers, one for Setswana and one for English. Each of the
translation tiers has the Sasi tier as parent. I do not set up a glossing tier
in ELAN, since it is much easier to gloss in FLEx.
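The tier setup can be checked outside ELAN, since an ELAN file (.eaf) is XML in which each tier is a TIER element with a TIER_ID attribute and, for child tiers, a PARENT_REF attribute. The following is a minimal sketch in Python, not part of my actual workflow. The tier names "Sasi", "Setswana" and "English", the linguistic type names, and the miniature sample document are all assumptions for illustration; a real project may use different names.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature .eaf content. The tier names and linguistic type
# names are assumptions; a real ELAN file has many more elements.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="Sasi" LINGUISTIC_TYPE_REF="default-lt"/>
  <TIER TIER_ID="Setswana" PARENT_REF="Sasi" LINGUISTIC_TYPE_REF="translation"/>
  <TIER TIER_ID="English" PARENT_REF="Sasi" LINGUISTIC_TYPE_REF="translation"/>
</ANNOTATION_DOCUMENT>"""


def tier_parents(eaf_xml):
    """Map each tier ID to its parent tier ID (None for a top-level tier)."""
    root = ET.fromstring(eaf_xml)
    return {t.get("TIER_ID"): t.get("PARENT_REF") for t in root.iter("TIER")}


# Print the hierarchy so it can be compared with the intended setup.
for tier, parent in tier_parents(SAMPLE_EAF).items():
    print(f"{tier}: parent = {parent or 'none'}")
```

To run this on a real file, the file contents would be read into a string and passed to tier_parents in place of SAMPLE_EAF.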
See the following post for converting an ELAN file to FLEx, and back again:
My working group consists of the linguist (me), a Setswana translator and two Sasi
consultants. The linguist speaks English, and some Setswana and Sasi. The
Setswana translator speaks English and standard Setswana (but no Sasi at all). The
Sasi consultants speak Sasi and the Sengwato dialect of Setswana (but no
English at all). I always try to include the original speaker in the group if
possible. The Sasi consultants do not read or write in any language.
The translator is necessary (a) to communicate with the consultants, especially
about complicated issues, and (b) to create accurate and natural Setswana
translations for the videos (since the resulting video clips will be made
available to Setswana-speaking people in Botswana).
This work mostly takes place in the village. There is only one computer, a MacBook
Pro, powered by a solar panel. The computer is running ELAN, but not FLEx. The
translator does not have a computer, and there is no internet.
Linguists with a different set-up (e.g., only one translation language) will not use the
same workflow. However, I advise people to write down their workflow. This will
help them make the process more efficient. A single 10-minute text may take 10
hours or more to process, so working out the exact steps can save a lot of
time. I know this from experience!
Steps in Workflow
1. The linguist plays a segment from ELAN to the group one or more times.
[Press “Play selection”, immediately above the waveform in ELAN.]
2. The linguist transcribes the segment in modified IPA, writing by hand into a notebook.
[In general, I try to stick closely to what the consultant actually says. The rough
transcription in step 2 is sometimes modified after the translation steps
below. It can be more efficient to do a bunch of segment transcriptions ahead
of time (without the translator and consultants).]
3. The linguist refers to the Sasi dictionary to check transcriptions and glosses of words
where needed.
[The Sasi dictionary is a .pdf version, since FLEx only runs on a PC.]
4. The linguist asks the consultant to repeat the segment, if it is unclear.
[This task is difficult for many consultants to carry out.]
5. The linguist asks the consultant any questions needed to clarify the segment.
[For example, Who said this? Who are they talking to? Who does ‘he’ refer to? Where does
‘there’ refer to? How is this word pronounced?]
6. The linguist asks the consultant to define any new words found in the current segment.
[Just a quick gloss is needed for now; more systematic lexical work can be done in a
separate session. The larger the existing dictionary of the language, the less
time needs to be spent on defining new words.]
7. Words and phrases that are difficult to transcribe can be marked with (???),
optionally followed by the best guess transcription.
[See Appendix below.]
8. The linguist adjusts the segment boundaries where necessary (splitting segments,
merging segments, etc.) using ELAN commands in Annotation Mode.
[In transcribing, it becomes clearer what the segment boundaries should be. Usually
the segments correspond to complete sentences, or long clauses of complete
sentences. ELAN uses the confusing word “annotation” to refer to the individual
segments of an oral text. Some useful commands that I use regularly are: New
Annotation Here, Merge with Next Annotation, Delete Annotation. All these are
found by right-clicking on a particular segment on the top tier.]
9. The linguist hands the notebook to the translator.
[In simple cases, the linguist just translates the Sasi into Setswana and English,
bypassing steps 9-15. In other simple cases, the linguist translates
directly from Sasi to English and hands the notebook to the translator for
Setswana. As should be clear, the notebook is needed in the process to give the
translator a place to write the Setswana translations to share and discuss with
the linguist.]
10. The translator asks the consultant to translate the segment into Setswana.
[The translator says: Bua ka Setswana “Speak in Setswana” or Ka Setswana “In Setswana”.]
11. The translator writes down the Setswana translation by hand in the notebook.
[The translator mostly writes in standard Setswana.]
12. The translator asks the consultant questions about the Setswana translation.
[For example, questions about unknown words, ambiguous words, dialect issues. Also,
the consultant may have to repeat the Setswana translation.]
13. The translator translates the Setswana into English, writing by hand in the notebook.
[At this point, the notebook has three hand-written lines: Sasi, Setswana and English.]
14. The translator uses the Setswana-English dictionary for translations and spellings where
needed.
[In our work, the use of the dictionary has turned out to be a time sink, so I
discourage it unless it is necessary.]
15. The translator hands the notebook back to the linguist.
16. The linguist checks consistency of the Sasi transcription, and the Setswana and English
translations, and addresses any issues that come up.
[This step can sometimes involve lengthy discussions with the consultants and the translator
about the correct translations. It can also lead to discussions of cultural and
historical matters.]
17. The linguist types Sasi into MS Word.
[The Sasi keyboard (for the MacBook Pro) does not work directly in ELAN. This step takes
place at the same time as steps 10-14, so no time is lost.]
18. The linguist copies Sasi from MS Word and pastes into ELAN.
[This step also takes place at the same time as steps 10-14.]
19. The linguist copies the Setswana and English translations from the notebook, by typing them
into the ELAN translation tiers.
20. The linguist uses the ELAN commands to move to the next segment, and the process
starts over.
[Press “Go to next annotation”, which is a right arrow, just above the waveform in ELAN.]
21. After transcribing and translating an oral text, the whole text should be reviewed (in
Transcription Mode), to check for errors and to check the consistency of the
running text in each of the translation tiers (e.g., Are all the tenses and
pronouns consistent? Does the oral text make sense? Are there small notes that can
be inserted in the translations to help the reader? etc.). Ideally, the
translator should be present for this step. For a 15-minute oral text,
this step itself can take a few hours (e.g., 2-4 hours), and has its own
internal workflow.
22. Transferring the oral text from ELAN to FLEx and back to ELAN might result in further minor
corrections in the three tiers.
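Part of the review in step 21 (finding segments that still lack a translation) could in principle be automated by reading the .eaf file directly. The sketch below is only an illustration in Python and not something I use in the village. It assumes that the translation tiers store their annotations as REF_ANNOTATION elements pointing to the parent segment through ANNOTATION_REF (as with a Symbolic Association tier type), and that the tiers are named "Sasi", "Setswana" and "English". The sample document and its placeholder text are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature .eaf content with placeholder text (not real data).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="Sasi">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>segment one</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>segment two</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="Setswana" PARENT_REF="Sasi">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a3" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>translation one</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a4" ANNOTATION_REF="a2">
        <ANNOTATION_VALUE></ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="English" PARENT_REF="Sasi">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a5" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>translation one</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a6" ANNOTATION_REF="a2">
        <ANNOTATION_VALUE>translation two</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def missing_translations(eaf_xml, parent_tier, child_tiers):
    """For each child tier, list parent segment IDs with no non-empty translation."""
    root = ET.fromstring(eaf_xml)
    tiers = {t.get("TIER_ID"): t for t in root.iter("TIER")}
    parent_ids = [
        a.get("ANNOTATION_ID")
        for a in tiers[parent_tier].iter("ALIGNABLE_ANNOTATION")
    ]
    missing = {}
    for child in child_tiers:
        translated = set()
        for ref in tiers[child].iter("REF_ANNOTATION"):
            value = ref.findtext("ANNOTATION_VALUE") or ""
            if value.strip():  # an empty annotation does not count as translated
                translated.add(ref.get("ANNOTATION_REF"))
        missing[child] = [pid for pid in parent_ids if pid not in translated]
    return missing


print(missing_translations(SAMPLE_EAF, "Sasi", ["Setswana", "English"]))
```

A check like this only finds gaps. Whether the tenses and pronouns are consistent and whether the text makes sense still has to be judged by reading the running text with the translator.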
Appendix [Step 7]: Resolving (???)
As noted in step 7, words and phrases that are difficult to transcribe can be
marked with (???), optionally followed by the best guess transcription. Unresolved
issues arise when the linguist is not able to transcribe a word or a phrase or
a whole segment, and the consultants are also unable to make out the word or
phrase or segment. The reasons that this happens are numerous:
a. The speaker is speaking very softly.
b. The speaker is speaking very fast.
c. The speaker is speaking very loudly (clipping the sound file).
d. The speaker does not complete some phrases or sentences.
e. There is background noise (e.g., wind, music, people speaking, clothing on lavs).
f. The quality of the recording is poor (e.g., using the built-in camera mic, echo in
room).
g. The mic was positioned too far from the speakers.
h. The speaker turned away from the mic.
In my case, the speakers often do not remember exactly what they said. In one
case, the speaker has a hearing disability, so they cannot help with the
transcription of difficult areas of their own texts.
Areas marked with (???) can sometimes be resolved in the translation steps. They can
sometimes be resolved in the review step, once the entire text has been
transcribed and translated. If you put the text on the back burner for a few
days and revisit it, sometimes the transcription will become clear. Lastly,
some consultants are better than others at transcription, so if you find some
difficult spots in a text, you can try to work them out with a different
consultant.
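When revisiting a text, it helps to have a list of every segment that still contains (???). The following Python sketch shows one way such a list might be produced from the XML of an .eaf file. It is an illustration under assumptions, not a tool I use: the tier name "Sasi", the annotation IDs and the placeholder text are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature .eaf content with placeholder text (not real data).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="Sasi">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>clear segment</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>partly clear (???) best guess</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a3" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>(???)</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def unresolved_annotations(eaf_xml, marker="(???)"):
    """Return (tier ID, annotation ID, text) for every annotation containing the marker."""
    root = ET.fromstring(eaf_xml)
    hits = []
    for tier in root.iter("TIER"):
        for annotation in tier.iter("ANNOTATION"):
            # Each ANNOTATION wraps one ALIGNABLE_ANNOTATION or REF_ANNOTATION.
            for node in annotation:
                value = node.findtext("ANNOTATION_VALUE") or ""
                if marker in value:
                    hits.append(
                        (tier.get("TIER_ID"), node.get("ANNOTATION_ID"), value.strip())
                    )
    return hits


for tier_id, annotation_id, text in unresolved_annotations(SAMPLE_EAF):
    print(f"{tier_id} {annotation_id}: {text}")
```

The resulting list can be taken to a different consultant, or kept as a record of how many difficult areas remain in each text.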
Even for unresolved words, phrases and segments, try to give rough transcriptions
with as much detail as possible, since the more you chip away at a difficult area,
the greater the chance that you will resolve it later on.
As a last resort, you might consider developing a policy where unresolved words
and phrases are replaced with reasonable alternatives that make sense in the
context of the oral text.