As I have discussed in previous blog posts, non-experimental data, such as transcribed oral texts, is an excellent source of syntactic data.
In this blog post, I outline some of the factors that affect the rate of transcription of recorded oral texts when doing syntactic fieldwork. If anybody knows of relevant literature on this topic, please let me know. A systematic survey amongst fieldworkers would probably be useful in helping to understand the process, and maybe to help make it more efficient.
Certainly, the advent of tools like digital recorders, Praat, ELAN and FLEx has increased the speed at which it is possible to process oral texts. I can recall the days not that long ago (the early nineties) when I would transcribe a text by pressing the rewind button over and over. As far as I know, there were no digital recorders at that point, or at least none that I had access to. And therefore, I also did not have the benefit of using Praat to see the actual waveform while transcribing.
How long does transcription take exactly, and what are the factors influencing how long it takes? My answer is that in ideal circumstances, the rate of transcription will be around 40 to 1 (40 minutes of transcription for one minute of recording). In this blog post, I will explain my answer.
For concreteness, I am assuming that the linguist is working with ELAN, and is transcribing the recording into either IPA or an orthography. I assume that the linguist is also translating (but not glossing) the text. Glossing would add considerably more time, and is more efficient in FLEx than in ELAN. In my calculations of the rate of transcription, I set aside the time it takes to set up the project in ELAN. The linguist needs to have the video and audio files ready (or to prepare them in Adobe Premiere Pro CC, or a similar program). They need to load the files into ELAN and create the tier types and tiers. All this takes time (e.g., 5-15 minutes, depending on what exactly needs to be done).
My process for transcription is as follows:
a. Segment the recording in ELAN.
On many occasions the boundaries have to be redone during transcription (adjusting boundaries, merging segments, dividing segments).
b. Listen to an individual segment.
Usually, I listen to the segment two or three times just to get going. I also focus in on any unclear parts of the segment and play them a few times.
c. Transcribe the segment into IPA.
If I have any problems with transcription, I ask the native speaker consultant for help. They can help me identify a word, or repeat part or all of the segment.
d. Translate the segment into Setswana.
This is the job of the native speaker consultant, who speaks both Sasi and Setswana fluently. The translation is then written down by the translator. I check to see if there are any egregious errors in the Setswana translation. The most frequently occurring problem is that the consultant will often not translate the segment, but rather explain it in some way.
e. Translate the segment into English.
This is the job of the translator. I check to see if all three (the Sasi transcription, the Setswana translation and the English translation) match. Sometimes getting them to match takes a bit of adjustment.
f. Repeat b-e until finished with the recording.
As an example, I made a recording in Sasi of how to make local beer. The recording lasted 1 minute and 35 seconds and took me about an hour and a half to transcribe and translate (no glossing). So that is a ratio of approximately 60 to 1. The entire text has 32 sentences. The transcription and translation of this particular text were relatively easy from my point of view. The speaker spoke in a clear fashion on a clearly delimited topic (with known vocabulary). Also, the speaker was present, helping me with the transcription when necessary. A more difficult text with a different speaker could have easily taken much longer.
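The arithmetic behind this ratio can be sketched in a few lines of Python. This is only an illustration: the function name `transcription_ratio` is made up for this example, and the 90 minutes of work is the rounded "hour and a half" figure from above, so the result is approximate.

```python
def transcription_ratio(work_minutes: float, recording_seconds: float) -> float:
    """Return minutes of work per minute of recording (hypothetical helper)."""
    if recording_seconds <= 0:
        raise ValueError("recording length must be positive")
    recording_minutes = recording_seconds / 60
    return work_minutes / recording_minutes


# Local beer text: 1 min 35 s of recording, about 90 minutes of work.
beer_ratio = transcription_ratio(work_minutes=90, recording_seconds=95)
print(f"{beer_ratio:.1f} to 1")  # prints "56.8 to 1"

# The same text has 32 sentences, so the average time per sentence is:
minutes_per_sentence = 90 / 32
print(f"{minutes_per_sentence:.1f} minutes per sentence")  # prints "2.8 minutes per sentence"
```

So the measured figure is a little under 57 to 1, which rounds to the 60 to 1 quoted above, and works out to just under three minutes per sentence.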
Not everybody will have the same set-up as me (e.g., translating into two languages), but they may have their own complications of various sorts (e.g., a language without a dictionary). I list below some of the factors that could affect the rate of transcription, and comment on them with respect to my Sasi project. Given these factors, it is difficult to estimate how long transcription should take. But in ideal circumstances, I think a ratio of 40 to 1 is reasonable. Ideal circumstances include: the presence of an experienced native speaker consultant on hand to help with the transcription, good sound quality, translation into only one language, the existence of a good dictionary and limiting the goal to broad phonetic transcription.
So if you have a six minute video, you can plan on taking the morning to transcribe it accurately.
What is the fastest possible rate? I would say that it is around 20 to 1. Just the physical acts of segmenting the recording, listening to the segments a few times, transcribing the segments and writing the English translations would take you very close to a ratio of 20 to 1.
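For planning purposes, these ratios can be turned into rough time estimates. The sketch below is only illustrative: the labels and the function name `estimated_hours` are invented for this example, and the three ratios are the approximate figures mentioned in this post (20, 40 and roughly 60 to 1), not measured constants.

```python
# Approximate ratios from this post: minutes of work per minute of recording.
RATIOS = {
    "fastest possible": 20,
    "ideal circumstances": 40,
    "local beer text (approx.)": 60,
}


def estimated_hours(recording_minutes: float, ratio: float) -> float:
    """Estimate hours of transcription work for a recording (hypothetical helper)."""
    return recording_minutes * ratio / 60


# A six minute video, as in the example above.
estimates = {label: estimated_hours(6, ratio) for label, ratio in RATIOS.items()}
for label, hours in estimates.items():
    print(f"{label}: {hours:.1f} hours")
```

For the six minute video, this gives 2 hours at the fastest rate, 4 hours in ideal circumstances, and 6 hours at the rate I observed for the local beer text.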
Here are some of the factors that affect the rate of transcription:
Is the transcriber a native speaker of the language?
Sasi Project: In my case, I am not a native speaker of Sasi. Furthermore, all speakers of Sasi are elderly (over 60) and illiterate. There are no existing native speaker transcribers. If the transcriber were a native speaker of the language being investigated, it would certainly be a huge advantage to them in transcribing oral texts. I don’t think it would resolve all the issues listed below, but it would make the process faster.
Is the original speaker (of the oral text) present for the transcription? If they are present, are they able to help with the transcription? (How old are they? What is their hearing like? What is their health like? Are they inebriated or otherwise incapacitated?)
Sasi Project: In my typical set-up, I try to make sure the speaker is present for the transcription. The speaker can listen to the segment, and if needed, help me to transcribe it. Some of my consultants have hearing problems, which makes it difficult for them to help me, even when they are the speaker of the oral text.
Is a native speaker consultant present to help with the transcription? And if so, are they experienced? Do they have basic transcription skills?
Basic transcription skills can be learned and practiced. An experienced consultant can be very helpful in speeding the process along. Such skills include: (a) repeating a segment exactly (as spoken in the recording), (b) translating a segment of the recording, (c) repeating a single word from the segment, (d) defining a single word from a segment, (e) saying whether a segment has been transcribed correctly, (f) saying whether a segment has been translated correctly.
Sasi Project: Because of age, some of my consultants are unable to comply with all the task demands of helping to transcribe a text.
How good is the sound quality of the recording? Is there clipping, background noise, wind, hissing, echo in the room, etc.? Did the speaker turn their head away from the mic often? Any decrease in the quality of the recording can make transcription more difficult.
Sasi Project: For example, if the waveform is clipped, a click can sound like a non-click consonant, making it difficult to recognize the word being used. Usually, a good consultant can recognize the word anyway (even with clipping), but not always. In my experience, lavalier mics produce the best sound (less background noise), but other recording methods can produce acceptable recordings as well.
How many languages are being translated into?
Sasi Project: Sasi is being transcribed, and translated into Setswana and then English. I find the Setswana and English translations help with the Sasi transcriptions. However, it is often the case that the translation from Setswana into English raises its own particular problems that take time to resolve. If a linguist only had to translate the transcription into one language (e.g., English) the process would be faster. The addition of a second language of translation (Setswana in my case) is one of the biggest time sinks in the process.
If you have a translator (between two translation languages), what is their background? Do they have translation experience? What is their level of English? Does the translator speak the local dialect (of the speakers)? Are they familiar with the concepts that the speakers use in their oral texts (e.g., the various plants and animals)? How well does the translator interact with the speakers?
Sasi Project: Often my translator has to use a dictionary to translate from Setswana to English. Each time they pick up the dictionary, it adds a minute to the process. And the local Setswana (Sengwato, Central Region, Botswana) is not identical to standard Setswana, raising further difficulties in translation.
How clear is the speech of the speaker? Do they speak quickly, running words together? Do they speak really softly, sometimes almost inaudibly? Do they speak with large bursts of intensity (clipping the sound file)? Do they often turn their head away from the mic? These are the kinds of issues that have a big effect on the difficulty of transcription.
Sasi Project: I find speakers vary a lot on these dimensions. Some speakers have a careful pronunciation of words, whereas some speak very quickly leaving out or greatly reducing morphemes. As for people who speak softly, to some extent one can increase the gain with software (e.g., Adobe Premiere Pro CC).
What is the subject matter of the text? Does it involve new vocabulary that needs to be carefully transcribed and translated? Does it involve a complex topic unfamiliar to the linguist?
Sasi Project: In the recording of making local beer discussed above, the topic of the text was clearly delimited, and only known vocabulary was used. For other more free flowing recordings (e.g., a life story, relatively unstructured interviews, unscripted conversations), new words will come up, and it takes time to give accurate transcriptions (and translations) of these words, which adds to the time of transcription.
How well is the language documented? Is there a good grammar and a good dictionary (both available in an electronically searchable format)? Does the language have a linguistic tradition of papers written on it? Many endangered or less studied languages may lack any or all of these useful documents.
Sasi Project: Sasi is not very well documented. I have a short dictionary that I wrote (in collaboration with Andy Chebanne), and there are sections about Sasi in the =Hoan grammar (written with Jeff Gruber). There are no linguistic papers uniquely on Sasi, but I have written a few papers on the closely related language =Hoan. There is a spelling primer available online (written with Zach Wellstood). Other than that, there are no other resources on Sasi.
How much practice has the linguist had in transcription? Certain transcription issues occur over and over. Once these issues are resolved, the job of transcribing new texts is easier.
Sasi Project: Since 1996, I have transcribed a combined total of seventeen short oral texts in Sasi and =Hoan. During this year (2019-2020), I plan to transcribe five hours of video. In the past, I have only transcribed audio recordings, not video recordings.
As an example of a recurring transcription issue, in Sasi ka ki “with” often comes out as ka i, or even a more reduced form. But the tone is immediately recognizable (high low). So it only takes a few encounters before “with” becomes easy to transcribe consistently.
Is there more than one participant? If so, do their contributions continually overlap?
Sasi Project: I have both one-person and two-person recordings. The two-person recordings are mostly interviews (one Sasi speaker interviewing another). In the interviews, there is lots of overlap at the edges of the sentences, making transcription more difficult. For most of these interviews, each person has their own lavalier mic.
How much uncertainty are you willing to live with? Are you willing to simply delete an obscure barely audible passage? Are you willing to have question marks in your transcription? Are you willing to adopt the suggestions made by your consultants, even when they clearly diverge from the recording?
Sasi Project: Many problems of transcription have to be puzzled over in order to try to figure out exactly what was said. To address these sorts of problems, I find that it helps to do a complete rough transcription of the whole text and then to go over the problem areas at a later date in order to resolve them. But of course, such a second pass adds more time to the transcription process.
What level of detail do you want in the transcription? Are you just aiming for a broad phonetic transcription, or do you want to transcribe fine phonetic detail of production? Are intonation, gesture and possibly other features being transcribed?
Sasi Project: I usually aim for broad phonetic transcription. More fine-grained transcriptions would take longer.