Thursday, December 26, 2019

Two Kinds of Data in Syntactic Fieldwork: Experimental and Non-Experimental

[This is a revision of an earlier post (October 9, 2019), responding to feedback that I received at that time.]

In classical generative grammar, the primary type of data is the grammaticality judgment. The following paradigm illustrates the relevant concepts:

(1) a. Mary wrote up the paper.
    b. Mary wrote the paper up.

(2) a. Mary jogged up the hill.
    b. *Mary jogged the hill up.

The data is organized into minimal pairs (e.g., (2a) vs. (2b)), and paradigms (e.g., (1) vs. (2)). Minimal pairs differ by at most one simple property. For example, the only difference between (2a) and (2b) is the position of the word up. The * before (2b) means that it is ungrammatical.

The assumption is that speakers of English in hearing or reading a sentence such as (2b) know intuitively that something is wrong. They have a sense or perception or feeling that the sentence is ill-formed in some way. They are also able to evaluate the degree of ungrammaticality. The grammaticality levels are traditionally written with diacritics such as: ?, ??, ?*, *, ** (in that order).

I draw no difference in this post between a grammaticality judgment and an acceptability judgment. Syntacticians largely use these two terms interchangeably. Some people use grammaticality judgment to mean the status assigned to a sentence by a particular theoretical grammar, and acceptability judgment to mean the judgment given by a particular person of a sentence (thanks to Alan Munn for bringing this up). The idea is that people can reject sentences for all kinds of reasons (some independent of grammar), and our job as syntacticians is to find out how grammar plays a role in those judgments. When I use the term grammaticality judgment here, I mean only the judgments given by people of sentences (not the status with respect to a particular theory). So for me grammaticality judgments are data, and the * mark before (2b) is the judgment of Chris Collins of the sentence written in (2b) on Sunday, July 12, 2019.

From such tightly controlled data, one can draw solid conclusions. For example, from the contrast between (2a) and (2b), the conclusion is that locative prepositions do not appear following the noun phrase. From the contrast between (1b) and (2b), the conclusion is that up has two uses: as a particle in (1) and as a locative preposition in (2).

The sentences in (1) and (2) are presented without context (out of the blue) since they are very simple. The implicit assumption for such data presented out of context is that it is easy to create contexts where (1a,b) and (2a) would be acceptable, and impossible to create a context where (2b) would be acceptable. However, as a general rule it is better to pair up sentences with contexts when asking for grammaticality judgments. I return to this point later on in the essay.

How do you elicit grammaticality judgments in the field? Here is an example of the kind of instructions that can be given.

(3) Your task is to judge how natural the sentence sounds to you. The sentence may be completely natural, or not at all natural, or somewhere in between.

You should be ready to discuss these instructions at length. For example, you can help to explain what you are getting at by saying that a natural sentence would be used by a native speaker. Also, you should be ready to use substitute terms if (3) does not translate well into your language of communication. For example, instead of using “natural”, you might use “good” or something like that. I use the phrase a go siame in Setswana, which means “Is it good?”.

Crucially, the subject should then be given a warm-up trial with clear examples of grammatical and ungrammatical sentences that they are asked to judge. And they should be given feedback on the results of this trial. It might take two or three trials for them to understand what you are getting at.

Creating and evaluating such sentences constitutes an experiment. The syntactician constructs a sentence, or a minimal pair, and then evaluates the sentences for grammaticality. Hundreds of sentences like (1) and (2) can be generated and tested in a short time, leading to the possibility of developing sophisticated analyses of the data in a relatively short period of time (e.g., hours or days, as opposed to months or years). Generating sentences, testing them, and building analyses based on the results is the bread-and-butter of modern syntax. But it takes training to learn how to do this. There is lots more to say about grammaticality judgments. But I will leave all these issues aside in order to focus on the main topic of the essay.

The grammaticality judgments described above contrast with another, messier sort of data. People use language all the time: greetings, swearing, proverbs, jokes, riddles, casual conversations, heated arguments, scientific explanations, talking to oneself, buying and selling, school lectures, singing, television shows, movies, church services, books, the Bible, newspaper articles, advertisements, recipes, warning signs, instructions on packages, Google searches, e-mail messages, Facebook posts, texting, etc. Our lives are literally saturated with language from the moment we wake up in the morning, until the moment we fall asleep at night.

All of this data is non-experimental, since it is not carefully prepared and tested by the syntactician. Rather, it simply occurs. Sometimes it is recorded and accessible for search (e.g., all written material on the internet, or recorded political debates, etc.), but mostly not. Most of it (99.99…%) is never evaluated explicitly for grammaticality. I will simply call this data non-experimental, as a cover term for any data that occurs outside of the confines of a controlled experiment. Another term might be natural, but then that would include all kinds of data (such as texting and e-mails) that one might not want to call natural (because of the artificial medium on which it is written).

In the field, getting non-experimental data can be a challenge. Unlike the situation for English, there are no sources that one can Google for spontaneous speech and writing when dealing with an endangered language spoken in rural Botswana by a handful of elderly speakers. Rather, the linguist needs to record all the speech on their own. Usually, the linguist will arrange to record an oral text (audio and video), such as a folktale. But the circumstances of the recording are tightly controlled, including the presence of a mic (which may be awkwardly attached to the speaker’s clothing) and a camera (which takes several minutes to set up). The linguist also positions the speaker with respect to the camera and tells them when to start speaking. And they are usually paid for their services. In spite of all this, I still count such a recorded oral text as non-experimental, since the speaker is producing the sentences with no direct linguistic prompts (like a sentence to judge or a sentence to translate).

Given this background, we have the following conclusion:

(4) There are two types of syntactic data: experimental and non-experimental.

As a side note, my usage of the term experimental here is somewhat non-standard. Nowadays the term seems to have been appropriated for syntactic data obtained in the lab, or obtained using a questionnaire administered to a small group of subjects, or obtained with Mechanical Turk. Such methods involve the use of statistics in evaluating the data. I prefer the term psycholinguistics for such methods. This kind of data is an extremely important development in modern syntax, and worth a lot of attention and discussion of what the results actually mean. But to call these methods of collecting data experimental, to the exclusion of the traditional grammaticality judgment task, is a huge error. The grammaticality judgment task is the ultimate experiment: clearly defined, replicable, tightly controlled with a specific range of outcomes. Similar considerations hold for the phrase experimental semantics (Pauline Jacobson, personal communication).

Of course, one can also discuss the use of psycholinguistic methodology in doing fieldwork, but that is a topic that I am not qualified to write about.

From the point of view of the grammaticality judgment task outlined above, non-experimental data found in the wild suffers from two main problems. First, such data is all positive, in the sense that people do not normally produce ungrammatical sentences in their speech. If they do produce such sentences, it is because they are non-native speakers, or they have made some kind of error in speech production (e.g., starting to say one sentence, but finishing with another). Since the data is all positive, one cannot make any definitive claims about what sentences cannot be produced by a particular person speaking a particular language. But knowing which sentences are ungrammatical is a powerful research tool, both for describing languages in a grammar and for creating theoretical models. This is a serious deficiency in the use of non-experimental data.

The second main problem is that such non-experimental data is completely uncontrolled. It is as if you looked out the window at the leaves blowing around, and tried to come up with some kind of theory of leaf motion. But leaf motion is complex. You need to factor in gravity, wind, the size and shape of the leaf, the dryness of the leaf, and maybe many other factors (humidity? elevation? temperature? barometric pressure?). There are way too many factors to get a clear grasp of what is going on. And even describing what is going on informally is difficult. What leaf blowing patterns are out there, and which ones should you attempt to explain? This is why it was so important that the data in (1) and (2) was organized into minimal pairs and paradigms. Irrelevant factors are cleared away, and one focuses on some very particular issue (e.g., the position of up in (2a,b)). Constructing sentences and evaluating them allows one to focus narrowly on particular parts of the sentence, and gives one confidence in ascribing the reason for ungrammaticality to some particular property. This narrow focus in turn allows one to quickly test various hypotheses, and to either reject them or to accept them.

Or to put it another way, if one relied solely on non-experimental data, it is unclear that the relevant minimal pair in (2) would ever occur. And (2) is a relatively simple paradigm. When looking at more complex things (such as the uses of logophoric pronouns, indefinite noun phrases, locality constraints or pluractional markers), one needs quite precise data to be able to figure out what is going on. And it is unclear whether such precise data is forthcoming from a natural language corpus, especially in the fieldwork scenario, where the corpus is often constructed from transcribing a few hours of recorded oral texts.

To clarify the issue, suppose you wanted to work out the pronominal system of a language having a complex series of contrasts such as person, number, gender, inclusive/exclusive, grammatical function (subject, object, possessor), etc. One needs to get a paradigm of such pronouns in maximally similar contexts (to see how the surrounding environment influences the form and use of the pronouns). First, getting all the relevant pronouns from oral texts might not even be possible for large pronominal inventories. Second, getting them all in a similar context (modulo grammatical function) will definitely be impossible even for a vast corpus (such as Google searches in English). But such pronominal paradigms are a core part of the description of any language.
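
The first point can be made concrete with a small sketch. The Python below is only an illustration under invented assumptions: the feature values, the function names and the list of "attested" pronoun tokens are all hypothetical, not data from any real language. It enumerates the cells of a toy paradigm (person by number by grammatical function) and reports which cells never show up in a handful of annotated tokens of the kind one might extract from a few oral texts.

```python
from itertools import product

# Hypothetical feature values; a real system may add gender, clusivity, etc.
PERSONS = ("1", "2", "3")
NUMBERS = ("sg", "pl")
FUNCTIONS = ("subject", "object", "possessor")


def paradigm_cells():
    """Return every (person, number, function) cell of the toy paradigm."""
    return set(product(PERSONS, NUMBERS, FUNCTIONS))


def missing_cells(attested):
    """Return, in sorted order, the cells that never occur in the texts."""
    return sorted(paradigm_cells() - set(attested))


# Invented annotations standing in for pronoun tokens found in oral texts.
attested_in_texts = [
    ("1", "sg", "subject"),
    ("3", "sg", "subject"),
    ("3", "pl", "subject"),
    ("3", "sg", "object"),
    ("1", "sg", "possessor"),
    ("3", "sg", "subject"),  # a repeated token fills no new cell
]

gaps = missing_cells(attested_in_texts)
print(f"{len(gaps)} of {len(paradigm_cells())} cells unattested")
```

Even in this tiny paradigm most cells are left empty by the toy texts, and every added contrast (gender, inclusive/exclusive) multiplies the number of cells to fill. The unattested cells are exactly the ones one would then target with elicitation.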

But what about grammaticality judgments? Are there any drawbacks to gathering data solely based on them? There are at least three main drawbacks, which I will outline here. They are existence and scope, task difficulty and the observer’s paradox. Before going into the problems, let me outline what I consider to be a few non-problems.

A possible objection to the use of grammaticality judgments (in language description and syntactic theory) is that they are artificial, used by syntacticians, but with no real connection to language use in the real world. One reason to doubt such a claim is that people do in fact have systematic and rich grammaticality judgments. It is hard to see how this could be so, if grammaticality judgments were not somehow connected to the language abilities of the speaker, and hence to their use of language in the real world. In fact, in real life there are areas where people do make use of grammaticality judgments. If you hear a non-native speaker speaking English, and they make mistakes (e.g., in the use of the indefinite article), you recognize it immediately. You recognize the sentences that they use as ill-formed, sensing there is something wrong. This recognition is a form of grammaticality judgment. So this means that grammaticality judgments are not just an artifact of the syntactician’s experiment, but something that everybody in every language does all the time. In fact, grammaticality judgments serve a social role. They allow you to pick out people from different dialects (e.g., the Pittsburgh passive, positive anymore, copula drop, standing on line), and also to pick out people who have learned your language as a second language. I am not trying to explain the existence of grammaticality judgments in terms of their social role. That is, I am definitely not giving a functionalist explanation of grammaticality judgments. I am merely pointing out that grammaticality judgments are far from an esoteric task created by generative syntacticians to amuse themselves.

Another possible objection that could be raised against grammaticality judgments is that they are dissociated from any context, so they are disembodied and abstract in a way that day-to-day language is not. While it is true that linguists often present sentences to their colleagues and students out of the blue, with no context, there is nothing inherent in the grammaticality judgment task that demands this form of testing. In fact, in Field Methods classes, I stress the importance of building contexts, even for evaluating the grammaticality of simple sentences. I also encourage students to write down the context, so that the context becomes part of the data, making it easier to replicate the data. The use of carefully constructed contexts is well-known in the formal semantics fieldwork literature. I see no reason why such contexts should not also be used for grammaticality judgments.
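
One way to make the context part of the data is to store it in the same record as the sentence and the judgment. The following Python sketch is only one possible layout, not a description of any existing fieldwork tool: the class name, the field names, the speaker label and the context sentence are all invented for illustration. The only things taken from this post are the sentences in (2) and the list of diacritics.

```python
from collections import defaultdict
from dataclasses import dataclass

# Diacritics as listed earlier in the post; "" stands for a fully natural sentence.
MARKS = ("", "?", "??", "?*", "*", "**")


@dataclass(frozen=True)
class Judgment:
    sentence: str
    context: str  # the situation described to the consultant
    mark: str     # one of MARKS
    speaker: str  # hypothetical speaker label

    def __post_init__(self):
        if self.mark not in MARKS:
            raise ValueError(f"unknown judgment mark: {self.mark!r}")


def group_by_context(judgments):
    """Collect judgments that were elicited against the same context."""
    grouped = defaultdict(list)
    for judgment in judgments:
        grouped[judgment.context].append(judgment)
    return dict(grouped)


# Invented context for the minimal pair in (2).
hill = "Mary goes running every morning near her house."
records = [
    Judgment("Mary jogged up the hill.", hill, "", "Speaker A"),
    Judgment("Mary jogged the hill up.", hill, "*", "Speaker A"),
]
grouped = group_by_context(records)
```

Keeping both members of a minimal pair under the same stored context means that someone replicating the judgment later can present the sentences under the same conditions.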

As an exercise for the reader, try to find a sentence in English that seems ungrammatical or marginal out of the blue, but gets better once a specific linguistic context is provided.

A related objection is that grammaticality judgments are in fact complex. There is no way for a speaker in giving a judgment to disentangle the many different factors that could lead to a judgment of ungrammaticality, including the inappropriateness of a sentence in a particular context or difficulty in processing certain kinds of sentences. But this is just science as usual. There is some data (the grammaticality judgment), and the syntactician must make sense of that data. They (not the speaker) must figure out why the data patterns as it does. Given a particular set of grammaticality judgments, the linguist can decide whether to rerun the experiments, changing context, finding shorter sentences, finding more natural sentences, etc. I do not take this as a valid criticism of the use of grammaticality judgments in uncovering syntactic generalizations.

To deny the validity of grammaticality judgments is essentially a form of disrespect toward the speakers. They have knowledge of the language. Many speakers (but not all, see below) are easily able to judge sentences as grammatical or not. Why not listen to what they have to say about their own language, at least as one source of data?

Now onto some real problems in the use of grammaticality judgments as the sole source of data.

First, in studying a language that one is unfamiliar with, it is impossible to predict what kinds of data will exist in that language. Suppose that your task is to investigate the syntax and semantics of =Hoan, a Khoisan language spoken in Botswana. English does not have pluractionality marking on verbs, so from the point of view of English there is no way to know that pluractionality marking exists in =Hoan. And even if the syntactician somehow stumbles onto the existence of pluractionality marking, by accidentally eliciting some sentence containing a pluractionality marker, they may not be aware of the scope of the phenomenon and so they might not be able to know what kinds of sentences to construct and test.

So an approach to gathering linguistic data based solely on grammaticality judgments runs into the twin problems of existence and scope. Does a phenomenon exist in a language, and if so, what is its empirical scope? These are severe problems. In fact, they are problems often encountered in ex-situ Field Methods courses where the emphasis is on grammaticality judgment tasks and translation tasks.

The problems of existence and scope can be partially addressed by collecting oral texts. I have often been surprised at the kinds of interesting constructions that pop up in oral texts. There are grammatical constructions one finds in texts that would have been difficult to anticipate ahead of time. Once one finds an interesting construction in an oral text, one can begin to explore it using grammaticality judgments or translation tasks (or other tasks, like the truth value judgment task). But without knowing the construction exists, it is impossible to explore it. Of course, collecting oral texts is not the final solution to the issues of existence and scope. There may be some constructions that are simply not used very often, and hence may not show up in oral texts, especially if the collection of oral texts is limited.

The availability of oral texts also helps to study syntactic properties that are discourse based, for example, properties having to do with specificity, ellipsis, deixis, focus and information structure. Of course, all of these properties can also be studied with controlled experiments (precisely varying context), but having converging evidence from texts can be helpful, sometimes even providing hints to the syntactician for designing controlled experiments. Similarly, the nature of controlled grammaticality judgment experiments probably favors careful speech, since one is carefully constructing a sentence and evaluating it. But there may be many interesting fast speech or casual speech syntactic phenomena, including contraction and ellipsis.

English is far better studied than =Hoan, or any African language for that matter, and that is why for many topics one can get away with only looking at grammaticality judgments (and not looking at non-experimental data). A lot of basic observations about English grammar have already been made by traditional grammarians, such as Curme and Jespersen, and many important details are documented in descriptive grammars. The best ESL grammars are rich sources of information on English grammar. Also, there is a long tradition of generative studies of English grammar that have uncovered piecemeal many of its interesting properties, and these have been compiled into sources such as Huddleston and Pullum’s CGEL (Cambridge Grammar of the English Language). For less studied languages like =Hoan, one cannot rely on this vast background knowledge. So one needs to use any source of data available, including various experimental tasks, but also oral texts.

By the way, lest the reader think that the problems of existence and scope only plague less studied languages such as =Hoan, it is also true that in English there are still large swaths of unexplored territory. This is a fact not often recognized in the generative literature, and in fact, sometimes one hears statements to the contrary (e.g., in pronouncements about “The End of Syntax”). To take an example from my own work, the study of imposters (noun phrases such as yours truly, the undersigned, etc.) had received no systematic description or analysis until very recently. And overcoming the problems of existence and scope for English crucially required using vast corpora (e.g., Google searches), as illustrated in Collins and Postal 2012 (Imposters. MIT Press).

See the following blog post on searching for linguistic data using Google:

Another way to overcome the problems of existence and scope, complementing the use of oral texts, is the growing tradition of using linguistic questionnaires, often associated with computer databases. There are lots of interesting and powerful questionnaires being developed by syntacticians and typologists on various topics (e.g., word order, anaphora, focus, tense/aspect). Going through these questionnaires carefully can help to uncover the existence and scope of particular constructions in a language. Responding to such questionnaires constitutes another form of experimental data.

A second problem of relying solely on grammaticality judgments, especially in a fieldwork scenario, is that many consultants simply cannot give them. Giving grammaticality judgments is a difficult task. In other words, there are many consultants who, when presented with a sentence and asked to give a grammaticality judgment (suitably explained to them in terms of naturalness, etc.), either cannot do so, or give all positive responses. These kinds of consultants are usually elderly and illiterate. The fact that they are illiterate, in particular, means that they have never been obliged to develop the kind of meta-linguistic awareness about linguistic form that even very young literate speakers possess. And the unfortunate fact is that in work on endangered languages, the remaining speakers are almost always elderly and illiterate.

After posting the original version of this blog post, I received feedback from other people who have also had problems eliciting grammaticality judgments. Gary Thoms noted that such people were pretty much always older speakers. Paco Ordonez noted that education plays a role in the sense that some speakers who could not give grammaticality judgments had hardly any formal education. I assume the problem that such speakers face is the lack of meta-linguistic awareness. Formal education forces us to become aware of properties of sounds and words and sentences. We are told to copy sentences, and perform various tasks on them. So it is a small step to also assign a grammaticality judgment to sentences. But people without formal education have never consciously carried out such tasks before and they do not have explicit awareness of how to do them. I assume that such people do have the underlying grammar, but the problem is in articulating judgments. It would be an interesting research project to look at the factors determining whether or not people are able to give grammaticality judgments and to try to get some kind of broader idea (in terms of the general population) of what percentage of people are unable to give them, or have some kinds of difficulties in giving them. As far as I know these questions have not been investigated before, since the idea that everybody can give grammaticality judgments is simply taken for granted.

On a general note, a lot of tasks that we take for granted in doing our linguistic work are actually tasks that require a high degree of meta-linguistic awareness. They often need to be explained in great detail to the consultants who need to be trained on how to carry them out.

How can one study a language when there are no ungrammaticality judgments available? In part, one has to rely on oral texts. In part, there are various other tasks that are easier for consultants to accomplish that are also quite rich. For example, there is the translation task. You can ask the consultant to translate sentence X from the language of communication L1 (which may be English) into the target language L2. This kind of task is quite easy for almost everybody (assuming that they are bilingual speakers), and yields rich results. In fact, I would say that the translation task is the most frequent form of elicitation task that I use (more than the grammaticality judgment task, or any other task). There is also the back-translation task which is to translate a sentence from L2 back into L1 (this test is often a good control on the translation task). But the translation task, like the grammaticality judgment task, is also an experiment. It is carefully controlled. The exact construction of X is important, and usually focuses on some very narrow issue. And therefore, some of the same issues that come up with grammaticality judgment tasks also face translation tasks. For example, the same issues of existence and scope arise for the translation task.

On some basic fieldwork skills for consultants, including the translation task, see the following posts:

The third problem of relying solely on grammaticality judgments is the so-called observer’s paradox. The syntactician constructs a sentence and then evaluates it. In doing so, they set up all kinds of implicit biases. Do they want the sentence to come out grammatical or not? Why are they looking at these sentences in the first place? Why did they choose those particular words, instead of some other words? Especially for non-linguist consultants, in fieldwork scenarios, it is difficult to control for these effects. And so, there is always the question of whether the consultant is giving the syntactician what they think the syntactician wants. Of course, these effects can be mitigated in various ways. One can discuss the nature of the task with the consultants, and explain to them exactly what is wanted (and repeat these instructions on several occasions). One can give them clear examples of grammatical and ungrammatical sentences to prep them. One can make sure that the context in which the sentence is evaluated is clearly stated and understood. One can ask different consultants or teams of consultants to see if there is uniformity across speakers.

A related issue is the construction of natural sentences. Sometimes the syntactician in the field will construct a perfectly straightforward and simple sentence, which for some reason (unrelated to grammaticality or to the issues being investigated) sounds unnatural to the speakers. Maybe it is word choice. Maybe it is register. Maybe it is the subject matter discussed in the sentence. Maybe there is a more common or colloquial way of saying the same thing. The factors influencing how a sentence is perceived can be quite subtle and difficult to isolate. In working with our own native language, English in my case, we avoid this issue since we can easily construct natural sounding sentences in our own language. But in fieldwork on less studied languages, it is a real issue.

Oral texts can provide additional data to help avoid the various facets of the observer’s paradox. If one can find instances of constructions that one has been studying (e.g., involving serial verb constructions) in natural speech, and those instances have the properties that one has established by looking at grammaticality judgments (or the results of a translation task), then that is converging evidence. Also, using sentences from recorded oral texts as the starting point of an investigation, based on more controlled experiments, can help solve the issue of unnaturally constructed sentences.

As a side note, not all non-experimental data needs to be recorded. A particularly rich set of data in the field is just talking with the consultants either during breaks or during off hours. Just spending time with your consultants, trying to engage them in conversation and listening to what they say to each other can reveal interesting syntactic data. In fact, trying to learn a language, and practicing with your consultants in a casual setting can be an important source of natural data. The syntactician is still present, but their role as an observer is replaced by their role as conversational partner and as language learner. For anybody trying to do fieldwork, or language documentation, actually making an effort to learn a language can lead to all kinds of interesting insights. In fact, trying to learn to speak a language by speaking with community members constitutes a source of linguistic data which is as important as the other kinds of data discussed in this post (e.g., grammaticality judgments, translations, oral texts).

I summarize these points as follows:

(5) Grammaticality Judgment Task:
Pros: controlled, both positive and negative data
Cons: issues of existence and scope, task difficulty, observer’s paradox

Translation Task:
Pros: controlled, task is not difficult
Cons: only positive data, issues of existence and scope, observer’s paradox

Oral Texts:
Pros: helps to address issues of existence and scope, task difficulty and observer’s paradox
Cons: uncontrolled, only positive data

My conclusion based on this discussion is the following:

(6) Both kinds of data, experimental and non-experimental, are crucial in syntactic fieldwork.

And in fact, I take the stronger position that both kinds of data are crucial to all syntactic research (even on well-studied languages like English). They complement each other, and partially help to resolve each other’s shortcomings. Take the specific task of writing a grammar or grammatical sketch of a less-studied language. Trying to write a grammar based purely on grammaticality judgments risks producing a grammar that is heavily skewed to what the researcher already knows (about other languages, such as their native language). And so, it risks missing out on interesting aspects of the language being studied (that could be quite important for syntactic theory), and not reflecting the real richness of the language. But trying to write a grammar based purely on a corpus or on recorded natural speech would be just as catastrophic, missing out on interesting generalizations about the structure of the language that could easily be uncovered by more controlled experiments, such as grammaticality judgments or the translation task.

Of course, the conclusion in (6) now raises the issue of how much. What is the right balance of controlled experimentation and non-experimental data to use in writing a grammar? Should 50% of the data (in terms of number of sentences) be from controlled experiments and 50% be from audio/video recordings of natural speech? There is no way to answer this question a priori. There are no guidelines to follow in terms of percentages. But one thing is clear: the presence of corpora that can be easily searched is of great importance for syntactic research. Such corpora help to overcome the problems outlined above, especially the problems of existence and scope, problems which plague the description and analysis of even better studied languages like English.

But from the other angle, if one does find interesting data in an oral text, it is impossible to understand it without a healthy dose of controlled experimentation. We need to be able to play with language and manipulate it, and to run through the permutations and possibilities, in order to be able to discover what its properties are.

Furthermore, the use of powerful technologies such as audio and video recording equipment, and powerful software packages, such as FLEx and ELAN, should not distract us from the conclusion that it is impossible to do syntactic description and analysis solely on the basis of corpus data. In making this comment, I am in no way criticizing the use of these technologies in syntactic fieldwork. On the contrary, I feel that they are extremely important, game-changing technologies. Rather, the point I am trying to make is that one cannot see syntactic fieldwork merely as the application of these technologies to natural speech. Rather, they need to be supplemented by classical syntactic argumentation supported by classical experimental methodologies, such as grammaticality judgments and translation tasks.
