This blog post is a prelude to a future blog post, Why Video? In that post I will
tackle the question of why a generative syntactician, like me, should care
about video. But before I get to that point, I need to tackle some background
issues concerning the source of data in syntactic fieldwork.
Disclaimer: I am in Botswana this year (2019-2020), and I have none of my precious books by my side. So I will state positions in this post with no references. This kind of writing frees me from the chains of what people have actually said, but it also means that you should be careful not to attribute what I say to any specific person or specific framework until you check the references.
Before
discussing the main issue, let me get one non-issue out of the way. In
generative grammar, the goal is to understand the human language faculty (UG)
which is argued to be innate. The syntactician looks at particular I-languages,
and tries to draw conclusions about UG (e.g., by poverty of stimulus arguments,
or by comparisons to other languages). Other researchers may not share this
framework. They may take a more functionalist perspective, and not assume the
existence of UG. What I have to say in the following blog post is largely independent
of these framework issues. Whatever perspective one takes on syntactic theory
(Minimalism, Principles and Parameters, Arc-Pair Grammar, HPSG, LFG, DM,
functionalism, etc.), it is important to establish basic empirical generalizations
about the language one is working with, the kinds of empirical generalizations
that one finds in a good descriptive grammar. The following blog posts
address the issue of how to get the kind of data that allows one to establish
those empirical generalizations.
In
classical generative grammar, in the so-called Chomskyan tradition, the primary
data is the grammaticality judgment. The following paradigm illustrates the
relevant concepts:
(1)
a. Mary wrote up the paper.
b. Mary wrote the paper up.
(2)
a. Mary jogged up the hill.
b. *Mary jogged the hill up.
The
data is organized into minimal pairs (e.g., (2a) vs. (2b)), and paradigms
(e.g., (1) vs. (2)). Minimal pairs differ by at most one simple property. For
example, the only difference between (2a) and (2b) is the position of the word up. The * in (2b) means that (2b) is
ungrammatical.
I
draw no distinction in this post between a grammaticality judgment and an
acceptability judgment. Syntacticians largely use these two terms
interchangeably. The assumption is that speakers of English, in hearing or
reading a sentence such as (2b), know intuitively that something is wrong. They have
a sense/perception/feeling that the sentence is ill-formed in some way. They
are also able to evaluate the degree of ungrammaticality. The grammaticality
levels are traditionally written with diacritics such as: ?, ??, ?*, *, ** (in
that order).
From
such tightly controlled data, one can draw solid conclusions. For example, from
the contrast between (2a) and (2b), the conclusion is that locative
prepositions cannot appear following the noun phrase. From the contrast between
(1b) and (2b), the conclusion is that up
has two uses: as a particle in (1) and as a locative preposition in (2).
Creating
and evaluating such sentences constitutes an experiment. The syntactician constructs a sentence, or a minimal
pair, and then evaluates the sentences for grammaticality. Hundreds of
sentences like (1) and (2) can be generated and tested in a short time, leading
to the possibility of developing sophisticated analyses of the data in a
relatively short time (e.g., hours or days, as opposed to months or years).
Generating sentences, testing them, and building analyses based on the results
is the bread-and-butter of modern syntax. But it takes training to learn how to
do this. There is a lot more to say about grammaticality judgments. But I will
leave all these issues aside in order to focus on the main topic of the essay.
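To make the generate-and-test workflow concrete, here is a minimal sketch (in Python) of how candidate pairs of the kind in (1) and (2) can be produced systematically. The particular verbs and objects are my own illustrative choices, and the judgments themselves of course come from the speaker, not from the script.

```python
# A minimal sketch of systematically generating candidate minimal pairs of
# the kind in (1) and (2). The verbs, "up"-uses, and objects below are my
# own illustrative choices, not a real test battery.

items = [
    ("wrote", "up", "the paper"),   # particle use of "up", cf. (1)
    ("jogged", "up", "the hill"),   # prepositional use of "up", cf. (2)
]

def minimal_pair(verb, up, obj):
    """Return the two orders to be judged: V up NP vs. V NP up."""
    return (f"Mary {verb} {up} {obj}.", f"Mary {verb} {obj} {up}.")

for verb, up, obj in items:
    a, b = minimal_pair(verb, up, obj)
    # Each pair is then presented for a grammaticality judgment; the
    # judgment itself (ok, ?, *, etc.) is the datum, not the sentence.
    print(f"a. {a}\nb. {b}\n")
```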
The
grammaticality judgments described above contrast with another, messier sort of
data. People use language all the time: greetings, conversations, talking to
themselves, buying and selling, school lectures, singing, television, movies, church
services, the Bible, newspaper articles, advertisements, recipes, warning signs,
instructions on packages, Google searches, e-mail messages,
Facebook posts, texting, etc. Our lives are literally saturated with language.
All
of this data is non-experimental, since it is not carefully designed and tested
for grammaticality by the syntactician. Rather, it simply occurs. Sometimes it
is recorded (e.g., written material accessible on the internet, or in recorded political
debates, etc.), but mostly not. Most of it (99.99…%) is never evaluated
explicitly for grammaticality. For lack of a better term, I will call such data
natural, in order to contrast it with the carefully controlled grammaticality
judgments described above.
So there are two types of
syntactic data: experimental and natural.
As a
side note, my usage of the term experimental
here is somewhat non-standard. Nowadays the term seems to have been
appropriated for syntactic data obtained in the lab, or obtained using a
questionnaire administered to a small group of subjects, or obtained with
Mechanical Turk. Such methods usually involve the use of statistics in
evaluating the data. This kind of data is an extremely important development in
modern syntax, and worth a lot of attention. But to call these methods of
collecting data experimental, to the exclusion of the traditional grammaticality
judgment task, is a huge error. The grammaticality judgment task is the ultimate
experiment: clearly defined, replicable, and tightly controlled, with a specific
range of outcomes.
From
the point of view of the grammaticality judgment task outlined above, natural data
found in the wild suffers from two main problems. First, such data is all
positive, in the sense that people do not normally produce ungrammatical
sentences in their speech. If they do produce such sentences, it is because
they are non-native speakers, or they have made some kind of error in speech
production (e.g., starting to say one sentence, but finishing with another). Since
the data is all positive, one cannot make any claim about what sentences cannot
be produced by a particular person speaking a particular language. But knowing
which sentences are ungrammatical is a powerful research tool. This is a
serious deficiency in the use of natural data.
The
second main problem is that such natural data is completely uncontrolled. It is
as if you looked out the window at the leaves blowing around, and tried to come
up with some kind of theory of leaf motion. But leaf motion is complex. You
need to factor in gravity, wind, the size and shape of the leaf, the dryness of
the leaf, and maybe many other factors. There are way too many factors to get a
clear grasp of what is going on. And even describing what is going on is
difficult. What leaf blowing patterns are out there, and which ones should you
attempt to explain? This is why it was so important that the data in (1) and
(2) was organized into minimal pairs and paradigms. Irrelevant factors are
cleared away, and one focuses on some very particular issue (e.g., the position
of up in (1b) and (2b)). Constructing
sentences and evaluating them allows one to focus narrowly on particular parts
of the sentence, and gives one confidence in ascribing the reason for ungrammaticality
to some particular property. This narrow focus in turn allows one to quickly
test various hypotheses, and to either reject or accept them.
Or to
put it another way, if one relied solely on natural data, it is unclear that
the relevant minimal pair in (2) would ever occur. And (2) is a relatively
simple paradigm. When looking at more complex things (such as the uses of
logophoric pronouns or the distribution of indefinite noun phrases), one needs
quite precise data to be able to figure out what is going on. And it is unclear
whether such precise data is forthcoming from a natural language corpus,
especially in the fieldwork scenario, where the corpus is often constructed
from transcribing a few hours of recorded oral texts.
To
clarify the issue, suppose you wanted to work out the pronominal system of a
language having a complex series of contrasts such as person, number, gender,
inclusive/exclusive, etc. One needs to get a paradigm of such pronouns in
maximally similar contexts (to see how the surrounding environment influences
the form and use of the pronouns). First, getting all the relevant pronouns
from oral texts might not even be possible for large pronominal inventories.
Second, getting them all in a similar context will definitely be impossible
even for a vast corpus (such as Google searches in English). But such
pronominal paradigms are a core part of the description of any language.
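To make the coverage problem concrete, here is a minimal sketch (assuming a hypothetical system with three persons, three numbers, and an inclusive/exclusive split in the first person non-singular) that enumerates the paradigm cells and checks how many of them happen to be attested in a small, invented stand-in for a collection of oral texts.

```python
from itertools import product

# A hypothetical pronominal system: 3 persons x 3 numbers, with an
# inclusive/exclusive split in the 1st person dual and plural.
persons = ["1", "2", "3"]
numbers = ["sg", "du", "pl"]

cells = []
for p, n in product(persons, numbers):
    if p == "1" and n != "sg":
        cells += [(p, n, "incl"), (p, n, "excl")]
    else:
        cells.append((p, n, None))

print(f"{len(cells)} paradigm cells to document")  # 11 cells in this toy system

# Invented stand-in for the cells actually attested in a few transcribed
# oral texts; in real fieldwork this set would come from the corpus.
attested = {("1", "sg", None), ("3", "sg", None), ("3", "pl", None),
            ("1", "pl", "incl")}

missing = [c for c in cells if c not in attested]
print(f"{len(missing)} cells never occur in the texts and must be elicited")
```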
The
conclusions can be summarized in tabular form:
(3)
                     Controlled/Uncontrolled    Positive/Negative
Gram. Judgments      Controlled                 Both
Natural Data         Uncontrolled               Positive
But
what about grammaticality judgments? Are there any drawbacks to gathering data purely
based on them? There are at least three main drawbacks, which I will outline
here. Before going into the problems, let me outline what I consider to be a
few non-problems.
A
possible objection to grammaticality judgments is that they are artificial, used
by syntacticians, but with no real connection to language use in the real world.
One reason to doubt such a claim is that people do in fact have systematic and
rich grammaticality judgments. It is hard to see how this could be so, if
grammaticality judgments were not somehow connected to the language abilities
of the speaker, and hence to their use of language in the real world. Second,
in real life there are areas where people do in fact employ grammaticality
judgments. If you hear a non-native speaker speaking English, and they make
mistakes (e.g., in the use of the indefinite article), you recognize it
immediately. You recognize the sentences that they use as ill-formed, and sense
there is something wrong. This recognition is a form of grammaticality
judgment. So this means that grammaticality judgments are not just an artifact
of the syntactician’s experiment, but something that everybody in every
language does all the time. In fact, grammaticality judgments serve a social role.
They allow you to pick out speakers of other dialects (e.g., the Pittsburgh
passive, or positive anymore, or
copula drop), and also to pick out people who have learned your language as a
second language. I am not trying to explain the existence of grammaticality
judgments in terms of their social role. That is, I am definitely not giving a functionalist
explanation of grammaticality judgments. I am merely pointing out that
grammaticality judgments are far from an esoteric task created by generative
syntacticians to amuse themselves.
To
deny the validity of grammaticality judgments is essentially a form of
disrespect toward the speakers. They have knowledge of the language. Many speakers
(but not all, see below) are easily able to judge sentences as grammatical or
not. Why not listen to what they have to say about their own language, at least
as one source of data?
Another
possible objection that could be raised against grammaticality judgments is
that they are dissociated from any context, so they are disembodied and
abstract in a way that day-to-day language is not. While it is true that
linguists often present sentences to their colleagues and students out of the
blue, with no context, there is nothing inherent in the grammaticality judgment
task that demands this form of testing. In fact, in Field Methods classes, I
stress the importance of building contexts, even for evaluating the
grammaticality of simple sentences. I also encourage students to write down the
context, so that the context as well as the sentence can be replicated. The use
of carefully constructed contexts is well-known in the formal semantics
fieldwork literature. I see no reason why such contexts should not also be used
for grammaticality judgments.
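For what it is worth, here is a minimal sketch of the kind of record I have in mind, with the context stored alongside the judged sentence so that both can be replicated later; every field value below is invented for illustration.

```python
# A minimal sketch of recording a judgment together with its context.
# All field values here are invented for illustration.

judgment_record = {
    "context": "Mary has been drafting a paper all week and finally finishes it.",
    "sentence": "Mary wrote the paper up.",
    "judgment": "ok",              # e.g., ok, ?, ??, ?*, *, **
    "consultant": "speaker_01",    # hypothetical identifier
    "date": "2019-10-15",          # hypothetical session date
    "notes": "presented orally, after two warm-up items",
}

print(judgment_record["context"])
print(judgment_record["sentence"], "->", judgment_record["judgment"])
```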
Now
onto some real problems in the use of grammaticality judgments as the sole
source of data.
First,
in studying a language that one is unfamiliar with, it is impossible to predict
what kinds of data will exist in that language. Suppose that your task is to
investigate the syntax and semantics of =Hoan, a Khoisan language spoken in
Botswana. English does not have pluractionality marking on verbs, so from the
point of view of English there is no way to know that pluractionality marking exists
in =Hoan. And even if the syntactician
somehow stumbles on the existence of pluractionality marking, by accidentally eliciting
some sentence containing a pluractionality marker, they may not be aware of the
scope of the phenomenon, and so might not know what kinds of
sentences to construct and test.
So
the approach to linguistic data based solely on grammaticality judgments runs
into the problems of existence and scope. Does a phenomenon exist in a
language, and if so, what is its empirical scope? These are severe problems. In
fact, they are problems often encountered in ex-situ Field Methods courses
where the emphasis is on grammaticality judgment tasks and translation tasks.
The
problems of existence and scope can be partially solved by collecting oral
texts. I have often been surprised at the kinds of interesting constructions
that pop up in oral texts. There are grammatical constructions one finds in
texts that would have been difficult to anticipate ahead of time. Once one
finds an interesting construction in an oral text, one can begin to explore it
using grammaticality judgments or translation tasks (or other tasks, like truth
condition tasks). But without knowing the construction exists, it is impossible
to explore it. Of course, collecting oral texts is not the final solution to
the issues of existence and scope. There may be some constructions that simply
do not appear very often, and hence may not show up in oral texts, especially if
the collection of oral texts is limited.
The
availability of oral texts also helps to study syntactic properties that are
discourse based, for example, properties having to do with specificity, ellipsis,
deixis, focus and information structure. Of course, all of these properties can
also be studied with controlled experiments (precisely varying context), but
having converging evidence from texts can be helpful, sometimes even providing
hints to the syntactician for designing controlled experiments. Similarly, the
nature of controlled grammaticality judgment experiments probably favors
careful speech, since one is carefully constructing a sentence and evaluating it.
But there may be many interesting fast speech or casual speech syntactic
phenomena, including contraction and ellipsis.
English
is far better studied than =Hoan, or any African language for that matter, and
that is why for many topics one can get away with only looking at
grammaticality judgments (and not looking at natural data). A lot of basic
observations about English grammar have already been made by traditional
grammarians, such as Curme and Jespersen, and many important details are
documented in descriptive grammars. The best ESL grammars are rich sources of
information on English grammar. Also, there is a long tradition of generative
studies on English grammar that have uncovered piecemeal many of its
interesting properties, and these have often been compiled into sources such as
Huddleston and Pullum’s CGEL. For less studied languages like =Hoan, one cannot
rely on this vast background knowledge.
By
the way, lest the reader think that the problems of existence and scope only
plague less studied languages such as =Hoan, it is also true that in English
there are still large swaths of unexplored territory. This is a fact not often
recognized in the generative literature, and in fact, sometimes one hears
statements to the contrary (e.g., in pronouncements about “The End of Syntax”).
To take an example from my own work, the study of imposters (noun phrases such as
yours truly, the undersigned, etc.) had received no description or analysis
until very recently. And overcoming the problems of existence and scope for English
crucially required using vast corpora (e.g., Google searches), as illustrated
in Collins and Postal 2012.
See my separate blog post on searching for linguistic data using Google.
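As a stand-in for that kind of search, here is a minimal sketch that runs over a local plain-text corpus file. The file name and the patterns are my own illustrative choices; the actual imposter work relied on web searches rather than a local file.

```python
import re
from pathlib import Path

# A minimal sketch of searching a local plain-text corpus for imposter
# noun phrases such as "yours truly" or "the undersigned". The file name
# and patterns are illustrative, not taken from any actual study.

PATTERNS = [
    r"\byours truly\b",
    r"\bthe undersigned\b",
]

text = Path("corpus.txt").read_text(encoding="utf-8")  # hypothetical corpus file

for pattern in PATTERNS:
    hits = list(re.finditer(pattern, text, flags=re.IGNORECASE))
    print(f"{pattern}: {len(hits)} hits")
    # Each hit would then be inspected in context, e.g. to see whether the
    # imposter controls singular or plural agreement in that example.
```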
Another
way to overcome the problems of existence and scope, complementing the use of
oral texts, is the growing tradition of using linguistic questionnaires, often
associated with computer databases. There are lots of interesting and powerful
questionnaires being developed by syntacticians/typologists on various topics
(e.g., word order, anaphora, focus, tense/aspect). Going through these
questionnaires carefully can help to uncover the existence and scope of
particular constructions in a language.
A
second problem of relying solely on grammaticality judgments, especially in a fieldwork
scenario, is that many consultants simply cannot give them. In other words,
there are many consultants who, when presented with a sentence and asked to give a
grammaticality judgment (suitably explained to them in terms of naturalness,
etc.), either cannot do so or give uniformly positive responses. These kinds of
consultants are usually elderly and illiterate. The fact that they are
illiterate, in particular, means that they have never been obliged to develop
the kind of meta-linguistic awareness about linguistic form that even very
young literate speakers possess. And the unfortunate fact is that in work on
endangered languages, the remaining speakers are almost always elderly and
illiterate.
How
can one study a language when there are no ungrammaticality judgments
available? In part, one has to rely on oral texts. In part, there are various
translation tasks that are easier for consultants to accomplish and that are quite
rich. For example, you can ask the consultant to translate sentence X from the
language of communication L1 (which may be English) into the target language
L2. This kind of task is quite easy for almost everybody, and yields rich
results. And there is also the back-translation task, which is to translate a sentence
from L2 back into L1 (this test is often a good control on the translation
task). But the translation task, like the grammaticality judgment task, is also
an experiment. It is carefully controlled. The exact construction of X is
important, and usually focuses on some very narrow issue. So some of the same
issues that come up with grammaticality judgment tasks also face translation
tasks. For example, the same issues of existence and scope arise for the
translation task.
The
third problem with constructing a theory based solely on grammaticality judgments is
the so-called observer’s paradox. The syntactician constructs a sentence and
then evaluates it. In doing so, they set up all kinds of implicit biases. Do
they want the sentence to come out grammatical or not? Why are they looking at
these sentences in the first place? Why did they choose those particular words,
instead of some other words? Especially for non-linguist consultants, in
fieldwork scenarios, it is difficult to control for these effects. And so,
there is always the question of whether the consultant is giving the syntactician
what they think the syntactician wants. Of course, these effects can be mitigated
in various ways. One can discuss the nature of the task with the consultants,
and explain to them exactly what is wanted (and repeat these instructions on
several occasions). One can give them clear examples of grammatical and
ungrammatical sentences to prep them. One can ask different consultants or
teams of consultants to see if there is uniformity across speakers.
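For the last of these checks, here is a minimal sketch of one way to quantify uniformity across consultants, with invented sentences and judgment codes standing in for a real elicitation record.

```python
# A minimal sketch of checking uniformity of judgments across consultants.
# The sentences and judgment codes below are invented for illustration.

judgments = {
    "Mary wrote the paper up.": {"A": "ok", "B": "ok", "C": "ok"},
    "Mary jogged the hill up.": {"A": "*",  "B": "*",  "C": "ok"},
}

for sentence, by_speaker in judgments.items():
    values = list(by_speaker.values())
    agreement = max(values.count(v) for v in set(values)) / len(values)
    print(f"{sentence}  majority agreement: {agreement:.0%}")
    # Items with low agreement are candidates for re-testing, richer
    # contexts, or further discussion with the consultants.
```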
A
related issue is the construction of natural sentences. Sometimes the
syntactician in the field will construct a perfectly straightforward and simple
sentence, which for some reason (unrelated to grammaticality or to the issues
being investigated) sounds unnatural to the speakers. Maybe it is word choice.
Maybe it is register. Maybe it is the subject matter discussed in the sentence.
The factors influencing how a sentence is perceived can be quite subtle and
difficult to isolate. In working with our own native language, English in my
case, we avoid this issue since we can easily construct natural sounding
sentences in our own language. But in fieldwork on less studied languages, it
is a real issue.
At
the extreme end of the set of worries associated with the observer’s paradox is
the issue of whether the syntactician is merely fabricating sentences that
don’t really exist in the language, even though the speakers judge them as grammatical.
Just as fake news plagues our modern political landscape, fake sentences might
be plaguing our modern linguistic landscape. Perhaps in diving deeper and
deeper into some theoretical issue, the syntactician constructs sentences that
no speaker would ever use in any context, but are still judged as grammatical.
Although I note this as a facet of the observer’s paradox, I think it is less
important than the others, because it is probably difficult to find some
sentence X such that (a) no person would use it in any context and (b) it is
judged as completely grammatical.
Oral
texts can provide additional data to help avoid the various facets of the observer’s
paradox. If one can find instances of constructions that one has been studying
(e.g., involving serial verb constructions) in natural speech, and those
instances have the properties that one has established by looking at
grammaticality judgments (or the results of a translation task), then that is
converging evidence. Using sentences from oral texts as the starting point of
an investigation based on more controlled experiments can also help solve the
issue of unnaturally constructed sentences.
A
problem with using oral texts in this way is that in the fieldwork scenario
those oral texts also usually suffer from their own observer’s paradox. The
texts are often elicited by the syntactician. For example, the consultant can
be asked to tell a folktale in their language, which is then recorded using a
video camera and a mic. The video camera, the mic, and the syntactician are all
clues to the consultant that something special is going
on. Whether or not this actually affects their speech is unclear to me. But it
should be kept in mind that such speech is not completely free of the
observer’s paradox, since the observer is so strongly present.
As a
side note, not all natural data needs to be recorded. A particularly rich set
of data in the field is just talking with the consultants either during breaks
or during off hours. Just spending time with your consultants, trying to engage
them in conversation and listening to what they say to each other can turn up
all kinds of interesting syntactic data. In fact, trying to learn a language,
and practicing with your consultants in a casual setting can be an important
source of natural data. The syntactician is still present, but their role as an
observer is replaced by their role as language learner.
I
summarize all these points in the following table:
(4)
Gram. Judgments
   Pros: controlled, both positive and negative data
   Cons: issues of existence and scope, not all speakers, observer’s paradox
Translation Task
   Pros: controlled, all speakers
   Cons: only positive data, issues of existence and scope, observer’s paradox
Natural Data
   Pros: helps address issues of existence and scope, helps address issues of observer’s paradox
   Cons: uncontrolled, only positive data
My conclusion is that both kinds of data, experimental and natural, are crucial in syntactic fieldwork.
And in fact, I take the stronger
position that both kinds of data are crucial to all syntactic research (even on
well-studied languages like English). They complement each other, and partially
help to resolve each other’s shortcomings. Take the specific task of writing a
grammar or grammatical sketch of a less-studied language. Trying to write a
grammar based purely on grammaticality judgments risks producing a grammar that
is heavily skewed to what the researcher already knows (about other languages,
such as their native language). And so, it risks missing out on interesting
aspects of the language being studied (that could be quite important for
syntactic theory), and not reflecting the real richness of the language. But
trying to write a grammar based purely on a corpus or on recorded natural speech
would be just as catastrophic, missing out on interesting generalizations about
the structure of the language that could easily be uncovered by more controlled
experiments, such as grammaticality judgments or the translation task.
Of course, that conclusion now
raises the question of how much. What is the right balance of controlled
experimentation and natural speech to use in writing a grammar? Should 50% of
the data (in terms of number of sentences) be from controlled experiments and
50% from audio/video recordings of natural speech? There is no way to answer
this question a priori. There are no guidelines to follow in terms of
percentages.
But one thing is clear: the presence
of huge corpora that can be easily searched is of great importance for
syntactic research. Such corpora help to overcome the problems outlined above,
especially the problems of existence and scope, problems which plague the
description and analysis of even better studied languages like English.
But from the other angle, if one
does find interesting data in a corpus, it is impossible to understand it
without a healthy dose of controlled experimentation. We need to be able to
play with language and manipulate it, and to run through the permutations and
possibilities, in order to be able to discover what its properties are. We
cannot learn about a language simply by keeping our eyes and ears open, and
jotting down what is being said.
Furthermore, the use of powerful
technologies such as audio and video recording equipment, and powerful software
packages, such as FLEx and ELAN, should not distract us from the conclusion
that it is impossible to do syntactic description and analysis solely on the
basis of corpus data. In making this comment, I am in no way criticizing the
use of these technologies in syntactic fieldwork. On the contrary, I feel that
they are extremely important, game-changing technologies. Rather, the point I
am trying to make is that one cannot see syntactic fieldwork merely as the
application of these technologies to natural data. They need to be
supplemented by classical syntactic argumentation supported by classical
experimental methodologies, such as grammaticality judgments and translation
tasks.
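As one small illustration of how the two sides can feed each other, here is a minimal sketch of pulling transcription lines out of an ELAN file so that they can be searched for constructions worth following up with controlled elicitation. The file name and tier name are hypothetical, and the sketch assumes the standard EAF XML layout in which each TIER element contains ANNOTATION elements whose ANNOTATION_VALUE descendants hold the transcribed text.

```python
import xml.etree.ElementTree as ET

# A minimal sketch of extracting transcription lines from an ELAN (.eaf)
# file. The file name and tier name are hypothetical; the sketch assumes
# the standard EAF XML layout (TIER > ANNOTATION > ... > ANNOTATION_VALUE).

tree = ET.parse("folktale_session1.eaf")   # hypothetical ELAN file
root = tree.getroot()

lines = []
for tier in root.iter("TIER"):
    if tier.get("TIER_ID") == "transcription":   # hypothetical tier name
        for value in tier.iter("ANNOTATION_VALUE"):
            if value.text:
                lines.append(value.text.strip())

# The extracted lines can then be searched for constructions of interest
# (serial verbs, pluractional marking, etc.), which in turn feed controlled
# follow-up elicitation with judgments or translation tasks.
hits = [line for line in lines if "up" in line.split()]
print(f"{len(lines)} transcription lines, {len(hits)} containing 'up'")
```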