[This is a revision of an earlier post (October 9, 2019), responding to feedback that I received at that time.]
In classical generative grammar, the primary type of data is the grammaticality judgment. The following paradigm illustrates the relevant concepts:
(1) a. Mary wrote up the paper.
    b. Mary wrote the paper up.
(2) a. Mary jogged up the hill.
    b. *Mary jogged the hill up.
The data is organized into minimal pairs (e.g., (2a) vs. (2b)), and paradigms
(e.g., (1) vs. (2)). Minimal pairs differ by at most one simple property. For
example, the only difference between (2a) and (2b) is the position of the word up. The * before (2b) means that it is
ungrammatical.
The assumption is that speakers of English, in hearing or reading a sentence such as (2b), know intuitively that something is wrong. They have a sense or perception
or feeling that the sentence is ill-formed in some way. They are also able to
evaluate the degree of ungrammaticality. The grammaticality levels are traditionally
written with diacritics such as: ?, ??, ?*, *, ** (in that order).
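To make the ordering concrete, here is a minimal sketch (my own, not part of standard practice) of how the scale could be encoded when storing judgments in a fieldwork database; the names are illustrative assumptions.

```python
# A sketch of the traditional judgment scale, ordered from best to worst.
# The representation (a Python list) is my own illustrative choice.
JUDGMENT_SCALE = ["ok", "?", "??", "?*", "*", "**"]

def worse_than(j1, j2):
    """True if judgment j1 is more degraded than judgment j2."""
    return JUDGMENT_SCALE.index(j1) > JUDGMENT_SCALE.index(j2)

# e.g., the minimal pair in (2): (2a) is "ok", (2b) is "*",
# so worse_than("*", "ok") is True.
```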
I draw no distinction in this post between a grammaticality judgment and an
acceptability judgment. Syntacticians largely use these two terms
interchangeably. Some people use grammaticality judgment to mean the status
assigned to a sentence by a particular theoretical grammar, and acceptability
judgment to mean the judgment given by a particular person of a sentence
(thanks to Alan Munn for bringing this up). The idea is that people can reject
sentences for all kinds of reasons (some independent of grammar), and our job
as syntacticians is to find out how grammar plays a role in those judgments.
When I use the term grammaticality judgment here, I mean only the judgments
given by people of sentences (not the status with respect to a particular
theory). So for me grammaticality judgments are data, and the * mark before
(2b) is the judgment of Chris Collins of the sentence written in (2b) on
Sunday, July 12, 2019.
From such tightly controlled data, one can draw solid conclusions. For example, from
the contrast between (2a) and (2b), the conclusion is that locative prepositions cannot follow their noun phrase complement. From the contrast between
(1b) and (2b), the conclusion is that up
has two uses: as a particle in (1) and as a locative preposition in (2).
The sentences in (1) and (2) are presented without context (out of the blue) since
they are very simple. The implicit assumption for such data presented out of context
is that it is easy to create contexts where (1a,b) and (2a) would be
acceptable, and impossible to create a context where (2b) would be acceptable.
However, as a general rule it is better to pair up sentences with contexts when
asking for grammaticality judgments. I return to this point later on in the
essay.
How do you elicit grammaticality judgments in the field? Here is an example of the
kind of instructions that can be given.
(3) Your task is to judge how natural the sentence sounds to you. The sentence may be completely natural, or not at all natural, or somewhere in between.
You should be ready to discuss these instructions at length. For example, you can explain what you are getting at by saying that a natural sentence is one that would be used by a native speaker. Also, you
should be ready to use substitute terms if (3) does not translate well into
your language of communication. For example, instead of using “natural”, you
might use “good” or something like that. I use the phrase a go siame in Setswana, which means “Is it good?”.
Crucially, the subject should then be given a warm-up trial with clear examples of
grammatical and ungrammatical sentences that they are asked to judge. And they
should be given feedback on the results of this trial. It might take two or
three trials for them to understand what you are getting at.
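As a rough illustration (my own sketch, not a field protocol), the logic of the warm-up can be put in Python; the items and the ask() callback stand in for the actual interaction with the consultant:

```python
# A sketch of the warm-up procedure: clear items, feedback, repeated trials.
# WARMUP_ITEMS and the ask() callback are hypothetical placeholders.
WARMUP_ITEMS = [
    ("Mary wrote up the paper.", True),   # clearly grammatical
    ("Mary jogged the hill up.", False),  # clearly ungrammatical
]

def run_warmup(ask, max_trials=3):
    """ask(sentence) returns True if the consultant accepts the sentence."""
    for trial in range(max_trials):
        misses = [s for s, expected in WARMUP_ITEMS if ask(s) != expected]
        if not misses:
            return True   # the consultant has understood the task
        # In practice: go over each missed item and explain the intended judgment.
    return False          # consider switching to another task (discussed below)
```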
Creating and evaluating such sentences constitutes an experiment. The syntactician constructs a sentence, or a minimal
pair, and then evaluates the sentences for grammaticality. Hundreds of
sentences like (1) and (2) can be generated and tested in a short time, leading
to the possibility of developing sophisticated analyses of the data in a
relatively short period of time (e.g., hours or days, as opposed to months or years).
Generating sentences, testing them, and building analyses based on the results
is the bread-and-butter of modern syntax. But it takes training to learn how to
do this. There is lots more to say about grammaticality judgments. But I will
leave all these issues aside in order to focus on the main topic of the essay.
The grammaticality judgments described above contrast with another, messier sort of data. People use language all the time: greetings, swearing, proverbs, jokes,
data. People use language all the time: greetings, swearing, proverbs, jokes,
riddles, casual conversations, heated arguments, scientific explanations,
talking to oneself, buying and selling, school lectures, singing, television
shows, movies, church services, books, the Bible, newspaper articles,
advertisements, recipes, warning signs, instructions on packages, Google
searches, e-mail messages, Facebook posts, texting, etc. Our lives are
literally saturated with language from the moment we wake up in the morning, until
the moment we fall asleep at night.
All of this data is non-experimental, since it is not carefully prepared and tested
by the syntactician. Rather, it simply occurs. Sometimes it is recorded and
accessible for search (e.g., all written material on the internet, or recorded political
debates, etc.), but mostly not. Most of it (99.99…%) is never evaluated
explicitly for grammaticality. I will simply call this data non-experimental,
as a cover term for any data that occurs outside of the confines of a
controlled experiment. Another term might be natural, but then that would include all kinds of data (such as
texting and e-mails) that one might not want to call natural (because of the
artificial medium on which it is written).
In the field, getting non-experimental data can be a challenge. Unlike the
situation for English, there are no sources that one can Google for spontaneous
speech and writing when dealing with an endangered language spoken in rural
Botswana by a handful of elderly speakers. Rather, the linguist needs to record
all the speech on their own. Usually, the linguist will arrange to record an
oral text (audio and video), such as a folktale. But the circumstances of the
recording are tightly controlled, including the presence of a mic (which may be awkwardly
attached to the speaker’s clothing) and a camera (which takes several minutes
to set up). The linguist also positions the speaker with respect to the camera
and tells them when to start speaking. And they are usually paid for their
services. In spite of all this, I still count such a recorded oral text as
non-experimental, since the speaker is producing the sentences with no direct linguistic prompts (like a
sentence to judge or a sentence to translate).
Given this background, we have the following conclusion:
(4) There are two types of syntactic data: experimental and non-experimental.
As a side note, my usage of the term experimental
here is somewhat non-standard. Nowadays the term seems to have been
appropriated for syntactic data obtained in the lab, or obtained using a
questionnaire administered to a small group of subjects, or obtained with
Mechanical Turk. Such methods involve the use of statistics in evaluating the
data. I prefer the term psycholinguistics
for such methods. This kind of data is an extremely important development in
modern syntax, and worth a lot of attention and discussion of what the results
actually mean. But to call these methods of collecting data experimental, to
the exclusion of the traditional grammaticality judgment task, is a huge error.
The grammaticality judgment task is the ultimate experiment: clearly defined, replicable, and tightly controlled, with a specific range of outcomes. Similar
considerations hold for the phrase experimental
semantics (Pauline Jacobson, personal communication).
Of course, one can also discuss the use of psycholinguistic methodology in doing
fieldwork, but that is a topic that I am not qualified to write about.
From the point of view of the grammaticality judgment task outlined above, non-experimental
data found in the wild suffers from two main problems. First, such data is all
positive, in the sense that people do not normally produce ungrammatical
sentences in their speech. If they do produce such sentences, it is because
they are non-native speakers, or they have made some kind of error in speech
production (e.g., starting to say one sentence, but finishing with another). Since
the data is all positive, one cannot make any definitive claims about what
sentences cannot be produced by a particular person speaking a particular
language. But knowing which sentences are ungrammatical is a powerful research
tool, both for describing languages in a grammar and for creating theoretical
models. This is a serious deficiency in the use of non-experimental data.
The second main problem is that such non-experimental data is completely
uncontrolled. It is as if you looked out the window at the leaves blowing
around, and tried to come up with some kind of theory of leaf motion. But leaf
motion is complex. You need to factor in gravity, wind, the size and shape of
the leaf, the dryness of the leaf, and maybe many other factors (humidity? elevation? temperature? barometric pressure?). There are way too many factors
to get a clear grasp of what is going on. And even describing what is going on
informally is difficult. What leaf blowing patterns are out there, and which
ones should you attempt to explain? This is why it was so important that the
data in (1) and (2) was organized into minimal pairs and paradigms. Irrelevant
factors are cleared away, and one focuses on some very particular issue (e.g.,
the position of up in (2a,b)).
Constructing sentences and evaluating them allows one to focus narrowly on
particular parts of the sentence, and gives one confidence in ascribing the
reason for ungrammaticality to some particular property. This narrow focus in
turn allows one to quickly test various hypotheses, and to either reject them
or to accept them.
Or to put it another way, if one relied solely on non-experimental data, it is
unclear that the relevant minimal pair in (2) would ever occur. And (2) is a
relatively simple paradigm. When looking at more complex things (such as the
uses of logophoric pronouns, indefinite noun phrases, locality constraints or
pluractional markers), one needs quite precise data to be able to figure out
what is going on. And it is unclear whether such precise data is forthcoming
from a natural language corpus, especially in the fieldwork scenario, where the
corpus is often constructed by transcribing a few hours of recorded oral texts.
To clarify the issue, suppose you wanted to work out the pronominal system of a
language having a complex series of contrasts such as person, number, gender,
inclusive/exclusive, grammatical function (subject, object, possessor), etc. One
needs to get a paradigm of such pronouns in maximally similar contexts (to see
how the surrounding environment influences the form and use of the pronouns).
First, getting all the relevant pronouns from oral texts might not even be possible
for large pronominal inventories. Second, getting them all in a similar context
(modulo grammatical function) will definitely be impossible even for a vast
corpus (such as Google searches in English). But such pronominal paradigms are
a core part of the description of any language.
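To put rough numbers on this, here is a back-of-the-envelope sketch; the feature inventory is a hypothetical example, not a description of any particular language:

```python
# A hypothetical pronominal feature inventory, to show how many paradigm
# cells a fieldworker would need to fill in maximally similar contexts.
from itertools import product

persons   = ["1incl", "1excl", "2", "3"]   # clusivity folded into person
numbers   = ["sg", "du", "pl"]
genders   = ["m", "f"]
functions = ["subject", "object", "possessor"]

cells = [c for c in product(persons, numbers, genders, functions)
         if not (c[0] == "1incl" and c[1] == "sg")]  # no singular inclusive

print(len(cells))  # 66 cells -- unlikely to all occur in a few hours of texts
```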
But what about grammaticality judgments? Are there any drawbacks to gathering data
solely based on them? There are at least three main drawbacks, which I will
outline here. They are existence and
scope, task difficulty and the observer’s paradox. Before going into
the problems, let me outline what I consider to be a few non-problems.
A possible objection to the use of grammaticality judgments (in language description
and syntactic theory) is that they are artificial, used by syntacticians, but
with no real connection to language use in the real world. One reason to doubt
such a claim is that people do in fact have systematic and rich grammaticality
judgments. It is hard to see how this could be so, if grammaticality judgments
were not somehow connected to the language abilities of the speaker, and hence
to their use of language in the real world. In fact, in real life there are
areas where people do make use of grammaticality judgments. If you hear a
non-native speaker speaking English, and they make mistakes (e.g., in the use
of the indefinite article), you recognize it immediately. You recognize the
sentences that they use as ill-formed, sensing there is something wrong. This
recognition is a form of grammaticality judgment. So this means that
grammaticality judgments are not just an artifact of the syntactician’s
experiment, but something that everybody in every language does all the time.
In fact, grammaticality judgments serve a social role. They allow you to pick out speakers of other dialects (e.g., the Pittsburgh passive, positive anymore, copula drop, standing on line), and also to pick out people
who have learned your language as a second language. I am not trying to explain
the existence of grammaticality judgments in terms of their social role. That
is, I am definitely not giving a functionalist explanation of grammaticality
judgments. I am merely pointing out that grammaticality judgments are far from
an esoteric task created by generative syntacticians to amuse themselves.
Another possible objection that could be raised against grammaticality judgments is
that they are dissociated from any context, so they are disembodied and
abstract in a way that day-to-day language is not. While it is true that
linguists often present sentences to their colleagues and students out of the
blue, with no context, there is nothing inherent in the grammaticality judgment
task that demands this form of testing. In fact, in Field Methods classes, I
stress the importance of building contexts, even for evaluating the
grammaticality of simple sentences. I also encourage students to write down the
context, so that the context becomes part of the data, making it easier to
replicate the data. The use of carefully constructed contexts is well-known in
the formal semantics fieldwork literature. I see no reason why such contexts
should not also be used for grammaticality judgments.
As an exercise for the reader, try to find a sentence in English that seems
ungrammatical or marginal out of the blue, but gets better once a specific
linguistic context is provided.
A related objection is that grammaticality judgments are in
fact complex. There is no way for a speaker in giving a judgment to disentangle
the many different factors that could lead to a judgment of ungrammaticality,
including the inappropriateness of a sentence in a particular context or
difficulty in processing certain kinds of sentences. But this is just science
as usual. There is some data (the grammaticality judgment), and the
syntactician must make sense of that data. They (not the speaker) must figure
out why the data patterns as it does. Given a particular set of grammaticality
judgments, the linguist can decide whether to rerun the experiments, changing
context, finding shorter sentences, finding more natural sentences, etc. I do
not take this as a valid criticism of the use of grammaticality judgments in
uncovering syntactic generalizations.
To deny the validity of grammaticality judgments is essentially a form of
disrespect toward the speakers. They have knowledge of the language. Many
speakers (but not all, see below) are easily able to judge sentences as
grammatical or not. Why not listen to what they have to say about their own
language, at least as one source of data?
Now on to some real problems in the use of grammaticality judgments as the sole
source of data.
First, in studying a language that one is unfamiliar with, it is impossible to predict
what kinds of data will exist in that language. Suppose that your task is to
investigate the syntax and semantics of ǂHoan, a Khoisan language spoken in
Botswana. English does not have pluractionality marking on verbs, so from the
point of view of English there is no way to know that pluractionality marking exists
in ǂHoan. And even if the syntactician somehow stumbles onto the existence of
pluractionality marking, by accidentally eliciting some sentence containing a
pluractionality marker, they may not be aware of the scope of the phenomenon
and so they might not be able to know what kinds of sentences to construct and
test.
So an approach to gathering linguistic data based solely on grammaticality judgments
runs into the twin problems of existence
and scope. Does a phenomenon exist in
a language, and if so, what is its empirical scope? These are severe problems.
In fact, they are problems often encountered in ex-situ Field Methods courses
where the emphasis is on grammaticality judgment tasks and translation tasks.
The problems of existence and scope can be partially addressed by collecting oral
texts. I have often been surprised at the kinds of interesting constructions
that pop up in oral texts. There are grammatical constructions one finds in
texts that would have been difficult to anticipate ahead of time. Once one
finds an interesting construction in an oral text, one can begin to explore it
using grammaticality judgments or translation tasks (or other tasks, like the
truth value judgment task). But without knowing the construction exists, it is
impossible to explore it. Of course, collecting oral texts is not the final
solution to the issues of existence and scope. There may be some constructions
that are simply not used very often, and hence may not show up in oral texts,
especially if the collection of oral texts is limited.
The availability of oral texts also helps to study syntactic properties that are discourse-based, for example, properties having to do with specificity, ellipsis,
deixis, focus and information structure. Of course, all of these properties can
also be studied with controlled experiments (precisely varying context), but
having converging evidence from texts can be helpful, sometimes even providing
hints to the syntactician for designing controlled experiments. Similarly, the
nature of controlled grammaticality judgment experiments probably favors
careful speech, since one is carefully constructing a sentence and evaluating
it. But there may be many interesting fast speech or casual speech syntactic
phenomena, including contraction and ellipsis.
English is far better studied than ǂHoan, or any African language for that matter, and
that is why for many topics one can get away with only looking at
grammaticality judgments (and not looking at non-experimental data). A lot of
basic observations about English grammar have already been made by traditional
grammarians, such as Curme and Jespersen, and many important details are
documented in descriptive grammars. The best ESL grammars are rich sources of
information on English grammar. Also, there is a long tradition of generative
studies of English grammar that have uncovered piecemeal many of its
interesting properties, and these have been compiled into sources such as
Huddleston and Pullum's CGEL (Cambridge Grammar of the English Language). For
less studied languages like ǂHoan, one cannot rely on this vast background
knowledge. So one needs to use any source of data available, including various
experimental tasks, but also oral texts.
By the way, lest the reader think that the problems of existence and scope only plague less studied languages such as ǂHoan, it is also true that in English there
are still large swaths of unexplored territory. This is a fact not often
recognized in the generative literature, and in fact, sometimes one hears
statements to the contrary (e.g., in pronouncements about “The End of Syntax”).
To take an example from my own work, the study of imposters (noun phrases such as
yours truly, the undersigned, etc.) had received no systematic description or
analysis until very recently. And overcoming the problems of existence and
scope for English crucially required using vast corpora (e.g., Google
searches), as illustrated in Collins and Postal 2012 (Imposters. MIT Press).
See also my blog post on searching for linguistic data using Google.
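For a flavor of what such corpus work looks like once the texts are on disk, here is a minimal sketch of a search for candidate imposter phrases (the file name and phrase list are illustrative assumptions, not the actual method of Collins and Postal 2012):

```python
# A sketch of scanning a plain-text corpus for candidate imposter phrases.
import re

IMPOSTER_PHRASES = ["yours truly", "the undersigned"]
PATTERN = re.compile("|".join(map(re.escape, IMPOSTER_PHRASES)), re.IGNORECASE)

def find_candidates(path):
    """Yield (line number, line) for every line containing a candidate."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, 1):
            if PATTERN.search(line):
                yield line_no, line.strip()

# e.g.: for hit in find_candidates("corpus.txt"): print(*hit)
```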
Another way to overcome the problems of existence and scope, complementing the use of
oral texts, is the growing tradition of using linguistic questionnaires, often
associated with computer databases. There are lots of interesting and powerful
questionnaires being developed by syntacticians and typologists on various
topics (e.g., word order, anaphora, focus, tense/aspect). Going through these
questionnaires carefully can help to uncover the existence and scope of
particular constructions in a language. Responding to such questionnaires
constitutes another form of experimental data.
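As a sketch of what one record in such a questionnaire database might look like (the field names are my own illustrative choices, not those of any existing tool):

```python
# A hypothetical schema for one questionnaire item, pairing a probe
# sentence with the context in which it is to be judged or translated.
from dataclasses import dataclass

@dataclass
class QuestionnaireItem:
    topic: str           # e.g., "word order", "anaphora", "tense/aspect"
    context: str         # the discourse context presented to the consultant
    sentence: str        # the sentence to be judged or translated
    judgment: str = ""   # filled in during elicitation, e.g., "ok" or "*"

item = QuestionnaireItem(
    topic="focus",
    context="Someone asks: Who ate the bread?",
    sentence="It was Mary who ate the bread.",
)
```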
A second problem of relying solely on grammaticality judgments, especially in a
fieldwork scenario, is that many consultants simply cannot give them. Giving
grammaticality judgments is a difficult task. In other words, there are many
consultants who when presented a sentence and asked to give a grammaticality
judgment (suitably explained to them in terms of naturalness, etc.), either cannot
do so, or give all positive responses. These kinds of consultants are usually
elderly and illiterate. The fact that they are illiterate, in particular, means
that they have never been obliged to develop the kind of meta-linguistic awareness about linguistic form that even very
young literate speakers possess. And the unfortunate fact is that in work on
endangered languages, the remaining speakers are almost always elderly and
illiterate.
After posting the original version of this blog post, I received feedback from other
people who have also had problems eliciting grammaticality judgments. Gary
Thoms noted that such people were pretty much always older speakers. Paco
Ordonez noted that education plays a role in the sense that some speakers who
could not give grammaticality judgments had hardly any formal education. I
assume the problem that such speakers face is the lack of meta-linguistic
awareness. Formal education forces us to become aware of properties of sounds
and words and sentences. We are told to copy sentences, and perform various
tasks on them. So it is a small step to also assign a grammaticality judgment
to sentences. But people without formal education have never consciously carried
out such tasks before and they do not have explicit awareness of how to do
them. I assume that such people do have the underlying grammar, but the problem
is in articulating judgments. It would be an interesting research project to
look at the factors determining whether or not people are able to give
grammaticality judgments and to try to get some kind of broader idea (in terms
of the general population) of what percentage of people are unable to give
them, or have some kind of difficulty in giving them. As far as I know, these questions have not been investigated before, since the idea that everybody can give grammaticality judgments is simply taken for granted.
On a general note, a lot of tasks that we take for granted in doing our linguistic
work are actually tasks that require a high degree of meta-linguistic
awareness. They often need to be explained in great detail to the consultants
who need to be trained on how to carry them out.
How can one study a language when no grammaticality judgments are available? In part, one has to rely on oral texts. In part, there are various other
tasks that are easier for consultants to accomplish that are also quite rich.
For example, there is the translation
task. You can ask the consultant to translate sentence X from the language
of communication L1 (which may be English) into the target language L2. This
kind of task is quite easy for almost everybody (assuming that they are
bilingual speakers), and yields rich results. In fact, I would say that the
translation task is the most frequent form of elicitation task that I use (more
than the grammaticality judgment task, or any other task). There is also the back-translation task, which is to
translate a sentence from L2 back into L1 (this test is often a good control on
the translation task). But the translation task, like the grammaticality
judgment task, is also an experiment. It is carefully controlled. The exact
construction of X is important, and usually focuses on some very narrow issue. And
therefore, some of the same issues that come up with grammaticality judgment
tasks also face translation tasks. For example, the same issues of existence
and scope arise for the translation task.
On some basic fieldwork skills for consultants, including the translation task, see my other posts on this blog.
The third problem of relying solely on grammaticality judgments is the so-called
observer’s paradox. The syntactician constructs a sentence and then evaluates
it. In doing so, they set up all kinds of implicit biases. Do they want the
sentence to come out grammatical or not? Why are they looking at these sentences
in the first place? Why did they choose those particular words, instead of some
other words? Especially for non-linguist consultants, in fieldwork scenarios,
it is difficult to control for these effects. And so, there is always the
question of whether the consultant is giving the syntactician what they think
the syntactician wants. Of course, these effects can be mitigated in various
ways. One can discuss the nature of the task with the consultants, and explain
to them exactly what is wanted (and repeat these instructions on several
occasions). One can give them clear examples of grammatical and ungrammatical
sentences to prep them. One can make sure that the context in which the
sentence is evaluated is clearly stated and understood. One can ask different
consultants or teams of consultants to see if there is uniformity across
speakers.
A related issue is the construction of natural sentences. Sometimes the
syntactician in the field will construct a perfectly straightforward and simple
sentence, which for some reason (unrelated to grammaticality or to the issues
being investigated) sounds unnatural to the speakers. Maybe it is word choice.
Maybe it is register. Maybe it is the subject matter discussed in the sentence.
Maybe there is a more common or colloquial way of saying the same thing. The
factors influencing how a sentence is perceived can be quite subtle and
difficult to isolate. In working with our own native language, English in my
case, we avoid this issue since we can easily construct natural sounding
sentences in our own language. But in fieldwork on less studied languages, it
is a real issue.
Oral texts can provide additional data to help avoid the various facets of the observer's
paradox. If one can find instances of constructions that one has been studying
(e.g., involving serial verb constructions) in natural speech, and those
instances have the properties that one has established by looking at
grammaticality judgments (or the results of a translation task), then that is
converging evidence. Also, using sentences from recorded oral texts as the
starting point of an investigation, based on more controlled experiments, can
help solve the issue of unnaturally constructed sentences.
As a side note, not all non-experimental data needs to be recorded. A particularly
rich set of data in the field is just talking with the consultants either
during breaks or during off hours. Just spending time with your consultants,
trying to engage them in conversation and listening to what they say to each
other can reveal interesting syntactic data. In fact, trying to learn a
language, and practicing with your consultants in a casual setting can be an
important source of natural data. The syntactician is still present, but their
role as an observer is replaced by their role as conversational partner and as
language learner. For anybody trying to do fieldwork, or language
documentation, actually making an effort to learn a language can lead to all
kinds of interesting insights. In fact, trying to learn to speak a language by
speaking with community members constitutes a source of linguistic data which
is as important as the other kinds of data discussed in this post (e.g.,
grammaticality judgments, translations, oral texts).
I summarize these points as follows:
(5)
Grammaticality Judgment Task:
Pros: controlled; both positive and negative data
Cons: issues of existence and scope; task difficulty; observer's paradox
Translation Task:
Pros: controlled; task is not difficult
Cons: only positive data; issues of existence and scope; observer's paradox
Non-experimental data:
Pros: helps to address issues of existence and scope, task difficulty, and the observer's paradox
Cons: uncontrolled; only positive data
My conclusion based on this discussion is the following:
(6) Both kinds of data, experimental and non-experimental, are crucial in syntactic fieldwork.
And in fact, I take the stronger
position that both kinds of data are crucial to all syntactic research (even on
well-studied languages like English). They complement each other, and partially
help to resolve each other’s shortcomings. Take the specific task of writing a
grammar or grammatical sketch of a less-studied language. Trying to write a
grammar based purely on grammaticality judgments risks producing a grammar that
is heavily skewed to what the researcher already knows (about other languages,
such as their native language). And so, it risks missing out on interesting
aspects of the language being studied (that could be quite important for
syntactic theory), and not reflecting the real richness of the language. But
trying to write a grammar purely based on a corpus, or recorded natural speech
would be just as catastrophic, missing out on interesting generalizations about
the structure of the language that could easily be uncovered by more controlled
experiments, such as grammaticality judgments or the translation task.
Of course, the conclusion in (6) now raises the issue of how much. What is the right balance of controlled experimentation and non-experimental data to use in writing a grammar? Should 50% of the data (in terms of number of sentences) be from controlled experiments, and 50% from audio/video recordings of natural speech? There is
no way to answer this question a priori. There are no guidelines to follow in
terms of percentages. But one thing is clear: the presence of corpora that can be easily searched is of great importance for syntactic research. Such corpora
help to overcome the problems outlined above, especially the problems of
existence and scope, problems which plague the description and analysis of even
better studied languages like English.
But from the other angle, if one
does find interesting data in an oral text, it is impossible to understand it
without a healthy dose of controlled experimentation. We need to be able to
play with language and manipulate it, and to run through the permutations and
possibilities, in order to be able to discover what its properties are.
Furthermore, the use of powerful
technologies such as audio and video recording equipment, and powerful software
packages, such as FLEx and ELAN, should not distract us from the conclusion
that it is impossible to do syntactic description and analysis solely on the
basis of corpus data. In making this comment, I am in no way criticizing the
use of these technologies in syntactic fieldwork. On the contrary, I feel that
they are extremely important, game-changing technologies. Rather, the point I
am trying to make is that one cannot see syntactic fieldwork merely as the
application of these technologies to natural speech. Instead, they need to be
supplemented by classical syntactic argumentation supported by classical
experimental methodologies, such as grammaticality judgments and translation
tasks.
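For concreteness, here is a minimal sketch of how transcribed sentences might be pulled out of an ELAN file for syntactic searching. It assumes the standard .eaf XML layout; the file and tier names are hypothetical:

```python
# A sketch of extracting the annotation strings on one tier of an ELAN
# .eaf file (XML: ANNOTATION_DOCUMENT > TIER > ANNOTATION > ANNOTATION_VALUE).
import xml.etree.ElementTree as ET

def tier_annotations(eaf_path, tier_id):
    """Return the annotation strings on the tier named tier_id."""
    root = ET.parse(eaf_path).getroot()
    return [value.text or ""
            for tier in root.iter("TIER")
            if tier.get("TIER_ID") == tier_id
            for value in tier.iter("ANNOTATION_VALUE")]

# e.g.: sentences = tier_annotations("folktale.eaf", "transcription")
```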