Google has turned out to be a revolutionary tool in syntactic research. It makes
example sentences of different constructions easily available, and makes it
possible for a syntactician to explore the limits of their knowledge much more
efficiently. Doing Google searches in syntactic research can easily become an
obsession. In this essay, I will discuss the use of *.
The other day, I was teaching a seminar on argument structure, and I
presented the following data:
(1) a. Carl saw the white rabbit while out hunting.
    b. The white rabbit was seen by Carl while out hunting.
My judgments are that (1a) is unambiguous. The PRO in the temporal
adjunct clause is controlled by the subject Carl. On the other
hand, (1b) is ambiguous. Either DP in the matrix clause can control PRO,
although only the interpretation where Carl controls PRO is
plausible (since a white rabbit cannot go out hunting).
For me, the interesting thing about the data in (1) is that it
shows that the DP of a passive by-phrase can control into a temporal
adjunct. Surprisingly, there has been very little work on this topic in the
generative syntax literature (and if you know of some work on it, let me know).
To support my discussion of (1), I gave the following internet
data:
(2) …the balance payment must be paid by the guest before leaving the hotel
(3) The patient questionnaire should be completed by the patient before seeing the doctor,
(4) A bear was seen by a resident while walking his dog on Wesley Way last night.
Then I asked the seminar participants: How does one find such data? What kind of
search can one do using Google to find such data?
What is tricky about these searches is that there are so many things that vary. The
presence of the preposition by in each sentence remains invariant, but the
material between by and the adjunct clause varies, as well as the particular
connective used to introduce the adjunct clause (e.g., before or while) and the
particular verb in the adjunct clause.
I found this data using *, which is a wildcard
character in Google searches. For sentence (2), I used the following search:
(5) “by the * before leaving”
Note first that the quotation marks are crucial. Without
them, the search would not be a search for a string of words.
Rather, just the set of words would be searched for, with no order or
adjacency relations specified.
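
To make the contrast concrete, here is a minimal Python sketch. It is only a rough model of the difference between the two kinds of search, not a description of how Google actually works; the function names and the scrambled test sentence are made up for illustration.

```python
import re

def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def matches_as_word_set(query, text):
    """Rough model of an unquoted search: every query word occurs
    somewhere in the text, in any order."""
    return set(tokens(query)) <= set(tokens(text))

def matches_as_phrase(query, text):
    """Rough model of a quoted search: the query words occur
    adjacent to each other and in the given order."""
    q, t = tokens(query), tokens(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

query = "by the guest before leaving"
# Invented sentence: it contains all five words, but not as a string.
scrambled = "Before the guest paid, leaving by taxi was impossible."

print(matches_as_word_set(query, scrambled))  # True
print(matches_as_phrase(query, scrambled))    # False
```

The scrambled sentence passes the word-set test but fails the phrase test, which is exactly the kind of hit the quotation marks filter out.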
The search in (5) returns all strings of words
that start with by and the, end with before and leaving, and
contain an arbitrary number of words in between. Plugging (5) into Google
yields the example in (2) amongst others.
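
If you have a pile of text on your own computer, you can imitate a search like (5) with a regular expression. The following is a sketch under stated assumptions, not Google's actual matching procedure: `wildcard_to_regex` is a hypothetical helper, it expects a single * in the middle of the query, and the `max_gap` limit of five words is my own arbitrary choice. The third test sentence is invented.

```python
import re

def wildcard_to_regex(query, max_gap=5):
    """Turn a query such as 'by the * before leaving' into a regex in
    which * stands for between 1 and max_gap words (assumed limit)."""
    pieces = []
    for chunk in query.split("*"):
        words = chunk.split()
        pieces.append(r"\s+".join(re.escape(w) for w in words))
    # One to max_gap whitespace-separated words, captured as a group.
    gap = r"\s+((?:\w+\s+){0,%d}\w+)\s+" % (max_gap - 1)
    return re.compile(r"\b" + gap.join(pieces) + r"\b", re.IGNORECASE)

corpus = [
    "The balance payment must be paid by the guest before leaving the hotel.",
    "The patient questionnaire should be completed by the patient before seeing the doctor.",
    "It was left by the old gate many long and tedious hours before leaving.",
]

pattern = wildcard_to_regex("by the * before leaving")
for sentence in corpus:
    m = pattern.search(sentence)
    if m:
        print(m.group(0), "| wildcard =", m.group(1))
# prints: by the guest before leaving | wildcard = guest
```

Only the first sentence matches. The second has the wrong verb after before, and in the third the gap between by the and before leaving is longer than `max_gap` allows.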
Apparently, the convention in Google is that
shorter strings satisfying * are returned before longer strings. This
convention has the effect that when (5) is searched for, you do not get
texts that contain by the at one point and then before leaving
a hundred words later (which would be irrelevant for the purposes of our
search).
If we modified (5) and left out leaving, we would have the following
search:
(5') “by the * before”
This turns out to be uninformative, since the
preposition before can be followed by
DPs and full tensed clauses (as well as gerunds), and these options are very frequent.
So the search in (5') does not specify enough information to get useful results.
To find the strings in (3) and (4), the search
must be varied a little bit:
(6) “by the * before seeing”
(7) “by a * while walking”
Note that each search varies different aspects of
the string: the determiner, the connective, the verb following the connective.
For this process to be effective, you need to be able to think of possible ways
of filling in these three parts of the string.
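
One way to keep track of the possibilities is to generate the search strings mechanically. Here is a minimal sketch in Python; `build_queries` is a hypothetical helper, and the three word lists contain only the fillers used in (5)-(7), so you would extend them with your own candidates.

```python
from itertools import product

# Fillers taken from searches (5)-(7); extend these lists as needed.
determiners = ["the", "a"]
connectives = ["before", "while"]
verbs = ["leaving", "seeing", "walking"]

def build_queries(determiners, connectives, verbs):
    """Return one quoted wildcard query per combination of fillers."""
    return [
        f'"by {d} * {c} {v}"'
        for d, c, v in product(determiners, connectives, verbs)
    ]

queries = build_queries(determiners, connectives, verbs)
print(len(queries))  # 12
print(queries[0])    # "by the * before leaving"
```

Each string can then be pasted into Google as it stands, quotation marks included.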
Getting the hang of using * takes practice. But
once you master the technique, it opens the window to finding lots of
interesting data in English (and other languages with a presence on the
internet).