Friday, October 5, 2018

Using * in Google Searches for Syntactic Research


Google has turned out to be a revolutionary tool in syntactic research. It makes example sentences of different constructions easily available, and it lets a syntactician explore the limits of their knowledge much more efficiently. Doing Google searches in syntactic research can easily become an obsession. In this essay, I will discuss the use of the wildcard character * in such searches.

The other day, I was teaching a seminar class on argument structure, and I presented the following data:

(1)       a.         Carl saw the white rabbit while out hunting.
          b.         The white rabbit was seen by Carl while out hunting.

My judgment is that (1a) is unambiguous: the PRO in the temporal adjunct clause is controlled by the subject, Carl. On the other hand, (1b) is ambiguous. Either DP in the matrix clause can control PRO, although only the interpretation where Carl controls PRO is plausible (since a white rabbit cannot go out hunting).

For me, the interesting thing about the data in (1) is that it shows that the DP of a passive by-phrase can control into a temporal adjunct. Surprisingly, there has been very little work on this topic in the generative syntax literature (and if you know of some work on it, let me know).

To support my discussion of (1), I gave the following internet data:

(2)       …the balance payment must be paid by the guest before leaving the hotel
(3)       The patient questionnaire should be completed by the patient before seeing the doctor,
(4)       A bear was seen by a resident while walking his dog on Wesley Way last night.

Then I asked the seminar participants: How does one find such data? What kind of Google search would turn it up?

What is tricky about these searches is that so many things vary. The preposition by is present in every sentence, but the material between by and the adjunct clause varies, as do the particular connective used to introduce the adjunct clause (e.g., before or while) and the particular verb in the adjunct clause.

I found this data using *, which is a wildcard character in Google searches. For sentence (2), I used the following search:

(5)       “by the * before leaving”

Note first that the quotation marks are crucial. Without them, the search would not be a search for a string of words. Rather, Google would search for the set of words, with no order or adjacency relations specified.
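
The difference between the two kinds of search can be sketched on a local piece of text. This is only a simplified illustration in Python, not a description of how Google actually works internally; the function names and the sample sentence are made up for the example.

```python
import re

def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())

def unquoted_match(query, text):
    """Set-of-words match: every query word occurs somewhere, in any order."""
    return set(tokens(query)) <= set(tokens(text))

def quoted_match(query, text):
    """String-of-words match: the query words occur adjacent and in order."""
    q, t = tokens(query), tokens(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

# Hypothetical sample sentence containing all the query words, scattered.
text = "Before leaving, the guest walked by the pool."
query = "by the guest before leaving"

print(unquoted_match(query, text))  # True
print(quoted_match(query, text))    # False
```

The sample sentence contains every word of the query, so it passes the unquoted test, but it is useless as data for by-phrase control, which is why the quoted search is the one to use.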

The search in (5) returns all strings of words that start with by and the, end with before and leaving, and contain an arbitrary number of words in between. Plugging (5) into Google yields the example in (2) amongst others.

Apparently, the convention in Google is that smaller strings satisfying * are returned before larger strings. This convention has the effect that when (5) is searched for, you do not get texts that contain by the at one point and then before leaving a hundred words later (which would be irrelevant for the purposes of our search).
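
If you have a collection of text on your own computer, the same kind of search can be approximated with a regular expression. The following Python sketch is an approximation under stated assumptions: the helper names are hypothetical, * is taken to stand for between one and max_words words (a cap I chose so that distant matches are excluded), and the sample text is made up apart from example (2).

```python
import re

def wildcard_to_regex(query, max_words=5):
    """Turn a query such as 'by the * before leaving' into a regex.

    Assumption of this sketch: each * stands for 1..max_words words.
    """
    parts = []
    for token in query.split():
        if token == "*":
            # Lazy quantifier: the shortest fill from a given starting
            # point is preferred.
            parts.append(r"\w+(?:\s+\w+){0,%d}?" % (max_words - 1))
        else:
            parts.append(re.escape(token))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

def find_matches(query, text, max_words=5):
    """Return every substring of text that matches the wildcard query."""
    pattern = wildcard_to_regex(query, max_words)
    return [m.group(0) for m in pattern.finditer(text)]

sample = ("The balance payment must be paid by the guest before leaving the hotel. "
          "He stood by the door before leaving.")
matches = find_matches("by the * before leaving", sample)
print(matches)
# ['by the guest before leaving', 'by the door before leaving']
```

Because the pattern only matches word characters and whitespace, a fill cannot run across a period or comma, which keeps matches inside a single clause in most cases.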

If we modified (5), and left out leaving, we would have the following search:

(5')      “by the * before”

This turns out to be uninformative, since the preposition before can be followed by DPs and full tensed clauses (as well as gerunds), and these options are very frequent. So the search in (5') does not specify enough information to get useful results.

To find the strings in (3) and (4), the search must be varied a little bit:

(6)       “by the * before seeing”
(7)       “by a * while walking”

Note that each search varies different aspects of the string: the determiner, the connective, the verb following the connective. For this process to be effective, you need to be able to think of possible ways of filling in these three parts of the string.
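
Once you have lists of candidates for the three parts, the search strings can be generated mechanically. The Python sketch below is one way to do this; the word lists are just the ones from the examples above, the function names are hypothetical, and the URL format (a q parameter on https://www.google.com/search) is the ordinary address you see in the browser after running a search by hand.

```python
from itertools import product
from urllib.parse import urlencode

# Candidate fillers, taken from the searches in (5)-(7).
determiners = ["the", "a"]
connectives = ["before", "while"]
verbs = ["leaving", "seeing", "walking"]

def build_queries(determiners, connectives, verbs):
    """Build one quoted wildcard query per determiner/connective/verb combination."""
    return [f'"by {d} * {c} {v}"'
            for d, c, v in product(determiners, connectives, verbs)]

def search_url(query):
    """Encode a query as a URL that can be pasted into a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})

queries = build_queries(determiners, connectives, verbs)
for q in queries:
    print(q, search_url(q))
```

With two determiners, two connectives and three verbs, this yields twelve searches, including the three in (5)-(7). Some combinations will return nothing useful, so the results still have to be read and judged by hand.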

Getting the hang of using * takes practice. But once you master the technique, it opens the window to finding lots of interesting data in English (and other languages with a presence on the internet).




