Ordinary Working Grammarian: December 2024

Thursday, December 26, 2024

Preparing for the 2025 LSA Annual Meeting

Andreas B. and I have a workshop to present at the 2025 LSA Annual Meeting. During this break, I have been preparing for it. Here is a concept map of my process.

Friday, December 20, 2024

My Khoisan Grammars and Dictionaries

Here is a list of descriptive books (grammars and dictionaries) on the Khoisan languages that I have written, or helped to write. It is unlikely that I will write any further descriptive books of this nature on the Khoisan languages. So I thought I would gather all the titles together in one place. If you need any of these books, please let me know.

In chronological order:

Collins, Chris and Levi Namaseb. 2011. A Grammatical Sketch of N|uuki with Stories. Rüdiger Köppe Verlag. (https://www.koeppe.de/titel_a-grammatical-sketch-of-n-uuki-n-uuki-with-stories)

Collins, Chris and Jeffrey S. Gruber. 2014. A Grammar of ǂHȍã. Rüdiger Köppe Verlag. (https://www.koeppe.de/titel_a-grammar-of-hoa-h-a)

Sands, Bonny and Kerry Jones (chief editors). 2022. Nǀuuki Namagowab Afrikaans English ǂXoakiǂxanisi/Mîdi di ǂKhanis/Woordeboek/Dictionary. Stellenbosch: African Sun Media for African Tongue. (https://dictionary.sadilar.org/#/about) [Chris Collins was a member of the editorial team.]

Collins, Chris. 2023. A Grammatical Sketch of Kuasi (Botswana). Rüdiger Köppe Verlag. (https://www.koeppe.de/titel_a-grammatical-sketch-of-kuasi-botswana)

Collins, Chris. 2024. Sasi Dictionary (Botswana). Rüdiger Köppe Verlag. (https://www.koeppe.de/titel_sasi-dictionary-botswana)

Collins, Chris and Zachary Wellstood. 2025. A Grammatical Sketch of Cua. Peter Lang. (https://www.peterlang.com/document/1499513)

Tuesday, December 17, 2024

A Grammatical Sketch of Cua

Summary

Cua is a Kalahari Khoe language spoken in southeastern Botswana (Kweneng District). It is closely related to languages such as G||ana, Tshila and Tsua. The phonology chapter describes the consonant, tone and vowel inventories, as well as a system of depressed tones following aspirated and voiced consonants. Later chapters provide concise overviews of the morphology and syntax of the language. Cua is characterized by a complex system of person-gender-number markers (PGN markers), which play a role in the formation of the pronouns. The features defining pronouns include: singular, dual, and plural number; first, second, and third person; and masculine, feminine and neutral gender. There is also a distinction between inclusive and exclusive first-person plural pronouns.

https://www.peterlang.com/document/1499513

Remembering Andrew Radford

Andrew Radford passed away on December 16, 2024.

I first met Andrew Radford in China. We were both invited speakers at the 5th International Conference on Formal Linguistics, held in Guangzhou, China (December 2011). When we met, we hit it off right away, and the four of us (my wife and I, and Andrew and his wife) spent most of our time outside of the conference together. The conference organizers had assigned us two young Chinese linguistics students as guides, one male and one female, so that was our little group of six. We went to lunch and dinner together and had fantastic feasts of Chinese food in styles from all over the country. We asked our Chinese guides endless questions about China, Chinese food, and the local region. We also did some sightseeing, visiting various scenic areas and a zoo, where there were pandas. It was without a doubt one of the best conference travel experiences of my life, due in large part to meeting Andrew and his wife there (and, of course, to the hospitality of our Chinese hosts).

While at the conference, Andrew and I gabbed pretty much non-stop about syntax. This was when ‘Imposters’ was just about to come out, so that was on my mind. As for Andrew, he was working a lot on spoken corpus data that he had put together. He was finding all kinds of interesting syntactic patterns that he told me about. After intensive discussions for a few days, we decided to collaborate on a paper, which lay at the intersection of our research domains.

Collins, Chris and Andrew Radford. 2015. Gaps, Ghosts and Gapless Relatives in Spoken English. Studia Linguistica 69.2, 191-235.

From that time onward, I valued him greatly as a colleague. He was a real syntactician’s syntactician, brilliant and deeply committed to the scientific research agenda of generative syntax. After China, I wrote to him often about all kinds of issues. For example, he gave me extensive written feedback on various versions of my 2024 monograph (‘Principles of Argument Structure’), and helped me to clarify a thorny issue concerning exempt anaphora.

I was so happy to meet Andrew in China, because I owed him a special debt. In the summer of 1984 (nearly thirty years before I met him in person), I read through his Transformational Syntax (Cambridge University Press, 1981) in its entirety and worked through the exercises with a friend. I still remember how clearly the textbook was written and how captivating it was. It literally drew me in, so that I became excited about generative syntax. Then, the next academic year, I took a number of graduate-level syntax courses (with Hale, Rizzi, and Ross), with Andrew’s textbook as my background. It is quite possible that my career would have turned out differently if I had not found and studied his textbook.

I believe that, through his syntax textbooks, he has probably done as much as any other individual to promote the scientific study of generative syntax around the world.

Friday, December 13, 2024

Scribbles on Agentivity

Abstract: These scribbles investigate unaccusative verbs that have an agent.

Keywords: unaccusative, unergative, agent, theme


Tuesday, December 10, 2024

A Proposal for a Database of the Syntactic Structures of the World's Languages (Collins and Kayne 2007)

Abstract: On November 9-10, 2007, a conference on creating a database of the syntactic structures of the world’s languages was held at NYU. This document contains the original proposal for the database (Chris Collins and Richard Kayne), the paper presented by Chris Collins and the paper presented by Richard Kayne.

Keywords: adposition, agreement, comparative syntax, database, dialect, glossing, primitives, questionnaire, replicability, Wikipedia


The Scope and Limits of Syntactic Variation: A Program for AI Research

Introduction

One of the primary goals of syntactic theory is to understand the scope and limits of syntactic variation cross-linguistically. Doing this kind of research is crucial for syntactic theory. For example, it could be used to argue for properties of UG: the innate mechanisms and principles of human language.

But the task is vast.

The first dimension of complexity is the number of languages, both living and dead, and their syntactically distinguishable dialects. Conservative estimates put the number of languages at around 6,000, but with dialects the number is probably much higher.

The second dimension of complexity is that only a few languages have a corpus of materials. The vast majority of languages just have a few descriptive documents written about them (e.g., a grammar or a dictionary or some texts). So there is an unevenness of cross-linguistic coverage.

The third dimension of complexity is the set of properties that would have to be articulated to describe a single language. How many such properties are there? In fact, given a sufficiently large corpus, there may be properties that could be inferred from the corpus that are not part of any linguistic description. These implicit properties should also be investigated.

Lastly, there is the question of just what kinds of variation we should look for. In standard typological sources, researchers are interested in implicational correlations of the form “If P, then Q”: if a language has property P, then it will also have some other property Q.

But such implicational correlations are just the simplest kind of relation that could be envisioned. One could extend these correlations using logical combinations: If P ∨ Q, then R ∧ W. And even this dramatic extension is not enough. One could also introduce subsets of languages as properties: If all the languages of S have property P, then they will have Q. These are just the simplest kinds of extensions to the implicational correlation format. There may be many other kinds of connections that we have not even conceived of yet, and an AI model might be able to identify them.
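To make the three formats concrete, here is a minimal Python sketch, assuming each language can be reduced to a set of binary properties. The languages (L1 to L4) and property names are hypothetical placeholders, not real typological data, and the helper names (`has`, `either`, `both`, `implication_holds`, `subset_implication`) are illustrative only.

```python
# Hypothetical toy data: each language is mapped to the set of properties it has.
# These are placeholders for illustration, not real survey results.
LANGS = {
    "L1": {"OV", "postpositions", "preverbal_obj_pronoun"},
    "L2": {"VO", "prepositions"},
    "L3": {"OV", "postpositions"},
    "L4": {"VO", "prepositions", "preverbal_obj_pronoun"},
}

def has(p):
    """Predicate: the language has property p."""
    return lambda props: p in props

def either(f, g):
    """Logical disjunction of two predicates (P ∨ Q)."""
    return lambda props: f(props) or g(props)

def both(f, g):
    """Logical conjunction of two predicates (R ∧ W)."""
    return lambda props: f(props) and g(props)

def implication_holds(langs, antecedent, consequent):
    """True iff every language satisfying the antecedent also satisfies the consequent."""
    return all(consequent(props) for props in langs.values() if antecedent(props))

def subset_implication(langs, subset, antecedent, consequent):
    """'If all the languages of S have P, then they will have Q' for a named subset S."""
    members = [langs[name] for name in subset]
    if not all(antecedent(props) for props in members):
        return True  # antecedent fails for S, so the statement is vacuously true
    return all(consequent(props) for props in members)

# Simple format: If OV, then postpositions.
simple = implication_holds(LANGS, has("OV"), has("postpositions"))

# Boolean extension: If (OV ∨ preverbal object pronoun), then postpositions.
combined = implication_holds(
    LANGS, either(has("OV"), has("preverbal_obj_pronoun")), has("postpositions")
)

# Subset format: if all of S have VO, then all of S have prepositions.
subset_ok = subset_implication(LANGS, ["L2", "L4"], has("VO"), has("prepositions"))
subset_fail = subset_implication(
    LANGS, ["L1", "L4"], has("preverbal_obj_pronoun"), has("postpositions")
)
```

In this toy sample the simple implication holds, while the disjunctive version fails because L4 has a preverbal object pronoun but prepositions. Predicates are used instead of bare property names so that the same checker covers both the simple and the Boolean formats.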

Hypothetically, let’s say we could characterize a language fairly accurately with one million properties. Given the amount of information that has been uncovered about English and other languages in the history of generative grammar, this number seems conservative. Then just comparing the properties pairwise in simple correlations would require one trillion combinations (#(P) × #(Q), that is, one million times one million), each of which would then have to be verified over at least 10,000 languages. Calculating all possible correlations in the broader sense described above would require an astronomical number of calculations.
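The arithmetic can be checked with a few lines of Python. The figures of one million properties and 10,000 languages are the hypothetical values from the paragraph above; the count for “If P ∨ Q, then R” is an added illustration of how quickly even one step of the broader format grows.

```python
from math import comb

NUM_PROPERTIES = 10**6  # hypothetical: one million properties per language
NUM_LANGUAGES = 10**4   # "at least 10,000 languages"

# Ordered pairs (P, Q), i.e. #(P) x #(Q).
pairs = NUM_PROPERTIES * NUM_PROPERTIES

# Each pair has to be verified against every language.
checks = pairs * NUM_LANGUAGES

# One step of the broader format (illustrative): "If P ∨ Q, then R",
# with {P, Q} an unordered pair of distinct properties and R any property.
two_antecedent = comb(NUM_PROPERTIES, 2) * NUM_PROPERTIES

print(f"pairwise implications: {pairs:.1e}")
print(f"language-level checks: {checks:.1e}")
print(f"'If P or Q, then R' implications: {two_antecedent:.1e}")
```

Allowing just a two-way disjunction in the antecedent already multiplies the search space by roughly half a million, before any language is consulted.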

The issues can be stated in terms of neural networks.

Imagine all properties as a vast network of nodes. In principle, all the nodes could be connected to one another, such that P is connected to Q iff whenever P is true, Q is true. Possibly such connections could be given different strengths depending on the certainty of the implication (zero is no information). If there are a million properties, there are a trillion connections (a million squared). Of course, if we allow subsets of languages as properties, and logical connectives, there will be many more connections than that. How could we create and search such a vast network of information for possible correlations?
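A small-scale version of this network can be sketched in Python, assuming the same kind of binary property table as before (hypothetical languages and properties). The strength measure is an assumption, not part of the proposal: the share of languages with P that also have Q, so 1.0 means “whenever P is true, Q is true” in the sample, and 0.0 is used when no language has P (no information). Note that this simple measure also gives 0.0 when P and Q never co-occur.

```python
from itertools import permutations

# Hypothetical toy data, not real typological results.
LANGS = {
    "L1": {"OV", "postpositions", "preverbal_obj_pronoun"},
    "L2": {"VO", "prepositions"},
    "L3": {"OV", "postpositions"},
    "L4": {"VO", "prepositions", "preverbal_obj_pronoun"},
}

def implication_network(langs):
    """Map each ordered pair (P, Q), P != Q, to a strength in [0, 1]."""
    properties = sorted(set().union(*langs.values()))
    network = {}
    for p, q in permutations(properties, 2):
        with_p = [props for props in langs.values() if p in props]
        if not with_p:
            network[(p, q)] = 0.0  # no language has P: no information
            continue
        # Share of P-languages that also have Q (hypothetical strength measure).
        network[(p, q)] = sum(q in props for props in with_p) / len(with_p)
    return network

net = implication_network(LANGS)

# Edges with strength 1.0 are exceptionless implications in this sample.
strict = sorted(edge for edge, strength in net.items() if strength == 1.0)
for p, q in strict:
    print(f"{p} -> {q}")
```

With n properties the dictionary holds n × (n − 1) edges, which is why the full-scale version (a million properties) runs into the trillion-connection problem described above.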

With the advent of AI (“Artificial Intelligence”) and LLMs (“Large Language Models”), we are possibly close to being able to think about such a task.

The general proposal follows Collins and Kayne (2007), but is updated to use Artificial Intelligence, Deep Learning and LLMs.

Collins, Chris and Richard Kayne. 2007. A Proposal for a Database of the Syntactic Structures of the World's Languages. (https://ling.auf.net/lingbuzz/003404)

Path Forward

The idea is to develop an AI program (using neural networks and deep learning) that could search the internet, including linguistic resources, to answer comparative syntax questions.

First, it needs to be understood what kind of syntactic information ChatGPT can give, and how it gets that information. Does the information come directly from texts or does it come from grammars and other descriptions? How accurate is the information?

Here are some preliminary studies of acceptability judgments in ChatGPT:

https://ordinaryworkinggrammarian.blogspot.com/2024/10/acceptability-judgments-in-chatgpt.html

https://ordinaryworkinggrammarian.blogspot.com/2024/10/acceptability-judgments-in-chatgpt_20.html

Second, it needs to be understood how much access ChatGPT has to available resources on natural language syntax, such as WALS (the World Atlas of Language Structures).

Third, it needs to be investigated whether a new AI interface needs to be developed (over and above ChatGPT) to interact with the online resources.

Fourth, the system should have the ability to recognize the values of linguistic properties for so-called low-density languages, so that those languages could play an integral role in the search for correlations. This means that any such system will have to be able to work on the basis of grammars, dictionaries and texts.

Fifth, the project does not envision the collection or uploading of any actual linguistic data, but rather the combined use of already existing resources to answer questions about comparative syntax. There is no need to get buy-in from language specialists to populate the database, although there may be some need to get access to databases that can be searched.

Preliminary Results with ChatGPT

First, as of December 8, 2024, ChatGPT4 seems to have some knowledge of other languages. Here are my queries and the responses:

Chris: Do you speak French?

ChatGPT said:

Oui, je parle français ! 😊 Je peux répondre à tes questions, t'aider avec des traductions, ou simplement discuter en français. Dis-moi ce dont tu as besoin!

Chris: Translate that into English.

ChatGPT said:

Yes, I speak French! 😊 I can answer your questions, help you with translations, or simply chat in French. Let me know what you need!

Chris: How many languages do you speak?

ChatGPT said:

I can communicate effectively in many languages, covering a broad range of linguistic families and regions. While I’m most proficient in widely spoken languages, I have varying levels of expertise in others. Here’s an overview…While I can assist with a variety of languages, my fluency level varies. For highly niche or less commonly spoken languages, my ability may be limited to basic phrases, cultural context, or translations. Let me know which language you have in mind, and I’ll do my best to assist! 😊

Now, consider the following interchange on the order of the verb and the object pronoun cross-linguistically.

Chris: What is the order of the verb and the object pronoun in English? Just a one line response, please.

ChatGPT said: In English, the object pronoun comes after the verb (e.g., I see her).

Chris: Now, what is the order of the verb and the object pronoun in French? Just a one line response, please.

ChatGPT said: In French, the object pronoun comes before the verb (e.g., Je la vois).

Chris: What is the difference between French and English with respect to the order of the object pronoun?

ChatGPT said: In French, the object pronoun comes before the verb, while in English, it comes after the verb.

Chris: In which languages that you know does the object pronoun come before the verb?

ChatGPT said: The object pronoun comes before the verb in languages such as French, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Polish, and Hungarian.

Even this small dialogue highlights the potential power of ChatGPT to do comparative syntax work.

The following dialogue shows that ChatGPT has access to information from WALS.

Chris: According to WALS, what is the basic word order of Ewe?

ChatGPT said: According to the World Atlas of Language Structures (WALS), the basic word order of Ewe, a Niger-Congo language spoken in Ghana, Togo, and neighboring countries, is Subject-Verb-Object (SVO).

Handouts for MaS Seminar (Fall 2024)

Here are the handouts that I distributed as part of the "Morphology as Syntax" seminar during Fall 2024. I do not include materials that Richie distributed or the student presentations, which were really interesting and very relevant to the goals of the course. As you can see, it was a productive semester, more for consolidating old ideas than for proposing new ones.

Syllabus (Revised and Final)

The Juncture Morpheme in Kua (September 23 2024)

A Very Short History of MaS (September 24 2024)

Two Allomorphy Models without Late Insertion (September 24 2024)

A Syntactic Approach to Case Contiguity

Forms of the Copula in English

Bare Nominal Passives

A Note on Singular They/Them (November 6 2024)

A Note on Singular They/Them (Addendum November 6 2024)

Overview of Nanosyntax (from the Perspective of MaS) (November 11 2024)

Romance s-Forms as Imposters (November 20 2024)

Allomorphy without Context: An MaS Alternative (November 27 2024)

A MaS State of Mind (December 9 2024)

Monday, December 9, 2024

A MaS State of Mind

Abstract: MaS (“Morphology as Syntax”) is a framework for the analysis of so-called morphological phenomena developed in Collins and Kayne (2023) and the papers cited there. Without going into any technical proposals, I will cover some general ways in which MaS is not only a framework, but also a state of mind. Surprisingly, most of these high-level properties distinguish MaS from DM (“Distributed Morphology”) to some extent, as will be discussed below.