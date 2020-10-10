

Edward Stabler, "Three Mathematical Foundations for Syntax", Annual Review of Linguistics 2019:

Three different foundational ideas can be identified in recent syntactic theory: structure from substitution classes, structure from dependencies among heads, and structure as the result of optimizing preferences. As formulated in this review, it is easy to see that these three ideas are completely independent. Each has a different mathematical foundation, each suggests a different natural connection to meaning, and each implies something different about how language acquisition could work. Since they are all well supported by the evidence, these three ideas are found in various mixtures in the prominent syntactic traditions. From this perspective, if syntax springs fundamentally from a single basic human ability, it is an ability that exploits a coincidence of a number of very different things.

The mathematical distinction between constituency (or "phrase-structure") grammars and dependency grammars is an old one. Most people in the trade view the two systems as notational variants, differing in convenience for certain kinds of operations and connections to other modes of analysis, but basically expressing the same things. That's essentially true, as I'll illustrate below in a simple example. But Stabler is also right to observe that the two formalisms focus attention on two different insights about linguistic structure. (I'll leave the third category, "optimizing preferences", for another occasion…)

This distinction has come up in two different ways for me recently. First, ling001 has gotten to the (just two) lectures on syntax, and because of the recent popularity of dependency grammar, I need to explain the difference to students with diverse backgrounds and interests, some of whom find any discussion of syntactic structure opaque. And second, someone recently asked me whether anyone had used dependency grammar in analyzing music. (The answer seems to be "mostly not", though see this paper, but the relevant question really is what the advantages of dependency models in this application might be.)



Let's start with substitution classes as evidence for constituency. As Beatrice Santorini and Tony Kroch put it in Chapter 2 of their 2007 syntax textbook:

The most basic test for syntactic constituenthood is the substitution test. The reasoning behind the test is simple. A constituent is any syntactic unit, regardless of length or syntactic category. A single word is the smallest possible constituent belonging to a particular syntactic category. So if a single word can substitute for a string of several words, that's evidence that the word and the string are constituents of the same category.

In the simple phrase very large boxes, this perspective leads us to conclude that the adverb very forms a constituent with the adjective large, which in turn forms a constituent with the noun boxes. Among the arguments for this analysis are the fact that we can substitute arbitrary unmodified adjectives for very large (e.g. small, red, etc.), or other adverbially-modified adjectives (like really tiny), but not plain adverbs (so that very boxes, extremely boxes, etc., are ungrammatical).

In contrast, in the phrase three large boxes, the number three forms a constituent with the adjective-noun combination large boxes, so that we can substitute plain nouns for large boxes (e.g. three bags) or more complex nominals (e.g. three absurdly small green bicycles); and we can substitute other numbers for three (nine, five hundred, etc.).

Ignoring details about node names, this suggests a distinction between left-branching and right-branching binary trees like these:

(Those particular part-of-speech and node labels come from the Penn Treebank project.)

In the notation of labelled brackets, we can represent the distinction this way:

[NP [ADJP [RB Very] [JJ large]] [NNS boxes.]]

[NP [CD Three] [NP [JJ large] [NNS boxes.]]]
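To make the branching difference concrete, here is a minimal Python sketch that reads the two bracketed strings above and lists their multi-word constituents. The function names (`parse_brackets`, `leaves`, `constituents`) are my own for illustration and not part of any treebank toolkit, and the parser assumes well-formed input in which every leaf has the shape `[TAG word]`.

```python
def parse_brackets(text):
    """Parse a labelled-bracket string into nested tuples.

    A leaf is (tag, word); a phrase is (label, [child, ...]).
    Assumes well-formed input with single-word leaves.
    """
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        assert tokens[pos] == "[", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        if tokens[pos] != "[":      # leaf: [TAG word]
            word = tokens[pos]
            pos += 2                # skip the word and its closing bracket
            return (label, word)
        children = []
        while tokens[pos] == "[":   # phrase: one or more bracketed children
            children.append(parse_node())
        pos += 1                    # skip the phrase's closing bracket
        return (label, children)

    return parse_node()


def leaves(node):
    """Return the words under a node, in left-to-right order."""
    _, body = node
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in leaves(child)]


def constituents(node):
    """List (label, word string) for every node spanning more than one word."""
    label, body = node
    if isinstance(body, str):
        return []
    found = [(label, " ".join(leaves(node)))]
    for child in body:
        found.extend(constituents(child))
    return found


for bracketing in ("[NP [ADJP [RB Very] [JJ large]] [NNS boxes.]]",
                   "[NP [CD Three] [NP [JJ large] [NNS boxes.]]]"):
    print(constituents(parse_brackets(bracketing)))
```

The first phrase yields "Very large" as an inner constituent, the second "large boxes.", which is exactly the left-branching versus right-branching contrast.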

Alternatively, we might base our analysis of the same two phrases on what Stabler calls "dependencies among heads", in this case two instances of modifier-head relations. Thus in "very large boxes", very is an adverb modifying the adjective large, which in turn modifies the plural noun boxes, while in "three large boxes", the number three modifies the plural noun boxes, which is also modified by the adjective large.

Again passing over details about the names of lexical and relational categories, this gives us two different sets of binary word-to-word "dependencies", which express essentially the same distinction as the tree structures did:

The corresponding data structures tell us, for each word, what its dependency relation is to which other word (identifying word positions with sequence numbers starting from 0):

Word   POS  Relation  Head   Position  Head position
Very   ADV  advmod    large  0         1
large  ADJ  amod      boxes  1         2

Word   POS  Relation  Head   Position  Head position
Three  NUM  nummod    boxes  0         2
large  ADJ  amod      boxes  1         2

(Those part-of-speech and dependency names come from the Universal Dependencies project, and the diagrams come from the displaCy visualizer, with the head of each arrow placed on the dependent word, and the root of the arrow on its head.)
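As a rough illustration of those data structures, the rows can be held as small records and inverted to show which words hang off which head. This is only a sketch: the `Dep` class and the helper names are hypothetical, not the Universal Dependencies file format or any parser's API, and the field order simply follows the rows above (word, part of speech, relation, head word, word position, head position).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dep:
    """One dependency row (hypothetical record type for this sketch)."""
    word: str
    pos: str
    relation: str
    head_word: str
    index: int        # position of this word, counting from 0
    head_index: int   # position of its head


very_large_boxes = [
    Dep("Very", "ADV", "advmod", "large", 0, 1),
    Dep("large", "ADJ", "amod", "boxes", 1, 2),
]
three_large_boxes = [
    Dep("Three", "NUM", "nummod", "boxes", 0, 2),
    Dep("large", "ADJ", "amod", "boxes", 1, 2),
]


def dependents_of(rows):
    """Map each head position to the sorted positions of its dependents."""
    table = {}
    for row in rows:
        table.setdefault(row.head_index, []).append(row.index)
    return {head: sorted(deps) for head, deps in table.items()}


def root_position(rows, n_words):
    """Return the one position with no row of its own.

    That word has no head among the other words, so it is the one
    attached to the virtual root.
    """
    listed = {row.index for row in rows}
    unlisted = [i for i in range(n_words) if i not in listed]
    if len(unlisted) != 1:
        raise ValueError("expected exactly one word without a listed head")
    return unlisted[0]


print(dependents_of(very_large_boxes))   # a chain: 0 under 1, 1 under 2
print(dependents_of(three_large_boxes))  # two sisters: 0 and 1 both under 2
```

In both phrases the unlisted word is position 2, boxes, but the first phrase is a chain of modifiers while the second has two modifiers attached to the same head.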

In a dependency analysis, each word is given a specific relation to another single word, its "head" — except that just one word in each sentence depends on a sort of virtual word, the "root". (In those examples, the dependent of the root is boxes, though that's not shown in the diagram.) In the simplest case, we assume that none of the dependency lines cross.
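Those conditions (one head per word, exactly one word attached to the root, no crossing lines) are easy to check mechanically. The sketch below assumes a representation of my own choosing: a list `heads` in which `heads[i]` is the position of word i's head, or `None` for the word that depends on the virtual root. Storing one entry per word already guarantees one head per word. The crossing test ignores the arc to the virtual root, which is a simplification, and all positions are assumed to be valid indices.

```python
def is_single_rooted_tree(heads):
    """True if exactly one word hangs off the root and there are no cycles."""
    roots = [i for i, head in enumerate(heads) if head is None]
    if len(roots) != 1:
        return False
    for start in range(len(heads)):
        seen = set()
        node = start
        # Follow head links upward; revisiting a word means a cycle.
        while heads[node] is not None:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True


def has_crossing_arcs(heads):
    """True if any two word-to-word dependency lines cross."""
    arcs = [tuple(sorted((i, head)))
            for i, head in enumerate(heads) if head is not None]
    for a, b in arcs:
        for c, d in arcs:
            # Arc (c, d) starts strictly inside (a, b) and ends outside it.
            if a < c < b < d:
                return True
    return False


very_large_boxes = [1, 2, None]    # Very -> large -> boxes
three_large_boxes = [2, 2, None]   # Three -> boxes <- large
for heads in (very_large_boxes, three_large_boxes):
    print(is_single_rooted_tree(heads), has_crossing_arcs(heads))
```

Both example phrases pass the tree check and have no crossing arcs. An invented four-word pattern such as `[2, 3, None, 2]` is still a tree, but its arcs 0-2 and 1-3 cross.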

With appropriate conventions for translating relation and node names, the resulting dependency structures are isomorphic to the constituent tree structures. Details on both sides can make translation complicated: see e.g. Fei Xia and Martha Palmer, "Converting dependency structures to phrase structures", 2001, or Richard Johansson and Pierre Nugues, "Extended constituent-to-dependency conversion for English", 2007.
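For the two toy phrases, the translation from constituents to dependencies can be sketched in a few lines of Python. This is not the algorithm of either paper cited above, which rely on category-specific head tables and much else. It uses a single made-up head rule, namely that the head of every phrase is its rightmost child, which happens to suit these head-final modifier structures and would be wrong for English in general. Trees are written as nested tuples, a leaf being `(tag, word)` and a phrase `(label, [children])`.

```python
def to_dependencies(tree):
    """Convert a constituent tree to (words, heads).

    heads[i] is the position of word i's head, or None for the word
    that would attach to the virtual root.
    Toy head rule (an assumption of this sketch): the rightmost child
    of every phrase is its head.
    """
    words, heads = [], []

    def visit(node):
        """Record the node's words; return the position of its head word."""
        _, body = node
        if isinstance(body, str):
            words.append(body)
            heads.append(None)
            return len(words) - 1
        child_heads = [visit(child) for child in body]
        head = child_heads[-1]              # toy rule: rightmost child
        for other in child_heads[:-1]:
            heads[other] = head             # sisters depend on the head
        return head

    visit(tree)
    return words, heads


left_branching = ("NP", [("ADJP", [("RB", "Very"), ("JJ", "large")]),
                         ("NNS", "boxes")])
right_branching = ("NP", [("CD", "Three"),
                          ("NP", [("JJ", "large"), ("NNS", "boxes")])])

print(to_dependencies(left_branching))
print(to_dependencies(right_branching))
```

Under that rule the left-branching tree gives heads `[1, 2, None]` and the right-branching tree gives `[2, 2, None]`, matching the head positions in the dependency rows for the two phrases.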

We can (and do) talk about dependencies and heads in tree structures, or about phrasal structures in dependency representations. But different versions of the two formalisms lend themselves to different applications (more on this later).

