A few days ago, out of the 21,595 visitors to LLOG that Google Analytics counted, 88 arrived after asking what part of speech is the, and thereby landing on Arnold Zwicky's post "What part of speech is 'the'?", 3/30/2006. Unfortunately, if they were looking for how to fill in the blank on a homework assignment, they probably went away unsatisfied, because Arnold's excellent post starts by complaining, cogently and at length, that "the school tradition about parts of speech is so desperately impoverished", and closes by noting that
[A] linguist who proposes to introduce, say, the technical term determiner for a class of pre-adjectival modifiers in English that includes the articles, demonstratives, quantifiers, possessives, and more is likely to be seen as UNDERMINING tradition, casting off the sureties of the past in favor of fashionable jargon.
All true — but hard for a student to boil down to a single label. And just as hard for a teacher to use as the foundation for an assignment. This confusion and controversy about what standard grammatical terminology (and methodology) ought to be is one of several reasons that grammatical analysis has all but vanished from the curriculum of American schools.
I feel that it's past time to do something about this. So, as a Christmas present to the English-speaking world, let me propose a simple and practical way to cut through the tangled undergrowth of grammatical tradition and the dense thickets of recent grammatical argumentation. The goal: a standard, canonical grammatical description for English. Yes, really. It's already Out There — all we need to do is to recognize it for what it is.
My proposal is a simple one. The standard grammatical analysis for English should be based closely on the treatment used in the Penn Treebank, which was originally designed in the early 1990s as a consensus among a group of computational linguists, then worked out in detail through application to several million words of text, and has since been used in the development and testing of many automatic parsers.
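To make the proposal concrete: under the Penn Treebank scheme, the question that brought those 88 visitors here has a one-word answer. The word "the" is tagged DT, for determiner. Here is a minimal sketch in Python; the tag labels and glosses are the Treebank's, but the mini-lexicon and the `toy_tag` function are invented purely for illustration (real taggers are statistical, and the full tagset has roughly three dozen word-level tags):

```python
# A small subset of the Penn Treebank part-of-speech tagset.
PTB_TAGS = {
    "DT": "determiner",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "VBZ": "verb, 3rd person singular present",
    "IN": "preposition or subordinating conjunction",
}

# A hand-built lookup for one example sentence; NOT a real tagger.
LEXICON = {
    "the": "DT", "quick": "JJ", "fox": "NN",
    "jumps": "VBZ", "over": "IN", "lazy": "JJ", "dog": "NN",
}

def toy_tag(sentence):
    """Assign Penn Treebank tags by dictionary lookup (illustration only)."""
    return [(w, LEXICON[w.lower()]) for w in sentence.split()]

print(toy_tag("the quick fox jumps over the lazy dog"))
# And the answer to the homework question:
print("the ->", LEXICON["the"], "=", PTB_TAGS[LEXICON["the"]])
```

The point is not that dictionary lookup works (it doesn't, in general), but that the scheme gives a student a definite, defensible label to write down.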
Why should we choose this scheme as the standard? Not because its choices are always "correct", but rather because:
- Its descriptive adequacy has been demonstrated by application to millions of words of real-life English text;
- Extensive and explicit manuals exist for guiding annotators (or students or teachers) in its use;
- It has been the most widely used standard in computational linguistics (and historical syntax) for the past two decades, used by hundreds of researchers in thousands of papers;
- Several excellent open-source parsers exist that implement a version of this standard;
- Versions of this scheme have been extended to encompass the complete history of the English language, from Old English through Middle English and Early Modern English;
- Analogous schemes have been developed and tested extensively for Chinese, Arabic, French, Portuguese, Icelandic, and several other languages as well;
- The scheme has been extended (under the unfortunate name of "OntoNotes") to include predicate-argument structure ("who did what to whom"), co-reference, and shallow semantics (ontology-linked word senses).
In simpler terms: it works, it lasts, and it's already by far the most widely used standard among people concerned with practical grammatical analysis of English (and most other languages).
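For readers who have never seen one, here is what a Penn Treebank constituency analysis looks like for a trivially simple sentence. The bracketed format is the Treebank's own convention; the little extraction function is my own sketch, included only to show that the format is machine-readable as well as human-readable:

```python
import re

# A Penn Treebank-style constituency bracketing for "The dog barks."
PARSE = "(S (NP (DT The) (NN dog)) (VP (VBZ barks)) (. .))"

def leaves(tree):
    """Extract (tag, word) pairs from a bracketed Treebank string.

    Matches only the pre-terminal nodes, i.e. "(TAG word)" pairs
    whose contents contain no further parentheses.
    """
    return re.findall(r"\((\S+) ([^()\s]+)\)", tree)

print(leaves(PARSE))
```

The same string encodes both the part-of-speech labels (DT, NN, VBZ) and the phrase structure (an NP and a VP inside an S), which is what makes the scheme usable for parsing as well as tagging.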
Once this proposal is accepted, there's plenty of room for discussion. How much grammatical detail should we try to teach, and to whom, and when, and why? Should we teach syntax in dependency form rather than constituent-structure form? How much of predicate-argument structure and other shallow semantics should be included, at what stage? How should a didactic skill-teaching style be balanced against an approach based on exploration and argumentation?
But for people who are serious about re-introducing grammar into the school curriculum, I don't see any better alternatives, or indeed any practical alternatives at all.