Pedantry, Dr. Johnson said in the Rambler, is the unseasonable ostentation of learning. And learning is never so unseasonable as when its display impedes the workaday business of making sense. Take the sentence from The Economist that I ran across when I was writing my word-of-the-year piece for Fresh Air on "big data":
Yet even as big data are helping banks, they are also throwing up new competitors from outside the industry.
You can see what happened here—the copy editor (it had to be a copy editor, since nobody competent to write about big data would dream of treating the phrase as anything but singular) saw data followed by a singular pronoun and a singular form of be, and corrected them to plurals. The problem is that if you construe big data as a plural then it has to denote a collection of large things, in the same way that big elephants denotes a set of elephants that are each large, not a large set of elephants of any size. In that case, I suppose big data would have to be a collection of facts like this:
π = 3.1415926535897932384626433832795028841971693993751…
rather than, say
π > 3
which is a little bitty datum. If you took the sentence at face value, that is, it would be what we grammarians term “idiotic.” But I doubt whether the Economist's copy editor gave a toss, as that lot say. Sense, shmense—he or she wasn’t about to get caught out treating data as a singular noun.
The problem with such scruples is that the reader is obliged to take note of them. Copy editors are meant to be gnomes working invisibly below decks to ensure that the engine of prose runs smoothly. They shouldn’t obtrude themselves conspicuously into the middle of a clause, so that the reader has to break off his attention to the writer’s argument and do a little mental stutter-step before he can remark to himself, “Oh, I see—it’s that data-must-be-plural business.” Copy editors desirous of such notice should try another trade, one where they’re not required to hide their LittB under a bushel.
I’m not going to hash over all the arguments about the singularity of data, which have been hashed over at some length, to put it mildly. (For some generally sensible discussions of the issues, see, e.g., Motivated Grammar, Kevin Drum and the Economist’s own language critic Robert Lane Greene, writing as “Johnson,” here and here.) My own view is that there are contexts where it’s okay to treat data as a plural, but none in which you can’t treat it as a singular—and that contrary to what many “reasonable” usage writers counsel, this isn't simply a matter of “style and personal preference.” As the Economist example shows, there are times when treating data as a plural makes you sound not simply like a pedant but a fool. (There’s actually another, more substantive side to this that I want to explore, but I’ll leave that for another post.)
But it is instructive to look at the way defenders of the rule justify their position. Most simply point to the word’s etymology, but some devise synchronic explanations, comparing the word to pluralia tantum like trousers (the plural of trousum) or to British usage in sentences like “Manchester are playing Leeds” (“…and quite a few of them are looking forward to it”). These arguments don’t deserve to be taken seriously, not just because they’re confused and irrelevant, but because they’re disingenuous: whatever arguments they come up with after the fact, the only reason anyone treats data as a plural nowadays is to show that they know it started its life that way.
For those purposes, it isn’t really necessary to think the rule through in all its subtlety. Usage fetishes turn copy-editing into a mechanical trade—and the machine they’re simulating doesn't need more than 64k of memory. The adherents seize on one easily identified context to demonstrate their erudition, and ignore others in which the rule would hold if it were being applied thoughtfully. In articles in The Economist, for example, data is and data are occur with roughly equal frequency, excluding cases in which data isn’t the head of the subject NP. But much (of the) data and little (of the) data occur 90 times, against 15 for few (of the) data or many (of the) data. (These figures exclude comments, explicit discussions of the plurality of the noun, and references to things like data centres, data sets and data points—the last being the way in which people nowadays most often refer to what used to be called a datum.) And in these cases, too, insistence on treating data as a plural can lead to grammatical inconsistencies or semantic anomalies. My guess is that an editor’s interpolation is responsible for the number discrepancy here:
At the moment, says Anthony Tuzzolino of the University of Chicago, there is plenty of computer modelling going on of the distribution of space dust, but few data.
And in the following, the plural verb require suggests that the number-crunching applies to one datum at a time:
Repeated aerial surveys over the coming years will also give the researchers insight into how vegetation recovers from fires, how the beetles affect this process, how erosion and sedimentation affect the region’s water resources, and whether fire creates opportunities for new species to invade. So many data, of course, require a lot of number crunching.
As I said, this selective enforcement is typical of rules that have jelled into fetishes. Take the rule that unique cannot be compared, which adherents associate with modifiers like very, more, and most. People who wince at “the most unique restaurant in town” are less likely to object to a sentence like “Joyce seems to us less unique than he did to his contemporaries.” At the same time, writers overgeneralize these rules, turning them into dumb syntactic filters that block many sentences that wouldn’t have offended against the original version. That process is hard to observe directly, of course, since it manifests itself only in the absence of certain constructions. But you can draw it out in other ways, such as the responses to some items we gave to the American Heritage Usage Panel some years ago. In that survey, only 16 percent of the respondents accepted Her designs are quite unique in today's fashion scene, which is as you’d expect from a panel of writers and critics. But only 28 percent accepted The American Constitution is still nearly unique in that it allows no self-destruct mechanism. Yet even if you insist that unique is unequivocally an absolute term, there’s nothing in such sentences to object to—no more than in saying that a wound was nearly fatal. For the other 72 percent, the operating principle seems to be, "Don't modify unique with an adverb," which keeps copy editors at bay, but doesn't require any semantic insight.
I’m not troubled in the abstract by the critical attitudes that linguists condemn as prescriptivism (though I really dislike the term). But it’s a sign of what that tradition has come to that its principles so often devolve into empty gestures. The best argument against these fetishes isn’t that they’re irrational or pretentious—though there is that—but that they make us stupid.
Added 1/4: Going through my files, I found the following solecism from an article on plagiarism in the New York Times, 1/07/02: "Remarkably enough, in a profession that feeds on data, very little data have been gathered about the behavior of scientists themselves." It's a kind of bookend to the Economist sentence.