"The data are": How fetishism makes us stupid


Pedantry, Dr. Johnson said in the Rambler, is the unseasonable ostentation of learning. And learning is never so unseasonable as when its display impedes the workaday business of making sense. Take the sentence from The Economist that I ran across when I was writing my word-of-the-year piece for Fresh Air on "big data":

Yet even as big data are helping banks, they are also throwing up new competitors from outside the industry.

You can see what happened here—the copy editor (it had to be a copy editor, since nobody competent to write about big data would dream of treating the phrase as anything but singular) saw data followed by a singular pronoun and a singular form of be, and corrected them to plurals. The problem is that if you construe big data as a plural then it has to denote a collection of large things, in the same way that big elephants denotes a set of elephants that are each large, not a large set of elephants of any size. In that case, I suppose big data would have to be a collection of facts like this:

π = 3.1415926535897932384626433832795028841971693993751…

rather than, say

π > 3

which is a little bitty datum. If you took the sentence at face value, that is, it would be what we grammarians term “idiotic.” But I doubt whether the Economist's copy editor gave a toss, as they lot say. Sense, shmense—he or she wasn’t about to get caught out treating data as a singular noun.

The problem with such scruples is that the reader is obliged to take note of them. Copy editors are meant to be gnomes working invisibly below decks to ensure that the engine of prose runs smoothly. They shouldn’t obtrude themselves conspicuously into the middle of a clause, so that the reader has to break off his attention to the writer’s argument and do a little mental stutter-step before he can remark to himself, “Oh, I see—it’s that data-must-be-plural business.” Copy editors desirous of such notice should try another trade, one where they’re not required to hide their LittB under a bushel.

I’m not going to hash over all the arguments about the singularity of data, which has been hashed over at some length, to put it mildly. (For some generally sensible discussions of the issues, see, e.g., Motivated Grammar, Kevin Drum and the Economist’s own language critic Robert Lane Greene, writing as “Johnson,” here and here.) My own view is that there are contexts where it’s okay to treat data as a plural, but none in which you can’t treat it as a singular—and that contrary to what many “reasonable” usage writers counsel, this isn't simply a matter of “style and personal preference.” As the Economist example shows, there are times when treating data as a plural makes you sound not simply like a pedant but a fool. (There’s actually another, more substantive side to this that I want to explore, but I’ll leave that for another post.)

But it is instructive to look at the way defenders of the rule justify their position. Most simply point to the word’s etymology, but some devise synchronic explanations, comparing the word to pluralia tantum like trousers (the plural of trousum) or to British usage in sentences like “Manchester are playing Leeds” (“…and quite a few of them are looking forward to it”). These arguments don’t deserve to be taken seriously, not just because they’re confused and irrelevant, but because they’re disingenuous: whatever arguments they come up with after the fact, the only reason anyone treats data as a plural nowadays is to show that they know it started its life that way.

For those purposes, it isn’t really necessary to think the rule through in all its subtlety. Usage fetishes turn copy-editing into a mechanical trade—and the machine they’re simulating doesn't need more than 64k of memory. The adherents seize on one easily identified context to demonstrate their erudition, and ignore others in which the rule would hold if it were being applied thoughtfully. In articles in The Economist, for example, data is and data are occur with roughly equal frequency, excluding cases in which data isn’t the head of the subject NP. But much (of the) data and little (of the) data occur 90 times, against 15 for few (of the) data or many (of the) data. (These figures exclude comments, explicit discussions of the plurality of the noun, and references to things like data centres, data sets and data points—the last being the way in which people nowadays most often refer to what used to be called a datum). And in these cases, too, insistence on treating data as a plural can lead to grammatical inconsistencies or semantic anomalies. My guess is that an editor’s interpolation is responsible for the number discrepancy here:

At the moment, says Anthony Tuzzolino of the University of Chicago, there is plenty of computer modelling going on of the distribution of space dust, but few data.

And in the following, the plural verb require suggests that the number-crunching applies to one datum at a time:

Repeated aerial surveys over the coming years will also give the researchers insight into how vegetation recovers from fires, how the beetles affect this process, how erosion and sedimentation affect the region’s water resources, and whether fire creates opportunities for new species to invade. So many data, of course, require a lot of number crunching.

As I said, this selective enforcement is typical of rules that have jelled into fetishes. Take the rule that unique cannot be compared, which adherents associate with modifiers like very, more, and most. People who wince at “the most unique restaurant in town” are less likely to object to a sentence like “Joyce seems to us less unique than he did to his contemporaries.” At the same time, writers overgeneralize these rules, turning them into dumb syntactic filters that block many sentences that wouldn’t have offended against the original version. That process is hard to observe directly, of course, since it manifests itself only in the absence of certain constructions. But you can draw it out in other ways, such as the responses to some items we gave to the American Heritage Usage Panel some years ago. In that survey, only 16 percent of the respondents accepted Her designs are quite unique in today's fashion scene, which is as you’d expect from a panel of writers and critics. But only 28 percent accepted The American Constitution is still nearly unique in that it allows no self-destruct mechanism. Yet even if you insist that unique is unequivocally an absolute term, there’s nothing in such sentences to object to—no more than in saying that a wound was nearly fatal. For the other 72 percent, the operating principle seems to be, "Don't modify unique with an adverb," which keeps copy editors at bay, but doesn't require any semantic insight.

I’m not troubled in the abstract by the critical attitudes that linguists condemn as prescriptivism (though I really dislike the term). But it’s a sign of what that tradition has come to that its principles so often devolve into empty gestures. The best argument against these fetishes isn’t that they’re irrational or pretentious—though there is that—but that they make us stupid.

Added 1/4: Going through my files, I found the following solecism from an article on plagiarism in the New York Times, 1/07/02: "Remarkably enough, in a profession that feeds on data, very little data have been gathered about the behavior of scientists themselves." It's a kind of bookend to the Economist sentence.



76 Comments

  1. kamo said,

    January 1, 2013 @ 5:51 pm

    I'm not defending or decrying the 'plural data rule' here, but given the Economist is a British publication, could the plural here not simply be a result of that BrE group noun 'Manchester are playing Leeds' thing, and not a result of 'data' pedantry at all?

    "Yet even as Manchester are topping the league, they are also in dire need of a new midfield."

    Both sound fine to my (British) ear, tbh.

  2. James said,

    January 1, 2013 @ 5:54 pm

    Is it possible that The Economist (or its editors) think that 'big data' is a collective, and are using the British “Manchester are playing Leeds” construction you mention later? (That's no defense of plural 'data', but it might be a reason to use plural 'big data' anyway.)
    I don't have a good sense of whether 'big data' could plausibly be thought of as a collective. I think 'Big Oil' could.

    GN: Well, on consideration maybe it is worth enumerating the arguments against this position. In "Manchester are playing next week," Manchester is a singular noun that triggers plural verb agreement. One wouldn't say, for example, "Many Manchester are unhappy about the decision," or "A couple of England will miss the match with Wales." But nobody who holds that "data are" is the only correct form would go on to argue that "many data" is ungrammatical because it's really a singular noun. What makes this argument weird is that it's the historical plurality of the word that creates the insistence on saying "the data are," but here you have people saying, Oh, well, it's true it isn't really a plural noun but it should take plural agreement anyway. Like I said, it's spurious and after the fact.

  3. James said,

    January 1, 2013 @ 5:55 pm

    Ah, sorry, kamo commented while I was writing.

  4. John Hawks said,

    January 1, 2013 @ 5:59 pm

    Big Data seems to have a lot of agency there, helping banks and throwing up competitors and all. Great moments in prose.

  5. Robert Coren said,

    January 1, 2013 @ 6:19 pm

    I'm aware of the British usage "you lot" but does anyone actually say "they lot"? (Or am I being silly in trying to analyze a joke?)

  6. Miles said,

    January 1, 2013 @ 6:25 pm

    Great article. I feel that the term "big data" belongs in the same category as "big business" or "big pharma." It refers to an entity, not a collection of items. But it's not a term I would ever use without an explanation. It's buzzword-ish and non-standard, and in my opinion should not be encouraged by adoption in print. Meanwhile, the word "data" remains plural, as does "media." This isn't pedantry or prescriptivist, it's just fact.

  7. Adrian said,

    January 1, 2013 @ 7:05 pm

    @Robert: No, we don't, but we do say "them lot".

  8. Glennis said,

    January 1, 2013 @ 7:05 pm

    Most copy-editors are humble free-lance gnomes, constrained by publishers' moribund house style rules. Many copy-editors are also linguists, with no interest in pointless prescriptivism. If you want to get copy-editors on your side, please try to be better informed and not so bloody rude.

    GN: A careful reader would discern that I didn't say anything about copy editors as such, just about fetishists. I've had a lot of very good experiences with copy editors, and I'd never generalize to ascribing any property to the whole class, not even touchiness.

  9. Paul said,

    January 1, 2013 @ 7:40 pm

    I labor under a house style rule which insists that "data" are always plural, which compels me to write around the situations where it oughtn't be, rather than embarrass myself, so I agree with the point you're making, but I think the example cited may not be the best one. My first thought was echoed by the comments from Kamo and James, that "Big Data" is being treated as a collective and "are" is correct, just as the Economist probably would say "Big Oil are."

  10. J.W. Brewer said,

    January 1, 2013 @ 8:04 pm

    A datapoint for the Brits-talk-funny thesis: "Big Data" is presumably a coinage formed along the same lines as "Big Labor," "Big Business," "Big Oil," "Big Pharma," and many other such examples. All of those metaphorical entities take a singular verb, at least in my version of AmEng. But I was able to find without investing too much googling time an instance of "big business" taking a plural verb in an obviously British source, the ungrammatical-for-me sentence being "The reason big business are attracted to developing countries is consistent with neoclassical economic theory that countries with a relative abundance of low-skilled and unskilled labour will specialize in the production and export of goods using their factor endowment."

    On the other hand, presumably other "Big X" forms have appeared previously in the pages of the Economist, and if they *don't* take plural verbs and this one does, that would undercut that explanation.

    GN:

    Well, if "Big Data" had been coined on the model of "Big Pharma," "Big Business," you'd figure it would refer to IBM, EMC, and Google rather than to exabyte-sized aggregations of data. As I said in my Fresh Air piece, only half in jest, a more appropriate parallel would be "big hair." (Actually, my guess is that many hear it as an allusion to "big iron," which has been used to refer to IBM mainframes and the like, and which is itself an allusion to the terrific Marty Robbins song "Big Iron (on his Hip)".)

  11. Ralph Hickok said,

    January 1, 2013 @ 8:09 pm

    @Glennis:
    Applause!

    I am a humble free-lance gnome for a very prestigious physics journal whose style guide insists that "data" must always be treated as a plural, whether I like it or not. However, it's a rule which I have rarely had to enforce, because virtually all of the physicists who write papers for the journal know and follow the rule.

  12. Daniel Ezra Johnson said,

    January 1, 2013 @ 8:21 pm

    "it had to be a copy editor, since nobody competent to write about big data would dream of treating the phrase as anything but singular"

    is there some reason you wrote this (presumably false) statement, other than to make your post more about copy editors than it had any reason to be?

  13. Miles said,

    January 1, 2013 @ 8:47 pm

    As a longtime copy editor, I didn't take offense at any of this. The fact is, many copy editors make careers out of search-and-replace editing, without the kind of thoughtful reflection and care we might expect of someone entrusted with putting the final polish on text for public consumption. I've worked in daily newsrooms where most of the deskers found discussions about grammar and usage to be more annoying than interesting or helpful. Very disappointing.

  14. quixote said,

    January 1, 2013 @ 8:47 pm

    They do say "that lot" and (archaic?) "a bad lot" referring to an individual.

    Which makes the latter lot … what? A collective singular?

  15. Miles said,

    January 1, 2013 @ 8:57 pm

    Meanwhile, I submit the following for consideration. I view each of these as correct:
    Big Media is taking control of small-town newspapers.
    The big media are taking control of Internet advertising.
    Social media are important marketing tools.
    The use of social media is an essential skill.

  16. Dick Margulis said,

    January 1, 2013 @ 9:12 pm

    @Miles: Your last example is a red herring, because the subject is /use/, which is singular in any case.

    Regarding the LL track record on copyeditors (or copy editors, if you prefer), they seem to be subjected to the same sort of generalized slurring as science reporters. It's uncomfortable to watch an entire group disparaged because half of its members fall in the bottom 50 percent. The discomfort stems in part from the sense that someone with an MBA is going to get it in her head that science reporting and copyediting are held in such low regard that they ought to be done away with altogether. I don't think that's what Professor Nunberg or Professor Liberman or Professor Pullum wishes to have happen, and I find it puzzling that such skilled writers as they continue to produce statements that are so easily read as broad aspersions against large groups of individuals. They have certainly all published enough articles and books to have a general familiarity with the way publishing works, so their dismissive attitude toward the work done by the galley slaves of the publishing world comes across as both insensitive and elitist.

  17. Jo said,

    January 1, 2013 @ 9:17 pm

    My agenda are so numerous I don't have time to write a lengthy comment here.

  18. James said,

    January 1, 2013 @ 9:37 pm

    GN, in your reply to my 5:54, you seem not to have noticed that I said very explicitly that the defense is no defense of plural ‘data’. My comment suggested that “big data” might be thought to demand plural agreement the way “Big Oil” sometimes could. We can’t assume that the person who made “big data” plural in The Economist did so on the grounds that ‘data’ is plural. Maybe there are other grounds.

  19. Helena Constantine said,

    January 1, 2013 @ 9:51 pm

    On top of everything else, neuter plurals in Greek frequently take singular verbs.

  20. S.T. said,

    January 1, 2013 @ 9:54 pm

    In my experience in the scientific world, it is fairly common to use "data" as a shortcut for both "data point" and "data set", with singular and plural constructions possible for both acceptions.

    A few usage examples:
    Data point, singular: "This data is likely the result of a glitch."
    Data point, plural: "The data are scattered uniformly in the expected range."
    Data set, singular: "The resulting data is plotted in Figure 1."
    Data set, plural: "The collected data are compatible with the model".
    For me, the sentence from the Economist reads perfectly fine.

  21. J.W. Brewer said,

    January 1, 2013 @ 10:10 pm

    http://cacm.acm.org/blogs/blog-cacm/150102-my-scientific-big-data-are-lonely/fulltext has a headline (in a non-UK, computer-nerd-specialist publication) in which "big data" takes a plural verb. But perhaps it is intended as jocular? Data is used with both singular and plural verbs in the body of the piece, but with that use of plural being in the context of a joke. The lyrics of the Marty Robbins song clearly treat "a big iron" as a singular count noun, so if that song in fact gave rise to a computer-nerd usage of "big iron" as an uncountable mass noun that's just further evidence of how permeable that boundary is.

  22. Charon said,

    January 1, 2013 @ 10:20 pm

    My experience in my part of the science world (physics and astro, mostly in the US) is that we all think of "data" as a mass noun taking singular verbs (like "information"). But we mostly all use plural verbs with "data" anyway, because our journals all require that we do in published papers. This is precisely the sort of editorial stupidity decried in this post.

  23. Josh Treleaven said,

    January 1, 2013 @ 10:34 pm

    "Yet even as big data are helping banks, they are also throwing up new competitors from outside the industry."

    "You can see what happened here—the copy editor […]"

    I'm not sure I can see what happened there. But I'm not a forensic etymologist.

  24. Greg vP said,

    January 1, 2013 @ 10:35 pm

    "Big data" is jargon, with the approximate meaning "recently developed techniques and algorithms that find patterns in data collections that contain very many elements".

    Techniques, mind. Data collections by themselves don't help banks. New techniques for discovering information in them may do so. "Big data" is therefore a phrase with an elided noun. Big data…techniques? Big data…analysis? Big data _something_, but what? The practitioners decided that "big data" was a good-enough label.

    "The Economist" goes astray in rather the same manner as your seventy-eight-year-old uncle attempting to dance gangnam style, in order to impress a lady friend: trying to be hip while really being hopelessly old and crusty. It injures itself, and confuses its audience in the process.

  25. Jonathon Owen said,

    January 1, 2013 @ 11:05 pm

    My two cents on the data is/are question.

    As an editor and linguist, I have to agree with a lot of the statements made here about copy editors. While there are certainly good copy editors and good work done even by bad copy editors, the fact is that far too many are mechanical in their approach and frequently do harm to the work they edit.

    The problem isn't that 50 percent are in the bottom half; it's that even the good ones too often err on the side of following the rules rather than following their instincts about what makes good prose.

  26. kamo said,

    January 1, 2013 @ 11:14 pm

    GN – Just to agree with James at 9.54. Unless I'm missing something (which is always possible) you seem to be enumerating arguments against a position neither he nor I have taken. Try this –

    "Yet even as The Treasury are helping banks, they are also throwing up new competitors from outside the industry."

    This sounds fine to me, despite the presence of the definite article. Maybe capping up Big Data might have made it clearer, but it's possible that the use of the plural is entirely unconnected with the presence of the word 'data'.

  27. mgh said,

    January 2, 2013 @ 3:21 am

    The argument against treating data as plural seems like a personal peeve presented as a general rule of "good" writing, with no evidence other than one's own infallible instincts.

    Both "these data are" and "big data are" sound fine to me, with "data" synonymous in these cases with "data sets" (or "corpora", as sometimes used here).

  28. the other Mark P said,

    January 2, 2013 @ 4:50 am

    Miles said,

    Meanwhile, the word "data" remains plural, as does "media." This isn't pedantry or prescriptivist, it's just fact.

    Given that I use data as a singular on a daily basis (I teach statistics) it seems awfully prescriptivist to me!

    A quick Google search would suggest that people who use phrases such as "those data are hard to find" rather than "that data is hard to find" are in a small minority. Does that not bother you in your "fact"?

  29. Thomas Thurman said,

    January 2, 2013 @ 5:03 am

    I've been working in computing, mostly in the UK but also in the US, since the nineties. I have never heard anyone on either side of the Atlantic, in the context of computing, use "data" as a singular noun. It's unrelated to the "Manchester are playing Leeds" question, since whether "Manchester" is singular or plural it's always a count noun. But "data" in computing is a mass noun; questions of singular and plural don't arise.

    It is fatuous to make a statement claiming that a word is universally used incorrectly by an entire profession and then claim that your statement is not prescriptivist.

    [(myl) You need to clarify who you're disagreeing with about what. From a grammatical point of view, mass nouns take singular verb agreement ("water is wet", not "water are wet"). In computing circles, "data" often though not always takes singular verb agreement, e.g. thousands of instances on the ACM's web site. Singular demonstrative agreement is also often found, e.g. "this data is" or "that data is". Is this the sort of thing you've never seen?

    Or are you looking for things like "we found a data that shows …"? That's indeed pretty unlikely, but it's got nothing to do with Geoff's post.]

  30. NW said,

    January 2, 2013 @ 5:36 am

    I find it interesting that geographers, about the only people who regularly need to talk about a singular datum, very sensibly use the plural 'datums'. They, of course, like all other scientists, routinely amass data too, so the distinction is useful. The final nail for the idea that 'data' is plural of something.

  31. Dick Margulis said,

    January 2, 2013 @ 7:27 am

    @Jonathon Owen:

    "The problem isn't that 50 percent are in the bottom half; it's that even the good ones too often err on the side of following the rules rather than following their instincts about what makes good prose."

    I think you're missing the point that copyeditors work at the pleasure of managing editors who have address books full of willing freelance cannon fodder. I work independently and can be as thoughtful and sensitive as I wish, but most copyeditors have to do what they are told if they want more work. So it seems unfair to blame them for following the rules they are entrusted to follow. Making good prose is above their pay grade.

    Now I have to add something here that seems not to be understood by some people. Most raw manuscripts are cesspools. The vast majority of authors—people with knowledge and authority who have a story to tell or a theory to espouse—are not writers. They may have passed high school English, but that doesn't mean they can string words together into grammatical sentences (by anyone's definition of grammatical). In academic, scientific, and medical journals, a great many articles are submitted by people who do not have English as their first language, and even the authors whose first language is English don't necessarily have a great command of it. So while it's true that a good writer may occasionally take offense at what an overzealous copyeditor does to ruin a perfectly good sentence, most of the time copyeditors—even the bottom half—are performing a valuable service, turning semi-random strings of characters into at least halfway intelligible English prose. And most of them are doing it for pay that I daresay no college professor would deign to work for.

  32. NW said,

    January 2, 2013 @ 7:44 am

    To add to myl's comment on Thomas Thurman's comment: non-count (mass) nouns don't take quantification: *each water is, *each information is, *each patience is, *three silvers are, *three furnitures are, *three equipments are; but almost all of them take singular agreement in verbs, pronouns, and determiners: this water is famous for its purity, and likewise with abstract nouns like patience, concrete collectives like furniture and equipment, and incorporeal collectives like information. There are a couple of plural (largely) non-count nouns: these police/cattle are famous for their viciousness. Clearly 'data' in normal use matches all the properties of the majority type, singular non-count nouns.

  33. MattF said,

    January 2, 2013 @ 7:45 am

    Note that in some technical fields (e.g., geodesy, surveying, and navigation), 'datum' has a specific meaning:

    http://en.wikipedia.org/wiki/World_Geodetic_System

    so writers in these fields -have- to use 'data' for both singular and plural contexts. It's confusing only when your manuscript comes back from the editor and half the corrections have to be undone.

  34. Robert T McQuaid said,

    January 2, 2013 @ 10:51 am

    English generally makes compounds out of the singular, so we have an apple tree, never an apples tree. So far, none of the folks who write "these data" have been silly enough to suggest that IBM is in the datum processing business.

  35. Rob P. said,

    January 2, 2013 @ 10:56 am

    Paul – I ran your proposed test as the Google search site:economist.com "big oil are" and the only hits I got were of the type "big banking and big oil are," none in which "big oil are" was used to mean that collectively, the companies making up big oil are taking some action.

  36. Albrecht said,

    January 2, 2013 @ 11:26 am

    As a non-Anglosaxon I may bring a different perspective. A few observations:
    1. Where does the intense emotion come from that pops out at the reader from this discussion before any of the content registers??
    2. The example from the Economist doesn't make any sense with singular data either. Methinks the choice of the word "data" is the real problem the copy editor ought to have addressed.
    3. The use of the alleged singular "data" must grate on anyone who has ever taken elementary Latin. The other example "visa" is even worse; one of these stamps in the passport is a "visum", several of those are "visa". Americans however resort to the "super-plural" "visas"…
    4. The idea that "data" is a plural is not etymology, but grammar. (And by the way, the trousers example has a lot of cousins: scissors, pliers, tongs, probably knickers too. A special subgroup of pluralia tantum: two parts fit together to result in a functional unit.)
    5. The specifically Anglosaxon (and even more specifically American) idea that "grammar does not matter" is not always easy to forgive for those of us who speak an idiom where it does matter…
    6. However, who am I to try and teach 300.000.000 Americans their language? I better just live with the occasional grammatical unease.

  37. Mark F. said,

    January 2, 2013 @ 12:58 pm

    I think the message here shouldn't be that copyeditors aren't courageous enough, but that the style guides are just wrong and should prescribe singular agreement for "data" unless it really is being used as a plural.

    I second Dick Margulis' comment. The labors of copyeditors make journals much more pleasant to read than they otherwise would be.

  38. Tanya said,

    January 2, 2013 @ 1:02 pm

    Amen.

    I spent several years as a technical writer at a military lab where everyone used "data" as singular except for the lead editor. Many (including me) would go to her with examples from reputable and relevant publications, and from various style guides, trying to get her to change her ways, at least in some instances. (Our own style guide was silent on the matter. For some reason, the style guide she insisted the lab use was targeted to administrative assistants rather than something more relevant like, oh, the sciences or government.) She refused to budge. Now, really, in most cases, phrases like "the data are" sound slightly pretentious as plural, but it's pretty harmless. It's easy enough to just go with her preference. But there were some sentences where it just sounded stupid, and she continued to insist that the sentence sound stupid or she wouldn't sign off. Sometimes I'd leave it sounding stupid (without her approval, the document couldn't move on), and sometimes I'd rewrite the sentence, which often led to some really bad sentences (which she'd actually approve since it didn't break any of her hard and fast rules).

    Just remembering the misery of working under her brings back so much stress….

  39. KWillets said,

    January 2, 2013 @ 1:06 pm

    "Big Data" is indeed vague both syntactically and semantically. For some time I've favored the idea of each individual bit being large, say ten feet across. Apart from that, it will have to be called "Small Data" in a few years anyway.

  40. Miles said,

    January 2, 2013 @ 2:09 pm

    @the other Mark P:

    "Given that I use data as a singular on a daily basis (I teach statistics) it seems awfully prescriptivist to me! A quick Google search would suggest that people who use phrases such as "those data are hard to find" rather than "that data is hard to find" are in a small minority. Does that not bother you in your "fact"?"

    The only part that bothers me is that sloppy usage is being justified by a call to popular opinion. I'm not comfortable using the "Ask the Audience" lifeline to arbitrate grammar and usage decisions. The belief that this process will yield a correct answer is a fallacy. Yes, language is in a state of constant flux, and it is helpful at times to know what will be most familiar to a certain class of readers or listeners. But as a communicator, it's my job to set an example of clarity, style and grammar that will stand the test of time — to take the lead, not to follow the crowd.
    (Yeah, I know; my Who Wants to be a Millionaire? reference is pop-culture ephemera that's already dated. Perhaps, in this case, asking the members of the audience if they "get it" would have been appropriate!)

  41. Miles said,

    January 2, 2013 @ 2:16 pm

    @Dick Margulis:

    @Miles: Your last example [The use of social media is an essential skill] is a red herring, because the subject is /use/, which is singular in any case.
    Thanks. You are absolutely right. It wasn't meant as a red herring, but as an example in which the phrase social media is is grammatically correct.

  42. J.W. Brewer said,

    January 2, 2013 @ 2:20 pm

    Rob P.: thanks for running that check. I checked for "big business are" in the same way, and interestingly enough you can certainly find examples at economist.com like "That's why big business are holding on to cash, and why production is still curbed" which seem consonant with that whole BrEng usage where they say "Parliament are" where we'd say "Congress is." But the four or so I found seemed to be all reader comments, not the edited prose of writers on the Economist's own payroll. And Prof. Nunberg asserts above that "Big Data" is in any event not the "Big Business/Oil/etc." type of "Big X" snowclone, but an instance of an allegedly distinct rival snowclone (with smaller market share) responsible for "big hair" and "Big Iron."

    But I'm still stuck with the oddity that according to Prof. Nunberg's survey above "data is" and "data are" coexist at the Economist in roughly equal proportions. So there isn't some sort of crazy house-style dictatorship uniformly insisting on "are" at the expense of "is," which fetishistically failed to notice that are-enforcement was even dumber than usual in this specific context. Or does it depend on which copyeditor is on shift when a particular piece comes across the desk, with "is" fetishistically enforced on even days and "are" fetishistically enforced on odd days? (Not to defend copyeditors that much, but Prof. Nunberg seems to have overlooked the possibility that the infelicity here was the fault of the anonymous journalist's original draft rather than being introduced in the editing process. If he thinks that no journalist on the Economist's payroll would ever if left to his own devices use the NP "Big Data" with a plural verb because only an ignorant fetishist would do that, I think he's being naive about journalists.)

    GN: Could be, but it's unlikely. The Economist's reporters are familiar with the industries they cover (these aren't jack-of-all-trades newspaper feature writers), and in the industry, "big data" is pretty invariably singular. In fact I spoke at an Economist conference on big data last year (they're doing another one this year), and neither there nor at the conference on big data held by the UC School of Information (at which Mark and I were participants) did I hear anybody, journalists included, construe the phrase as a plural, and trust me, I would have noticed. It just doesn't make semantic sense, if you know what the phrase refers to. (I suppose it's possible somebody slipped in a plural verb while I was on a bathroom break, but I had Mark covering for me, just in case.) There remains the possibility that the unbylined author of the Economist article used the plurals fatalistically, in anticipation of a copy editor's inevitable insistence on them, but I doubt it, and in any event, the responsibility in that case would still lie with the editor or, more appropriately, as some commenters have observed, with the publication's style guide.

  43. Erin Brenner said,

    January 2, 2013 @ 4:37 pm

    I'm not sure how I feel about being described as a "gnome" who works "invisibly below decks to ensure that the engine of prose runs smoothly." Certainly copyeditors should be invisible in the text, but we are not slaves to be hidden away (though many are paid that way).

    I agree with the argument about "data is/are," as do many of my colleagues. And then there are those who don't. It's the nature of the beast that *all* those who work with words feel a certain ownership over them and have their blind spots about what's right and what's wrong. This is one of them.

    We need to work toward eliminating our blind spots and helping others to eliminate theirs, but respectfully and keeping in mind that copyeditors don't necessarily have total control over the rules they must employ.

    At Copyediting.com, we encourage copyeditors to think about the rules and look for the reason behind them, not to blindly follow them. Yet, as others have noted, some copyeditors are required to follow some rules strictly, foolish as those rules may be. He who writes the checks still makes the rules. This is as true in copyediting as in any other job.

  44. OrenWithAnE said,

    January 2, 2013 @ 5:57 pm

    "To add to myl's comment on Thomas Thurman's comment: non-count (mass) nouns don't take quantification"

    But the waters of Northern Minnesota are peaceful …

  45. Eugene said,

    January 2, 2013 @ 6:06 pm

    @Albrecht (no intense emotion intended)
    Re: #3
    We aren't discussing Latin usage. The question is what English speakers do with the borrowed term, data. It doesn't look or feel like a countable plural. Scientific data usually doesn't seem any more countable than sand. We overwhelmingly tend to treat it as a mass noun. That's our native speaker intuition.
    We care a little more about how English speakers in Great Britain use the word. However, we understand them quite well regardless of the choices they make. They speak a different dialect. On the matter of "data," though, they are less decisive than North Americans.
    Irregular forms, irregular plurals, are always at risk for being regularized. Keeping an irregular alive in specialized circles isn't necessarily bad, but it's a case of keeping your finger in a dike.
    Re: #4
    Nobody who reads this blog thinks that grammar doesn't matter. Most of us think that arbitrary definitions of what is grammatical shouldn't matter.

  46. Adrian said,

    January 2, 2013 @ 9:36 pm

    Since the copy editors at The Economist are relatively benign, I assumed that the author of the article had themselves written "big data are" (either because they're a language peever – there's plenty of them about – or simply because they believed it to be idiomatic in this case) and the editor(s) had chosen not to amend it.

  47. Jonathon Owen said,

    January 3, 2013 @ 12:17 am

    @Dick Margulis:

    "I think you're missing the point that copyeditors work at the pleasure of managing editors who have address books full of willing freelance cannon fodder."

    I wasn't missing it so much as ignoring it for the time being. Yes, it's true that many editors don't have much say in what rules they apply, but it's also true that many editors consistently enforce rules that are not specified in their style guides, as Tanya's story above illustrates.

    I've worked as a copy editor (or alongside copy editors) for over a decade, and I've seen a lot of editors impose bad rules on decent prose when they didn't have to. The blame doesn't always rest with some managing editor or style guide author inflicting nonsensical rules on underlings or freelancers. I think a good many editors take mastery of the rules—as demonstrated by their enforcement—as a point of pride.

    I agree with everything else you said, though. Editing is a mostly thankless and poorly paid job whose practitioners are invisible until they screw something up. Those screw-ups are exasperating, but on the whole editors do far more good than harm.

  48. Jonathon Owen said,

    January 3, 2013 @ 12:22 am

    @Miles:

    "The only part that bothers me is that sloppy usage is being justified by a call to popular opinion. I'm not comfortable using the 'Ask the Audience' lifeline to arbitrate grammar and usage decisions. The belief that this process will yield a correct answer is a fallacy."

    You're presupposing that the usage is sloppy and then coming up with post hoc rationalizations. Can you give me a process for yielding answers about what correct usage is that doesn't boil down to one type of fallacy or another? Everything ultimately runs into the is–ought problem.

  49. Olof said,

    January 3, 2013 @ 1:13 am

    Picard: Mr. Data, are you enjoying your job as a copy editor at the Economist?

    Data: Yes, Captain, I are.

  50. Robin said,

    January 3, 2013 @ 1:34 am

    Treating "data" as plural is deeply unnatural for me and I hate doing it, but I do it because I observe my professors doing it and worry that they will think less of me if I don't. (Not because of its alleged ungrammaticality — they are linguistics professors and undoubtedly know better — but because, I suppose, they might expect me to play by the rules of the game, even if we all know that some of those rules are made up, just as they would expect my papers to be written in an appropriate academic register.)

  51. Danthelawyer said,

    January 3, 2013 @ 1:46 am

    Seems to me that Greg vP at 10:35 got it exactly right: the problem in the original sentence lies not so much in the data is/are issue as in an essential fuzziness of thinking.

    Incidentally, I've always been something of a pedant in my use of "data are," but having read the comments above, I think I will abandon that position.

  52. John F said,

    January 3, 2013 @ 4:18 am

    ST wins until his last sentence. Data is a plural word, but can be short for 'data point' singular (i.e. datum) or 'data set' singular (i.e. a collection of datums, or groups of related data).

    'Big data' is a well-known concept in computing and data analysis and there is no defence for saying 'Big Data are', unless you really do mean big to be an adjective of a datum, to differentiate a big datum from a small datum.

  53. Dick Margulis said,

    January 3, 2013 @ 6:52 am

    @Jonathon Owen: Agreed that some of the people who choose copyediting for a trade are drawn to it because they enjoy imposing a foolish consistency. I'm not trying to defend their little minds; I'm just saying that it's not possible to look at a piece of published prose and reliably ascribe responsibility for its disfluencies. It's generally unfair to blame the copyeditor unless you was dere, Charlie.

  54. Dick Margulis said,

    January 3, 2013 @ 6:59 am

    @Jonathon Owen: Agreed that many people are drawn to copyediting because of their propensity for applying a foolish consistency. I'm not trying to defend their little minds. I'm just saying that it's an unfair presumption to ascribe blame for disfluencies in published prose to the copyeditor unless you was dere, Charlie.

  55. Dick Margulis said,

    January 3, 2013 @ 7:00 am

    Well, I didn't mean to post that twice. Server glitch made it look like I didn't post it the first time.

  56. Ian said,

    January 3, 2013 @ 8:00 am

    I may be missing something, and have certainly not read the many comments on the topic, but is it not common in British English usage to use a plural verb conjugation when describing a singular group, which itself is comprised of multiple individuals or entities? For example, "The government are morons."

  57. The Lana said,

    January 3, 2013 @ 10:35 am

    Copy editors, like other Homo sapiens, have a particular tendency to fall prey to their fear of looking like a colossal ass, i.e., committing an error, and then being eviscerated by some disgruntled editor-in-chief or, perhaps worse, knowledgeable grammarian ; )

    I really enjoy your clever and humorous writing! No one can argue you are whip smart.

    But, try not to be so hard on copy editors. Even the ones who desperately wish to be right and, in their attempts, get it wrong.

  58. Jonathon said,

    January 3, 2013 @ 12:45 pm

    @Dick: That's a very fair point. A copy editor may be more likely to enforce plural data than the writer, but there are still plenty of writers who have been trained to use plural data. I've dealt with some authors who stubbornly insisted on following certain rules even though I tried to explain that they were silly superstitions that they didn't need to worry about. And since we have only the final version and not the editor's marked-up copy, we can't know for sure who is responsible for the offending data here.

  59. Alan Riston said,

    January 3, 2013 @ 1:34 pm

    The data is stored on the hard drive.

    Versus

    The data are stored on the hard drive.

    I propose that if you say the latter you're pretentious and largely non-communicative.

    Language always evolves.

  60. Miles said,

    January 3, 2013 @ 6:53 pm

    @Jonathon Owen:

    You're presupposing that the usage is sloppy and then coming up with post hoc rationalizations….

    Touché. I should have omitted the word "sloppy." The fact is, I'd be equally bothered by superbly skilled, elegant and correct English being justified by a call to popular opinion. The fact that most people believe something doesn't make it right — or wrong. Not in trivia quizzes, and not in English usage.

  61. Keith M Ellis said,

    January 3, 2013 @ 7:41 pm

    The fact that most people believe something doesn't make it right — or wrong. Not in trivia quizzes, and not in English usage.

    I don't even…

  62. Jonathon Owen said,

    January 4, 2013 @ 12:39 am

    @Miles:

    I'm still interested to know what you think does make something right in English usage.

  63. Miles said,

    January 4, 2013 @ 1:40 pm

    @Jonathon Owen:
    I saw that coming…. Well, there's no simple answer. My usual process is to consult a dictionary (the Oxford Canadian), the Canadian Press Stylebook and the Yahoo Style Guide. I'll look at etymology and tradition, and consult with colleagues whose knowledge and opinions I respect.
    I also try to determine a rationale for whatever usage, spelling or style is being considered.
    A good example is the decision about whether or not to hyphenate "e-mail." Colleagues' opinions vary, but I feel there are several very good and logical reasons to retain the hyphen (pronunciation, consistency with other initial-based words, and the fact that "email" is French for "enamel" being chief among them), and not a single rational argument for losing the hyphen.
    The fact that many style guides and dictionaries now prefer the non-hyphenated version in no way convinces me it is right. I need to know why, based on a lot more than just "most people do it."
    I have no problem at all changing my usage practices in an instant if I can see a sensible reason to do so. When it comes to "data," I'm far from convinced.
    Back to the topic at hand, I think the term "big data" (or is it "Big Data"?) is merely a nugget of annoying jargon that will mercifully go the way of "cyberspace" soon. Unless you're writing for buzzword-addicted business types, explain it or avoid it. IMHO.

  64. Miles said,

    January 4, 2013 @ 2:00 pm

    I should also mention that Bill Walsh's authoritative, practical and entertaining book, Lapsing Into A Comma, is one of my most-relied-on resources for arbitrating usage, style, punctuation and grammar questions.

  65. J.W. Brewer said,

    January 4, 2013 @ 2:32 pm

    I'd like to see the example sentence where "email" could plausibly be construed in context as either "the English word also spelled e-mail" or "the French word meaning 'enamel.'" Canada must be a weirder place than I had supposed if that is actually a practical objection to an unhyphenated spelling of the English word.

  66. Miles said,

    January 4, 2013 @ 4:01 pm

    @J.W. Brewer:
    That's just one of a long list of reasons for rejecting "email" as a potential word in English. Not sufficient in itself to dismiss it, but it certainly adds weight. Anyone who's bought paint in Canada (and was paying attention) has seen the French word "email" often enough to recognize it. (All product labelling here is bilingual.)
    The main point is that if we want a letter to "say its name," we need a device to indicate so. In standard English, that's a hyphen, sometimes combined with capitalization, e.g. "A-frame." In the advertising world, we get creations like iPod, where the pronunciation of the "i" is indicated by the change in case.
    Meanwhile, "e-mail" is an initial-based term, like U-boat, T-shirt, I-beam and the aforementioned A-frame. Removing the hyphen turns it into nonsense.
    I'm open to being convinced otherwise…. Got anything?

  67. Keith M Ellis said,

    January 4, 2013 @ 6:01 pm

    A long time ago, in a galaxy far, far away, I was the director of the customer support department at a regional ISP. The head honcho, part owner, was pretty much exactly the embodiment of the very worst stereotypes of a prescriptivist, peeving, tyrannical boss. And he, like Miles, just couldn't accept the un-hyphenated email. He forbade it. In all contexts.

    One day he sent me an, er, e-mail demanding that I formally warn, in writing, one of the support staff for that person's use of email. You might say, well, house style and all, his prerogative. But this was for that employee's use of email within a support staff discussion on an internal mailing list, which the boss was, of course, monitoring (he liked to stand behind people and watch them work, too). I resisted up until the point that he threatened to write me up with a warning for refusing his order.

    That wasn't a fun place to work.

    …Ironically, as it happens, this ISP was owned by and grew out of a tabletop gaming company. (Some readers will know who/what this is and of the fascinating, unique, and important 'net-culture history.)

  68. Rick Bryan said,

    January 4, 2013 @ 6:06 pm

    We sure need an uncountable noun to refer to a mass of, well, data. Evidently "data" has been pressed into service, taking on that job while still serving in its original role as the plural of "datum". The pedant who objects had better propose a better replacement. I hate hearing "media" as singular, but I have no rebuttal to my own argument here.

  69. DW said,

    January 5, 2013 @ 8:44 am

    As another humble copy editing gnome, I completely agree that "plural data" has become a fetish and shouldn't be enforced thoughtlessly.

    I'm glad to see, however, that a few others are starting to speak up about the relentless contempt for copy editors that is routinely expressed on this blog. It is tiresome, and mindless. Can't you get that copy editors are just people working for a living, that we don't write the style manuals? On languagelog, everyone else's linguistic peccadilloes are treated as just so much fascinating fodder for linguistic analysis. Copy editors' peculiarities are thought to make us simple morons.

    Is it so hard to understand that, as offensive a practice as "search and replace" editing may be, copy editors work under time constraints that are not of our choosing?

    I presently have my dream job, and I (usually) get enough time to do the job right, and I often earn authors' appreciation for handling their text thoughtfully and carefully. But I have also had plenty of experience with the type of job most copy editors have, where the material that has to be got through is about sixteen times as much as anyone could possibly even read in the amount of time allotted, let alone make actual helpful editorial improvements. And that's the way it is. Does it ever occur to you that MONEY is the root of this, not the low IQs of copy editors?

  70. DW said,

    January 5, 2013 @ 9:13 am

    A couple other points. First, yes, "cess pool" is accurate for the state of many raw manuscripts in academia. I am going to go out on a limb and suggest that, although I have not done any editing in linguistics, I bet linguists' manuscripts are no different.

    Second, authors would often be surprised to get a glimpse of the lengths good editors will go to in attempts to rescue authors' work from editorial pettiness. Some journals run editorial software that ostensibly "cleans up" manuscripts pre-production, but also is often based on enforcement of terrible petty rules. I freelanced for a while for a journal running software like this, and one of its features was a refusal to accept any sentence beginning with "However." It flagged them and literally wouldn't allow you to finalize the paper if any sentence in the file began with "however"; and you could not return the paper to them – and get your paycheck, and get future work from them … if the paper didn't "pass" the software check. But then, authors would write to us complaining about this. The freelancers working for this journal often compared notes – exchanged grievances, that is – and ultimately we figured out how to trick the software into overlooking "howevers." I also got in the habit of making lists of little things that should be done with the files AFTER the software check, which technically wasn't allowed – for instance, reverting sentences with "however" in them to stet the author's original text. This is time consuming, and the publisher didn't want to pay us for that, believe me. They bought the software in the first place in order to pay freelancers less!

    Finally, let's not overlook that editors also often deal with authors who insist on their own petty grammatical rules, and decide the copy editor herself isn't competent if she didn't change "hopefully" to "I hope that …" (or worse, "It is to be hoped that …") With my regular authors, I usually have little mental checklists of the individual's preferences/ peculiarities/ things they expect me to change, things they won't allow me to change, etc. Regardless of my personal opinions about "the rules" – or their idea of the rules – working with authors in a way that is helpful to them is my ultimate job description. I have to pick and choose my battles, and really, "battling" isn't helpful to an author anyway.

    Really, we aren't all morons. We actually are usually smart, well educated people who are interested in language. You really just shouldn't look at a final text and conclude that anything that may be wrong with it is the copy editor's fault.

  71. DW said,

    January 5, 2013 @ 9:20 am

    Jonathon Owen:

    "The problem isn't that 50 percent are in the bottom half; it's that even the good ones too often err on the side of following the rules rather than following their instincts about what makes good prose."

    This is true, but it's not because we're just not too bright. It's because it takes MUCH LONGER to edit thoughtfully and carefully – reading a text for understanding, and stepping back to think, paragraph by paragraph, "Is this good prose? How can the prose in this sentence or paragraph be improved?" – than it does to just barrel through a text mechanically applying a checklist of memorized style points, such as "data are" not "data is" …

    Guess which of these our employer pays us to do? Guess what happens to us if we decide we'd like to edit thoughtfully, i.e., take twice as long as allowed with the text? (We get fired.)

  72. Richard Wein said,

    January 5, 2013 @ 1:19 pm

    Miles,

    The reason why those style guides recommend "email" probably is because that's what most people write. But there's a difference between recommending a usage and saying that it's "right".

    Linguistic rightness is a very fuzzy concept, and there's often no one right answer. Both "email" and "e-mail" seem acceptable to me, though I use "email" myself. If "e-mail" suits you better, that's fine.

    I think you must accept that frequency of use has some relevance to the question of how to speak. You probably won't want to continue using "e-mail" if one day you find you're the only person in the world still using it. Languages inevitably change over time. On the other hand, you don't have to bow to the current majority usage. That's just one factor to consider. Consistency with more general rules is another factor. But no one factor is absolute. Some very well-established usages conflict with more general rules, such as "children" as the plural of "child". We continue to accept that inconsistency because it's so well-established.

    In choosing the most appropriate usage we need to weigh a number of factors, and there are no fixed rules for this weighing. So there's plenty of scope for different people to make different judgements, without any of them necessarily being wrong.

    There are factors favouring both "data are" and "data is". I think "data are" is on the way out, but I'm sure it will linger for a long time. Personally, I'm in a transitional state where "data is" offends my sense of grammar, but "data are" sounds pedantic. Since neither strongly commends itself to me, I use both, depending on context, mood and whim.

    On the whole, foreign forms tend to get anglicised over time. I'm happy with that process, and I made the decision a long time ago to switch to "formulas", "stadiums", etc. I guess I'm a linguistic follower, not a leader, since I waited for those forms to be well-accepted before I switched. I don't stick my head out. But if there were no leaders, no inventors of new usages, language would never change. How dull that would be.

  73. J.W. Brewer said,

    January 7, 2013 @ 2:14 pm

    http://www.theonion.com/articles/4-copy-editors-killed-in-ongoing-ap-style-chicago,30806/ is a timely warning to everyone not to gratuitously insult members of the notoriously violence-prone copy-editing profession. Miles may wish to note the link between hyphenation and violence.

  74. Rubrick said,

    January 7, 2013 @ 8:06 pm

    J.W. beat me to that.

  75. Daz said,

    January 11, 2013 @ 8:20 am

    In a large majority of contexts it's most felicitous to construe data as singular.

    But in a discussion of this and that individual datum — as is needed occasionally in statistical analyses — then in that context it's most natural to construe data as plural.

  76. Miles said,

    June 3, 2013 @ 6:28 pm

    Analysises.
