The best statistical test

Today's xkcd:

Mouseover title: "We reject the null hypothesis based on the 'hot damn, check out this chart' test."

When I was an undergrad, we called this the method of interocular trauma (i.e. "what strikes the eye").


  1. mg said,

    December 21, 2020 @ 5:32 pm

    In the case of the vaccines for COVID-19, absolutely. On the other hand, I've seen graphs that give interocular trauma where the N was small or the standard deviation huge, so that adding confidence bands led to seeing one big cloud.

    [(myl) In my opinion, for the method to be valid you need to plot every data point, not just a model fit (which indeed can be totally misleading when the confidence intervals are large…).]

  2. martin schwartz said,

    December 21, 2020 @ 5:40 pm

    So it may now be said that "data" is dead as a plural contrasting, as it still does in my usage, with the singular "datum". I do remember noting that linguists were using "data" as a singular fifty years ago. I would be a proscriptive linguist if I didn't know, as a historical linguist, how futile and silly that would be. I mention all this because I think of Mark Liberman as a stickler for conservative

  3. cameron said,

    December 21, 2020 @ 5:52 pm

    Yeah, "data" is a grammatically singular mass noun. "Datum" exists as an option, but is only slightly less ridiculous than using "raviolo" as the singular for "ravioli"

  4. Y said,

    December 21, 2020 @ 6:02 pm

    As a professor of mine said once about something, "That's not a subtle effect!" Not as witty as "interocular trauma", but it's what stuck with me for this kind of observation.

  5. Y said,

    December 21, 2020 @ 6:08 pm

    Personally, I tend to use data as a mass noun with singular agreement "the data shows…" when talking about data collectively, say about a body of observations by many people over a period of time; but with plural agreement, e.g. "the data show…" from closer up, say a collection of data points within a particular study. Either usage is acceptable to some degree.
    Datum is not as ridiculous as raviolo but sounds a little old-fashioned or pedantic. Data point seems like the common usage.

  6. Laura Morland said,

    December 21, 2020 @ 6:09 pm

    @Martin, I still use "data" as a plural noun, although truth be told I avoid using the word "datum."

    However, data, like ravioli, are most commonly seen in groups, and so avoiding the word "datum" is not a problem.

    Mr. xkcd could easily have written "Statistics tip: Always try to get data that are good enough that you don't need to do statistics on them."

    I doubt anyone would have remarked on such a sentence. Am I right?

  7. Peter Taylor said,

    December 21, 2020 @ 7:13 pm

    @martin schwartz, do I dare ask your opinion on the use of the plural "datums" in geodesy?

  8. Daniel Barkalow said,

    December 21, 2020 @ 7:13 pm

    Note that the original graph was all horizontal and vertical lines, so you could get a lower bound on the events that were being shown. Also, the point of the graph was mostly that the vaccine doesn't work for the two weeks before the lines diverge, with a side point that you need a booster at the end because the red line is still going up a little. It's particularly impressive when your graph of the problems with your treatment can't help but show how much better than control it still is.

  9. martin schwartz said,

    December 21, 2020 @ 10:33 pm

    I used "datum" in writing an article yesterday, but it was unavoidable. I didn't know "datums", but if that's the usage in geodesy, fine. And fine to "data" as a singular, collective or not.
    Tho I still am conservative in such matters, I know that change is a potential built into language, and spend a lot of time pursuing more interesting exx. of the phenomenon. I will not speak of unravelling a raviolo, nor of confecting a confetto, or exspagating a spaghetto. I still DO NOT approve of epigraphers speaking of a graffiti; there graffito is still a technical term, I hope. On the whole however (and pardon my grandiose eloquence), WHATeverr.

  10. martin schwartz said,

    December 21, 2020 @ 10:39 pm

    …Nor will I order a biscotto in a café, especially avoiding such during the COVID crisis.

  11. martin schwartz said,

    December 21, 2020 @ 11:21 pm

    I meant "order a biscotti". And then there's the media/medium mess.

  12. Viseguy said,

    December 21, 2020 @ 11:51 pm

    Which is further out the door, datum or criterion?

  13. Andreas Johansson said,

    December 22, 2020 @ 1:02 am

    While I'm aware some people use "criteria" as a singular, I don't think I've run across anyone objecting to "criterion" before.

    At work, "datum" is a technical term meaning something that's the same in all versions of a product. It's apparently a and adjective, people say things like "That one is datum, those ones are also datum."

  14. Andreas Johansson said,

    December 22, 2020 @ 1:07 am

    Somehow I managed to type "a and" for "an". Sigh.

  15. David Morris said,

    December 22, 2020 @ 6:34 am

    Several years ago my wife and I had lunch in a very fancy restaurant and ordered 'raviolo' for entrée. I assumed it was a misprint until the waiter brought out one very big one.

  16. S Frankel said,

    December 22, 2020 @ 7:14 am

    @martin schwartz
    I believe that, in America at least, the singular of "biscotti" is "one of the biscotti."

  17. Mark P said,

    December 22, 2020 @ 9:00 am

    I think I have commented here before about someone I worked with at another company who would send out an agendum for a meeting.

  18. J.W. Brewer said,

    December 22, 2020 @ 10:07 am

    "Data point" is FWIW a key piece of evidence for "data" as a mass noun, because only mass nouns need singulative constructions, like "grain of sand" or "piece of furniture." ("One of the biscotti" is a bit more dubious a candidate.) Some languages (Welsh is I think a standard example) have regular morphological inflections for doing that, but English is more ad hoc.

    The google n-gram viewer, FWIW, shows datum becoming less common and data point more common since c. 1950 even though (in the sort of texts the relevant corpus captures) datum is still slightly more common.

  19. Robert Coren said,

    December 22, 2020 @ 10:55 am

    There's also panini, which seems to be immovably established as a singular in English (at least USAn) usage (likewise cannoli). And I don't think I've ever heard anyone refer to a zucchino, and I've been hearing and talking about the squash for a lot longer than the sandwich.

  20. Robert Coren said,

    December 22, 2020 @ 10:57 am

    And of course there are all the kinds of pasta, not just ravioli, but I'm not counting them because they do tend, as Laura Morland said, to appear in groups.

  21. mg said,

    December 22, 2020 @ 11:58 am

    You can tell you're speaking to a statistician if they use data as a plural noun (the younger ones no longer do, but those of my generation were trained that way).

    As to the vaccine having no effect for two weeks, it always takes time for the antibody response to a vaccine to fully develop. In the couple of vaccine studies I've worked on, our first check of the antibody titers was at 7 days post-vaccination – it takes time for the body to manufacture them.

  22. Armando di Matteo said,

    December 22, 2020 @ 12:15 pm

    I use "data" as a grammatically plural non-count noun, like "police". If I want a count noun, I use "dataset(s)" or "data point(s)" depending on which one I mean.

  23. Terry K. said,

    December 22, 2020 @ 12:31 pm

    @martin schwartz

    Yes, "a graffiti" seems wrong, but so does "a spaghetti", "a confetti", "a data". Those are collective/mass nouns that take a singular verb, not singular nouns. An instance of graffiti (or just graffiti with no article), a piece of spaghetti, a piece of confetti, a data point (or a piece of data, or a datum).

    On the other hand "a ravioli" is fine for me. More likely is, say, "6 ravioli", but that does put it firmly in the countable category, thus in need of a singular. And ravioli works for me as a singular.

    That's different from the others. No "10 spaghetti", "600 confetti", or "200 data". "2 graffiti" is marginally okay, along with "a graffito". (And "200 datums" works, but not "200 data".)

  24. Terry K. said,

    December 22, 2020 @ 3:57 pm

    It strikes me, the argument that you can't say or write "the data is…" because "data" is (allegedly) plural is similar to the argument that you can't say "they are" when using "they" for a singular person. And in both cases, it's trying to apply a logic that language doesn't use.

  25. Dan Faulkner said,

    December 22, 2020 @ 4:16 pm

    "Datum" is very much alive in the context of mechanical engineering drawings, but not in the sense of "data point." Rather, we use "datum" to refer to a particular feature on a part (e.g. a plane, line, point, axis, etc.) which is used to define a coordinate system. The positions of other features of the part are then measured (and reported) relative to those datums (yes, the plural is "datums"). We also use "datumed," as in "Are you sure that part is datumed correctly?" and "datuming" as in "I haven't finished datuming that drawing" or "I'm not sure if that datuming scheme makes sense." I suspect that surveyors use these words in similar ways.

  26. Julian said,

    December 22, 2020 @ 5:58 pm

    Datum, criterion, raviolo – anything else on the agenda?

  27. Josh R. said,

    December 22, 2020 @ 7:30 pm

    I feel like "data" could only go the way of a mass noun because it does not seem to go well with concrete numbers. One might very well say, "a datum," but something like "5 data," or even "many data," while technically grammatically correct, butts up hard against my Sprachgefuehl. The only thing that makes "data" a plural noun, then, is vestigial verb conjugation based on etymology.

    "Criteria," at least, has not yet gone this far.

  28. Graeme said,

    December 23, 2020 @ 7:04 am

    I differ. That graph needs some numbers.
    You are all reading this assuming it's a big trial in a region with widespread infections.
    So that the x-axis is a few months and the y-axis is appreciable numbers of cases.

  29. David W said,

    December 23, 2020 @ 4:13 pm

    "The North American Datum (NAD) is the horizontal datum now used to define the geodetic network in North America. A datum is a formal description of the shape of the Earth along with an "anchor" point for the coordinate system."

    — from WP, "North American Datum"

  30. David W said,

    December 23, 2020 @ 4:16 pm

    There is also "criterium," which is a kind of bicycle race; the plural is "criteriums."

  31. Robert Coren said,

    December 24, 2020 @ 10:37 am

    @David W: Which doesn't stop people from using "criteria" as a singular. Worse still, I see people using phenomena as one as well, and pluralizing it to phenomenae.

    I never studied either Greek or Latin formally, but I do recognize that they are different.

  32. BobW said,

    December 24, 2020 @ 1:56 pm

    I like using "datums" just to be funny, and did not know it had any real use case.

  33. Kimball Kramer said,

    December 26, 2020 @ 8:34 pm

    I did an experiment (sample size: 4 people) to determine the singular of confetti and found it to be "a small piece of colored paper". Try it yourself.

