Here is one of the saddest facts about language and culture that I have noticed in quite a while: the search pattern "before turning * gun on himself" gets tens or even hundreds of thousands of hits on Google.

Most grammatical six-word sequences are very rare. That is one of the things that made Kaavya Viswanathan's plagiarism so easy to spot. For a six-word phrase to be moderately common it has to be a fixed phrase of some kind, such as a cliché, an idiom, or a proverb. I made up the six-word phrase "before anyone had even noticed it" literally at random, and Googled it, and got only a single hit in all of the web's trillion words. But shooting rampages by suicidal maniacs have become so common (there was another one in Alabama yesterday) that "before turning the gun on himself" and "before turning his gun on himself" have become commonly encountered clichés in news sources.

The reason I don't give actual numbers is that it is hard to establish accurately how many times a phrase occurs, using Google. (A commenter who signs himself Forrest below explains why.)

Through a weird and horrible coincidence, while I was writing this post (which I have entirely rewritten because it was so badly misunderstood in its first draft), unknown to me, a new shooting rampage was occurring in Winnenden, near Stuttgart, in Germany. The gunman must have been committing suicide just about the time I first posted. And sure enough, in the Daily Telegraph account of it the phrase "before turning the gun on himself" turns up once again (in a reference to an earlier incident in 2002). The BBC News website uses the phrase yet again, referring to a 2006 incident in Germany. The ITV News site uses the phrase of the Winnenden massacre.

Most of the comments that piled up below the original draft of this post simply cited phrases, or lists of phrases, that get huge numbers of hits. This was disappointing, not only because the result is boring, and didn't quite connect with my point, but also because it diluted the pool, so the three or four really interesting comments were hard to find. I have done a ruthless cull, and deleted all the ones that merely cited a phrase and said how many hits it got. (I probably should have deleted more. See the Language Log Comments Policy. It is not incumbent upon us to host in our comments area any random observation that floats across your mind.)

It is not surprising that phrases with high hit counts can be found if you go for formulaic ways of saying things that are often said (like "Email this page to a friend"). There are so many six-word phrases in English that there is room for thousands of them to have become clichés or otherwise familiar expressions. One commenter cited "of the United States of America", for example (this is really two or three lexical items rather than six: it has the same syntax as "of the USA", or "of France") — an even more deeply boring example than the many others. But even if there are thousands of high-hit-count six-word phrases, it can still be the case (and almost certainly is) that most six-word phrases are very rare.

Anyway, my main observation here is not about the rarity of arbitrary phrases. It is about the sad fact of this one having been called upon so often, as school massacres go on and on racking up their victim tallies.


  1. DonBoy said,

    March 11, 2009 @ 11:22 am

    I present to you a quote from Dan Conner on TV's Roseanne:

    What a beautiful day – the kind that starts with a hearty breakfast and ends with a newsreader saying "…before turning the gun on himself…"

    (Roseanne's response: "What, so now I'm supposed to make you breakfast?")

  2. Mark Liberman said,

    March 11, 2009 @ 11:45 am

    For five-word phrases, a version of the question "What five-word phrase occurs most often on Google?" can definitively be answered by reference to the Web 1T 5-gram corpus, created by researchers at Google, which contains English n-gram counts from about one trillion words of web text.

  3. Randy Alexander said,

    March 11, 2009 @ 12:39 pm

    One day I went to a public bathhouse. Usually, being the only white guy there (here in northeast China), I tend to attract conversation. Soaking there in the meltingly hot pool, my supernal relaxation was interrupted by "I hear that if one memorizes 300 sentences, one can carry on all sort and manner of conversations. Is that true?"

    I tried to sink down slowly and drown myself, but the asker's look of sincerity pulled me back up. I patiently explained that no, that's not true because you can never tell what your interlocutor will say. If it's not one of those 300 sentences that you memorized, then you're shit out o' luck!

    Most of the sentences that we come up with every day are unique. One can just google them to see the truth in this.

    In language learning this is a very important concept. Memorizing sentences is almost completely useless. One should understand the grammatical constructions and some basic vocabulary. With 25 one-word subjects and 25 intransitive verbs, one can make 625 unique sentences. Add 25 one-word objects (these can be the same as the subjects so no new words need be memorized) and you're up to 15,625. Add determiners, etc, and you're through the roof.

    In a foreign language, you can memorize how to ask where the nearest bathroom is, but will you be able to understand the answer? (Maybe if it involves pointing.) But learn a little grammar and a little vocabulary and you're really off to a big start.

  4. Benjamin Zimmer said,

    March 11, 2009 @ 1:00 pm

    Julian Orbach, on his now defunct somethinkodd.com page, wrote about commonly occurring N-grams on Google. He called them "PopSents" (since each N-gram had to form a sentence). Details here.

  5. Maria said,

    March 11, 2009 @ 1:23 pm

    Randy, that's an interesting story. I'm a nonnative speaker, and while I've never gone ahead and memorized whole phrases, I sometimes feel that polite, day-to-day, short conversations – especially with other nonnative speakers – have a stilted feel to them. Like we're speaking right out of an ESL textbook. This is not true, almost tautologically, about meaningful conversations.

    I wonder if it's because we're nonnative speakers, and so we've somehow learned stock phrases, if it's in my head, or if it's a more widespread feeling.

    By the way: "I am not a native speaker" gets 1,040,000 hits.

  6. Ron said,

    March 11, 2009 @ 2:15 pm

    In language learning … [m]emorizing sentences is almost completely useless. One should understand the grammatical constructions and some basic vocabulary. With 25 one-word subjects and 25 intransitive verbs, one can make 625 unique sentences. Add 25 one-word objects (these can be the same as the subjects so no new words need be memorized) and you're up to 15,625.

    Exactly! I never understood the math before, but I have used just that strategy to "pick up" basic conversational levels of several languages. Not, sadly, Japanese, which I had to study intensively just to speak it poorly.

  7. Zach said,

    March 11, 2009 @ 2:58 pm

    David Schmader of the Stranger has over the years made a motto of "If you are planning a murder/suicide, please do the suicide part first." It's sad that he's had so many occasions to give that advice, and that the final 6 words haven't made more of an impression on their target audience.

  8. wally said,

    March 11, 2009 @ 3:06 pm

    In the flogging a dead horse department, I must say, I really, really don't get this about Kaavya Viswanathan's supposed plagiarism. I didn't get it the first time, and it bothered me, and I don't get it now. Maybe I am just not clear on what is plagiarism.

    Looking at the Wikipedia page, there are obviously a fair number of similar turns of phrase. (No one I think is claiming any plagiarism on a higher level such as ideas or plot). And clearly, as she said, she had read the other books and they had made a strong impression on her. And as is the case of early works of many creative people – Bach, Picasso, Bob Dylan – who clearly were studying and influenced by previous masters, the works they were studying come shining thru.

    But am I really supposed to believe that in the borrowed phrase "department stores, and 170 specialty shops later" the number 170 is so magically right that she went back and checked the book and copied it exactly? Or do I believe that she liked the phrase when she read it, so it stuck in her head. The full phrase. And an hour later, something a friend said stuck in her head. And 15 minutes later an ad she heard made an impression. And when she was writing, all of these came out, all mixed together.

    If I write a scientific paper that starts "Here is one of the saddest facts about computational complexity that I have noticed for quite a while." am I plagiarizing Language Log? Really? I don't get it, but I suppose I will be enlightened soon.

    [I doubt that I can convince you, wally, but yes, you really should believe that "170 specialty stores later" was copied directly out of another novel. I cite other facts in my post. If you continue to believe it was some mysterious subconscious sticking-in-the-mind phenomenon, with no dishonesty involved, I'd say you're very naive. Only one thing might save Kaavya Viswanathan's reputation, and this has rarely been mentioned in the press: it is possible that the people working for the agency that promoted her wrote the novel for her too, and they did the plagiarizing, only she wasn't allowed to (or chose not to) rat on them. We will probably never know. —GKP]

  9. Stephen Jones said,

    March 11, 2009 @ 3:31 pm

    Actually, Randy, memorizing long phrases is incredibly useful if one has the grammatical knowledge to be able to make substitutions.

  10. Forrest said,

    March 11, 2009 @ 4:31 pm

    The reason the numbers change (sometimes wildly) every time you run a query through Google isn't the obvious answer. New sites showing up on the web using a given phrase, and old ones being taken down, probably don't account for much variation for any given phrase.

    Google maintains a gargantuan database, and wants the fastest access possible, no matter where a user sends in a query from. They use a distributed system, and have lots of data centers with more-or-less duplicate copies of the whole shebang. But they're not perfect copies; data is pushed around, and can take a while to be reflected in any given cluster of servers. Your query goes to the one that's least busy (probably taking other factors into account), which may not be the same one you talked to last time. If not, they're almost certainly out of date compared to each other.

  11. Tamara said,

    March 11, 2009 @ 6:06 pm

    Actually, memorizing sentences in foreign languages has its uses. When I was a Peace Corps volunteer in Nepal, I would frequently be asked a question and struggle to answer it. My questioner would sometimes take pity on me and supply the word or phrase they thought I was looking for. I was careful to remember what they said for the next time that question came my way (when you are a foreigner in a small town in Nepal, you get asked the same damn thing 10 times a day). By the time I was asked for the sixth or eighth time, I could give a long and fluent answer with perfect grammar.

    Of course, I literally did not know exactly what I was saying, I just knew that the Nepalis seemed to be satisfied with the answer and asked follow-up questions that made sense. (Except, of course, when they didn't, and I realized I was not saying what I meant to be saying. Then I was back to struggling to express myself.)

    Point is, there is a place for memorization.

  12. Simon Musgrave said,

    March 11, 2009 @ 8:04 pm

    @randy: With 25 one-word subjects and 25 intransitive verbs, one can make 625 unique sentences. Add 25 one-word objects (these can be the same as the subjects so no new words need be memorized) and you're up to 15,625. Add determiners, etc, and you're through the roof.

    I hate to be picky, but if you stick to your intransitive verbs, you're up to 15,625 sentences of which 15,000 are ungrammatical.
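
The arithmetic in this exchange can be checked in a few lines. A minimal sketch in Python, assuming (as Randy's figures do) that each slot is filled independently from a list of 25 words; the variable names are illustrative only, and the last step is a guess at where the 15,000 comes from, namely the 15,625 total minus the original 625:

```python
# Counts taken from the thread: 25 one-word subjects, 25 intransitive verbs,
# 25 one-word objects.
subjects = 25
intransitive_verbs = 25
objects = 25

# Subject + verb: every pairing is a possible sentence.
sv_sentences = subjects * intransitive_verbs

# Subject + verb + object: every triple, counted without regard to grammar.
svo_strings = sv_sentences * objects

# The figure in the reply: the total minus the object-less sentences.
difference = svo_strings - sv_sentences

print(sv_sentences, svo_strings, difference)  # 625 15625 15000
```

The counts only multiply this way if the verbs actually accept objects, which is the point of the reply: Randy's second step needs transitive verbs.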

  13. pavel said,

    March 11, 2009 @ 8:49 pm

    To reply to wally:
    "If I write a scientific paper that starts "Here is one of the saddest facts about computational complexity that I have noticed for quite a while." am I plagiarizing Language Log? Really?"

    A key difference between using very similar locutions in a scientific publication and in a literary one is that the style in a scientific article is clearly subsidiary to the message; in science, the key act of creation is the idea/finding rather than the way it's expressed. In literary works, on the other hand, the style is as important as the 'message' (the plot, etc.) – if not more so. Isn't it recognized that the construction of the prose is a key aspect of the writer's art?

    So, if Kaavya Viswanathan had read a particular turn of phrase in another work and it was so compelling it stuck in her head, then the original writer deserves the credit for her/his writerly craftsmanship. Subsequently using it as one's own would seem like plagiarism.

    In fact, even in scientific publications, I have sometimes found myself quoting a passage directly (and attributing of course), not because I couldn't paraphrase the content, but because the original had been phrased in a particularly lucid or concise way, and it would seem dishonest not to indicate that the wording was not original to me.

    As for creative influence in artistic works, yes, it is perfectly fine to mimic the style of another (though pastiche is often derided), but a fine line is tread. When students of classical music learn to write in the style of Bach, this essentially means his harmonic and contrapuntal style. Somehow, copying a melody from Bach (where it's not obviously a work of homage, or postmodern) seems dishonest.

  14. Neal Whitman said,

    March 11, 2009 @ 10:00 pm

    I got curious about the phrase "before turning the gun on himself" a few years ago; specifically, about whether the R-inference that the shooter then pulled the trigger and didn't miss always turned out to be true.

  15. Peter Howard said,

    March 12, 2009 @ 8:18 am

    While not denying that it's sad that school massacres happen, I wonder how much the frequency of hits returned by the search pattern "before turning * gun on himself" tells us about the frequency of such massacres. School massacres are always very widely reported, so a single event will generate a lot of news articles. And, perhaps because it's ironic and slightly euphemistic, the phrase is very likely to be used when reporting an incident where someone shoots someone else and then themselves.

    Interestingly, (well, I found it interesting and provoking, and I'm not sure what to make of it) when I entered the pattern "before turning * gun on herself", Google asked me if I meant 'himself' rather than 'herself' before returning a slightly higher number of hits.

  16. bianca steele said,

    March 12, 2009 @ 9:48 am

    Something about the Viswanathan story, I'm not sure what, made me think of the guides to help undergraduates avoid plagiarism that say something along the lines of, "if you couldn't have come up with the idea yourself but rather got the idea from something you read, you must cite the book in which that idea originally appeared." Wally's comment strengthens this idea in my mind.

  17. Jonathan Lundell said,

    March 12, 2009 @ 10:14 am

    On taking Ghits with a grain of salt: a spot check just now yields these rather contradictory results.

    249,000: "before turning the gun on"
    55,300: "before turning the gun on himself"
    57,600: "before turning the gun on herself"
    856: "before turning the gun on" -"before turning the gun on himself"

    FWIW, the last search returned mainly the "herself" variation.
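
The contradiction in these figures can be spelled out. A rough sketch in Python, assuming each number is a count of pages matching the query, so that every page containing the longer phrase also contains the shorter one; the dictionary keys are just labels for the searches listed above:

```python
# Hit counts as reported in the list above.
hits = {
    "prefix": 249_000,            # "before turning the gun on"
    "himself": 55_300,            # "before turning the gun on himself"
    "prefix_minus_himself": 856,  # the prefix query, excluding "himself" pages
}

# If the counts were exact, removing the "himself" pages from the prefix
# pages would leave exactly this many.
expected_remainder = hits["prefix"] - hits["himself"]

# How far the reported exclusion count falls short of that.
shortfall = expected_remainder - hits["prefix_minus_himself"]

print(expected_remainder)  # 193700
print(shortfall)           # 192844
```

On exact counts the exclusion query should return 193,700 pages rather than 856, so at least one of the reported numbers cannot be a true count.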

    "Before shooting/killing himself" yield high counts as well, but don't carry the implication of shooting someone else first quite so strongly as "before turning the gun…", which may give the "turning" version a push in frequency. If you were writing a news story on a short deadline, how would you rephrase it to still do the work of the six-word cliche?

  18. davidn said,

    March 12, 2009 @ 10:24 am

    Forrest's explanation of varying Google hit counts is essentially correct, but I don't think it's complete. It's true that the data is constantly changing and queries are directed to different clusters, but just based on that, the variation in hits should be proportional to the actual variation on the web, that is, fairly small, and that's not always true.

    The more interesting part is that looking up the documents that match a given query involves many different algorithms interacting at different levels, and they're optimized for producing the best set of top results, not the most accurate count of all results. Because of all the layers involved, the only way to get an accurate count of matching documents would be to list them all, which is too computationally expensive. So it basically guesses at a count based on some of the information it gathered while computing the best results. And because of various thresholds at intermediate points in the process, the guess depends on the contents of the index in a nonlinear way.

  19. Peter Howard said,

    March 12, 2009 @ 11:16 am

    I tried traipsing through the results, using the 'herself' variant, and the number of expected results dropped dramatically. Google was saying 1860 by the time it showed me result 520, and wouldn't go any further.

    And it was reporting things like 'before eventually turning the gun on herself' as hits, which don't match the search schema.

    So I don't think Google is actually much good for reporting the number of matches – maybe it would be less misleading if it adopted a 'one, two, many, lots' counting scheme.

  20. Kevin Iga said,

    March 12, 2009 @ 12:03 pm

    Pavel brings up a good point about the fact that scientific writing has more rigid expectations as to style and turns of phrase than narrative. Could this be true of journalistic writing, too? Could it be that newspapers are happy to stick to an already common phrase that has a "journalistic ring" to it because of its frequency, rather than try something new which doesn't have the backing of tradition?

    Aside from this, just how many ways are there to express the notion that the killer committed suicide, which:
    1. Express the fact that the same gun was used by the killer to kill himself (as opposed to the killer jumping off a bridge when pursued)
    2. Are fairly concise (within a few words of the 6 Pullum cites)
    3. Are not too gruesome
    4. Are not too repetitious
    5. Are not too awkward?
    I have a hard time coming up with many alternatives.

    I get the sense that these events are fairly rare (of course, one is one too many) in that every instance is reported in the news, usually even worldwide news. Compared to, say, homicides due to gang rivalries, domestic abuse, or botched burglaries. And because such shooting/suicide stories are rare, they are sensational, and as a result, a turn of phrase can quickly spread (often a Reuters story is printed with minor edits in a large number of newspapers). It then becomes a handy phrase the next time it occurs.

    In other words, perhaps the commonness of the google hits is a reflection of the rarity of the event, rather than its frequency.

  21. Julia Kriz said,

    March 12, 2009 @ 1:30 pm

    It's been said in more detail by Peter and Kevin above, but my immediate thought was that the fewer common phrases there are – i.e., the more cliched the most common phrase is to describe something – the rarer it is (though see Language Log articles about having "no words" for things).

    For example, the uncommon phrase "before anyone had even noticed it" could apply to situations that happen hundreds of times per day. Because these situations are so common, they're not noteworthy, so we don't describe them often, or we find diverse ways to describe them that keep the focus on other parts of what we're saying.

    But the cliched phrase "before turning the gun on himself" applies only to very rare situations, which makes them noteworthy. Combine that with the journalistic tendency to coin and reuse these kinds of soundbites, and you get a common phrase.

  22. Ginger Yellow said,

    March 12, 2009 @ 2:04 pm

    I suspect that part of the reason for the phrase to turn up so frequently, beyond the fact that journalism is full of phrases that wouldn't be used in normal language, is that murder suicides are particularly shocking events and using a formula like this is an easy way to soften the blow for readers without being censorious.

  23. Simon Cauchi said,

    March 12, 2009 @ 2:37 pm

    Pavel writes: "As for creative influence in artistic works, yes, it is perfectly fine to mimic the style of another (though pastiche is often derided), but a fine line is tread."
    That last word should be "trod" or "trodden". The verb goes "tread" (plain form), "trod" (past tense), and "trod" or "trodden" (past participle), unlike the corresponding "spread", "spread", and "spread". I remember making similar mistakes myself when trying to speak German.
    Gerard Manley Hopkins once wrote a (rather bad) line of poetry which goes "Generations have trod, have trod, have trod". (I think I'm right, but I'm relying on fallible memory here.)

  24. Craig Russell said,

    March 12, 2009 @ 4:17 pm

    @ Simon Cauchi

    OED lists "tread" as a potential past participle, e.g.:

    1687 A. LOVELL tr. Thevenot's Trav. II. 86 Being trampled and tread upon.

  25. Simon Cauchi said,

    March 12, 2009 @ 5:05 pm

    Er, obsolete past participle, I think. In the OED "tread" the past participle is attested only for the seventeenth century. But I suppose it can't be absolutely obsolete if Pavel just used it!

  26. Merri said,

    March 13, 2009 @ 5:34 am

    One should note that memorizing sentence patterns and substituting words works well in some languages, and less well in others, where the substitution will affect non-substituted words. Perhaps that's why it seems to be difficult in Japanese.
    When the same determiner takes different forms according to what it refers to (as in Bantu languages), substitution is more difficult.
    Or when the same idea is expressed in different ways according to what it refers to. Think of set names for animals in English.
    Classical Greek, for example, creates many such problems. OTOH, you wouldn't be anxious to speak it fluently.

  27. rootlesscosmo said,

    March 13, 2009 @ 9:52 pm

    Could this be true of journalistic writing, too? Could it be that newspapers are happy to stick to an already common phrase that has a "journalistic ring" to it because of its frequency, rather than try something new which doesn't have the backing of tradition?

    @Kevin Iga: this goes to the question of what lawyers call "boilerplate"–more or less fungible sentences that need only a few specifying details to serve their function. In some contexts, including journalism and science writing, recourse to boilerplate may actually help persuade editors and referees that the work meets professional standards; if a paper says "the animals were sacrificed" it's science, if it says "I killed the mice" it's macabre, though the event is the same.

  28. FeRD_NYC said,

    July 25, 2010 @ 10:57 am

    There's a utility to "boilerplate" and common turns of phrase beyond even what rootlesscosmo suggests — they can also be helpful to the reader. An example:

    A few years back, one particular bank in my area attempted to "humanize" their ATMs. In some misguided bid to make the interface more "friendly", the standard prompts and options were rewritten to use less stilted language and more casual phrasing. Messages to the user were frequently written in first-person form ("What can I help you with?", "Please tell me your secret code"), and even the "Yes" and "No" selection buttons became "Sure" and "No, Thanks".

    I doubt that software is still in place, but I wouldn't know — it drove me away from using those ATMs almost immediately. The simplest transaction took me at least twice as long on their machines, and the number of times I chose too quickly and accidentally hit the wrong option became intolerable. It was quicker and less aggravating to walk an extra block to a different bank, than to face their "friendly" interface.

    We expect to see "Yes" and "No", when we're making choices at an ATM. They don't use those particular words because they're thrilling and clever turns of phrase, but because they're not! They're the most common, boring representations of their respective concepts. We're immediately able to match those words to the choices they represent, and we don't even need to consciously read the labels. But replace "Yes" with "Sure", and you're forced to read the words, then actively think about their meanings and determine which one represents your intended response, before you can press the correct button.

    Cliches and "standard" phrasings serve a similar purpose, in purely utilitarian communication. I can recognize and derive meaning from the words "before turning the gun on himself" without even reading the entire phrase, because it's immediately familiar to me. Which, in turn, means that I can read an article written using such phrasing far more quickly, and without devoting nearly as much concentration to the act, as when I read something crafted using unique, elegantly-formed, interesting sentences.

    Interesting, unique writing has its place, and its appeal. But to facilitate the simplest and most efficient transfer of knowledge/information, so does its opposite.

