Millionth word story botched


Paul JJ Payack, after all the run-up, has botched the story of the millionth word. The most amusing thing was that he forgot to write a script that would stop updating his headline when the millionth word was hit and exceeded, so at 11:30 a.m. in the UK he had this headline at his Global Language Monitor website:

The English Language WordClock: 1,000,001

0 words until the 1,000,000th Word

Oops! I think that should be minus one words, not zero words until the millionth!

The other thing he screwed up on was the fixing of the choice of word. He let his script decide — not a good idea when the whole point of the exercise is promotion and P.R. I'm not sure how his script works, but what it finally picked as the millionth "word" with at least 25,000 attestations on the web turned out to be: Web 2.0. Oops! First, that isn't a word, it's a phrase containing a noun (web) and one of those stylish postpositive decimal numeric quantifiers; and second, it is boring boring boring. If phrases containing numbers are allowed, no wonder there are a million words. I was scheduled to go to the BBC Scotland studio and talk about this in a couple of hours, but when the people at the BBC World Service heard that the millionth word was Web 2.0, and that among the runners-up was the two-word Hindi exclamation jai hoo, they dumped the story and told me not to bother going over to the studio. Quite rightly. Payack should have hand-picked a more convincing and likable word.

In addition, he should tell us what his criterion is for including phrases on his list. Recent "words" added include cloud computing, carbon neutral, slow food, shovel ready, zombie banks, overseas contingency operations, and ("word" no. 1,000,001) financial tsunami. How could anybody, however scanty their linguistic general knowledge, think all these were words rather than phrases?


  1. Cecily said,

    June 10, 2009 @ 6:55 am

    "they dumped the story completely"

    Poetic justice. LOL.

    I notice the third word was equally surprising, albeit for a different reason: if his magical algorithm is any good at trawling web pages, surely "noob" would have been picked up long before now?

    [Absolutely right. Noob gets about 14,000,000 hits, and the variant spelling n00b gets 5,130,000 more. How could he not have picked up his 25,000 attestations before this? The whole project is so bafflingly stupid it boggles the very few parts of one's mind that remain to be boggled. —GKP]

  2. Cecily said,

    June 10, 2009 @ 7:08 am

    Correction: in the GLM list, "noob" falls into the same trap as "web 2.0" as it uses numerals rather than alphabetical characters in the middle. (It's not obvious in the font they use, but if you paste it anywhere else, you can see.)

  3. Ian Preston said,

    June 10, 2009 @ 7:12 am

    They covered the millionth word story on BBC Newsnight yesterday and not in a particularly credulous way. You can find it here – the millionth word section begins about 37 minutes in. The host, Jeremy Paxman, opened the section by broaching the possibility that the research "may just be a ridiculous racket got up by a bloke in Texas with a convenient bit of software", guest David Crystal referred to it as "the biggest load of chicken droppings I've heard in a long time" and Paxman asked Payack whether it was all "rubbish" and whether he had "no shame" pushing it. I also heard LL's own Ben Zimmer dismantling its credibility on the BBC Today programme this morning, as can be listened to here.

    On a tangential point, to the best of my understanding, their putative millionth "word" "Jai ho" means something like either "Let there be victory" or "Victory ho!". I find it difficult to see how it could mean "It is accomplished" as they claim.

  4. Michael Moncur said,

    June 10, 2009 @ 7:13 am

    Actually it was "n00b" that made #3. According to the GLM website, "It is also the only mainstream English word that contains within itself two numerals."

    So apparently "p0rn" and "pr0n" and "pwn" and "1337" and "teh" and every other cute 'net misspelling count as words too!

    Also, apparently anticipating your criticism, the press release says that "Web 2.0" is the "1,000,000th English word or phrase," so now phrases count too.

    Oh, and "In addition, the 1,000,001st word is Financial Tsunami."

    There must be an alternate universe somewhere where this all makes sense.

    Kudos to the BBC for not treating this as news.

  5. Alan said,

    June 10, 2009 @ 7:14 am

    This cracks me up! I must update my blog after that last post that generated some interest and was quickly dashed as a load of rubbish even then.

  6. Mark Liberman said,

    June 10, 2009 @ 7:19 am

    This enterprise has an interesting mathematical aspect that's been mostly overlooked.

    If the criterion for wordhood is something like "25,000 uses in writing" (and even if "uses in text on the web" serves as a proxy), then the very long tail of old, rare words makes it fairly probable that the next string to achieve wordhood will not be a recent coinage, but rather an old string that's been poking along for a few hundred years, and pokes past the finish line because some random event (such as digitization or republication of some old works) gives it a little push.

    Depending on the rate of coinage and the statistics of the historical "long tail", it might turn out to be true that the "Nth word" (for any value of N) is much more likely to be old and rare than new and hot. (And nothing is changed if we choose to include, on some basis, strings with internal white space.)

    I made this point as a passing joke a few days ago ("End times at hand", 6/6/2009), by suggesting that the millionth word might turn out to be Simon Winchester's long-time favorite, mallemaroking. (I should confess here that I fabricated the quote from Paul Payack in the last paragraph of the excerpt from Winchester's article; and also the alleged OED citation for mallemaroking to a non-existent 1609 work by John Dee.)

    But jokes aside, the mathematical point is real. At some point, it would be fun to see what happens given plausible ideas of the scale of the overall text-generation process, the rate of new-string coinage, etc.
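
    A minimal sketch of that experiment, under loudly stated assumptions: every string gains attestations at a constant yearly rate, old strings sit somewhere below the 25,000 cutoff and trickle upward, and new coinages start near zero but grow fast. The Candidate class, the rate ranges, and the pool sizes are invented for illustration; none of this is GLM's actual method, which the post says is unknown.

```python
import random
from dataclasses import dataclass

THRESHOLD = 25_000  # attestation cutoff mentioned in the post


@dataclass
class Candidate:
    label: str    # "old" or "new" (hypothetical categories)
    count: float  # attestations accumulated so far
    rate: float   # assumed new attestations per year


def years_to_cross(c, threshold=THRESHOLD):
    """Years until a candidate reaches the threshold at a constant rate."""
    if c.count >= threshold:
        return 0.0
    if c.rate <= 0:
        return float("inf")
    return (threshold - c.count) / c.rate


def next_word(candidates, threshold=THRESHOLD):
    """Return the candidate that crosses the threshold soonest."""
    return min(candidates, key=lambda c: years_to_cross(c, threshold))


def make_pool(rng, n_old, n_new):
    """Build a toy pool; every numeric range here is an assumption."""
    pool = []
    for _ in range(n_old):
        # Old strings: anywhere below the cutoff, gaining a slow trickle.
        pool.append(Candidate("old", rng.uniform(0, THRESHOLD), rng.uniform(1, 50)))
    for _ in range(n_new):
        # New coinages: almost no history, but fast growth.
        pool.append(Candidate("new", rng.uniform(0, 1_000), rng.uniform(500, 20_000)))
    return pool


def share_old_winners(n_old, n_new, trials=200, seed=0):
    """Fraction of trials in which the next string over the line is an old one."""
    rng = random.Random(seed)
    wins = sum(
        next_word(make_pool(rng, n_old, n_new)).label == "old"
        for _ in range(trials)
    )
    return wins / trials
```

    Calling share_old_winners(5000, 20), say, estimates how often an old straggler beats the hot coinages to the finish line. The answer depends entirely on the assumed ranges, which is the part that would need plausible real-world estimates.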

  7. Mark Liberman said,

    June 10, 2009 @ 7:28 am

    Following up on Geoff's point in an earlier comment, I'll add the obvious web-search counts for "Web 2.0":

    Google 95,800,000
    Yahoo 305,000,000
    Bing 2,060,000,000

    These counts are unreliable in detail, of course, but they'd have to be off by three to five orders of magnitude for Payack's result to make sense.
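
    The "three to five orders of magnitude" figure can be checked with a few lines of arithmetic. This sketch assumes the 25,000-attestation cutoff cited in the post and uses the three hit counts above; the function name is made up for the example.

```python
import math

THRESHOLD = 25_000  # attestation cutoff cited in the post

# Hit counts for "Web 2.0" as listed in the comment above.
hit_counts = {
    "Google": 95_800_000,
    "Yahoo": 305_000_000,
    "Bing": 2_060_000_000,
}


def orders_over_threshold(count, threshold=THRESHOLD):
    """Base-10 orders of magnitude by which a count exceeds the cutoff."""
    return math.log10(count / threshold)


for engine, count in hit_counts.items():
    print(f"{engine}: {orders_over_threshold(count):.1f} orders of magnitude over")
```

    Each engine's count lands between three and five orders of magnitude above the cutoff, which matches the range quoted.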

  8. Ginger Yellow said,

    June 10, 2009 @ 7:50 am

    Even if we were to accept that "Web 2.0" is a word, it's been around for years, which means Payack's own algorithm has undermined his credibility about as much as any criticism of him has.

  9. Spectre-7 said,

    June 10, 2009 @ 8:00 am

    So apparently "p0rn" and "pr0n" and "pwn" and "1337" and "teh" and every other cute 'net misspelling count as words too!

    In at least some of those cases, I will attest to the variants taking on lives of their own beyond simple 'net misspellings. In particular, pwn has unique pronunciation(s), and is used in spoken conversation with a particular meaning related to but separate from own.

    While p0rn/pr0n and teh/t3h are just simple, comical replacements, I would argue that both pwn and 1337/leet are in fact new words. Of course, your results may vary. :)

  10. Grep Agni said,

    June 10, 2009 @ 8:48 am

    Speaking of words with embedded numerals, there is the movie title Se7en (pronounced "seven"). Also, I agree with Spectre-7 that leet-speak has introduced new words to the English lexicon. I'd be surprised if leet becomes widely used, but pwn may well do. It could join cwm among Scrabble players' favorite words.

  11. John Cowan said,

    June 10, 2009 @ 9:20 am

    "Teh" has a unique pronunciation too: [tEh]. In fact, it is probably the only English word with final [h].

  12. Charles Gaulke said,

    June 10, 2009 @ 9:31 am

    Only English word with a final [h]? Bah, I can think of at least one more.

  13. JonW said,

    June 10, 2009 @ 9:39 am

    I would also make the case that “teh” is a word in its own right, with its own specific meaning rather than just a misspelling of the definite article.
    I have seen “teh” referred to as “the sarcastic article” and have heard it used in speech in this manner- I see no reason to declare that “teh” is not a word even if MS Word insists on autocorrecting it to “the” every time I type it.

  14. Jeff said,

    June 10, 2009 @ 10:32 am

    CNN fell for it:

  15. Andy Hollandbeck said,

    June 10, 2009 @ 10:42 am

    Perhaps Payack's definition of "word" is "dictionary entry." The meanings of cloud computing, zombie banks, and shovel ready certainly can't be deduced from the meanings of their constituent words (though carbon neutral is easy enough, and financial tsunami simply relies on a metaphorical natural disaster).

    Certainly, each of these terms/phrases would merit its own dictionary entry — just like ice cream and leisure suit. Web 2.0 certainly means something different than the standard use of a software name followed by a version number.

    (I'm not saying that Payack is right, just that the thinking makes some sense to me.)

  16. Sili said,

    June 10, 2009 @ 10:47 am

    But Web 2.0 is soooo last decade.

    Pity they cancelled the interview. Without the stupid nonstory there'd have been more time for you to inject something interesting into the news.

  17. Cecily said,

    June 10, 2009 @ 11:33 am

    So, should "soooo" have been picked up by the magical algorithm, and if so, with how many Os?

    sooo = 22 million Google hits (approx)
    soooo = 14 m
    sooooo = 7 m
    and so on.

  18. Iritscen said,

    June 10, 2009 @ 11:37 am

    I have to agree with Andy, it seems quite valid to consider multi-word expressions as "words". Even "Web 2.0", with those apparently vexatious numbers in it, would probably be considered potential fodder for a dictionary, so I fail to understand the issue here.

    That's not to defend the ridiculousness of the overall project, of course.

  19. Ken Brown said,

    June 10, 2009 @ 11:41 am

    I think I agree about "pwn". It's become a word. Though the pronunciation hasn't settled down yet. Some people seem to rhyme it with "own", others with "boon", and some try a Welsh-sounding vowel.

    My own teenage daughter asked me how to say it the other day – so it must already be falling out of coolth.

  20. Nik Berry said,

    June 10, 2009 @ 12:15 pm

    The Register did not fall for it

    "And before you lot start protesting at Global Language Monitor's millionth word count, we're fully aware that linguists of weight have dismissed the whole thing as a load of old cobblers"

  21. Jim Roberts said,

    June 10, 2009 @ 12:30 pm

    "Pwnt," and its more common variant, "pwnd," are more typical of an in-game utterance. In those games where you still text rather than speak to your fellow players, it's simply shorter, and thus more popular. One variant of the term that is more interesting is "pwnage," which is either the state of being pwned or the state of pwning another.

  22. JLR said,

    June 10, 2009 @ 12:33 pm

    I hope that 'pwn' doesn't end up rhyming with 'boon'. That would probably be too embarrassing to say in most contexts.

  23. Maurice said,

    June 10, 2009 @ 1:09 pm

    Quite apart from the spurious nature of this "word count," the Global Language Monitor is so poor, logically, linguistically, and metalinguistically, that I couldn't bear to read more than about a screenful. Besides the criticisms above:

    How can there be "finalists for the one millionth English word"? The use of an ordinal number means that the "new" "words" are in some order (presumably chronological, by either coinage, detection, or admission), in which there can only be one one-millionth item.

    Reportedly, "Jai Ho! and slumdog finished No. 2 and 4," yet they are listed as numbers 999,999 and 999,997 respectively. I can just see how 999,999th means 2nd if words can be imputed a desire to be 1,000,000th, but this is confused, at the very least.

    What a feeble definition, in the self-proclaimed "Newspaper of Global English," this is of carbon neutral: "One of the many phrases relating to the effort to stem Climate Change."

    N00b is reported to be "the only mainstream English word that contains within itself two numerals," yet the "millionth word" itself is Web 2.0.

    The punctuation, spacing, and capitalization of the text are sloppy, although the overuse of initial capitals, especially for quoted lemmas, is amusingly reminiscent of both the OED and uneducated complaining letters.

    "Just missing the top spot was n00b" puts the whole site on the academic level of a television programme showing the "top five holidays from hell."

  24. Spectre-7 said,

    June 10, 2009 @ 2:57 pm

    In my own personal experience, I have heard and spoken the word teh, but only ironically, usually in the set phrase teh internets. My own pronunciation is roughly teɪ (apologies if I've biffed the IPA here), similar I believe to the Irish pronunciation of tea, or the first name of Youtube sensation Tay Zonday. The h appears not to be pronounced.

    Since the word originated on and has been disseminated visually through the internet, though, I suspect that there's a large amount of regional variation in its pronunciation, much in line with the discussion of pwn above. I personally pronounce that word pown (poʊn, to rhyme with own), but have heard both poon (pun) and pawn (pɑn). Of course, since those pronunciations aren't in line with my own, they're quite clearly dumb and should be avoided. ;)

  25. Amber said,

    June 10, 2009 @ 3:54 pm

    JLR, you might be showing your age. As best I can tell, the embarrassing word fell out of general usage around the time I was born. Neither of my sons has ever heard of it, although one has a vague idea of the meaning of poontang.

  26. Daniel von Brighoff said,

    June 10, 2009 @ 4:39 pm

    Kudos to the BBC for not treating this as news.

    Except that they did: Having Paxman ridicule something still makes it seem more important than the thousands of things he didn't even see fit to mention on BBC2 last night.

    And since I haven't the sense not to weigh in on teh, I'll mention that I've heard ['tʰʌ(ʔ)] or ['tʰɛ(ʔ)] more often than /'tei/ and certainly nothing with final [h]. The glottal stop is only present before a vowel-initial word. (Nothing is *t'awesome; this isn't Yorkshire.) But I think the stress is always there to prevent confusion with a reduced form of to.

  27. marie-lucie said,

    June 10, 2009 @ 5:04 pm

    This whole preoccupation with the number of words in English brings us back to what is the definition of an English word: it seems to me that just about any string that shows up a number of times qualifies as an English word. In what sense is a Hindi exclamation a new "English word"? I can't be the only LL reader who has never encountered the "word".

  28. Lazar said,

    June 10, 2009 @ 5:37 pm

    I don't even understand his choice – even if we consider "Web 2.0" a word, the word has been in use for 10 years. How did it just enter the language today?

  29. Marco Neves said,

    June 10, 2009 @ 5:54 pm

    This has got first-page attention on Incredible!

  30. Dmajor said,

    June 10, 2009 @ 7:06 pm

    Oh,no! (Or, as the LOLcats say, "O NOZ!") Now PJJP is going to have to go back to waiting for the two-millionth word!

    (And really, wouldn't "LOLcats" have been a much more entertaining millionth word? It's ever so much more millionthy than "Web 2.0" dontcha think?)

    (If it catches on, maybe "millionthy" could be the two-millionth word! Get busy!)

  31. A Reader said,

    June 10, 2009 @ 8:59 pm

    Pwn is certainly passing into at least a somewhat broader usage from gamers. I'm a college student and while some of my friends are video-gamers I am not–but the word is pretty commonly used by all sorts. I've seen it used when discussing movie fight scenes, verbal sparring, and elections. So not just by video-gamers while gaming. Of course, this is still all usage by college students, and most of us are at least a little geeky. It will be interesting to see how the word fares as time goes on.

    [(myl) See "Own, pone, poon, pun, pwone, whatever", 8/31/2007, for some previous coverage. ]

    Daniel von Brighoff's second pronunciation of teh describes the usage I've most commonly heard (and used).

  32. James Kabala said,

    June 10, 2009 @ 10:36 pm

    Jeff and Marco: If you read the whole article, though, it is a pretty skeptical account. Of course, Daniel von Brighoff is probably right that it would be better to ignore the issue entirely.

  33. Bryan White said,

    June 11, 2009 @ 3:12 am

    To be fair, the AP Style Guide may be to blame for some of the word/phrase identity confusion. The New York Times, Reuters, USA Today, and other news sources have been printing carbon neutral and shovel ready as hyphenated compounds, which may explain why Mr. Payack's script has flagged certain phrases as words.

    That said, I rather doubt the AP's criteria for deciding when to hyphenate are linguistically sound in the first place. And even if they were, it wouldn't help us spot Payack's criterion, as slow food, cloud computing, and overseas contingency operations are printed without hyphens.

    Perhaps these observations are irrelevant given the way contemporary linguistics deals with hyphenated compounds. (Alas! I had only one course on grammar in college, and we didn't cover this area.) It makes me curious, though: does contemporary linguistic analysis tend to view hyphenated compounds more like words or phrases?

  34. Grep Agni said,

    June 11, 2009 @ 8:27 am

    @ Bryan White:

    I don't know about the AP style book, but the Grep Agni style book requires hyphenation of multi-word compounds when they modify an adjacent noun but not when they stand alone.

    1) This is a carbon-neutral project.

    2) This project is carbon neutral.

    The writers and editors at GA Corp find this easier to read than the alternatives.

  35. LOLbulbul said,

    June 11, 2009 @ 8:41 am

    (Or, as the LOLcats say, "O NOZ!")
    It's O NOEZ, actually.

    FWIW, native English speakers I know pronounce "teh" as [tʰɛh], possibly even [tʰɛħ].

  36. greg said,

    June 11, 2009 @ 8:45 am

    I'm wondering if "jai hoo" made the list because it was the name of a dance group on the television show So You Think You Can Dance the other day.

  37. ajay said,

    June 11, 2009 @ 9:25 am

    Also dah, verandah, wallah, nullah, ayah and lots of other Hobson-Jobsonish words.

  38. Spectre-7 said,

    June 11, 2009 @ 2:38 pm


    Unless I'm mistaken (which is all too possible), that wasn't the name of the dance group, but rather the name of the song they were dancing to. Jai Ho is the Academy Award-winning song from Slumdog Millionaire, which also helps explain the phrase's sudden popularity.

  39. Zarggg said,

    June 11, 2009 @ 9:05 pm

    teh – [tʰɛ]
    pwn – [puːn]

    At least, that's how I've always read them.

  40. Dan T. said,

    June 11, 2009 @ 10:13 pm

    English acquires its millionth word: “bollocks”

  41. Katherine said,

    June 12, 2009 @ 1:24 am

    And they mentioned this (not mentioning the ridiculousness of it all) in the New Zealand Herald in their Sideswipe column. I thought it would be pretty obvious how ridiculous this story is…

  42. eschatokyrios said,

    June 15, 2009 @ 4:33 am

    "teh", as I and all the gamers I know IRL say it, is pronounced [tʰɛ]. No final [h]. So "teh" is not unique in that respect. As far as I know, however, it *is* unique in being an English word with a lax vowel in an open word-final syllable. Spectre-7's pronunciation of [tʰeɪ] is odd to me; [tʰeɪ.ɪntɚwɛbz] for "teh interwebs" sounds manifestly wrong to me.

  43. Nicholas Clayton said,

    June 16, 2009 @ 7:41 am

    Anyone else seen 'Got Medieval' on the subject?
