The Tiger Woods Index, one more time


James Delingpole, "Climategate goes uber-viral, Gore flees leaving evil henchmen to defend crumbling citadel", The Telegraph, 12/4/2009:

Climategate is now huge. Way, way bigger than the Mainstream Media (MSM) is admitting it is – as Richard North demonstrates in this fascinating analysis. Using what he calls a Tiger Woods Index (TWI), he compares the amount of interest being shown by internet users (as shown by the number of general web pages on Google) and compares it with the number of news reports recorded. The ratio indicates what people are really interested in, as opposed to what the MSM thinks they ought to be interested in.

Richard North's idea is to take the ratio of Google web hits to Google News hits; he gets 22.5 million web hits vs. 46,025 news hits for Tiger Woods (a ratio of 489), and compares some other topics (some of them a bit UK-centric):

1. Climategate: 28,400,000 – 2,930 = 9693
2. Afghanistan: 143,000,000 – 154,145 = 928
3. Obama: 202,000,000 – 252,583 = 800
4. Tiger Woods: 22,500,000 – 46,025 = 489
5. Gordon Brown: 12,300,000 – 37,021 = 332
6. Climate change: 22,200,000 – 68,419 = 324
7. Sally Bercow: 25,000 – 86 = 290
8. David Cameron: 545,000 – 4837 = 113
9. Meredith Kercher: 261,000 – 3,471 = 75
10. Chilcot Inquiry: 125,000 – 4,350 = 29
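
The arithmetic behind these figures is just web hits divided by news hits. Here is a minimal Python sketch of that calculation for the four topics discussed below, using the counts quoted in the list. The names `counts` and `hits_ratio` are mine, chosen for illustration; this is not Mr. North's own code.

```python
# (web hits, news hits) as quoted in the list above, for four of the topics.
counts = {
    "Climategate": (28_400_000, 2_930),
    "Afghanistan": (143_000_000, 154_145),
    "Obama": (202_000_000, 252_583),
    "Tiger Woods": (22_500_000, 46_025),
}


def hits_ratio(web_hits, news_hits):
    """Ratio of web hits to news hits, rounded to the nearest integer."""
    return round(web_hits / news_hits)


ratios = {topic: hits_ratio(web, news) for topic, (web, news) in counts.items()}

# Print topics from highest to lowest ratio.
for topic, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{topic}: {ratio}")
```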

And the National Review Online has a pie chart.

I'm glad to see this evidence of widening interest in empirical punditry. I'm a long-time enthusiast for web-search counts as a proxy for various cultural and political metrics — but I've also been warning for years that such metrics are tricky and not always reliable, and I always try to find confirmation from alternative approaches.

In this case, an obvious thing to try is Google Trends, which gives numbers not from web pages but from web searches. This is a more direct indication of interest levels in the general population, many more of whom search the web than write for it; and I believe that the counts are more or less veridical (at least for the N highest-ranked searches) rather than estimated by complex and fallible algorithms.

Here's the graphical display of four relevant searches over the past 30 days. (Click on the image for a larger version; look here for information about the meaning of the numbers.)

Google Trends also allows you to download the data in spreadsheet form. This is helpful because the overall numbers for the past 30 days, expressed relative to the number of searches for climategate (climategate 1, "Tiger Woods" 33, Afghanistan 7, Obama 32), are a bit unfair to climategate: it had non-zero (recorded) searches only from Nov. 23 to Dec. 3 (the last day for which the numbers are now available), while the other search terms had non-zero values for all the surveyed days.

So here are the numbers from downloading a Google Trends CSV file with fixed scaling, and pulling out just those 11 days:

                        climategate   Tiger Woods   Afghanistan   Obama
11/23                             2             4            12      40
11/24                             4             4            12      64
11/25                             4             4            10     282
11/26                             2             4            10     118
11/27                             4           166             8      66
11/28                             4           196             8      50
11/29                             6           184            10      44
11/30                             6           200            14      44
12/01                             6           160            18      52
12/02                             6           302            32      88
12/03                             6           322            20      50
Total                            50          1546           154     898
Ratio to "climategate"            1          30.9           3.1    18.0
Ratio to "Tiger Woods"         0.03             1          0.10    0.58
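
For readers who want to check the totals and ratios, here is a small Python sketch that re-derives them from the daily values in the table. The dictionary `trends` and the helper `ratios_to` are illustrative names of my own; the numbers are simply copied from the rows above.

```python
# Daily Google Trends values (fixed scaling) for 11/23 through 12/03,
# copied from the table above.
trends = {
    "climategate": [2, 4, 4, 2, 4, 4, 6, 6, 6, 6, 6],
    "Tiger Woods": [4, 4, 4, 4, 166, 196, 184, 200, 160, 302, 322],
    "Afghanistan": [12, 12, 10, 10, 8, 8, 10, 14, 18, 32, 20],
    "Obama": [40, 64, 282, 118, 66, 50, 44, 44, 52, 88, 50],
}

# Sum of the 11 daily values for each search term.
totals = {term: sum(values) for term, values in trends.items()}


def ratios_to(baseline):
    """Each term's total divided by the total for the baseline term."""
    return {term: total / totals[baseline] for term, total in totals.items()}


for term, total in totals.items():
    vs_climategate = ratios_to("climategate")[term]
    vs_tiger = ratios_to("Tiger Woods")[term]
    print(f"{term}: total={total}, "
          f"vs climategate={vs_climategate:.1f}, vs Tiger Woods={vs_tiger:.2f}")
```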

So let's compare North's Google News counts (and current Google News counts, just for grins) to the Google Trends figures:

              News (North)   News (Now)   Trends   North/Trends   Now/Trends
climategate          2,930        5,197       50           58.6        103.9
Tiger Woods         46,025       52,833    1,546           29.8         34.2
Afghanistan        154,145      166,063      154          1,001        1,078
Obama              252,583      288,210      898            281          321

If we re-run the "Tiger Woods Index" in terms of the ratio of news stories to Google Trends numbers, we see that climategate is getting between two and three times more press than Tiger is, relative to public interest (58.6/29.8 = 1.966; 103.9/34.2 = 3.038). Of course, Afghanistan racks up 1078/34.2 = 31.5 on the TWI, and Obama's at 321/34.2 = 9.39.
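
The same comparison can be sketched in Python as follows. Note one assumption: this version works from the unrounded news-to-Trends ratios, so the last digit can differ slightly from the figures above, which divide already-rounded values. The names `news_per_trend` and `relative_to` are hypothetical helpers, not anything from Google's tools.

```python
# Google News counts (North's and current) and Google Trends sums,
# copied from the comparison table.
news_north = {"climategate": 2930, "Tiger Woods": 46025,
              "Afghanistan": 154145, "Obama": 252583}
news_now = {"climategate": 5197, "Tiger Woods": 52833,
            "Afghanistan": 166063, "Obama": 288210}
trends_sum = {"climategate": 50, "Tiger Woods": 1546,
              "Afghanistan": 154, "Obama": 898}


def news_per_trend(news):
    """News stories per unit of Google Trends volume, for each term."""
    return {term: news[term] / trends_sum[term] for term in news}


def relative_to(per_trend, baseline="Tiger Woods"):
    """Express each term's ratio as a multiple of the baseline term's ratio."""
    return {term: value / per_trend[baseline] for term, value in per_trend.items()}


twi_north = relative_to(news_per_trend(news_north))
twi_now = relative_to(news_per_trend(news_now))

for term in trends_sum:
    print(f"{term}: North={twi_north[term]:.2f}, now={twi_now[term]:.2f}")
```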

In order to make the number comparable to North's, we need to take the ratio of the proxy for level of public attention (here Google Trends numbers) to the proxy for MSM attention (here Google News numbers).  In order to get the numbers into the same general range, I'll re-scale North's index as the ratio of Google web hits in millions to Google News hits in thousands; and I'll scale my index as the ratio of fixed-scaling Google Trends sums to Google News hits in thousands:

              Web (M)   News (K)   Trends sum   North    MYL
climategate        30        5.2           50     5.8    9.6
Tiger Woods      12.4       53.0         1546     0.2   29.2
Afghanistan      28.1      169.9          154     0.1    0.9
Obama            41.1      273.2          898     0.2    3.3
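
Here is a rough Python sketch of how the two index columns can be derived from the first three columns of the table. The field names (`web_m`, `news_k`, `trends`) and the two functions are my own labels for illustration, and a recomputed value may differ from the printed one in the last digit depending on how it is rounded.

```python
# Inputs copied from the table above: web hits in millions,
# news hits in thousands, and the fixed-scaling Google Trends sum.
rows = {
    "climategate": {"web_m": 30.0, "news_k": 5.2, "trends": 50},
    "Tiger Woods": {"web_m": 12.4, "news_k": 53.0, "trends": 1546},
    "Afghanistan": {"web_m": 28.1, "news_k": 169.9, "trends": 154},
    "Obama": {"web_m": 41.1, "news_k": 273.2, "trends": 898},
}


def north_index(row):
    """Web hits (millions) per thousand news hits."""
    return row["web_m"] / row["news_k"]


def myl_index(row):
    """Google Trends sum per thousand news hits."""
    return row["trends"] / row["news_k"]


indices = {term: (north_index(row), myl_index(row)) for term, row in rows.items()}

for term, (north, myl) in indices.items():
    print(f"{term}: North={north:.2f}, MYL={myl:.2f}")
```

On both indices a higher value means more public attention (by that proxy) per news story; the two proxies rank the four terms quite differently.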

What does all that mean? My index says that major geopolitical events get a lot of press relative to the rate of web search; celebrity scandals, contrary to what you might think, not so much. And as for "climategate", it's kind of in between Tiger Woods and Obama, in terms of the ratio of web searches to news stories.  That seems about right to me.

Why do Mr. North's index numbers look so different? In the first place, for things that have been in the news for years (Tiger Woods, Afghanistan, Obama), the total number of web pages out there is not a very good index of the current level of short-term public interest. A term like "climategate", which didn't exist until a few days ago, is a different matter. And it's certainly impressive for it to have racked up tens of millions of web hits so quickly.

Caveat: Google's hit counts are extrapolated from samples by means of an algorithm that might over-estimate the total number of pages for a term that has increased very rapidly in the recent past. (Only 716 of the "30,000,000" hits are actually available in the index.) But if we take the counts at face value, then apparently there are a lot of people generating a lot of pages about climategate, but not all that many people trying to find out about it.


  1. David Eddyshaw said,

    December 6, 2009 @ 6:02 pm

    While it's always disturbing when scientists refuse to fully divulge their data and methods, on whatever pretext, James Delingpole seems typical of a lot of commentators in that he seems to be succumbing to wishful thinking over Climategate, and trying to make it bear more weight than it really can.

    Unfortunately it's all too easy to think of other examples (in the UK, anyhow) where government policy is based, at least partly, on proprietary data that we're not given access to.

  2. Dominik Lukes said,

    December 6, 2009 @ 6:09 pm

    Thanks for doing great analytical work, as usual. But perhaps I could put another spin on the concluding sentence. It shouldn't be surprising that not as many people try to find out about Climategate by doing a web search. Two reasons: 1. need for information and 2. demographics.

    1. In 2008, I never searched for Obama once because I already had all the information I needed from sources like Realpolitics or Memorandum in addition to non-web media. But I did search for a few minor celebrities because I had no idea who they were.

    2. This is a related issue. The population of people interested in the climategate skews towards those for whom either the web is not a primary source of the information or who follow the story through other channels.

    Now, it's entirely possible that neither of these factors actually plays any role in the numbers or that they are completely imaginary but perhaps they're worth considering.

  3. Alex said,

    December 6, 2009 @ 6:20 pm

    It's a pretty stupid thing for Delingpole to have said, although such stupidity is typical of his writing. I mean, do we take the (probable) gigantic ratio between Google web hits on "porn" versus Google news hits for same as evidence that the media isn't talking about porn as much as it should?

    But regardless of all this, the maxim we should always keep in mind, is that what's in the public interest isn't necessarily what interests the public.

  4. Alex said,

    December 6, 2009 @ 6:21 pm

    Oh, and Dominik is spot on.

  5. B.W. said,

    December 6, 2009 @ 7:41 pm

    I'm surprised no-one's mentioned so far that it's pretty unlikely for many people to be using the search term 'climategate'. It's a typical coinage that would be used by journalistic and blogging people, but if I wanted to find information and didn't know where to start looking, I would google other terms, depending on how much I had already heard about the topic. I might try 'stolen e-mails cru' or 'stolen e-mails jones' or 'stolen e-mails climate' or similar terms.

    I think comparing websites containing the word is more useful than comparing search terms. Everyone knows the names Barack Obama and Tiger Woods, and they're easy search terms. Same for Afghanistan (how much more basic can a search term get?) But climategate is definitely not a term I would ever use, not in conversation, and certainly not on google. It just sounds too tabloid.

  6. John Cowan said,

    December 6, 2009 @ 7:58 pm

    On Google hit counts: for "complex and fallible" read "simple, cheap, and stupid".

  7. fev said,

    December 6, 2009 @ 8:09 pm

    Nice attack on the operational definitions, but I have more problems with the validity of the concepts themselves. People aren't searching for "climategate" (and I have no doubt they're using the term) to satisfy informational needs that aren't met by the alleged mainstream media, which are covering Crashergate instead in lieu of shark attacks and missing toddlers. They're doing it to remind themselves that they're right about what they already know.

    Notwithstanding all that, James Delingpole is a wanker of epic proportion even by Torygraph standards, though I'll grant he's substantially funnier than most of the people at Fox.

  8. Ran Ari-Gur said,

    December 6, 2009 @ 8:19 pm

    One problem with your index is that Google News is not restricted to the MSM; I'm not sure exactly how it decides what to include, but some of the "climategate" hits are blog posts and other non-MSM Web sites.

    [(myl) The denominator (Google News counts) is the same in Mr. North's index and in the one that I compared to it — the difference is the numerator, which is intended (according to him) to be a proxy for level of the public's interest in a topic (to the extent that "topic" is determined by a search term), and for which I suggested substituting Google Trends numbers to see how stable the results are. I don't actually think that either index is especially enlightening — certainly, the fact that their results are so dramatically different is a clue that one or the other or both is deeply flawed as an indicator of media under- or over-coverage.

    You're right that Google News counts are a flawed proxy for MSM activity. But all the constituent parts in both indices have got problems, both in accuracy and in relevance.]

  9. Alex said,

    December 6, 2009 @ 9:27 pm

    Another way it's a flawed proxy is that each news article is (in the proxy) judged to be as prominent as any other article. It does not take into account the length of the articles, or how prominent they are on a website, nor (in the case of tv outlets) does it take into account how much time is spent discussing the issue.

  10. John Atkinson said,

    December 7, 2009 @ 2:41 am

    I agree with BW. Not only is "climategate" an unlikely term for someone to think of using if they wanted to find out about the emails hacked from CRU, but I'd be surprised if the word even occurs in more than a small fraction of the web pages seriously discussing the topic — as opposed to denialist blogs and tabloid newspapers.

  11. Graeme said,

    December 7, 2009 @ 2:54 am

    Tigergate is easier to picture, ergo it's televisually prominent. Being satiated by it on tv, why would people bother googling it?

    Ps – this is interesting, but aside from the suffix -gate, how is this language log material?

    [(myl) Attempted political inference from unreliable corpus linguistics: how could we resist?]

  12. fev said,

    December 7, 2009 @ 11:39 am

    "I'd be surprised if the word even occurs in more than a small fraction of the web pages seriously discussing the topic — as opposed to denialist blogs and tabloid newspapers." … that's exactly the point. People who search for "climategate" aren't trying to find out about the e-mails or looking for serious discussions. They want to go to denialist blogs and tabloids (or the broadsheets and Web sites of the Murdoch empire) and find out that they were right all along about the Marxist Kenyan socialist Nazis who are trying to cram this unconstitutional takeover down their throats. (sorry if I left anything out there)

    I'm not a massive uses-and-grats fan, but this is a lovely example of how uses-and-grats works.

  13. Kenny Easwaran said,

    December 7, 2009 @ 7:10 pm

    Does the difference between his index and yours suggest that, regardless of whether the MSM is giving too much or too little attention to "climategate", the blogosphere is giving it too much attention?

  14. Trond Engen said,

    December 7, 2009 @ 7:40 pm

    We need a term for what Mr. Delingpole has been doing. May I suggest carpus linguistics?

    You may add your own images.

  15. lucia said,

    December 8, 2009 @ 1:16 am

    John Atkinson–
    Climategate is being used in newspaper articles. Go to google news, and search "cru hack", "climategate", "climate-gate" and "cru email". Click to narrow down to article one day old. Right now, both "climate-gate" and "climategate" show more results than the other two and many of the hits are mainstream media with names like "Time", "CBS", "USA Today".

    Of course, google isn't perfect. It also may well be that the name first sprang up at skeptic blogs, but that doesn't mean the mainstream media isn't using it now.
