Suggestibility

Google Suggest is a fun new tool for probing the textual Zeitgeist. Using it on "Language Log" yields:

Bare "Language Log" gets 36,900,000 results (as we can see by getting suggested continuations for "language lo", though I'll spare you the picture). It's clear that lots of regular readers use Google to find us, rather than typing in the URL or using a browser bookmark or an RSS feed.

The Google Suggest FAQ explains:

Our algorithms use a wide range of information to predict the queries users are most likely to want to see. For example, Google Suggest uses data about the overall popularity of various searches to help rank the refinements it offers. An example of this type of popularity information can be found in the Google Zeitgeist. Google Suggest does not base its suggestions on your personal search history.

Exactly how the "results" number is related to query-log counts is not clear: is it a cumulative number, or a number from a recent period, or some kind of weighted running sum, or a projection of some more complex kind that also involves click-through rates, number of results, etc.? Whatever the meaning of the suggestfulness (suggestionality?) numbers, a test for continuations of "linguistic" and "psycholog" shows that in this dimension, psychology is only about 4.3 times more popular than linguistics (105 million to 24.3 million). Could we be catching up?

Alas, this same metric suggests that Language Log (36,900,000 results) is 6.2 times more popular than Psychology Today (5,920,000 results), which is absurd. (Though this doubtless gives a fair picture of web search counts, it leaves out the crucial Supermarket Checkout Line factor, where the psychologists are still winning by default.)
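For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The counts are the ones quoted above; the dictionary keys and the `ratio` helper are just labels made up for this illustration, and nothing here says what the numbers actually measure.

```python
# "Results" figures as quoted in the post. The keys are illustrative labels,
# not the literal query strings typed into Google Suggest.
counts = {
    "psychology": 105_000_000,
    "linguistics": 24_300_000,
    "Language Log": 36_900_000,
    "Psychology Today": 5_920_000,
}


def ratio(numerator: str, denominator: str) -> float:
    """Return how many times larger one quoted count is than another."""
    return counts[numerator] / counts[denominator]


print(f"psychology / linguistics: {ratio('psychology', 'linguistics'):.1f}")
print(f"Language Log / Psychology Today: {ratio('Language Log', 'Psychology Today'):.1f}")
```

Rounded to one decimal place, the two ratios come out to the 4.3 and 6.2 cited above.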

Aaron Davies, in an email under the subject line "Snowclones are the new Breakfast Experiment", has drawn my attention to the fact that this method is not just for doing misleading market research; it can also be used to probe for popular phrasal templates.



14 Comments

  1. Flooey said,

    January 10, 2009 @ 10:25 am

    I think the results number is unrelated to query log counts and the like. The ordering of the suggestions is based on things like query logs, but I think the listed number of results is just approximately how many search results there are for that search. (The listed number differs from the number given if you actually make the search, but I expect that has to do with either different methods of estimation, one of the numbers being older than the other, or just variance in their estimation technique.)

    [(myl) Do you have any evidential basis for this thought? It seems to contradict what the FAQ says, as quoted above: "For example, Google Suggest uses data about the overall popularity of various searches to help rank the refinements it offers. An example of this type of popularity information can be found in the Google Zeitgeist." It's clear that the Google Zeitgeist numbers come from query logs, not from page counts.]

  2. Mark P said,

    January 10, 2009 @ 11:26 am

    I have seen computer geeks complaining about the fact that so many people use Google to find (or return to?) sites instead of typing in the URL or using a bookmark.

  3. Laura Kalin said,

    January 10, 2009 @ 11:39 am

    Flooey is right – a quick google search on "google suggest results" returned this quote from google:

    "The green number next to each suggested query represents the approximate number of results that would return if you select the query."

    http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=106230

    [(myl) Interesting — that makes sense of the word "results", though it seems to contradict the explanation of Google Suggest offered in other contexts. Maybe the list of completions to suggest is derived from query logs, while the associated number is derived from hit counts? ]

  4. John Cowan said,

    January 10, 2009 @ 11:49 am

    Firefox 3.x users can get this same list by typing into the search bar (provided it is set to search Google, as is the default) in the upper right, but sans numbers.

  5. bianca steele said,

    January 10, 2009 @ 12:05 pm

    Along the lines of what Mark P said,
    I've seen political trolls complain that when you click on a web link, the person who placed it there is able to gather information about who and where you are, and to gather information about you such as whether you are likely to click on the kind of link they placed there — a person concerned about that might use the Google cache so that the owner of the web site will never know they were looking. You might say it seems paranoid but no one said the Internet lacks more than its share of paranoids.

    It also might indicate a large number of people who can't remember the name and exact spelling of the site they're looking for, and are too lazy to set up their bookmarks, or to relearn how to set up a bookmark every time they switch browsers or operating systems.

  6. linda seebach said,

    January 10, 2009 @ 12:48 pm

    If the NYTimes crossword has a clue for which Google might find the answer, the clue is often right at the top of the list of suggested searches.

  7. blahedo said,

    January 10, 2009 @ 3:36 pm

    Youtube has been doing something similar for some time in its main search bar; I've definitely had an unsettling feeling when I was typing in a query I thought was unique or at least unusual in its combination of terms, but youtube's first suggestion included the full search before I'd even finished the first part of it. Given that youtube is part of the Google hegemony, I assume that the underlying algorithms are also related, if not identical.

  8. Laura Kalin said,

    January 10, 2009 @ 4:27 pm

    @myl – I think that's exactly it – the list of completions comes from the frequency of the query, and is not at all related to the number of "results" each would return when searched for. Also, it wouldn't logically make sense for the query frequency and "results" number to be related to each other, because the ordering of the "suggestions" doesn't follow the rank order of the number of "results". The rank order probably goes from most frequently queried at the top of the list to less frequently at the bottom of the list. (At least that's what I've always assumed it does…)

  9. Mark F. said,

    January 10, 2009 @ 9:49 pm

    I have seen people Google their own name to get to their own home page. And I didn't think they were crazy.

  10. Mark A. Mandel said,

    January 10, 2009 @ 10:04 pm

    Does web history affect it? I see this at the top of my search:

    Personalized based on your web history. More details
    Results 1 – 10 of about 34,100,000 for language log. (0.09 seconds)

    The link leads to:

    When possible, Google will customize your search results based on location and/or recent search activity. Additionally, when you're signed in to your Google Account, you may see even more relevant, useful results based on your web history.

    I get the same results "without these improvements", but YMMV.

  11. Aaron Davies said,

    January 11, 2009 @ 8:48 am

    @bianca steele: most websites will indeed be able to get information about the website where you found their link. it's in the "referer" [sic] header of the http request. of course, the simple way to avoid that is to right-click the link, copy it, and paste it into the location bar, in which case there will be no referer header to check.

  12. bianca steele said,

    January 11, 2009 @ 9:46 pm

    Aaron,
    Yes, you're right. An HTTP server can determine the URL of the page from which a user clicked if the user's software participates in that part of the protocol. There are other pieces of information the server can acquire and there are ways of avoiding revealing those too. But the idea that people might prefer Google to know their browsing habits, rather than revealing information to individuals and firms whose sites they click on, seems a stretch, doesn't it?

    Language Log does have an unusually hard to remember address, though.

  13. Cath the Canberra Cook said,

    January 11, 2009 @ 10:14 pm

    I confess: I have used google to get not only to language log, but also to my blog. It's less typing.

    I very rarely use bookmarks for frequently visited sites. I have a one-click toolbar button for my blog, and my browser does autocompletion. That only works if I'm on my own computer. I only bother with bookmarks if I think a page will be hard to find again.

  14. Aaron Davies said,

    January 12, 2009 @ 2:05 am

    languagelog.org works fine
