More Metadata Muddles on Google Books


Mark's discovery of a mistitled Google Books entry (a book on experimental theater filed as a 2009 book on management) is entertaining but not that unusual. As with the other metadata mixups at Google Books (involving authorship, genre classification and publication date, among other things) that I enumerated in a 2009 post, "Google Books: A Metadata Train Wreck," there are probably thousands of cases in which the metadata for one book is associated with an entirely different work. Or at least that's what induction suggests; Paul Duguid and I have happened on quite a number of these, some as inadvertently comical as Mark's example. Clicking on the entry for a book called Tudor Historical Thought turns up the text of a book on tattoo culture; the entry for an 1832 work on the question of whether the clergy of the Church of England can receive tithes turns up a work by Trotzky; the entry for Last Year at Marienbad turns up the text of Sam Pickering's Letters to a Teacher; and so on (see more examples below the fold). What's particularly interesting about Mark's example, though, is that the work is similarly misidentified on Amazon and Abe Books, which indicates that for many modern titles, at least, the error is likely due not to "some (perhaps algorithmic) drudge on the Google assembly line," as Mark suggests, but to one of the third-party offshore cataloguers on which Google and others rely for their metadata.



  1. Karen said,

    February 21, 2014 @ 7:43 pm

    In these metadata mixups the correct and incorrect books both start with the same letter (or at least with a likely letter for them to be alphabetized under, and presuming the tattoo book's title begins with "tattoo" etc). Is that typical of these problems? Is it significant?

  2. Ray Dillinger said,

    February 21, 2014 @ 9:08 pm

    I wouldn't look as far as a third-party metadata service until I were satisfied that it isn't a first-party (plagiarised metadata) problem.

  3. languagehat said,

    February 22, 2014 @ 10:48 am

    I too have run across a fair number of these. Google doesn't make it all that easy to alert them, either.

  4. Emily said,

    February 22, 2014 @ 8:53 pm

    The best/worst Google Books error I've seen was a book on theology whose preview contained a large chunk of material from an advanced syntax textbook. Searching for particular (ir)relevant terms still turns up the book, but the syntax pages seem to have been removed for the actual preview:

  5. languagehat said,

    February 23, 2014 @ 10:28 am

    I'm surprised this post isn't getting more comments. Have people simply given up and accepted the situation? It's been almost five years since Geoff's previous post, and not much has changed (including the lousy interface for reporting errors). Back then I wrote, in response to Jon Orwant's long comment,

    I can't begin to tell you how much good it does me to hear you say "Yes, there's far more bad data than there should be and much of it is our fault, we appreciate your criticism and are taking it into account, and here's how." For the first time I feel that the people in charge there are taking the problem seriously and really doing something about it. So don't rue the time you spent crafting that comment — it was time well spent.

    I'm afraid my good feelings have crashed and burned in the intervening years. Perhaps Jon, or someone with an equivalent position and desire to inform, could drop by and report on developments and the likelihood of improvement in our lifetime?

  6. Emily said,

    February 23, 2014 @ 3:14 pm

    Re error reporting: Over a year ago I found this out-of-copyright book while doing a research project, and found that almost every other page was so blurry as to be unreadable. So I found and used their error submission form, but as you can see, much of the book is still blurry.

  7. Tom Freeman said,

    February 23, 2014 @ 6:04 pm

    I don't for a minute believe that the AmE spellings of 'color', 'center', 'defense' and 'aluminum' are really 40-60% as common as the BrE spellings in BrE books, as the Ngram suggests.

  8. Daisy said,

    February 24, 2014 @ 8:27 pm

    I actually look at Tudor Historical Thought quite a lot since it's important to what I'm working on (perhaps once a week, certainly in the last week), and I've never had that error, so presumably and don't share the same data on the same books? Strange if that's the case.

    [GN: That's not likely, and in fact Tudor Historical Thought seems to come up appropriately on the American side as well now. What happened originally was that the text on tattoo culture came up in response to a query on asshole, but was associated with the entry for Tudor Historical Thought. It no longer comes up when I query on strings from the text it originally turned up (such as "As Jill noted enthusiasts consider"), nor does anything else for that matter, so some corrections must have been made, with the text on tattoos somehow no longer present in the database.]

  9. Garrett Wollman said,

    February 26, 2014 @ 10:46 pm

    I mentioned this in the comments on Mark's post, but thought I'd mention it here too: Abebooks is an Amazon subsidiary, so no conclusion can be drawn from any common errors between the two Web sites. On the other hand, it's entirely possible that GooBoo gets some if not all of its metadata from the same database supplier as Amazon (which might well be some other company in the Amazon empire, but is more likely one of the few remaining major book distributors).
