Google Scholar: another metadata muddle?

Following on the critiques of the faulty metadata in Google Books that I offered here and in the Chronicle of Higher Education, Peter Jacso of the University of Hawaii writes in the Library Journal that Google Scholar is laced with millions of metadata errors of its own. These include wildly inflated publication and citation counts (which Jacso compares to Bernie Madoff's profit reports), numerous missing author names, and phantom authors assigned by the parser that Google elected to use to extract metadata, rather than using the metadata offered them by scholarly publishers and indexing/abstracting services:

In its stupor, the parser fancies as author names (parts of) section titles, article titles, journal names, company names, and addresses, such as Methods (42,700 records), Evaluation (43,900), Population (23,300), Contents (25,200), Technique(s) (30,000), Results (17,900), Background (10,500), or—in a whopping number of records— Limited (234,000) and Ltd (452,000). 

What makes this a serious problem is that many people regard the Google Scholar metadata as a reliable index of scholarly influence and reputation, particularly now that there are tools like the Google Scholar Citation Count gadget by Jan Feyereisl and the Publish or Perish software produced by Tarma Software, both of which take Google Scholar's metadata at face value. True, the data provided by traditional abstracting and indexing services are far from perfect, but their errors are dwarfed by those of Google Scholar, Jacso says.

Of course you could argue that Google's responsibilities with Google Scholar aren't quite analogous to those with Google Books, where the settlement has to pass federal scrutiny and where Google has obligations to the research libraries that provided the scans. Still, you have to feel sorry for any academic whose tenure or promotion case rests in part on the accuracy of one of Google's algorithms.



Memetic dynamics of summative cliches

Following up on this morning's post about phrases that some people find irritating, I thought that I'd take a look at the recent history of one of them, "At the end of the day", which was the Plain English Campaign's 2004 "most irritating phrase in the language". Geoff Pullum ("Irritating cliches? Get a life", 3/25/2004) took this phrase to "have a meaning somewhere in the same region as after all, all in all, the bottom line is, and when the chips are down", and he observed that it "may shock people by its complete bleaching away of temporal meaning", resulting in things like "at the end of the day, you've got to get up in the morning".

A Google News Archive search for "at the end of the day" shows a rapid recent rise in hits from around 1985 onward. But so do some similar phrases, like "when all is said and done", which doesn't seem to have incurred the ire of peevers to nearly the same extent. So I thought I'd look at the relative frequency of four phrases with similar meanings: "in the last analysis", "in the final analysis", "when all is said and done", and "at the end of the day". I queried the Google News Archive in 5-year increments from 1951 to 2009.
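For readers who want to try a comparison of this kind, here is a minimal Python sketch of the bookkeeping. It rests on assumptions that go beyond what the post states: that "relative frequency" means each phrase's share of the combined hits for the four phrases within a window, and that the hit counts are collected separately by hand. The function names `five_year_bins` and `relative_frequencies` are hypothetical, and the sketch does not query any search service.

```python
from typing import Dict, List, Tuple

# The four phrases compared in the post.
PHRASES = [
    "in the last analysis",
    "in the final analysis",
    "when all is said and done",
    "at the end of the day",
]


def five_year_bins(start: int, end: int) -> List[Tuple[int, int]]:
    """Split [start, end] into consecutive 5-year windows.

    The final window is truncated at `end` if the span is not a
    multiple of five years.
    """
    bins = []
    year = start
    while year <= end:
        bins.append((year, min(year + 4, end)))
        year += 5
    return bins


def relative_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn raw hit counts for one window into shares of the window total.

    Assumption: "relative" means relative to the other phrases in the
    same window, not relative to the size of the archive.
    """
    total = sum(counts.values())
    if total == 0:
        return {phrase: 0.0 for phrase in counts}
    return {phrase: hits / total for phrase, hits in counts.items()}
```

Calling `five_year_bins(1951, 2009)` gives the windows to search, and `relative_frequencies` is then applied to the hit counts recorded for each window, one dictionary per window keyed by the phrases in `PHRASES`. Normalizing within each window is one way to keep the overall growth of the archive from swamping the comparison between phrases.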



Moving low-hanging fruit forward at the end of the day



WTF? No, TFW!

The comments on my post "The inherent ambiguity of WTF" drifted to other possible expansions of WTF, like the World Taekwondo Federation. That reminded me of something I saw back in July on the blog Your Logo Makes Me Barf, mocking the abbreviatory logo of the Wisconsin Tourism Federation. The ridicule got some attention from local Wisconsin media, such as Kathy Flanigan of the Milwaukee Journal Sentinel:

Folks at the Wisconsin Tourism Federation couldn't possibly have seen how the Internet would change the lingo when it was established in 1979.
But now that it's been pointed out, the lobbying coalition might want to rethink using an acronym in the logo. To anyone online, WTF has a different meaning these days. And it's not the kind of thing you want visitors thinking about when they think Wisconsin.

I decided to check out the tourism board's website, and lo and behold, they've bowed to pressure and changed their name to the Tourism Federation of Wisconsin. The old logo lives on, however, at the Internet Archive. Compare:



The inherent ambiguity of "WTF"

I'd like to echo Arnold Zwicky's praise for the third edition of Jesse Sheidlower's fan-fucking-tastic dictionary, The F Word. (See page 33 to read the entry for fan-fucking-tastic, dated to 1970 in Terry Southern's Blue Movie. And see page 143 for the more general use of -fucking- as an infix, in use at least since World War I.) Full disclosure: I made some contributions to this edition, suggesting possible new entries and digging up earlier citations ("antedatings") for various words and phrases. I took a particular interest in researching effing acronyms and initialisms. For instance, I was pleased to contribute the earliest known appearance of the now-ubiquitous MILF — and no, I'm not talking about the Moro Islamic Liberation Front. (For the record, a Buffalo-based rock band adopted the name MILF in early 1991, based on slang used by lifeguards at Fort Niagara State Park.) Another entry I helped out on is the endlessly flexible expression of bewilderment, WTF.



The F Word, take 3

The third edition of Jesse Sheidlower's dictionary The F Word is now out, to much (and much-deserved) acclaim. The book has a scholarly introduction (of 33 pages) on the etymology of fuck; its taboo status; its appearance in print (including in dictionaries) and movies; euphemism and taboo avoidance; and this dictionary and its policies. The many uses of fuck are then covered in detail in the main entries.

There's an excellent review of the book by slang scholar Jonathon Green on the World Wide Words site. From Green's review:

… as a fellow lexicographer (and, I must admit, a friend — slang is a small world) what impresses me most is the excellence of the overall treatment. The subject happens to be fuck, but this is how any such study should be conducted and sadly so rarely is. Not via the slipshod infantilism of the Net’s Urban Dictionary, but disinterestedly, seriously and in depth. The F Word, I would suggest, is a template that we would all be wise to follow.

Website for the book here.



Convention, uniqueness, and truth

Kevin Drum recently laid out a long-standing unsolved problem, one that has preoccupied such luminaries as Paul Krugman, James Fallows, and Glenn Beck ("Saving the Frogs", Mother Jones, 9/23/2009). The problem is that there's no good substitute for the over-used and untrue story about how a frog, if placed in a pot of gradually heated water, will eventually allow itself to be boiled without jumping out. And since this is a rhetorical problem, Drum describes the failure as a linguistic one:

So here's what I'm interested in. The boiling frog cliche is untrue. But it stays alive because, as Krugman says, it's a useful metaphor. So why aren't there any good substitutes?

This is very strange. Most useful adages and metaphors not only have substitutes, they have multiple substitutes. "Look before you leap" and "Curiosity killed the cat." "Fast as lightning" and "Faster than a speeding bullet." Etc. Usually you have lots of choices.

But in this case we don't seem to have a single one aside from the boiling frog. Why? Is it because it's not really all that useful a metaphor after all? Because the frog has ruthlessly killed off every competitor? Because it's not actually true in any circumstance, let alone with frogs in pots of water? What accounts for this linguistic failure?

Yesterday, Jonathan Lundell sent me a link to Drum's article, with the comment "Sounds like a job for Language Log". That was almost enough to make me move on immediately: when Geoff Pullum and I started Language Log, I promised myself that if it ever got to feel like a job, I'd quit.

But this morning, after half a cup of coffee, I realized that Jonathan's remark was just an instance of the conventionalized phrasal template "sounds like a job for ___". And this one usually refers to the super-activities of superheroes, which are by definition superfluous to their day jobs.



Kingsoft Strikes Again

Yesterday, I received this message from a young person who has been corresponding with me about ancient DNA and the movements of peoples across Eurasia during the Bronze Age and Early Iron Age:

The police reaved my computer due to I reprinted a news report of US about the National Day of China yesterday. I came back home from police department just now. They said they will check my computer exhaustively. I'm afraid about my thesises on each area. It is not constitutional to do like that. All acts in violation of the Constitution and the law must be investigated. But this is in China. I doubt [VHM: he means "I suspect / fear"] that they will install a detectaphone on my computer and destroy my essays. I feel like crying but shed no tears. The only feeling is indignation for an intellectual.

Although the young man's English is generally quite good, my immediate assumption was that the third word of his message was a typing error for "removed" or that he simply misremembered some other word meaning "seize." However, considering that quirky archaisms are rampant in Chinese use of English, a phenomenon that I have often documented on Language Log, e.g. here and here, I thought that I had better give the young man the benefit of the doubt, so I trudged over to my dictionary and looked up "reave."



Crash blossom du jour

A crash blossom, you'll recall, is an infelicitously worded headline that leads the reader down the garden path. Here's a fine example from today's Associated Press headlines:

McDonald's fries the holy grail for potato farmers

(Hat tip: Stephen Anderson via Larry Horn.)



One for the Fellowship of the Gapless Relative

According to Michael Goldstein, writing one of the opinion pieces in the NYT's 9/22/2009 symposium on "National Academic Standards: The First Test":

The politics has changed. All governors now recognize a problem: incentives to set low passing scores. Currently, a kid in Alabama might pass a 4th grade reading test that, if he lived in Massachusetts and took our version, he would fail.

You could add a resumptive pronoun: "a kid in Alabama might pass a 4th grade reading test that, if he lived in Massachusetts and took our version [of it], he would fail".

For some background, read "Ask Language Log: Gapless Relatives" and "More gapless relatives", 10/14/2007. This case is especially interesting because it might alternatively be construed as having a gap after fail, though that would seem to make the sentence self-refuting.



Quotation marks, non-necessity of

One is in favor of diversity in the blogosphere, of course. And yet somehow, when one learns that there now exists a blog entirely devoted to pictures of signs in which quotation marks are used incorrectly (used as if they were some sort of special font face like italics), one is somehow tempted to think that we are in danger of running out of words like esoteric and arcane. Still, check it out. Some of the pictures are quite astonishing. Keep in mind that in many cases people paid good money to have these signs made. They may even have paid a dime or two extra per quotation mark. Or "quotation mark", as they would put it. All one can tell you about one's own reaction is that one found some of them jaw-dropping. One's jaw actually dropped.



Volitive polarity items



The Vulture Reading Room feeds the eternal flame

If I and my friends and colleagues could just have found the strength of will to not talk about Dan Brown's new novel The Lost Symbol, perhaps we could have stopped his march to inevitable victory as the fastest-selling and most renowned novelist in human history, and The Lost Symbol could have just faded away to become his Lost Novel. If only we could just have shut up. And we tried. But we just couldn't resist the temptation to gabble on about the new blockbuster. Sam Anderson at New York Magazine has set up a discussion salon devoted to The Lost Symbol, under the title the Vulture Reading Room, to allow us to tell each other (and you, and the world) what we think about the book. Already Sam's own weakness has become clear: he struggled mightily to avoid doing the obvious — a Dan Brown parody — and of course he failed. His cringingly funny parody is already up on the site (as of about 4 p.m. Eastern time on September 22). Soon my own first post there will be up. I know that Sarah Weinman (the crime reviewer) will not be far behind, and Matt Taibbi (the political journalist) and NYM's own contributing editor Boris Kachka will not be far behind her.
