Archive for August, 2009

Serial improvement

Although I share Geoff Nunberg's disappointment in some aspects of Google's metadata for books,  I've noticed a significant — though apparently unheralded — recent improvement.  So I decided to check this out by following up Bill Poser's post yesterday about insect species, which I thought was likely to turn up an example of the right sort. And in fact, the third hit in a search for {hemipteran} is a relevant one: Irene McCulloch, "A comparison of the life cycle of Crithidia with that of Trypanosoma in the invertebrate host", University of California Publications in Zoology, 19(4) 135-190, October 4, 1919.

This paper appears in a volume that is part of a serial publication. And until recently, Google Books routinely gave all such publications the date of the first in the series, even if the result was a decade or a century out of whack.

Uttaris pallidipennis in Miami

In the news today I came across this rather strange report from the Associated Press:

MIAMI — U.S. Customs and Border Protection officials say they have intercepted a rare and dangerous insect found in a shipment of flowers at a Miami airport that could cause significant damage.

Officials said Saturday they were examining a box of flowers last week at Miami International airport when they found Hemiptera. Hemiptera's are typically aphids, cicadas, and leaf hoppers and comprise about 80,000 different species. They feed on the seed heads of grasses and sedges. The insect is found in South America.

Officials believe it is the first time the insect has been found in the U.S.

How science reporting works

The latest from Zach Weiner at SMBC:

Sino-American Name Reversion

Yesterday an applicant from China came to my office and introduced herself to me as Runxiao ("Moist Dawn").  However, in previous correspondence, she had always referred to herself as Layn, a variant of Lane ("living near a lane"; "descendant of Laighean or Luan" in Gaelic). Other variants of the name are Laen, Laene, Lain, Laine, Laney, Lanie, Layne, and Laynne, or so say the name books.  When I asked her which name she preferred, she said, "You can call me Runxiao."

"But what about Layn?" I asked.  "Didn't you used to go by the name Layn?"

"Oh, yes!" she replied cheerfully with a gleaming smile.  "When I was in China, I called myself Layn, but now that I'm in America, I call myself Runxiao."

Google Books: A Metadata Train Wreck

Mark has already extensively blogged the Google Books Settlement Conference at Berkeley yesterday, where he and I both spoke on the panel on "quality" — which is to say, how well is Google Books doing this and what if anything will hold their feet to the fire? This is almost certainly the Last Library, after all. There's no Moore's Law for capture, and nobody is ever going to scan most of these books again. So whoever is in charge of the collection a hundred years from now — Google? UNESCO? Wal-Mart? — these are the files that scholars are going to be using then. All of which lends a particular urgency to the concerns about whether Google is doing this right.

My presentation focussed on GB's metadata — a feature absolutely necessary to doing most serious scholarly work with the corpus. It's well and good to use the corpus just for finding information on a topic — entering some key words and barrelling in sideways. (That's what "googling" means, isn't it?) But for scholars looking for a particular edition of Leaves of Grass, say, it doesn't do a lot of good just to enter "I contain multitudes" in the search box and hope for the best. Ditto for someone who wants to look at early-19th century French editions of Le Contrat Social, or to linguists, historians or literary scholars trying to trace the development of words or constructions: Can we observe the way happiness replaced felicity in the seventeenth century, as Keith Thomas suggests? When did "the United States are" start to lose ground to "the United States is"? How did the use of propaganda rise and fall by decade over the course of the twentieth century? And so on for all the questions that have made Google Books such an exciting prospect for all of us wordinistas and wordastri. But to answer those questions you need good metadata. And Google's are a train wreck: a mish-mash wrapped in a muddle wrapped in a mess.

Sorites in the comics

Today's Dinosaur Comics disposes of the sorites paradox:

The Google Books Settlement

I'm spending today at Berkeley, participating in a one-day conference on "The Google Books Settlement and the Future of Information Access".  I'll live-blog the discussion as the day unfolds, leaving comments off until it's over. I believe that the sessions are being recorded, and the recordings will be available on the web at some time in the near future. [Gary Price at Resource Shelf provides some other links here, and a press round-up here. Another summary by an attendee is here.]

Regular LL readers will know that we've been long-time users and supporters of Google Books, with occasional complaints about the poor quality of its metadata. For a lucid discussion of some issues with the terms of the proposed settlement, read Pamela Samuelson's articles "The Audacity of the Google Books Settlement", Huffington Post, 8/10/2009, and "Why is the Antitrust Division Investigating the Google Books Search Settlement?", Huffington Post, 8/19/2009.

Failing immediately to

As BBC Radio 4 reported the death of Senator Kennedy on the news, I heard a line about how his career had been blighted by the incident at the bridge at Chappaquiddick where "he failed immediately to report an accident". You can see what has happened: in an inadvisable attempt to avoid a split infinitive, the adverb has been placed before to, but this puts it next to failed, so we get interference from a distracting and unintended meaning that involves immediate failure (whatever that might mean). It was the reporting that should have been immediate. The right word order to pick would have been "he failed to immediately report an accident". But you just can't stop writers of news copy from being worried (falsely) that splitting an infinitive is some kind of mistake.

Where evidence counts for nothing and nobody will listen

You just can't stop people putting themselves in harm's way. If they're not walking into the buzzsaw they're crashing like bugs into the windshield… As the previously referenced discussion about usage in The Guardian's online pages developed a bit further, a commenter called scherfig responded to Steve Jones's devastating piece of evidence about Mark Twain not obeying Fowler's which/that rule by saying this:

OK, steve, let's forget Mark Twain and Fowler (old hat) and take a giant leap forward to George Orwell in the 30's and 40's. In my opinion, in his essays, the finest writer of the English language ever. Check out his use of English – it is, after all, several decades after Twain and still 70 years ago, and he has actually written sensibly about language (quite a lot).

What Steve immediately did, of course, was to take a relevant piece of Orwell's work and look at it; scherfig, the Orwell fan, astonishingly, had been too lazy to do this. And again his result was total and almost instant annihilation of the opponent.

"Team, Meet Girls; Girls, Meet Team"

The ideal David Bowie song, according to (Nick Troop's interpretation of) the output of Jamie Pennebaker's LIWC program, correlated with sales figures across Bowie's oeuvre:

Nun study update

For the last dozen years, it's been known that young people who follow the stylistic advice of Strunk & White are more likely to get Alzheimer's disease when they get old. Well, at least, in a cohort of nuns,

Low idea density and low grammatical complexity in autobiographies written in early life were associated with low cognitive test scores in late life. Low idea density in early life had stronger and more consistent associations with poor cognitive function than did low grammatical complexity. Among the 14 sisters who died, neuropathologically confirmed Alzheimer's disease was present in all of those with low idea density in early life and in none of those with high idea density.

And if you look into what "idea density" means, you'll see that many aspects of Strunkish writing style, especially avoidance of adjectives and adverbs, are precisely designed to lower it. (For details and links, see "Writing style and dementia", 12/3/2004; and "Miers dementia unlikely", 10/21/2005.)

Now there's a new chapter in the story, based on looking for physical symptoms of Alzheimer's in living nuns using positron emission tomography (PET) brain imaging, rather than relying on post-mortem examination of the brains of dead ones ("Can Language Skills Ward Off Alzheimer's? A Nuns' Study", Time, 7/9/2009).

Crash blossoms

From John McIntyre:

You've heard about the Cupertino. You have seen the eggcorn. You know about the snowclone. Now — flourish by trumpets and hautboys — we have the crash blossom.

At Testy Copy, a worthy colleague, Nessie3, posted this headline:

Violinist linked to JAL crash blossoms

(If this seems a bit opaque, and it should, the story is about a young violinist whose career has prospered since the death of her father in a Japan Airlines crash in 1985.)

A quick response by subtle_body suggested that crash blossom would be an excellent name for headlines done in by some such ambiguity: a word understood in a meaning other than the intended one. The elliptical nature of headline writing makes such ambiguities an inevitable hazard.

And danbloom was quick to set up a blog to collect examples of "infelicitously worded headlines."

Chris Waigl, reporting on the same neologism, describes "crash blossoms" as "those train wrecks of newspaper headlines that lead us down the garden path to end up against a wall, scratching our head and wondering what on earth the subeditor might possibly have been thinking." Indeed, when such infelicitous headlines have come up here on Language Log, they have typically been discussed as examples of "garden path sentences." After the break, a recent headline of the classic "garden path" variety.

Museum musing

John McIntyre at You Don't Say considers a hypothetical Musée des Peevologies. The curator's job is apparently open, or will be once a founding donor is located.