Archive for Research tools

Open Access Handbooks in Linguistics!

A couple of weeks ago, I wrung my hands on Facebook over the proliferation of commercial publishers’ Handbooks of Linguistics. These are usually priced out of individuals’ budgets, being sold mostly to university libraries, and the thousands of hours of work poured into them by dedicated linguists are often lost behind a paywall, inaccessible to many of the people who would most like to read them.

That post prompted a flood of urgent discussion; it seemed that the same thought was occurring to people around the world at once. (Indeed, Kai von Fintel had posted the identical thought about six months prior; probably that butterfly was the ultimate cause of the veritable hurricane that erupted on my feed.)

Long story short, a few weeks later we now have a proto-editorial board and are on to the next steps of identifying a venue and a business model for the series. Please check out our announcement below the fold, and follow along on our blog for updates as the series develops!

Comments (5)


I recall that, as a graduate student in Sinology, one of the most troublesome tasks was figuring out how to romanize the names of Japanese authors, the titles of their works, place names, technical terms, and so forth. Overall, Japanese Sinological (not to mention Indological and other fields) scholarship is outstanding, so we have to consult it, and when we cite Japanese works, we need to be able to romanize names, titles, and so forth to reflect their Japanese pronunciations.

Comments (27)

Another SOS for DARE

Two years ago I sent out an “SOS for DARE,” that is, a plea for the indispensable Dictionary of American Regional English, which had run into funding troubles. Though DARE was granted a temporary reprieve, the latest news is more dire than ever.

Marc Johnson laid out the situation in an article for the Milwaukee Journal-Sentinel:

The end may be near for one of the University of Wisconsin-Madison’s most celebrated humanities projects, the half-century-old Dictionary of American Regional English. In a few months, the budget pool will drain to a puddle. Layoff notices have been sent, eulogies composed…

Comments (7)

The sparseness of linguistic data

Gary Marcus and Ernest Davis say in a New York Times piece on why we shouldn’t buy all the hype about the Big Data revolution in science:

Big data is at its best when analyzing things that are extremely common, but often falls short when analyzing things that are less common. For instance, programs that use big data to deal with text, such as search engines and translation programs, often rely heavily on something called trigrams: sequences of three words in a row (like “in a row”). Reliable statistical information can be compiled about common trigrams, precisely because they appear frequently. But no existing body of data will ever be large enough to include all the trigrams that people might use, because of the continuing inventiveness of language.

To select an example more or less at random, a book review that the actor Rob Lowe recently wrote for this newspaper contained nine trigrams such as “dumbed-down escapist fare” that had never before appeared anywhere in all the petabytes of text indexed by Google. To witness the limitations that big data can have with novelty, Google-translate “dumbed-down escapist fare” into German and then back into English: out comes the incoherent “scaled-flight fare.” That is a long way from what Mr. Lowe intended — and from big data’s aspirations for translation.
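To make the sparseness point concrete, here is a minimal Python sketch of trigram counting. The tiny `corpus` string and the helper names `word_trigrams` and `unseen_trigrams` are hypothetical stand-ins for illustration, not anything Marcus and Davis or Google describe; real systems tokenize far more carefully and index vastly more text.

```python
from collections import Counter


def word_trigrams(text):
    """Return every run of three consecutive words in text, lowercased."""
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


# Toy stand-in for a large indexed corpus (an assumption for illustration).
corpus = "three words in a row are a trigram and three words in a row recur often"
counts = Counter(word_trigrams(corpus))


def unseen_trigrams(text, counts):
    """Return the trigrams of text that never occur in the counted corpus."""
    return [t for t in word_trigrams(text) if t not in counts]
```

With this toy corpus, `counts[("in", "a", "row")]` is 2, while `unseen_trigrams("dumbed-down escapist fare", counts)` returns the whole phrase as a single unseen trigram. Enlarging the corpus shrinks the unseen list but, as the quoted passage argues, never empties it for genuinely new phrasing.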

Comments off

A reprieve for DARE

A month ago, I posted an “SOS for DARE,” detailing the impending financial threat faced by the Dictionary of American Regional English, a national treasure of lexicography. At the time it appeared that the College of Letters and Sciences at the University of Wisconsin, where DARE is based, would be unable to provide support to offset the loss of federal and private grant money. But now there’s finally some good news out of Madison, in the form of new funds from the University and external gifts.

Comments (1)


Many Language Log readers are no doubt familiar with the Dictionary of American Regional English, which I hailed in a Boston Globe column last year as “a great project on how Americans speak — make that the great project on how Americans speak.” At the time, I was previewing DARE’s fifth volume, which completed the alphabetical run all the way to zydeco.  Since then, a sixth volume of supplemental materials has also been published, and plans are underway to launch the digital version of DARE, which would serve as an online home for future expansions and revisions. But now DARE editor Joan Hall passes along some troubling news about the dictionary’s financial fate.

Comments (5)

The American Heritage Dictionary of the English Language, 5th edition

As soon as I heard that the 5th edition of The American Heritage Dictionary of the English Language (AHD) had come out, I rushed to the nearest Barnes & Noble bookstore (yes, they still exist; it was Borders that closed) and plunked down two Bens (hundred-dollar bills) to buy three copies at $60 each: one for my office at Penn, one for my study at home, and one for a friend. The 5th ed. was actually published in November 2011, but I was in China then, and didn’t get a chance to buy my own copies until the day I arrived back on American soil.

Comments (31)

A new chapter for Google Ngrams

When Google’s Ngram Viewer was launched in December 2010, it encouraged everyone to be an amateur computational linguist, an amateur historical lexicographer, or a little of both. Today, the public interface that allows users to plumb the Google Books megacorpus has been relaunched, and the new version makes it even more enticing to researchers, both scholarly and nonscholarly. You can read all about it in my online piece for The Atlantic, as well as Jon Orwant’s official introduction on the Google Research blog.

Comments (13)

Soundex and Metaphone

One of the earliest and best photographers in China was called John Zumbrun, but I have also seen his surname spelled in various other ways, including Zumbrum. Some of his pictures may be seen here (this site is run by Thomas H. Hahn, digital archivist of old photographs).

As soon as I saw his surname, I suspected that it might be a variant of the Zumbrunnen among my own maternal relatives who were of Swiss German extraction.  When I mentioned to my sister Heidi (who does intense genealogical research on our family) that I thought Zumbrun might be a variant of Zumbrunnen, she replied, “Oh man, the variant spellings of Zumbrunnen are driving me batty.  I have even seen Zum Pwunnen.  Have you heard of the soundex?  It is a way to index names & deal with all of the variant spellings.”
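For readers who have not met Soundex, here is a minimal Python sketch of the commonly described American variant: keep the first letter, code the remaining consonants as digits, collapse adjacent repeats, and pad to four characters. The function name `soundex` and the table `CODES` are my own labels, and the exact rule set is an assumption; genealogical indexes may apply slightly different conventions.

```python
# Digit codes for consonants; vowels, y, h, and w get no code.
CODES = {}
for digit, group in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for letter in group:
        CODES[letter] = str(digit)


def soundex(name):
    """Return a four-character Soundex-style key for name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of same-coded letters
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeat after it is coded again
    return (key + "000")[:4]  # pad short keys with zeros, truncate long ones
```

Under these assumptions, `soundex("Zumbrun")`, `soundex("Zumbrum")`, and `soundex("Zumbrunnen")` all come out as `Z516`, so the three spellings would be filed together. `soundex("Zum Pwunnen")` gives `Z515`, which suggests that even Soundex will not catch every variant.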

Comments (16)

New search service for language resources

It has just become a whole lot easier to search the world’s language archives.  The new OLAC Language Resource Catalog contains descriptions of over 100,000 language resources from over 40 language archives worldwide.

This catalog, developed by the Open Language Archives Community (OLAC), provides access to a wealth of information about thousands of languages, including details of text collections, audio recordings, dictionaries, and software, sourced from dozens of digital and traditional archives.

OLAC is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources.  The OLAC Language Resource Catalog was developed by staff at the Linguistic Data Consortium, the University of Pennsylvania Libraries, the Graduate Institute of Applied Linguistics, and the University of Melbourne.  The primary sponsor is the National Science Foundation.

Comments (2)

Oxford Chinese Dictionary

Well, my copy of the new English-Chinese Chinese-English (hereafter ECCE) Oxford Chinese Dictionary (hereafter OCD) from Oxford University Press has arrived, and I must admit that it is very big and very impressive.  There has been a lot of buzz about this dictionary in the last couple of weeks, most of it generated by their own publicity department, working with the media.

Comments (21)

Embuggerance & Feisty

Problems with Google’s metadata are a recurrent theme here on Language Log. Now on his blog Stephen Chrisomalis reports a stunning cascade of screw-ups that led to Google Scholar producing the following citation:

Embuggerance, E., and H. Feisty. 2008. The linguistics of laughter. English Today 1, no. 04: 47-47.

Comments (22)

Google Demotes Literary Stars

My post about Google’s metadata problems, along with a similar piece in the Chronicle of Higher Education, got a lot of people talking about the problem in the press and the blogs. (I even ran into an allusion to it in a La Repubblica piece on the Google Book Settlement when I arrived in Rome yesterday morning.) A number of people passed along their own experiences with flaky metadata. Others criticized me on grounds that could be broadly summed up as “Don’t look a gift horse in the server,” “It’s better than nothing,” “Who needs metadata anyway?,” “Just give them time,” and “Why concentrate on trivialities like metadata while ignoring the real perils of corporate monopoly” (as in “serving as a consultant for monitoring the proper temperatures of the pitchforks in hell”).

This is all to the good, if it helps move up the metadata issues in Google’s queue. I do think this will get a lot better as Google puts its considerable mind to it. But there was one other aspect of the metadata problem which I hadn’t noticed or even thought about, but which in its own small way was the unkindest cut of all. It was noticed by the children’s book author Ace Bauer, who was prompted by my account of the metadata problems to check his Google Books listing:

Turns out my review rating ranked only one star out of 5. That’s dim. But see, the review upon which they based this ranking was Kirkus’s. Kirkus loved the book. They gave it a star. One star. That’s all they give folks. It’s considered a major honor.

Indeed it is, and actually the falling-star glitch affects a number of writers, for example Roy Blount, Jr., the president of the Authors Guild, who has been an enthusiastic backer of the settlement. Google Books assigns a one-out-of-five star rating to at least two of Blount’s books, Crackers and First Hubby, on the basis of their starred Kirkus reviews, and visits similar review rating downgrades on books by Guild vice-president Judy Blume and Guild board members Nick Lemann, James Gleick, and Oscar Hijuelos, among others.

I don’t know exactly what the Google people will say when they cotton to this one, but it’s a good guess the first sentence will begin with “oy.”

Comments (11)