Archive for Research tools

Google is scary good

Before I finish typing "red", Google is already suggesting "red herring", which is what I was looking for.

When I've barely begun to type "Philadelphia" or "Seattle", or typed only one "Walla", Google is already suggesting "Philadelphia weather", "Seattle weather" and "Walla Walla weather", which is what I was looking for in each case.

If I want to check in on American Airlines, all I have to type is "ame", and — voilà! — there it is!

I was trying to think of a certain kind of Japanese tomb.  I typed in "Japanese tombs" and, remembering that these tombs resemble a keyhole, I added "k", and up popped "japanese keyhole tombs".

I do searches of this kind hundreds of times every day, and I'm infinitely grateful to Google for enabling them and making them seem (on my end) so effortless.

Open Access Handbooks in Linguistics!

A couple of weeks ago, I wrung my hands on Facebook over the proliferation of commercial publishers' Handbooks of Linguistics. These are usually priced out of individuals' budgets, being sold mostly to university libraries, and the thousands of hours of work poured into them by dedicated linguists are often lost behind a paywall, inaccessible to many of the people who would most like to read them.

That post prompted a flood of urgent discussion; it seemed that people around the world were having the same thought at the same time. (Indeed, Kai von Fintel had posted the identical thought about six months prior; probably that butterfly was the ultimate cause of the veritable hurricane that erupted on my feed.)

Long story short, a few weeks later we now have a proto-editorial board and are on to the next steps of identifying a venue and a business model for the series. Please check out our announcement below the fold, and follow along on our blog for updates as the series develops!

Sino-Japanese

I recall that, when I was a graduate student in Sinology, one of the most troublesome tasks was figuring out how to romanize the names of Japanese authors, the titles of their works, place names, technical terms, and so forth. Overall, Japanese scholarship in Sinology (not to mention Indology and other fields) is outstanding, so we have to consult it, and when we cite Japanese works, we need to romanize names, titles, and so forth in a way that reflects their Japanese pronunciations.

Another SOS for DARE

Two years ago I sent out an "SOS for DARE," that is, a plea for the indispensable Dictionary of American Regional English, which had run into funding troubles. Though DARE was granted a temporary reprieve, the latest news is more dire than ever.

Marc Johnson laid out the situation in an article for the Milwaukee Journal-Sentinel:

The end may be near for one of the University of Wisconsin-Madison's most celebrated humanities projects, the half-century-old Dictionary of American Regional English. In a few months, the budget pool will drain to a puddle. Layoff notices have been sent, eulogies composed…

The sparseness of linguistic data

Gary Marcus and Ernest Davis say in a New York Times piece on why we shouldn't buy all the hype about the Big Data revolution in science:

Big data is at its best when analyzing things that are extremely common, but often falls short when analyzing things that are less common. For instance, programs that use big data to deal with text, such as search engines and translation programs, often rely heavily on something called trigrams: sequences of three words in a row (like "in a row"). Reliable statistical information can be compiled about common trigrams, precisely because they appear frequently. But no existing body of data will ever be large enough to include all the trigrams that people might use, because of the continuing inventiveness of language.

To select an example more or less at random, a book review that the actor Rob Lowe recently wrote for this newspaper contained nine trigrams such as "dumbed-down escapist fare" that had never before appeared anywhere in all the petabytes of text indexed by Google. To witness the limitations that big data can have with novelty, Google-translate "dumbed-down escapist fare" into German and then back into English: out comes the incoherent "scaled-flight fare." That is a long way from what Mr. Lowe intended — and from big data's aspirations for translation.
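
To make the trigram idea concrete, here is a minimal sketch in Python. The function name `word_trigrams`, the whitespace tokenization, and the one-sentence toy corpus are all assumptions made for illustration; this is not how any search engine or translation system actually counts trigrams.

```python
from collections import Counter


def word_trigrams(text):
    """Return every run of three consecutive words in text."""
    # Assumption: naive lowercase + whitespace tokenization.
    words = text.lower().split()
    return list(zip(words, words[1:], words[2:]))


# Toy "corpus"; a real system would count over vastly more text.
corpus = "big data is at its best when analyzing things that are extremely common"
counts = Counter(word_trigrams(corpus))

seen = ("big", "data", "is")
unseen = ("dumbed-down", "escapist", "fare")
print(counts[seen])    # 1
print(counts[unseen])  # 0: the counter has no information about unseen trigrams
```

However large the corpus, a lookup for a trigram that never occurred in it comes back as zero, which is the sparseness problem the quoted passage describes.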

A reprieve for DARE

A month ago, I posted an "SOS for DARE," detailing the impending financial threat faced by the Dictionary of American Regional English, a national treasure of lexicography. At the time it appeared that the College of Letters and Sciences at the University of Wisconsin, where DARE is based, would be unable to provide support to offset the loss of federal and private grant money. But now there's finally some good news out of Madison, in the form of new funds from the University and external gifts.

SOS for DARE

Many Language Log readers are no doubt familiar with the Dictionary of American Regional English, which I hailed in a Boston Globe column last year as "a great project on how Americans speak — make that the great project on how Americans speak." At the time, I was previewing DARE's fifth volume, which completed the alphabetical run all the way to zydeco.  Since then, a sixth volume of supplemental materials has also been published, and plans are underway to launch the digital version of DARE, which would serve as an online home for future expansions and revisions. But now DARE editor Joan Hall passes along some troubling news about the dictionary's financial fate.

The American Heritage Dictionary of the English Language, 5th edition

As soon as I heard that the 5th edition of The American Heritage Dictionary of the English Language (AHD) had come out, I rushed to the nearest Barnes & Noble bookstore (yes, they still exist — that was Borders that closed) and plunked down two Bens (hundred dollar bills) to buy three copies at $60 each:  one for my office at Penn, one for my study at home, and one for a friend.  The 5th ed. was actually published in November, 2011, but I was in China then, and didn't get a chance to buy my own copies until the day I arrived back on American soil.

A new chapter for Google Ngrams

When Google's Ngram Viewer was launched in December 2010 it encouraged everyone to be an amateur computational linguist, an amateur historical lexicographer, or a little of both. Today, the public interface that allows users to plumb the Google Books megacorpus has been relaunched, and the new version makes it even more enticing to researchers, both scholarly and nonscholarly. You can read all about it in my online piece for The Atlantic, as well as Jon Orwant's official introduction on the Google Research blog.

Soundex and Metaphone

One of the earliest and best photographers in China was John Zumbrun, but I have also seen his surname spelled in various other ways, including Zumbrum.  Some of his pictures may be seen here (this site is run by Thomas H. Hahn, digital archivist of old photographs).

As soon as I saw his surname, I suspected that it might be a variant of the Zumbrunnen among my own maternal relatives who were of Swiss German extraction.  When I mentioned to my sister Heidi (who does intense genealogical research on our family) that I thought Zumbrun might be a variant of Zumbrunnen, she replied, "Oh man, the variant spellings of Zumbrunnen are driving me batty.  I have even seen Zum Pwunnen.  Have you heard of the soundex?  It is a way to index names & deal with all of the variant spellings."
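
For readers who have not met it, here is a minimal sketch of the classic American Soundex coding in Python. The function name `soundex` and the handling of non-letters (they are simply skipped) are my own assumptions, and genealogical databases may implement slightly different variants of the rules.

```python
import string

# Standard Soundex digit groups; vowels, y, h and w get no digit.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code for name ('' if no letters)."""
    letters = [ch for ch in name.lower() if ch in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of same-coded letters
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]  # pad with zeros, keep four characters


for surname in ["Zumbrun", "Zumbrum", "Zumbrunnen", "Zum Pwunnen"]:
    print(surname, soundex(surname))
```

In this sketch Zumbrun, Zumbrum and Zumbrunnen all come out as Z516, so they would be indexed together. "Zum Pwunnen" comes out as Z515, a reminder that Soundex groups many variant spellings but not all of them.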

New search service for language resources

It has just become a whole lot easier to search the world's language archives.  The new OLAC Language Resource Catalog contains descriptions of over 100,000 language resources from over 40 language archives worldwide.

This catalog, developed by the Open Language Archives Community (OLAC), provides access to a wealth of information about thousands of languages, including details of text collections, audio recordings, dictionaries, and software, sourced from dozens of digital and traditional archives.

OLAC is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources.  The OLAC Language Resource Catalog was developed by staff at the Linguistic Data Consortium, the University of Pennsylvania Libraries, the Graduate Institute of Applied Linguistics, and the University of Melbourne.  The primary sponsor is the National Science Foundation.

Oxford Chinese Dictionary

Well, my copy of the new English-Chinese Chinese-English (hereafter ECCE) Oxford Chinese Dictionary (hereafter OCD) from Oxford University Press has arrived, and I must admit that it is very big and very impressive.  There has been a lot of buzz about this dictionary in the last couple of weeks, most of it generated by their own publicity department, working with the media.

Embuggerance & Feisty

Problems with Google's metadata are a recurrent theme here on Language Log. Now on his blog Stephen Chrisomalis reports a stunning cascade of screw-ups that led to Google Scholar producing the following citation:

Embuggerance, E., and H. Feisty. 2008. The linguistics of laughter. English Today 1, no. 04: 47-47.
