Archive for Resources

New search service for language resources

It has just become a whole lot easier to search the world's language archives. The new OLAC Language Resource Catalog contains descriptions of over 100,000 language resources from over 40 language archives worldwide.

This catalog, developed by the Open Language Archives Community (OLAC), provides access to a wealth of information about thousands of languages, including details of text collections, audio recordings, dictionaries, and software, sourced from dozens of digital and traditional archives.

OLAC is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources. The OLAC Language Resource Catalog was developed by staff at the Linguistic Data Consortium, the University of Pennsylvania Libraries, the Graduate Institute of Applied Linguistics, and the University of Melbourne. The primary sponsor is the National Science Foundation.

LINGUIST List (2010)!

It's that time of the year again: the LINGUIST List's annual fund drive is under way for the month of March. The drive is about halfway to its goal of $65,000, with about $32,000 raised so far; the money goes to support the student staff. From the list's site:

The LINGUIST List is dedicated to providing information on language and language analysis, and to providing the discipline of linguistics with the infrastructure necessary to function in the digital world.

Google Demotes Literary Stars

My post about Google's metadata problems, along with a similar piece in the Chronicle of Higher Education, got a lot of people talking about the problem in the press and the blogs. (I even ran into an allusion to it in a La Repubblica piece on the Google Book Settlement when I arrived in Rome yesterday morning.) A number of people passed along their own experiences with flaky metadata. Others criticized me on grounds that could be broadly summed up as "Don't look a gift horse in the server," "It's better than nothing," "Who needs metadata anyway?," "Just give them time," and "Why concentrate on trivialities like metadata while ignoring the real perils of corporate monopoly" (as in "serving as a consultant for monitoring the proper temperatures of the pitchforks in hell").

This is all to the good, if it helps move the metadata issues up in Google's queue. I do think this will get a lot better as Google puts its considerable mind to it. But there was one other aspect of the metadata problem which I hadn't noticed or even thought about, but which in its own small way was the unkindest cut of all. It was noticed by the children's book author Ace Bauer, who was prompted by my account of the metadata problems to check his Google Books listing:

Turns out my review rating ranked only one star out of 5. That's dim. But see, the review upon which they based this ranking was Kirkus's. Kirkus loved the book. They gave it a star. One star. That's all they give folks. It's considered a major honor.

Indeed it is, and the falling-star glitch actually affects a number of writers, for example Roy Blount, Jr., the president of the Authors Guild, who has been an enthusiastic backer of the settlement. Google Books assigns a one-out-of-five star rating to at least two of Blount's books, Crackers and First Hubby, on the basis of their starred Kirkus reviews, and visits similar review-rating downgrades on books by Guild vice-president Judy Blume and Guild board members Nick Lemann, James Gleick, and Oscar Hijuelos, among others.

I don't know exactly what the Google people will say when they cotton to this one, but it's a good guess the first sentence will begin with "oy."

Some little inventories

NLTK Book on Sale Now

The NLTK book, Natural Language Processing with Python, went on sale yesterday:

Cover of Natural Language Processing with Python

"This book is here to help you get your job done." I love that line (from the preface). It captures the spirit of the book. Right from the start, readers (and users) get to do advanced things with large corpora, including information-rich visualizations and implementations of sophisticated theories. If you've started to see that your research would benefit from some computational power, but you have limited (or no) programming experience, don't despair: install NLTK and its data sets (it's a snap), then work through this book.

Inventories of postings

Over on my blog, I've been posting inventories of postings (on Language Log and my blog) on various topics: back on 13 June, an inventory of postings on two-part back-formed verbs and one on split infinitives; today, one on Omit Needless advice and one on conflicts between faithfulness and well-formedness.

More to come.

[Added 6/28: Now an inventory on the usage advice One Right Way.]

A Potpourri of Materials on Shanghainese

There are four parts to this very long post:

1. a message from a Shanghainese mother explaining her attitude toward the language she speaks with her little daughter;
2. the use of Shanghainese in the poster that I discussed in my previous post;
3. non-Mandarin college entrance exams;
4. an important resource for those interested in Wu topolects.

In defense of Amazon's Mechanical Turk

I can find no better description of Amazon's Mechanical Turk than in the "description" tag at the site itself:

The online market place for work. We give businesses and developers access to an on-demand scalable workforce. Workers can work at home and make money by choosing from thousands of tasks and jobs.

This is followed by a "keywords" meta tag:

make money, make money at home, make money from home, make money on the internet, make extra money, make money …

This makes the site sound a bit like the next stop on Dave Chappelle's tour of his imagined Internet as a physical place, and indeed it does have its seamy side. But I come to defend Mechanical Turk as a useful tool for linguistic research: a quick and inexpensive way to gather data and conduct simple experiments.

Authors@Google

Paul Armstrong has reminded me of the Authors@Google videos (available on YouTube), in which authors talk at Google about their recent books. At least five are of interest to linguists:

Noam Chomsky (4/25/08)

Erin McKean (3/29/06)

Geoffrey Nunberg (10/12/06)

Steven Pinker (9/24/07)

John Searle (10/30/07)

[Added 2/2:

Ray Jackendoff (8/30/07)

George Lakoff (7/12/07)

George Lakoff (6/4/08)

(Of the eight videos listed here, some are more directly related to linguistics than others.)]

[Added 2/7: from Ben Zimmer, three more talks at Google:

Tom Dalzell
Christine Kenneally
David Harrison and Gregory Anderson]