Coming soon, to an airport near you?


PC and Pixel for 12/11/2008 (link sent in by Michael Erlewine):

In fact, reasonably robust identification of the language being spoken is closer to what the state of the art in this sort of technology can now aspire to.

For example, here are some recent results for eight systems tested on a two-way choice between Caribbean and non-Caribbean Spanish based on a three-second sample. The different bars represent performance for the different systems, with the performance of a human judge included for comparison.

Here is the performance of two systems on an open set, where the task was to detect Caribbean Spanish in a set of samples that included not only other sorts of Spanish, but also Arabic, Bengali, Farsi, German, Japanese, Korean, Russian, Tamil, Thai, Vietnamese, Cantonese, Mandarin (mainland and Taiwanese), Min, Wu, English (American and Indian), Hindi, Urdu, and five other languages:

In both graphs, the red portion of the bars indicates the proportion of misses (where the test segment was Caribbean Spanish but the system failed to detect it), and the blue portion indicates the proportion of false alarms (where the system claimed to detect Caribbean Spanish, but the test segment was some other dialect of Spanish, in the closed-set trial, or some other language altogether, in the open-set trial).
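A minimal sketch of how miss and false-alarm proportions of this kind might be computed from per-trial outcomes. The function name and toy data are illustrative, not from the NIST evaluation; note that, as in the bars described above, both proportions are taken over all trials:

```python
# Sketch: miss and false-alarm proportions over a set of detection trials.
# Each trial is (is_target, system_said_yes). A miss is a target trial the
# system rejected; a false alarm is a non-target trial the system accepted.

def miss_and_false_alarm(trials):
    """Return (miss, false_alarm) as proportions of all trials."""
    n = len(trials)
    misses = sum(1 for is_target, said_yes in trials
                 if is_target and not said_yes)
    false_alarms = sum(1 for is_target, said_yes in trials
                       if not is_target and said_yes)
    return misses / n, false_alarms / n

# Toy data: two Caribbean-Spanish trials, two other-language trials.
trials = [(True, True), (True, False), (False, True), (False, False)]
print(miss_and_false_alarm(trials))  # (0.25, 0.25)
```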

Here's the performance for a closed-set language-detection task, for a larger number of participating systems, averaged over the 14 languages in the set:

As you can see, the best of the systems are doing pretty well at recognizing languages, but not so well at distinguishing one dialect from another.

If you'd like more details, the full evaluation plan for this test (known as LRE07) is here, and the results are here. The evaluations were conducted by the Multimodal Information Group at the National Institute of Standards and Technology, and the data for the LRE evaluations was collected and distributed by the Linguistic Data Consortium (where Language Log is now hosted).


  1. Paul said,

    December 12, 2008 @ 3:04 am

    Have I failed to understand something (quite likely!), or is there an error in the vertical axes: are the worst of these systems really getting 199 instances correct out of every 200? That would seem astonishingly successful to me, and the fact that several systems seem to score exactly 0.5% and none score a higher percentage than that makes me suspicious that I've missed something. I'm just not clear what these results are 0-0.5% of.

    [(myl) As you suspect, the vertical axes are really proportions, even though the labels misleadingly say "percent". The exact definition of the metric that the NIST group calls Cavg is given in section 3.2 of the evaluation plan. Since correct answers are half "yes" and half "no", a system that (for example) always answers "yes" would have a miss rate of 0 and a false-alarm rate of 0.5.]
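The always-"yes" example in the reply can be checked with a few lines of arithmetic. This is an illustrative sketch, not the NIST scoring code: with trials split evenly between targets and non-targets, a system that always answers "yes" misses nothing but false-alarms on every non-target trial, which is half of all trials:

```python
# Illustrative check of the always-"yes" example: half the trials are
# targets, half are non-targets, and the system accepts everything.

trials = [True] * 50 + [False] * 50   # True = target trial
answers = [True] * 100                # the system always says "yes"

n = len(trials)
miss = sum(t and not a for t, a in zip(trials, answers)) / n
false_alarm = sum((not t) and a for t, a in zip(trials, answers)) / n
print(miss, false_alarm)  # 0.0 0.5
```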

  2. Paul said,

    December 12, 2008 @ 5:51 am

    Thanks, Mark. I'd looked at the material you linked to, but couldn't quite work out why they were claiming to be in percentages. No doubt I've made similar mistakes myself many times – very easy to do.

  3. Sili said,

    December 12, 2008 @ 2:53 pm

    Indeed. I hated it when the students caught me saying "Minus ten Kelvin".

  4. Bob Ladd said,

    December 14, 2008 @ 10:57 am

    Expert humans used to be able to make such judgements with detail comparable to that achieved by the computer in the cartoon. When my father was an undergraduate at Brown in the late 1930s, he was on a radio program called "Where are you from?", hosted by Brown linguist Henry Lee Smith (the Smith of Trager and Smith, for linguists of a certain age). As my father used to recall it, Smith asked him to say "marry merry Mary" and promptly judged that he was from "within 15 miles of the Cape Cod Canal", which in fact he was. Whether the story was embellished in the retelling I have no way of knowing, but I have no reason to doubt that there was such a radio program or that Smith could actually do this sort of thing. Since (according to Mark's first graph) human listeners are still substantially outperforming machines, it seems unlikely that we will be seeing scenes like the cartoon any time soon, and an electronic Smith is probably still several decades off.

  5. Mark Liberman said,

    December 14, 2008 @ 11:20 am

    Bob Ladd: Expert humans used to be able to make such judgements with detail comparable to that achieved by the computer in the cartoon.

    There's the famous quote from Shaw's Pygmalion:

    THE GENTLEMAN [returning to his former place on the note taker's left] How do you do it, if I may ask?

    THE NOTE TAKER. Simply phonetics. The science of speech. That's my profession; also my hobby. Happy is the man who can make a living by his hobby! You can spot an Irishman or a Yorkshireman by his brogue. I can place any man within six miles. I can place him within two miles in London. Sometimes within two streets.

    THE FLOWER GIRL. Ought to be ashamed of himself, unmanly coward!

    This was easier in 1912 England than in the English-speaking world of 2008. But still, Bob is right that some human experts are *much* better at this than average human judges are.

  6. Airport shibboleth – mutually occluded said,

    December 15, 2008 @ 2:00 pm

    […] by a recent PC and Pixel cartoon, Mark Liberman over at Language Log gives a quick run-down on the state of 'linguistic profiling' technology. ("As […]
