Archive for Computational linguistics

Job trends

At the Revolutions blog, David Smith posts a nice little discussion about growth in jobs where people make sense of data; he used the job search site indeed.com to look at trends in job postings. Apparently postings involving "statistician" are not seeing much growth, but postings for "data scientist" have really started to catch on during the last year or so. (Hat tip to Joe Reisinger for tweeting this. He comments that data scientist is a "truly terrible name, but it's undeniably a different skill set: way too many statisticians can't code".)

Markov's Heart of Darkness

It seems that the length of Joseph Conrad's paragraphs, unlike the length of zebra finch song bouts, is well approximated by a two-state Markov process.
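For readers who want something concrete to poke at, here is a minimal sketch of what a two-state Markov process over paragraph lengths could look like. The excerpt does not spell out the model, so everything here is an assumption: paragraphs are measured in sentences, each of two states has its own probability of ending the paragraph after a sentence, and the process can switch states between sentences. The function names and all the probabilities are hypothetical, chosen only for illustration.

```python
import random


def sample_paragraph_length(rng, stop_prob, switch_prob, start_state=0):
    """Draw one paragraph length (in sentences) from a two-state Markov process.

    stop_prob[s]   -- assumed chance the paragraph ends after a sentence in state s
    switch_prob[s] -- assumed chance of moving to the other state if it continues
    """
    state = start_state
    length = 0
    while True:
        length += 1  # emit one sentence in the current state
        if rng.random() < stop_prob[state]:
            return length
        if rng.random() < switch_prob[state]:
            state = 1 - state  # flip between state 0 and state 1


def simulate(n, seed=0):
    """Simulate n paragraph lengths with hypothetical parameters."""
    rng = random.Random(seed)
    stop_prob = (0.5, 0.05)    # hypothetical: state 0 ends quickly, state 1 rarely
    switch_prob = (0.1, 0.02)  # hypothetical switching probabilities
    return [sample_paragraph_length(rng, stop_prob, switch_prob) for _ in range(n)]


if __name__ == "__main__":
    lengths = simulate(1000)
    print("mean length:", sum(lengths) / len(lengths))
    print("longest paragraph:", max(lengths))
```

With a single state this collapses to a geometric distribution; the second state is what lets the simulated lengths mix many short paragraphs with occasional very long ones.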

Finch linguistics

Andy Coughlan, "First evidence that birds tweet using grammar", New Scientist 6/26/2011:

They may not have verbs, nouns or past participles, but birds challenge the notion that humans alone have evolved grammatical rules.

Bengal finches have their own versions of such rules – known as syntax – says Kentaro Abe of Kyoto University, Japan. "Songbirds have a spontaneous ability to process syntactic structures in their songs," he says.

To show a sense of syntax in the animals, Abe's team played jumbled "ungrammatical" remixes of finch songs to the birds and measured the response calls.

The basic article is Kentaro Abe & Dai Watanabe, "Songbirds possess the spontaneous ability to discriminate syntactic rules", Nature Neuroscience 6/26/2011. And like the coverage in New Scientist, it's both true and misleading.

Biblical scholarship at the ACL

The 49th Annual Meeting of the Association for Computational Linguistics took place last week in Portland, OR, and one of the papers presented there has gotten some (well-deserved) press coverage: Moshe Koppel, Navot Akiva, Idan Dershowitz and Nachum Dershowitz, "Unsupervised Decomposition of a Document into Authorial Components", ACL2011.

Well, at least the AP covered it: Matti Friedman, "An Israeli algorithm sheds light on the Bible", AP 6/29/2011 (as usual published under different headlines in various publications, e.g. "Algorithm developed by Israeli scholars sheds light on the Bible’s authorship" (WaPo), "Software deciphers authorship of the Bible" (CathNews), etc.).

Spoken style correction: the iPeeve™

I just had a terrible idea that could probably make someone a modest fortune. I was inspired by Erin Gloria Ryan, "My Love Affair With 'Like'", Jezebel 6/26/2011:

I use the word "like" with embarrassing frequency. I've started paying attention to how other people talk as well, and it's amazing how many women who I know are very smart are similarly infected with like-itis.

Where does this come from? Why do we do this? […]

Since we know that saying "like" too much leads others to negatively judge our intelligence, maybe inserting "like" into a sentence is something that we do to purposefully make ourselves sound less intelligent and forceful and therefore less formidable than we actually are. We're sabotaging ourselves! […]

Maybe women of my generation have been taught, through positive social reinforcement, that we're supposed to pepper our speech with meaningless modifiers that make us sounds a little less sure of ourselves, a little less credible. No one likes a show off or a know-it-all. Better temper your smart-talk with assurance to whoever you're speaking that you're not, like, a threat or anything. Any girl who's been teased for middle school nerdery has likely developed a long standing aversion for the feeling of being excluded for being too smart or opinionated. This is the way that socially acceptable people talk. This is the way that pretty people talk. Women are taught that it's more important to be pretty and socially accepted than it is to be smart. Ergo, like.

Please don't tell me about it

Those who can read German may be interested in some recent work by Gerd Fritz, of the Zentrum für Medien und Interaktivität at the Justus-Liebig-Universitaet Giessen, on "Texttypen im Language Log" ("Text types in Language Log"). Prof. Fritz tells me that this is "a brief summary of a longer paper to be published shortly".

Speech-based lie detection in Russia

Andrew E. Kramer, "Russian A.T.M. With an Ear for the Truth", NYT 6/8/2011:

Russia’s biggest retail bank is testing a machine that the old K.G.B. might have loved, an A.T.M. with a built-in lie detector intended to prevent consumer credit fraud.

Consumers with no previous relationship with the bank could talk to the machine to apply for a credit card, with no human intervention required on the bank’s end.

The machine scans a passport, records fingerprints and takes a three-dimensional scan for facial recognition. And it uses voice-analysis software to help assess whether the person is truthfully answering questions that include “Are you employed?” and “At this moment, do you have any other outstanding loans?”

The voice-analysis system was developed by the Speech Technology Center, a company whose other big clients include the Federal Security Service — the Russian domestic intelligence agency descended from the Soviet K.G.B.

Dmitri V. Dyrmovsky, director of the center’s Moscow offices, said the new system was designed in part by sampling Russian law enforcement databases of recorded voices of people found to be lying during police interrogations.

Remembering 9/11/2001

Like almost everyone else, I was happy to learn that Osama bin Laden is now an ex-terrorist; and I was mildly surprised to learn that he had been holed up in a large and luxurious compound located less than a mile by road from PMA Kakul, Pakistan's equivalent of West Point.

Phonemic diversity decays "out of Africa"?

A striking recent paper by Quentin Atkinson ("Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa", Science 4/15/2011) has been the subject of a lot of discussion recently. Its abstract:

Human genetic and phenotypic diversity declines with distance from Africa, as predicted by a serial founder effect in which successive population bottlenecks during range expansion progressively reduce diversity, underpinning support for an African origin of modern humans. Recent work suggests that a similar founder effect may operate on human culture and language. Here I show that the number of phonemes used in a global sample of 504 languages is also clinal and fits a serial founder–effect model of expansion from an inferred origin in Africa. This result, which is not explained by more recent demographic history, local language diversity, or statistical non-independence within language families, points to parallel mechanisms shaping genetic and linguistic diversity and supports an African origin of modern human languages.

Word-order "universals" are lineage-specific?

This post is the promised short discussion of Michael Dunn, Simon J. Greenhill, Stephen C. Levinson & Russell D. Gray, "Evolved structure of language shows lineage-specific trends in word-order universals", Nature, published online 4/13/2011. [Update: free downloadable copies are available here.] As I noted earlier, I recommend the clear and accessible explanation that Simon Greenhill and Russell Gray have put on the Austronesian Database website in Auckland — in fact, if you haven't read that explanation, you should go do so now, because I'm not going to recapitulate what they did and their reasons for doing it, beyond quoting the conclusion:

These family-specific linkages suggest that language structure is not set by innate features of the cognitive language parser (as suggested by the generativists), or by some over-riding concern to "harmonize" word-order (as suggested by the statistical universalists). Instead language structure evolves by exploring alternative ways to construct coherent language systems. Languages are instead the product of cultural evolution, canalized by the systems that have evolved during diversification, so that future states lie in an evolutionary landscape with channels and basins of attraction that are specific to linguistic lineages.

And I should start by saying that I'm neither a syntactician nor a typologist. The charitable way to interpret this is that I don't start with any strong prejudices on the subject of syntactic typology. From this unbiased perspective, it seems to me that this paper adds a good idea that has been missing from most traditional work in syntactic typology, but at the same time, it misses two good ideas that have been extensively developed in the related area of historical syntax.

Oice-vay Earch-say

According to the Official Google Research Blog,

As you might know, Google Voice Search is available in more than two dozen languages and dialects, making it easy to perform Google searches just by speaking into your phone.

Today it is our pleasure to announce the launch of Pig Latin Voice Search! […]

To configure Pig Latin Voice Search in your Android phone just go to Settings, select “Voice input & output settings”, and then “Voice recognizer settings”. In the list of languages you’ll see Pig Latin. Just select it and you are ready to roll in the mud!

It also works on iPhone with the Google Search app. In the app, tap the Settings icon, then "Voice Search" and select Pig Latin.
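For anyone puzzling over the title, the transformation itself is easy to sketch. This is a toy text-to-text version, not anything Google describes: it assumes the hyphenated spelling used in the title, treats only a, e, i, o, u as vowels, and uses the "-way" suffix for vowel-initial words, which is just one of several conventions. The function names are hypothetical.

```python
VOWELS = "aeiou"


def pig_latin_word(word):
    """Move the leading consonant cluster to the end and add 'ay' (hyphenated style)."""
    lower = word.lower()
    # Index of the first vowel; everything before it is the onset to move.
    # If the word has no vowel, treat it as having no onset.
    split = next((i for i, ch in enumerate(lower) if ch in VOWELS), 0)
    onset, rest = lower[:split], lower[split:]
    # Assumed convention: vowel-initial words just get "-way".
    result = f"{rest}-{onset}ay" if onset else f"{rest}-way"
    # Keep the capitalization of the original word's first letter.
    return result.capitalize() if word[:1].isupper() else result


def pig_latin(text):
    """Convert each whitespace-separated word independently."""
    return " ".join(pig_latin_word(w) for w in text.split())


if __name__ == "__main__":
    print(pig_latin("Voice Search"))
```

Run on "Voice Search", this gives back the title of the post. Punctuation, "qu" clusters and "y" as a vowel are deliberately ignored here.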

Waseda talker

"This is cool", writes John Coleman — and it is. More later.

Two Breakfast Experiments™: Literally

A couple of days ago, following up on Sunday's post about literally, Michael Ramscar sent me this fascinating graph:

What this shows us is a remarkably lawful relationship between the frequency of a verb and the probability of its being modified by literally, as revealed by counts from the 410-million-word COCA corpus. (The R² value means that a verb's frequency accounts for 88% of the variance in its chances of being modified by literally.)
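As a reminder of what "accounts for 88% of the variance" means, here is a minimal sketch of computing R-squared for a straight-line fit, where it equals the squared Pearson correlation between the two variables. The numbers in the example are made up for illustration and are not taken from COCA or from the graph; whether the original fit was done on raw or log-transformed counts is not stated in this excerpt, so the sketch makes no assumption about that.

```python
def r_squared(xs, ys):
    """Proportion of variance in ys explained by a least-squares line on xs.

    For simple linear regression this is the squared Pearson correlation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical data: a predictor and a noisy response, for illustration only.
    predictor = [1.0, 2.0, 3.0, 4.0]
    response = [1.0, 3.0, 2.0, 5.0]
    print(round(r_squared(predictor, response), 3))
```

A value of 1.0 means the points lie exactly on a line; a value near 0 means the line explains almost nothing.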
