Archive for Computational linguistics

Talking to the TV

Farhad Manjoo, "Apple Doesn’t Need To Make the TV of the Future: The revolution is already here—and it’s called the Xbox", Slate 3/27/2012.

If the rumors are true, Apple will release a television set later this year that it will tout as the most amazing boob tube ever invented.

The biggest selling point will be Apple’s promise to make navigating our viewing choices easier. Say you want to watch Tower Heist on a Saturday night. You’d first check Netflix, because if it’s there, it’ll be streamed free for members. If it’s not, and if you subscribe to Amazon’s Prime service, you ought to check there, because you might get a discount. If that fails, you’ll look for the movie on iTunes, Hulu Plus, or Comcast in whatever order is most convenient for you. The whole process is a frustrating mess, one that Apple will likely try to solve by building a cross-platform search engine into its TV. Instead of going to every service separately, you’ll just say, “Hey TV, I’d like to watch Tower Heist!” and the screen will show you where the flick is playing, and for how much. You’ll just have to choose one and press Play.

When CEO Tim Cook shows off Apple’s TV set this fall, I bet he’ll call voice-activated universal search a revolutionary way to interact with your television. What Cook probably won’t mention is that it already exists. Indeed, much of what Apple is likely to build into its TV is available today on a gadget whose interface is just as easy to use as anything Apple will cook up. The device is called the Xbox 360.

Over the last few months, Microsoft has turned its video-game console into your TV’s best friend.

The birth and death of typos

Alexander M. Petersen, Joel Tenenbaum, Shlomo Havlin, and H. Eugene Stanley, "Statistical Laws Governing Fluctuations in Word Use from Word Birth to Word Death" (appearing in Scientific Reports, 3/15/2012):

We analyze the dynamic properties of 10^7 words recorded in English, Spanish and Hebrew over the period 1800–2008 in order to gain insight into the coevolution of language and culture. We report language independent patterns useful as benchmarks for theoretical models of language evolution. A significantly decreasing (increasing) trend in the birth (death) rate of words indicates a recent shift in the selection laws governing word use. For new words, we observe a peak in the growth-rate fluctuations around 40 years after introduction, consistent with the typical entry time into standard dictionaries and the human generational timescale. Pronounced changes in the dynamics of language during periods of war shows that word correlations, occurring across time and between words, are largely influenced by coevolutionary social, technological, and political factors. We quantify cultural memory by analyzing the long-term correlations in the use of individual words using detrended fluctuation analysis.

The QWERTY effect

Rebecca Rosen, "The QWERTY Effect: The Keyboards Are Changing Our Language!", The Atlantic:

It's long been thought that how a word sounds — it's very phonemes — can be related in some ways to what that word means. But language is no longer solely oral. Much of our word production happens not in our throats and mouths but on our keyboards. Could that process shape a word's meaning as well?

That's the contention of an intriguing new paper by linguists Kyle Jasmin and Daniel Casasanto. They argue that because of the QWERTY keyboard's asymmetrical shape (more letters on the left than the right), words dominated by right-side letters "acquire more positive valences" — that is to say, they become more likable. Their argument is that because its easier for your fingers to find the correct letters for typing right-side dominated words, the words subtly gain favor in your mind.
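
To make "right-side dominated" concrete, here is a minimal sketch of one way to score a word: count its right-hand letters and subtract its left-hand letters. The function name `right_side_advantage` is my own label, and the split of keys between hands assumes standard touch-typing on a QWERTY layout; the paper's exact scoring may differ.

```python
# Letter keys typed by each hand under standard QWERTY touch-typing (assumed split).
LEFT = set("qwertasdfgzxcvb")   # 15 letters
RIGHT = set("yuiophjklnm")      # 11 letters


def right_side_advantage(word):
    """Return (# right-hand letters) - (# left-hand letters) for a word.

    Characters outside the 26 basic letters are ignored.
    """
    letters = [c for c in word.lower() if c.isalpha()]
    right = sum(c in RIGHT for c in letters)
    left = sum(c in LEFT for c in letters)
    return right - left


if __name__ == "__main__":
    for w in ["lol", "sad", "typo"]:
        print(w, right_side_advantage(w))
```

A positive score means the word leans to the right side of the keyboard; the 15-versus-11 split is the asymmetry the quoted passage refers to.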

There's a lot of media uptake for this work: Rachel Zimmerman, "Typing and the meaning of words", Common Health; "QWERTY Keyboard Leads to Feelings about Words", Scientific American; Rob Waugh, "Why just typing 'LOL' makes you happy: People like words made of letters from the right-hand side of the QWERTY keyboard", Daily Mail; Alasdair Williams, "The 'QWERTY Effect' is changing what words mean to us", io9; "The right type of words", e! Science News; Dave Mosher, "The QWERTY Effect: How Typing May Shape the Meaning of Words", Wired News; Rebecca Rosen, "The QWERTY Effect: The Keyboards Are Changing Our Language", The Atlantic; etc.

"The victims are large and costly machines"

From Shay Cohen, via Lyle Ungar, some output from a PCFG ("probabilistic context-free grammar") trained on the Wall Street Journal part of the Penn Treebank:

A manager is a better value of well-polished desks .

I have been able to force to be more receptive to therapy , and to keep the committee informed , usually in advance , of covert actions : ; the victims are large and costly machines .

The purchase of all women is in September .

Their museum had been dumping their securities for comment .

It can remember one million truly inspiring teachers from Rainbow Technologies .
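
For readers who haven't seen how output like this is produced, here is a minimal sketch of sampling from a PCFG: start at the root symbol, pick one expansion at random according to its probability, and recurse until only words remain. The grammar below is a tiny toy invented for illustration, not the Treebank-trained grammar behind the sentences above, and the names `TOY_PCFG` and `sample` are hypothetical.

```python
import random

# Toy grammar (invented). Each nonterminal maps to (right-hand side, probability)
# pairs; the probabilities for each nonterminal sum to 1.
TOY_PCFG = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("Det", "N"), 0.7), (("Det", "Adj", "N"), 0.3)],
    "VP": [(("V", "NP"), 0.6), (("V",), 0.4)],
    "Det": [(("the",), 0.6), (("a",), 0.4)],
    "Adj": [(("large",), 0.5), (("costly",), 0.5)],
    "N": [(("manager",), 0.4), (("machine",), 0.3), (("committee",), 0.3)],
    "V": [(("remembers",), 0.5), (("informs",), 0.5)],
}


def sample(symbol, grammar, rng):
    """Expand `symbol` top-down and return the generated words as a list."""
    if symbol not in grammar:
        return [symbol]  # terminal: an actual word
    options = [rhs for rhs, _ in grammar[symbol]]
    weights = [p for _, p in grammar[symbol]]
    rhs = rng.choices(options, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        words.extend(sample(child, grammar, rng))
    return words


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(" ".join(sample("S", TOY_PCFG, rng)), ".")
```

Each choice depends only on the symbol being expanded, not on anything else in the sentence, which is why the results can be locally grammatical and globally odd.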

SpeechJammer

Kazutaka Kurihara & Koji Tsukada, "SpeechJammer: A System Utilizing Artificial Speech Disturbance with Delayed Auditory Feedback", arXiv:1202.6106v1 [cs.HC], 2/28/2012:

In this paper we report on a system, "SpeechJammer", which can be used to disturb people's speech. In general, human speech is jammed by giving back to the speakers their own utterances at a delay of a few hundred milliseconds. This effect can disturb people without any physical discomfort, and disappears immediately by stop speaking. Furthermore, this effect does not involve anyone but the speaker. We utilize this phenomenon and implemented two prototype versions by combining a direction-sensitive microphone and a direction-sensitive speaker, enabling the speech of a specific person to be disturbed. We discuss practical application scenarios of the system, such as facilitating and controlling discussions. Finally, we argue what system parameters should be examined in detail in future formal studies based on the lessons learned from our preliminary study.
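
The core of delayed auditory feedback is just a delay line: each incoming audio sample is played back a fixed time later. A minimal sketch, assuming audio arrives as a list of numeric samples; the function name, the sample rate, and the 200 ms figure are illustrative choices of mine, not parameters taken from the paper.

```python
from collections import deque


def delayed_feedback(samples, sample_rate_hz, delay_ms):
    """Return `samples` delayed by `delay_ms`, padded with silence at the start.

    The output has the same length as the input, so the tail of the input
    that has not yet emerged from the delay line is dropped.
    """
    delay_samples = int(sample_rate_hz * delay_ms / 1000)
    # Pre-fill the delay line with silence.
    line = deque([0.0] * delay_samples)
    out = []
    for s in samples:
        line.append(s)              # newest sample goes in
        out.append(line.popleft())  # oldest sample comes out
    return out


if __name__ == "__main__":
    # At 8000 samples per second, a 200 ms delay is 1600 samples.
    speech = [float(i) for i in range(8000)]
    heard = delayed_feedback(speech, sample_rate_hz=8000, delay_ms=200)
    print(heard[1599], heard[1600], heard[1601])
```

A real system would run this on streaming audio from the directional microphone and send the output to the directional speaker; the buffering logic is the same.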

Distances among genres and authors

Jon Gertner, "True Innovation", NYT 2/25/2012

At Bell Labs, the man most responsible for the culture of creativity was Mervin Kelly. […] In 1950, he traveled around Europe, delivering a presentation that explained to audiences how his laboratory worked.

His fundamental belief was that an “institute of creative technology” like his own needed a “critical mass” of talented people to foster a busy exchange of ideas. But innovation required much more than that. Mr. Kelly was convinced that physical proximity was everything; phone calls alone wouldn’t do. Quite intentionally, Bell Labs housed thinkers and doers under one roof. Purposefully mixed together on the transistor project were physicists, metallurgists and electrical engineers; side by side were specialists in theory, experimentation and manufacturing. Like an able concert hall conductor, he sought a harmony, and sometimes a tension, between scientific disciplines; between researchers and developers; and between soloists and groups.

One element of his approach was architectural. He personally helped design a building in Murray Hill, N.J., opened in 1941, where everyone would interact with one another. Some of the hallways in the building were designed to be so long that to look down their length was to see the end disappear at a vanishing point. Traveling the hall’s length without encountering a number of acquaintances, problems, diversions and ideas was almost impossible. A physicist on his way to lunch in the cafeteria was like a magnet rolling past iron filings.

I started work at Murray Hill in 1975, nine years after someone staged that picture of white lab coats extending to the vanishing point. And even though my first office was in an unused chemistry lab, I don't recall ever seeing more than an occasional pragmatic lab coat: whoever staged the photograph was apparently using the same lab-coat=scientist iconography as a couple of generations of cartoonists and movie-makers. But I can certainly attest to the value of hallway and lunchroom serendipity.

These days, some of the same serendipitous conversational cross-fertilization comes from random encounters in the corridors and cafeterias of the internet.

Cultural diffusion and the Whorfian hypothesis

Geoff Pullum summarizes Keith Chen's view of "The Effect of Language on Economic Behavior" as follows ("Keith Chen, Whorfian economist", 2/9/2012):

Chen […] thinks that if your language has clear grammatical future tense marking […], then you and your fellow native speakers have a dramatically increased likelihood of exhibiting high rates of obesity, smoking, drinking, debt, and poor pension provision. And conversely, if your language uses present-tense forms to express future time reference […], you and your fellow speakers are strikingly more likely to have good financial planning for retirement and sensible health habits. It is as if grammatical marking of the difference between the present and the future insulates you from seeing that the two are coterminous so you should plan ahead. Using present-tense forms for future time reference, on the other hand, encourages you to see that the future is just more of the present, and thus encourages you to put money in a 401(k).

Geoff notes that "Chen's evidence on the lifestyle indicators comes from massive amounts of hard data, and his mathematical analysis is serious". But in addition to expressing some qualms about the linguistic data, Geoff worries that the large number of linguistic traits and the large number of lifestyle and other cultural traits might give rise to spurious connections:

I also worry that it is too easy to find correlations of this kind, and we don't have any idea just how easy until a concerted effort has been made to show that the spurious ones are not supportable. For example, if we took "has (vs. does not have) pharyngeal consonants", or "uses (vs. does not use) close front rounded vowels", would we find correlations there too?
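
The many-comparisons part of this worry is easy to illustrate with purely made-up numbers. The sketch below generates one random "outcome" per language and a couple of hundred random binary "traits", then counts how many traits correlate with the outcome beyond a cutoff that is roughly the conventional 5% significance level for 30 observations. Everything here is an assumption for illustration: the names, the sizes, the cutoff, and above all the treatment of languages as independent data points.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / (sxx * syy) ** 0.5


def count_spurious(n_languages=30, n_traits=200, threshold=0.36, seed=0):
    """Count random binary traits whose |r| with a random outcome exceeds threshold."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_languages)]
    hits = 0
    for _ in range(n_traits):
        trait = [rng.randint(0, 1) for _ in range(n_languages)]
        if abs(pearson(trait, outcome)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_spurious(), "of 200 random traits pass the cutoff")
```

Since nothing in the simulated data is connected to anything else, any trait that passes the cutoff does so by chance alone; with a 5% cutoff and 200 traits, one would expect about ten such hits on average.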

I have similar concerns; but I believe that I can explain and justify my worries without looking at any real data at all. There are two qualitative facts about the world that make it especially easy to fool ourselves about quantitative connections of this kind.

Automatic measurement of media bias

Mediate Metrics ("Objectively Measuring Media Bias") explains that

Based in Wheaton, IL, Mediate Metrics LLC is a privately held start-up founded by technology veteran and entrepreneur Barry Hardek. Our goal is to cultivate knowledgeable consumers of political news by objectively measuring media “slant” — news which contains either an embedded statements of bias (opinion) or an elements of editorial influence (factual content that reflects positively or negatively on U.S. political parties).

Mediate Metrics’ core technology is based on a custom machine classifier designed specifically for this application, and developed based on social science best practices with recognized leaders in the field of text analysis. Today,  text mining systems are primarily used as general purpose marketing tools for extracting insights from platforms such as like Twitter and Facebook, or from other large electronic databases. In contrast, the Mediate Metrics classifier was specifically devised to identify statements of bias (opinions) and influence (facts that reflects positively or negatively) on U.S. political parties from news program transcripts.

(The links to Wikipedia articles on "social science" and "text mining" are original to their page.)

The "dance of the p's and b's": truth or noise?

Stanley Fish asks ("Mind Your P’s and B’s: The Digital Humanities and Interpretation", NYT 1/23/2012):

[H]ow do the technologies wielded by digital humanities practitioners either facilitate the work of the humanities, as it has been traditionally understood, or bring about an entirely new conception of what work in the humanities can and should be?

After a couple of lengthy detours, he concludes that neither any facilitation nor any worthwhile new conception is likely: the digital humanities

… will have little place for the likes of me and for the kind of criticism I practice: a criticism that narrows meaning to the significances designed by an author, a criticism that generalizes from a text as small as half a line, a criticism that insists on the distinction between the true and the false, between what is relevant and what is noise, between what is serious and what is mere play.

In other words, he agrees with Noam Chomsky that statistical analysis of the natural (or textual) world is intellectually empty — though I suspect that they agree on little else.

#CompuPolitics

A couple of months ago, I pointed out that entertainment industry folks are tracking Justin Bieber's popularity using automated sentiment analysis, and I used that as a leaping-off point for some comments about language technology and social media. Here I am again, but suddenly it's not just Justin's bank account we're talking about, it's the future of the country.

As the Republican primary season marches along, a novel use of technology in politics is evolving even more rapidly, and arguably in a more interesting way, than the race itself: the analysis of social media to take the pulse of public opinion about candidates. In addition to simply tracking mentions of political candidates, people are starting to suggest that volume and sentiment analysis on tweets (and other social media, but Twitter is the poster child here) might produce useful information about people's viewpoints, or even predict the success of political campaigns. Indeed, it's been suggested that numbers derived from Twitter traffic might be better than polls, or at least better than pundits. (Is that much of a bar to set? Never mind.)

Sexual accommodation

You've probably noticed that how people talk depends on who they're talking with. And for 40 years or so, linguists and psychologists and sociologists have referred to this process as "speech accommodation" or "communication accommodation" (or, for short, just plain "accommodation"). This morning's Breakfast Experiment™ explores a version of the speech accommodation effect as applied to groups rather than individuals: some ways that men and women talk differently in same-sex vs. mixed-sex conversations.

Logic! Language! Information! Scholarships!

’Tis the season to announce seasonal schools. Geoff Pullum announced a short course on grammar for language technologists as part of a winter school in Tarragona next month, and Mark Liberman announced a call for course proposals for the LSA's Linguistic Institute in summer 2013. But what if you can't make it to Tarragona next month, and can't wait a year and a half to get your seasonal school fix? Well, I have just the school for you!

Linguistic Deception Detection: Part 1

In "Reputable linguistic 'lie detection'?", 12/5/2011, I promised to scrutinize some of the research on linguistic deception detection, focusing especially on the work cited in Anne Eisenberg's 12/3/2011 NYT article "Software that listens for lies". This post is a first installment, looking at the work of David Larcker and Anastasia Zakolyukina ("Detecting Deceptive Discussions in Corporate Conference Calls", Rock Center for Corporate Governance, Working Paper No. 83, July 2010).

[Update: as of 6/5/2019, the working paper version no longer exists, but a version under the same title was published in the Journal of Accounting Research in 2012.]
