Language Log

Archive for Computational linguistics

Textual narcissism

July 13, 2012 @ 8:10 am· Filed by Mark Liberman under Computational linguistics, Language and culture

Tyler Cowen, "I wonder if this is actually true", Marginal Revolution 7/12/2012.

Researchers who have scanned books published over the past 50 years report an increasing use of words and phrases that reflect an ethos of self-absorption and self-satisfaction.

“Language in American books has become increasingly focused on the self and uniqueness in the decades since 1960,” a research team led by San Diego State University psychologist Jean Twenge writes in the online journal PLoS One. “We believe these data provide further evidence that American culture has become increasingly focused on individualistic concerns.”

Their results are consistent with those of a 2011 study which found that lyrics of best-selling pop songs have grown increasingly narcissistic since 1980. Twenge’s study encompasses a longer period of time—1960 through 2008—and a much larger set of data.

That 2011 study was not very convincing — for details, see "Lyrical Narcissism?", 4/9/2011; "'Vampirical' hypotheses", 4/28/2011; "Pop-culture narcissism again", 4/30/2011; "Let me count the ways", 6/9/2011.

On the face of it, however, the new study (Jean M. Twenge, W. Keith Campbell, and Brittany Gentile, "Increases in Individualistic Words and Phrases in American Books, 1960–2008", PLoS One 7/10/2012) looks more plausible. But I thought that for this morning's Breakfast Experiment™ I'd take a closer look. And what I found diverges pretty seriously from the conclusions of the cited paper.

Read the rest of this entry »

Not raising hogs

June 19, 2012 @ 5:55 pm· Filed by David Beaver under Computational linguistics, Humor, Semantics

Following on from Barbara Partee's example of Khrushchev not banging his shoe, I just came across a great example of chained hypothetical negative events. It was during Bonnie Webber's plenary address here in Austin yesterday, at the NASSLLI Summer School. (BTW, if you'll be in the Austin area on Saturday, I have an announcement for you: NASSLLI is hosting a big event commemorating the centenary of Turing's birth, and it's free and open to the public.) But without more ado, here's the "Not raising hogs" text, a good Texas story of how to get something from nothing:

THE NOT RAISING HOGS BUSINESS

To: Mr. Clayton Yeutter
Secretary of Agriculture
Washington, D.C.

Dear Sir,
My friends, Wayne and Janelle, over at Wichita Falls, Texas, received a check the other day for $1,000 from the government for not raising hogs. So, I want to go into the "not raising hogs" business myself next year.

Read the rest of this entry »

Your typical sentence

June 13, 2012 @ 11:03 am· Filed by Mark Liberman under Computational linguistics

Today's xkcd:

Mouseover title: Although the Markov chain-style text model is still rudimentary; it recently gave me "Massachusetts Institute of America". Although I have to admit it sounds prestigious.

Read the rest of this entry »

Big Inaccessible Data

June 4, 2012 @ 12:25 pm· Filed by Mark Liberman under Computational linguistics, Language and the law

John Markoff, "Troves of Personal Data, Forbidden to Researchers", NYT 5/21/2012:

When scientists publish their research, they also make the underlying data available so the results can be verified by other scientists.

(I wish this were generally true…)

At least that is how the system is supposed to work. But lately social scientists have come up against an exception that is, true to its name, huge.

It is “big data,” the vast sets of information gathered by researchers at companies like Facebook, Google and Microsoft from patterns of cellphone calls, text messages and Internet clicks by millions of users around the world. Companies often refuse to make such information public, sometimes for competitive reasons and sometimes to protect customers’ privacy. But to many scientists, the practice is an invitation to bad science, secrecy and even potential fraud.

For those who don't care much about science, and oppose data publication on the basis of some combination of beliefs in corporate secrecy, personal privacy, and researchers' "sweat equity", here's a stronger argument: lack of broad access to representative data is also a recipe for bad engineering. Or rather, it's a recipe for slow to non-existent development of workable solutions to the technical problems of turning recorded data into useful information.

At the recent DataEDGE workshop in Berkeley, as well as at the recent LREC 2012 conference in Istanbul, I was unpleasantly surprised by the widespread lack of awareness of this (in my opinion evident) fact.

Read the rest of this entry »

Big Data in the humanities and social sciences

May 31, 2012 @ 10:31 am· Filed by Mark Liberman under Changing times, Computational linguistics

I'm in Berkeley for the DataEDGE Conference, where I'm due to participate in a "living room chat" advertised as follows:

Size Matters: Big Data, New Vistas in the Humanities and Social Sciences
Mark Liberman, Geoffrey Nunberg, Matthew Salganik
Vast archives of digital text, speech, and video, along with new analysis technology and inexpensive computation, are the modern equivalent of the 17th-century invention of the telescope and microscope. We can now observe social and linguistic patterns in space, time, and cultural context, on a scale many orders of magnitude greater than in the recent past, and in much greater detail than before. This transforms not just the study of speech, language, and communication but fields ranging from sociology and empirical economics to education, history, and medicine — with major implications for both scholarship and technology development.

Read the rest of this entry »

Help Wanted: Sharing Data for Research on Reading and Writing

May 18, 2012 @ 8:11 am· Filed by Mark Liberman under Computational linguistics, Language and education, Psychology of language

On Friday, July 20, at the 2012 meeting of the Council of Writing Program Administrators in Albuquerque NM, there will be a session called "Help Wanted: Sharing Data for Research on Reading and Writing". Here's the proposal that was submitted for this session:

Read the rest of this entry »

Blizzard Challenge 2012

May 16, 2012 @ 8:33 pm· Filed by Mark Liberman under Computational linguistics

Every year since 2005, speech synthesis researchers have organized a Blizzard Challenge, "[i]n order to better understand and compare research techniques in building corpus-based speech synthesizers". Part of the research effort involves the general public, who are invited to perform a series of evaluations of the results.

Participation takes about one hour in total — but your participation is registered, so that you can leave at any point, and then return and take the evaluation up again at the point where you left off. If you're willing, please follow this link to enroll and participate.

Read the rest of this entry »

Names in the Frequency Domain

May 5, 2012 @ 12:53 pm· Filed by Mark Liberman under Computational linguistics

Yesterday evening at dinner, some members of the LSA Publications Committee were idly discussing the changes over time in fashions for given names. It's obvious that things change — but it's less obvious whether these changes are cyclic. It makes sense that out-of-fashion names might come back after a generation or two — but does this really happen on a regular basis?
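One way to ask whether a name's popularity is cyclic is to look at its yearly frequency series in the frequency domain. The following is a minimal sketch of a periodogram, not necessarily the analysis used in the full post. The function name `periodogram` and the series `synthetic` are illustrative assumptions, and the data is an invented sine wave with a 40-year cycle, not real naming data.

```python
import cmath
import math


def periodogram(series):
    """Return (period_in_samples, power) pairs for a mean-removed series.

    Uses a plain discrete Fourier transform, which is slow (O(n^2))
    but fine for a century or so of yearly counts.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        result.append((n / k, abs(coeff) ** 2 / n))
    return result


# Synthetic stand-in for a name's yearly popularity (assumption, not real data):
# 120 "years" containing exactly three 40-year cycles.
synthetic = [10 + 5 * math.sin(2 * math.pi * t / 40) for t in range(120)]

# The period with the most power is the strongest candidate cycle.
peak_period, peak_power = max(periodogram(synthetic), key=lambda pair: pair[1])
print(peak_period)
```

A sharp peak at some period would suggest regular cycling; a spectrum with no dominant peak would suggest that names come and go without a fixed rhythm.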

Read the rest of this entry »

Hyperbolic lots

May 2, 2012 @ 12:34 am· Filed by Ben Zimmer under Computational linguistics, Language and technology

For the past couple of years, Google has provided automatic captioning for all YouTube videos, using a speech-recognition system similar to the one that creates transcriptions for Google Voice messages. It's certainly a boon to the deaf and hearing-impaired. But as with Google's other ventures in natural language processing (notably Google Translate), this is imperfect technology that is gradually becoming less imperfect over time. In the meantime, however, the imperfections can be quite entertaining.

Read the rest of this entry »

The quality of quantity

April 24, 2012 @ 1:45 pm· Filed by Mark Liberman under Computational linguistics

The longer it is, the higher the rating:

Read the rest of this entry »

Watson v. Watson

April 21, 2012 @ 8:32 am· Filed by Mark Liberman under Computational linguistics

As Wikipedia explains,

Watson is an artificial intelligence computer system capable of answering questions posed in natural language, developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci. Watson was named after IBM's first president, Thomas J. Watson.

But as a page at AT&T Labs Research tells us,

AT&T WATSON℠ is AT&T's speech and language engine that integrates a variety of speech technologies, including network-based, speaker-independent automatic speech recognition (ASR), AT&T Labs Natural Voices® text-to-speech conversion, natural language understanding (which includes machine learning), and dialog management tasks.

WATSON has been used within AT&T for IVR customers, including AT&T's VoiceTone® service, for over 20 years during which time the algorithms, tools, and plug-in architecture have been refined to increase accuracy, convenience, and integration. Besides customer care IVR, AT&T WATSON℠ has been used for speech analytics, mobile voice search of multimedia data, video search, voice remote, voice mail to text, web search, and SMS.

Read the rest of this entry »

Pulling out (the words whose distribution is most similar to that of) a plum

April 17, 2012 @ 5:04 am· Filed by Mark Liberman under Computational linguistics

A few days ago ("Evaluative words for wines", 4/7/2012), I illustrated how a trivial method can help us uncover the contribution of individual words to the expression of opinion in text. For this morning's Breakfast Experiment™, I'll illustrate an equally trivial approach to learning how words fit together structurally, using the same small collection of 20,888 wine reviews.
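As a rough illustration of what "words whose distribution is most similar" can mean, here is a minimal sketch that represents each word by counts of its immediate neighbors and compares words by cosine similarity. This is one common simple approach and is not necessarily the method used in the full post. The function names and the toy sentences are invented for illustration and are not drawn from the wine-review collection.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Map each word to counts of (relative position, neighbor) pairs."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][(j - i, tokens[j])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(target, vectors, n=3):
    """Rank other words by similarity of their contexts to the target's."""
    target_vec = vectors.get(target, Counter())
    scored = [
        (cosine(target_vec, vec), word)
        for word, vec in vectors.items()
        if word != target
    ]
    return sorted(scored, reverse=True)[:n]


# Invented toy sentences (assumption, not real review text).
toy_sentences = [
    "aromas of plum and spice",
    "aromas of cherry and spice",
    "flavors of plum and oak",
    "a long smooth finish",
]
neighbors = most_similar("plum", context_vectors(toy_sentences))
print(neighbors)
```

In this toy data "cherry" occurs in the same slots as "plum" (after "of", before "and"), so it comes out as the nearest neighbor even though the two words never occur together.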

Read the rest of this entry »

Evaluative words for wines

April 7, 2012 @ 7:48 am· Filed by Mark Liberman under Computational linguistics, Language and culture

There are two basic reasons for the increased interest in "text analytics" and "sentiment analysis": In the first place, there's more and more data available to analyze; and second, the basic techniques are pretty easy.

This is not to deny the benefits of sophisticated statistical and text-processing methods. But algorithmic sophistication adds value to simple-minded baselines that are often pretty good to start with. In particular, simple "bag of words" techniques can be surprisingly effective. I'll illustrate this point with a simple Breakfast Experiment™.
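To make the "bag of words" idea concrete, here is a minimal sketch that ignores word order entirely and scores each word by how much more often it shows up in high-rated texts than in low-rated ones. It is an illustration under stated assumptions and not the experiment from the full post: the function `word_scores`, the rating threshold, the add-one smoothing, and the four toy reviews are all invented for the example.

```python
import math
from collections import Counter


def word_scores(reviews, threshold, smoothing=1.0):
    """Score words by smoothed log-odds of high-rated vs. low-rated use.

    `reviews` is a list of (text, rating) pairs. Positive scores mean the
    word is relatively more frequent in reviews rated >= threshold.
    """
    high, low = Counter(), Counter()
    for text, rating in reviews:
        target = high if rating >= threshold else low
        target.update(text.lower().split())
    vocab = set(high) | set(low)
    n_high = sum(high.values()) + smoothing * len(vocab)
    n_low = sum(low.values()) + smoothing * len(vocab)
    return {
        word: math.log((high[word] + smoothing) / n_high)
        - math.log((low[word] + smoothing) / n_low)
        for word in vocab
    }


# Invented toy reviews with made-up ratings (assumption, not real data).
toy_reviews = [
    ("elegant complex finish", 94),
    ("elegant and balanced", 92),
    ("thin and simple", 82),
    ("simple pleasant quaffer", 84),
]
scores = word_scores(toy_reviews, threshold=90)
for word, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{word:10s} {score:+.2f}")
```

Words used equally on both sides, like "and" here, score near zero, which is the sense in which even a crude baseline separates evaluative vocabulary from filler.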

Read the rest of this entry »
