Archive for Computational linguistics
Sanette Tanaka, "Fancy Real-Estate Listing, Fancier Verbiage", WSJ 6/6/2013:
Savvy real-estate agents know it's not just what you say. It's how long it takes you to say it.
More-expensive homes go hand-in-hand with longer real-estate agents' remarks—the language written by the agent that supplements the house description and photos in a listing. Agents use a median 250 characters for homes listed under $100,000, according to an analysis for The Wall Street Journal by real-estate listings company Zillow. For homes priced over $1 million, they go nearly twice as long, with a median 487 characters. (That's about the length of this paragraph.)
"Generally, what you find is that regardless of the region, the more expensive the home is, the more characters are used to describe that home," says Stan Humphries, chief economist at Zillow.
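The comparison in the quoted passage is a median of remark lengths within each price band. A minimal sketch of that calculation follows, assuming listings arrive as (price, remarks) pairs. The function names and the band labels are hypothetical, and the sample listings are made up for illustration; they are not Zillow's data or method.

```python
from statistics import median

# Made-up sample listings: (asking price in dollars, agent remarks).
# These are illustrative only and are NOT taken from the Zillow analysis.
SAMPLE_LISTINGS = [
    (85_000, "Cozy starter home."),
    (95_000, "Needs work. Sold as is."),
    (1_500_000, "Stunning estate with sweeping views and a chef's kitchen."),
]

def price_band(price):
    """Assign a listing to one of the two bands named in the article, or neither."""
    if price < 100_000:
        return "under $100,000"
    if price > 1_000_000:
        return "over $1 million"
    return "in between"

def median_remark_length(listings):
    """Return the median character count of agent remarks for each price band."""
    lengths_by_band = {}
    for price, remarks in listings:
        lengths_by_band.setdefault(price_band(price), []).append(len(remarks))
    return {band: median(lengths) for band, lengths in lengths_by_band.items()}

if __name__ == "__main__":
    for band, length in median_remark_length(SAMPLE_LISTINGS).items():
        print(band, length)
```

A median rather than a mean keeps one unusually long or short listing from dominating a band's figure.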
That's not from the chorus of a postmodern country song — it's the title of a National Geographic piece discussing Morgan R. Frank, Kameron Decker Harris, Peter Sheridan Dodds, and Christopher M. Danforth, "The Geography of Happiness: Connecting Twitter Sentiment and Expression, Demographics, and Objective Characteristics of Place", PLoS ONE 5/29/2013.
David Brooks has found a congenial story in Google ngrams, or rather in three papers about ngrammatical history, which he interprets to show that virtue, discipline, and concern for the common good have been declining, while subjectivity and concern for self-esteem have increased ("What Our Words Tell Us", NYT 5/20/2013).
Brooks doesn't cite or link to the papers, which in my opinion is a form of journalistic malpractice, so here they are:
Jean M. Twenge, W. Keith Campbell, and Brittany Gentile, "Increases in Individualistic Words and Phrases in American Books, 1960–2008", PLoS ONE 7/10/2012
Pelin Kesebir and Selin Kesebir, "The Cultural Salience of Moral Character and Virtue Declined in Twentieth Century America", Journal of Positive Psychology, Forthcoming
Daniel B. Klein, "Ngrams of the Great Transformations", GMU Working Paper in Economics, 2013
From Simon King:
I am pleased to announce that the English section of this year's Blizzard Challenge listening test is now live. Please help us out by taking part, and encouraging your colleagues, students, friends, contacts, etc. to take part too. It's your chance to hear a range of speech synthesisers, including some really good ones. Please circulate this message widely – for example, on mailing lists, forums and using social media – we need to reach as many people as possible in the coming month or so.
Yesterday afternoon, I got this interesting email message:
The departure time for US Airways flight # 3314, from Detroit to Philadelphia on May 11 at 6:05 PM has changed. The flight is delayed due to air traffic at the destination airport. Your estimated time of Departure is 6:05 PM.
Once you've written down your responses to the dozen audio clips in yesterday's perception experiment, you can check them against the truth, and also against the transcripts generated by Google's automatic captioning system, both given below.
Here are a dozen short audio clips from a lecture, stripped from YouTube, and re-encoded after editing as mp3 files. Despite being handicapped by this marginal sound quality, and even more by the lack of context, you will probably be able to transcribe them fairly well. Please do so, and retain your results for discussion tomorrow morning (where "tomorrow" = Wednesday 5/8/2013).
"Once Under Wraps, Supreme Court Audio Trove Now Online", NPR All Things Considered 4/24/2013:
The court has been releasing audio during the same week as arguments only since 2010. Before that, audio from one term generally wasn't available until the beginning of the next term. But the court has been recording its arguments for nearly 60 years, at first only for the use of the justices and their law clerks, and eventually also for researchers at the National Archives, who could hear — but couldn't duplicate — the tapes. As a result, until the 1990s, few in the public had ever heard recordings of the justices at work.
But as of just a few weeks ago, all of the archived historical audio — which dates back to 1955 — has been digitized, and almost all of those cases can now be heard and explored at an online archive called the Oyez Project.
We've often had occasion to wonder how spammy blog comments are linguistically constructed. (See, most recently, Mark Liberman's post, "Numerous upon the written content material," in which he refers to spam comments as "aleatoric sub-poetry.") Now, on Quartz, David Yanofsky and Zachary M. Seward expose how spam comments are engineered:
Comment spam follows a formula, which was made plain the other day when a spambot accidentally posted its entire template on the blog of programmer Scott Hanselman. With his permission, we’ve reproduced some of the spam comment recipes here and added colorful formatting to make it readable. The spambot constructs new, vaguely unique comments by selecting from each set of options. We hope you find it wonderful | terrific | brilliant | amazing | great | excellent | fantastic | outstanding | superb.
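The mechanism described in the quote, picking one item from each set of options, can be sketched in a few lines. This is a minimal illustration, not the spambot's actual code: the `{a|b|c}` brace syntax, the function names, and the template sentence are assumptions, and only the list of adjectives comes from the passage above.

```python
import random
import re

# Assumed template syntax: each option set is written as {a|b|c}.
# The adjective list is the one quoted above; the rest is illustrative.
TEMPLATE = (
    "We hope you find it "
    "{wonderful|terrific|brilliant|amazing|great|excellent|fantastic|outstanding|superb}."
)

# Matches one brace-delimited option set (no nesting handled in this sketch).
OPTION_SET = re.compile(r"\{([^{}]*)\}")

def spin(template, rng=random):
    """Replace every {a|b|c} group with one randomly chosen option."""
    def pick(match):
        return rng.choice(match.group(1).split("|"))
    return OPTION_SET.sub(pick, template)

def count_variants(template):
    """Count the distinct comments a template can produce."""
    total = 1
    for group in OPTION_SET.findall(template):
        total *= len(group.split("|"))
    return total

if __name__ == "__main__":
    print(spin(TEMPLATE))
    print(count_variants(TEMPLATE))
```

Because the counts for each option set multiply, a template with only a handful of sets already yields a large number of "vaguely unique" comments.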
Another fragment of aleatoric sub-poetry, from the 5,036,601 spam comments that Akismet has caught since we installed it:
I image this might be numerous upon the written content material? nevertheless I nonetheless believe that it may be suitable for just about any type of topic material, because it could frequently be pleasant to resolve a warm and delightful face or possibly listen a voice whilst initial landing.
George Orwell, in his hugely overrated essay "Politics and the English Language", famously insists you should "Never use a metaphor, simile, or other figure of speech which you are used to seeing in print." He thinks modern writing "consists in gumming together long strips of words which have already been set in order by someone else" (only he doesn't mean "long"), joining together "ready-made phrases" instead of thinking out what to say. His hope is that one can occasionally, "if one jeers loudly enough, send some worn-out and useless phrase … into the dustbin, where it belongs." That is, one can eliminate some popular phrase from the language by mocking it out of existence. In effect, he wants us to collaborate in getting rid of the most widely-used phrases in the language. In a Lingua Franca post published today I called his program elimination of the fittest (tongue in cheek, of course: the proposal is actually just to depopularize the most popular).
For a while, after I began thinking about this, I wondered what would be the ultimate fate of a language in which this policy was consistently and iteratively implemented. I even spoke to a distinguished theoretical computer scientist about how one might represent the problem mathematically. But eventually I realized it was really quite simple; at least in a simplified ideal case, I knew what would happen, and I could do the proof myself.
The village of Akazu’yw lies in the rainforest, a day’s drive from the state capital of Belém, deep in the Brazilian Amazon. Last week I traveled there, carrying a dozen Android phones with a specialized app for recording speech. It wasn't all plain sailing…
Read the full story here.
Steven Bird, "Cyberlinguistics: recording the world's vanishing voices", 3/11/2013:
Of the 7,000 languages spoken on the planet, Tembé is at the small end with just 150 speakers left. In a few days, I will head into the Brazilian Amazon to record Tembé – via specially-designed technology – for posterity. Welcome to the world of cyberlinguistics.
Our new Android app Aikuma is still in the prototype stage. But it will dramatically speed up the process of collecting and preserving oral literature from endangered languages, if last year’s field trip to Papua New Guinea is anything to go by.
Read the whole thing.
Alex Williams, "Creating Hipsturbia", NYT 2/15/2013:
“When we checked towns out,” Ms. Miziolek recalled, “I saw some moms out in Hastings with their kids with tattoos. A little glimmer of Williamsburg!”