Language Log

Sirimania

October 14, 2011 @ 5:50 am · Filed by Mark Liberman under Changing times, Computational linguistics, Linguistics in the comics

Yesterday's Doonesbury joins the parade of praise for Siri:

The New York Times has been a non-stop Siri PR factory for the past week or so: Jenna Wortham, "Will Siri Bring Back the iPhone's Wow Factor?", 10/5/2011; Steve Lohr, "Siri and Apple's Future", 10/6/2011; David Pogue, "New iPhone Conceals Sheer Magic", 10/12/2011; Nick Bilton, "The iPhone 4S Review Roundup", 10/12/2011; Sam Grobart, "What Should We Ask Siri?", 10/13/2011; Sam Grobart, "Siri, Can You Hear Me?", 10/12/2011; and so on.

Not that other media voices have been negative. Agence France Presse gushes that "Siri gives iPhone 4S sass":

The robotic assistant built into Apple's latest iPhone might win your heart, but she won't marry you.

Siri will let you down gently though, explaining in a synthetic female voice that such a union would violate the iPhone 4S end user licensing agreement.

A website devoted to offbeat exchanges with the "intelligent personal assistant" had thousands of followers and was overwhelmed with submissions on the eve of the Friday arrival of the iPhone 4S.

Telling Siri "I want to hide a body" triggered suggestions including reservoirs, swamps, and trash dumps.

Admitting to being drunk met with a list of local taxi companies, while feeling randy resulted in Siri displaying escort services.

As a long-time advocate of speech and language technology, I'm really happy to see all this enthusiasm, although I expect that some of it is as much of an exaggeration in the positive direction as this (for example) was in the negative direction. Speech recognition technology has been pretty good for quite a while now; integrating it with a smartphone's contextual knowledge of its user gives the language model a big boost; and then there's the Eliza effect.

As Jenna Wortham wrote,

Voice recognition technology certainly isn’t new, and neither are virtual personal assistants. Siri was available as a stand-alone app before Apple acquired it last year. But the full integration into a phone’s operating system, where the software can start to learn about the daily habits of a user, could help recreate the early wonder of playing with an iPhone. It could even have a trickle-down impact on the entire phone ecosystem — much as the App Store did.

The most interesting commentary so far, in my opinion, is Casey Neistat's little movie, "iPhone's Siri vs. My Human Assistant":

Way back in 1981, Steve Levinson and I wrote an article for Scientific American about speech recognition and speech-mediated human-computer interaction. We noted that it would take a lot of improvement in several technologies before it would be possible for people to interact by voice with a machine as easily, conveniently and successfully as with a human conversational partner. But we also noted that the social implications of even partial success would be profound, citing Norbert Wiener's discussion of automation in his 1950 book The Human Use of Human Beings:

It is the thesis of this book that society can only be understood through a study of the messages and the communication facilities which belong to it; and that in the future development of these messages and communication facilities, messages between man and machines, between machines and man, and between machine and machine, are destined to play an ever-increasing part.

Wiener's imagination was very limited in some ways. He wrote as if vacuum tubes and punched cards were essential elements in the cybernetic transformation of society, so that his vision of the automation of textual communication imagines that "a large part of the outside correspondence [of a business] may be received from the correspondents on punched cards". But these implementation details aside, I think that he was right about the central role of "messages and communication facilities".

What social impacts will the popularization of speech and language technology have? When people look back from 60 years in our future, smartphone assistants will look as quaint to them as vacuum tubes and punched cards do to us.

10 Comments

Jonathan Badger said,

October 14, 2011 @ 8:44 am

The Doonesbury praise of Siri is amusing in that nearly twenty years ago Trudeau wasn't nearly so kind to Apple's language-recognition technology of the time — probably the first sign that the Apple Newton was going to be a failure was Trudeau's 1993 cartoon where he showed the surreal phrases, such as "Egg Freckles", that the handwriting recognition generated from more meaningful input.

GeorgeW said,

October 14, 2011 @ 9:45 am

So, is Siri a technological breakthrough of any kind? If so, in what way?

[(myl) I haven't had a chance to try the software yet, and I haven't seen any systematic evaluations, so I can't really tell. My tentative conclusion is that it's a marketing breakthrough, not a technological breakthrough; but that doesn't diminish its potential importance.]

Rob said,

October 14, 2011 @ 12:08 pm

Gene Weingarten addresses speech recognition issues, and Google Voice in particular, in his most recent column. http://www.washingtonpost.com/lifestyle/magazine/gene-weingarten-playing-telephone/2011/09/19/gIQAOYL9PL_story.html

[(myl) This is quite strongly at variance with my own Google Voice experience. I don't use it for voice mail, but I do use it extensively for sending a "note to self", which emails the speech-to-text transcription with the recorded audio as an attachment — and I've never had to open the attachment. I also sometimes use it in a Siri-like way for "Navigate to X" or "Call Y"; these commands generally also work. Details of the performance lead me to believe that all of these capabilities are making use of the contents of my Google Voice contacts list, my gmail account, my current GPS location, etc. (It also worked on the Scots voice-controlled elevator skit…)

But one of the lessons learned long ago by speech recognition researchers is that anecdotes, like advertisements and demos, are almost completely meaningless. If you actually want to know how well a system works, especially in comparison to other systems, you need to design a testing protocol, gather an appropriate sample of testing material, and run the test to get some performance numbers. Everything else is just blather against blather.]
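One common way to get such "performance numbers" is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of reference words. What follows is only a minimal sketch, assuming a test set of (reference, hypothesis) transcript pairs. The function names and the three sample pairs are invented for illustration and are not output from Siri, Google Voice, or any other real system.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref_words: List[str], hyp_words: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word errors divided by total reference words over a test set."""
    total_errors = 0
    total_ref_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        total_errors += word_edit_distance(ref_words, hyp_words)
        total_ref_words += len(ref_words)
    if total_ref_words == 0:
        raise ValueError("the test set contains no reference words")
    return total_errors / total_ref_words


if __name__ == "__main__":
    # Hypothetical test material: (what was said, what the system returned).
    sample_pairs = [
        ("call mike at home", "call mike at home"),
        ("navigate to the airport", "navigate to airport"),
        ("note to self buy milk", "note to shelf buy silk"),
    ]
    print(f"WER: {corpus_wer(sample_pairs):.3f}")
```

A number like this only means something when the test sample is large enough and representative of real use, and when competing systems are scored on the same material.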

Theodore said,

October 14, 2011 @ 2:57 pm

Off topic, but relevant: How do you pronounce "Siri"? The Wikipedia entry has /ˈsɪəri/ which seems erroneous.

zafrom said,

October 14, 2011 @ 6:47 pm

@Theodore, 2:57 pm
Various YouTube videos have the pronunciation as SEAR-ee, perhaps to reassure us that we aren't the ones being burned.

One short video is at
http://www.youtube.com/watch?v=rNsrl86inpo
which shows us, after the introductory segment of the otherwise-oblivious jogger, what the San Francisco Bay Area looked like at 9:41 AM. It's an area of 6-day weeks (never enough days in a week), and Siri evidently is for all education levels. One segment shows a woman baker in a utensil-filled kitchen asking, "How many cups in 12 ounces?" (Just how useful is a measuring cup nowadays?) Siri responds with a screen full of information, starting with "1.5 cups" and continuing on with conversions for pints, quarts, gallons, deciliters, and milliliters. Aren't you glad you asked.

YM said,

October 15, 2011 @ 10:08 am

I haven't tried Siri, but is it as entertaining as the Subservient chicken?

Keith M Ellis said,

October 15, 2011 @ 2:53 pm

Q: What's your favorite color?

Siri: My favorite color is… well, I don't know how to say it in your language. It's sort of greenish, but with more dimensions.

Dan said,

October 18, 2011 @ 12:18 am

I haven't tried Siri, but I can't help thinking of it as a kind of linguistic uncanny valley. I'm more-or-less used to command-based voice recognition "Say a command…" "Call Mike," but I have more problems trying to give natural language inputs and get back less-than-natural responses.

Is there linguistic precedence for my half-baked theory?

[(myl) I'm afraid I can't do that, Dave.]

Janice Byer said,

October 21, 2011 @ 2:29 pm

In today's Guardian, a braver woman than I speculates on her blog as to why Siri transatlantically transgenders:

http://www.guardian.co.uk/lifeandstyle/the-womens-blog-with-jane-martinson/2011/oct/21/siri-apple-prejudice-behind-digital-voices

Geophysicist Discovers Modeling Error (in Economics) « Statistical Modeling, Causal Inference, and Social Science said,

October 27, 2011 @ 4:27 pm

[…] articles. I was envious that my boss at Bell Labs, Steve Levinson (along with Mark Liberman, of The Language Log), had written a Scientific American article on speech recognition. The article didn't assume […]
