Archive for Speech technology

Skin motion

Nia Wesley, "Girl's tattoo of late grandma's voicemail can be played with iPhone":

A Chicago singer honored her late grandmother in a unique way.

Sakyrah Morris held on to a voicemail her grandmother sent her just a month before she passed away. She got a tattoo of the voicemail's exact sound waves.

Through technology with a company called Skin Motion, Morris can hold her iPhone camera over the tattoo and hear her grandmother's voice at any moment.

Read the rest of this entry »

Comments (7)

News program presenter meets robot avatar

Yesterday BBC's Radio 4 program "Today", the cultural counterpart of NPR's "Morning Edition", invited into the studio a robot from the University of Sheffield, the Mishalbot, which had been trained to conduct interviews by exposure to the on-air speech of co-presenter Mishal Husain. They let it talk for three minutes with the real Mishal. (video clip here, at least for UK readers; may not be available in the US). Once again I was appalled at the credulity of journalists when confronted with AI. Despite all the evidence that the robot was just parroting Mishalesque phrases, Ms Husain continued with the absurd charade, pretending politely that her robotic alter ego was really conversing. Afterward there was half-serious on-air discussion of the possibility that some day the jobs of the Today program presenters and interviewers might be taken over by robots.

The main thing differentiating the Sheffield robot from Joseph Weizenbaum's ELIZA program of 1966 (apart from a babyish plastic face and movable fingers and eyes, which didn't work well on radio) was that the Mishalbot is voice-driven (with ELIZA you had to type on a terminal). So the main technological development has been in speech recognition engineering. On interaction, the Mishalbot seemed to me to be at sub-ELIZA level. "What do you mean? Can you give an example?" it said repeatedly, at various inappropriate points.

Read the rest of this entry »

Comments off

Voice recognition for inputting

When I'm with my sister Heidi, whether it be in Seattle or northeast Ohio or anywhere else in the world, she's often talking to Siri.  She asks Siri to look up information about trees, about food, about traditional medicines, about Yoga, about genealogy, and anything else she wants to investigate.  Above all, when we're driving around, she asks Siri for directions about how to get where we're going.

To me, who doesn't even own a cell phone, this is all quite miraculous.  A few days ago, at the conclusion of my "Language, Script, and Society in China" class, however, a new (for me) dimension of voice recognition was demonstrated by one of the students.

Read the rest of this entry »

Comments (21)

Awesome / sugoi すごい!

Comments (7)

Siri and flatulence

An acquaintance of mine has a new iPhone, which he carries in a pocket that is (relevantly) below waist level. He has discovered something that dramatically illustrates the difference between (i) responding to speech and (ii) responding to speech as humans do, on the basis of knowing that it is speech.

Read the rest of this entry »

Comments off

Another fake AI failure?

The "silly AI doing something stupidly funny" trope is a powerful one, partly because people like to see the mighty cast down, and partly because the "silly stupid AI" stereotype is often valid.

But as with stereotypes of human groups, the most viral examples are often fakes. Take the "Voice Recognition Elevator" skit from a few years ago, which showed an ASR system that was baffled by a Scottish accent, epitomizing the plight of Scots trapped in a dehumanized world that doesn't understand them. But back in the real world, I found that when I played the YouTube version of the skit to the 2010 version of Google Voice on my cell phone, it correctly transcribed the whole thing.

And I suspect that the recent viral "tuba-to-text conversion" meme is another artful fraud.

Read the rest of this entry »

Comments (14)

Advances in tuba-to-text conversion

My dad accidentally texted me with voice recognition…while playing the tuba

(h/t Chris Waigl)

[Update: Mark Liberman suggests this might be some artful fakery. See: "Another fake AI failure?"]

Comments (11)

Voice recognition for English and Mandarin typing

In All Tech Considered (8/24/16), Aarti Shahani has an article titled "Voice Recognition Software Finally Beats Humans At Typing, Study Finds".

Turns out voice recognition software has improved to the point where it is significantly faster and more accurate at producing text on a mobile device than we are at typing on its keyboard. That's according to a new study by Stanford University, the University of Washington and Baidu, the Chinese Internet giant. The study ran tests in English and Mandarin Chinese.

Baidu chief scientist Andrew Ng says this should not feel like defeat. "Humanity was never designed to communicate by using our fingers to poke at a tiny little keyboard on a mobile phone. Speech has always been a much more natural way for humans to communicate with each other," he says.

Read the rest of this entry »

Comments (18)

The swazzle: a simple device for voice modulation

Until two days ago, I had never heard of this word — even though I knew about Punch and Judy shows.

From Wikipedia:

A swazzle is a device made of two strips of metal bound around a cotton tape reed. The device is used to produce the distinctive harsh, rasping voice of Punch and is held in the mouth by the Professor (performer) in a Punch and Judy show.

Swazzle can also be pronounced or spelled Schwazzle or swatchel.

I like the fact that the performer is called "Professor"!

Read the rest of this entry »

Comments (17)

Swype and Voice Recognition for mobile device inputting

In late 2012, while visiting my son Tom in Dallas, I noticed that he was doing something very odd with his cell phone.  Most people enter text into their cell phone by pressing their thumbs (or their fingertip) on the letters of a small keyboard, whether virtual or actual.  But Tom was doing something altogether different:  he was sliding his finger over the glass surface of his phone and somehow, by so doing, he was able to enter text.  I was dumbfounded!  What amazed me most of all was how casual he was about it.  He'd be talking to me about something, then glance down at his cell phone, move his fingertip around on the glass, and — presto digito! — he'd have typed a message to someone and sent it off.

Read the rest of this entry »

Comments (42)

Love <–> hate

Baidu ("the Chinese Google") is a popular search engine in China.  The web services company (registered in the Cayman Islands) and its name are discussed in "Soon to be lost in translation," which I posted a little over a year ago.

Now Baidu has launched a new machine translation service.  A friend of mine in China impishly suggested that I give Baidu Fanyi a whirl by typing in 我恨中国.  Language Log readers are invited to try it themselves and see what they get.

Read the rest of this entry »

Comments (21)

A synthetic singing president?

A couple of days ago, Gary Marcus told me about the Beatles Complete on Ukulele project, and introduced me to its creator, David Barratt.

Gary got involved because he's working on a book about "learning to become musical at the age of 40", and so he's joining a roster of performers that includes the Fort Greene Childrens Choir (Age 7 and Under Section), Samantha Fox, and many others (82 so far), recording voice-and-ukulele versions of all 185 songs in the Beatles catalog. Gary is of course singing With a Little Help from My Friends (because, he explains, "otherwise I couldn't carry a tune in a bucket"), and his contribution is scheduled to be released on July 19, 2011.

So how does Language Log come into this? Well, David wants to recruit Barack Obama to sing Let it Be, and Gary thought that I could help. In turn, I believe that YOU can help.

Read the rest of this entry »

Comments (11)

Death or birth?

The most recent IEEE Signal Processing Society Newsletter has an interesting article by David Suendermann, "Speech scientists are dead. Interaction designers are dead. Who is next?".

His argument is that "Commercial spoken dialog systems can process millions of calls per week", and therefore "one can implement a variety of changes at different points in the application and randomly choose one competitor every time the point is hit in the course of a call", using techniques like reinforcement learning to adaptively optimize the design. As a result, "the contender approach can change the life of interaction designers and speech scientists in that best practices and experience-based decisions can be replaced by straight-forward implementation of every alternative one can think of".
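To make the "contender" idea concrete, here is a minimal sketch of one decision point that picks among competing designs on each call and gradually favors whichever works best. It is not Suendermann's implementation: the class and function names are hypothetical, the reward is assumed to be a simple per-call success flag, and epsilon-greedy selection stands in for whatever reinforcement learning method a production system would use.

```python
import random


class ContenderPoint:
    """One decision point in a dialog application (hypothetical name).

    Each contender is an alternative design, e.g. a different prompt wording.
    """

    def __init__(self, contenders, epsilon=0.1, rng=None):
        self.contenders = list(contenders)
        self.epsilon = epsilon              # fraction of calls used for exploration
        self.rng = rng or random.Random()
        self.calls = {c: 0 for c in self.contenders}
        self.successes = {c: 0 for c in self.contenders}

    def success_rate(self, contender):
        """Observed success rate so far; 0.0 if the contender is untried."""
        n = self.calls[contender]
        return self.successes[contender] / n if n else 0.0

    def choose(self):
        """Pick a contender for the current call (epsilon-greedy)."""
        if self.rng.random() < self.epsilon:
            # Explore: choose a contender uniformly at random.
            return self.rng.choice(self.contenders)
        # Exploit: choose the contender with the best observed rate.
        return max(self.contenders, key=self.success_rate)

    def record(self, contender, success):
        """Store the outcome of a call that used this contender."""
        self.calls[contender] += 1
        if success:
            self.successes[contender] += 1


def simulate(true_rates, n_calls=5000, epsilon=0.1, seed=0):
    """Simulate calls against made-up success rates (assumed, not real data)."""
    rng = random.Random(seed)
    point = ContenderPoint(true_rates, epsilon=epsilon, rng=rng)
    for _ in range(n_calls):
        contender = point.choose()
        point.record(contender, rng.random() < true_rates[contender])
    return point


if __name__ == "__main__":
    # The rates below are invented purely for illustration.
    point = simulate({"prompt_a": 0.3, "prompt_b": 0.8})
    for name in point.contenders:
        print(name, point.calls[name], round(point.success_rate(name), 3))
```

With millions of calls per week, even the small exploration fraction gives every alternative plenty of traffic, which is the point of the quoted argument: the data, rather than a designer's prior experience, ends up selecting the winning variant.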

Read the rest of this entry »

Comments (17)