Siri can you hear me?
Wired.com has some perfect linguaphile clickbait: “Watch People With Accents Confuse the Hell Out of AI Assistants.” By “accents” they mean non-American ones (e.g., Irish English). The AI assistants were Siri, Amazon Echo, and Google Home. I’m curious about how well the voice recognition systems in these devices work with varieties of spoken English, so I clicked. Sucker! Can’t tell anything from the video except that it’s fun to say “Add Worcestershire sauce to my shopping list” to a machine. This definitely beats asking Siri “What is the meaning of life?”
Mainly I was impressed by how poorly I understood the speakers. I have a bad time understanding other people’s accents but that’s only one data point. How well do people understand speech that is in the same language as their own but spoken with a different accent?
Here Siri responds correctly: “I’m sorry, I can’t answer that.”
The research literature on listeners’ processing of accent is huge, but surprisingly little of it focuses on the comprehension of naturalistic speech. Most studies have examined listeners’ responses to individual elements of accented speech: alternative pronunciations of vowels, consonants, or words; phonemic substitutions (what a non-native speaker says in place of a phoneme that doesn’t occur in their language); cross-linguistic differences in phoneme boundaries (e.g., when we were on sabbatical in France our then-8-year-old son heard the name of the playground game tag as douche instead of touche); atypical syllabic stress; the other stuff that “sounds different.” The accented speech might be produced by someone from a different region of the US or another country, or by a non-native speaker, under noisy or clear conditions. (Studies are also conducted in other countries and languages, of course.) How well listeners can adapt to such features and learn to produce them are major topics.
But my question is more like the one in the Wired video: how well do people comprehend meaningful sentences, or better, extended discourse, spoken with an accent that differs from their own?
There’s fun to be had on this site, which has recordings of a single passage spoken in many English accents. This is not a research-quality archive, and whether the speaker is representative of the designated area is unclear. The speaker’s age, race/ethnicity, gender, and education seem to affect intelligibility, as does quality of the recording. Plus, all bets are off once you’ve listened to the passage a few times and can top-down the hard parts. This accent’s hard for me, though maybe I’d adapt to it with sufficient exposure.
There’s an informal comprehension exercise here, which suggests that accent might be an issue, sometimes. But consider just the narrower range of North American variants (other good examples here). Are any of them sufficiently different from each other to affect comprehension?
We (Lynn Perry, Emily Mech, Maryellen MacDonald and I) did one modest study that doesn’t settle anything but raises some interesting questions. The subjects (college students from the Wisconsin area) listened to passages that had been recorded by two speakers. One spoke with a Midwestern accent similar to the subjects’ own speech. The other, a native of southeast Georgia, spoke with a markedly different regional accent. For half the passages subjects performed a shadowing task: they repeated the passages as quickly and accurately as they could. In some famous research from long ago, William Marslen-Wilson showed that even close shadowers (who lag only a syllable or so behind) comprehend as they go along, allowing them to override anomalies embedded in the stimuli (e.g., the word “company” pronounced “compsiny”). For the other half of the passages, our subjects performed a standard comprehension task: listen to the entire passage, then answer questions about it. (See the article for detailed methods and results.)
The shadowing task reflects subjects’ performance as they are listening; the comprehension task reflects how well they understood a passage having heard the whole thing. The question was whether performance would be affected by the similarity of the recordings to the subjects’ own speech.
The main findings were simple: shadowing performance was affected by the familiarity of the accent, whereas performance on the comprehension test was not. Subjects shadowed more slowly and made more errors on the Southern-accented passages, but answered comprehension questions about them as well as they did questions about the Midwestern-accented ones.
This study is clearly limited (we had only one Northern and one Southern speaker; we ran Northern subjects but not Southern ones; comprehension might have been affected if the texts were more difficult, etc. etc.), but it’s a decent opening gambit. Looking at the comprehension results one would conclude that listeners easily coped with speech that was heavily accented (to them), but the shadowing data show that they were having more difficulty keeping up with it.
We think both results are likely to be meaningful. In the limit—mostly middle class, mostly well-educated college students listening to quality recordings of complete passages in a quiet laboratory setting—people can comprehend speech with a markedly unfamiliar accent pretty well. One might then conclude that differences among American accents don’t pose much of a problem. Speech rarely conforms to those laboratory conditions, however. The signal is not as clear as in our experiment and other events compete for attention. The shadowing results suggest that mishearings occur and are more likely with unfamiliar-sounding speech. Asking comprehension questions at the end gives the listener time to recover. The fact that we manage to understand each other pretty well suggests that something similar may occur in many real-world situations—but not always. I am thinking of cases in which the participants are a police officer and a suspect. Or a judge and jury listening to a witness. Or a teacher listening to one of the 30-some children in their noisy classroom. Differences in accent probably do affect comprehension under some conditions, including consequential ones.
Like most research in this area, our study was about accent: the two speakers read aloud identical texts written in standard English. Now take this accented speech and mix in some alternative lexical items, collocations, morphosyntactic features, syntactic structures, and pragmatic conventions, as spoken by an identifiable community of speakers, and you might call the result a dialect. How well do people understand different dialects of English? I’ll take that up in a future post.
Adrian said,
May 22, 2017 @ 1:49 pm
There's an opinion, which I share myself although with only limited evidence, that British people are better at understanding accents than Americans. It's a cliche that when UK shows or movies are shown in the US, many viewers complain that they need subtitles; there don't seem to be many similar complaints in the UK though.
Ralph Hickok said,
May 22, 2017 @ 1:54 pm
I know that Alexa can be trained to better understand the speaker and I assume that Siri and Google Home also have that feature. Presenting them with a speaker whom they haven't heard before is probably not a fair test. It's fun, though :)
leoboiko said,
May 22, 2017 @ 2:31 pm
I've heard that average, dumb speech recognition systems are a good way for non-natives to practice a standard pronunciation (for professional purposes, for example). You have to speak textbook idealized English or else the machine won't understand you…
Mick O said,
May 22, 2017 @ 4:00 pm
@Adrian
I suspect that UK viewers have seen more US-produced filmed entertainment than vice versa, simply based on the amount produced and distributed. Your anecdotal evidence supports an idea that British people have had more practice at understanding a particular subset of non-native accent. Not that they are better at understanding all accents.
:-)
Robert said,
May 22, 2017 @ 4:05 pm
I, a modern RP speaker, found that speech recognition systems made for Americans quite often do not understand me. The odd thing is, if I repeat what I said in a Peter Sellers-type fake American accent, which any American would find fake and horrible sounding, what I say is then recognized immediately.
Jen in Edinburgh said,
May 22, 2017 @ 4:12 pm
The Torry boy is pretty comprehensible to me – enough that the word the transcript gives as [unclear] is very clearly 'aabody' (everybody). (And a line or two earlier they've got 'put there' for what I think is 'pit ower'/'put over'.)
But then that's not really surprising – there are many accents I'm far less familiar with.
Bob Ladd said,
May 22, 2017 @ 4:15 pm
What Mick O said. This ought to affect the results of experiments like the one Mark describes in the main post: my guess is that speakers from Georgia would have an easier time shadowing midwesterners (or anyone whose accent approaches "General American") than the midwesterners had shadowing Georgian speakers, for exactly the reasons Mick O suggests.
Jen in Edinburgh said,
May 22, 2017 @ 4:38 pm
Mick O./Bob Ladd: But could having regular practice in deciphering more than one accent – or more than your local subset of accents – then make you better at coping with differing accents in general than someone who basically only encounters one? It seems like it might, although not infallibly.
Mark Seidenberg said,
May 22, 2017 @ 4:39 pm
The observations about asymmetries in who understands whom may well be correct. The parallel observation in the US is that Southern speakers are better at understanding Northern speakers than vice versa. We hope to repeat our experiment with Southern speakers; the prediction is that they would do equally well at shadowing speech in both accents, as well as comprehending.
Geoff said,
May 22, 2017 @ 6:17 pm
60yo Australian here. Anecdote: 1972, sitting on a bench at the train station at Thurso, Scotland, eavesdropping furiously on two old railway employees who were chatting a few yards away. Not understanding a word, after a while I said, 'Excuse me – I'm interested in languages – you're speaking Gaelic, are you?' They replied, 'No – English!'
It's interesting how little difference it takes to throw you off, especially with distractions or without conversational context. For example, talking to Indian call centre operators who have extremely competent and fluent English but (for me) the wrong syllable stress patterns.
David Morris said,
May 22, 2017 @ 6:46 pm
@Geoff. A few years ago I was sitting on a Sydney train when I heard two people speaking something which I just couldn't decide was English in an accent I was unfamiliar with, or a closely related language. How many Frisian speakers would be likely to travel to Australia?
Stephen Hart said,
May 22, 2017 @ 7:09 pm
The "shadowing" condition seems similar to the Peace Corps/Department of Defense language instruction style. The learner is trying to repeat some sounds (from a native speaker) without worrying much at first about the words being spoken.
Thorin said,
May 22, 2017 @ 7:26 pm
I'm from Michigan, and have an easier time understanding rural Irish accents than I do understanding a lot of people with a thick Cajun accent.
Travis said,
May 22, 2017 @ 7:27 pm
Of relevance: a joke video about Siri's failures to understand Hawai'i pidgin.
https://youtu.be/7zyplVPJuF4
Bob Crossley said,
May 22, 2017 @ 8:36 pm
As someone who isn't a professional it seems rather odd to me that there has been so little research in this area given that speech recognition is now so widespread, but it's not surprising given my experience of using voice recognition systems. As an English northerner with a very much eroded accent I'm regularly misunderstood by British Telecom's version, which seems to have been tuned to speakers from London and environs. If I posh up or cocknify* my vowels I can get through easily enough. Then, after strangulating my pure vowels to be understood by a machine, I'm usually greeted from a call centre in the North.
*(To my ear Londoners, and most South easterners, are posh or cockney – though I don't doubt many of the cockneys think they speak RP "without an accent").
MD said,
May 22, 2017 @ 10:56 pm
OP:
You might call it that, but I always thought it was a bit silly when North American linguists talked about a "Chicago dialect" that was different from the "Indianapolis dialect." That's a misleading usage to people from other parts of the world (e.g., the Middle East, Scandinavia or Germany). From a larger perspective, those are just slightly different accents that may differ in a few other ways that are barely even worth mentioning. People from those two cities basically don't have any trouble whatsoever understanding one another. Yes, those trivial differences can be fun to talk or joke about sometimes, but they don't cause any serious problems.
That's why I like the terms "accent" or "variety" for English…uh…accents or varieties. In my view, the only varieties of English that come close to approaching dialect status are some of those spoken by black people in the Caribbean area or the US; and maybe also some Scottish varieties like the Torry sample above. Rather than just saying a very basic word, e.g., "from" with a different accent, these people might use an entirely different word that no one outside of their region uses (like "fae", in the case of Scotland); that's more what a dialect is like, in my opinion. Everyone else in countries where English is a native language says something that could be written "from."
Jenny Chu said,
May 23, 2017 @ 5:05 am
It *seems* to me to be the case, but I would have to test it … that people who are more exposed to many different varieties of their native language would have an easier time comprehending even a previously unencountered variety of that language.
Is it true?
Karen said,
May 23, 2017 @ 12:51 pm
It generally takes me one or two sentences to be able to understand someone whose accent is very different from mine – especially if I am not expecting that. I wonder if Siri can adapt?
peter said,
May 23, 2017 @ 3:41 pm
In a diner in Irving, Texas, I once overheard a conversation between two Texan women of a certain age, in which one said to the other:
"You have an accent. But I was born in Dallas, so I don't have an accent."