Language Log

Phonological processing and speaker identification

August 6, 2011 @ 9:20 am · Filed by Mark Liberman under Psychology of language

There's been a fair amount of media interest in a recent study suggesting that dyslexics are worse than controls at (certain kinds of) speaker recognition. This is an interesting study in itself, which is why it made it into Science. But I'm just as interested in its uptake in the popular press, which mostly ranged from "missing the point" to "catastrophic confusion" (and you may not be surprised to learn where on the spectrum the BBC's coverage landed, alas). I'll discuss the study itself here, and then take up the press coverage in another post.

The work in question is Tyler K. Perrachione, Stephanie N. Del Tufo, & John D. E. Gabrieli, "Human Voice Recognition Depends on Language Ability", Science 7/29/2011:

The ability to recognize people by their voice is an important social behavior. Individuals differ in how they pronounce words, and listeners may take advantage of language-specific knowledge of speech phonology to facilitate recognizing voices. Impaired phonological processing is characteristic of dyslexia and thought to be a basis for difficulty in learning to read. We tested voice-recognition abilities of dyslexic and control listeners for voices speaking listeners’ native language or an unfamiliar language. Individuals with dyslexia exhibited impaired voice-recognition abilities compared with controls only for voices speaking their native language. These results demonstrate the importance of linguistic representations for voice recognition. Humans appear to identify voices by making comparisons between talkers’ pronunciations of words and listeners’ stored abstract representations of the sounds in those words.

In interpreting this work, we should start with some of the implicit background. The current scientific consensus on dyslexia is clearly expressed in the National Institutes of Health's PubMed Health page:

Developmental reading disorder (DRD), or dyslexia, occurs when there is a problem in areas of the brain that help interpret language. It is not caused by vision problems. The disorder is a specific information processing problem that does not interfere with one's ability to think or to understand complex ideas. Most people with DRD have normal intelligence, and many have above-average intelligence. […]

A person with DRD may have trouble rhyming and separating sounds that make up spoken words. These abilities appear to be critical in the process of learning to read. A child's initial reading skills are based on word recognition, which involves being able to separate out the sounds in words and match them with letters and groups of letters.

Because people with DRD have difficulty connecting the sounds of language to the letters of words, they may have difficulty understanding sentences.

True dyslexia is much broader than simply confusing or transposing letters, for example mistaking "b" and "d".

In general, symptoms of DRD may include:

Difficulty determining the meaning (idea content) of a simple sentence
Difficulty learning to recognize written words
Difficulty rhyming

You'll find a similar perspective in The International Dyslexia Association's FAQ.

An older view, which remains strong in the popular imagination and in some corners of the specialist literature, is that reading-specific difficulties are caused by visual problems, especially by a propensity to reverse letters. The NIH PubMed page goes out of its way to reject this idea specifically, and Perrachione et al. endorse the current scholarly consensus in the phrase "Impaired phonological processing is characteristic of dyslexia and thought to be a basis for difficulty in learning to read".

Another now-generally-debunked theory is that reading difficulties are caused by children skipping the crawling phase of motor development. There are still people out there who offer therapy for poor readers in the form of re-learning to crawl, although as far as I know, there is no good evidence that this works. The conceptual link between crawling and reading is much less intuitive than the link between letter-recognition and reading, so the motor-development idea is much less common than the letter-reversal idea.

Turning to the problem of speaker recognition (or "voice recognition", as Perrachione et al. confusingly call it), it's plausible that people should be better at doing this in a language they know well than in one that they don't. This was confirmed by J.P. Goggin et al., "The role of language familiarity in voice identification", Memory and Cognition 1991:

Four experiments examined the effects of language characteristics on voice identification. In Experiment 1, monolingual English listeners identified bilinguals' voices much better when they spoke English than when they spoke German. The opposite outcome was found in Experiment 2, in which the listeners were monolingual in German. In Experiment 3, monolingual English listeners also showed better voice identification when bilinguals spoke a familiar language (English) than when they spoke an unfamiliar one (Spanish). However, English-Spanish bilinguals hearing the same voices showed a different pattern, with the English-Spanish difference being statistically eliminated. Finally, Experiment 4 demonstrated that, for English-dominant listeners, voice recognition deteriorates systematically as the passage being spoken is made less similar to English by rearranging words, rearranging syllables, and reversing normal text. Taken together, the four experiments confirm that language familiarity plays an important role in voice identification.

So if dyslexics have deficiencies in phonological processing, and if speaker recognition is improved by native-language linguistic analysis, which includes phonological processing, then it's plausible that dyslexics would show native-language-specific deficiencies in speaker recognition. The recent Perrachione et al. paper tests this plausible hypothesis, and finds supporting evidence:

(A) Mean voice-recognition performance of dyslexic and control listeners (error bars indicate SEM). All individuals scored above chance (20%), shown as baseline. (B and C) Relationships between clinical measures of language (phonological) ability in dyslexia and voice-recognition ability. CTOPP, Comprehensive Test of Phonological Processing.

The (A) graph shows that their dyslexic subjects, who were native speakers of English, performed just as well as controls in learning to identify new Chinese speakers' voices, but quite a bit worse at learning to recognize new English speakers' voices. [The paper doesn't give any numbers, but from a careful measurement of the graph, the performance of the dyslexic group appears to be 50% correct on average, while the performance of the control group appears to be 68% correct on average. The standard deviations are roughly 14 and 15 respectively. This translates to an average of 2.5 correct out of 5, compared to an average of 3.4 correct out of 5, and an effect size of about 1.2.]
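For readers who want to check the arithmetic in the bracketed note, here is a minimal Python sketch. It assumes that "effect size" means Cohen's d computed with a pooled standard deviation for two equal-sized groups, which the post does not state explicitly. The means and standard deviations are the approximate values read off the graph, not figures reported in the paper, and the function name cohens_d is just an illustrative choice.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for two equal-sized groups, using the pooled standard deviation."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Approximate values measured from graph (A), in percent correct.
# These are estimates from the figure, not numbers given in the paper.
control_mean, control_sd = 68.0, 15.0
dyslexic_mean, dyslexic_sd = 50.0, 14.0

d = cohens_d(control_mean, control_sd, dyslexic_mean, dyslexic_sd)

# Re-express the percentages as "correct out of 5".
out_of = 5
dyslexic_out_of_5 = dyslexic_mean / 100 * out_of
control_out_of_5 = control_mean / 100 * out_of

print(f"effect size (Cohen's d): {d:.2f}")
print(f"dyslexic: {dyslexic_out_of_5:.1f} of {out_of}, control: {control_out_of_5:.1f} of {out_of}")
```

With these inputs the pooled standard deviation comes out at about 14.5, so the 18-point difference between the group means corresponds to a d of roughly 1.2, in line with the estimate above.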

The (B) graph compares speaker-identification performance to scores on a "nonword repetition" task, which involves repeating (verbally-presented) nonsense words like "dooloowheep". The (C) graph shows the relationship to an "elision" task, which involves following instructions like "say 'blend' without saying /l/".

OK, now comes the boring part where we look at the experiment itself (see "Never mind the conclusions, what's the evidence?", 8/30/2010).

There were 16 controls and 16 "individuals with dyslexia". According to Perrachione et al.'s Supporting Online Material,

Inclusionary criteria for dyslexia consisted of a prior clinical diagnosis or lifelong history of reading disability and scoring below the 16th percentile (one standard deviation below the age-normed mean) on any two subtests from the following standard clinical reading and language assessments: Woodcock Reading Mastery Test-Revised (WRMT-R/NU), Test of Word Reading Efficiency (TOWRE), and Comprehensive Test of Phonological Processing (CTOPP).

For more about these tests, see WRMT-R, TOWRE, CTOPP.

Groups were matched based on cognitive performance (“Matrices” and “Block Design” from the Wechsler Abbreviated Scale of Intelligence, WASI; (10)), working memory (Wechsler Adult Intelligence Scale WAIS-IV; (11)), age, and education.

As usual, it's worth giving a bit of thought to the population from which the subjects were taken. Again as usual, we don't know a great deal about this, but the authors of the study are all at MIT, and the age of the subjects was 21.3 ± 2.7 (controls) and 23.9 ± 6.8 (dyslexia), so most of them were probably MIT students. As a result, it's worth registering the usual mental reservation to the effect that neither the control group nor the dyslexic group is typical in other respects of the group it conceptually represents. As usual, it's not clear whether this matters or not.

Also, we should note that when they say that the two groups were "matched on cognitive performance […], working memory […], age, and education", what they mean (apparently) is something like "group means were within a standard deviation of one another". It's slightly worrisome that in fact the dyslexia group was on average below the control group in every cognitive dimension on which they were supposed to be "matched", by as much as 0.644 standard deviations.

What about the experimental design?

Two sets of ten sentences designed for acoustic assessment were recorded for this experiment: one spoken in English, the other in Mandarin. The English sentences were read by five male native speakers of American English (aged 19-26 years, M = 21.6). The Mandarin sentences were read by five male native speakers of Mandarin Chinese (aged 21-26 years, M = 22.6). […] Recordings of sentences were 1.46sec to 4.09sec in duration (M = 2.43, SD = 0.54). In each language, five sentences were used during the familiarization and practice phases, and all ten were used during the final voice recognition test.

A few comments are in order here.

First, the same set of ten sentences was recorded by every speaker. This may put a premium on detailed segment-by-segment comparison, in a way that a text-independent task might not (i.e. a task where each speaker's utterances involved different words and phrases). So it would be nice to know whether the effect is maintained or attenuated or eliminated in a text-independent task.

Second, these are relatively short stimuli (mean of 2.43 seconds). In the automatic speaker-recognition area, the relative performance of different algorithms can be quite different as the length of the training and testing stimuli increases, and it's plausible that this is also true for various aspects of human voice-identification abilities.

Third, we aren't told whether the speakers differed significantly in regional, class, or ethnic features. If they did, then this would plausibly put a premium on paying attention to a phonological analysis, so that specific features (e.g. ae-raising) could be noted.

Fourth, it's important that the same stimuli were (partly) used in training and in testing: "In each language, five sentences were used during the familiarization and practice phases, and all ten were used during the final voice recognition test". This changes the nature of the test even further in the direction of text-dependent speaker recognition — which obviously puts a premium on phonological memory. (This is especially true when the texts are few and short, so that they can be memorized and used as a key for registering speaker-and-text-specific acoustic properties.)

And fifth, this is a test of read sentences rather than naturally-occurring speech. Read speech and natural speech have quite different properties, and it's possible that these differences include a different balance between phonological and other (e.g. prosodic) cues to speaker identity.

Here's more about the procedures used:

Participants learned to identify five talkers in each of two language conditions (English and Mandarin) from the sound of their voice. Each talker was associated with a distinct cartoon avatar. Training and testing on voice recognition were completed in each language condition separately, and the order was counterbalanced across listeners. During an initial familiarization phase, participants heard each of the voices in succession while the corresponding avatars were displayed on a computer screen. Participants then actively practiced identifying the talkers with corrective feedback: The five avatars appeared on the screen while a recording from one talker was played, and participants selected the avatar matching the voice they heard. If participants selected incorrectly, the computer indicated the correct response. During the task, all instructions were presented both as text on the screen and as auditory prompts recorded by an additional female talker. The familiarization and active practice phases were repeated over five training sentences, and each sentence was practiced ten times. Following training, participants undertook a 50-item talker identification test, in which they identified the voices without feedback.
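The structure of the test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: that the 50 test items are the full crossing of five talkers with ten sentences (the quoted text is consistent with this but does not say so outright), that guesses are independent across items, and that the talker and sentence labels, and which five sentences were trained, are placeholders. The helper prob_at_least is a hypothetical name for an ordinary binomial tail sum.

```python
from itertools import product
from math import comb

# Placeholder labels; the real stimuli are not identified in the post.
talkers = [f"talker_{i}" for i in range(1, 6)]       # five talkers per language
sentences = [f"sentence_{i}" for i in range(1, 11)]  # ten sentences per language
trained = set(sentences[:5])  # five sentences used in training (which five is an assumption)

# Assumed test set: every talker reading every sentence.
test_items = list(product(talkers, sentences))
seen_in_training = [item for item in test_items if item[1] in trained]

# Chance level for a five-way forced choice.
chance = 1 / len(talkers)

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): probability of k or more correct by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

print(f"test items: {len(test_items)}, also heard in training: {len(seen_in_training)}")
print(f"chance level: {chance:.0%}")
print(f"P(at least 25 of 50 correct by guessing): {prob_at_least(25, len(test_items), chance):.2e}")
```

Under these assumptions, half of the test items reuse talker-and-sentence pairings already heard with feedback, and even a 50% score is very unlikely to arise from guessing alone.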

Summing up:

This experimental design has many features that are likely to increase the value of phonological analysis and phonological memory: the stimuli are quite short; each speaker reads the same small set of sentences; half of the stimuli in the testing phase were also used in the training (with feedback) phase; the stimuli are read rather than natural.

To the extent that the goal is to confirm, in a new way, the existing consensus that the (probably diverse) collection of reading difficulties known as "dyslexia" is strongly associated with deficiencies in phonological processing, none of this matters.

To the extent that the goal is to confirm, in a new way, the plausible hypothesis that human speaker recognition is mediated in part by phonological processing, none of this matters.

But if we're interested in whether people with reading difficulties are likely also to have problems recognizing who's talking when, in the context of everyday life, these design issues matter quite a bit.

So which interpretive frame do you think dominated the media coverage?

6 Comments

John Lawler said,

August 6, 2011 @ 11:53 am

Thank you, Mark. That was very useful and informative. One thing still troubles me — not about the experiment, but about the "current scientific consensus" on "DRD", as expressed by the NIMH:

> Difficulty determining the meaning (idea content) of a simple sentence

Not to be obscurantist, but does this refer to difficulty determining the meaning of written or spoken sentences? Since speech is biological while literacy is technological (as you hint above), the difference is pretty significant, and ought to be indicated in anything that claims to be definitive.
Whenever I read anything about "dyslexia" any more, I always check to see whether the claims made could possibly be true for some other technological skill deficiency, like difficulty in driving or programming a computer. If it's even remotely conceivable, I get interested; if not, I stop reading.

neuromusic said,

August 6, 2011 @ 3:50 pm

regarding the false but popular perspective of dyslexia being a visual reading impairment where people switch "b" and "d" I recall my blood starting to boil when I saw this video floating around the interwebs recently…

http://www.youtube.com/watch?v=VLtYFcHx7ec

giving the letters more "gravity" to help dyslexics is completely absurd. supposedly, this typeface was "tested" but I couldn't find the study.

Adrian Morgan said,

August 6, 2011 @ 11:00 pm

@neuromusic I would very much like to see a balanced discussion of that video and the website it comes from. http://www.studiostudio.nl/project-dyslexie/

I linked to it from my blog last night (along with various other things I found interesting in the last fortnight), but as with most things I'm just a layperson and generally bow to the expertise of people who seem to know what they're talking about.

Disorders involving the brain are always more complicated than just one symptom, and it would raise red flags if the video/website suggested that dyslexia could be boiled down to a single symptom (such as confusing similar letters). But I didn't get the impression it was doing that. If the font helps with respect to just one symptom of dyslexia, then it helps, and the fact that there's more to dyslexia doesn't alter this.

In short, I'm not in a position to criticise the video/website, but I'm definitely interested in listening to discussion from people who are.

Mary Bull said,

August 7, 2011 @ 9:35 am

For a time in the 1960s I was one of a group of teachers giving special help to children whose reading skills were delayed by at least 2 years according to the curriculum at that time and who were further identified as to the kind of help they needed by some specific tests. I worked with these children one-on-one.

In learning to use the tests, I discovered that I myself had a number of the characteristics associated with what was then being termed "perceptual-motor disorder."

Besides difficulty processing spoken language, I also shared "dyslexic" children's directional confusion. This manifests itself not only in misperception of letters but also in confusing left and right in other contexts, such as navigating when driving.

I concur with John Lawler's last paragraph, and also with Adrian Morgan's reservations.

Finally, I appreciate very much ML's report on this study, and hazard the guess that the media coverage was mostly along the lines of the first of his three suggested interpretive frames.

Mark F. said,

August 7, 2011 @ 11:48 am

Mary, I had the impression ML meant that the media coverage was dominated by the third frame, the idea that dyslexic people would have trouble recognizing others by voice in everyday life. This is a much more powerful hook — if you think much of your audience is only marginally interested in science, you're going to want to tie things to personal experience as much as possible.

Looking at his link to the media coverage, I thought the first four linked articles actually did a pretty good job of not missing the point. The first two had headlines characterizing the study as being about the link between linguistics and phonological processing, although I admit that's charitable, since "auditory" is not the same as "phonological", and "may have auditory tie" implies the introduction of a new hypothesis, rather than further confirmation of an existing consensus. But still, both headlines convey the idea that the study was about the relationship between dyslexia and dealing with sounds.

The US News site is just presenting an article provided by the NSF, sort of the equivalent of repackaging a press release except that they gave NSF full credit. And that article seems to be pretty good.

I thought the Discover blog post was also fairly on target, if breezy.

But it does go downhill from there.

Mary Bull said,

August 7, 2011 @ 10:13 pm

Mark F., thank you for your thoughtful response to my comment. I see why you say, just looking at the first four headlines, that the media coverage was from the point of view of ML's third interpretive frame.

Actually, I missed seeing that ML had put a link in the first sentence of his post, and so have just now had a look at the Google page with the headlines and the articles those listings link to.

My browser makes very little difference in the color of the regular text and the color of the words constituting the link — the difference between gray and a pale slate-blue. I simply overlooked that link, consequently, and went hunting it after reading your comment.

BTW, at the time when I was teaching those special children in the mid-to-late 1960s (I'd been pulled out of my regular classroom to participate in a new approach funded by a grant under NDEA Title I) one thing being investigated was how children with reading difficulties were affected by colors of text and background. It had been observed that a duplicating process common in those years (used because it was cheap) which resulted in purple text was not a good way to prepare learning or test materials for these students. That light purple font on a white background may have been hard for them to distinguish letters and words in, or there may have been more complex causes at work. But these students definitely performed better with clear black type on a light but not pure white background than they did with lavender on white texts.

No comparative testing as to how much difficulty text color may have made for students reading at grade level was done. But they were managing to read the lavender on white materials much more successfully than the children with delayed reading skills could. So the special reading teachers in my school district asked for and received better ways of duplicating home-made lessons and tests.

The point of this long personal anecdote: I may have today supplied one more piece of data confirming my own dyslexic tendencies by not noticing ML's link. :)
