At the recent Acoustics 2008 meeting, I heard a presentation that reminded me of a mystery that I've been wondering about for nearly two decades. The paper presented was Maria Uther et al., "Training of English vowel perception by Finnish speakers to focus on spectral rather than durational cues", JASA 123(5):3566, 2008. And the mystery is why HVPT — a simple, quick, and inexpensive technique for helping adults to learn the sounds of new languages — is not widely used.
In fact, as far as I can tell, it's not used at all. Over the years, I've asked many people in the language-teaching business about this, and the answer has always been the same. It's not "Oh yes, well, we tried it and it doesn't really work"; or "It works, but the problems that it solves are not very important"; or "I'd like to, but it doesn't fit into my syllabus". Rather, their answer is some form of "What's that? I've never heard of it."
Actually, the initialism HVPT (for "High Variability Phonetic Training") is new, or at least new to me. But the ideas behind this type of training, and the basic evidence that it works, have been around for a while. The locus classicus is a series of presentations at Acoustical Society of America meetings between 1991 and 1995, by Dave Pisoni and colleagues. (See the end of this post for a list.)
The starting point is the fact that speakers of a given language sometimes have a terrible time with certain "sounds" — certain phonological categories or distinctions — in another language. And I'm not talking about production problems, like the difficulty of learning to roll an [r], but about perceptual problems, the problem of learning to hear certain sound categories as distinct from others. (There's usually an associated production problem as well, of course.)
English poses a number of such problems for speakers of certain other languages. For example, Japanese native speakers have a notoriously difficult time with English /r/ and /l/. And speakers of many languages, Japanese and Spanish among them, have a hard time with the English vowel distinctions in BIT vs. BEAT or LOOK vs. LUKE.
Because these problems can be completely masked by linguistic redundancy, it's easy to ignore how serious and persistent they often are.
Thus if you have no ability at all to distinguish the English vowel categories /ɪ/ and /i/, you'll hear English "big" as one of the two possibilities /bɪg/ and /big/. But in this case, you can solve the problem on purely lexical grounds, since there is no ordinary English word "beeg" (or however it might be spelled). If you hear "sick" as either /sɪk/ or /sik/, you need to rely on context to tell you that the word is "sick" and not "seek" — but context will almost always solve the problem for you, as in these two random examples from today's news:
I thought I was going to be __ when I walked onto court because there were so many people watching.
One of Hong Kong's most prominent democracy advocates said Sunday she will not __ re-election.
About 15 years ago, a student at Penn looked into this problem for a class project, with results that surprised me. In a forced-choice classification task involving English minimal pairs like "sick" vs. "seek", several fluent speakers of English whose native language was Spanish or Japanese performed essentially at chance levels. Her subjects included some people who had been living in the U.S. and interacting daily in English for a decade or more.
If someone has good communication skills in English, maybe surprising deficits in some aspects of English phonetic perception don't matter. On the other hand, if there were an easy way to fix the problem, it couldn't hurt. And based on my own experiences with other languages, I think that it ought to help, especially in the earlier stages of language learning, when you don't have a lot of lexical redundancy to work with.
The first and most striking example of this that I encountered personally was as an undergraduate, in a field methods course where we worked on Javanese. The Javanese consonant distinction that is written in romanization as p/b, t/d, c/j, k/g — and is cognate to a voicing distinction in related languages — is realized phonetically without any difference in voicing. The distinction is sometimes described as "light" vs. "heavy", where the "heavy" consonants (written b, d, j, g in romanization) apparently have a widened pharynx caused by advancing the root of the tongue, and sometimes a murmured or slightly breathy voice quality.
After a semester of trying, I still couldn't reliably transcribe this aspect of the language, though of course once I learned a word, I could remember how to "spell" it. Thus after we'd recorded, transcribed and analyzed a folktale about the mouse deer, "kancil", I knew that the trickster was Kancil and not Kanjil or Gancil or Ganjil. But categorizing the stop consonants in a new word as "heavy" or "light" remained a struggle.
Everyone else in the class, including the instructor, had the same problem. Curiously, we could discriminate the categories perfectly well: if we heard a minimal pair, say /ba/ vs. /pa/ in either order, it was easy to tell which was heavy and which was light. The problem was that when we heard just one, we couldn't identify the category at all accurately.
OK, enter HVPT. This is a simple method for teaching people to distinguish foreign-language sounds that they find difficult. The basic idea is incredibly straightforward: lots of practice in forced-choice identification of minimal pairs, with immediate feedback, using recordings from multiple speakers.
Suppose we're teaching English /i/ vs. /ɪ/. Then on each trial, the subject sees a minimal pair — say mitt vs. meet — and hears a recorded voice saying one of the two words. The subject makes a choice, and immediately learns whether the choice was right or wrong.
(Of course, you can eliminate the written-language aspect by giving the categories descriptive names, like "lax i" vs. "tense i", or just arbitrary names, like "type 1" and "type 2". And if you always put the response choices in the same positions on the screen, this sort of labeling emerges automatically even when the notation — like English spelling — is opaque.)
What psycholinguists showed, more than 15 years ago, was that this simple method only works if the stimuli are varied enough. If you test repeatedly on a single example, subjects won't be able to generalize to other examples. If you use just one speaker, then subjects won't be able to generalize to the productions of others. But experience with a few different repetitions of a few dozen example types by each of half a dozen or so varied speakers seems to be enough to allow generalization to new examples and new speakers.
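The procedure described above is simple enough to sketch in a few lines of code. Here is a minimal, hypothetical Python mock-up of one HVPT block: the stimulus set crosses minimal pairs with multiple talkers and repetitions, and each trial is a forced-choice identification with immediate feedback. The word lists, file names, and six-talker setup are illustrative assumptions, not taken from any published implementation.

```python
import random

# Hypothetical stimulus inventory: minimal pairs for English /I/ vs. /i/,
# crossed with half a dozen talkers and a couple of repetitions each.
MINIMAL_PAIRS = [("mitt", "meet"), ("sick", "seek"), ("bit", "beat")]
TALKERS = [f"talker{i}" for i in range(1, 7)]
REPETITIONS = 2

def build_stimuli():
    """Cross words x talkers x repetitions to get a high-variability set."""
    stimuli = []
    for lax_word, tense_word in MINIMAL_PAIRS:
        for word, category in ((lax_word, "lax"), (tense_word, "tense")):
            for talker in TALKERS:
                for _ in range(REPETITIONS):
                    stimuli.append({"word": word,
                                    "category": category,
                                    "file": f"{word}_{talker}.wav"})
    random.shuffle(stimuli)
    return stimuli

def run_session(stimuli, respond):
    """One forced-choice block: present each stimulus, collect a choice,
    score it immediately (the right/wrong feedback), return accuracy."""
    correct = 0
    for trial in stimuli:
        # play_audio(trial["file"])  <- playback omitted in this sketch
        choice = respond(trial)            # subject picks "lax" or "tense"
        if choice == trial["category"]:    # immediate feedback point
            correct += 1
    return correct / len(stimuli)

if __name__ == "__main__":
    stimuli = build_stimuli()
    # A stand-in listener who guesses at random, i.e. pre-training chance:
    accuracy = run_session(stimuli, lambda t: random.choice(["lax", "tense"]))
    print(f"{accuracy:.0%} correct over {len(stimuli)} trials")
```

A real system would of course play the recordings and take responses from a screen or keyboard; the point of the sketch is just how little machinery the method requires, which makes its absence from language teaching all the more puzzling.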
Over the past decade and a half, continuing research shows that considerable improvement generally comes quickly (e.g. from chance responses to 70-80% correct, after 10 half-hour sessions spread over two weeks), lasts a long time (with good retention six months or a year later), and also creates improvements in production. And these days, it would be trivial to make this technique available as a web application, so that students could do their practice sessions whenever and wherever.
But as far as I know, there are no language courses where HVPT is in routine (or even experimental) use. I don't believe that this is because it's been tried and found wanting — as far as I know, no one has any evidence either way about what impact HVPT has on overall language learning.
So I'm puzzled. As I mentioned at the beginning of this post, I've been asking language-teaching professionals about this since 1992 or so, when I first heard about the technique. And I've never run across one who's heard of the idea.
Maybe in the end HVPT doesn't make enough impact on overall language-learning progress to be worth doing. But if I had to bet, I'd put my money the other way.
There are many other obvious questions to ask, some of which have no doubt been answered in research that I don't know about. One that comes to mind is the role of variation due to discourse and sentence context, as opposed to variation due to phonological context and speaker differences. But for me, the biggest question is a sociological one: why the big disconnect between research and practice?
W. Strange and S. Dittman, "Effects of discrimination training on the perception of /r-l/ by Japanese adults learning English", Perception & Psychophysics 36(2): 131-145, 1984.
J. S. Logan, S. E. Lively, and D. B. Pisoni, "Training Japanese listeners to identify English /r/ and /l/: A first report", JASA 89(2): 874-886, 1991.
S. E. Lively, J. S. Logan, and D. B. Pisoni, "Training Japanese listeners to identify English /r/ and /l/. II: The role of phonetic environment and talker variability in learning new perceptual categories", JASA 94(3): 1242-1255, 1993.
S. E. Lively, D. B. Pisoni, R. A. Yamada, Y. Tohkura, and T. Yamada, "Training Japanese listeners to identify English /r/ and /l/. III: Long-term retention of new phonetic categories", JASA 96(4): 2076-2087, 1994.
A. R. Bradlow, D. B. Pisoni, R. A. Yamada, and Y. Tohkura, "Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production", JASA 101(4): 2299-2310, 1997.
D. B. Pisoni, S. E. Lively, and J. S. Logan, "Perceptual Learning of Nonnative Speech Contrasts: Implications for Theories of Speech Perception", pp. 121-166 in Judith Goodman and Howard C. Nusbaum, eds., The Development of Speech Perception, MIT Press, 1994.