
At the recent Acoustics 2008 meeting, I heard a presentation that reminded me of a mystery that I've been wondering about for nearly two decades. The paper presented was Maria Uther et al., "Training of English vowel perception by Finnish speakers to focus on spectral rather than durational cues", JASA 123(5):3566, 2008. And the mystery is why HVPT — a simple, quick, and inexpensive technique for helping adults to learn the sounds of new languages — is not widely used.

In fact, as far as I can tell, it's not used at all. Over the years, I've asked many people in the language-teaching business about this, and the answer has always been the same. It's not "Oh yes, well, we tried it and it doesn't really work"; or "It works, but the problems that it solves are not very important"; or "I'd like to, but it doesn't fit into my syllabus". Rather, their answer is some form of "What's that? I've never heard of it."

Actually, the initialism HVPT (for "High Variability Phonetic Training") is new, or at least new to me. But the ideas behind this type of training, and the basic evidence that it works, have been around for a while. The locus classicus is a series of presentations at Acoustical Society of America meetings between 1991 and 1995, by Dave Pisoni and colleagues. (See the end of this post for a list.)

The starting point is the fact that speakers of a given language sometimes have a terrible time with certain "sounds" — certain phonological categories or distinctions — in another language. And I'm not talking about production problems, like the difficulty of learning to roll an [r], but about perceptual problems, the problem of learning to hear certain sound categories as distinct from others. (There's usually an associated production problem as well, of course.)

English poses a number of such problems for speakers of certain other languages. For example, Japanese native speakers have a notoriously difficult time with English /r/ and /l/. And speakers of many languages, Japanese and Spanish among them, have a hard time with the English vowel distinctions in BIT vs. BEAT or LOOK vs. LUKE.

Because these problems can be completely masked by linguistic redundancy, it's easy to ignore how serious and persistent they often are.

Thus if you have no ability at all to distinguish the English vowel categories /ɪ/ and /i/, you'll hear English "big" as one of the two possibilities /bɪg/ and /big/. But in this case, you can solve the problem on purely lexical grounds, since there is no ordinary English word "beeg" (or however it might be spelled). If you hear "sick" as either /sɪk/ or /sik/, you need to rely on context to tell you that the word is "sick" and not "seek"; but context will almost always solve the problem for you, as in these two random examples from today's news:

I thought I was going to be __ when I walked onto court because there were so many people watching.
One of Hong Kong's most prominent democracy advocates said Sunday she will not __ re-election.

About 15 years ago, a student at Penn looked into this problem for a class project, with results that surprised me. In a forced-choice classification task involving English minimal pairs like "sick" vs. "seek", several fluent speakers of English whose native language was Spanish or Japanese performed essentially at chance levels. Her subjects included some people who had been living in the U.S. and interacting daily in English for a decade or more.

If someone has good communications skills in English, maybe surprising deficits in some aspects of English phonetic perception don't matter. On the other hand, if there were an easy way to fix the problem, it couldn't hurt. And based on my own experiences with other languages, I think that it ought to help, especially in the earlier stages of language learning, when you don't have a lot of lexical redundancy to work with.

The first and most striking example of this that I encountered personally was as an undergraduate, in a field methods course where we worked on Javanese. The Javanese consonant distinction that is written in romanization as p/b, t/d, c/j, k/g — and is cognate to a voicing distinction in related languages — is realized phonetically without any difference in voicing. The distinction is sometimes described as "light" vs. "heavy", where the "heavy" consonants (written b, d, j, g in romanization) apparently have a widened pharynx caused by advancing the root of the tongue, and sometimes a murmured or slightly breathy voice quality.

After a semester of trying, I still couldn't reliably transcribe this aspect of the language, though of course once I learned a word, I could remember how to "spell" it. Thus after we'd recorded, transcribed and analyzed a folktale about the mouse deer, "kancil", I knew that the trickster was Kancil and not Kanjil or Gancil or Ganjil. But categorizing the stop consonants in a new word as "heavy" or "light" remained a struggle.

Everyone else in the class, including the instructor, had the same problem. Curiously, we could discriminate the categories perfectly well: if we heard a minimal pair, say /ba/ vs. /pa/ in either order, it was easy to tell which was heavy and which was light. The problem was that when we heard just one, we couldn't identify the category at all accurately.

OK, enter HVPT. This is a simple method for teaching people to distinguish foreign-language sounds that they find difficult. The basic idea is incredibly straightforward: lots of practice in forced-choice identification of minimal pairs, with immediate feedback, using recordings from multiple speakers.

Suppose we're teaching English /i/ vs. /ɪ/. Then on each trial, the subject sees a minimal pair — say mitt vs. meet — and hears a recorded voice saying one of the two words. The subject makes a choice, and immediately learns whether the choice was right or wrong.

(Of course, you can eliminate the written-language aspect by giving the categories descriptive names, like "lax i" vs. "tense i", or just arbitrary names, like "type 1" and "type 2".  If you also put the response categories in the same order on the screen, then this sort of categorization emerges automatically even if you use an opaque notation like English spelling.)
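The trial structure just described can be sketched in a few lines of Python. This is only an illustration, not the software used in any of the studies: the names `Stimulus`, `run_session`, `respond`, `feedback` and `play` are all hypothetical, and the audio playback and the learner's response are left as callbacks that a real application would have to supply.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Stimulus:
    speaker: str            # who recorded this token
    pair: Tuple[str, str]   # the two labels shown, always in the same order
    target: str             # the word that was actually spoken


def run_session(
    stimuli: List[Stimulus],
    respond: Callable[[Tuple[str, str]], str],
    feedback: Callable[[bool], None] = lambda correct: None,
    play: Callable[[Stimulus], None] = lambda stim: None,
    rng: Optional[random.Random] = None,
) -> List[Tuple[Stimulus, str, bool]]:
    """Present every stimulus once, in random order, as a forced choice."""
    if rng is None:
        rng = random.Random()
    order = list(stimuli)
    rng.shuffle(order)
    log = []
    for stim in order:
        play(stim)                    # hypothetical hook: play the recording
        choice = respond(stim.pair)   # the learner must pick one of the two
        if choice not in stim.pair:
            raise ValueError("response must be one of the two labels shown")
        correct = choice == stim.target
        feedback(correct)             # immediate right/wrong feedback
        log.append((stim, choice, correct))
    return log


def proportion_correct(log: List[Tuple[Stimulus, str, bool]]) -> float:
    """Share of trials answered correctly; 0.5 is chance for two choices."""
    return sum(1 for _, _, correct in log if correct) / len(log)
```

Because the two labels in `pair` always appear in the same order, the sketch works equally well with spellings such as `("mitt", "meet")` or with arbitrary names such as `("type 1", "type 2")`.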

What psycholinguists showed, more than 15 years ago, was that this simple method only works if the stimuli are varied enough. If you test repeatedly on a single example, subjects won't be able to generalize to other examples. If you use just one speaker, then subjects won't be able to generalize to the productions of others. But experience with a few different repetitions of a few dozen example types by each of half a dozen or so varied speakers seems to be enough to allow generalization to new examples and new speakers.
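To make the variability requirement concrete, here is a minimal sketch of how a stimulus set might be assembled and split so that generalization can be checked. It assumes that every speaker records every word several times; the function names and the dictionary layout are my own inventions for illustration, not anything taken from the published studies.

```python
from itertools import product


def build_stimuli(pairs, speakers, repetitions):
    """Cross speakers x minimal pairs x takes, with one token per word."""
    return [
        {"speaker": spk, "pair": pair, "target": word, "take": take}
        for spk, pair, take in product(speakers, pairs, range(repetitions))
        for word in pair
    ]


def split_for_generalization(stimuli, held_out_speakers, held_out_pairs):
    """Train on familiar speakers and words; test on unseen ones.

    The test set here is the strictest case: new words spoken by new
    speakers. Tokens that are new on only one dimension are left out
    of both sets in this sketch.
    """
    train = [
        s for s in stimuli
        if s["speaker"] not in held_out_speakers
        and s["pair"] not in held_out_pairs
    ]
    test = [
        s for s in stimuli
        if s["speaker"] in held_out_speakers
        and s["pair"] in held_out_pairs
    ]
    return train, test
```

With a few dozen pairs, half a dozen speakers and a few repetitions each, `build_stimuli` yields a set on the scale described above, and the held-out portion shows whether what was learned carries over to voices and words that were never practiced.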

Over the past decade and a half, continuing research shows that considerable improvement generally comes quickly (e.g. from chance responses to 70-80% correct, after 10 half-hour sessions spread over two weeks), lasts a long time (with good retention six months or a year later), and also creates improvements in production. And these days, it would be trivial to make this technique available as a web application, so that students could do their practice sessions whenever and wherever.

But as far as I know, there are no language courses where HVPT is in routine (or even experimental) use. I don't believe that this is because it's been tried and found wanting — as far as I know, no one has any evidence either way about what impact HVPT has on overall language learning.

So I'm puzzled. As I mentioned at the beginning of this post, I've been asking language-teaching professionals about this since 1992 or so, when I first heard about the technique. And I've never run across one who's heard of the idea.

Maybe in the end HVPT doesn't make enough impact on overall language-learning progress to be worth doing. But if I had to bet, I'd put my money the other way.

There are many other obvious questions to ask, some of which have no doubt been answered in research that I don't know about. One that comes to mind is the role of variation due to discourse and sentence context, as opposed to variation due to phonological context and speaker differences.  But for me, the biggest question is a sociological one: why the big disconnect between research and practice?

W. Strange and S. Dittman, "Effects of discrimination training on the perception of /r-l/ by Japanese adults learning English", Perception & Psychophysics, 36(2): 131-145, 1984.

J. S. Logan, S. E. Lively, and D. B. Pisoni, "Training Japanese listeners to identify English /r/ and /l/: A first report", JASA 89: 874-886, 1991.
S.E. Lively, J.S. Logan & D.B. Pisoni, "Training Japanese listeners to identify English /r/ and /l/. II: The role of phonetic environment and talker variability in learning new perceptual categories", JASA 94(3): 1242-1255, 1993.
S.E. Lively, D.B. Pisoni, R. Yamada, Y. Tohkura, & T. Yamada, "Training Japanese listeners to identify English /r/ and /l/. III. Long-term retention of new phonetic categories", JASA 96(4):2076-2087, 1994.
A.R. Bradlow, D.B. Pisoni, R.A. Yamada, & Y. Tohkura, "Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production", JASA 101:2299-2310, 1997.

David B. Pisoni, Scott E. Lively & John S. Logan, "Perceptual Learning of Nonnative Speech Contrasts: Implications for Theories of Speech Perception", pp. 121-166 in Judith Goodman and Howard C. Nusbaum, Eds., The Development of Speech Perception, MIT Press 1994.


  1. Elizabeth McCullough said,

    July 6, 2008 @ 2:19 pm

    Cool to see David Pisoni cited — he was a professor of mine.

  2. Cheryl Thornett said,

    July 6, 2008 @ 2:20 pm

    I haven't heard of this technique before, but I would like to try it in my classrooms. Have any materials been produced, or would I have to try to create all the recordings in my own unpaid time, having recruited unpaid volunteers to be speakers? I certainly use listening materials as frequently as possible, and quality and variety of voices is one of my criteria for choice, but I have never come across anything like this.

    I will try to mention this idea among colleagues at a professional conference I am attending next weekend, partly to see if anyone has heard of it or is aware of any resources. With your permission, I will post it to a professional forum; many of the members are researchers and teacher trainers in ESOL, and might be interested or have relevant knowledge.

    Sorry to bring up the financial angle, but those of us who are sessionally paid, rather than part-time salaried are seriously underpaid over the course of a year, as our pay is based on student contact time and covers only a fraction of the time actually needed to prepare and complete an ever-growing burden of paperwork.

  3. dr pepper said,

    July 6, 2008 @ 2:38 pm

    Hmm, sounds as if the effect is to enable the student's brain to detect some sort of essence of each phoneme behind the individual examples and thus establish it as a category.

  4. doviende said,

    July 6, 2008 @ 3:20 pm

    I wonder if it could be combined nicely with that other great learning tool that language teachers never use, SRS software. maybe use one like Anki that lets you put sounds in the flashcards, and then the sounds you have trouble with will be repeated more. All that would be needed is a freely available library of recordings for your particular language. It shouldn't be too hard to round up 10 people for each language to redundantly pronounce a few things on a website, and give the resulting recordings out to whoever needs them.

  5. Mark Liberman said,

    July 6, 2008 @ 3:58 pm

    dr pepper: "sounds as if the effect is to enable the student's brain to detect some sort of essence of each phoneme behind the individual examples and thus establish it as a category."

    Certainly this is a problem of category learning, and it's not a surprise that learners need exposure to "highly" (i.e. suitably) variable exemplars of the categories to be learned.

    There is an additional wrinkle in the case of second-language learning, because of the hard-to-avoid tendency to re-use L1 categories in perceiving and producing L2.

    Teachers notice this easily in production, and try (with mixed success) to fix it.

    It's harder to notice it in perception, because phonetic perception may not be explicitly tested, and because linguistic redundancy can usually compensate for the L1-L2 mismatch.

    The most striking cases are those like Japanese speakers dealing with English light/right, where L1 simply lacks a distinction that L2 uses. There are also cases like the one that Uther et al. discussed in the paper I heard a couple of days ago: Finnish speakers dealing with English mitt/meet, where L1 has an analogous distinction but with a different phonetic basis.

    In general, it seems that the right sort of HVPT can help L2 learners to form new phonological categories. Apparently this doesn't happen automatically even after decades of communicative use of L2, since the perceptual side may work adequately using the L1 categories, which are apparently hard to change even if they don't fit well at all. That seems to be especially true where L1 merges L2 categories — without intervention, most learners apparently just get along without the distinction, disambiguating based on lexical and phrasal context.

    At least, that's the theoretical background as I understand it. I don't know this literature very well, though, and I'm sure that I'm leaving some relevant things out.

  6. Mark Liberman said,

    July 6, 2008 @ 4:05 pm

    Cheryl Thornett: Have any materials been produced, or would I have to try to create all the recordings in my own unpaid time, having recruited unpaid volunteers to be speakers?

    I'm sure that many sets of materials have been produced, at least one for each of the studies that have been done. Unfortunately, I don't know of any that have been published. I'll look into this and see what I can find or arrange to create.

    It would also be good to have some free software for managing the student interactions — keeping track of who did it, when, for how long, with what results, as well as of course presenting the stimuli and offering feedback on the responses. This is not an especially complicated programming task, but there's no reason to expect language teachers to be good at that kind of programming, and wasteful at best for everyone who wants to try it to have to write their own system.

    Assuming that I'm right to think that neither recordings nor software are available for this purpose, we have part of the explanation for the lack of transfer from theory to practice. On the supply side, the psycholinguists have not felt called on to make such materials and tools available.

    I think that there are some issues on the demand side as well. For example, there's a fairly widespread belief that communicatively valid materials and tasks are the only things that ought to be used in language teaching, and for people who believe that, specialized drills like HVPT are obviously beyond the pale.

  7. John Cowan said,

    July 6, 2008 @ 4:08 pm

    William Labov put together a test of English fluency, very early in his career, based precisely on this sort of phoneme discrimination: the testees were shown a picture, and the proctor read several sentences only one of which described the picture. (I happen to remember "She is singing" and "She is sinking".) This test was resurrected (with Labov's permission) in the 70s for use by the CCNY ESL program as the basic screening test given to all students to see if they required ESL classes. (My mother was in charge of the program, and it was I who actually telephoned for the permission.)

  8. Cheryl Thornett said,

    July 6, 2008 @ 4:36 pm

    I can think of a couple of ESOL academics who might be interested in this area. I will put it to them. There are groups such as the NRDC and LLU+ in the UK which are more likely to have the resources and funds to produce something than individual teachers normally do.

    Thank you for the information; I hope something may come of it.

  9. Paul said,

    July 6, 2008 @ 4:37 pm

    I know that there is a selection of minimal pairs exercises available at http://www.manythings.org/pp/

    If anyone knows of any others and wouldn't mind posting the links, we could cobble together a list here to use as a resource.

  10. dr pepper said,

    July 6, 2008 @ 6:35 pm

    Hmm. I just had a thought. The kindergarten i went to had Reading and Phonics as separate subjects. Phonics involved chanting syllables out loud. I remember the vowel exercise began "A, a, apple. E, e, elephant. I, i, indian."

    Obviously that was intended to standardize our pronunciation. But now i wonder if there was also a normalizing effect from hearing all the other children. It might be that just a little strategic guidance could be amplified in such a setting.

  11. Alexpri said,

    July 6, 2008 @ 8:34 pm

    The description of the perceptual problem of phoneme distinction accords with my own experience as a language learner. For example, when I first learned French, I didn’t realize that the nasal vowels in “dans” and “don” were different. Simply through imitation, presumably, I usually produced a more or less correct vowel but would sometimes get confused — at least that was what I was told when I finally learned about the problem. But even after I began to work on hearing the difference between them (and being careful to produce it), it was very hard. It took me four or five years before I could hear it more or less reliably and easily (during that time, I spent about 6 months in France and otherwise would listen to at least an hour of French a day and often quite a bit more).

    As for why the HVPT technique hasn’t been adopted by language teachers, that opens up quite a can of worms for me. First of all, I think the issues raised by Cheryl Thornett are pertinent: making the sound recordings and setting them up online is not particularly easy for many language departments. But I also agree with Mark’s point, that there is a relative lack of interest in this sort of material on the part of teachers because the focus is on task-based or communicative methods. In the department I teach in, the curriculum places very little emphasis on phonetics. Typically, the workbook exercises that students do as part of their homework include phonetics exercises, but we spend next to no time on it in the classroom.

  12. Nick Lamb said,

    July 6, 2008 @ 8:55 pm

    Maybe, in terms of software, you could get good re-use from ABX testing software.

  13. Shira said,

    July 6, 2008 @ 9:51 pm

    As another ESL teacher, but not an academic, I would definitely like to try this. A quick check of Google didn't show me any software — commercial or otherwise — that could be used to implement this technique. If someone is writing software, I'd be happy to record minimal pairs for you.

  14. Lucian said,

    July 6, 2008 @ 11:54 pm

    From just a few minutes of thinking about this, I think it would be fairly simple to build a website that could do this sort of thing, at least once the pairs were recorded.

    It could be adaptive as well (ie, if a user had trouble with a particular pair, it could show that pair more often).

  15. Cheryl Thornett said,

    July 7, 2008 @ 1:36 am

    I have to remind everyone that much language learning takes place in woefully poorly-equipped settings. Many of my colleagues have nothing more than a tape or CD player available on a regular basis.

    I still want to pursue this.

  16. Mark Liberman said,

    July 7, 2008 @ 1:45 am

    Cheryl Thornett: I have to remind everyone that much language learning takes place in woefully poorly-equipped settings. Many of my colleagues have nothing more than a tape or CD player available on a regular basis.

    At least in the U.S., most students have a computer or have access to a computer. And if exercises of this kind were available on line in browser-based form — which would be the best thing anyway — then the teacher wouldn't have to supply anything.

    You'd want the students to be able to use headphones or earbuds as well, but again, many students have their own these days, and some of the computers in schools and libraries have headphones attached.

  17. Alex Case said,

    July 7, 2008 @ 2:07 am

    Are you sure they aren't just confused by the name? There are lots of published courses including minimal pairs practice, e.g. Headway Pronunciation, and some which even just do it, e.g. Ship and Sheep. Surely someone has also done a software version!? The ensuring multiple speakers idea is new to me, but you would have to ensure that there was a clear distinction in all their accents, not always the case. For example, I used a vowel sounds minimal pairs activity that included a New Zealander and had no idea which sound it was!

    Japanese speakers do have problems with English vowel sounds, but have none with simple short/ long distinction like bit/ beat. Far/ fur and cat/ cut are more of a problem as are distinguishing between similar long sounds and diphthongs, e.g. court/ coat.

  18. Mark Liberman said,

    July 7, 2008 @ 5:18 am

    Alex Case: Are you sure they aren't just confused by the name?

    Yes, because the name is recent, and was unknown to me until last week. All of my conversations with language-teaching pros have introduced the problem and the method in descriptive terms.

    Alex Case: The ensuring multiple speakers idea is new to me …

    But it's a crucial part of the method. The research indicates clearly that the same exercises with a single speaker don't work. It's crucial to expose learners to a wide variety of pronunciations by a suitable sample of speakers.

    Alex Case: Japanese speakers do have problems with English vowel sounds, but have none with simple short/ long distinction like bit/ beat.

    I can't speak about all Japanese speakers, or even make any claims about a statistically representative sample. But what I recall from my student's long-ago experiment is that she tested a number of Japanese native speakers living and working in the U.S., and functioning well in English; and all of those that she tested performed at or near chance levels in forced-choice classification of precisely the bit/beat distinction.

    She didn't test their production. My impression, for what little it's worth, is that Japanese speakers do generally produce a distinction in such word pairs, in contrast to Spanish speakers, who often use the same vowel for both in production.

    Alex Case: There are lots of published courses including minimal pairs practice, e.g. Headway Pronunciation, and some which even just do it, e.g. Ship and Sheep.

    I'm not familiar with either of these.

    Based on what I can tell from the Headway Pronunciation web site, its exercises don't include the key features of many (hundreds) of forced-choice binary classifications per session, with multiple (10 or so) sessions for each distinction, using stimuli representing many words from many speakers, with immediate feedback after each choice. (However, I can't really tell what the Headway exercises are like.)

    shiporsheep is closer: it's got multiple words in multiple voices for each distinction — though not nearly as many as are — or should be — used in the HVPT technique. (The online shiporsheep page has five pairs of words for each distinction — the HVPT technique, in the version that I once tried, involved more than a thousand pairs for each distinction.)

    Also, the set-up of the exercise is completely different: in shiporsheep, the user sees a pair of words, and mouses each one in order to hear it. In the HVPT technique, the user hears a word, and must classify it without knowing what it is.

    I haven't seen an experimental comparison — perhaps the shiporsheep technique — passive exposure to O(10) stimuli — also works. But the results in the literature that I know about are all for a very different method, namely active forced-choice classification with feedback of O(1000) stimuli.

  19. Joaquim said,

    July 7, 2008 @ 6:01 am

    I'd say that some foreign speakers are aware of the existence of such "difficult" distinctions, but others just don't know, or don't care. I mean people who systematically use their L1 categories (Spanish in my experience) as universal. Who moreover believe "Spanish is written as it sounds" in contrast to foreign languages.
    Would the method work for those? I mean, could they continue to believe that "bit" and "beat" have the same vowel (Spanish i), and at the same time be able to effectively tell them apart in the tests?

  20. Cheryl Thornett said,

    July 7, 2008 @ 7:45 am

    My students are adults, mostly between 25 and 30 years of age and mostly with family responsibilities. Some are refugees. Many more are members of low income families. Some are part of families which consider computers the property of the men of the family. Public computers in places like libraries are difficult to access for a woman with children below school age. Believe me, personal computer access is far from universal, even in the first world, never mind in the third.

    These students need a resource which their teacher can provide in a classroom setting, even if the teacher has to make language or person-specific recordings to bring into class.

  21. Jorge said,

    July 7, 2008 @ 11:29 am

    "sheet" and "beach" are words I sometimes avoid for that reason

  22. Steve said,

    July 7, 2008 @ 11:57 am

    I too teach English as a foreign language, and neither I nor any of my colleagues have ever heard of HVPT. There are no doubt lots of reasons for this – pronunciation has never been taught as systematically as other aspects of the language (probably because it is harder to reduce to a 'system' than, say, tenses or conditionals); as Mark says, the generally-accepted 'communicative' or 'task-based' models of language teaching are not hospitable to such techniques; non-native-speaker teachers are not confident about their own pronunciation, and so on.

    Whatever the reasons, I agree with the other comments here that it sounds like a valuable technique and I want to know more. Of course, minimal pairs have long been a familiar feature of the EFL terrain, but the approach of both Headway and Ship and Sheep tends to be passive – 'listen to these words and tick the ones you hear'. I suppose this is because it is considered more important for students to be able to identify the difference between 'ship' and 'sheep' or 'tree' 'three' and 'free', than to pronounce them correctly themselves (a fair point – the average Spanish accent can be incomprehensible in far more radical ways than its inability to distinguish between /I/ and /i:/, and there are plenty of native speakers (Irish, Cockney etc.) whose dialects confuse th with t or f (or d or v, for the other pronunciation of 'th')). But correct production of such phonemes is important as well as understanding them, and HVPT sounds like it might be a very valuable addition to EFL methodology.

    Personally, I have always got my students to say minimal pairs to each other as well as listen to them, which helps to focus their attention on active production along with passive understanding – and, as Cheryl will be pleased to note, requires no additional resources. Now that I have been alerted to the fact that repetition by different speakers is also important, I will now get them to change partners several times in the course of the exercise as well. But I would certainly be interested to find out more about any other resources or software.

  23. Tom said,

    July 7, 2008 @ 12:10 pm

    I happen to be both a programmer and a foreign language teacher (Spanish). I'd be interested in seeking out or helping create free resources for just this kind of activity. What we need is a large stock of speech examples, transcribed and classified. I'd imagine that, given that stock, not only could we easily implement this activity, but also others that might come along. This seems like just the kind of thing the internet can help create, a la wikipedia — I know I'd be happy to read some sample text in English for what it's worth. Is anyone aware of a project like this now? If not, does anyone have ideas of where we could get hosting to create such a project?

  24. dr pepper said,

    July 7, 2008 @ 2:31 pm

    Hmm, sounds like a new Seti @ Home initiative. Establish a small committee to choose the pairs, build a database to hold the sounds, and a client program that can request and record. So then you sign up volunteers who will run the client program, which will request pairs chosen randomly, record their response, and transmit it to the database manager. I suppose there will have to be people sifting the new entries to ensure nobody said "my hovercraft is full of eels".

    After the database is established it can be used to create teaching programs. And these would include straight audio recording for teachers with minimal equipment.

  25. Cheryl Thornett said,

    July 7, 2008 @ 2:32 pm

    One of the advantages of teaching people with different first languages in the same class is that you get some built-in variation. Traditional minimal pairs work is great because you can do it whenever needed.

    When I was reading up on literacy teaching, I found that the US NIFL site has many useful suggestions for developing phonemic awareness, which helps first language and second language literacy. It's important to include middle and final sounds as well as initial sounds, so one exercise I do is to ask students to listen for first, middle or final sounds of a minimal pair and tell me whether it is the same or different. This works in pairs or small groups as well and works very well with different language backgrounds and listening/pronunciation problems.

  26. Anna said,

    July 7, 2008 @ 2:43 pm

    Why the big disconnect between research and practice?

    Well, for starters, teachers don't read JASA. Unless this research has been summarized elsewhere in a fashion that's accessible to language teachers (who may have very little training in linguistics), then part of the disconnect is because people don't know about the research.

    And as a couple of commenters mentioned above, this sort of drill training isn't particularly en vogue with language teachers right now because it's not a communicative exercise. One could make it so, however–for instance by embedding it in some sort of information exchange exercise. But it's also true that teachers who haven't had a lot of training in linguistics won't necessarily figure out ways to do this. A case in point; I once worked with an ESL teacher who was very interested in teaching pronunciation but felt that drilling in minimal pairs wasn't useful for her students because there was no communicative aspect. I created an exercise where students were required to distinguish between minimal pairs to give and understand directions (on a map with examples like "Bell St." and "Vell St."); my colleague liked the exercise, but when I referred to it as a "minimal pairs task" she didn't know what I meant. She hadn't made the connection between the linguistic concept that she'd learned during her training and the communicative functions of language.

  27. Bernadette Buck said,

    July 7, 2008 @ 4:36 pm

    Thank you for the fascinating post. I can relate as I had terrible problems perceiving the difference between Russian (sh) and (shch) until I'd been in Russia for several months, and this provides an explanation as to why that would be — I needed to hear it from many speakers first. I think this is a much better, more complete theory than the one I'd had; previously, I had thought that I simply needed first to *believe* there was a difference (similar to Joaquim's point.) You see, my "aha" moment came after I complained about the lack of difference to a Russian friend, and her incredulous reaction — "Of COURSE there's a HUGE difference!" — caused me to listen again and work it out. But by that time I'd been hearing it for months from many speakers, whereas when I'd tried like heck back home in the language lab, there had been only the one.

    As for your question regarding the disconnect between research and practice — I don't think the reasons are sociological, exactly; rather, they're economic (in the broader sense, not meaning just money, but incentives/allocation of resources.) What are the incentives someone would have to use this technique? As you point out several times, it's not at all necessary to language acquisition! If it's not necessary, then it's just an improvement in quality, and sadly, quality doesn't win on its own. The technique would have to create benefits beyond the costs to implement. Perhaps this one would just take someone willing to do an initial labor of love to develop a system and experiment with it, so the benefits can be shown. Right now, the benefits are still just theoretical while the costs, although perhaps trivial (especially as Lucien's designed it, Tom's about to start coding, we're all willing to record pairs…) are still non-zero.

    Is that really your question? Or is your question really "who among you is going to make this happen already!?"

  28. Ravi Purushotma said,

    July 7, 2008 @ 4:39 pm

    A colleague and I put together a simple system for learning from DVDs ( http://www.lingualgamers.com/thesis/dvds.html ). It would be relatively straightforward to further extend the cellphone extension of it to pull out the sound clips from various movies where the timecode in the subtitles matches the desired utterance. Then you'd have an instant database of unlimited exemplars from super high quality, fully authentic recordings. The main problem, though, is copyrights. Hopefully once DRM becomes as unpopular in movies as it is becoming in music we'll see a whole slew of applications.

    > why the big disconnect between research and practice?

    Journals and the academic world give comparatively way less recognition to those solving practical problems than those solving intellectual ones. I think if we were more efficient at disseminating good ideas and people felt they would actually get used, more would make them. But, having to go from conference to conference individually showing teachers is just way too inefficient. And the commercial/publishing world has totally dropped the ball so far.
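The subtitle-timecode idea in the comment above can be illustrated with a rough sketch. It assumes subtitles in the common SRT layout (an index line, a timing line, then the text) and only locates the time spans where a target word appears; cutting the audio itself is left out. The names `find_clips` and `SAMPLE` are made up for this illustration and are not part of the system linked above.

```python
import re

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:04,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h, m, s, ms):
    """Convert hour/minute/second/millisecond strings to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def find_clips(srt_text, word):
    """Return (start_sec, end_sec, text) for every cue containing `word`."""
    target = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    clips = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                start = _seconds(*match.groups()[:4])
                end = _seconds(*match.groups()[4:])
                # Everything after the timing line is the cue text.
                text = " ".join(part.strip() for part in lines[i + 1:])
                if target.search(text):
                    clips.append((start, end, text))
                break
    return clips


# Hypothetical two-cue subtitle file, used only to exercise the function.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Look at that ship!

2
00:00:04,000 --> 00:00:06,250
I only see a sheep.
"""

for clip in find_clips(SAMPLE, "ship"):
    print(clip)
```

A cue usually spans a whole phrase, so the spans found this way would still need trimming (by hand or with forced alignment) before they could serve as single-word stimuli.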

  29. Timothy M said,

    July 8, 2008 @ 5:50 am

    So how is it that, say, a child being raised in Japan can learn native-like English pronunciation just from the example of one or two native English-speaking parents?

  30. Mark Liberman said,

    July 8, 2008 @ 8:12 am

    Timothy M: So how is it that, say, a child being raised in Japan can learn native-like English pronunciation just from the example of one or two native English-speaking parents?

    An unhelpful, but true, answer: child language learning seems to be quite different from adult language learning, at least in certain ways. This is apparently one of them.

    I should also say that we don't actually know what happens in cases of the kind that you describe. Remember that poor performance on some phonemic distinctions can be retained even by L2 learners who become quite fluent, because linguistic redundancy allows them to understand what is in effect degraded speech, sort of like reading text with some of the letters left out. In advance of doing the research to check, we shouldn't rule out the possibility that children raised with only one or two linguistic models might actually have some unexpected perceptual deficits with respect to a broader population.

    This situation will be muddied by the fact that most such children will have learned the local language as well, and from many models. So you'd have to look at which language is dominant for them, as well.

  31. Agnès said,

    July 8, 2008 @ 1:25 pm

    Wow, I would love to see such an application available online. I am a native French speaker who is reasonably fluent in English after 8 years spent in the US, but I know that my pronunciation is far from perfect (nobody who heard me ever confused me with a native English speaker) and that part of the issue is that I cannot produce some sounds when I cannot hear the difference between them in conversation. And yes, sheep vs. ship is a classic.

    Re. the child being raised in Japan who learns native-like English pronunciation just from the example of one or two native English-speaking parents… While Mark Liberman might be right about the differences between child and adult learning, my experience is that true native-like proficiency will require a little bit more than a single adult can provide (especially if the language gets early competition). Very few English-speaking expatriates live in a non-English vacuum. There are other English speakers around, and in most cases expats tend to congregate (especially in Japan!). There is TV. There are songs from CDs.

    I can see all the gradation in French language ability for my and other children of native French speakers in our area. The ones who are completely fluent and lack any hint of an American accent are basically the ones who spent their first three years alone at home with a parent who doesn't speak much English and thus does not interact much with other non-French speaking adults and children. All books and media at home are in French. They also spend at least a few weeks a year in France, getting exposure from a wider variety of native speakers. Interaction with other children seems to be very important (not sure whether for motivation or other reasons).

    My kids (2 and 4, the 2-year-old is not quite verbal yet) who have been exposed to English 8-9 hours a day since they were 3 months old are nowhere in that league. My 4-year-old understands French, speaks English outside of the home and speaks English sentences with French words (mostly nouns and verbs, with English endings/conjugations) at home. And while he *can* do a correct French 'r' if corrected, he will default to the English 'r'.

  32. john riemann soong said,

    July 8, 2008 @ 5:42 pm

    Could this technique have unanticipated implications for say, the idea of a "critical window" (though not really for an "optimal window")? After all, the concept behind a critical window seems to be drawn from observations of speakers who have been immersed in another culture for 30 years and still sound non-native. But if such speech can be remedied through an array of techniques including HVPT, then it really seems the window is not so "critical" after all.

  33. Michael Roberts said,

    July 9, 2008 @ 10:07 pm

    I would be happy to both program and host such an application. (No charge.) Incidentally, I also took a course from Dave Pisoni once. Small world…

    Anybody who'd like to collaborate on an online resource — my hosting, my code, your specs, and your audio — please feel free to email me at michael@vivtek.com.

  34. Michael Roberts said,

    July 9, 2008 @ 10:32 pm

    Placeholder for the project idea.

  35. Rebecca Spainhower said,

    July 10, 2008 @ 1:11 pm

    Excellent suggestion. I've been working on learning Tibetan language for a couple of years now. They have a number of sounds that they distinguish, but English doesn't – k / kh, ch / chh, p / p'h, t / t'h – the difference is in both tone and aspiration, depending on dialect. I would love to be able to practice learning how to hear these different sounds (and producing them as well). I feel certain we could enlist the help of various native Tibetan speakers to generate recordings. My contact info is rspainhower at gmail dot com.

  36. David Marjanović said,

    July 12, 2008 @ 1:07 pm

    Japanese speakers do have problems with English vowel sounds, but have none with simple short/ long distinction like bit/ beat.

    Not that I knew anything, but it would surprise me, because the bit/beat distinction is not a simple short/long distinction in at least most kinds of English; the vowels can even be pronounced at the exact same length, and often are, without losing their distinction.

    I mean people who systematically use their L1 categories (Spanish in my experience) as universal.

    There are lots of those. I just came back from a congress where several scientists from Germany spoke fluent English with exactly or almost exactly the… not just German, but northeastern German sound inventory and rules. With this kind of people it happens that they start speaking and you don't know which language they are speaking.

  37. Dougal Graham said,

    September 9, 2008 @ 6:48 pm

    David, I agree. As a teacher in Japan of English, I have noticed that most of my students have trouble with both recognition and production of the tense/lax distinction (i/I) as well as other sounds like l/r, etc…

    With this kind of people it happens that they start speaking and you don't know which language they are speaking.

    This occurs here as well. Although I would guess that it's probably a bit more rare than in German speakers.

  38. John Cowan said,

    October 8, 2008 @ 5:25 pm

    The same problems arise, and presumably the same techniques would work, for people who want to switch to a different native accent. My wife was born and raised in North Carolina to age 18. After more than 30 years in NYC she can produce [pɛn] reliably on demand, but in isolated minimal-pair tests her ability to discriminate pin from pen is still at chance level. What's more, she also occasionally produces [pɛn] when asked to say "pin" carefully, so phonologically they are still pretty much merged.

    After 30 years of marriage, though, I've finally made some headway at convincing her that a Southern accent is not a badge of inferiority, and she should go on saying ink pen and safety pin (or straight pin, etc.) in peace.

  39. Stephen Jones said,

    October 9, 2008 @ 9:22 am

    But isn't this simply the audio-lingual method going back to the 1950s, Mark, and still used at the Defense Language Institute and I believe in China? As a result of Robert Lado's book on Comparative Linguistics, courses were set up which started by getting the students to recognize the phonemic scheme of the target language and didn't do anything else until this was mastered. It was only ever popular in situations, like the two I've mentioned, where the student was captive.

    Certainly the use of minimal pairs has long been a standard part of language teaching. You can get books which list the phonemes that cause difficulties for speakers of each language; Learner English is the standard textbook in this respect (and in other respects regarding L1 and L2 differences). The use of the IPA is standard in short EFL training courses such as the CELTA, where student teachers are encouraged to write the IPA spelling for all new vocabulary.

    There are a couple of reasons why the use of phonetics and minimal pairs does meet with resistance. One is that a large number of native EFL teachers don't know the phonetic alphabet. The second of course is that with a multi-national faculty the phonetic transcription may not reflect what is said in their dialect. We used a course that put great emphasis on minimal pairs and the IPA but faced strong opposition from many staff to any attempt to use it in exams. I produced a large amount of minimal pair work for CALL for another course, and once got embarrassed when the person I'd chosen to record the words didn't distinguish between the pairs for many words. On another occasion I had to go around staff's offices pointing to my arm and asking them 'what do you call this?', in order to filter out the rhotic speakers (luckily sanity has never been a requirement for hire amongst EFL teachers).

    There was research published in Scientific American around 1986 which showed that at three months babies could distinguish between different sounds, but at six months could only distinguish between sounds that were phonemically different in their mother's language. Distinguishing the phonemic structure of a language seems to be the first stage when learning a language as L1 (which explains why, when a child goes to another country, you get long periods of silence before the child starts to speak in the new country's language, which will become a second L1). Now when we are learning a language as L2 we don't need to go through the same procedure, which is why we have fluent speakers of languages who have serious difficulties with certain basic phonemic distinctions.

  40. Stephen Jones said,

    October 9, 2008 @ 10:14 am

    One point not mentioned is that sometimes you learn to distinguish without learning the distinction. I find it very difficult to distinguish between 'r' and 'rr' in Spanish (and can pronounce neither since for strange reasons I pronounce 'r' the French way so everybody in Spain thinks it's a 'g') but have no difficulty distinguishing words using either, like 'pero' and 'perro', because I listen for the allophonic difference in the vowel (which puzzles the Spanish speaker, who doesn't normally hear the difference).

  41. Stephen Jones said,

    October 9, 2008 @ 10:19 am

    The online shiporsheep page has five pairs of words for each distinction — the HVPT technique, in the version that I once tried, involved more than a thousand pairs for each distinction.

    How do you find a thousand pairs for each distinction? For certain minimal pair exercises I find I'm racking my brains to find the full complement of nine.
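One way to widen the search for word pairs, offered only as a sketch, is to scan a phonemically transcribed word list instead of working from memory. The toy lexicon and its ARPAbet-style symbols below are invented for illustration; a real run would need a full pronunciation dictionary, and the function name `minimal_pairs` is likewise just a label chosen here.

```python
from itertools import combinations

# Toy lexicon: word -> phoneme sequence (ARPAbet-style symbols, illustrative only).
LEXICON = {
    "ship": ("SH", "IH", "P"),
    "sheep": ("SH", "IY", "P"),
    "bit": ("B", "IH", "T"),
    "beat": ("B", "IY", "T"),
    "bat": ("B", "AE", "T"),
    "look": ("L", "UH", "K"),
    "luke": ("L", "UW", "K"),
}


def minimal_pairs(lexicon, a, b):
    """Return word pairs that differ only by phoneme `a` vs `b` in one position."""
    pairs = []
    # Sorting the items makes the output order deterministic (alphabetical).
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if len(p1) != len(p2):
            continue
        diffs = [(x, y) for x, y in zip(p1, p2) if x != y]
        if len(diffs) == 1 and set(diffs[0]) == {a, b}:
            pairs.append((w1, w2))
    return pairs


print(minimal_pairs(LEXICON, "IH", "IY"))
print(minimal_pairs(LEXICON, "UH", "UW"))
```

Even a long list of word pairs is only part of the stimulus count: recording each word from several talkers multiplies the number of distinct tokens without needing any new words.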

  42. Maria Uther said,

    November 19, 2008 @ 5:34 pm

    Hi there,

    Nice to see a discussion on our HVPT work! To answer your question, I know that in Japan, there *is* use of the HVPT program. It has been sold as "ATR CALL", produced by Advanced Telecommunications Research Labs (Part private, part government-sponsored), mainly through the efforts by Reiko Akahane-Yamada (one of Pisoni's and my collaborators).

    The URL is: http://atrcall.jp/atrcall/ (but unfortunately only in Japanese).

    I would love to see this in wider usage in Europe. I have already had a bit of interest from at least one company in the results we have seen.

  43. Lisa Morano said,

    June 22, 2010 @ 1:07 pm

    I was reading an article that studies the effects of 12 hours of HVPT training on Catalan/Spanish bilingual learners of English (for the contrasts : /i:/-/I/, /ae/-/^/ and /p/-/b/ and /t/-/d/ in initial word position), and I found this page while looking up what HVPT was. Here's the article :
    Cristina ALIAGA-GARCIA, Joan C. MORA, "Assessing the effects of phonetic training on L2 sound perception and production", pp. 2-31, in Michael A. WATKINS, Andreia S. RAUBER, Barbara O. BAPTISTA (eds.) (2009), Recent research in second language phonetics/phonology, perception and production.
    The results are unclear to me. It seems there's better perception, even the beginning of a new categorisation for the vowels, but no significant change in production. But that's only 12 hours of training.
    In the annex you can find minimal pairs.

    At john riemann soong :
    As regards the critical period, I remember reading a study concluding that around 10% of adult migrants would end up with a near-native or native pronunciation without phonetics classes. It depends mostly on motivation and the permeability of ego. Children don't have this ego permeability problem; they have fun trying new sounds, while adults sometimes find it ridiculous. So provided you don't have any physical hearing impairment and that you are motivated enough, you can end up speaking like a native eventually, no matter the age.

  44. Emily Frost said,

    July 27, 2010 @ 2:39 pm

    It's been 2 years since this conversation was started. Has anything been done to implement this idea?

  45. Judith Meyer said,

    July 29, 2010 @ 6:31 am

    I came across an excellent program for French many years back, which used this idea. They made distinguishing pairs not just for inner-French problems such as the various nasals, but also for foreign sounds vs. French sounds. I recently looked for this program again but couldn't find it anymore. There was a shareware or demo for it available. Anyway, I'm tempted to create something similar for German.

  46. Izabelle Grenon said,

    April 3, 2011 @ 9:18 am

    Thanks Dr. Liberman for this insightful article and feedback. As a researcher in second language acquisition, I'm glad to see that so many instructors seem to respond favorably to this approach.
    I think there are non-negligible difficulties, however, in developing a freely and easily accessible HVPT program of decent quality (if you are not a programmer) :

    1) First and foremost, it is more time consuming than it may appear. First, finding and choosing enough minimal pairs is not necessarily easy (especially if you want to include only words that are worth learning, a point on which some training programs meant for educational, rather than research, purposes have been criticized). Then, for good recording quality, you need access to a professional recording booth, and you need to be able to do proper acoustic manipulations: raising the intensity to an appropriate and constant level across speakers, and removing background noise or non-speech sounds such as breathing, swallowing or some kind of clicking sounds (because these can be very annoying for the learner).

    2) Without any programming skills, it is not that straightforward to program either, especially if you want to keep track of students' progress.

    3) As for sharing the program on the internet for free, I'm no expert but I think unless you have access to a free broadband connection, this may be expensive, and you may have to develop one program for PC and one for Mac (so I was told, especially if you want to save the results and keep track of the students' progress).

    4) Although you make it sound like the "magic" recipe, as Iverson et al 2005 pointed out, none of the learners typically reach ceiling scores with the common HVPT program, not even the versions with manipulated stimuli, thus the need to further improve this technique. Besides, we still don't know how it affects general language competence, as you mentioned.

    5) Finally, one concern of teachers may be "how to test this"? Not only do you have to develop a good training program, but if we want this kind of educational tool to be integrated in the classroom or used by students, we need to find a way to "test" it, don't you think? I may be wrong, but I think that if we want both instructors and learners to take the teaching and learning of pronunciation "seriously" (I include in this "perception") in the same way as grammar, we need to find a proper way to evaluate learners' progress. I might be mistaken (sorry if I am), but I have the impression that sometimes teaching pronunciation is perceived as simply "something fun to do, but not as crucial as learning proper grammar". That being said, I think this may partly stem from the fact that we still know very little about the acquisition of perception/production, and instructors definitely lack the proper tools (and sometimes also the acoustic knowledge) to teach it.
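The intensity-equalization step mentioned in item 1 above can be sketched in a few lines. This is a bare-bones illustration that assumes each recording is already loaded as a list of float samples between -1.0 and 1.0; the function names and the target level of 0.1 are arbitrary choices made here, and a real pipeline would more likely rely on dedicated audio software.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples, target_rms=0.1):
    """Scale samples toward target_rms, clipping anything beyond full scale.

    If clipping occurs, the resulting RMS ends up slightly below the target.
    """
    level = rms(samples)
    if level == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_rms / level
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


# Two made-up "recordings" at very different levels.
quiet_talker = [0.01, -0.01, 0.01, -0.01]
loud_talker = [0.5, -0.5, 0.5, -0.5]

for name, rec in (("quiet", quiet_talker), ("loud", loud_talker)):
    print(name, round(rms(rec), 3), "->", round(rms(normalize_rms(rec)), 3))
```

Matching RMS is only a rough proxy for perceived loudness, and it does nothing about breaths or clicks, which still need to be edited out separately.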

    I still think there is something there worth pursuing, especially if our endeavor is supported by those who are the target users: teachers and language learners.

    P.S. I am currently developing an adaptive version of the HVPT for an English vowel contrast, as in "beat" and "bit" (with the level of difficulty progressively increasing as training progresses, by changing not only the speakers, but also speech rate and stimuli complexity from simple CVC words, to complex, to near-minimal pairs and finally sentences). This project involves the preparation of more than 2500 stimuli (6 speakers and 3 speech rates), some of which are acoustically manipulated to force learners to focus on the vowel quality contrast. I will be using SuperLab as the basis for presenting the stimuli, which is quite expensive, but it can provide feedback and can switch level after x number of good responses (i.e. I can make the software adaptive).

    I would be happy to share my work or work with others on similar projects, but we are still developing the same program after 5 months of work with 2 research assistants. And that is only for one vowel contrast…
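The "switch level after x number of good responses" rule described in the P.S. above might look something like the sketch below. It is not the SuperLab setup; the class name is invented, and the choice to require consecutive correct answers (with an error resetting the count) is an assumption made here, since the comment does not say how the responses are counted. The level names follow the progression listed in the comment.

```python
class AdaptiveTrainer:
    """Advance one difficulty level after `streak_needed` consecutive correct answers."""

    def __init__(self, levels, streak_needed=3):
        self.levels = list(levels)
        self.streak_needed = streak_needed
        self.index = 0   # position of the current level in self.levels
        self.streak = 0  # correct answers in a row at the current level

    @property
    def level(self):
        return self.levels[self.index]

    def record(self, correct):
        """Record one trial result and return the level for the next trial."""
        if correct:
            self.streak += 1
            at_top = self.index == len(self.levels) - 1
            if self.streak >= self.streak_needed and not at_top:
                self.index += 1
                self.streak = 0
        else:
            self.streak = 0  # assumption: an error resets the run
        return self.level


LEVELS = ["CVC words", "complex words", "near-minimal pairs", "sentences"]

trainer = AdaptiveTrainer(LEVELS, streak_needed=2)
for answer in (True, True, False, True, True):
    print(answer, "->", trainer.record(answer))
```

This version only ever moves up; a fuller design might also step back down after repeated errors, or vary speaker and speech rate within a level.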

  47. Tsienwei (David) Li said,

    August 25, 2011 @ 4:22 pm

    I hope Elizabeth McCullough still remembers me. I was her student at SUNY Albany in September-December 1985, and the oldest one, aged 53 then. I also hope she still has a photo of my white kittens.
    I was later trained at the Russian Dept's Russian-English Translation Program.
    Tsienwei in Pasadena, CA
    AshMemory@gmail.com (In Loving Memory of AshleyBoyCat April 1990 – January 24, 2003)

  48. Ron Thomson said,

    January 27, 2012 @ 4:11 pm

    I've been trying for some time to make HVPT available to learners outside the lab. It is not as simple as it seems. I've currently got a website in beta that works for anyone wanting to learn a Canadian/General American English pronunciation. Of course it's missing ɔ for some American varieties, but I take Jennifer Jenkins's view that we should take a minimalistic approach, and only focus on contrasts that are essential.
    If anyone is interested in seeing the website, it's http://www.englishaccentcoach.com
