More on a#n vs. an#


Here, with additional context, are the audio clips from yesterday's post "The phonetics of a#n vs. an# juncture":

[Audio clips 1, 2, 3, 4]

The answers in the comments were about 73% correct overall.  It's not easy to count, since many people didn't obey the "forced choice" constraints, but here's a breakdown by item, counting "I can't tell" and similar answers as half a vote for each alternative:

Item   an ice   a nice
1      14.5%    85.5%
2      80.3%    19.7%
3      63.2%    36.8%
4      36.8%    63.2%
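
As a quick check on the arithmetic, the overall figure can be recovered from the per-item percentages. This is only a sketch: it assumes each item drew about the same number of votes (the post does not say so), and the names `pct_correct` and `tally` are illustrative. The `tally` helper shows the half-vote convention for "I can't tell" answers; any counts passed to it are hypothetical.

```python
# Share of votes for the correct answer on each item, from the table above.
# Items 1 and 4 were "a nice"; items 2 and 3 were "an ice".
pct_correct = {1: 85.5, 2: 80.3, 3: 63.2, 4: 63.2}

# Assumption: roughly equal votes per item, so an unweighted mean
# approximates the overall score.
overall = sum(pct_correct.values()) / len(pct_correct)
print(f"overall: about {round(overall)}% correct")


def tally(an_ice, a_nice, cant_tell):
    """Return (an ice %, a nice %), giving each "can't tell" half a vote per side."""
    total = an_ice + a_nice + cant_tell
    an_ice_pct = 100 * (an_ice + 0.5 * cant_tell) / total
    a_nice_pct = 100 * (a_nice + 0.5 * cant_tell) / total
    return an_ice_pct, a_nice_pct
```

Calling `tally(3, 6, 1)` with those made-up counts splits the single undecided vote evenly, so the two percentages still sum to 100.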

The audio clips came from the "Fisher" collection of telephone conversations, published by the LDC in two parts in 2004 and 2005. In the collection's transcripts, there were 53 instances of "an ice" and 1726 of "a nice", from which I selected two of each at random.

The overall tally of 73% correct is consistent with the "two thirds" cited by Bob Ladd from Ladd and Schepman, "'Sagging transitions' between high pitch accents in English: experimental evidence", J. Phonetics 31(1) 2003.

Our little "experiment" here was flawed in various ways that make the exact numbers not worth interpreting —

  • there were too few test items, given the usual item-by-item variability;
  • the context was missing (though it could not easily have been included without giving the answers away);
  • the "subjects" saw one another's answers;
  • some of the "subjects" were non-native speakers;
  • etc.

But the general pattern is consistent with what such experiments generally show, which is that ordinary speech contains probabilistic cues about junctural ambiguities, and people are pretty good at interpreting these cues, but most exemplars nevertheless remain ambiguous to some extent.

In such experiments, attention is typically focused in advance on the ambiguity to be resolved, to the exclusion of other cognitive tasks. The results therefore tend to show somewhat better performance than when people hear the same sort of material performed in a context where either interpretation is plausible, and where the judgment needs to be made on the fly,  in the course of ordinary speech understanding.

In that case, the individual listener's in-context bias towards one interpretation or the other is likely to play a much larger role. And if the listeners are equally disposed to hear either alternative, they're likely to recognize that in fact they can't tell what the speaker meant to say. Which is exactly as it should be, from a communicative point of view.

By the way, this small sample also confirms Bob's citation of Lehiste's observation about [n] duration — we expect [n] in V#n'V to be generally longer than [n] in Vn#'V, given that the onset of stressed syllables is the strongest position for consonants in general, while the context V__#V is a very weak one (as shown by the fact that flapping and voicing occur, in American English, in sequences like "at all" or "Fat Albert").

Specifically, I measure the following durations in milliseconds for the nasal murmurs in question:

Clip                #1   #2   #3   #4
Duration (msec.)    68   48   29   46

FWIW, this gives us averages of 57 msec. for the [n] in "a nice", vs. 38.5 msec. for the [n] in "an ice". These are considerably shorter overall than the 67 and 47 msec. that Bob quoted from his 2003 paper. This is expected given the differences in linguistic and communicative context; but the direction of the effect is the same.

Note also that the ranges (in this tiny sample) overlap, consistent with my previous suggestion that the within-category variation would be roughly as large as the average between-category difference.
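
For anyone who wants to check the numbers, here is a minimal sketch of the comparison, using only the four measurements given above. The grouping follows the answers (clips #1 and #4 are "a nice", clips #2 and #3 are "an ice"); the variable names are illustrative, and with two tokens per category this is arithmetic, not a statistical test.

```python
from statistics import mean

# Nasal-murmur durations in msec., from the measurements above.
durations = {
    "a nice": [68, 46],  # clips #1 and #4
    "an ice": [48, 29],  # clips #2 and #3
}

# Category means and the difference between them.
means = {label: mean(values) for label, values in durations.items()}
between = means["a nice"] - means["an ice"]

# Within-category spread, taken here simply as max minus min.
spans = {label: max(values) - min(values) for label, values in durations.items()}

# The ranges overlap if the shortest "a nice" [n] is no longer than
# the longest "an ice" [n].
overlap = min(durations["a nice"]) <= max(durations["an ice"])

print(means, between, spans, overlap)
```

The within-category spans come out at 22 and 19 msec., against a between-category difference of 18.5 msec., which is the sense in which the two are "roughly as large" as each other.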



22 Comments

  1. V said,

    May 9, 2010 @ 10:25 am

    Hahahaha. Now with the context given to us this problem becomes very easy… Structured Prediction (http://nips.cc/Conferences/2007/Program/event.php?ID=573) will be so useful here…

    [(myl) Ben Taskar's idea certainly applies to the general problem of speech recognition, but disentangling the interrelations among correlated phonetic, lexical, and pragmatic variables is not crucial in making it easy for English speakers to perform this particular example of disambiguation in context. With only textual context, and no phonetic information at all, even the simplest n-gram model would have no trouble in deciding between e.g.

    … watch a little TV, have (an ice | a nice) breakfast …

    or

    … originally wanted to set up (an ice | a nice) rink in Newport News, Virginia …

    ]
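
    To make the n-gram point concrete, here is a toy sketch of a bigram model choosing between the two readings from textual context alone. Everything in it is an assumption for illustration: the four-sentence corpus is invented, the add-one smoothing is just one simple choice, and the names `bigram_prob`, `score`, and `choose` are hypothetical. A real model would be trained on a large body of transcripts.

```python
from collections import Counter

# Hypothetical toy corpus, invented for this example.
corpus = (
    "we had a nice breakfast . "
    "she made a nice breakfast . "
    "they built an ice rink . "
    "he skates at an ice rink ."
).split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab_size = len(unigrams)


def bigram_prob(prev, word):
    # Add-one smoothing, so unseen word pairs keep a small nonzero probability.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def score(words):
    # Product of smoothed bigram probabilities along the word sequence.
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word)
    return p


def choose(left, right):
    """Pick "an ice" or "a nice" given lists of words on either side."""
    candidates = {"an ice": ["an", "ice"], "a nice": ["a", "nice"]}
    return max(candidates, key=lambda c: score(left + candidates[c] + right))


print(choose(["had"], ["breakfast"]))
print(choose(["built"], ["rink"]))
```

    With this toy corpus the following word does most of the work: "breakfast" favors "a nice" and "rink" favors "an ice", with no phonetic information involved.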

  2. Ben said,

    May 9, 2010 @ 10:56 am

    It's interesting to note that there were three commenters who heard knife or ife for number three. The transcription for that one is "example of a…of an ice skater, you know". There is a false start and a correction both containing of and immediately followed by an ice. The F in ife and the S in ice are both voiceless fricatives and the only difference is positional and slight (labiodental vs. alveolar). So I wonder if the two occurrences of of actually resulted in a small but perceptible phonetic perseveration error. When listening to it myself again, I do sort of hear an F sound now, but I can't tell whether I'm making it up or not.

    [(myl) This is all telephone speech, and most of the cues to the [s]/[f] distinction are above the upper limit of the telephone bandwidth (which depending on the equipment involved is 3200 to 4000 Hz), so [s]/[f] confusions (which are rare in face-to-face communication) are common over the phone.]

  3. Ben said,

    May 9, 2010 @ 10:57 am

    Actually, reconsidering that the F in OF is voiced, but still, it's pretty close in the phonetic space.

  4. V said,

    May 9, 2010 @ 11:09 am

    Yes you are correct Prof. Liberman. We do not need a heavy duty graphical model here. The problem can be solved by a simple n gram model.
    But isn't n gram a very simple graphical model??? I guess, it is and so the problem is a very very simple application of structured prediction.

  5. Army1987 said,

    May 9, 2010 @ 11:21 am

    Yeah, I heard the fricative in #2 as F too, but I assumed that was just an artifact of the speaker keeping the microphone too close to her mouth.

    [(myl) That sort of thing can happen, and it clearly does in the [f] of "my friend" in that clip, but I don't hear or see any evidence of it in the "an ice rink" segment.]

  6. unekdoud said,

    May 9, 2010 @ 11:28 am

    The third sample is noticeably shorter in length, and at first I thought there was too much background noise to be sure.
    I suspect the length of the "s" sound also affects the interpretation, but the results above say otherwise. Given that the acoustic details of each phoneme vary between samples it should be more difficult to test whether specific combinations of them are interpreted differently.
    Ideally, we should be able to tell "a nice a nice a nice" and "an ice an ice an ice" apart when spoken by the same person. But this is not reflective of everyday use.

    With that settled, we can move on to related phenomena in other languages…

  7. Stephen Nicholson said,

    May 9, 2010 @ 12:25 pm

    I got the first two right and the second two wrong. (All I did was listen to the clips, I didn't read the rest of the post after them or read the comments beforehand.)

    After I was done, I had a pattern: 1) a nice, 2) an ice, 3) a nice, and 4) an ice. I thought to myself, "well, then I'm probably wrong." Since I didn't think that you would arrange them in a pattern like that. When I read the rest of the post, I figured it was even less likely.

    [(myl) It's amazing how elaborately people will over-think perception tests. Luckily, the result tends to be random, so when you average a bunch of subjects together, they more or less cancel out.]

  8. Army1987 said,

    May 9, 2010 @ 12:39 pm

    I meant #3 not #2… sorry

  9. Beth said,

    May 9, 2010 @ 1:39 pm

    I encountered an example of juncture ambiguity last night: a friend thought NPR's Steve Inskeep was actually Steven Skeep. His response was, "Well, he always says his name so fast, how are you supposed to tell?"

  10. Eric said,

    May 9, 2010 @ 4:15 pm

    I still can't entirely tell what the answer to number three is. My best guess as to what the person is saying is "an example of an ice scare," though I could maybe also buy "ice skater" as Ben has it. It seems more likely to be "an ice" rather than "a nice", because it seems to admit more plausible options for what the person might be saying, but it's certainly not conclusive for me.

  11. Russell said,

    May 9, 2010 @ 4:58 pm

    Eric,

    With even more context it would definitely get clearer. The people being recorded are talking about professional athletes, and had just mentioned ice skaters a few turns back. Here's the full turn:

    Yet you would think in regards to, ah, people getting this type of pay and wanting to strike for it, you would think something — you know, using, ah, an example of a — of an ice skater, you know — they would probably say the same thing, you know, "Why can't — why are we not getting paid as much as somebody else is getting paid.

  12. Rubrick said,

    May 9, 2010 @ 5:54 pm

    Out of curiosity, was my waggish comment referring to this post purged by a human, an automated filter, or did I actually just fail to post it? And if the first, was it done with or without knowledge of the referent? (I realize I shouldn't assume LL posters read all each other's LL posts.)

    [(myl) I'm not sure what comment you're referring to, but I certainly wouldn't delete one of your contributions, which I esteem highly. If it never showed up, it was probably caught by the automated Akismet spam filter. If it appeared on the site and then disappeared, one of the contributors must have removed it for some reason, but the culprit certainly wasn't me.]

  13. Dave said,

    May 9, 2010 @ 7:44 pm

    Given that the n in #2 was longer than the one in #4, it remains a mystery how people were able to tell that #2 was "an ice" and #4 was "a nice". They must be using something besides length.

  14. Mark F. said,

    May 9, 2010 @ 8:48 pm

    So how do people say these things differently when they're actually trying to make a distinction? Going back to the "give it a name" example, suppose you asked "An aim?" or alternatively, "A name?" In that case, you'd probably make it pretty unambiguous. But I'm not sure if I trust my intuitions on what would be different. My impression is that, in the first, the schwa in "an" will be colored by the n, and there will be a pause or hesitation of some sort before "aim", and in the second case neither of these things will be true. Does that sound right?

  15. stevesp101 said,

    May 9, 2010 @ 11:20 pm

    Could stress make a difference? It sounds like there's usually a difference in which words are stressed between a#n and an#. In clip 1 (caps for stress): a nice BREAKfast; 2: an ICE rink; 3: an ICE skater; 4: a nice GUY.

    The difference is most pronounced between clips 1 and 3. Clip 4 actually doesn't seem to have a very noticeable stress difference between nice and guy, but (to me) GUY seems to be a little more stressed.

    I realize that often people will stress 'nice.' You could easily imagine 'a NICE BREAKfast.' But 'a NICE breakfast,' sounds a little odd to me. Maybe I'm wrong though?

  16. Ben said,

    May 9, 2010 @ 11:40 pm

    @Dave: It seems to me that speaker four is generally just a faster speaker than speaker two, so I would expect all her sounds to be shorter. At least for these examples, if you do some quasi-approximate-normalization for the speed of the speaker it seems (subjectively) that the within-group variance decreases. And I think just the three phonemes surrounding the N are enough context to get an idea of the speaker's speed.

    I didn't do any actual analysis on the samples, so I am just speculating here. And even if I'm right, I have no evidence on hand that would suggest that this pattern holds beyond the four examples given.

  17. stevesp101 said,

    May 10, 2010 @ 12:23 am

    Follow up on my last comment: The rule I'm positing is that when saying 'a nice,' nice is generally not stressed, but when saying 'an ice,' ice is generally stressed. There is a group of counterexamples which I think should be ignored — direct contrasts. Sentences like: John is a nice guy, but Jim is a bad one. 'Nice' would probably be stressed. Similarly, 'I am an ice person, you are a no ice person,' (i.e. one person likes ice in his drinks, the other does not). Ice and no ice would both be stressed. I think this group should be ignored, because in most/all contrasts, you stress what is being contrasted.

  18. Dave said,

    May 10, 2010 @ 2:18 am

    Ben,

    Having listened to the (clipped) recordings again, I think you're right, at least in that speaker 2 appears slower than the other three, and I was comparing #2 and #4.

    Stevesp101,

    I agree that it's "a nice BREAKfast" in #1. I agree with your observation that, in #4, although it certainly isn't "NICE guy", there is a degree of stress on "nice". After all, the person she's talking about is obviously a guy; only "nice" carries new information.

    It seems plausible that we all made the inference that "ice" was going to appear as the first element of a compound, and carry stress for that reason. That would be true in the example "ice storm" that was given to us, though less so in "ice cream sundae". "Nice" would have been even more strongly stressed in "nice one" than in "nice guy".

    These considerations might explain why there was such strong agreement about #1.

    It might have been preferable to present examples in which stress was unlikely to play a role, either in the samples or in our expectations. Perhaps instead of "ice" and "nice", two countable nouns could have been chosen.

  19. Rubrick said,

    May 10, 2010 @ 11:21 am

    (myl) I'm not sure what comment you're referring to, but I certainly wouldn't delete one of your contributions, which I esteem highly.

    I'm immensely flattered.

    The comment consisted entirely of the first four color names disproportionately favored by men (omitted here in case it was indeed the spam filter). I felt it was one of my finer inspirations. Made me laugh, anyway.

  20. Phil said,

    May 11, 2010 @ 10:17 am

    @Beth: I encountered an example of juncture ambiguity last night: a friend thought NPR's Steve Inskeep was actually Steven Skeep. His response was, "Well, he always says his name so fast, how are you supposed to tell?"

    I like to pretend PRI's Marco Werman is Irish for the same reason.

  21. Bloix said,

    May 11, 2010 @ 8:39 pm

    Marie Kapartridge.

  22. Mel Nicholson said,

    May 12, 2010 @ 2:23 pm

    I see you measured the duration of the nasal murmur in isolation. I'd be interested to see the ratio of durations between that and the preceding vowel. That would help control for the variations in speech speed between the different clips.
