## ER and ERM in the spoken BNC

From John Coleman:

Inspired by your recent Language Log pieces, I tried an analysis of "er" vs "erm" in the Spoken BNC. These are the two main transcriptions for filled pauses labelled as "UNC" in the Claws-5 tagset and also "UNC" in the richer set of pos labels used in BNC. I.e. they are distinguished from items labelled as ITJ / INTERJ, in which the few tokens of "uh" and "um" are classified. These "uh"s are almost all in "uh huh" meaning "yes", and many of the "um"s and "mm"s are also in contexts where the "yes" sense is clear. So I disregarded the ITJs and restricted the analysis to UNC "er" and "erm", which are far more numerous in any case. As these are mostly nonrhotic dialects one can interpret "erm" as just schwa + nasality, with no implication of rhoticity; ditto for "er".

The British National Corpus is a balanced corpus of 100 million words, collected in the early 1990s. The spoken portion comprises ten million words; Jiahong Yuan and I collaborated with John Coleman and others, five years ago, on a project to help rescue the recordings and connect them appropriately with the transcripts.

Here are John's counts:

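| SEX | AGE | ER | ERM |
|-----|-----|------|------|
| f | 0 | 315 | 499 |
| f | 1 | 669 | 813 |
| f | 2 | 1299 | 2121 |
| f | 3 | 1574 | 1398 |
| f | 4 | 2255 | 1674 |
| f | 5 | 3226 | 2071 |
| m | 0 | 602 | 518 |
| m | 1 | 1125 | 893 |
| m | 2 | 2530 | 2111 |
| m | 3 | 2631 | 1642 |
| m | 4 | 4614 | 3513 |
| m | 5 | 5648 | 1605 |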

As John explains,

The age groups are 0 (1-14 years old), 1 (15-24), 2 (25-34), 3 (35-44), 4 (45-59), and 5 (60-95). We have 45,246 tokens from turns that are labelled with speaker age and sex; I left out the remaining 28,378.

The overall ERM/(ERM+ER) proportion for female speakers is 47.9%, while for male speakers it's 37.5%. Thus the direction of the sex effect is consistent with what we've seen in five American datasets and the Glaswegian dataset from the HCRC Map Task corpus.
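For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the ERM/(ERM+ER) proportions from John's counts in the table above. The names `COUNTS`, `erm_share`, `share_by_sex`, and `share_by_cell` are my own, chosen for illustration; this is not the script John used.

```python
# Counts of UNC-tagged "er" and "erm", copied from the table above.
# Key: (sex, age group); value: (ER count, ERM count).
COUNTS = {
    ("f", 0): (315, 499),
    ("f", 1): (669, 813),
    ("f", 2): (1299, 2121),
    ("f", 3): (1574, 1398),
    ("f", 4): (2255, 1674),
    ("f", 5): (3226, 2071),
    ("m", 0): (602, 518),
    ("m", 1): (1125, 893),
    ("m", 2): (2530, 2111),
    ("m", 3): (2631, 1642),
    ("m", 4): (4614, 3513),
    ("m", 5): (5648, 1605),
}


def erm_share(rows):
    """Return ERM / (ERM + ER) as a percentage for (er, erm) pairs."""
    rows = list(rows)  # allow generators, since we iterate twice
    er_total = sum(er for er, _ in rows)
    erm_total = sum(erm for _, erm in rows)
    return 100.0 * erm_total / (er_total + erm_total)


def share_by_sex(sex):
    """ERM percentage pooled over all age groups for one sex."""
    return erm_share(v for (s, _), v in COUNTS.items() if s == sex)


def share_by_cell(sex, age):
    """ERM percentage for a single sex and age-group cell."""
    return erm_share([COUNTS[(sex, age)]])


for sex in ("f", "m"):
    print(f"{sex} overall: {share_by_sex(sex):.1f}%")
    for age in range(6):
        print(f"  age group {age}: {share_by_cell(sex, age):.1f}%")
```

The pooled figures reproduce the 47.9% and 37.5% quoted above, and the per-cell figures are the values behind the sex-by-age interaction.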

A plot of the interaction between sex and age is here:

This also shows the same apparent-time change in the direction of greater ERM (==UM) usage found in Fisher, Switchboard, and the Philadelphia Neighborhood Corpus (PNC).

In the PNC, where we have data collected over a period of 40 years, this seems to be partly a life-cycle effect and partly a genuine change in progress.

It's surprising that there's such a widespread and robust marker of gender (and age) identity that (as far as I know) no one noticed before I stumbled on it in 2005 while looking for something else. I won't be surprised to learn that there are some earlier observations, but in any case, ordinary people don't seem to register these differences consciously at all.

Update — John supplied the total word counts for each age and sex combination, making it possible to calculate ER percentages by age and sex, which shows the same age grading as we saw in the Fisher and Philadelphia Neighborhood datasets:

The ERM percentages by age and sex show a less clear pattern:

The overall filled pause (ERM+ER) percentages:

In response to my observation that "It's surprising that there's such a widespread and robust marker of gender (and age) identity that (as far as I know) no one noticed before I stumbled on it in 2005", John responded:

Ah, not really surprising? (a) probably nobody looked for it, and (b) as you have pointed out many times, it is only recently that the right combination of corpora, easy access to corpora, and widespread ownership of laptops etc permitting breakfast and other periprandial experiments to be carried out have come together. Earlier studies of filled pauses etc in e.g. the Conversation Analysis literature tended to be small-scale micrographic studies in which changes over time and/or differences in usage frequency would be invisible.

I agree, but what surprises me is that this rather large gender difference, in a rather common aspect of speech, wasn't a commonplace anecdotal reaction to people's experience of everyday life.

John is just as puzzled as I am about how this difference came to be as geographically and socially widespread as it clearly is:

It's also pretty mysterious as to why or how the change should be so widespread? Sure, there is always a certain amount of linguistic to-ing and fro-ing between US and UK English, on a fairly low level (catchwords etc), but otherwise the many dialects involved have retained their separate characteristics quite extensively, it seems to me. The UH/UM change precedes Facebook, universal internet, huge increase in US TV shows in Britain etc … I am baffled as to *cause*.

It's a plausible hypothesis that phonetic symbolism is involved somehow, but the only evidence for this idea is Sherlock Holmes' assertion that "when you have eliminated the impossible, whatever remains, however improbable, must be the truth".

The accumulating set of LLOG posts on UM vs. UH:

"Young men talk like old women", 11/6/2005
"Fillers: Autism, gender, age", 7/30/2014
"More on UM and UH", 8/3/2014
"UM UH 3", 8/4/2014
"Male and female word usage", 8/7/2014
"UM / UH Geography", 8/13/2014
"Educational UM / UH", 8/13/2014
"UM / UH: Life-cycle effects vs. language change", 8/15/2014
"Filled pauses in Glasgow", 8/17/2014

1. ### Victor Mair said,

August 18, 2014 @ 7:55 pm

In this spate of posts on pause fillers, I've been meaning to tell this true anecdote for weeks, but now, since this one is still awaiting the first comment, I might as well just go ahead and put it down. What I'm about to say may sound a bit frivolous, but I think that it may be relevant to the overall deliberations in some way. So here goes.

When I was at Dartmouth (1961-65), there was a Classics professor whose course on Greek civilization I attended. Whenever he needed to pause, instead of saying "um", "er", or "uh" like most people, he would say "oink" — at least that's what it sounded like to me. I must have heard him say that syllable hundreds of times that semester. The first few times, and even later on depending on how giggly I was feeling on any given day, I actually burst out laughing and felt very bad about that. Later, I would somehow manage to squelch my audible chuckles, but I must say that it never ceased to amuse me.

I asked some classicists whether there is any evidence that early Greeks, or even modern Greeks, had a pause particle that sounded like "oink", but never got a satisfactory answer. I wondered whether his "oink" was part of any speech community anywhere, or whether it was totally idiosyncratic. To this day, I still do not know how to account for this most peculiar sound repeatedly uttered by my Classics professor.

2. ### Levantine said,

August 19, 2014 @ 12:59 am

I'm still unclear on the difference (if any) between "er" and "uh", though both the post and first comment seem to treat them as separate sounds.

3. ### Pflaumbaum said,

August 19, 2014 @ 2:08 am

There's a question over whether non-rhotic speakers would insert an /r/ if a vowel came hot on the heels of the er/uh. As in "…errrr actually". I'm not sure from analyzing my own speech.

If not, it would differ from the normal NURSE vowel, but I don't suppose that's a particularly unusual state of affairs – previous LL posts/comments have discussed cases like yep/nope and Malinki-hmm.

4. ### Pflaumbaum said,

August 19, 2014 @ 2:09 am

Sorry, *mm-hmm*, I meant. Predictive text substituted the pet-name for my toddler!

5. ### Doreen said,

August 19, 2014 @ 6:50 am

@Victor Mair
I seem to recall that Ken Livingstone (UK politician and former mayor of London) has, or at least had, a habit of using 'I think' as a frequent verbal filler which came out sounding like 'oink'. I haven't got time to hunt through YouTube videos to find instances of this in his TV appearances, but maybe another LL reader does.

6. ### Victor Mair said,

August 19, 2014 @ 7:08 am

@Doreen

Thanks very much, Doreen. That's the best lead I've received to the solution of the linguistic puzzle that has been intriguing me for the last half century.

7. ### Bloix said,

August 19, 2014 @ 10:30 am

Typo in the label of the y scale (3=45-44, should be 35-44).

Along the lines of oink, a lot of younger people say something like "meme," for "I mean."

8. ### Mr Punch said,

August 19, 2014 @ 10:34 am

In line with a couple of comments above, what is this "erm"? The spelling is a Briticism (at least, I can't recall ever seeing it in a US context) – but (as noted) much British speech is nonrhotic. Are there people (Scots, perhaps) who actually pronounce "erm" as spelled?

9. ### Levantine said,

August 19, 2014 @ 11:07 am

Mr Punch, "erm" is just a non-rhotic way of transcribing what Americans write as "um" (same goes for the pair "er" and "uh"). That said, some of what's been posted here on LL in recent days implies that these interjections are pronounced differently, but I can't imagine any rhotic Brit sounding the Rs (any Scots or West Country folk care to weigh in here?). Following on from Pflaumbaum's comment, I (a non-rhotic Londoner) would not insert an R in cases where "er"/"uh" is closely followed by a vowel, but instead merge the schwa of the interjection with the opening vowel of the following word. This suggests to me that the R really is just an orthographic convention, and never actually pronounced except in the imagination of Americans who misunderstand the spelling.

10. ### Ginger Yellow said,

August 19, 2014 @ 11:08 am

It's not rhotic in my southern English speech, but it's a different vowel sound than "um" – the same as non-rhotic "er" (ie the "her" vowel), as opposed to the "mum" vowel.

11. ### Ginger Yellow said,

August 19, 2014 @ 11:11 am

That said, some of what's been posted here on LL in recent days implies that these interjections are pronounced differently

They are, by me. But they're not the same kind of thing, in my usage. "Erm" is mainly used at the start of a sentence, for instance when I've been asked a question and I'm trying to think of an answer. "Um" might be used at the start of a sentence or clause, but can fill a pause anywhere in the utterance.

12. ### Levantine said,

August 19, 2014 @ 11:17 am

Ginger Yellow, that's interesting. Is the difference one of length, then, or are you saying you actually sound an R when pronouncing "erm"?

13. ### Levantine said,

August 19, 2014 @ 11:20 am

Oops, I missed your first post, so ignore my question.

14. ### Levantine said,

August 19, 2014 @ 11:28 am

For me, the two sounds you describe are part of the same continuum, and I wouldn't think to distinguish them orthographically. My version of your "um" does not, as far as I can tell, rhyme with "mum"; it's just a shorter variant of "erm", and still basically a schwa.

15. ### Levantine said,

August 19, 2014 @ 11:37 am

Interestingly, the OED has "er", "uh", and "um", but no "erm". It gives the pronunciation of the first of these as /ɜː(r)/, which makes me even more curious about whether there are any rhotic speakers out there who really do sound the R. And it just dawned on me that I would insert an intervocalic R before the final syllable of "umming and ahing" (how strange it looks written down!).

16. ### pj said,

August 19, 2014 @ 12:22 pm

It's not rhotic in my southern English speech, but it's a different vowel sound than "um" – the same as non-rhotic "er" (ie the "her" vowel), as opposed to the "mum" vowel

I agree. To my mind, also, an 'um' is lengthened to fill more time by extending the nasal ('Ummmmmmmmm…'), whereas an 'erm' is lengthened by extending the vowel ('Errrrrrrrrrrrrrrm…')

17. ### Levantine said,

August 19, 2014 @ 12:36 pm

pj, that's actually convincing. I guess that means that the two spellings can be said to represent different interjections (though neither of them involving an R sound). But then the question is how Americans would draw such an orthographic distinction, since they only have "um" available to them.

18. ### Bloix said,

August 19, 2014 @ 2:09 pm

I wonder if the old-fashioned expression "hemming and hawing" actually refers to umming and uh-ing. Or erming and erring, if you prefer.

19. ### Pflaumbaum said,

August 19, 2014 @ 5:02 pm

Yes erm and um are distinct phonemically, not just semantically, at least for a lot of Southerners: erm has the NURSE vowel, um has STRUT.

That said, some English accents have schwa for STRUT (Geoff Lindsey actually recommends teaching foreigners this pronunciation, since it's native to much of the Midlands and North, and /ʌ/ is so hard for many learners). So I guess the distinction for them would be purely a matter of length – /əm/ versus /əːm/. And how genuinely contrastive would that be in the case of these filler words?

As for the majority of Northerners/Irish who have [ʊ] for STRUT, my impression is that the two words sound the same (somewhere along the /əːm/~/ɛːm/ continuum). At any rate I'm pretty certain I've never heard [ʊm].

20. ### Ray Girvan said,

August 19, 2014 @ 7:13 pm

@Victor Mair / Doreen: oink

Just so. This is from a different dialect/accent, but my stepfather, who was from Fife, used to have a filler "I think" that he pronounced as /əhɛŋk/. It always tickled me, given that I didn't like him, that it sounded very close to "oink" to me, and no doubt to others.

21. ### Bathrobe said,

August 19, 2014 @ 9:05 pm

I've said this before: 'um' and 'erm' may be different phonemically, but I personally suspect that transferring this over to the interjection is something of a red herring. The interjection uses a vague vowel, not a 'phonemic' one.

In my own speech (Australian), the lengthened schwa in 'nurse' and the short schwa in 'mother' are totally different sounds. And for me, the sound in the interjection 'er' or 'erm' is closer to the final sound in 'mother' than it is to the sound in 'nurse'. That is why it is equally valid to write 'erm' as 'um'. So my question to Pflaumbaum is: Do people really deliberately pronounce 'er' and 'erm' with the vowel in 'nurse'? Or is this just a convenient spelling adopted to write this indeterminate vowel? I would plump for the latter.

22. ### RP said,

August 20, 2014 @ 3:18 am

It might well be the case that someone says something that would be better written "um" but it might get transcribed as "erm" because this is more in line with British conventions. When you see the spelling "erm", I don't think you can draw any conclusion as to the quality or length of the vowel the speaker used.

On the other hand, when reading back what someone has spelt out, I'd be influenced by the spelling they chose (even though their choice would have been driven mostly or wholly by British/American conventions, and not by the quality or length of the speaker's vowel). So when I read "erm", in my head I'm using the 'nurse' vowel, and for "um" I'm using 'strut', but I don't claim to know that these really were the specific vowels the writer had in mind.

23. ### Pflaumbaum said,

August 20, 2014 @ 4:45 am

@ Bathrobe –

You might well have a point. If it's true that er doesn't have /r/ even preceding a vowel in the same prosodic unit (though I don't know if that's true), perhaps that would suggest that these filler words can differ phonetically from normal words?

As I mentioned above, there's been talk on LL before of yep and nope having some sort of ejective /p/, and mm-hmm having – I don't know what you'd even call it – a voiced bilabial nasal /h/? Not that these are filler words… but they are stand-alone, I guess.

24. ### Maryellen MacDonald said,

August 23, 2014 @ 4:46 pm

There seem to be two general classes of hypotheses here for these gender differences. The first is that women are on average more fluent and need fewer filled pauses in their utterances than men need. The second, which seems much more plausible to me, is that women have a different distribution of these fillers than men do–not only the ratio of uh to um (and British equivalents) that we see in several corpora, but also in their use of other filler words that to my knowledge haven't been considered in these analyses. One common filler in US English is "like" and another is "sorta." So women may have the same rate of filled pauses as men but may be favoring non-uh/um fillers more than men, and as a result, women's rate of uh/um is different from that of men. (Of course both hypotheses can be true, that is, one gender could have a different overall rate of filled pause usage than the other, and there could also be a different distribution of filled pause types across genders.)

Alas, the tricky thing about "like," "sort of" and other non uh/um fillers is that these words have other uses, and so merely looking at the gender distribution of "like" and "sort of" usage will not immediately get at their use only as filled pauses. But could be interesting anyway.

25. ### Bloix said,

August 28, 2014 @ 8:33 am

Re oink – I met a person last weekend who had a filler that sounded something like "yo" or "y'oh" – apparently a contracted "y'know."