Trending in the Media: Um, not exactly…


I like journalists, really I do. But sometimes they make it hard for me to maintain my positive attitude. The recent flurry of U.K. media uptake of Language Log posts on UM and UH provides some examples of this stress and strain.

Here’s Stuart Jeffries, “Um or er: which do you, um, use more in, er, conversation?”, The Guardian 10/6/2014:

In the historic struggle between the ummers and the errers, the ummers are getting the upper hand. A study of speech patterns by socio-linguists at Edinburgh University has found that English speakers increasingly tend to use “um” rather than “er” as the filler of choice.

The “socio-linguists at Edinburgh University” are Joe Fruehwald, a sociolinguist who’s at Edinburgh these days, adding to work by me, a phonetician from the University of Pennsylvania who has occasionally visited Edinburgh and has many friends there, and Martijn Wieling, a dialectologist from the University of Groningen, and John Coleman, a phonetician from the University of Oxford, and Jack Grieve, a linguist at Aston University. And the “study” is a set of blog posts in which we’ve pulled out some data from existing collections of various kinds, a mode of research that I’ve jokingly called Breakfast Experiments™ because writing and running the scripts involved can generally be done in the time it takes to drink a couple of cups of coffee.

This doesn’t mean that the data or the analysis is unreal or unserious — and we’ll probably turn all this stuff into a conventional paper in a traditional journal before long. Meanwhile, the relevant blog posts, in chronological order, are: “Young men talk like old women”, 11/6/2005; “Fillers: Autism, gender, age”, 7/30/2014; “More on UM and UH”, 8/3/2014; “UM UH 3”, 8/4/2014; “Male and female word usage”, 8/7/2014; “UM / UH geography”, 8/13/2014; “Educational UM / UH”, 8/13/2014; “UM / UH: Lifecycle effects vs. language change”, 8/15/2014; “Filled pauses in Glasgow”, 8/17/2014; “ER and ERM in the spoken BNC”, 8/18/2014; “Um and uh in Dutch”, 9/16/2014; “UM / UH in German”, 9/29/2014; “Um, there’s timing information in Switchboard?”, 10/5/2014. (The hyperlink in Jeffries’ article goes to the eighth of those 13 posts.)

So it’s interesting to see all of this framed in the traditional journalistic fashion as “A study of X by Y-ists at Z University” — and to see what values for X, Y, and Z Jeffries picks up. This misreading then sets up a bit of boffin-bashing:

Fruehwald examined 25,000 examples of people in the US city of Philadelphia saying “um” and “uh”. You might say that’s because socio-linguists have exhausted important things to study but I, um, couldn’t possibly, like, comment.

Or you might say that Jeffries is too badly informed and/or lazy to grasp the fact that Joe spent a few minutes writing a computer program, which in turn spent a few seconds sorting instances of UM and UH by age and gender in Joe’s copy of the Philadelphia Neighborhood Corpus, which was collected over a few decades by students in a course on sociolinguistic field methods and has been used in hundreds (maybe thousands) of published papers since. But I, um, couldn’t possibly, like, characterize Mr. Jeffries as an arrogant ignoramus, without knowing more about his actual expertise and motivations.

More seriously, it seems to me that Jeffries suffers from the journalistic version of the blind spot that I attributed to old-fashioned psycholinguistic researchers recently (“Um, there’s timing information in Switchboard?”, 10/5/2014). People think of a “study” as an enterprise where you go out and spend months or years gathering data, not as an easy-to-write computer script that pulls out some new aspect of an existing large shared multi-purpose dataset. So it makes sense for Jeffries to make fun of us “sociolinguists at Edinburgh University” — if we had really collected and transcribed hundreds of sociolinguistic interviews over four decades, solely in order to study the distribution of filled pauses, we’d deserve to be mocked.
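To give a sense of scale, here is a minimal Python sketch of the kind of script in question. It is not the program Joe wrote, and the record layout (sex, age, transcript text), the sample data, and the helper names `age_band`, `tally_fillers`, and `um_share` are all hypothetical, invented for illustration; a real corpus would need its own reader for transcripts and speaker metadata.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (speaker_sex, speaker_age, transcript_text).
# A real corpus would be read from its own transcript and metadata files.
SAMPLE = [
    ("F", 24, "um I think um we could uh go"),
    ("M", 61, "uh well uh I suppose um yes"),
    ("F", 58, "uh that was um a long time ago"),
]


def age_band(age, width=20):
    """Map an age to a label such as '20-39' (band width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def tally_fillers(records):
    """Count UM and UH tokens for each (sex, age band) group."""
    counts = defaultdict(Counter)
    for sex, age, text in records:
        key = (sex, age_band(age))
        for token in text.lower().split():
            if token in ("um", "uh"):
                counts[key][token] += 1
    return counts


def um_share(counter):
    """Proportion of fillers that are UM, or None if the group has no fillers."""
    total = counter["um"] + counter["uh"]
    return counter["um"] / total if total else None


if __name__ == "__main__":
    for group, counter in sorted(tally_fillers(SAMPLE).items()):
        print(group, dict(counter), um_share(counter))
```

The whole thing is a single pass over the transcripts plus a dictionary of counters, which is why the running time is seconds rather than months. The expensive part, collecting and transcribing the interviews, was done long ago and for other purposes.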

Then there’s this:

But why the shift from “er” to “um”? Is it because inside every “um” there’s a little “er” that’s been elongated and given a stronger terminal sound, and favouring the former indicates our growing existential confusion at a world increasingly gone, um, nuts? It’s a theory. Here’s another. According to the University of Pennsylvania’s Professor Mark Liberman, who did another study of filled pauses, people tend to use “um” when they’re trying to decide what to say, and “er/uh” when they’re trying to decide how to say it.

That last sentence starts from the usual scientific game of “what if?”, as exemplified in this blog-post passage where I laid out three logically possible types of hypothesis, gave the “what to say” vs. “how to say it” idea as an example of one of the three types, and observed that “none of these explanations seems very plausible to me”. In order to suggest how a functional difference between UM and UH might generate something like the observed sex and age effects, I discussed these hypotheticals at greater length in an email Q&A with Olga Khazan, author of an Atlantic Magazine piece that Jeffries links to, “Men Say ‘Uh’ and Women Say ‘Um’”, 8/8/2014.

In that context, I was spinning out conceivable theories, not offering an explanation that I believe is correct. But Olga quoted me in a way that may leave the reader unsure about that — here’s the part that Jeffries quotes from her article:

Liberman also posits that “um” and “uh” portray language fluency and intelligence differently. “People tend to use UM when they’re trying to decide what to say, and UH when they’re trying to decide how to say it,” he told me in an email. “As people get older, they have less trouble deciding what to say (because they know more stuff), and more trouble deciding how to say it (because they know more words and fixed phrases, and so have a harder time making a choice). As a result, older people use fewer UMs and more UHs.”  

Thus, one theory is that perhaps, “At any given (adult) age, men are more linguistically experienced than women, and so use UM and UH as if they were older,” he says. “OR MAYBE: Women are more communicatively circumspect than men, and therefore more likely to pause before deciding what to say; but women are more linguistically fluent than men, and therefore less likely to pause while deciding what words to use.”

In fact, I’d argue that the “what to say” vs. “how to say it” differentiation, if it exists at all, can’t account for most of the observed variation. But I should have realized that “maybe it’s X, maybe it’s Y” talk is dangerous in interviews with journalists (though essential in conversations among scientists), and I should be happy that (so far) no one has set up a fake debate between me and (say) Josef Fruehwald, along the lines that I’ve described in earlier posts like “Imaginary debates and stereotypical roles”, 5/3/2006.

Moving along in Jeffries’ article, we learn that

Liberman transcribed 14,000 phone conversations, totalling more than 26 million words from 12,000 speakers across the US and found that the use of “um” and “er/uh” can reveal the speaker’s gender, language skills and life experience.

Yes, I did this by transcribing at super-speed in my Fortress of Solitude near the headwaters of the Schuylkill River. Because this was a Study, you know, and that’s how we Scientists do it.

(Actually, the transcriptions in question were done, over a period of 15 years, by dozens or even hundreds of people, many of them professional transcriptionists, in several projects creating datasets for government-sponsored research in speaker identification, speech recognition, and other technology-development areas.)

Meanwhile, over at The Times, Oliver Moody (“To um or to er? Studies probe how brains fill the speech-thought gap”, 10/4/2014) elevates the question to mock-epic status:

In Gulliver’s Travels, the land of Lilliput has been shaken by six vicious rebellions after a controversy over which end of a boiled egg should be broken first.

It seems that real life is out to compete with satire. A growing body of linguistic evidence points to a faultline emerging between two tribes in western society: the ummers and the errers.

Separate studies in Glasgow, the US, Germany and the Netherlands over recent months have all shown that women and young people are much more likely to use “um” when waiting for the next thought to come along, while men and older people go for “er”. And in the battle of the disengaged brains, “um” is winning.

(The Wachowskis have optioned the movie rights for The Battle of the Disengaged Brains, with Shia Labeouf to play Martijn, Keanu Reeves in the role of Josef, and Clint Eastwood as yours truly.)

Anyhow, there are those “studies” again.  Moody goes on to describe what he calls a “deeply unscientific analysis of celebrity interviews”:

Men of the old school overwhelmingly resort to “er”. Appearing on the US chat show Late Night With David Letterman in 1988, the comedian John Cleese managed 24 “ers” and eight “ums” in ten minutes.

The princeling of the modern-day errers, however, is Nigel Farage, the Ukip leader, who used a positively Victorian 15 “ers” (88 per cent) and just two “ums” (12 per cent) on The Andrew Marr Show in March this year.

At the other end of the spectrum was Steven Gerrard, captain of Liverpool Football Club. In the immediate aftermath of his side’s 1-1 draw with Everton last month, Gerrard trundled out nine “ums” and one “er” in two and a half minutes.

Lena Dunham, the writer and star of the sitcom series Girls, used 79 per cent “ums” on Letterman this year, while the Harry Potter actress Emma Watson hit 58 per cent on the same show.

But the thing is, the only difference between what we did and what Moody did is a matter of scale. If he had access to a corpus of (say) 10,000 celebrity interviews over a period of 20 years, with demographic metadata about gender, date of birth, etc., and if someone at The Times (or The Guardian or wherever) set this collection up in a well-designed specialized search-and-visualization engine, we’d be inviting him to join our forthcoming paper as a co-author. And much smaller investigations would still be quite “scientific”.

On a smaller scale, an excellent undergraduate term project might look at this aspect of on-line celebrity interviews by gender, age, and nationality. A small team of students (or one student shut up in her Fortress of Solitude for a week) could easily transcribe an adequate sample of 100 interviews…

Unfortunately, the other recent contribution of Oliver Moody’s newspaper to this topic is not nearly as empirically sound — “Ah: Sounds to signal hesitation are part of our linguistic heritage”, The Times 10/6/2014:

What is the most common word in the typical English speaker’s vocabulary? Is it, perhaps, “the” or “a” or “an”, or an interjection such as “oh”? Um, er, let’s see now . . .

In fact, it’s probably some variant of the first two words of the previous sentence. They don’t mean anything, but every English speaker uses them, or an equivalent vocalisation, because everyone, however fluent, sometimes wonders what to say next. Linguists call them “filler words” and they appear to go in fashions. In current usage, “um” is overtaking “er” as the filler of choice. 

In fact, it probably isn’t.

In the Switchboard corpus, out of 520 speakers, UH is the commonest “word” for 45 speakers, or 8.6%, otherwise losing out to “I”, “and”, “the”, “you”, or “[silence]”. The median rank of UH or UM (whichever is commonest for a given speaker) is 8.

In the Fisher corpus, UH is the commonest “word” for 89 speakers, and UM is the commonest “word” for 35 speakers, for a total of 124 out of 10,401 speakers, or 1.2%. The median rank of UH or UM is 17.

This is consistent with the fact that the overall UM+UH rate in Switchboard (2.79%) is about 65% higher than the overall rate in Fisher (1.69%) — see here for some further discussion. I suspect that the difference is mostly due to different transcription processes, though I have no concrete evidence for this view. But anyhow, in neither collection is the frequency of UH or UM (or even their sum) likely to be greater than the frequency of “I”, or “and”, or “the”.
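The per-speaker figures above come down to a simple ranking. The sketch below is one way such a computation might look in Python, not the script actually used: the toy word counts and the names `filler_rank`, `median_filler_rank`, and `share` are assumptions made for illustration, and ties are resolved by counting only words strictly more frequent than the filler. It also rechecks the arithmetic: 45/520 is about 8.65%, 124/10,401 is about 1.19%, and 2.79/1.69 is about 1.65.

```python
from collections import Counter
from statistics import median

FILLERS = ("uh", "um")


def filler_rank(word_counts):
    """Rank (1 = most frequent) of a speaker's commonest filler.

    Returns None if the speaker uses neither UH nor UM. Ties are an
    assumption of this sketch: only strictly more frequent words count.
    """
    best = max(word_counts.get(f, 0) for f in FILLERS)
    if best == 0:
        return None
    return 1 + sum(1 for count in word_counts.values() if count > best)


def median_filler_rank(speakers):
    """Median of filler_rank over all speakers who use a filler at all."""
    ranks = [filler_rank(counts) for counts in speakers.values()]
    return median(r for r in ranks if r is not None)


def share(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Hypothetical toy data: per-speaker word counts, not real corpus figures.
TOY = {
    "spk1": Counter({"i": 50, "uh": 45, "and": 40, "the": 30}),
    "spk2": Counter({"uh": 60, "i": 55, "um": 5}),
    "spk3": Counter({"the": 20, "you": 18, "i": 17, "um": 9}),
}

if __name__ == "__main__":
    print(median_filler_rank(TOY))
    print(round(share(45, 520), 2))     # Switchboard speakers with UH on top
    print(round(share(124, 10401), 2))  # Fisher speakers with UH or UM on top
    print(round(2.79 / 1.69, 2))        # Switchboard rate relative to Fisher
```

For a filler to be a speaker's commonest "word", its rank has to be 1, and with median ranks of 8 and 17 that is clearly the exception rather than the rule.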

Then there are a couple of articles in the Daily Mail, starting with Ellie Zolfagharaifard, “Men are from ‘er’ and women are from ‘um’: Speech markers reveal details about your age, sex and lifestyle, scientists claim”, Daily Mail 10/6/2014:

A major rift has emerged within the English-speaking world, pitting Barack Obama against Kim Kardashian and Eminem against David Beckham.  

The division is between ‘ummers’ and ‘errers’, and, while you may not be aware of it, your gender and age could have a huge influence on which group you fall into.  

Men and older people prefer to use ‘er’ during gaps in speech, while women and teenagers are more likely to use ‘um’, according to recent research.

I like this celebrity-speech angle — typical of us intellectuals not to have thought of it. We need to get out of our Ivory Tower of Solitude more often, really.

And then there’s Emily Kent Smith, “Stuck for words? How saying ‘um’ or ‘er’ in conversation can reveal a lot about who you are”, Daily Mail 10/6/2014:

It may simply sound inarticulate, but whether you say ‘um’ or ‘er’ in conversation could actually reveal a lot about you.  

Studies carried out across the world, from Edinburgh to America, Germany and the Netherlands have all concluded that women and young people are more likely to say ‘um’ whilst deep in thought while men and older people favour ‘er’.  

But ‘er’ could become extinct after ‘um’ is beginning to be used more in everyday language, the research showed.

You can contemplate this one on your own — of course with some suitable background music:

Which, now that I think of it, works depressingly well as background music for the other papers too.



30 Comments

  1. MTBradley said,

    October 7, 2014 @ 10:58 am

    I would love to know how much scientific journalism is done by those trained to be journalists or science writers and how much is done by those trained to be scientists.

  2. Jonathan Badger said,

    October 7, 2014 @ 11:02 am

    People think of a “study” as an enterprise where you go out and spend months or years gathering data, not as an easy-to-write computer script that pulls out some new aspect of an existing large shared multi-purpose dataset

    Yes, this. This is how a lot of computational biology is done as well (relatively simple analyses on pre-existing datasets). On the other hand, this idea can also come back and bite us when data-generating scientists expect that *every* computational analysis is a 15 minute problem rather than one requiring a couple years of work.

    [(myl) Indeed. And not every analysis worth doing is a “computational analysis” — sometimes you need to go out and find or create new datasets.]

  3. Martijn Wieling said,

    October 7, 2014 @ 11:08 am

    Oh boy… I got a call by a news reporter from Belgium (‘Het Laatste Nieuws’) yesterday. She wanted to know a bit more about the Dutch results. I’ve told her our various ideas and caveats giving the example you gave as well. In addition, I’ve told her that women and younger people generally are somewhat more modern in their language use than men and older people. She therefore ended her piece with the *joke* that since ‘um’ may be seen as a more modern form, speakers from Belgium are more modern than the speakers from the Netherlands. I wonder how many phone calls I will receive from people interpreting that sentence as fact…

  4. Robot Therapist said,

    October 7, 2014 @ 12:15 pm

    Whenever I’ve seen an article in the press on something I know a lot about, they always somehow miss the point, even if there are no factual errors as such.

  5. Rubrick said,

    October 7, 2014 @ 4:29 pm

    The one silver lining I can see is that if journalists are paying that much attention to Language Log, perhaps, miracle of miracles, a few of them will take to heart the derision that’s regularly (and deservedly) heaped on them here.

  6. Chris C. said,

    October 7, 2014 @ 4:33 pm

    I’ve observed the same thing as Robot Therapist, except there almost always are factual errors as such.

  7. David L said,

    October 7, 2014 @ 4:51 pm

    Yes, indeed, nothing motivates a person better than being buried under derision and scorn.

  8. Josef Fruehwald said,

    October 7, 2014 @ 5:14 pm

    The Daily Mail also prematurely promoted me to Professor, which in the UK is a title most people don’t hold till later in their careers. As a mildly title conscious culture, this did not go unnoticed by my colleagues!

    [(myl) On the other hand, they spelled everyone’s names correctly, I think, which is not nothing. For example, NPR called me “Mark Lieberman” just today.]

  9. John Coleman said,

    October 7, 2014 @ 5:59 pm

    When we finally discover the TRUTH …
    then they’ll TRULY be sorry they mocked us, heh heh!
    Wait! “Heh heh”? — that gives me a NEW IDEA!

  10. John Coleman said,

    October 7, 2014 @ 6:03 pm

    Just wait till they find out that today I was measuring tens of thousands of silences following ER and ERM …
    And getting paid for it!

  11. John Coleman said,

    October 7, 2014 @ 6:21 pm

    See, I just knew this whole “big data” thing would be trouble. The world just isn’t ready for it. (Well, the Daily Mail …)

  12. David Morris said,

    October 7, 2014 @ 6:35 pm

    Umm …

  13. David Morris said,

    October 7, 2014 @ 6:36 pm

    To er is human, obviously!

  14. David Morris said,

    October 7, 2014 @ 6:37 pm

    (I was trying to decide what to say.)

  15. bratschegirl said,

    October 7, 2014 @ 6:43 pm

    So does “erm” count as an “er” or an “um?”

  16. Brett said,

    October 7, 2014 @ 7:46 pm

    “Erm” is a British spelling of “Um.”

  17. D.O. said,

    October 8, 2014 @ 2:24 am

    Switchboard corpus, which seems to be especially useful, also recorded the geographic information on speakers. Here’s what I’ve got (all values are medians of um/uh frequency per 1000 words)

                      Female UH   Female UM   Male UH   Male UM
    New England             9.4         8.0      33.7       6.6
    Northern               14.0         7.1      27.1       3.9
    NYC                    11.5         8.6      27.1       4.7
    North Midlands         11.3         7.8      26.7       5.7
    South Midland          11.4         4.2      25.6       1.9
    South                  16.2         6.6      28.4       3.8
    West                    9.8         6.1      25.3       2.9
    Mixed                  11.8         6.8      22.2       2.3

    There is probably something going on with um increase toward North and East. So the change associated with younger people and women probably also comes from NE. The pattern seems to be somewhat different from the Twitter map though.

  18. D.O. said,

    October 8, 2014 @ 2:43 am

    If um is trending up then it probably is less suppressed in the informal contexts (the overall English language trend is toward less formality). Of course, all filled pauses are suppressed in formal speech, but I guess, um is more so. There might be even a dataset for that purpose. I vaguely remember reading about someone who transcribed random snippets of conversations by the same people during the day…

  19. Rodger C said,

    October 8, 2014 @ 7:53 am

    I tried to post a link to Tom Paxton’s “Daily News,” but it doesn’t appear, maybe because the link was the whole post. Let me try again:

    http://www.youtube.com/watch?v=10LMdzJgIWQ

  20. Chris Waters said,

    October 8, 2014 @ 1:10 pm

    Ok, as a rhotic speaker who actually uses “er” as distinct from “uh” sometimes (albeit rarely), I have to ask: is this really so rare that there’s no point in distinguishing them when discussing English in general? Might it be a west-coast (US) thing? Or is it a personal idiosyncrasy derived from reading too much? Or is there insufficient evidence to draw a firm conclusion at this point?

    (I assume that rhotic “er” isn’t trending anywhere, since nobody seems to be discussing it at all.)

  21. Keith M Ellis said,

    October 8, 2014 @ 1:49 pm

    I don’t know if anyone’s mentioned this aspect, but many years ago when I first began as a radio DJ (which I did for a couple of years when I was a young adult), I was very anxious about whether I could avoid these fillers. What I found, though, was that within only a couple of weeks I’d managed to discipline my on-air speech such that I eliminated them, and this was within the context of my strength, which was extemporaneous speech and very natural, casual reading of material (such as news items, weather forecasts, etc). And what I found was that this carried into my daily life — for years, these fillers were mostly absent from my speech.

    I don’t know how that’s exactly relevant or something that everyone doesn’t already know — we’re aware that professional speakers do this. Still, from personal experience I can say that after a bit of time and effort, it’s not something that works as deliberate and self-monitored speech, it was just something that I changed about my speech and it didn’t require constant effort.

    But that’s long in the past, now. These days (I’m 50), I find that I’m often particularly tongue-tied, practically drowning in uh’s and um’s.

  22. Brett said,

    October 8, 2014 @ 9:34 pm

    @Chris Waters: I actually use rhotic “er” as well (or used to, at least). However, I am certain that I picked it up from reading British novels.

  23. Matthew McIrvin said,

    October 9, 2014 @ 9:37 am

    I’ve always assumed that rhotic “er” is a back-formation concocted by bookish Americans who read “er” in print (originally in British English writing) and never made the connection to the sound it was attempting to spell. Some of those Americans then use it in US English writing, which just perpetuates it.

  24. Pflaumbaum said,

    October 9, 2014 @ 8:14 pm

    Brett said,
    “Erm” is a British spelling of “Um.”

    I don’t think that’s right. My “um” has a short /ʌ/, my “erm” a long /əː/

  25. John Coleman said,

    October 10, 2014 @ 3:00 am

    There are multiple spellings for hesitation sounds in UK English; as well as “erm” and “um”, there is also “mm”. My analysis of the relative frequency of “erm” vs. “er” in the Audio British National Corpus, reported in an earlier blog in this thread, focussed on those two in particular, just because they are the most frequent forms of hesitation sound.

  26. Stuart Brown said,

    October 10, 2014 @ 7:07 pm

    Don’t know if you looked at the comments on Jeffries’ article, but there were some beauties. A few people thought the existence of fillers was all the fault of those Americans who can’t speak properly, somebody attributed it to the declining IQ of the general population, but the ace was the commenter who opined: “I just can’t believe that so-called academics spent time and money on this pointless study. A far more fitting exercise for an infant school surely.”

  27. Chris Waters said,

    October 10, 2014 @ 7:30 pm

    Yeah, I’ve heard the hypothesis that the rhotic “er” is just a back-formation, but I’m not familiar with the supporting evidence. To me, here on the US west coast, it just seems like something people say. Has anyone actually studied the history of “er” in any depth? I assume, at the least, that this would mean: A) it first appeared in British writing, B) after Britain became heavily non-rhotic, and C) only appeared in American writing some time after that. Have A, B, and C all been confirmed by anyone?

  28. BasJ said,

    October 12, 2014 @ 5:13 pm

    On BBC radio they found this funny enough to joke about it, on a topical comedy show no less. Here it is on the iplayer:
    The Now Show
    Or download:
    Now Show download
    (at about 14:18)

  29. Edith T said,

    October 15, 2014 @ 7:14 pm

    I’ve lived in Scotland for some time now, so now my fillers are closer to something more accurately transcribed (lol! ‘transcribed’) as EHM.

  30. Ran Ari-Gur said,

    October 20, 2014 @ 12:54 am

    A fair number of those news articles say that um is becoming more common relative to uh. Have you determined that to be true? I recall your raising that as one possible explanation of why the uh/um ratio rises with age (since older folks will presumably be somewhat conservative, partially retaining speech patterns that were common when they were younger), but I don’t recall if you reached a conclusion about this?
