I like journalists, really I do. But sometimes they make it hard for me to maintain my positive attitude. The recent flurry of U.K. media uptake of Language Log posts on UM and UH provides some examples of this stress and strain.
Here's Stuart Jeffries, "Um or er: which do you, um, use more in, er, conversation?", The Guardian 10/6/2014:
In the historic struggle between the ummers and the errers, the ummers are getting the upper hand. A study of speech patterns by socio-linguists at Edinburgh University has found that English speakers increasingly tend to use “um” rather than “er” as the filler of choice.
The "socio-linguists at Edinburgh University" are Joe Fruehwald, a sociolinguist who's at Edinburgh these days, adding to work by me (a phonetician from the University of Pennsylvania who has occasionally visited Edinburgh and has many friends there), Martijn Wieling (a dialectologist from the University of Groningen), John Coleman (a phonetician from the University of Oxford), and Jack Grieve (a linguist at Aston University). And the "study" is a set of blog posts in which we've pulled out some data from existing collections of various kinds, a mode of research that I've jokingly called Breakfast Experiments™ because writing and running the scripts involved can generally be done in the time it takes to drink a couple of cups of coffee.
This doesn't mean that the data or the analysis is unreal or unserious, and we'll probably turn all this stuff into a conventional paper in a traditional journal before long. Meanwhile, the relevant blog posts, in chronological order, are: "Young men talk like old women", 11/6/2005; "Fillers: Autism, gender, age", 7/30/2014; "More on UM and UH", 8/3/2014; "UM UH 3", 8/4/2014; "Male and female word usage", 8/7/2014; "UM / UH geography", 8/13/2014; "Educational UM / UH", 8/13/2014; "UM / UH: Lifecycle effects vs. language change", 8/15/2014; "Filled pauses in Glasgow", 8/17/2014; "ER and ERM in the spoken BNC", 8/18/2014; "Um and uh in Dutch", 9/16/2014; "UM / UH in German", 9/29/2014; "Um, there's timing information in Switchboard?", 10/5/2014. (The hyperlink in Jeffries' article goes to the eighth of those 13 posts.)
So it's interesting to see all of this framed in the traditional journalistic fashion as "A study of X by Y-ists at Z University" — and to see what values for X, Y, and Z Jeffries picks up. This misreading then sets up a bit of boffin-bashing:
Fruehwald examined 25,000 examples of people in the US city of Philadelphia saying “um” and “uh”. You might say that’s because socio-linguists have exhausted important things to study but I, um, couldn’t possibly, like, comment.
Or you might say that Jeffries is too badly informed and/or lazy to grasp the fact that Joe spent a few minutes writing a computer program, which in turn spent a few seconds sorting instances of UM and UH by age and gender in Joe's copy of the Philadelphia Neighborhood Corpus, which was collected over a few decades by students in a course on sociolinguistic field methods, and has been used in hundreds (maybe thousands) of published papers. But I, um, couldn't possibly, like, characterize Mr. Jeffries as an arrogant ignoramus, without knowing more about his actual expertise and motivations.
More seriously, it seems to me that Jeffries suffers from the journalistic version of the blind spot that I attributed to old-fashioned psycholinguistic researchers recently ("Um, there's timing information in Switchboard?", 10/5/2014). People think of a "study" as an enterprise where you go out and spend months or years gathering data, not as an easy-to-write computer script that pulls out some new aspect of an existing large shared multi-purpose dataset. So it makes sense for Jeffries to make fun of us "sociolinguists at Edinburgh University" — if we had really collected and transcribed hundreds of sociolinguistic interviews over four decades, solely in order to study the distribution of filled pauses, we'd deserve to be mocked.
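To give a sense of how small such a script can be, here is a rough sketch of the kind of thing involved. It is an illustration only: the input format (one tuple of speaker sex, birth year, and word per token), the function names, and the sample data are all made up for the example, and none of it reflects the actual layout of the Philadelphia Neighborhood Corpus or the program that Joe wrote.

```python
from collections import Counter

FILLERS = ("um", "uh")

def filler_counts(tokens):
    """Count UM and UH per (sex, birth decade) group.

    tokens: iterable of (sex, birth_year, word) tuples.
    This input format is hypothetical, chosen only for illustration.
    """
    counts = Counter()
    for sex, birth_year, word in tokens:
        w = word.lower()
        if w in FILLERS:
            decade = birth_year // 10 * 10
            counts[(sex, decade, w)] += 1
    return counts

def um_share(counts, sex, decade):
    """Proportion of UM among UM+UH for one group, or None if no fillers."""
    um = counts[(sex, decade, "um")]
    uh = counts[(sex, decade, "uh")]
    total = um + uh
    return um / total if total else None

# Invented toy data, not drawn from any corpus.
sample = [
    ("F", 1985, "um"), ("F", 1985, "um"), ("F", 1987, "uh"),
    ("M", 1950, "uh"), ("M", 1950, "well"),
]
counts = filler_counts(sample)
print(um_share(counts, "F", 1980))
print(um_share(counts, "M", 1950))
```

The point is not the particular code, but that the whole job is a tally over data that someone else already collected and transcribed for other reasons.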
Then there's this:
But why the shift from “er” to “um”? Is it because inside every “um” there’s a little “er” that’s been elongated and given a stronger terminal sound, and favouring the former indicates our growing existential confusion at a world increasingly gone, um, nuts? It’s a theory. Here’s another. According to the University of Pennsylvania’s Professor Mark Liberman, who did another study of filled pauses, people tend to use “um” when they’re trying to decide what to say, and “er/uh” when they’re trying to decide how to say it.
That last sentence starts from the usual scientific game of "what if?", as exemplified in this blog-post passage where I laid out three logically-possible types of hypothesis, gave the "what to say" vs. "how to say it" idea as a for-instance example of one of the three types, and observed that "none of these explanations seems very plausible to me". In order to suggest how a functional difference between UM and UH might generate something like the observed sex and age effects, I discussed these hypotheticals at greater length in an email Q&A with Olga Khazan, author of an Atlantic Magazine piece that Jeffries links to, "Men Say 'Uh' and Women Say 'Um'", 8/8/2014.
In that context, I was spinning out conceivable theories, not offering an explanation that I believe is correct. But Olga quoted me in a way that may leave the reader unsure about that. Here's the part that Jeffries quotes from her article:
Liberman also posits that "um" and "uh" portray language fluency and intelligence differently. "People tend to use UM when they're trying to decide what to say, and UH when they're trying to decide how to say it," he told me in an email. "As people get older, they have less trouble deciding what to say (because they know more stuff), and more trouble deciding how to say it (because they know more words and fixed phrases, and so have a harder time making a choice). As a result, older people use fewer UMs and more UHs."
Thus, one theory is that perhaps, "At any given (adult) age, men are more linguistically experienced than women, and so use UM and UH as if they were older," he says. "OR MAYBE: Women are more communicatively circumspect than men, and therefore more likely to pause before deciding what to say; but women are more linguistically fluent than men, and therefore less likely to pause while deciding what words to use."
In fact, I'd argue that the "what to say" vs. "how to say it" differentiation, if it exists at all, can't account for most of the observed variation. But I should have realized that "maybe it's X, maybe it's Y" talk is dangerous in interviews with journalists (though essential in conversations among scientists), and I should be happy that (so far) no one has set up a fake debate between me and (say) Josef Fruehwald, along the lines that I've described in earlier posts like "Imaginary debates and stereotypical roles", 5/3/2006.
Moving along in Jeffries' article, we learn that
Liberman transcribed 14,000 phone conversations, totalling more than 26 million words from 12,000 speakers across the US and found that the use of “um” and “er/uh” can reveal the speaker’s gender, language skills and life experience.
Yes, I did this by transcribing at super-speed in my Fortress of Solitude near the headwaters of the Schuylkill River. Because this was a Study, you know, and that's how we Scientists do it.
(Actually, the transcriptions in question were done, over a period of 15 years, by dozens or even hundreds of people, many of them professional transcriptionists, in several projects creating datasets for government-sponsored research in speaker identification, speech recognition, and other technology-development areas.)
Meanwhile, over at The Times, Oliver Moody ("To um or to er? Studies probe how brains fill the speech-thought gap", 10/4/2014) elevates the question to mock-epic status:
In Gulliver’s Travels, the land of Lilliput has been shaken by six vicious rebellions after a controversy over which end of a boiled egg should be broken first.
It seems that real life is out to compete with satire. A growing body of linguistic evidence points to a faultline emerging between two tribes in western society: the ummers and the errers.
Separate studies in Glasgow, the US, Germany and the Netherlands over recent months have all shown that women and young people are much more likely to use “um” when waiting for the next thought to come along, while men and older people go for “er”. And in the battle of the disengaged brains, “um” is winning.
(The Wachowskis have optioned the movie rights for The Battle of the Disengaged Brains, with Shia LaBeouf to play Martijn, Keanu Reeves in the role of Josef, and Clint Eastwood as yours truly.)
Anyhow, there are those "studies" again. Moody goes on to describe what he calls a "deeply unscientific analysis of celebrity interviews":
Men of the old school overwhelmingly resort to “er”. Appearing on the US chat show Late Night With David Letterman in 1988, the comedian John Cleese managed 24 “ers” and eight “ums” in ten minutes.
The princeling of the modern-day errers, however, is Nigel Farage, the Ukip leader, who used a positively Victorian 15 “ers” (88 per cent) and just two “ums” (12 per cent) on The Andrew Marr Show in March this year.
At the other end of the spectrum was Steven Gerrard, captain of Liverpool Football Club. In the immediate aftermath of his side’s 1-1 draw with Everton last month, Gerrard trundled out nine “ums” and one “er” in two and a half minutes.
Lena Dunham, the writer and star of the sitcom series Girls, used 79 per cent “ums” on Letterman this year, while the Harry Potter actress Emma Watson hit 58 per cent on the same show.
But the thing is, the only difference between what we did and what Moody did is a matter of scale. If he had access to a corpus of (say) 10,000 celebrity interviews over a period of 20 years, with demographic metadata about gender, date of birth, etc., and if someone at The Times (or The Guardian or wherever) set this collection up in a well-designed specialized search-and-visualization engine, we'd be inviting him to join our forthcoming paper as a co-author. And much smaller investigations would still be quite "scientific".
On a smaller scale, an excellent undergraduate term project might look at this aspect of on-line celebrity interviews by gender, age, and nationality. A small team of students (or one student shut up in her Fortress of Solitude for a week) could easily transcribe an adequate sample of 100 interviews…
Unfortunately, the other recent contribution of Oliver Moody's newspaper to this topic is not nearly as empirically sound — "Ah: Sounds to signal hesitation are part of our linguistic heritage", The Times 10/6/2014:
What is the most common word in the typical English speaker’s vocabulary? Is it, perhaps, “the” or “a” or “an”, or an interjection such as “oh”? Um, er, let’s see now . . .
In fact, it’s probably some variant of the first two words of the previous sentence. They don’t mean anything, but every English speaker uses them, or an equivalent vocalisation, because everyone, however fluent, sometimes wonders what to say next. Linguists call them “filler words” and they appear to go in fashions. In current usage, “um” is overtaking “er” as the filler of choice.
In fact, it probably isn't.
In the Switchboard corpus, out of 520 speakers, UH is the commonest "word" for 45 speakers, or 8.6%, otherwise losing out to "I", "and", "the", "you", or "[silence]". The median rank of UH or UM (whichever is commonest for a given speaker) is 8.
In the Fisher corpus, UH is the commonest "word" for 89 speakers, and UM is the commonest "word" for 35 speakers, for a total of 124 out of 10,401 speakers, or 1.2%. The median rank of UH or UM is 17.
This is consistent with the fact that the overall UM+UH rate in Switchboard (2.79%) is about 65% higher than the overall rate in Fisher (1.69%) — see here for some further discussion. I suspect that the difference is mostly due to different transcription processes, though I have no concrete evidence for this view. But anyhow, in neither collection is the frequency of UH or UM (or even their sum) likely to be greater than the frequency of "I", or "and", or "the".
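For anyone who wants to check the arithmetic, or try the same ranking on their own transcripts, a minimal sketch follows. The counts and rates are the ones quoted above. The function `best_filler_rank` and the toy word list are hypothetical, and the real Switchboard and Fisher transcripts would need their own parsing (and a decision about whether to count "[silence]" as a word).

```python
from collections import Counter

FILLERS = ("uh", "um")

def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole

# Share of speakers whose commonest "word" is a filler (counts from the text).
switchboard_share = percent(45, 520)
fisher_share = percent(89 + 35, 10401)

# How much higher the Switchboard UM+UH rate is, relative to Fisher.
relative_excess = percent(2.79 - 1.69, 1.69)

def best_filler_rank(words):
    """1-based frequency rank of a speaker's commonest filler, or None.

    Ties in frequency are broken by first appearance, which is how
    Counter.most_common orders equal counts.
    """
    counts = Counter(w.lower() for w in words)
    ranked = [w for w, _ in counts.most_common()]
    ranks = [ranked.index(f) + 1 for f in FILLERS if f in counts]
    return min(ranks) if ranks else None

# Invented toy speaker, for illustration only.
toy_speaker = ["i", "i", "i", "uh", "uh", "um", "the"]
print(round(switchboard_share, 2), round(fisher_share, 2), round(relative_excess, 1))
print(best_filler_rank(toy_speaker))
```

Taking the median of `best_filler_rank` over all speakers in a collection (for example with `statistics.median`) would give the kind of median rank reported above.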
Then there are a couple of articles in the Daily Mail, starting with Ellie Zolfagharifard, "Men are from 'er' and women are from 'um': Speech markers reveal details about your age, sex and lifestyle, scientists claim", Daily Mail 10/6/2014:
A major rift has emerged within the English-speaking world, pitting Barack Obama against Kim Kardashian and Eminem against David Beckham.
The division is between ‘ummers’ and ‘errers’, and, while you may not be aware of it, your gender and age could have a huge influence on which group you fall into.
Men and older people prefer to use ‘er’ during gaps in speech, while women and teenagers are more likely to use ‘um’, according to recent research.
I like this celebrity-speech angle — typical of us intellectuals not to have thought of it. We need to get out of our Ivory Tower of Solitude more often, really.
And then there's Emily Kent Smith, "Stuck for words? How saying 'um' or 'er' in conversation can reveal a lot about who you are", Daily Mail 10/6/2014:
It may simply sound inarticulate, but whether you say ‘um’ or ‘er’ in conversation could actually reveal a lot about you.
Studies carried out across the world, from Edinburgh to America, Germany and the Netherlands have all concluded that women and young people are more likely to say ‘um’ whilst deep in thought while men and older people favour ‘er’.
But ‘er’ could become extinct after ‘um’ is beginning to be used more in everyday language, the research showed.
You can contemplate this one on your own — of course with some suitable background music:
Which, now that I think of it, works depressingly well as background music for the other papers too.