{"id":15048,"date":"2014-10-07T09:49:53","date_gmt":"2014-10-07T14:49:53","guid":{"rendered":"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=15048"},"modified":"2014-10-09T06:36:37","modified_gmt":"2014-10-09T11:36:37","slug":"trending-in-the-media-um-not-exactly","status":"publish","type":"post","link":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=15048","title":{"rendered":"Trending in the Media: Um, not exactly&#8230;"},"content":{"rendered":"<p>I like journalists, really I do. But sometimes they make it hard for me to maintain my positive attitude. The recent flurry of U.K. media uptake of Language Log posts on UM and UH provides some examples of this stress and strain.<\/p>\n<p><!--more--><\/p>\n<p>Here's Stuart Jeffries, \"<a href=\"http:\/\/www.theguardian.com\/science\/shortcuts\/2014\/oct\/06\/um-er-conversation-english-speakers-socio-linguistics-edinburgh-university\" target=\"_blank\">Um or er: which do you, um, use more in, er, conversation?<\/a>\", The Guardian 10\/6\/2014:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #000080;\">In the historic struggle between the ummers and the errers, the ummers are getting the upper hand. A <span style=\"text-decoration: underline;\"><a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=13713\" target=\"_blank\">study of speech patterns by socio-linguists at Edinburgh University<\/a><\/span> has found that English speakers increasingly tend to use \u201cum\u201d rather than \u201cer\u201d as the filler of choice.<\/span><\/p>\n<p>The \"socio-linguists at Edinburgh University\" are Joe Fruehwald, a sociolinguist who's at Edinburgh these days, adding to work by me, a phonetician from the University of Pennsylvania who has occasionally visited Edinburgh and has many friends there, and Martijn Wieling, a dialectologist from the University of Groningen, and John Coleman, a phonetician from the University of Oxford, and Jack Grieve, a linguist at Aston University. 
And the \"study\" is a set of blog posts in which\u00a0we've pulled out some data from existing collections of various kinds, a mode of research that I've jokingly called Breakfast Experiments\u2122 because writing and running the scripts involved can generally be done in the time it takes to drink a couple of cups of coffee.<\/p>\n<p>This doesn't mean that the data or the analysis is unreal or unserious &#8212; and we'll probably turn all this stuff into a conventional paper in a traditional journal before long. Meanwhile, the relevant blog posts, in chronological order, are: \"<a style=\"color: #af0505;\" href=\"http:\/\/itre.cis.upenn.edu\/~myl\/languagelog\/archives\/002629.html\" target=\"_blank\">Young men talk like old women<\/a>\", 11\/6\/2005; \"<a style=\"color: #af0505;\" href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=13581\" target=\"_blank\">Fillers: Autism, gender, age<\/a>\", 7\/30\/2014; \u00a0\"<a style=\"color: #af0505;\" href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=13713\" target=\"_blank\">More on UM and UH<\/a>\", 8\/3\/2014; \"<a style=\"color: #af0505;\" href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=13805\" target=\"_blank\">UM UH 3<\/a>\", 8\/4\/2014; \"<a style=\"color: #af0505;\" href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=13873\" target=\"_blank\">Male and female word usage<\/a>\", 8\/7\/2014; \"<a style=\"color: #af0505;\" href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=14015\" target=\"_blank\">UM \/ UH geography<\/a>\", 8\/13\/2014; \"<a style=\"color: #af0505;\" href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=13990\" target=\"_blank\">Educational UM \/ UH<\/a>\", 8\/13\/2014; \"<a style=\"color: #af0505;\" href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=14058\" target=\"_blank\">UM \/ UH: Lifecycle effects vs. 
language change<\/a>\", 8\/15\/2014; \"<a style=\"color: #af0505;\" href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=14093\" target=\"_blank\">Filled pauses in Glasgow<\/a>\", 8\/17\/2014; \"<a style=\"color: #af0505;\" href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=14143\" target=\"_blank\">ER and ERM in the spoken BNC<\/a>\", 8\/18\/2014; \"<a style=\"color: #af0505;\" href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=14638\" target=\"_blank\">Um and uh in Dutch<\/a>\", 9\/16\/2014; \"<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=14894\" target=\"_blank\">UM\/ UH in German<\/a>\", 9\/29\/2014; \"<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=14991\" target=\"_blank\">Um, there's timing information in Switchboard?<\/a>\",\u00a010\/5\/2014. (The hyperlink in Jeffries' article goes to the eighth of those 13 posts.)<\/p>\n<p>So\u00a0it's interesting to see all of this framed in the traditional journalistic fashion as \"A study of X by Y-ists at Z University\" &#8212; and to see what values for X, Y, and Z Jeffries picks up. This misreading then sets up a bit of boffin-bashing:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #000080;\">Fruehwald examined 25,000 examples of people in the US city of Philadelphia saying \u201cum\u201d and \u201cuh\u201d. You might say that\u2019s because socio-linguists have exhausted important things to study but I, um, couldn\u2019t possibly, like, comment.<\/span><\/p>\n<p>Or you might say\u00a0that Jeffries is too badly-informed and\/or lazy to grasp the fact that Joe spent a few minutes writing a computer program, which in turn spent\u00a0a few seconds sorting instances of UM and UH by age and gender in Joe's copy of the Philadelphia Neighborhood Corpus, which was collected over a few decades by students in a course on sociolinguistic field methods, and has been used in hundreds (maybe thousands) of published papers over the decades. 
But I, um, couldn't possibly, like, characterize Mr. Jeffries as an arrogant\u00a0ignoramus, without knowing more\u00a0about his actual expertise\u00a0and motivations.<\/p>\n<p>More seriously, it\u00a0seems to me\u00a0that Jeffries suffers from the journalistic version of the blind spot that I attributed to old-fashioned psycholinguistic researchers recently (\"<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=14991\" target=\"_blank\">Um, there's timing information in Switchboard?<\/a>\", 10\/5\/2014). People think of a\u00a0\"study\" as\u00a0an enterprise where\u00a0you go out and spend months or years gathering data, not as an easy-to-write computer script that\u00a0pulls out some new aspect of an existing large shared multi-purpose dataset. So\u00a0it makes sense for Jeffries to make\u00a0fun of us \"sociolinguists at Edinburgh University\"\u00a0&#8212;\u00a0if we had really\u00a0collected and transcribed hundreds of sociolinguistic interviews over four decades, solely\u00a0in order to study the distribution of filled pauses, we'd deserve to be mocked.<\/p>\n<p>Then there's this:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #000080;\">But why the shift from \u201cer\u201d to \u201cum\u201d? Is it because inside every \u201cum\u201d there\u2019s a little \u201cer\u201d that\u2019s been elongated and given a stronger terminal sound, and favouring the former indicates our growing existential confusion at a world increasingly gone, um, nuts? It\u2019s a theory. Here\u2019s another. 
According to the University of Pennsylvania\u2019s Professor Mark Liberman, who did another study of filled pauses, people tend to use \u201cum\u201d when they\u2019re trying to decide what to say, and \u201cer\/uh\u201d when they\u2019re trying to decide how to say it.<\/span><\/p>\n<p>That last sentence starts from the usual scientific game of \"what if?\", as exemplified in this\u00a0<a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=14894\" target=\"_blank\">blog-post passage<\/a> where I laid out three logically-possible types of hypothesis, gave the \"what to say\" vs. \"how to say it\" idea as a for-instance example of one of the three types, and observed that \"none of these explanations seems very plausible to me\". In order to suggest how a functional difference between UM and UH might generate something like the observed sex and age effects, I discussed these hypotheticals\u00a0at greater length in an email Q&amp;A with Olga Khazan, author of an Atlantic Magazine piece that Jeffries links to, \"<a href=\"http:\/\/www.theatlantic.com\/health\/archive\/2014\/08\/men-say-uh-and-women-say-um\/375729\/\" target=\"_blank\">Men Say 'Uh' and Women Say 'Um'<\/a>\", 8\/8\/2014.<\/p>\n<p>In that context,\u00a0I was spinning out\u00a0conceivable theories, not offering an explanation that I believe is correct. But\u00a0Olga quoted me in a way that may\u00a0leave the reader unsure\u00a0about that &#8212; here's the part that Jeffries quotes from her\u00a0article:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">Liberman also posits that \"um\" and \"uh\" portray language fluency and intelligence differently. \"People tend to use UM when they're trying to decide what to say, and UH when they're trying to decide how to say it,\" he told me in an email. 
\"As people get older, they have less trouble deciding what to say (because they know more stuff), and more trouble deciding how to say it (because they know more words and fixed phrases, and so have a harder time making a choice). As a result, older people use fewer UMs and more UHs.\" \u00a0<\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">Thus, one theory is that perhaps, \"At any given (adult) age, men are more linguistically experienced than women, and so use UM and UH as if they were older,\" he says. \"OR MAYBE: Women are more communicatively circumspect than men, and therefore more likely to pause before deciding what to say; but women are more linguistically fluent than men, and therefore less likely to pause while deciding what words to use.\"<\/span><\/p>\n<p>In fact, I'd argue\u00a0that the \"what to say\" vs. \"how to say it\" differentiation, if it exists at all, can't account for most of the observed variation. \u00a0But I should have realized that \"maybe it's X, maybe it's Y\" talk is dangerous in interviews with journalists (though essential in conversations among scientists), and I should be happy that (so far) no one has\u00a0set up a fake debate between me and (say) Josef Fruehwald, along the lines that I've described in earlier posts like\u00a0\"<a href=\"http:\/\/itre.cis.upenn.edu\/~myl\/languagelog\/archives\/003102.html\" target=\"_blank\">Imaginary debates and stereotypical roles<\/a>\", 5\/3\/2006.<\/p>\n<p>Moving along in Jeffries' article, we learn that<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #000080;\">Liberman transcribed 14,000 phone conversations, totalling more than 26 million words from 12,000 speakers across the US and found that the use of \u201cum\u201d and \u201cer\/uh\u201d can reveal the speaker\u2019s gender, language skills and life experience.<\/span><\/p>\n<p>Yes, I did this by transcribing at super-speed in my Fortress of Solitude near the headwaters of the Schuylkill 
River. Because this was a Study, you know, and that's how\u00a0we Scientists do it.<\/p>\n<p>(Actually, the transcriptions in question were done, over a period of 15 years, by dozens or even hundreds of people, many of them professional transcriptionists, in several projects creating datasets for government-sponsored research in speaker identification, speech recognition, and other technology-development areas.)<\/p>\n<p>Meanwhile,\u00a0over at <em>The Times<\/em>, Oliver Moody (\"<a href=\"http:\/\/www.thetimes.co.uk\/tto\/news\/uk\/article4227955.ece\" target=\"_blank\">To um or to er? Studies probe how brains fill the speech-thought gap<\/a>\", 10\/4\/2014) elevates the question to mock-epic status:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #000080;\">In Gulliver\u2019s Travels, the land of Lilliput has been shaken by six vicious rebellions after a controversy over which end of a boiled egg should be broken first. <\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #000080;\">It seems that real life is out to compete with satire. A growing body of linguistic evidence points to a faultline emerging between two tribes in western society: the ummers and the errers. <\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #000080;\">Separate studies in Glasgow, the US, Germany and the Netherlands over recent months have all shown that women and young people are much more likely to use \u201cum\u201d when waiting for the next thought to come along, while men and older people go for \u201cer\u201d. And in the battle of the disengaged brains, \u201cum\u201d is winning.<\/span><\/p>\n<p>(The Wachowskis have\u00a0optioned the movie rights for <em>The Battle of the Disengaged Brains<\/em>, with Shia LaBeouf to play Martijn, Keanu Reeves\u00a0in the role of Josef, and Clint Eastwood as yours truly.)<\/p>\n<p><span style=\"color: #000000;\">Anyhow, there are those \"studies\" again. 
\u00a0Moody goes on to describe\u00a0what he calls a\u00a0\"deeply unscientific analysis of celebrity interviews\":<\/span><\/p>\n<p style=\"color: #000000; padding-left: 30px;\"><span style=\"color: #800000;\">Men of the old school overwhelmingly resort to \u201cer\u201d. Appearing on the US chat show\u00a0<i>Late Night With David Letterman\u00a0<\/i>in 1988, the comedian John Cleese managed 24 \u201cers\u201d and eight \u201cums\u201d in ten minutes.<\/span><\/p>\n<p style=\"color: #000000; padding-left: 30px;\"><span style=\"color: #800000;\">The princeling of the modern-day errers, however, is Nigel Farage, the Ukip leader, who used a positively Victorian 15 \u201cers\u201d (88 per cent) and just two \u201cums\u201d (12 per cent) on\u00a0<i>The Andrew Marr Show\u00a0<\/i>in March this year.<\/span><\/p>\n<p style=\"color: #000000; padding-left: 30px;\"><span style=\"color: #800000;\">At the other end of the spectrum was Steven Gerrard, captain of Liverpool Football Club. In the immediate aftermath of his side\u2019s 1-1 draw with Everton last month, Gerrard trundled out nine \u201cums\u201d and one \u201cer\u201d in two and a half minutes.<\/span><\/p>\n<p style=\"color: #000000; padding-left: 30px;\"><span style=\"color: #800000;\">Lena Dunham, the writer and star of the sitcom series\u00a0<i>Girls<\/i>, used 79 per cent \u201cums\u201d on\u00a0<i>Letterman\u00a0<\/i>this year, while the\u00a0<i>Harry Potter\u00a0<\/i>actress Emma Watson hit 58 per cent on the same show.<\/span><\/p>\n<p style=\"color: #000000;\">But the\u00a0thing is, the only difference between what we did and what Moody did is a matter of scale. 
If he had access to a corpus of (say) 10,000 celebrity interviews over a period of 20 years, with demographic metadata about gender, date of birth, etc., and if someone at The Times (or The Guardian or wherever) set this collection up in a well-designed specialized search-and-visualization engine, we'd be inviting him to join our forthcoming paper as a co-author. And much smaller investigations would still be quite \"scientific\".<\/p>\n<p style=\"color: #000000;\">On a smaller scale, an excellent undergraduate term project might look at this aspect of on-line celebrity interviews by gender, age, and nationality. A small team of students (or one student shut up in her Fortress of Solitude for a week) could easily transcribe an adequate sample of 100 interviews&#8230;<\/p>\n<p>Unfortunately, the\u00a0other recent contribution of Oliver Moody's newspaper to this topic is not nearly as empirically sound &#8212; \"<a href=\"http:\/\/www.thetimes.co.uk\/tto\/opinion\/leaders\/article4227815.ece?CMP=OTH-gnws-standard-2014_10_05\" target=\"_blank\">Ah: Sounds to signal hesitation are part of our linguistic heritage<\/a>\", <em>The Times<\/em> 10\/6\/2014:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">What is the most common word in the typical English speaker\u2019s vocabulary? Is it, perhaps, \u201cthe\u201d or \u201ca\u201d or \u201can\u201d, or an interjection such as \u201coh\u201d? Um, er, let\u2019s see now . . . <\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">In fact, it\u2019s probably some variant of the first two words of the previous sentence. They don\u2019t mean anything, but every English speaker uses them, or an equivalent vocalisation, because everyone, however fluent, sometimes wonders what to say next. Linguists call them \u201cfiller words\u201d and they appear to go in fashions. 
In current usage, \u201cum\u201d is overtaking \u201cer\u201d as the filler of choice.\u00a0<\/span><\/p>\n<p>In fact, it probably isn't.<\/p>\n<p>In the Switchboard corpus, out of 520 speakers, UH is the commonest \"word\"\u00a0for 45 speakers, or 8.6%, otherwise losing out to \"I\", \"and\", \"the\", \"you\", or \"[silence]\". The median rank of UH or UM (whichever is commonest for a given\u00a0speaker) is 8.<\/p>\n<p>In the Fisher corpus, UH is the commonest \"word\" for 89 speakers, and UM is the commonest \"word\" for 35 speakers, for a total of 124 out of 10,401 speakers, or 1.2%. The median rank of UH or UM is 17.<\/p>\n<p>This is consistent with the fact that the overall UM+UH rate in Switchboard (2.79%) is about 65% higher than the overall rate in Fisher (1.69%) &#8212; see <a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=13805\" target=\"_blank\">here<\/a> for some further discussion. I suspect that the difference is mostly due to different transcription processes, though I have no concrete evidence for this view. But anyhow, in neither collection is the frequency of UH or UM (or even their sum) likely to be greater than the frequency of \"I\", or \"and\", or \"the\".<\/p>\n<p>Then there are a couple of articles in the Daily Mail, starting with\u00a0Ellie Zolfagharaifard, \"<span style=\"color: #000000;\"><a href=\"http:\/\/www.dailymail.co.uk\/sciencetech\/article-2782105\/Men-er-women-um-Speech-markers-reveal-details-age-sex-lifestyle-scientists-claim.html\" target=\"_blank\">Men are from 'er' and women are from 'um': Speech markers reveal details about your age, sex and lifestyle, scientists claim<\/a>\", Daily Mail 10\/6\/2014:<\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #003300;\">A major rift has emerged within the English-speaking world, pitting Barack Obama against Kim Kardashian and Eminem against David Beckham. 
\u00a0<\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #003300;\">The division is between \u2018ummers\u2019 and \u2018errers\u2019, and, while you may not be aware of it, your gender and age could have a huge influence on which group you fall into. \u00a0<\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #003300;\">Men and older people prefer to use \u2018er\u2019 during gaps in speech, while women and teenagers are more likely to use \u2018um\u2019, according to recent research.<\/span><\/p>\n<p>I like this celebrity-speech angle &#8212; typical of us intellectuals not to have thought of it. We need to get out of our Ivory Tower of Solitude more often, really.<\/p>\n<p>And then there's Emily Kent Smith, \"<a href=\"http:\/\/www.dailymail.co.uk\/sciencetech\/article-2782977\/Stuck-words-How-saying-um-er-conversation-reveal-lot-are.html\" target=\"_blank\">Stuck for words? How saying 'um' or 'er' in conversation can reveal a lot about who you are<\/a>\", Daily Mail 10\/6\/2014:<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #003366;\">It may simply sound inarticulate, but whether you say \u2018um\u2019 or \u2018er\u2019 in conversation could actually reveal a lot about you. \u00a0 <\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #003366;\">Studies carried out across the world, from Edinburgh to America, Germany and the Netherlands have all concluded that women and young people are more likely to say \u2018um\u2019 whilst deep in thought while men and older people favour \u2018er\u2019. 
\u00a0<\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #003366;\">But \u2018er\u2019 could become extinct after \u2018um\u2019 is beginning to be used more in everyday language, the research showed.<\/span><\/p>\n<p>You can contemplate this one on your own &#8212; of course with some suitable background music:<\/p>\n<p><object width=\"480\" height=\"270\"><param name=\"movie\" value=\"https:\/\/www.youtube.com\/v\/5eBT6OSr1TI?hl=en_US&amp;version=3&amp;rel=0\"><\/param><param name=\"allowFullScreen\" value=\"true\"><\/param><param name=\"allowscriptaccess\" value=\"always\"><\/param><\/object><\/p>\n<p>Which, now that I think of it, works depressingly well as background music for the other papers too.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I like journalists, really I do. But sometimes they make it hard for me to maintain my positive attitude. The recent flurry of U.K. media uptake of Language Log posts on UM and UH provides some examples of this stress and 
strain.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[26],"tags":[],"class_list":["post-15048","post","type-post","status-publish","format-standard","hentry","category-language-and-the-media"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/15048","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=15048"}],"version-history":[{"count":22,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/15048\/revisions"}],"predecessor-version":[{"id":15089,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/15048\/revisions\/15089"}],"wp:attachment":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=15048"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=15048"},{"taxonomy":"po
st_tag","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=15048"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}