Mark Dingemanse, Francisco Torreira, and N.J. Enfield, “Is ‘Huh?’ a universal word? Conversational infrastructure and the convergent evolution of linguistic items”, PLOS ONE 2013:

A word like Huh?–used as a repair initiator when, for example, one has not clearly heard what someone just said– is found in roughly the same form and function in spoken languages across the globe. We investigate it in naturally occurring conversations in ten languages and present evidence and arguments for two distinct claims: that Huh? is universal, and that it is a word. In support of the first, we show that the similarities in form and function of this interjection across languages are much greater than expected by chance. In support of the second claim we show that it is a lexical, conventionalised form that has to be learnt, unlike grunts or emotional cries. We discuss possible reasons for the cross-linguistic similarity and propose an account in terms of convergent evolution. Huh? is a universal word not because it is innate but because it is shaped by selective pressures in an interactional environment that all languages share: that of other-initiated repair. Our proposal enhances evolutionary models of language change by suggesting that conversational infrastructure can drive the convergent cultural evolution of linguistic items.

The paper is quite accessible, but there's also a web site ("Is 'Huh?' a universal word?") and a YouTube video:


  1. Q. Pheevr said,

    November 9, 2013 @ 11:44 am


  2. David Eddyshaw said,

    November 9, 2013 @ 12:16 pm

    Doesn't seem very different from words like "ouch," which are similarly language-specific but fill pretty similar roles in the different languages, and seem likely to derive in some way from "grunts" even though they have graduated from that status now.

    "Ouch!" = "I'm hurt"
    "Huh?" = "I'm puzzled"

    The distinction seems to boil down to there being non-human analogues for "ouch" but not "huh." But pain is surely more basic to animals than puzzlement, especially puzzlement over sound miscommunication, so that doesn't seem too surprising on first principles.

  3. Barbara Partee said,

    November 9, 2013 @ 12:22 pm

    Isn't 10 languages a terribly small sample? And I don't recognize that Russian 'word'. Maybe reading the article will help, but I don't have time this weekend; so far I'm a little skeptical about the universality claim. (For English it won't be hard to convince me that it's a real word; for Russian, I'll need to see more evidence.)

  4. Richard Sproat said,

    November 9, 2013 @ 12:29 pm

    I didn't recognize the Mandarin word either, until I saw that they actually mean "ã". If you want to play *that* game…

  5. Jonathan Mayhew said,

    November 9, 2013 @ 12:43 pm

    Can a word be the same if it doesn't share any phonemes?

    English huh / Spanish eh.

  6. Richard Sproat said,

    November 9, 2013 @ 12:58 pm


    Sure, why not? Anything for a good story. For that matter, the edit distance between "huh" and "e(h)" is no different than the edit distance between "huh" and standing with your mouth open with a gaping puzzled look.

  7. mark said,

    November 9, 2013 @ 1:16 pm

    Thanks Mark for this post.

    @Barbara Partee, yes, 10 languages is a relatively small sample. But try to find languages for which there are corpora of informal everyday conversation that are large enough to find at least 20 cases of other-initiated repair done using an interjection. You'll find it quite hard, I imagine, to double our sample size. The point is that this is not something you can reliably elicit or get from dictionaries, as the comments here and elsewhere nicely show. You need to locate enough tokens in exactly the same sequential context to be sure you're comparing the same (functional) thing.

    Also note that in the paper, we actually add 21 more languages for which we have evidence at least from transcripts of conversation for the form of the interjection. In all 31 languages we checked, then, there was none that didn't have an interjection for this function; and all of the 31 interjections look suspiciously similar. *That* is the core phenomenon we investigate and explain in the paper.

    @Richard Sproat & Jonathan Mayhew, note that in the paper, we specify the claim of universality in quite some detail. "Does this mean that huh? is a universal word? We propose a qualified yes. Qualified, because huh? is clearly not phonetically the same word across languages — if Cha‘palaa tokens were cross-spliced into Spanish dialog, Spanish speakers would likely be confused. What appears to be universal is the function of this interjection along with a set of constraints determining its form."

  8. Steve Kass said,

    November 9, 2013 @ 1:39 pm

    Thanks to both Marks and the other authors for this.

    @Richard, The Mandarin sound in the video was unclear, but the word it likely was, 啊 (á), is one of the "huh" words that can (@mark) reliably be found in a dictionary. I've got it as "(interjection) (pressing for an answer or asking for a repetition of something just said)" and being used like "什么?" Of course, a dictionary doesn't provide other information needed for this study, especially the range of ways in which the word is pronounced.

    In fact (@mark?) the whole YouTube was somewhat disappointing, given what the paper says was collected. Any chance you can make a longer YouTube that includes the conversational exchanges that contain the "huh" words in context as well as those words in isolation? That would be really great to hear.

    @Barbara, Would you say there is any Russian "huh?"-like utterance that can be used interchangeably with "что?"?

  9. Richard Sproat said,

    November 9, 2013 @ 2:04 pm


    Sure, 啊 is common enough. And I suppose it may be of some significance that there is a designated character for it, thus a qualified endorsement of the view that in Mandarin at least, it's considered to be a word. Of course there are other things like 嗯 which has a designated character but which in its normal pronunciation violates Mandarin syllable structure, so perhaps that's not so indicative after all.

    Hmm. But aren't all these things just close to the minimal sound you could make by opening your mouth and vocalizing? In English we also have "eh?", which is pretty close, I think, to "huh?". At least I would be hard pressed to think how I would use them differently (except that maybe "eh" is a bit more polite?). If that's the criterion then it suggests a low or perhaps mid vowel, no lip rounding, and if any consonantal gesture at all, then a glottal one.

  10. Victor Mair said,

    November 9, 2013 @ 2:54 pm

    English "huh?" sounds different from the presumed equivalents in all the other languages in that video, in some cases quite different. Furthermore, in all the languages that I've learned (quite a few), I've never come across one that has a word to express sudden, sharp pain that sounds like "ouch!". These supposed universal exclamatory and interrogative utterances are as diverse as the onomatopoetic words for animal sounds, e.g., bow-wow / wāngwāng 汪汪! (don't ask why those vocables are written with characters having the water radical).

  11. Sili said,

    November 9, 2013 @ 3:48 pm


  12. Vilinthril said,

    November 9, 2013 @ 4:07 pm

    »Furthermore, in all the languages that I've learned (quite a few), I've never come across one that has a word to express sudden, sharp pain that sounds like "ouch!".«

    Erm, German „au“ is pretty much the same phonetically (no surprise, cognate, I suppose), so that certainly expresses sudden, sharp pain in the same way. :p

  13. JS said,

    November 9, 2013 @ 4:10 pm

    The character "啊" tells us next to nothing; it could indicate any of a variety of vocalizations. Even the spelling "á" could indicate a few…

    Which relates to my central question: the individual tokens within several of the languages considered are spread evenly across the whole of the mid-low x front-central vowel space… the huh/eh spelling difference pointed to by Richard Sproat suggests a comparable range in English. Word?

  14. mark said,

    November 9, 2013 @ 4:15 pm

    @Richard, did you read the paper? Because what you describe ("then it suggests a low or perhaps mid vowel, no lip rounding, and if any consonantal gesture at all, then a glottal one") is precisely the kind of generalisation that covers all forms found in the languages we studied. In fact we use much the same word to describe this "template for huh?". We also mention minimality as one of the functional requirements on the word in its particular context (other-initiated repair).

    In general, the methods and discussion sections of our paper seem to cover all of the points raised thus far.

  15. mark said,

    November 9, 2013 @ 4:20 pm

    @Victor Mair, you write "English "huh?" sounds different from the presumed equivalents in all the other languages in that video, in some cases quite different."

    English wasn't part of our sample, so we don't really know what it sounds like in a representative corpus of informal face to face everyday interaction. We did note early on in our research that its spelling is somewhat misleading, as (in the tokens we listened to) there seems to be very little word-initial aspiration.

    Also, on your term "presumed equivalents": we make clear in the paper that we only compare interjection tokens that are found in the same structural sequential context: that of other-initiated repair, in which A says something, B indicates a generic problem with what A says, and A then repeats and/or redoes the first turn in some fashion. We "presume" that having a clear structural definition of the phenomenon helps to make sure that we are, indeed, comparing like with like.

  16. mark said,

    November 9, 2013 @ 4:25 pm

    @Steve, thanks. I don't think the YouTube video would've had as many views (4500+) if we had included all 196 tokens in there. It's really only a brief demonstration of the basic phenomenon for a lay audience. There's also an issue of ethics: while single tokens of huh? are pretty much anonymous, not all of the sequences in which they occur might be. But contact me privately and I might be able to give you more details.

  17. Victor Mair said,

    November 9, 2013 @ 5:02 pm


    "…German „au“ is pretty much the same phonetically…."

    The -ch of English "ouch!" is an essential part of the expression, though one can also say just "ow!".


    "…(in the tokens we listened to) there seems to be very little word-initial aspiration."

    The initial "h-" of English "huh" is very much present when the expression is enunciated.



    I was thinking the same thing.

  18. Eric P Smith said,

    November 9, 2013 @ 5:11 pm

    I don't hear "Huh?" much here in the UK, and I tend to think of it as American. Here in Scotland it's "Eh?" [ɛ] or "What?" [ʍɔʔ]. Middle-class kids of course are taught, "Don't say What, say Beg pardon".

  19. Martin said,

    November 9, 2013 @ 5:11 pm

    I am a bit puzzled by how the authors infer convergence as an explanation, or about the relationship they claim to exist between their findings and their explanation. They do not seem to be sure about how they got there, either. At one point in the paper it says, "The second [explanation] is that it is similar as a result of convergent evolution. Empirical evidence supports the second." If I am not mistaken, there is no evidence provided, and the claim is absent for the remainder of the paper (the "evidence" in the Conclusion refers to the universality of "huh?", and there is indeed evidence for that claim), only a series of general claims about constraints and requirements in communicative situations. Intriguingly, in the section "Convergence", the claim has morphed into "A more plausible mechanism for the cross-linguistic similarity of huh?"

    I think not even in the most informal settings is a "plausible explanation" a substitute for "empirical evidence". If there is empirical evidence – which is an explicit claim at one point, at least – for the specific claim that the strikingly common features of "huh?" are the consequence of convergence, the authors must have made, at some point, an observation exactly to this effect. However, none is presented, be it in the paper itself, or in the references (and it is hard even to imagine how evidence for such an evolutive process should be provided in this respect, but the claim is right there).

    The difference is of importance for further conjectures found in the paper. If the explanation put forward was independent evidence for convergence, the latter would indeed form a basis for the claim that it "offers a more general mechanism", as is later claimed. But, as I read it, that gets the argumentation in the paper exactly backward: as it seems, the paper nowhere "points to a factor that may constrain divergence or diachronic drift", but exactly the contrary: a theory about convergence is what the authors take as explanation for a merely conjectured 'diachronic drift' that is itself not observed! Given that no evidence is provided that and how "huh?" has actually converged, isn't it strange to base further conjectures on the claim that it is a product of convergence, rather than to confine oneself to the assertion that it fits into a broader theory of convergence, to which the paper has contributed nothing, though? Perhaps the authors mean that convergence generally offers a "more general mechanism", and do not mean to say that this is a conclusion of their paper – but then they have merely a model, like economists, without any obvious way to pad it with evidence (or contradict it with counter-evidence, for that matter) or any basis to claim that it actually highlights something going on in the real world.

    Now, I am a chemist, and know absolutely nothing about linguistics, so what do I get wrong here?

  20. Martin said,

    November 9, 2013 @ 6:22 pm

    Thinking about it, I don't even get it on a more basic level. If convergence is the reason for the common features of "huh?" across the languages studied to explain why words like "rororo" and "bi" are absent for the same function – are the latter two supposed to be the kind of words they converged from? What are the starting points? What is the time scale for such processes to play out? If they should have converged from different forms, one is in the odd situation that language at some point in the past must have known an arbitrary relationship as in Saussure's signifiant/signifié dual sign, concerning the phonetic representation of a word. And then, pragmatic and physical constraints made those forms converge? But then, that is (by reference 1) exactly what the authors refute. So, was it once true, and words sprang into existence as arbitrary phonetic representations of lexical items? Why should those forces have been absent during language formation, then? Is there anything in the provided evidence that contradicts my completely ad hoc and unsupported idea that those exact constraints must have led to a common ancestor of "huh?" in all languages (I imagine "huh?" must have been a very basic item in the advent of a language) that then diverged under the stress of phonetic inventories and environment of different languages, rather than converged (from what again?)?

    Also, Maddieson's chapter about vowel quality inventories is accompanied by a map showing that the chosen quantiles of vowel frequencies are obviously not uniformly distributed. Additionally, the paper makes no observation how vowel quantities are distributed within languages. Can anyone explain to me how the suggestion "Given this fact [that languages have on average 6 phonemes that tend to be maximally spread], it is striking that the vowels of OIR interjection tokens are only found in the low front central corner of vowel space" has any validity?

    Also, Maddieson's chapter stresses that he does NOT count nasal vowels as long as their non-nasalized counterparts exist, and much the same for long and short vowels. Thus, it seems that he does not count vowels on the phonemic level, contrary to what is claimed in the paper.

  21. Graeme said,

    November 9, 2013 @ 6:26 pm

    How commonly or rarely do people use such a formalised exclamation as 'ouch' if no one else is around?

  22. Nelida K. said,

    November 9, 2013 @ 7:23 pm

    In Spanish, as has been mentioned above, it is "¿eh?" where the "h" is not aspirated at all; in fact, in Spanish the "h" is never aspirated, it is absolutely mute. Just there to complicate kids at grade school with their spelling. And the "e" sounds as it does in the French article "le". So I gather that "huh?" is only universal in the sense that when an English speaker says "huh?" and a Spanish speaker says "¿eh?", they both express, if in person, a puzzled (open-mouthed) gesture, clearly understood by the interlocutor; and if on the phone, it being a non-word (interjection) and spoken out as a question, lets the interlocutor guess that what he or she said was not understood.

  23. Martin said,

    November 9, 2013 @ 8:00 pm

    @ Nelida

    That does not seem to be a problem, though: it is exactly the glottal sound that is not universal, as a glottal sound may only be present in the respective "huh?" if the language makes use of it as a phoneme. As you say, not only does Spanish lack such a sound representing a phoneme, but it is also absent as a speech sound in general (I think, but correct me; that is contrary to German, for example, where the glottal stop is actually quite frequent, but not phonemic, though it has phonemic /h/).

    Also, the vowel sound is not fixed, but varies only within a limited range, which the authors find 'striking' for some reason not really explained. Specifically, they say: "Such limited variation and striking similarity across languages is wholly unexpected on the basis of the principle of the arbitrariness of the sign." This is a bit odd, as they did not look at the underlying frequency distribution of vowels (or consonants, to which the quote refers, too) across or within languages. Nor did they define a dimension along which to look at such a distribution: Geographical? Across language phyla? How to quantify relatedness within/between phyla? For what type of words: Interjections? Very frequent words? Very short words? So it is not really clear to me what is supposed to be "wholly unexpected" except that some gut feeling has been refuted (and supposing that the authors didn't conflate "arbitrary" and "uniform" – ?)

  24. Mark F. said,

    November 10, 2013 @ 12:04 am

    Martin – The authors do explicitly use a broad sense of "convergent" evolution to include "parallel" evolution. So they aren't necessarily claiming that these interjections descend from words that differ more greatly across languages.

    Also, when they said "Empirical evidence supports the second", perhaps it would have been more accurate to say "Empirical evidence militates against the first." But, if you accept that the two explanations they give are the only two, then it amounts to the same thing. (Not that you have to accept that.)

    This isn't the first place that a similarity of word form across languages has been proposed to result from common pragmatics. A lot of languages have words for mother that sound like "ma" or "mama", and it's plausible that's because those are among the first sounds babies make.

  25. Mark F. said,

    November 10, 2013 @ 12:20 am

    As for the "ouch" comparison, I think "ouch" does come from modifying a sound one might make involuntarily, which I think puts it in a different category from "huh". It really differs in another way too – the "ch" in "ouch" really seems like the kind of arbitrary change that doesn't appear in the "huh" variations.

    It's been pointed out that English actually has both "eh" and "huh". I realize that analysis of formants doesn't support this, but it sounded to me like other languages were basically choosing from those two options.

  26. Martin said,

    November 10, 2013 @ 1:19 am

    @ Mark F.

    Well, that does not get much better, then, though I got it wrong. This broad definition again is based on a paper about evolution on a genetic level (their other examples refer to sharks, dolphins, mammals, and marsupials – and other biology papers). On the other side, they explicitly refute such a direct analogy, based on the (never specified, but probably very short in comparison) time scales involved. So, either I take them at their word and they have actually observed a process one can call "convergence", or they have a mere wishy-washy analogy that has not been observed (also not before, apparently, since the only thing they refer to are papers from biology), and is based on nothing more than some vague, evidence-free idea. Plausibility is good for a research proposal, here it is a central claim of a published paper. You can't just claim that an evolutive process has been at work if none has been observed (or only in a completely different field, the mechanism of which has explicitly been excluded).

    Also, similarity between words between languages has already been invoked by Saussure, specifically in what he calls "authentic onomatopoeic words" and interjections – only that the cross-linguistic differences even among those words (which, following their very limited number, he does not consider to be very important one way or the other) are, for him, actually further evidence for the arbitrariness of the linguistic sign. So, this alone is neither a surprise nor contrary to any tenets, central or not, in linguistics, as far as the references are concerned – and I do not think that this is what the authors are concerned with.

    There is another point where the paper is sloppy, though it's an aside. As one of the 21 additional languages not included in the more detailed survey they mention "‡Âkhoe Hai//om". Now what they probably mean with the first word is ǂĀkhoe, where the diacritic above the a represents a long vowel in standard Khoekhoegowab orthography (as you can check with Wilfrid Haacke's Khoekhoegowab dictionary) – theirs represents a nasal vowel. Moreover, it is correctly written as ǂĀkhoe in their own source! Also, they didn't bother to represent the lateral click in Haiǁom correctly, but put two slashes there (again, this is not an informal email, but a research paper, and also correct in their source). Then, what is "ǂĀkhoe Haiǁom" supposed to be? I tried to track that down, because usually, ǂĀkhoe and Haiǁom are considered two very close dialects of what is now officially called "Khoekhoegowab" (google Johanna Brugman's doctoral thesis of 2009, as an example) – but even if not, two very close dialects of one language. Is that the standard notation of a dialect continuum? But as it seems, that is simply a project name for an OLAC project which their source references and where the language referred to is actually Haiǁom – how did "ǂĀkhoe Haiǁom" end up as a "language" name?

  27. Reinhold {Rey} Aman said,

    November 10, 2013 @ 2:11 am

    @ Victor Mair:

    The -ch of English "ouch!" is an essential part of the expression, though one can also say just "ow!".

    In addition to "Au!", there is also the standard German "Autsch!", which sounds exactly like "Ouch!".
    It's not an Anglicism.

  28. mark said,

    November 10, 2013 @ 2:45 am

    @Martin, starting with your last point: in our submitted, final, proofed version of the paper, we have "ǂĀkhoe Haiǁom", the spelling used by our colleague Gertie Hoymann. You can look up her work to find exactly which language variety is meant, and what its relation is to Khoekhoegowab. Please don't hold us responsible for PLOS ONE's poor copyediting process (also shown by their sloppy handling of the interlinear glosses). We've complained to them about this ourselves and we hope that some of these inelegancies can be corrected.

    Then with regard to convergent evolution. We make clear, as Mark F also notes, that we use this as a broad cover term, and that you'd need more (historical) evidence on language relationships to be able to clearly tell the difference between, for instance, convergent and parallel evolution. We also distinguish convergent evolution in biology from convergent evolution in culture and language (which at one point we call 'convergent cultural evolution'). The point is that languages are culturally evolving systems, so the items that are converging are cultural items.

    And so we observe that even in unrelated languages (i.e. in languages that, as far as experts can tell, show no evidence of being phylogenetically related), the form of the huh-like word is suspiciously similar. It is *that* phenomenon for which we propose that convergent cultural evolution is a plausible mechanism. More plausible than positing innateness for this particular interjection, which runs into all sorts of problems as we outline in the paper. More plausible, too, than another mechanism that we're aware of that may cause words to sound similar across unrelated languages: sound-symbolism, the kind of stuff you refer to when you mention De Saussure and onomatopoeia. (The point is that onomatopoeia get their similarity by reference to language-external sounds that can keep serving as attractors across different languages.)

    I'm in transit right now so I won't be able to check in regularly. But thanks for all of the useful points made so far!

  29. Marek said,

    November 10, 2013 @ 4:16 am

    This doesn't strike me as particularly mindblowing research, but perhaps I'm biased, because like with the Russian example mentioned, I can't think of any Polish equivalents of "huh?". The closest equivalent would be "(h)mm?" with rising intonation, which as far as I am aware also exists in English as similar but distinct from "huh". Maybe repair "words" can have hyponyms?

  30. Rubrick said,

    November 10, 2013 @ 4:21 am

    I'm a bit disappointed by the civility of the exchanges in the comments so far. Given the topic, if they were to grow more heated, and especially if they were to evolve into a full-on flame war, we would truly have a schwa-fire on our hands.

  31. Martin said,

    November 10, 2013 @ 4:58 am

    @ mark

    Thanks for the answer. Concerning the onomatopoeia, as I said, I did not think that you were talking about mere similarities of words across languages, as also Saussure knew that (though he doubted the symbolic character of interjections), and in one of the references you refer to Saussure for a claim that is to be refuted.

    Thanks also for the "ǂĀkhoe Haiǁom" clarification, I'll look that up (not because I don't believe you, but I find that interesting more generally.)

    As I also said, I have no problem with the "plausibility" of your claim. But first, you claim you have evidence for the convergence of "huh?", when really you have none. And then, there is not even more general evidence anywhere else for such a process suggested or referenced. That languages "evolve" in some sense or other is a no-brainer (and also something Saussure spent a whole chapter on) – you make a claim about a specific process – that neither you, nor, as it seems, anybody else has ever observed in linguistics – and for a specific word, at that. The analogy to biology is little more than borrowing the word "convergence", and given that you deny the very mechanism there to be at work in your supposed process, there really seems to be no there there. Has the process actually been observed for "huh?", and has such a process ever been observed, at all – or is this a plausible speculation, but nothing more? That it is somehow more plausible than innateness is not a strong argument in the absence of evidence, that's also true for phlogiston theory vs. a theory of four basic elements.

    Also, I still don't get the "suspicious similarity" thing. In your paper you refer to this in terms of a sorta-kinda violated prior probability of something you'd expect to occur. You express surprise at several points that the scale of phonetic (or phonemic) realisations of "huh?" is so limited given the much larger and wider sound inventories in the world's languages. But apart from the fact that I think you got the Maddieson source wrong (he counts according to a definition of speech sounds, not phonemes, as you claim), that means that you must have expected to have a broader range of phones realised, or more often, or whatever. But what, exactly, did you expect? What reason is there, for example, to expect any number of different vowels in a specific lexical item – with specific distribution, a specific number of syllables, (a) specific semantic field(s), a specific signifiant/signifié relationship – based on the observation that there are, on average, 6? At the very least, the frequency distribution of vowels within languages and the distribution of those languages exhibiting an average number of those sounds should be analysed. A limited variation is much less of a surprise in a language that either generally exhibits few vowel sounds (or belongs to a phylum with this feature), or that has most vowels occurring at places of articulation of limited range (even if the total number of vowels are widely spread in this regard). Geographical clusterings (in either direction, i.e. having either a lot, the average number of, or very few vowels) of languages is strongly suggested by the map accompanying your Maddieson source. If you want to make any inference about your sample of languages based on Maddieson/Ladefoged, you also have to make sure that your sample actually reflects the underlying distribution of the Maddieson/Ladefoged survey.

    And also their contingent probabilities, though it's difficult to say contingent on what, as you never make any argument as to what aspects of "huh?" are important in this respect. E.g. what would one expect as probability of an open-mid back vowel in interjections? Or monosyllabic interjections? Or should conjunctions also be included, and generally words whose signifié is hard to pin down, too (as they all have their special place in semiotics)? Determiners, then? What about bound morphemes? If not, why not? And if so, why? So, you find the observation of very limited variation "striking" and "wholly unexpected", but you never actually tell us why – and indeed, it never seems that you asked what you should have expected. There is only this loose reference to average sound inventories, but no reason why average sound inventories (rather than the modal, or something) should be important – even to some extent – , even less so in a word that exhibits so many specific properties (apart from its phonetic make-up, of course) cross-linguistically.

    Then, you suggest that the phoneme inventory of a language has some relation to the presence of a glottal stop/fricative in "huh?" Why? I ask that because in the Maddieson analogy, a claim of lacking relationship refers to phones, if I have that right. So, what exactly is the working hypothesis? Again, there is no relation established, only some loose ex-post rationalisations that happen to fit the story, and for which no explanation is provided.

  32. Martin said,

    November 10, 2013 @ 5:10 am

    @ Rubrick

    I am sorry for that. Just keep in mind that critical comments lacking civility are mostly written by attention-seeking trolls, often without realising it themselves. That does not make those comments go away, but easier to tolerate. I'll try to do better.

  33. Mark said,

    November 10, 2013 @ 6:21 am

    Off topic but this reminds me of a discussion or comment here once pointing out how words that mean 'big' have similar characteristics across languages i.e. lifted palate and open lips for a 'bigger' sound. Does anyone remember that? Has it actually been researched?

  34. Observation said,

    November 10, 2013 @ 8:07 am

    吓? (Cantonese)

  35. Victor Mair said,

    November 10, 2013 @ 8:18 am


    When you cite something in Chinese characters, please provide Romanization for the vast majority of readers on this list who do not read them. In this case, Jyutping or other recognized Romanization would be in order.

  36. Victor Mair said,

    November 10, 2013 @ 8:21 am

    @Reinhold {Rey} Aman:

    Thanks for letting us know about German "autsch". Since German and English are so closely related, I'm not surprised that "autsch" is virtually identical to English "ouch". What would be very surprising to me is for "ouch / autsch" independently to arise in many unrelated languages, which, if I'm not mistaken, is being asserted for "huh" in the paper under discussion.

    The process whereby "huh" is alleged to have become a universal word is reminiscent of the phonosymbolism that I discussed here:

    and here:

    See also:

    "Phonosymbolism, etymology, and the nebulous Chinese word family"

  37. Nhan said,

    November 10, 2013 @ 10:11 am

    My mother tongue is Vietnamese. We ask 'hả' when we don't hear clearly.
    Mostly Southern Vietnamese use this interjection to ask, or to elicit agreement.

  38. Mark F. said,

    November 10, 2013 @ 10:57 am

    Martin – I thought it was pretty clear that they were talking about mere similarities of words across languages. But I'm not sure if there is a difference between "words in different languages with the same function and similar sounds" and "the same word with slightly varying pronunciation across languages". And when you say "as also Saussure knew that," I'm not sure what it is that you're saying he knew. Just that there were examples of words sounding the same across multiple languages? But that's not the same claim.

    Also, as I understand it (but I'm not a linguist) there is evidence for evolution towards more reduced forms, as a general pattern that happens across languages. (This is my interpretation of my recollection of something from John McWhorter's The Power of Babel.) So it seems like there is a recognized evolutionary process within language that would have the desired effect.

    As for what the expected variation in sound form would be, I think they answer that by bringing up other repair-request words (like "what") that exist in parallel and do differ a lot across languages. I suppose they could have tried to quantify the variation across those words too, as a reference standard for the expected amount of variation.

    As for the phlogiston theory, I thought that was perfectly good science that just turned out to be wrong.

    Basically, all they are asserting is that that particular interjection sounds so similar across lots of languages because it's so easy to say, and it occurs at a conversational point where an easy-to-say word is needed. It looks to me like they've given a solid observation and a good best-candidate explanation. I don't think the paper is revolutionary, but I do think it's interesting.

  39. Peter S. said,

    November 10, 2013 @ 10:59 am

    Google Translate will translate "huh" into a number of other languages, and the only one I found which doesn't support the paper's thesis is Swedish, where "huh" is translated as "va" ("what"). Does Swedish have a form of "huh"?

  40. Dan Lufkin said,

    November 10, 2013 @ 11:31 am

    Well, the Swedish Academy's wordlist has hu = expression of displeasure, etc. Swedish also has ha (semi-voiced on the inhale) as a token of agreement. It is the correct response to "Talar du svenska?" Said on the exhale, it means "have."

    And FWIW autsch is in my 1984 edition of Sprach-Brockhaus = outcry with pain (Ausruf des Schmerzes).

  41. Josh Treleaven said,

    November 10, 2013 @ 12:31 pm

    I would be interested to know whether "huh" is a useful word when you're talking to somebody with whom you share only a beginner's understanding of each other's language.

  42. mark said,

    November 10, 2013 @ 4:12 pm

    @Martin F., for a chemist you ask very well-informed linguistic questions! Regarding your inquiry about our reasoning behind the point about the vowel space, here's the argument.

    0. We start with the functionally-structurally defined notions of "OIR interjection" and "OIR question word" (OIRI and OIRQ for short), not caring about their sound.

    1. The vowels of any given language tend not to cluster closely together but make good use of the available articulatory/perceptual space.

    2. This means that if you pick any random word within a language and you take its vowel, it could be any of the vowels from its vowel system. There seems to be little reason to expect a skewing of particular words to particular vowels.

    3. Now pick the OIRQ in 10 languages. If it's just like any random word in all of those languages, the OIRQ could have any vowel from that language —no a priori reason to expect that its vowel will fall in a specific part of the vowel space— and therefore, across the ten languages, the OIRQ is expected to feature vowels that differ in just the same way as the vowels of random other words across languages differ. This is borne out; see Table 1 in the article for a list of OIRQ-equivalents in 10 languages.

    4. Now let's pick the OIRI in those same 10 languages. Again, if it's just like any random word in all of those languages, the OIRI could have any vowel and therefore across the ten languages you expect that the vowels of the OIRI's will differ just like the vowels of other words differ across languages. This is NOT borne out: you find that the vowels of the OIRI all happen to be in the same corner of vowel space.

    In sum, if the OIRI behaved like a regular word (or like the OIRQ), you wouldn't expect to find this. This is the phenomenon we investigate in the paper. We then go on to see in what ways it does and doesn't behave like a regular word; and to discuss various reasons for why it is the way it is, settling on convergent cultural evolution as the most plausible explanation. Regarding the role of convergent evolution in our account, I find Mark F.'s clarifications insightful and to the point.
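
    Steps 2 to 4 above could be sketched as a small Monte Carlo comparison. This is only an illustration, not the analysis from the paper: the vowel coordinates, the ten "inventories", and the clustered "observed" set are all made up, and the uniform draw is exactly the assumption that is questioned elsewhere in this thread.

```python
import math
import random
from itertools import combinations

# Hypothetical inventories: each vowel is a (backness, height) point in a
# unit square. The coordinates are illustrative, not measured formants.
FIVE_VOWELS = [(0.0, 1.0), (0.1, 0.5), (0.5, 0.0), (0.9, 0.5), (1.0, 1.0)]
THREE_VOWELS = [(0.0, 1.0), (0.5, 0.0), (1.0, 1.0)]
INVENTORIES = [FIVE_VOWELS] * 6 + [THREE_VOWELS] * 4  # ten made-up languages


def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def simulate_spreads(inventories, trials, rng):
    """Draw one vowel per language uniformly at random, many times over."""
    return [
        mean_pairwise_distance([rng.choice(inv) for inv in inventories])
        for _ in range(trials)
    ]


def fraction_at_most(spreads, observed):
    """Share of simulated draws that are at least as tightly clustered."""
    return sum(s <= observed for s in spreads) / len(spreads)


rng = random.Random(0)
spreads = simulate_spreads(INVENTORIES, trials=2000, rng=rng)
# Hypothetical tightly clustered set of ten vowels, again made up.
observed = mean_pairwise_distance([(0.45 + 0.01 * i, 0.1) for i in range(10)])
share = fraction_at_most(spreads, observed)
```

    A small `share` would mean that uniform random picks rarely cluster as tightly as the observed set. Swapping `rng.choice` for a frequency-weighted draw is the obvious variation to try.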

    Finally, as to claims that our finding is 'not revolutionary' or 'not surprising', we are okay with that. We observed an unexpected phenomenon in language, formulated a research question, investigated it in ten languages from around the world, proposed an explanation for it, and published a peer-reviewed paper about it.

  43. mark said,

    November 10, 2013 @ 4:16 pm

    To all who report that they know languages in which a word equivalent to "what?" is used: hold your breath, we write about this in the paper (and in a previous publication on huh? and what? in 21 languages). So far, all of the reports we have seen mention a question-word based expression, but also an interjection like the one we describe.

    To all who report that they know languages in which it's not [huh] but something like "m?": hold your breath, we write about this in the paper. Closed-mouth variants of the interjection are found in all of the 10 languages we study, though they are not the most common variant in any of them. We analyse them as underarticulated versions, and we do not exclude the possibility that they are more common in some languages.

  44. Wannes L. said,

    November 10, 2013 @ 4:24 pm

    If someone is going to repeat this research for more languages, it would be interesting to look at the affirmation grunt uhuh too, because it seems to share a lot of the characteristics that were found by the authors for huh? as well.

    In a master's thesis about polar answers, which I would perhaps not like to read again, I once devoted a small chapter on the phonetics of the word for 'yes' in the world. There were indications that there may be a non arbitrary connection between form and meaning of words meaning 'yes'.

    For more than a third of the 201 'yesses' I found in some 100 languages worldwide, the word for 'yes' could be produced with minimal articulatory movement, similar to uhuh (and also similar to huh, actually), i.e.:
    – short as a whole (few syllables)
    – mid vowels
    – glottal consonants
    – nasal consonants and vowels
    – no difficult phoneme clusters

    I realized my analysis was at least in part subjective and I stayed away from whether those statistics were really significant – to this day I have no idea how one would be able to prove that, and from the comments I see that the authors have a hard time convincing everyone for huh? in this respect.

    I also didn't assess whether these words for 'yes' were always 'true' lexical items in their own right or rather paralinguistic utterances, but for quite some languages it was the only mechanism to give a positive answer to a polar question.

    Steve Parker compiled a list of words for 'yes' which match his template of a universal phonological form for 'yes' /he?e/ which he posited. Unfortunately, he did not keep count of how many 'yesses' did NOT match the template.

    I think a lexical item in Universal Grammar is a very strong hypothesis and hypothesized that backchannels in individual languages may be an etymological source to grammaticalize into a positive answer particle.

    There are some reasons I could come up with why it would be a good thing for a backchannel not to be phonetically very salient (I'm not sure if I'm convinced by the reasons which the authors of Is "huh?" a universal word? give for why it should be the case for huh?.):
    1. You don't want to suggest a speaker turn (that is, backchannels are used to let the speaker continue, not to interrupt him or her)
    2. Since a backchannel doesn’t need to have a specific lexical meaning (but just a pragmatic one: 'still listening, please continue'), it doesn’t need a clear phonology that can be distinguished from other lexical items (which is probably a reason why within many languages there are different possible ways of pronouncing the affirmation grunt)
    3. Backchannels are used a lot. (Nigel Ward had a sample of 79 American English conversations in which um appeared to be the 6th most frequent item – after I, and, the, you and a.)

    How do we get from a backchannel mechanism to a full 'yes'? First of all, humans are humans: when they don't agree, they will want to interrupt – a backchannel does the opposite. Also, people often disagree when they don't reason the same way; when you do follow the same reasoning, you backchannel. The thin line can also be demonstrated the other way by 'yes' particles that did NOT evolve from backchannels: English yes (< gea 'yes' (< demonstrative) + si (3PL subjunctive of beon 'to be')) and French oui (< oïl < Lat. hoc ille (fecit)) are both sometimes used as backchannels. In this function their weaker variants yeah and ouais are perhaps even more common, which is an argument in favour of the "minimal articulation for backchannels" theory.

    Anyway, all I dare say is that the gut feeling one gets from looking at the words for 'yes' in a larger sample of languages, and the hypotheses I made for the phenomenon, share a lot of characteristics (some of which would perhaps be competing/contrasting) with the conclusions which Mark Dingemanse et al. drew from looking in detail at huh?, and I believe it would deserve a closer look.

  45. R.L. said,

    November 10, 2013 @ 5:16 pm

    (Finnish. E.g. Youtube "code" XWez2ZY1Aio – Sleepy Sleepers: "Kuka mitä häh")

    There are also synonyms "täh?" (probably from "mitä?" ("what?") – e.g. in Lord Est: "Häh tä!" intro, Youtube Ap0XfaBFhoU at 0:17) and "mitäh?"

    Finnish expression for "ouch" is "auts". Searching for it in Youtube seems to be a good way to find videos of stumbling Finns.

    Also, as far as I know, Swedish doesn't have "huh?", but "hmmm?" and "va?".

  46. Steve Kass said,

    November 10, 2013 @ 6:14 pm

    @mark: You say

    There seems to be little reason to expect a skewing of particular words to particular vowels. … no a priori reason to expect that its vowel will fall in a specific part of the vowel space … you expect that the vowels of the OIRI's will differ just like the vowels of other words differ across languages…

    I think Martin F. wanted to know whether these statements, hence your claim that you observed an "unexpected phenomenon" are more than speculative, and your restatement of them here doesn't answer that question.

    So: How exactly do "the vowels of other words [typically] differ across languages"? You express surprise that "the vowels of the OIRI's differ" in a different way – is there a quantifiable difference between what you found for the OIRI's and what previous research on the vowel differences for words in general found? The observation that the "Huh?" vowels are in one quadrant of vowel space and the "What?" vowels aren't is interesting, but it's just one comparison.

    Wikipedia's article on Saussure (excuse: I'm not a linguist) suggests that interjections are known to have less variation across languages than other kinds of words. Did you find something unique to "Huh?" or something characteristic of interjections in general? Does the typical degree of spread for a word vary according to any property of the word (its function, its commonness)? By what numerical measures is "Huh?" an outlier, hence unexpected? Is "Huh?" also an outlier among just interjections, or among interjections that, like "Huh?", serve to interrupt, get attention, or the like, such as "Hey!", "Wow!", or "Boo!"?

    I have a feeling you might find similar results for "Boo!" Do you suppose it's a universal word, too?

    Thanks for engaging with non-linguists like Martin F. and me!

  47. Steve Kass said,

    November 10, 2013 @ 6:32 pm

    @Josh Treleaven:

    I would be interested to know whether "huh" is a useful word when you're talking to somebody with whom you share only a beginner's understanding of each other's language.

    My guess is that it would get the message across, but I don't think I've ever used it in that situation. However, the reason might be more a question of politeness than comprehension. "Huh?" strikes me as too informal, and most situations like you mention are with strangers. I don't think I even use "Huh?" in English except with friends. (Maybe not even "What?" – more likely "I'm sorry, could you repeat that?")

    Speaking languages where I know only a little, I'll either use a non-verbal cue (hand up to ear, tilted head and scrunched brow, and maybe a soft "Hm?" or however you spell that "Huh?" alternative), a "What?" word, or an "I didn't understand that" phrase, like "不懂", "mande", or "pardon."

  48. Barbara Partee said,

    November 10, 2013 @ 7:34 pm

    @Steve Kass: I really don't know Russian thoroughly enough to know what's a good translation of Huh?. It seems that the authors are using "a?". I know one one-word response "a?" that is definitely not 'huh?' — it's the conjunction 'a' which is a sort of contrastive 'and', and its use as a response is asking the speaker to give some explanatory follow-up, sort of like "So?" or even "And?". But there may well be a different 'a' which the authors are citing. We need people who are deeply enough bilingual to understand all the nuances of these little interjections.

    Oh, and about the glottal stop construction: huh? doesn't have one, but I think "uh-huh" has an initial one, and I know that "uh-uh" definitely has one in the middle (and I think also initial). Is that an argument that they are non-words in English? If that's so, it would at least make one want to re-examine the arguments for 'huh' being a word, since the three seem sort of like a family.

  49. Martin said,

    November 10, 2013 @ 7:47 pm

    @ mark

    Wait, you morphed me (Martin) and Mark F. into one commenter "Martin F.". I made no statements about the revolutionary (or not) character of your paper. I just do not understand your paper.

    0. I got that.

    1. I got that, too. That was not my point, however. I talked about vowel frequency. If you have a system of three cardinal vowels of type /iua/ you are maximally spread (high-front, high-back, low-central). (That's easy, chemistry's VSEPR model has almost the exact same concept, just referring to electron pairs!) However, it does not follow that /i/, /u/, and /a/ occur with a frequency of 33.3 percent each. Also, if e.g. German – listed as a language with the maximal vowel inventory of 14 by Maddieson – has (freely invented) 70 percent of its vowels occurring in some articulatory region mid-central to -front, where only about 50 percent of the total number of vowels are to be found, the surprise that a vowel in really ANY syllable of ANY chosen word is mid-central to -front is much less jawsome than the simple referral to the inventory makes it appear – meaning that you would rather expect it than not. Or perhaps a uniform frequency distribution does exist, I don't know, but neither of your references makes that point. For what it's worth, an analysis of a corpus of German does not appear to show that (also not if you sum up those vowel phonemes Maddieson would have counted as one sound):

    2. No, already here I think that this is not true, or ill-specified at the very least. Yes, it could be any vowel, but not any vowel with equal probability, see 1. This is not a self-evident assumption; it comes without references and is not verified in the paper. You did not look at the frequency distribution of vowels across languages, or control for the underlying distribution between languages of Maddieson's survey. Your surprise could as well be a "Meh, not so surprising, if you think about it."-surprise. Or not.
    But more importantly, that's exactly NOT what you did! You did NOT randomly choose a word. You limited your choice up-front via your definition in 0, and then chose that exact word in all the languages surveyed. It is not at all clear that this doesn't also limit the vowels to expect.
    An example springing to mind would be the uninflected indefinite article in Germanic languages. In your extended languages survey, that would be "ein" (German), "een" (Dutch), "a" (English), and "en" (Norwegian). Surprise! It's all shades of schwa with a half-assed "n" added (possible in English too, if only a vowel follows, so there)! Amazingly, you can extend that to various Romance languages, too, if you allow for a little more, but really not much more, variation – that includes the added "n", and French even parallels the English "n"-optionality. Point is, would there ever have been a reason to think that somewhat redundant elements like indefinite articles, which lots of languages simply do away with, would make use of the less frequent vowels in a language, or even of several syllables? I, for one, thinking about what an indefinite article is there for, would never expect /rororo/ or a non-garden-variety fringe-vowel like the labialized palatal approximant (as it occurs in French). Now, this is all extremely sloppy and I might be wrong – but I'll eat my shorts if the indefinite article is just as random as any word. It's not, or no more so than choosing only yellow balls out of a sample of yellow, green and black balls, which also differ in size with an unknown distribution, is a random choice of balls.*
    Khoekhoegowab enclitica serve as another example. While the majority of words in Khoekhoegowab start with one of the four (phonemic, there are many more counting accompaniments) clicks, they do not occur word-finally, and are rare in final syllables of form CV. Now, those enclitica are used to form nouns from word stems that are themselves not syntactically specified, and are added quasi as suffixes to those stems. But as such, of course, they will rather not contain clicks (and really you have b, kha, ku, s, ra, ti, -i, and n in the subject forms). If you extend the survey to other Khoe-languages, you'll expect a) to find similar noun-derivation, and b) no, or almost no clicks, in their enclitica.**
    Once you have chosen what to look at, there is absolutely no reason to think that the full range of the phoneme inventory is realised – or even the most common phonemes: the kind of sound katexochen that led to the false identification of a Khoisan-family is absent in Khoekhoegowab enclitica, and really nobody should be surprised once the phonotactics of Khoekhoegowab is accounted for!
    Now, not every language has indefinite articles, or enclitica to derive nouns. But that's the point: you specified a word functionally-structurally and found it to be universal. So, why, exactly, would you expect much variation across languages? You assert that you would, but you don't say why. It would be interesting, as far as I can see, to get an idea of whether the above-mentioned examples have something to do with similarities within more or less closely related language families (in which case also your extension to 31 languages is not as large as it seems, as you have several subdivisions like Germanic, Romance, or Bantu languages). Or whether certain specifications, if well-defined, lead to words that simply do not show much variability across languages, as long as they exist.
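
    The frequency argument in points 1 and 2 can be made concrete with a little arithmetic. This is a minimal sketch using the invented figures from point 1 (half of a 14-vowel inventory lies in the region, but 70 percent of vowel tokens fall there), and it additionally assumes ten languages that all share those figures and behave independently. None of these numbers are measurements.

```python
import math


def prob_all_in_region(per_language_probs):
    """Chance that every language's vowel lands in the region, assuming independence."""
    return math.prod(per_language_probs)


N_LANGUAGES = 10           # assumed sample size, matching the ten-language study
inventory_share = 7 / 14   # invented: half of a 14-vowel inventory is in the region
token_share = 0.70         # invented: share of vowel tokens found in that region

# Expectation if every vowel of the inventory is equally likely.
p_uniform = prob_all_in_region([inventory_share] * N_LANGUAGES)
# Expectation if vowels are drawn in proportion to how often they occur.
p_weighted = prob_all_in_region([token_share] * N_LANGUAGES)
ratio = p_weighted / p_uniform
```

    With these made-up inputs, `p_uniform` is roughly 0.001 and `p_weighted` roughly 0.028, so the frequency-weighted baseline makes the clustering about 29 times less surprising, although it remains a fairly unlikely outcome under either baseline.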

    For the convergence thing: I think I got that – but it's data-less, and plausible at best. I think your paper should have been clearer about that, because it's a very central claim (it's half of the title). Specifically, even though you explicitly distance yourself from an all-too-close analogy to the biological example, you still reference at least four biology papers. It's not clear why, precisely because there is no real analogy, but rather some idea, or rather inspiration. You could as well have linked to papers dealing with mathematical convergent series.

    * To say "indefinite article" is easy; to specify what it is, less so. E.g. it seems that quite a few languages use their singular indefinite articles as the numeral meaning 1, too. But not all, e.g. English. What does this difference entail? What about more distant languages that also make use of a word serving as an indefinite article?
    ** See Hagman "Nama Hottentot grammar", or Böhm "Khoe-kowap" for reference.

  50. Josh Treleaven said,

    November 10, 2013 @ 8:03 pm

    Bonus fact: in Minecraft, "huh" is the only word spoken by human villagers. It has a couple of different meanings, depending on its tone.

  51. Martin said,

    November 10, 2013 @ 8:49 pm

    @ Steve Kass

    Indeed, Saussure identifies "authentic onomatopoetic words" and "interjections" (which he sees "closely related to onomatopoeia"), as words that might – at least at first sight – contradict the arbitrary nature of the linguistic sign. But he doesn't think the objection holds. First, because they are simply not very frequent. And second, because even though often imitations of natural sounds (in the case of onomatopoeia), there are still differences and language change might destroy similarities completely. For interjections, specifically, he says: "One is tempted to see in them spontaneous expressions of reality dictated, so to speak, by natural forces. But for most interjections we can show that there is no fixed bond between their signified and their signifier."

    Now, this last bit seems important to understand what Saussure actually meant by "arbitrary". I'd have been happier if the paper – which refers to Saussure's "Course in general linguistics" for the arbitrary nature of the sign, in order to refute it – had been more explicit about that. I am not sure, but at least part of the definition of "arbitrary" in the paper seems to refer to the expectation that a word chooses arbitrarily among the phone or phoneme inventory of a language. Then they find that this is not so, and therefore – among others – Saussure is wrong, simplistically speaking. Now, apart from the fact that this might be true for an arbitrarily chosen word, but – to my understanding – not for a word that should be expected to be found in a certain quantile of the distribution, I do not think that this is what Saussure meant. See quote above first. And then this:

    "The word arbitrary also calls for comment. The term should not imply that the choice of the signifier is left entirely to the speaker (…); I mean that it is unmotivated, i.e. arbitrary in that it actually has no natural connection with the signified."

    If I read that correctly, Saussure would not have had a problem with a language where 99 percent of words contain just one of, say, 6 vowels; or if this is the case for a word in any number of languages. What he refers to, directly quotable, is the arbitrary relationship between the signifiant and the signifié, not the arbitrary nature of the realisation of the signifiant (that thereby itself gets a signifié, so it goes with semiotics). E.g. there is no reason to identify "huh?" with what it means, the relationship is arbitrary; whether the phonetic realisation of "huh?" is itself arbitrary is another question. There is a difference, I think, but it's a bit elusive and hard to pin down (so it goes with semiotics, again); I'd have loved to see a discussion, but perhaps that's rather clear to trained linguists.

    The English translation of the "Course" from which I took the quotes can be found here (quotes from pages 68/69):

  52. mark said,

    November 11, 2013 @ 2:17 am

    @Martin, sorry for the 'merge' operation — I got confused indeed.

    The most important point in response to your response is this: It's no surprise that the articles in Germanic and Romanic languages are similar, or that you find similar vowels in similar enclitics within the Khoekhoe languages. That's a result of shared inheritance (language relatedness), and nobody is surprised when that happens. Crucially, our reasoning holds for languages that are not (demonstrably) phylogenetically related. That is precisely why we didn't just look at Romance or Germanic, but at languages from different phyla around the world.

    To be totally clear, following this reasoning, it's perhaps not so surprising that English and Dutch have an OIRI that is similar (because the languages are otherwise related). But it *is* surprising that languages like Siwu, Cha'palaa, Murrinh-Patha, Mandarin Chinese, and Spanish have this incredibly similar form (because those languages are unrelated as far as we know).

    @Steve Kass, we discuss two possible explanations in our study. The first is 'innateness'. That one taps into the kind of phenomenon De Saussure was getting at for the interjections that are similar across languages: some interjections go back to instinctive cries that are hardwired into us, as seen by the fact that they're also found in other primates or that humans use them from a very young age (e.g. cries of pain, laughter). We think empirical evidence militates against this idea that 'huh?' is just an instinctive grunt. We also think that it's never a good idea to assume too much innate stuff for simple reasons of scientific parsimony. Do we have the evolutionary time to get all those instinctive cries somehow encoded in our genome? Perhaps for the very basic, corporeal ones that would've served our common ancestors well even before we spoke any language. Less likely so for this one.

    As we note on our explanatory website:

    ‘Huh?’ may seem almost primitive in its simplicity, but in fact nothing like it is found in our closest evolutionary cousins. It’s not an involuntary response like a sneeze or a cry of pain. Indeed, to have such a word, specialized for clarifying matters of understanding, only makes sense when a fully functioning cooperative system of communication (i.e., human language) is already in place — babies don’t use it, infants don’t use it perfectly, but children from about 5 have mastered it perfectly, along with the main structures of their grammar. If there is a plausible explanation that doesn’t assume it’s innate, we prefer that, on the standard scientific principle that it is best to keep to the simplest possible assumptions and explanations. In our paper we provide such an explanation: convergent cultural evolution.

  53. Martin said,

    November 11, 2013 @ 3:29 am

    No, actually that was not what I was getting at. But at this stage, if I didn't get across what I meant to say, I can only repeat myself without hoping to be clearer, so I'll leave it at that.

  54. Observation said,

    November 11, 2013 @ 8:10 am

    Sorry – the corresponding jyutping for 吓 is haa2 (and is not related to scaring in any way).

  55. Rodger C said,

    November 11, 2013 @ 8:50 am

    It's no surprise that the articles in Germanic and Romanic languages are similar. … That's a result of shared inheritance (language relatedness), and nobody is surprised when that happens.

    No it's not; the common ancestor of Germanic and Romance had no articles. It's an areal phenomenon.

  56. Peter S. said,

    November 11, 2013 @ 9:34 am

    The common ancestor of Germanic and Romance languages had a word for "one". Most of the indefinite articles in Germanic and Romance languages are derived from the word for "one". This may explain why they are similar.

  57. Rodger C said,

    November 11, 2013 @ 9:52 am

    I did speak too hastily. If what's at issue is the form of the indefinite article, sure they're related.

  58. Martin said,

    November 11, 2013 @ 1:10 pm

    @ Rodger C, Peter S.

    Like Mark, you didn't really bother to read my comment – which is understandable – it's long, boring, and probably not very intelligent. But this similarity because of relatedness of language is something I actually mentioned, so it's a bit odd to have it pointed out not once, not twice, but thrice.

    If relatedness is a relevant factor, then it is one further factor that is not controlled for in the paper (which would arguably also need a measure for the degree of relatedness, then). If being an Indo-European language has you assume similarities, then the explanatory power of the ten-language sample is about halved. If it's Romance and Germanic languages that are close enough to invoke relatedness – as you seem to suggest, and Mark, too – then they still have four languages in their sample that actually should only represent one sample point (I didn't check for other such clusterings). That would make their sample really small, and the extended sample not so much bigger either.
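
    The sample-size point can be illustrated by collapsing related languages into single data points and counting what is left. This is a rough sketch only: the list of ten languages and their family and branch labels are assumptions made for illustration and should be checked against the paper, and the helper name `count_independent` is hypothetical.

```python
# Assumed sample and classification, for illustration only.
LANGUAGES = {
    "Cha'palaa":     {"family": "Barbacoan",     "branch": "Barbacoan"},
    "Dutch":         {"family": "Indo-European", "branch": "Germanic"},
    "Icelandic":     {"family": "Indo-European", "branch": "Germanic"},
    "Italian":       {"family": "Indo-European", "branch": "Romance"},
    "Lao":           {"family": "Tai-Kadai",     "branch": "Tai"},
    "Mandarin":      {"family": "Sino-Tibetan",  "branch": "Sinitic"},
    "Murrinh-Patha": {"family": "Southern Daly", "branch": "Southern Daly"},
    "Russian":       {"family": "Indo-European", "branch": "Slavic"},
    "Siwu":          {"family": "Niger-Congo",   "branch": "Kwa"},
    "Spanish":       {"family": "Indo-European", "branch": "Romance"},
}


def count_independent(languages, level):
    """Number of data points left when languages sharing `level` count once."""
    return len({info[level] for info in languages.values()})


n_raw = len(LANGUAGES)                               # every language counts
n_by_branch = count_independent(LANGUAGES, "branch")  # one point per branch
n_by_family = count_independent(LANGUAGES, "family")  # one point per family
```

    Under these assumed labels the ten languages shrink to eight points when counted by branch and to six when counted by family. Which level is the right one to collapse at is precisely the open question.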

    I gave this example, which refers to similar words within language families, exactly because according to the very reasoning in the paper, all is supposed to be completely independent in different languages – there is no caveat in the paper. I argued several times that it seems counter-intuitive (for several reasons, not only relatedness) to think so, to no avail. And suddenly, in my example (though I followed a different line of reasoning) I should have considered a factor (namely language relatedness) that a priori limits what we should expect in terms of variation. I mean, seriously? Either you reason based on an in-depth knowledge of the languages involved, or you do it without any a priori-constraint. One can't be flabbergasted by limited variation because one expected otherwise based on the reasoning that languages choose randomly out of an assumed uniformly distributed phoneme inventory, but then protest against my example invoking the fact that this is self-evidently not the case – because one knows one or two things about how languages work and that there is something like relatedness to account for! I'd argue that there are other factors to at least consider (I expressed some vague idea above that words called "syncategorematic" by some semioticians might maybe, perhaps, qua their nature, not be expected to include the more fringe-y sounds of a language's inventory. I don't know how to find out if that is nonsense or not, but I am just a dog on a computer. Would be interesting to find out what one has to control for. You seem to agree, but I am not sure that you agree that you agree.)

  59. mark said,

    November 11, 2013 @ 2:44 pm

    @Martin, sorry for the brevity and yes, some of your comments were TL;DR. In my defense, I think I did note I wouldn't be able to comment extensively here due to travels and other commitments.

    I didn't see a coherent counterproposal that might explain the similarity of the OIR interjection in Siwu, Cha'palaa, Lao, Mandarin, Murriny-Patha, and Spanish, say. Do you lean towards a monogenesis account of language and think that 'huh?' has been with us all this time? Do you think the selective pressures we've sketched don't play a role at all? What exactly does the syncategorematic point have going for it, and can you explain how it could cause the similarity of an item like this across unrelated languages? (I would be interested in a serious study of this phenomenon, but I'm with you in not being aware of published work that would speak to this question.)

    Yes, our sample is what typologists call a 'convenience sample'. The reason we study those 10 languages and not a randomly selected set of 10 (or 50, or 1000) ones, controlling for relatedness and areality, is because the kind of data one needs to draw from in a comparative study of conversational structures is simply not yet available for many languages beyond the 10 we study (apart perhaps from the 21 additional ones we cite), and therefore one has to work with what one can get.

    Why, you might ask, didn't we throw out all but one of the Indo-European languages (which I agree for some parts of the argument in the paper might be counted as one datapoint in response to Galton's problem)? The reason is that even from the diversity within IE (for instance the intonational difference between Icelandic and Dutch, and the consonant and vowel differences between those two and Spanish) we learn something about one of our research questions: namely whether the OIRI is a word (i.e. shows integration and conventionalisation). Recall that the paper is not just about the question of the universality of the OIRI (and the reason for this); it's also about its word-hood (and the reason for this).

    We fully agree that it is desirable to be able to control for language relatedness, areality, and other interesting possible effects of phonotactic skewing due to word class (which you seem to allude to), but the plain fact is that this is, at present, the best we can do. You seem to feel that there might be a serious case of Galton's problem (of controlling for phylogenetic relatedness) here. We don't think this is the case because of the high internal diversity of the sample (apart from the IE languages), which, it is true, we don't stress in the paper. It is possible that you think that relatedness may explain all; that innateness is a better explanation; that you disagree with our convergent evolution account; or that you find that it could all have been phrased better. Science is always a work in progress, and we can't wait to get our hands on more data to do a more rigorous test of the proposals. I take your comments to be in that spirit and I thank you for thinking along critically.

    I'll have to leave it at this. It is quite true that I haven't been able to read through your (and others') quite verbose comments so I might've left some points hanging, for which I apologise.

  60. David Eddyshaw said,

    November 11, 2013 @ 7:24 pm


    Compounding your offtopicness:

    I know of at least one study relating, if a bit indirectly, to sound symbolism and size words: there's a paper by Edward Sapir (no less) in DG Mandelbaum's collection of his "Selected Writings in Language, Culture and Personality" called "A Study in Phonetic Symbolism" which used made-up words of the pattern mVl, stated to mean "table", and asked respondents to say whether the words sounded as if the table was smaller or larger. Almost all the respondents were English speakers, but there were some Chinese. A "mal" is consistently bigger than "mil". There's a lot more to the article, which is worth tracking down just because it's fascinating (if not wholly persuasive …)

    I think Sapir did quite a bit in this general area of sound-symbolism, but am unfortunately too ignorant of the field to know if anyone has taken these things up much subsequently. Somebody here will know …

    I share your vague memory that there was a thread about this on LL once. It certainly isn't hard to reel off crosslinguistic pairs of "big" words with low vowels and "little" words with high vowels.

    I shall suppress an urge to mention "tinny" words and "woody" words at this point.

  61. Asya Pereltsvaig said,

    November 11, 2013 @ 10:59 pm

    @ Barbara Partee: let me throw a word of support to your doubts about the Russian "a?" To me the version in the YouTube video doesn't sound right for this context at all (having an actual exchange with "a?" embedded in it might be helpful, but I understand the privacy issues). To me it sounds very English-like to use "a?" this way. Your "And?/So?" explanation sounds more on the right track to me. I would just say "chto?" or "cho?" (with a ch rather than sh, as a native Saint-Petersburger).

    @mark: "this is not something you can reliably elicit" — why is that? just say something unclear to a speaker and see their reaction!

  62. Martin said,

    November 12, 2013 @ 12:47 am

    @ David Eddyshaw

    Thanks for the reference, I'll look that up (though, is that still up-to-date?). Apart from that, I am anything but off-topic. The motivation of the paper hinges on the assumption that there is actually something to explain (doh!). Pertaining to an example with indefinite articles in Germanic and Romance languages, everybody seems to agree that there is absolutely nothing to explain, because those languages are too related, so nobody expects much variation (though, I do not fully understand why not: definite articles seem to vary much more, at least between Germanic and Romance languages. Which brings me again to the basic problem that nowhere is it even hinted what, exactly, to expect, and why: Are Germanic and Romance languages far enough apart or not? This can't be answered ex post depending on whether the chosen example fits your story or not. There must be some criterion. Is Niger-Congo all just one data point, or has Bantu diverged enough from the Niger-Congo languages of West Africa? I do not think they have an answer to that.). I.e. there are factors to account for that might leave you with no residual phenomenon to explain (or the opposite). The paper did not look for such, but just assumed that the phenomenon is striking. I have some very vague ideas about other things to look at, but then I also have not the slightest idea what I am talking about. There seems to be some faint agreement now, at least, that they did not just pull a word randomly out of different languages, but I am not sure. But, as has already been mentioned: the sample gets really small if you throw out sample points due to language relatedness (especially if you go far enough up the ladder of the family tree) – that it's also very diverse is of little importance here: if you have only a couple of data points, you have a statistical problem. What about language contact, by the way (e.g. the fact that their Khoe language was in close contact with Bantu languages ever since we know it, and certainly before that)?

    In short: It has never been made clear what is to be expected, what is to be controlled for, and thus why there is actually anything to explain (or if it's significant). And though the explanation delivers the result and apparently avoids problems related to ideas about innateness, it has itself not a single data point.

  63. mark said,

    November 12, 2013 @ 6:12 am

    @Asya, Barbara, the Russian corpus was collected by Julija Baranova in the Chelyabinsk region. Perhaps you're tapping into some dialectal variation here? If you really want to know I can send you some more sound samples — you can find my email on my homepage.

  64. mark said,

    November 12, 2013 @ 6:12 am

    Oh, and @Asya, of course we also find examples of shto, as we also mention in our article.

  65. mark said,

    November 12, 2013 @ 6:53 am

    And another thing @asya: you write "just say something unclear to a speaker and see their reaction!"
    Yes, that'd be an interesting elicitation task, but experience shows that what you get depends a lot on social asymmetries. I.e. you can't trust that what *you* elicit as a fieldworker/researcher is representative of what people use most commonly in their own everyday face to face informal conversations. It would certainly be informative though, and it's a possibility we have considered trying out ourselves to do additional elicitation. For us it is methodologically important, however, to make sure that what we research represents sort of a baseline of informal conversational behaviour.

  66. Boris said,

    November 12, 2013 @ 1:39 pm

    I can vouch for the fact that the Russian "a?" is used exactly in the same way as the English "Huh?". It cannot mean anything like "so?" in the interrogative form.

    The interesting thing about "uh-huh" (or "mm-hmm" or however you want to spell it) in Russian is that it is near identical in pronunciation to English despite the fact that Russian doesn't have an "h" sound. Does that make it more of an instinctive word than "huh"?

  67. speedwell said,

    November 13, 2013 @ 2:44 pm

    My Northern Irish husband says "uh?" with rising intonation for "eh?" ("I didn't hear that" or "I didn't understand what you meant"), "eh" (general noise of mild disagreement), and "aye" ("yes"). I don't mean he replaces those three words with his word, I mean that's actually how he pronounces all three… at least when he's speaking in a low voice. If I ask him to differentiate them, he can enunciate them more clearly.

    Since I was not, myself, brought up anywhere within a day's sail of his hometown, it's hard enough for me to understand what he actually does enunciate clearly (before you ask, Skype). :)

  68. mark said,

    November 20, 2013 @ 3:04 am

    In response to the widespread media coverage of our study (some of which was, predictably, overblown) we've added a Frequently Asked Questions page to our mini-website, with pointers to the relevant sections of the paper:

    Some LL commenters will see their questions answered there as well.

  69. mark said,

    November 25, 2013 @ 12:12 pm

    Although we seem to be long past the due date of discussion here, just for the record I want to come back to the fact that both Barbara Partee and Asya Pereltsvaig voiced concerns that they didn't recognise the Russian "a?" as a possible initiator of repair. Of course, against these opinions, Boris above already noted that according to him, "Russian "a?" is used exactly in the same way as the English "Huh?"."

    But more conclusively, I have since shared a relevant portion of the data underlying our analysis with Asya Pereltsvaig (the audio recordings were helpfully made available by my colleague Julija Baranova, who investigates OIR in Russian). After listening to a number of full OIR sequences, Asya has gracefully conceded that "a? is indeed used in the OIR function in Russian" (on her GeoCurrents blog).

    This kind of exchange is quite typical for many of the conversations we've had concerning this study. It seems there is no guarantee that native speakers or even expert linguists will be aware of such crucial and common aspects of everyday language use as the one we studied. This is one of the reasons we have gone to great lengths to ground our claims in representative corpora of informal conversation.
