Common language

Provocative research results reported in Sci-News (9/13/16), "Unrelated Languages Often Use Same Sounds for Common Objects and Ideas, Research Finds":

A careful statistical examination of words from 6,000+ languages shows that humans tend to use the same sounds for common objects and ideas, no matter what language they're speaking.

The article reports on findings of a research group led by Morten Christiansen of Cornell University, with colleagues from Argentina, Germany, the Netherlands, and Switzerland.  The team "analyzed 40-100 basic vocabulary words in 62% of the world's more than 6,000 current languages and 85 percent of its linguistic lineages."

Damián E. Blasi, Søren Wichmann, Harald Hammarström, Peter F. Stadler, and Morten H. Christiansen, "Sound–meaning association biases evidenced across thousands of languages", PNAS, published online September 12, 2016; doi: 10.1073/pnas.1605782113.

Significance

The independence between sound and meaning is believed to be a crucial property of language: across languages, sequences of different sounds are used to express similar concepts (e.g., Russian "ptitsa," Swahili "ndege," and Japanese "tori" all mean "bird"). However, a careful statistical examination of words from nearly two-thirds of the world's languages reveals that unrelated languages very often use (or avoid) the same sounds for specific referents. For instance, words for tongue tend to have l or u, "round" often appears with r, and "small" with i. These striking similarities call for a reexamination of the fundamental assumption of the arbitrariness of the sign.

Abstract

It is widely assumed that one of the fundamental properties of spoken language is the arbitrary relation between sound and meaning. Some exceptions in the form of nonarbitrary associations have been documented in linguistics, cognitive science, and anthropology, but these studies only involved small subsets of the 6,000+ languages spoken in the world today. By analyzing word lists covering nearly two-thirds of the world's languages, we demonstrate that a considerable proportion of 100 basic vocabulary items carry strong associations with specific kinds of human speech sounds, occurring persistently across continents and linguistic lineages (linguistic families or isolates). Prominently among these relations, we find property words ("small" and i, "full" and p or b) and body part terms ("tongue" and l, "nose" and n). The areal and historical distribution of these associations suggests that they often emerge independently rather than being inherited or borrowed. Our results therefore have important implications for the language sciences, given that nonarbitrary associations have been proposed to play a critical role in the emergence of cross-modal mappings, the acquisition of language, and the evolution of our species' unique communication system.

It seems to me as though they are making a brief for the role of sound symbolism in the evolution of languages.  See:

"Phonosymbolism and Phonosemantics in Chinese" (1/13/12)
"What does "Schmetterling" sound like to a German?" (1/30/16)

Could the results of this study not also be interpreted as an argument for the existence of a Proto-Human?

Cf. "Ur-etyma: how many are there?" (7/6/14)

[h.t. John Hill]



41 Comments

  1. Mark Meckes said,

    September 14, 2016 @ 12:46 pm

    The statement "The areal and historical distribution of these associations suggests that they often emerge independently rather than being inherited or borrowed" in the abstract seems to pretty directly foreclose interpreting this as support for the existence of a Proto-Human. At most, it would seem to suggest some extremely vague guesses for what some words in Proto-Human might have sounded like, if it existed at all.

  2. JS said,

    September 14, 2016 @ 1:28 pm

    I am curious about the significance of "Significance," in which the authors (I assume) essentially assert the laughable falsehood that they are the first to observe that the relationship between sound and meaning is not 100% arbitrary. Then the "Abstract" proceeds to make fairer statements, generally speaking. Is this, like, doing tabloid science journalism's job for it, or…?

  3. Eidolon said,

    September 14, 2016 @ 1:59 pm

    This discovery could make sense of many of the difficult-to-explain cognates between distant languages, though I think it also complicates the process of establishing genetic relationships between languages even more, if we now have to account for independent convergence in basic vocabulary. As to whether it might indicate the existence of a primitive human language shared by all Homo sapiens, it's possible, and the way to test whether it's convergence or early divergence would be to see how many of these terms were present in their languages at the very beginning, as opposed to developing later on.

  4. Martin said,

    September 14, 2016 @ 4:10 pm

    I think the existence of Proto-Human would speak against the thesis professed in the paper: It would be a sign that these now 'merely' similar relationships between meaning and phonetic realisation are the result of a divergence of the phonetic forms from one another. I.e. this would be an argument FOR the independence of meaning and sound, or signifiant and signifié on another level. Note in this regard that Saussure gave the following example for this independence: French 'pigeon' goes back to Vulgar Latin 'pīpiō', which had at least strong elements of onomatopoeia. These having been completely obliterated in (still) modern 'pigeon' shows clearly (acc. to Saussure) that the linguistic sign is unmotivated ('immotivé').

    Note that this 'immotivé' is actually Saussure's explicit explanation of how to understand the notion of arbitrariness. In discussions today, it is often taken to have sort of a probabilistic meaning, implying e.g. a complete lack of association between signifiés for a given signifiant across languages, which is then shown to be untrue (remember 'Huh?'). Why this should be an implication and what this has to do with at least the Saussurean strand of the argument I'd like somebody who knows something about this to explain to me.

  5. Chris C. said,

    September 14, 2016 @ 4:29 pm

    "Could the results of this study not also be interpreted as an argument for the existence of a Proto-Human?"

    This may be a stupid question on my part. Do we need any further arguments for the existence of a Proto-Human? It seems to me that arguments analogous to those pointing to a single origin for all life on Earth should suffice. For instance, all life uses the substantially same genetic code; the same codons specify the same amino acids in nearly all cases with only slight variations. If life had arisen more than once, we'd expect to see more variation in the code. Doesn't the fact that all human languages can be analyzed in substantially similar ways in terms of grammar and syntax point to a single invention regardless of any other data?

    I suppose it's difficult to imagine how anything we might see as language might possibly be constructed using different principles, at least if you're not Ted Chiang. But if there had been no Proto-Human, wouldn't we expect to find some highly anomalous languages that don't seem to fit into the general paradigm in important ways? (Or perhaps we do and I'm just ignorant of them.)

    Maybe, if we knew what concepts humans were physiologically or psychologically prone to vocalizing in similar ways across languages, it would help us characterize Proto-Human at least in terms of phonetics, even if it wouldn't tell us much more about it.

  6. David L said,

    September 14, 2016 @ 4:38 pm

    It's easy to think up 'just so' stories for some of the examples mentioned. If you poke your tongue out and point at it you are likely to make 'l' sounds. If you scrunch up your nose and gesticulate you may well make nasal 'n' sounds of some sort. If you indicate roundness by opening your mouth in a round shape, you will produce 'r' sounds. You might signal smallness by hunching your shoulders and pursing your lips to make high-pitched 'eee, eee' noises.

    Full — hmm, that's tougher. Pat your tummy and open your mouth to make satisfied 'bah, bah, bah' sounds?

  7. Simon Fodden said,

    September 14, 2016 @ 5:16 pm

    To veer off topic… I notice that VM uses "the Netherlands." I sometimes do as well, but worry that I ought to be saying simply "Netherlands." The "the" used to designate some region less defined than a nation, I think: e.g. the Levant. Anyhow, what do other people call that delightful country?

  8. Adrian said,

    September 14, 2016 @ 5:28 pm

    @Simon Although one hears it a lot, I think that the argument that we shouldn't/can't call a country "The x" because [insert spurious reason here] is bogus.

  9. Christian Weisgerber said,

    September 14, 2016 @ 5:31 pm

    @Chris C.
    Here's a well-reasoned argument by Piotr Gąsiorowski that humanity has always been multilingual:
    https://langevo.blogspot.de/2013/04/too-many-to-communicate.html

  10. Jerry Friedman said,

    September 14, 2016 @ 5:47 pm

    Chris C.: Nearby languages influence each other, so it's possible that the descendants of Protolanguage 1 and Protolanguage 2 (if there were two) have influenced each other in ways that have wiped out any distinction.

    Also Protolanguage 1 and Protolanguage 2 and their descendants could be similar because of linguistic universals, common features that result from common features of humanity. I believe that topic has been discussed a lot in linguistics. I'm not a linguist and I don't know much about it.

    Martin: I think it would be interesting to study how these claimed similarities interact with the usual sound correspondences of related languages. If a language has a word for 'full' starting with /p/, and in some of its descendants that sound has changed to /f/ or /h/ or whatever, is there a tendency for those descendants to have words for 'full' that have /p/ or /b/ anyway? Maybe by an atypical sound development or borrowing or the appearance of a new word? Obviously this didn't happen in English with full, though.

    Simon Fodden: I say "the Netherlands" and "the Gambia", and I had trouble stopping saying "the Ukraine". (I'm a native speaker of American English.)

  11. Guy said,

    September 14, 2016 @ 5:58 pm

    It's too bad that it would be impractical and unethical to create a community of humans with no exposure to any existing language to see if and how they develop their own language, and then compare any resulting language to existing languages to get an idea of what "universals", if any, might be attributable to a genetic relationship. I understand that sign languages often "pop up" naturally but they usually inherit features from the local spoken language.

  12. Mark S said,

    September 14, 2016 @ 6:35 pm

    @Guy: Something similar has happened several times: Children exposed to pidgins tend to develop a Creole language, which borrows vocabulary from the pidgin, but which has its own grammar. Interestingly, that grammar is very similar across unrelated Creoles.

  13. Jenny Chu said,

    September 14, 2016 @ 7:08 pm

    @Guy: I would be interested in seeing a similar study on signed languages, which are popularly understood to be more "iconic" than spoken languages.

  14. Charlie said,

    September 14, 2016 @ 7:32 pm

    Guy: Even if it was ethical, that would be a very difficult experiment to do right. Where do you get people with no exposure to language? New-born babies. How do you raise new-born babies without using people who know language, and thus simply cannot avoid passing on that language in some form or another, and still be able to raise the baby without introducing factors that would negatively affect the experiment's results (e.g. never speaking to a baby or using language in any way might lead to a developmentally stilted person)?

    In fact, the more I think about it, it's practically impossible *not* to pass on your language to your children.

  15. Chris C. said,

    September 14, 2016 @ 7:53 pm

    @Charlie — Derek Bickerton had a proposal or two about that some time ago, and even suggested a study design that might conceivably be ethical.

  16. Y said,

    September 14, 2016 @ 10:54 pm

    Guy, Chris C., Jenny Chu: There are a number of sign languages which have developed spontaneously and independently, with no external influence. Such languages start out more iconic, but in later generations become more abstract and more expressive, i.e. like regular languages.

  17. AntC said,

    September 14, 2016 @ 11:24 pm

    Thank you Victor. I knew when I read of this research syndicated in my local paper it would turn up on LL.

    we find property words ("small" and i, "full" and p or b) and body part terms ("tongue" and l, "nose" and n)

    How can they write that with a straight face? 75% of their example claims don't apply to English, apparently. They could at least have quoted "little" with /i/, "labial" with /l/. Or are we to take a very broad interpretation of the indicator sounds? The /f/ in "full" is kinda like a /p/: front-articulated, aspirated.

    Or do they mean: there's a 'significant' incidence of some-word-or-other meaning roughly-this including some-sound-or-other roughly-that? Borrowing could explain a huge proportion of that. Is this any more of a valid exercise than poor Edo Nyland?

    As for the papa, mama, baba words: Otto Jespersen's explanation (discussed here long ago) seems completely convincing.

  18. Bathrobe said,

    September 15, 2016 @ 12:15 am

    In fact, the more I think about it, it's practically impossible *not* to pass on your language to your children.

    So once it takes hold, language is a virus you can't eradicate?

  19. Reinhold {Rey} Aman said,

    September 15, 2016 @ 12:18 am

    @ Guy:
    It's too bad that it would be impractical and unethical to create a community of humans with no exposure to any existing language to see if and how they develop their own language…

    See Wikipedia's "Language deprivation experiments".

  20. speedwell said,

    September 15, 2016 @ 12:25 am

    Am I missing something? I can't access the paper itself, but nothing I've seen in the linked articles and those linked to them says that the authors considered language groups more statistically important than the individual languages that comprise them. If I were trying to plan a study like theirs, I wouldn't count individual languages, I'd take the most common sounds in each language "family" and then compare those typical sound profiles with each other. If you take 500 words from a language family with 600 members in which the selected word typically contains an S, for example, and 100 words from a language family with 125 in which the selected word typically contains a B, doesn't that just mean that the study will find there are five times more S words than B words, and conclude that the S words must reflect some universal S-ness?

  21. Johan P said,

    September 15, 2016 @ 4:31 am

    @Simon Fodden – The Netherlands is the correct form in English, unless it appears in a list or in (say) a football score headline, in which case just Netherlands is customary. "*From Netherlands" is ungrammatical in English, as it would be for instance with the Gambia, the Czech Republic, the Philippines or the Bahamas, and indeed the United Kingdom and the United States themselves.

    However, this fact can change quite rapidly as well. "The Congo" or "The Sudan", common fifty years ago, now have a distinct colonial whiff to them and are considered obsolete. Some British newspapers persist with using "The Ukraine", but that, too, is quickly becoming old-fashioned.

  22. Bart said,

    September 15, 2016 @ 6:01 am

    @Johan P – I agree. Living here, I have no doubt about saying or writing in English THE Netherlands. The problem I have is deciding whether to write 'The' or 'the'. In some contexts upper case looks right; in other contexts lower case.

  23. Christian Weisgerber said,

    September 15, 2016 @ 7:12 am

    @Jerry Friedman

    If a language has a word for 'full' starting with /p/, and in some of its descendants that sound has changed to /f/ or /h/ or whatever, is there a tendency for those descendants to have words for 'full' that have /p/ or /b/ anyway? Maybe by an atypical sound development or borrowing or the appearance of a new word?

    Not quite what you asked for, but Latin lingua from Old Latin *dingua (cognate with English tongue) comes to mind. That's an irregular change that gave the "tongue" word an L.

  24. languagehat said,

    September 15, 2016 @ 8:18 am

    Doesn't the fact that all human languages can be analyzed in substantially similar ways in terms of grammar and syntax point to a single invention regardless of any other data?

    That is not a fact, it is a misguided idea that was briefly popular but has been convincingly refuted. I expect these "provocative research results" to suffer the same fate.

  25. GH said,

    September 15, 2016 @ 1:55 pm

    @Christian Weisgerber:

    The argument about the population bottlenecks is all well and good, but it doesn't really seem to address whether all the languages or dialects spoken at that time could still have derived from an earlier common ancestor.

    The first language or languages must (per definition) have arisen with the development of language – however you decide to distinguish language from proto-language. While some associate modern "symbolic language" with an "Upper Paleolithic Revolution" leading to "behaviorally modern humans" supposedly taking place around the time of these population bottlenecks, it seems like most theories place the development much further back in time (more like 100-350 kY, with some form of proto-language perhaps going back several million years), so it's the size of the population participating in that development, quite possibly linked to the evolution of Homo sapiens from Homo erectus/heidelbergensis, that is relevant.

    If proto-language evolved into modern "symbolic" language within a relatively small group of human ancestors, it was presumably originally a single language that then spread (perhaps along with the new species) and differentiated into different languages and language families: Proto-Human.

    If, on the other hand, the evolution (cultural and genetic) was widely dispersed across a big population, different groups must already (per the article's argument) have spoken different proto-language "dialects", with each then at some point separately accumulating enough complexity to become proper languages. So there would not have been a single original language, but neither would the first languages have developed in total isolation from each other, and their proto-language ancestors would have to be ultimately related at some point.

    On the other hand, the idea that totally independent languages sprang up without any contact whatsoever and from no closely related predecessors would seem to imply that the whole evolution of language happened multiple times in isolation, which seems immensely implausible.

  26. Chris C. said,

    September 15, 2016 @ 3:26 pm

    @Y — I'm aware of at least one such circumstance in Nicaragua, but I'm not sure even in that case the signers were sufficiently isolated from the concept of language to make a conclusive case, since there were at least efforts to teach them some kind of signing.

  27. Chris C. said,

    September 15, 2016 @ 3:50 pm

    @languagehat — What has been refuted? That all human languages can be analyzed in substantially similar ways? Are you saying that not all human languages have grammar, syntax, and some equivalent of phonetics?

    I'd appreciate information rather than a sneer, unless that's too much trouble.

  28. languagehat said,

    September 15, 2016 @ 4:12 pm

    @languagehat — What has been refuted? That all human languages can be analyzed in substantially similar ways? Are you saying that not all human languages have grammar, syntax, and some equivalent of phonetics?

    I'm saying there's no common ground of grammar/syntax by which all languages can be measured, a Chomskyan belief which has been refuted by showing counterexamples to all proposed commonalities; I thought that was what you were referring to. If there was any sneering, it was at him and his cockamamie ideas, not at you.

  29. John Roth said,

    September 15, 2016 @ 4:19 pm

    There's a rather well researched proposal for a basic set of meanings that occur in all languages. See https://www.griffith.edu.au/humanities-languages/school-humanities-languages-social-science/research/natural-semantic-metalanguage-homepage . There are a fair number of publications on the resources page.

    What this is not, very emphatically, is a claim that there are any similarities in sounds or in the words that instantiate the meanings.

    Since someone brought up the observation that creoles (the second generation after a pidgin) seem to have very similar grammar, I'll note that the 65 or so basic meanings in NSM have specific combinatorial properties, that is, the basics of a grammar. More research is needed on this point (actually, some, as in any, research is needed on this point).

  30. JS said,

    September 15, 2016 @ 7:47 pm

    Also @languagehat, these are largely not "provocative research results" at all, but widely-recognized phenomena supported by experimental and statistical results that go back at least to Otto Jespersen in 1922 (/i/ and smallness).

  31. Y said,

    September 16, 2016 @ 12:35 am

    Chris C.: I highly recommend Margalit Fox's Talking Hands, about Al-Sayyid Bedouin Sign Language, spoken in a small community in Israel.
    There are more. The Wikipedia entry for "village sign languages" lists about two dozen.

  32. Guy Tabachnick said,

    September 16, 2016 @ 12:51 am

    @Christian Weisgerber:

    *dingua > lingua may be irregular, but it's hardly unprecedented, and to attribute the Sabine l to sound symbolism seems like it would dismiss a lot of other interesting facts about the change.

    https://books.google.cz/books?hl=cs&lr=&id=RtsLKLhCZ0EC&oi=fnd&pg=PA311&dq=tim+pulju+indo-european+d+l+dl&ots=KrU_EOqVVk&sig=ODMnY2NFZ1-Qiu9Xwr4GrDx1BLw&redir_esc=y#v=onepage&q=tim%20pulju%20indo-european%20d%20l%20dl&f=false

  33. Victor Mair said,

    September 16, 2016 @ 9:00 am

    People are saying that "the theory is as old as the hills" (a supposed relationship between certain sounds and certain meanings) and indeed it is. See George van Driem, Languages of the Himalayas (2001), p. 153. The difference with the new research reported here is that they are attempting to prove the theory with massive statistical analysis of empirical data.

  34. J.W. Brewer said,

    September 16, 2016 @ 10:47 am

    To expand on a comment above, the basic Saussurean proposition that the relationship between sound and meaning is arbitrary has never been claimed to be an absolute and exceptionless rule, if only because Saussure was not unaware that onomatopoeia is a thing. And sometimes (as the example given in that comment suggests) it may work both ways – a word in a given language starts off as transparently onomatopoeic but then through the operation of regular sound changes over the centuries gets transformed to a point where that aspect of the "etymology" becomes opaque to anyone who doesn't know the history and the signifier has thus evolved into arbitrariness. So the question is how frequent exceptions need to be from the general pattern of arbitrariness for that to be dramatic or exciting news. Since the abstract contains lots of adjectives but not very many statistics and the full article requires $ to view, I reserve judgment on whether the claims made in the full article are actually worth getting excited about if true.

  35. J.W. Brewer said,

    September 16, 2016 @ 11:02 am

    Note also in the title of the article another instance of the phenomenon that came up in another thread last month, namely the use of the word "bias[es]" in a technical sense not intended to be pejorative, which raises the potential misunderstanding by a more general audience for whom "bias" is almost uniformly a pejorative word. Would just saying "sound-meaning associations" rather than "sound-meaning association biases" have weakened (or altered) the meaning of the title?

  36. January First-of-May said,

    September 16, 2016 @ 2:57 pm

    @Christian Weisgerber – it is almost certain that Europe had always been multilingual (for the last 20 millennia or so, anyway). That does not refute Proto-(Indo-)European. (Similarly for the Middle East and Semitic.)

    Even if the Toba bottleneck did not actually happen, it is entirely not impossible that most of the world's languages come from the same ancestor, perhaps as recently as 30-40 millennia ago (well after Toba).
    (I'm only not saying "all" because it's clear that Australia had been essentially isolated for the last 30,000 years or so [prior to European contact] – and whatever's going on in New Guinea hints at something similar – but even that would likely just add another 20-30 millennia; in any case probably well after the actual evolutionary development of language.)
    Of course, it is likewise not impossible that the Khoisan languages (or Basque, or some Tasmanian language, or *insert your favorite language isolate here*, or indeed Piraha for all we know) actually happen to be descended from a completely unrelated language acquisition event way back when. (And, of course, we probably couldn't find out either way.)
    It's just that there's been so much language spread in the last few millennia (Afro-Asiatic, Austronesian, Na-Dene, Pama-Nyungan… Altaic too, if that family is real) that there's no real reason there wasn't more in the past, and it doesn't take very much of it, over 60,000 years, for a worldwide common language ancestor.

    In any case, it is hardly disputable that there had been many historical languages and language groups that had gone the way of the trilobites, pterosaurs and sauropods.
    Presumably there had also been plenty of Paleolithic languages whose descendants didn't make it to the Iron Age (and it is entirely possible that, in fact, this includes all but one of them).

    (All of the above is entirely my opinion, incidentally. And part of that opinion might well change the next day.)

  37. Rodger C said,

    September 17, 2016 @ 10:53 am

    it's clear that Australia had been essentially isolated for the last 30,000 years or so [prior to European contact]

    There's been a good deal of talk about genetic and linguistic evidence for a Dravidian influx into Australia ca. 4000 BP, but I can't quickly find a non-popular exposition of the idea.

  38. BZ said,

    September 19, 2016 @ 11:13 am

    Re: "The Netherlands", I get the impression that any country name (or word in general) that is syntactically plural gets a definite article (e.g. "The United States of America"). In fact, even country names that are only conceptually plural (i.e. implying a collection of multiple areas, like "The United Kingdom") get an article.

  39. GH said,

    September 19, 2016 @ 2:16 pm

    It seems true that almost any proper place name that is plural in form gets a definite article: the Hamptons, the Alps, the Philippines, the Americas… Presumably this is because English lacks an indefinite plural article, so leaving the article out misleadingly suggests indefiniteness. This applies to plural-form country names even though they tend to use singular verb agreement (the USA, the Philippines, the Netherlands; the UN also seems to fall under this rule, and helps demonstrate that it applies even in abbreviated form).

    However, some other proper place names formed in the plural are nevertheless singular in construction, and these do not use the article. This particularly seems to apply to smaller features such as gardens/parks (Kew Gardens, Kensington Gardens), farms (Ford Farms, Green Acres), or neighborhoods (Beverly Hills, Washington Heights, Ashburn Estates, Summit Woods). Here use of the article would seem to signal a reference to the constituent elements, and require the plural ("Kew Gardens is…" but "The Kew Gardens are…"; "The Beverly Hills range in height from 200 to 500 meters.")

    But aside from that, country names that are phrases modifying generic nouns (e.g. "kingdom", "republic", "union", "state", "empire") usually take articles as normal. So I don't think we say "the United Kingdom" because of its conceptual plurality, but simply because "kingdom" is a regular noun that takes an article, just like "the British Empire" or "the Czech Republic" or "the Soviet Union".

    However, this second pattern doesn't apply to all nouns. For example, place names formed on the basis of "island" or "forest" usually dispense with the article ("Easter Island", not "the Easter Island"). I'm not sure what's going on there, but I bet the experts have looked into it.

  40. Adrian Morgan said,

    September 20, 2016 @ 9:35 pm

    I find this interesting in part because I discussed the possibility of studies somewhat like this in the 2003 undergraduate data mining paper discussed in the comments here (the main obstacle I pointed out at the time was that of defining a semantic distance metric). But I agree with languagehat and others that scepticism is the only appropriate response.

  41. David Marjanović said,

    September 23, 2016 @ 4:30 am

    the UN also seems to fall under this rule, and helps demonstrate that it applies even in abbreviated form

    Either that, or the singular actually applies to its full name – United Nations Organization.

    (Routinely called UNO in German and ONU in French.)
