Linguistic relativity: snow and horses


For the record:

"Do Inuit languages really have many words for snow? The most interesting finds from our study of 616 languages", The Conversation (4/10/25); rpt. in phys.org/news (4/13/25)

Authors:

Charles Kemp
Professor, School of Psychological Sciences, The University of Melbourne (PhD, MIT)
Ekaterina Vylomova
Lecturer, Computing and Information Systems, The University of Melbourne (The University of Melbourne, PhD/Computational Linguistics)
Temuulen Khishigsuren
PhD Candidate, The University of Melbourne (National University of Mongolia, M.A. in linguistics)
Terry Regier
Professor, Language and Cognition Lab, University of California, Berkeley (Ph.D., Computer science, UC Berkeley, 1992; frequent co-author with Paul Kay). Among his most-cited work is:

"Whorf hypothesis is supported in the right visual field but not the left",
Aubrey L. Gilbert; Terry Regier; Paul Kay; Richard B. Ivry.
Proceedings of the National Academy of Sciences of the United States of America (2006)

These two articles (The Conversation and phys.org/news) are journalistic accounts of the scientific study by Kemp, Vylomova, Khishigsuren, and Regier.

The full scientific paper is here:

Temuulen Khishigsuren et al., "A computational analysis of lexical elaboration across languages", Proceedings of the National Academy of Sciences (2025). DOI: 10.1073/pnas.2417304122

Journal information: Proceedings of the National Academy of Sciences

Significance

The vocabulary of any language emphasizes some areas more than others, and the number of terms for fish, cattle, smells, and other concepts varies across languages. Most work on lexical elaboration relies on manually compiled data, but we show how lexical elaboration can be measured using data from bilingual dictionaries, and use this measure to develop analyses of lexical elaboration that span hundreds of languages and thousands of concepts. Our work suggests several hypotheses about well-studied concepts (e.g. that smell terms are well developed in Oceanic languages), and opens up the investigation of concepts that are new to the literature on lexical elaboration (e.g. dance).
 

Abstract

Claims about lexical elaboration (e.g. Mongolian has many horse-related terms) are widespread in the scholarly and popular literature. Here, we show that computational analyses of bilingual dictionaries can be used to test claims about lexical elaboration at scale. We validate our approach by introducing BILA, a dataset including 1,574 bilingual dictionaries, and showing that it confirms 147 out of 163 previous claims from the literature. We then identify previously unreported examples of lexical elaboration, and analyze how lexical elaboration is influenced by ecological and cultural variables. Claims about lexical elaboration are sometimes dismissed as either obvious or fanciful, but our work suggests that large-scale computational approaches to the topic can produce nonobvious and well-grounded insights into language and culture.

Some highlights from the two journalistic articles cited above:

Languages are windows into the worlds of the people who speak them – reflecting what they value and experience daily.

So perhaps it’s no surprise different languages highlight different areas of vocabulary. Scholars have noted that Mongolian has many horse-related words, that Maori has many words for ferns, and Japanese has many words related to taste.

Some links are unsurprising, such as German having many words related to beer, or Fijian having many words for fish. The linguist Paul Zinsli wrote an entire book on Swiss-German words related to mountains.

One example of a concept we looked at was “horse”, for which the top-scoring languages included French, German, Kazakh and Mongolian. This means dictionaries in these languages had a relatively high number of

    1. words for horses. For instance, Mongolian аргамаг means “a good racing or riding horse”
    2. words related to horses. For instance, Mongolian чөдөрлөх means “to hobble a horse”.
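To make the idea of a per-language "score" for a concept concrete, here is a minimal sketch in Python. It is not the authors' method: it simply assumes that a concept's score is the share of definition tokens in a bilingual dictionary that match the concept word, and the names `concept_rate`, `rank_languages`, and the toy data are hypothetical.

```python
import re
from collections import Counter


def concept_rate(definitions, concept):
    """Share of English definition tokens equal to `concept`.

    A toy stand-in for an elaboration score, not the measure used in the paper.
    """
    tokens = [t for d in definitions for t in re.findall(r"[a-z]+", d.lower())]
    if not tokens:
        return 0.0
    return Counter(tokens)[concept] / len(tokens)


def rank_languages(dictionaries, concept):
    """Sort languages from highest to lowest rate for `concept`."""
    rates = {lang: concept_rate(defs, concept) for lang, defs in dictionaries.items()}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Invented toy dictionaries (English glosses only); not drawn from BILA.
toy = {
    "lang_a": ["a good racing or riding horse", "to hobble a horse", "a young horse"],
    "lang_b": ["a small boat", "to hobble a horse", "smell of fish"],
}

ranking = rank_languages(toy, "horse")
for lang, rate in ranking:
    print(f"{lang}: {rate:.3f}")
```

Normalizing by the total number of tokens is one simple way to keep a large dictionary from outscoring a small one merely because it is longer; the published measure may handle dictionary size quite differently.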

Our findings support most links previously highlighted by researchers, including that Hindi has many words related to love and Japanese has many words related to obligation and duty.

We were especially interested in testing the idea that Inuit languages have many words for snow. This notorious claim has long been distorted and exaggerated. It has even been dismissed as the “great Eskimo vocabulary hoax”, with some experts saying it simply isn’t true.

But our results suggest the Inuit snow vocabulary is indeed exceptional. Out of 616 languages, the language with the top score for “snow” was Eastern Canadian Inuktitut. The other two Inuit languages in our data set (Western Canadian Inuktitut and North Alaskan Inupiatun) also achieved high scores for “snow”.

The Eastern Canadian Inuktitut dictionary in our dataset includes terms such as kikalukpok, which means “noisy walking on hard snow”, and apingaut, which means “first snow fall”.

The top 20 languages for “snow” included several other languages of Alaska, such as Ahtena, Dena'ina and Central Alaskan Yupik, as well as Japanese and Scots.

Scots includes terms such as doon-lay, meaning “a heavy fall of snow”, feughter meaning “a sudden, slight fall of snow”, and fuddum, meaning “snow drifting at intervals”.

You can explore our findings using the tool we developed, which allows you to identify the top languages for any given concept, and the top concepts for a particular language.

The top-scoring languages for “smell” include a cluster of Oceanic languages such as Marshallese, which has terms such as jatbo meaning “smell of damp clothing”, meļļā meaning “smell of blood”, and aelel meaning “smell of fish, lingering on hands, body, or utensils”.

Prior to our research, the smell terms of the Pacific Islands had received little attention.

Much to their credit, the authors are careful to issue a set of thoughtful caveats:

Although our analysis reveals many interesting links between languages and concepts, the results aren’t always reliable – and should be checked against original dictionaries where possible.

For example, the top concepts for Plautdietsch (Mennonite Low German) include von (“of”), den (“the”) and und (“and”) – all of which are unrevealing. We excluded similar words from other languages using Wiktionary, but our method did not filter out these common words for Plautdietsch.

Also, the word counts reflect both dictionary definitions and other elements, such as example sentences. While our analysis excluded words that are especially likely to appear in example sentences (such as “woman” and “father”), such words could have still influenced our results to some extent.

Most importantly, our results run the risk of perpetuating potentially harmful stereotypes if taken at face value. So we urge caution and respect while using the tool. The concepts it lists for any given language provide, at best, a crude reflection of the cultures associated with that language.
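The filtering caveats above can be illustrated with a small sketch, again in Python and again under loudly stated assumptions: the exclusion lists below are hypothetical placeholders (the study drew on Wiktionary, and its actual lists are not reproduced here), and `top_concepts` is an invented helper.

```python
import re
from collections import Counter

# Hypothetical exclusion lists for illustration only.
FUNCTION_WORDS = frozenset({"of", "the", "and", "a", "to", "or"})
EXAMPLE_PRONE = frozenset({"woman", "father"})


def top_concepts(definitions, k=3, excluded=frozenset()):
    """Return the k most frequent definition tokens, skipping `excluded`."""
    tokens = (t for d in definitions for t in re.findall(r"[a-z]+", d.lower()))
    counts = Counter(t for t in tokens if t not in excluded)
    return [word for word, _ in counts.most_common(k)]


# Invented toy glosses, not taken from any real dictionary.
glosses = [
    "the smell of fish",
    "the smell of blood",
    "the father of the woman",
    "smell of the sea",
]

unfiltered = top_concepts(glosses)
filtered = top_concepts(glosses, excluded=FUNCTION_WORDS | EXAMPLE_PRONE)
print(unfiltered)
print(filtered)
```

Without the exclusion lists, function words dominate the ranking, which is roughly what seems to have happened with Plautdietsch; with them, content words such as "smell" surface instead.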

To conclude, one of my favorite Mongolian words is Morin Khuur (Mongolian: Морин хуур), which may be translated as "horse fiddle".

[Video: Soulful Mongolian Horsehead Fiddle | Coplans in China]
 
The full Classical Mongolian name is Morin Tologhay'ta Quğur (Морин толгойтой хуур), meaning "fiddle with a horse's head".

See "Some Mongolian words for 'horse'" (11/7/19), a variorum post with observations and comments by more than two dozen specialists.

 

Selected readings

The claim that Eskimo words for snow are unusually numerous, particularly in contrast to English, is a cliché commonly used to support the controversial linguistic relativity hypothesis. In linguistic terminology, the relevant languages are the Eskimo–Aleut languages, specifically the Yupik and Inuit varieties.

The strongest interpretation of the linguistic relativity hypothesis, also known as the Sapir–Whorf hypothesis or "Whorfianism", posits that a language's vocabulary (among other features) shapes or limits its speakers' view of the world. This interpretation is widely criticized by linguists, though a 2010 study supports the core notion that the Yupik and Inuit languages have many more root words for frozen variants of water than the English language. The original claim is loosely based in the work of anthropologist Franz Boas and was particularly promoted by his contemporary, Benjamin Lee Whorf, whose name is connected with the hypothesis. The idea is commonly tied to larger discussions on the connections between language and thought.

Franz Boas did not make quantitative claims but rather pointed out that the Eskaleut languages have about the same number of distinct word roots referring to snow as English does, with the structure of these languages tending to allow more variety as to how those roots can be modified in forming a single word. A good deal of the ongoing debate thus depends on how one defines "word", and perhaps even "word root".

The first re-evaluation of the claim was by linguist Laura Martin in 1986, who traced the history of the claim and argued that its prevalence had diverted attention from serious research into linguistic relativity. A subsequent influential, humorous, and polemical essay by Geoffrey K. Pullum repeated Martin's critique, calling the process by which the so-called "myth" was created the "Great Eskimo Vocabulary Hoax". Pullum argued that, because the number of word roots for snow is about equally large in Eskimoan languages and English, there is no difference in the size of their respective snow vocabularies. Other specialists in Eskimoan languages and in Eskimoan knowledge of snow, and especially sea ice, argue against this notion and defend Boas's original fieldwork among the Inuit (at the time known as Eskimo) of Baffin Island.

[Thanks to Hiroshi Kumamoto and Ross Presser]



5 Comments »

  1. jhh said,

    April 15, 2025 @ 12:05 pm

    The tool for searching by concept or by language doesn't seem to be working :(

  2. Wally said,

    April 15, 2025 @ 2:10 pm

    My takeaway from reading the Great Eskimo Vocabulary Hoax many years ago was that he didn’t show that there weren’t many words for snow but rather he showed that the evidence that had been presented in favor of the hypothesis was badly flawed.

    If Geoff Pullum keeps up with Language Log and disagrees, my comments come with a full money-back guarantee.

  3. Pamela said,

    April 15, 2025 @ 2:57 pm

    Surely this dynamic is related to practical uses as well as "concepts"? Anglo-Saxon (and I would guess most medieval languages) had many more words for "horse" than we have. Because they had to distinguish horses by sex, age, function, color, and so on. Many cultures had more complex taxonomies for plants because they needed to know which ones were medicinal and so on.

  4. Toby Blyth said,

    April 15, 2025 @ 4:01 pm

    The whole "more words for X in language Y [than in English]" is bunkum.

    Firstly, agglutinative languages don't really map well onto non-agglutinative languages.

    Secondly, saying English doesn't have a word for the "smell of blood" is disproved by the very statement. We just say it in a different way.

  5. Julian said,

    April 15, 2025 @ 5:00 pm

    @Toby Blyth
    I would rather say, to put it a bit less dogmatically, "Is there any particular significance in the fact that language A expresses concept X using one word, while language B does the same using a collocation of several words?"
