Sapir-Whorf redux
In "Linguistic relativity: snow and horses" (4/15/25), I summarized and assessed the following paper:
Temuulen Khishigsuren et al, "A computational analysis of lexical elaboration across languages", Proceedings of the National Academy of Sciences (2025). DOI: 10.1073/pnas.2417304122
My post was picked up by Cody Cottier, who was doing a critique of the Khishigsuren et al. article for Scientific American. Cottier interviewed me and incorporated some of what I said to him in this review:
Linguists Find Proof of Sweeping Language Pattern Once Deemed a ‘Hoax’
Inuit languages really do have many words for snow, linguists found—and other languages have conceptual specialties, too, potentially revealing what a culture values
Scientific American (5/9/25)
Cottier begins his article thus:
In 1884 the anthropologist Franz Boas returned from Baffin Island with a discovery that would kick off decades of linguistic wrangling: by his count, the local Inuit language had four words for snow, suggesting a link between language and physical environment. A great game of telephone inflated the number until, in 1984, the New York Times published an editorial claiming the Inuit have “100 synonyms” for the frozen white stuff we lump under a single term.
Then, as we at Language Log know all too well, our colleague Geoff Pullum published "The great Eskimo vocabulary hoax" in 1991, which muted the billowing claims for a generation, but, as Cottier quotes me, “it’s coming back in a legitimate way.” Fair enough, and here is why:
In a sweeping new computational analysis of world languages, researchers not only confirmed the emphasis on snow in the Inuit language Inuktitut but also uncovered many similar patterns: what snow is to the Inuit, lava is to Samoans and oatmeal to Scots. The results were published in the Proceedings of the National Academy of Sciences USA in April. Charles Kemp, a computational psychologist at the University of Melbourne in Australia and senior author of the study, says the results offer a window onto language speakers’ culture. “It’s a way to get a sense of the ‘chief interests of a people’—what’s important to a society, what they prioritize and value,” he says, quoting Boas.
I will not repeat the methods and findings of the original PNAS paper or of my Language Log post summarizing and assessing it. I will only point to three striking maps in the Scientific American article that illustrate the researchers' analysis of different themes across dictionaries of more than 600 languages. The maps show which languages have the highest estimated proportion of references to certain concepts, in this case "snow", "smell", and "dance", together with associated words that show the same general distribution (e.g., for "smell": "suck, rotten, ripe, pull, rub, food, climb, wet, dry, tree, nose").
Many Oceanic languages … have highly specific words for smell. In Marshallese, meļļā means “smell of blood” and jatbo means “smell of damp clothing.” This may be explained by the humidity of the rainforest, which amplifies scents.
Turning more directly to Sapir-Whorf,
Mair says this research, which he highlighted on the popular linguistics blog Language Log, helps resurrect the much-maligned idea of linguistic relativity, sometimes known as the Sapir-Whorf hypothesis. At its boldest, linguistic relativity asserts that language determines how we perceive things, causing speakers of different languages to experience the world in radically different ways (think of the movie Arrival, in which a character becomes clairvoyant after learning an alien language). But in Mair’s opinion, this study supports a softer claim: our brains all share the same basic machinery for perceiving the world, which language can subtly affect but not restrict. “It doesn’t determine,” he says. “It influences.”
Similarly, Lynne Murphy, a linguist at the University of Sussex in England, who was not involved in this study, notes that “any language should be able to talk about anything.” We may not have the Marshallese word jatbo, but four words of English do the trick—“smell of damp clothing.” It’s not that having many precise words for smell reveals mind-blowing cognitive abilities for processing smell; it’s simply that single words are more efficient than phrases, so they tend to represent common subjects of discussion, highlighting areas of cultural significance. If we routinely needed to talk about the smell of damp clothing, we’d whittle that unwieldy phrase down to something like jatbo.
Cottier ends his article with reservations about the limitations of lexical elaboration expressed by Murphy and Khishigsuren. Thus, while the door to Sapir-Whorfianism has been reopened by a slight crack, it remains, as it should be, one of linguistic relativity.
Selected readings
- Charles Kemp, Lexical Elaboration Explorer (html)
- John A. Lucy, "Linguistic relativity" (pdf)
[h.t. Hiroshi Kumamoto and Ben Zimmer]
Raphael said,
May 15, 2025 @ 4:32 pm
All this talk of "Language X has Y words for Z" made me wonder, a while ago, whether I could find any example for that in the *English* language. And after thinking about that for a while, I decided that English seems to have really a lot of words for "a group of people empowered to make decisions".
Let's see, there's house, senate, chamber, assembly, parliament, congress, convention, council, conference, committee, commission, jury, panel, board…
Michael Vnuk said,
May 15, 2025 @ 5:22 pm
Following Raphael's comment, doesn't English have a lot of words for 'fool', 'drunkenness' and 'things that you can't think of the name of' and so on?
Victor Mair said,
May 15, 2025 @ 8:05 pm
thingamabob, thingamajig, whatchamacallit, doodad, doohickey, gizmo, widget
DDeden said,
May 15, 2025 @ 6:08 pm
https://www.asianscientist.com/2014/01/in-the-lab/odors-expressible-malay-hunter-gatherer-language-2014/
Jahai is a Malayan Austro-Asiatic language rich in terms for scents, while English is rich in terms for colours.
https://www.asianscientist.com/2018/02/in-the-lab/new-language-jedek-malaysia-video/
Jedek is closely related to Jahai yet distinctive.
JPL said,
May 15, 2025 @ 8:08 pm
This line of research, using computational analyses of a large number of bilingual dictionaries, strikes me as linguistically naive, so that what is offered as significant results seems to present a muddled picture of the world, as well as making a claim that seems less than earth-shaking. The root of the problem seems to be the failure to make distinctions between on the one hand 'word' and 'lexeme', and on the other hand between 'bound form' and 'free form', distinctions that have been standard in linguistics since at least Bloomfield (e.g., his "Postulates" article), although they are not always maintained even by some linguists. It is also necessary to recognize that a sentence and a lexicon are phenomena of quite different types. There are important differences in the way these terms apply to the description of polysynthetic languages, as opposed to languages like English. If one gets clear about these distinctions one can then re-ask the question, "what is the question of interest here?" I'm sure no one here wants to read a longish comment trying to clarify these distinctions, but it can be done.
There are also problems with the uncritical use of bilingual dictionaries as data. If, e.g., one is looking at a polysynthetic language where two lexical roots are morphologically unrelated, (putting derivational relations between lexemes to one side) it won't do to simply use the English lexeme 'snow' to unify the two cases, since what you want to know is, how does that language view the relation between the senses of the two root lexemes? After all, even in the case of English, although there is definitely loose usage, one could legitimately make a correction like, "It's not snow, it's sleet." I.e., 'sleet' is not "a word for 'snow'". On the other hand, the elaboration of a semantic category by internal differentiation (e.g., expressed through compounding) is well known.
tudza said,
May 15, 2025 @ 8:20 pm
I thought the business with Eskimos and snow was a case of people not recognizing the word 1 and word 2 were word-for-snow + additional descriptive term. So one word in Eskimo and two words like "wet snow" "sticky snow" in English.
Do you have a couple of examples of these many words for snow?
Chas Belov said,
May 15, 2025 @ 9:06 pm
I believe Cantonese is rich with words for "to carry."
Jarek Weckwerth said,
May 16, 2025 @ 4:21 am
@JPL I'm sure no one here wants to read a longish comment trying to clarify these distinctions — On the contrary, this is a linguistics blog, so that is exactly what we read it for.
Dr. Decay said,
May 16, 2025 @ 4:52 am
On having many or few words for something:
I, a child of the city, will use the word "tree" in a context such as: "You can park next to that tree over there." I have a friend who grew up in a rural environment and is much more likely to say: "You can park next to that oak (maple, sycamore …) over there." So is an alien linguist going to conclude that people from Chicago have only one word for those tall plants whose trunks can be used as building material, whereas people from northern Wisconsin have 3, 10, 100 different words for such plants? Yes. Duh. Will this linguist rejoice in these "windows onto two cultures"? Maybe, but I would caution them against getting too excited.
John said,
May 16, 2025 @ 12:57 pm
Isn't there a potential problem with the nature and scope of the dictionaries used? In the earlier Language Log post there is a quote and link from the paper ("The Eastern Canadian Inuktitut dictionary in our dataset includes terms such as kikalukpok, which means “noisy walking on hard snow”…"), which leads to a dictionary by Arthur Thibert (1898-1963). I found an entry for a manuscript at Trent University which appears to show he published an early (first?) version (between "Eskimo" and French) in 1932. Thibert, incidentally, was a Catholic missionary. So, a pretty old record, even if it was subsequently updated and republished.
There is also a paper below about how many of these early dictionaries were compiled and the reasons why (generally proselytising). Although Thibert would no doubt have counted as an expert in his time, it’s not clear to me how he came up with the entries and translations, or indeed how familiar he was with the varieties of Inuktitut he recorded.
https://languagelog.ldc.upenn.edu/nll/?p=68852
https://archives.trentu.ca/index.php/04-1001
https://muse.jhu.edu/pub/153/article/915063
P.S. I will give Thibert the last word by quoting his foreword to the 1970 edition of his English-Eskimo Dictionary:
'"This dictionary is the result of twenty-seven years of missionary work among the Eskimos. Chesterfield, Eskimo Point, Southampton Island, Baker Lake and Churchill were the chief headquarters from which I traveled across the Arctic, meeting the Eskimos and studying their language, ways and manners. Besides my personal knowledge, the Eskimo dictionary embodies pioneer works and manuscript essays of such emminent linguists as Bourquin and Erdmann, of Bishop Turquetil, O.M.I. and Fathers Ducharme and Fafard, O.M.I… In compiling this dictionary, I have had no other aim than to provide the Missionaries and all those who work in the Arctic or care for the social welfare of the Eskimos with a suitable tool for their task. I do hope that this work will prove beneficial to the mutual understanding of two widely different cultures of our Country." — Arthur Thibert, o.m.i.'
https://www.google.co.uk/books/edition/English_Eskimo_Dictionary/KEjYKI7fvMAC
JPL said,
May 16, 2025 @ 8:49 pm
@ Jarek Weckwerth:
Thanks for the sentiment. That sentence is in there because I started a comment trying to do that, but I realized it was going to be too long, and so I scrapped it and just went with the general comment. Maybe this weekend I'll try to comment focusing on the differences between polysynthetic languages and relatively analytic languages like English wrt the way the referents of the terms in question are related, and wrt the relation between syntax and lexicon, insofar as I understand it. It looks like if you can describe the situation more precisely, interesting questions arise, but maybe not the questions these researchers seem to be pursuing.
Michael Vnuk said,
May 16, 2025 @ 10:15 pm
Another aspect of how many words there are for a thing or a concept is the deliberate preference by some groups to actively use more words, eg certain poets, journalists, teachers, and I could add teens or others who are playing with words, and those people who want to obscure their words' meanings from others. This approach has been discussed by many observers of English (eg see 'elegant variation', 'elongated yellow fruit', 'teenspeak', slang). And it presumably applies in other languages. (I recall a native Spanish speaker telling me that she needed to vary the words in a text for an exam because she didn't want to repeat words.) Thus, to me, the diversity of words is not necessarily due to the importance of a thing or a concept in a culture.
As an example, I was a geologist for a number of years (studying, teaching, working). During that time, I don't recall seeing or hearing the word 'temblor', a synonym for 'earthquake' (derived from Spanish), in geological writing about earthquakes, only in popular science or in journalism, where 'temblor' sometimes alternated with 'earthquake'. Were the popularisers of science or the journalists more interested in earthquakes than the geologists, or were they just writing to a different style that avoided repeating words?
Rodger C said,
May 17, 2025 @ 11:46 am
I think one of the main virtues of "temblor" for journalists, especially headline writers, is simply that it's shorter than "earthquake." (And I once saw "trembler.")
Tom Dawkes said,
May 17, 2025 @ 3:08 pm
@Chas Belov.
Laurence Thompson in “A Vietnamese grammar” [p.334, paragraph 14.9: Problems of semantic range] points to the range of words corresponding to English ‘carry’, including “two rather general terms”: he gives 18 words, for example “xách” ‘carry suspended from hand [usually by a handle]’.
Possibly Cantonese has similar ranges.
Yves Rehbein said,
May 17, 2025 @ 4:25 pm
I can give you 18 words, to drag, carry, tow, bear, wear, port(able), bring, take, have (in hand), pack, shoulder – that's 11, easily, a few might seem strained and the rest will be worse: to lug, come with, handle, haul – six more to go, help a fella out! To be loaded, stacked, equipped, sporting, parading, hung? You haven't set the rules straight.
JPL said,
May 17, 2025 @ 5:10 pm
'Schlep' is a good one.
Jonathan Smith said,
May 17, 2025 @ 7:34 pm
Re: 'carry' in (roughly) SEA, it's not words that kinda mean carry (which also are), but for distinct modes of carrying. E.g. Chinese languages tend to have separate verbs — by no means necessarily cognate — for 'carry at side by handle or sth' (cf. Vietnamese above), 'carry (person) on back', 'carry in arms at front (e.g. a child)', 'carry balanced load across shoulder' (generally plus other words for other 'carry on shoulder' modes), 'carry on head', 'carry between two people', etc., etc. Mand. dan1 'carry balanced load across shoulder(s)' is about as famous as such things get due to claimed relation(s) to arguably similar items in other Chinese, HM, AA. What this all means "culturally" IDK.
Richard Hershberger said,
May 18, 2025 @ 8:42 am
I am confused. Is the claim that Pullum was wrong about Inuit words for snow? Is there new lexical data unavailable to him, or did he simply miss them?
Jonathan Smith said,
May 18, 2025 @ 9:26 am
Re: "linguistic relativity" broadly, nothing to see here ever — everyone who knows 2+ languages decently appreciates that Natural Phenomena are differently Reified by different languages, sometimes in dramatic ways: as to whether this means that "language affects thought" well No Of Course Not or Yes Duh Of Course depending on how one cares to define "thought".
Incidentally, the notion that these differences don't matter for "thought" because (say) what is expressed by a single word in Language A may be got at paraphrastically in Language B misses the point — of course the differences matter, are interesting, and indeed more than "lexicon" "syntax" etc. are the very Stuff that makes languages different at all; whether "thought" is involved, again well…
Kimball Kramer said,
May 18, 2025 @ 1:08 pm
To expand on Jonathan Smith’s comment “…well No Of Course Not or Yes Duh Of Course”: It is not a matter of whether or not language affects thought. Language obviously affects thought. It is a matter of how much language affects thought. And this is we have no measure of the degree of effect and no way of quantifying it. A trivial example: I think of brown and orange as different colors. But brown is merely dark orange. I do not think of light and dark blue as different colors. But a native Russian speaker probably does, since they are different words. Even though we see the same colors. Expand this example over all the languages that have only 2, 3, etc. words for colors. Speakers of these languages must think slightly differently. Another example: We think “here, there, over there”; while the French, saying “ici, là, là-bas” divide that space differently. That is, there are many cases where we would say “here” while a French person would say “là”. We divide the distance from us differently and must think differently about it. Even though we see the same distances.
Large differences occur if we consider prayers spoken or heard by very religious persons of the same religion, very religious persons of another religion, or atheists. How can anyone claim that language does not affect thought?
Think of the effect of language (from the government) in Germany in the 1930s, and how it affected behavior (the effects include WWII and the Holocaust) or the effect of language in the U.S. in the 2020s and how it affected behavior. Can language affect behavior without affecting thought?
Kimball Kramer said,
May 18, 2025 @ 7:00 pm
In the 3rd sentence: the words "And this is" should be replaced by the word "But".