Devilishly difficult "dialect"


Are some languages innately more difficult than others?  In "Difficult languages" (1/2/10), Bill Poser addressed this question from various angles.  I've heard it said that Georgian is incredibly difficult because it possesses an "impossible" verbal system, has ergativity and other features that make for "interesting" learning, and so forth.  Yet, in comparison with some of the North Caucasian languages (whose relationship to K'art'velian [or South Caucasian], the language family to which Georgian belongs along with Svan and Chan/Megrelian/Mingrelian/Laz, is perhaps more an areal phenomenon than a genetic relationship), it is relatively simple.  The North Caucasian languages have an abundance of phonemes and an even more complex grammatical system.  John Colarusso has written an excellent grammar of Kabardian, which gives a good idea of the complexity of this Northwest Caucasian language.

But if you ask Chinese what the most difficult language is, chances are they'll tell you it is Wenzhounese.  What?  You've probably never heard of it.  Starting September 21, though, with the premiere of the TV series "Blindspot", you won't be able to avoid Wenzhounese.  In fact, the mention of Wenzhounese in this American TV series has already gone viral in China, since — as often happens — the Chinese have already pirated and subtitled (in Chinese and English) at least one episode.

It's about a woman, Jane Doe (Jaimie Alexander), who has had her memory erased and doesn't recall who she is but, mirabile dictu, has had her mind reprogrammed in such a way that she knows Mandarin, Cantonese, and (yes!) Wenzhounese!

There's even a saying about it:

Tiān bùpà, dì bùpà, jiù pà Wēnzhōurén shuō Wēnzhōuhuà
"Fear not the Heavens, fear not the Earth, but fear the Wenzhou man speaking Wenzhounese."
(never mind that the same saying is used against other varieties of Sinitic that people perceive to be hard to understand or hard on the ears)

The translation of this deathless aphorism is from Wikipedia, which also informs us:

Due to its high degree of eccentricity, the language is reputed to have been used during the Second Sino-Japanese War during wartime communication and in Sino-Vietnamese War for programming military cipher(code). Due to its unique grammar, vocabulary, and pronunciation, the language is basically impossible for any non-local to understand.

The same sentiments are reflected in the Chinese Wikipedia article on Wenzhounese and in the Wu language article as well.

Indeed, these opinions of Wenzhounese are widely shared among Chinese, and even the Wenzhou people themselves seem to be inordinately proud of how difficult their language supposedly is.

"Blindspot affirms Wenzhou dialect as China's most unintelligible" (8/19/15)

See also:

"Do You Dare Try the Devil-Language? China’s 10 Hardest Dialects" (5/21/14)

I cannot fault the Wikipedia articles for repeating these claims, since they do responsibly say that they are "reputed" and they do make a serious effort to document the "eccentricity" of Wenzhounese.  But, despite its formidable reputation in this regard, I have yet to see solid documentation of the use of Wenzhounese for intelligence purposes in war.

One major obstacle to the use of Wenzhounese for espionage is that it is not a written language.  As they did for many topolects during the late imperial and early Republican period, Christian missionaries did create a Romanization for Wenzhounese, but it was not in general circulation among the population.

Furthermore, the supposed Wenzhounese writing that we see in the "Blindspot" trailer (at 1:36) is just standard written Chinese:

èrlíngyīwǔ nián, shí yuè yī rì, bái jiē sānbǎi jiǔshíjiǔ hào, wǔlíng shì
二零一五年,十月一日,白街三百九十九號,五〇室 (I have changed the special complicated forms of the numerals used in the tattoos on the woman's body to common forms)
October 1, 2015 / 399 White Street, No. 50

When the actress (probably a dubber speaking for her) reads off the characters, it sounds to me as though it is simply Mandarin with a not too good American accent, although a few of my Chinese friends say it sounds like somebody with a southern accent pronouncing Mandarin.

By now, screenshots from the pirated version of the episode are widely available.

The screenshots show FBI linguists at their computers struggling with what is evidently some sort of written Chinese language:

Tā de yóujiàn quán shì yòng Zhōngwén xiě de
"All of his emails are [written] in Chinese."

The agents then scramble around to mobilize as many people who know Chinese as possible.

In one of the screenshots, the subtitles transcribe and translate the words of an agent thus:

Wǒmen de fānyìyuán yǒudiǎn lìbùcóngxīn
"Our translators are struggling with it."    ("have come up short"; "are powerless"; "inadequate")

The following panels identify the unknown language as a fēicháng shēngpì de fāngyán 非常生僻的方言 ("a very rare dialect") called Wenzhounese, and the final panel of this sequence adds:

Zhōngguórén chēng tā wéi "èmó zhī yǔ"
"The Chinese call it the Devil's language."

This reference to Wenzhounese in the film has occasioned much merriment among Chinese netizens, with one of them reminding us that Wenzhounese was allegedly "used during World War II to prevent the occupying Japanese from understanding communications, much as the US used Native American code talkers in the Pacific Theater."

Further coverage of the hype over the film in China is presented in this Time article by Alissa Greenberg:

"Why China Is Going Crazy Over NBC’s New Crime Drama, Blindspot" (8/19/15)

I have a little bit of a problem with how the Chinese subtitles render "the Devil's language" as èmó zhī yǔ 惡魔之語.  A more common way to refer to Wenzhounese in this sense would be guǐhuà 鬼話, such as in this minor variant of the proverb quoted above:

tiān bùpà dì bùpà, jiù pà Wēnzhōurén jiǎng guǐhuà
天不怕地不怕, 就怕溫州人講鬼話
"Fear neither heaven nor earth, fear only a Wenzhounese speaking guǐ-talk."

Guǐ 鬼 is notoriously difficult to translate into English:  "ghost; demon; devil; apparition; spirit" — none of them quite work in the collocation guǐhuà 鬼話, which can also connote "lie; nonsense".

We have studied the semantics and cultural implications of guǐ 鬼 before, e.g.:

"Laowai: the old furriner" (4/9/14)

"Is Cantonese a language, or a personification of the devil?" (2/9/14) (without direct citation of the term guǐ 鬼)

Nor is this the first time we have encountered Wenzhounese on Language Log.  In fact, we have already scrutinized it rather thoroughly in these posts:

"Devil-language" (5/25/14)

"The enigmatic language of the new Windows 8 ads" (5/14/13) (especially toward the end of the post)

"Mutual unintelligibility among Sinitic lects" (10/5/14)

"Mutual intelligibility" (5/28/14)

One last thing to clear up is how to account for Wenzhounese being known to the screenwriters for "Blindspot".  Simple: there are a lot of Wenzhounese speakers in New York City!  And plenty of direct flights from New York City to Wenzhou.

Seriously, though, if Wenzhounese is "devil talk", then there are plenty of devils living among us.  How weird is Wenzhounese, after all?  As quoted above, "Due to its unique grammar, vocabulary, and pronunciation, the language is basically impossible for any non-local to understand."  Well, the same could be said of hundreds, if not thousands, of other mutually unintelligible varieties of Sinitic.  If we have been bedeviled by anything, it's not by Wenzhounese or any of the other countless Chinese topolects, but rather by the very notion of "dialect" when applied to different languages in China, a lamentable problem in Chinese linguistics to which I have called attention time and again.

[h.t. Charlie Clingen and John Rambow; thanks to Peter Golden, Fangyi Cheng, Xiuyuan Mi, Rebecca Fu, Toni Tan, and Maiheng Dietrich]


  1. Guy said,

    August 20, 2015 @ 11:56 am

    "Due to its unique grammar, vocabulary, and pronunciation, the language is basically impossible for any non-local to understand."

    This reminds me of the time I went to Germany and everyone was speaking in these weird nonsense noises I couldn't even understand. How they expect foreigners to understand them without becoming familiar with their local way of speaking is beyond me. I think they do it just to be difficult.

  2. Victor Mair said,

    August 20, 2015 @ 11:58 am

    From John Colarusso:

    I am quite surprised to find that someone has read one of my grammars.

    I see complexity as being made up of the number of morphemes that must be included in any grammatical form, either noun, verb, or even adjective, etc., and also the number of choices that can be included, that is to say, the number of elaborations or alternates that the speaker can use in a given structure. So, I do not think a judgment call of this type is all that subjective. The issue is much less clear when one goes to the level of discourse and narrative richness. At that level other factors emerge that tend to level the playing field among Ls.

    I can send you a copy of one of my Kabardian grammars if you want some reading for a rainy afternoon. Kabardian is phonologically the simplest member of the family. It has 48 Cs and 3 Vs. 48 Cs is the maximum for North American Indian languages. These otherwise can rival the NWC Ls for grammatical richness in the terms that I have posited, but they lose out phonologically. Mohawk, for example, is quite rich, complex, but has only a handful (8 or 9) Cs and 4 or 5 Vs. The Salishan Ls of the NWS and BC come close to the NWC Ls in phonology. In 2010 we were on vacation in Whistler, BC, and stopped by the Native Museum. There I traded phrases with a Lushootseed speaker. He was quite delighted to hear Circassian and thought it a kindred L.

    I have a reader in all of these (including Ubykh) that I hope to send off to LINCOM this morning.

  3. Guy said,

    August 20, 2015 @ 12:23 pm

    "I see complexity as being made up of the number of morphemes that must be included in any grammatical form, either noun, verb, or even adjective, etc., and also the number of choices that can be included, that is to say, the number of elaborations or alternates that the speaker can use in a given structure. So, I do not think a judgment call of this type is all that subjective."

    I think that's true if we adopt this definition of complexity (modulo questions about what we consider a "form"), but we need to be careful before we equate this measure of complexity with "difficulty". A complicated inflectional paradigm doesn't tell us anything about how prevalent irregular forms are, nor does it tell us about complexity of the syntax of the language at the phrase level. A person learning English might encounter any number of difficulties that make learning English approximately as difficult as learning another language, but this likely has little to do with serious difficulties with English's fairly impoverished inflectional system.

  4. J. W. Brewer said,

    August 20, 2015 @ 3:46 pm

    The immediate problem I have with that attempt at a definition of complexity is that I don't know how it handles a language like Mandarin which in over-literal word-for-word translation often sounds very cryptic/elliptical/ambiguous, because it seems to omit words-or-morphemes that would be obligatory to make the equivalent English expression grammatical. The usual explanation is that "oh, it's not inherently vague because a native speaker can reliably figure out all the not-explicitly-expressed information from context" but it's at least as hard for a non-native speaker to master that sort of mentally-filling-in-the-missing-parts skill as it is to master a complex-but-regular morphological system that requires explicit marking of lots of things that might be optional in other languages.

  5. un malpaso said,

    August 20, 2015 @ 4:16 pm

    My suspicion is that most languages are easier or harder to learn simply in direct relation to their similarity to the speaker's native language. Or is this just naive wisdom? lol.

    I traveled to Georgia several times and lived there once for 6 months, and in my limited experience I didn't find Georgian any worse than Russian verbally. It was much, MUCH easier regarding its case system. The phonemes are, however, a little gnarly for an Indo-European to get a hold of. I think it helped that I had an interest in languages and have had some training in linguistics, and it definitely helps to have a propensity to listen, mime sounds, and repeat.

  6. Tsu-Lin Mei said,

    August 20, 2015 @ 4:17 pm

    My friend Pan Wuyun, a distinguished scholar in historical phonology and comparative Wu, is a native speaker of Wenzhou. ZhengZhang Shangfang, an equally distinguished scholar in historical phonology and historical dialectology, is also a native speaker of Wenzhou. ZhengZhang's descriptive Wenzhou phonology in the 80's gave us the first glimpse into Wenzhou. Now we have "Wenzhou fangyan cidian" (A dictionary of the Wenzhou dialect) , which is very good and quite complete.
    I am a native speaker of Peking Mandarin and Shanghai. When I speak with Pan Wuyun, I have no difficulty; we speak in a mixture of Mandarin and Shanghai. When I speak with ZhengZhang, I have a great deal of difficulty. He speaks every dialect with a thick Wenzhou accent. If you speak a Northern Wu dialect, Shanghai, Ningbo, Suzhou, etc. you can understand all the Northern Wu dialects. Wenzhou is the most prominent member of Southern Wu, and there are far fewer speakers of Southern Wu, and that is probably the reason why it is so devilish.
    I have a great deal of difficulty in understanding Southern Min. But Southern Min has a large number of speakers. That is why it is not so devilish.

  7. Pat Barrett said,

    August 20, 2015 @ 4:39 pm

    Since John McWhorter is on this blog and has written extensively about the comparative difficulty of various languages and/or complexity, I'm surprised there are no references to his works. On a listserv, the dean of Urdu literary studies in the U.S. told me that Urdu is relatively simple, but when I pointed out she might be referring to morphological complexities rather than those that bedevil me with Urdu, she agreed she was thinking of languages like Greek and Latin.

  8. Bathrobe said,

    August 20, 2015 @ 5:09 pm

    I've never actually seen or read anything to objectively substantiate or illustrate the difficulty of Wenzhounese, merely assertions of its difficulty for speakers of other dialects (topolects). But that suggests differentness more than difficulty. A few examples to show HOW Wenzhounese is so hard wouldn't go astray here (e.g., incredibly complex grammar, phonology, lexicon).

  9. Guy said,

    August 20, 2015 @ 6:06 pm

    @Pat Barrett

    To elaborate on my previous comment, spurred on by yours, I think there's a tendency to equate simple inflectional morphology with simple grammar, but would the variation between forms like "would be eating", "ate", and "had been eaten" really be made more difficult if marked by inflections on a single verb, as it would be in many languages, than expressed analytically, as in English? In languages with complicated case marking, is learning what verbs specify what case really much more difficult than remembering what English verbs specify what prepositions, and undergo what alternations? Knowing that we say "mad at" but "angry with", "I broke it" v. "it broke" shows "ergative" alternation but not "I killed it" v. "I killed". We have "I told him it"/"I told him"/"I told secrets", but "I sold him it"/"I sold it"/"I sold to him" – requiring addition of "to" for the last, and we also have "the car sold" but not "the secrets told". These are just some examples of arbitrary "rules" a learner of English needs to memorize that can't be easily captured by naïve measures of grammatical complexity.

  10. GeorgeW said,

    August 20, 2015 @ 7:47 pm

    As others have suggested, I don't think the complexity necessarily equals difficulty (universal difficulty). It seems to me that difficulty is relative to the dissimilarity of known languages and maybe a measure of complexity.

  11. Matt said,

    August 21, 2015 @ 1:44 am

    "Due to its unique grammar, vocabulary, and pronunciation, the language is basically impossible for any non-local to understand."

    This reminds me of the time I went to Germany and everyone was speaking in these weird nonsense noises I couldn't even understand. How they expect foreigners to understand them without becoming familiar with their local way of speaking is beyond me. I think they do it just to be difficult.

    I enjoyed reading this comment but I think, to be fair to the original statement, the implied comparison is with cases where there is a smoother dialect continuum. If it's generally possible for people from neighboring regions of China to understand each other's dialects to a certain extent, but people in the regions neighboring Wenzhou can't understand Wenzhounese at all, that makes Wenzhounese a notable outlier. (I don't know how true either of these two premises are, mind you.)

    "Fear not the Heavens, fear not the Earth, but fear the Wenzhou man speaking Wenzhounese."

    I'm not sure I get this — even as a joke, why fear someone just because you don't understand their (non-privileged) language?

  12. KeithB said,

    August 21, 2015 @ 9:40 am

    What makes a language more difficult? The phonemes or the grammar?

    After all, no other folks that I know use the ! like the bushmen of the Kalahari.

    Or NPR had a story yesterday about a whistled version of Turkish that is dying out.

  13. tsts said,

    August 21, 2015 @ 10:29 am

    "And plenty of direct flights from New York City to Wenzhou."

    This is probably wrong. AFAIK there are no direct flights between any US city and Wenzhou.

  14. Victor Mair said,

    August 21, 2015 @ 11:44 am


    Fixed now.

    I'm not very good at reading advertisements.

  15. Rohan Fenwick said,

    August 21, 2015 @ 2:41 pm

    I'd actually argue that, despite the phonological complexity, North-West Caucasian (NWC) is relatively rather more straightforward – though not simpler as such – than Kartvelian. (It's best to call it Kartvelian, by the way, Professor Mair; there are no ejectives in the Georgian word kartveli and the use of apostrophes to mark non-phonemic aspiration in Georgian is a holdover from early 20th-century linguistics, parallel to its use in the Wade-Giles romanisation for Mandarin.) I work intensely with Ubykh, the most phonologically complex of the NWC languages (my grammar was published by Lincom in 2011, for those who are interested), and in my experience although NWC is very complex, it's not overly complicated. The NWC languages are, in the main, essentially straight ergative with the sole exception of personal pronouns (and in Abkhaz-Abaza there's no morphological case-distinction between ergative and absolutive at all). The verb lends itself very well to a templatic analysis, they have preverbs that are almost entirely semantically driven (much of what's done in other languages with adpositions is done in NWC by local preverbs), they have little in the way of the kind of productive intermorphemic morphophonology you see in languages like Navajo, and despite the 80-odd simple consonants seen in Ubykh, consonant clusters of more than two segments are rare (up to four pop up very occasionally in Abkhaz). The immense polypersonalism is unique, admittedly. Agreement for subject, object, and indirect object is normal and four-way agreement's possible, though Ubykh and literary Abkhaz seem to prefer to avoid it (the Abaza lects are more tolerant). But apart from a simple ability to aggregate an unusually large number of non-lexical affixes on a single lexical root – I count about 26 potentially distinct affix positions in Ubykh finite verbs – and an unusual richness of non-finite verb forms, there's not much more to it in terms of complication across the family.

    Kartvelian, on the other hand, exhibits not just ergativity, but split ergativity; it also has Suffixaufnahme, the presence of often lexically specified "version" vowels, preverbs and thematic suffixes that often carry little semantic content and whose presence or absence may be dictated by verb tense and aspect (and therefore also related to ergativity), and a great deal of both predictable and unpredictable morphophonological alternation, particularly within Svan; there's productive ablaut across the family compounded with umlaut in Svan and some Georgian dialects. Kartvelian languages also possess a strongly-developed system of grammatical active and passive voice, mostly absent within NWC, and tense, aspect and mood affixation have a much more complicated interaction in Kartvelian than they do in NWC. Finally, many special peculiarities or irregularities are exhibited by small sets of roots (for instance, only some verbs possess passive/active distinctions in the masdar or verbal noun, and only a handful use the thematic suffix -ol- in the masdar and future tense). That's even without the insanity of the consonant clusters in gvbrdɣvnis "he's plucking us" and ancxls "to the bad-tempered [one]". There are strict rules governing large-cluster structure, and Marika Butskhrikidze's work on that is really fascinating, but still.

    Winfried Boeder's introduction to the South Caucasian languages in Lingua (vol. 115, pp. 5-89) is an excellent overview of what's going on in Kartvelian as a family.

  16. michaelyus said,

    August 26, 2015 @ 10:24 am

    The 天不怕, 地不怕,只怕… is quite a well-known trope. Cantonese vs Mandarin enmity is the most common in my opinion; a laowai spin was famously uttered by an Australian politician. [The Cantonese version uses 驚 and 唔正 to preserve the rhyme.] I think the fear element is somewhat bleached here, and is a general indicator of dislike (compare the semantics of 恐怕 and the word "dread" in English). But I can conceive of situations where language incomprehension gives rise to fear.

    Phonologically, I can think of several features that make the southern Wu area greatly deviant from many other southern Sinitic topolects. One is the tone sandhi process, which has a reputation for being extravagantly complicated and quite different from the sandhi systems of the surrounding Northern Wu and Eastern Min forms (which in themselves are pretty fiendish).

    The fact that words etymologically having 入声 in Middle Chinese not only do not end in any stop consonant, but can seem to sound longer than other tones, throws off speakers of surrounding topolects used to the clipped sound of lexemes with the 入声. Additionally, southern Wu furnishes 上声 lexemes with some glottalisation or at least vowel shortening, whereas in all the other Sinitic varieties I've encountered no such glottal stop exists in 上声. This swapping of features from one "tonal" class to another is quite disconcerting.

    As an inheritor (rather than a true speaker) of Min and Yue varieties, my personal "that's really weird" thing is the fact that final -n of Middle Chinese is simply lost, rather than being merged into some other nasal ending. I'm aware many northern Wu varieties do something similar, but it just catches me out a lot.
