Tangut workshop at Yale

On the weekend of January 19-20, 2018, there was a Tangut Workshop at Yale University.  Organized by Valerie Hansen and sponsored by the Yale Council of East Asian Studies, this was an intense, exciting learning experience for the 35 or so people who were in the room most of the time.

Many readers may be scratching their heads and asking, "Tangut?  What's that?  And why should we at Language Log be concerned with it?"

The Tangut were a Tibeto-Burman-speaking people whose name first appears in the Old Turkic Orkhon inscriptions of 735.  Sometime before the 10th century, the Tangut moved to Northwest China where they founded the Western Xia / Xixia or Tangut Empire (1038–1227).

I have long been interested in the Tangut because of their complicated Siniform script.  It looks sort of like Chinese characters (square shaped logographs, similar brush strokes, etc.), but even more complicated.  Many people who encounter Tangut script for the first time joke that the Tangut, while seeming to borrow the basic structural principles of Chinese characters, tried to outdo the Chinese by making their characters more dense and complex.

As the renowned Turkologist, Gerard Clauson, put it:

The [Tangut] language is remarkable for being written in one of the most inconvenient of all scripts, a collection of nearly 5,800 characters of the same kind as Chinese characters but rather more complicated; very few are made up of as few as four strokes and most are made up of a good many more, in some cases nearly twenty. It is extremely difficult to remember them, since there are few recognizable indications of sound and meaning in the constituent parts of a character, and in some cases characters which differ from one another only in minor details of shape or by one or two strokes have completely different sounds and meanings.


Here's what the name of the Tangut state looks like in Tangut characters:

phôn¹ mbın² lhi̯ə tha²

See Ruth Dunnell, The Great State of White and High: Buddhism and State Formation in Eleventh-Century Xia (Honolulu: University of Hawaii Press, 1996).

For those who want to fill in the gaps in their basic knowledge about Tangut, here are some relevant Wikipedia articles:

  • Tangut people, an ancient ethnic group in Northwest China, not Tibetan people.
  • Tangut language, the extinct language spoken by the Tangut people, not Tibetan language.
  • Tangut script, the writing system used to write the Tangut language
  • Tangutology, the study of the culture, history, art and language of the Tangut people
  • Western Xia (1038–1227), also known as the Tangut Empire, a state founded by the Tangut people

Besides the script, another aspect of the Tangut language that has intrigued me is the fact that it exists in two registers.  These are lhwe and mi.  Nikita Kuzmin, a budding Tangut specialist who was present at the Yale workshop, states:

The majority of Tangut texts (dictionaries, sutras, translations) were written in mi register (which has more or less been researched). Only Tangut odes were written in lhwe register, therefore it is sometimes called "odic language". Despite the fact that these two registers were expressed in the same Tangutgraphs, the syntax, grammar, and lexicon are different, which creates problems in translation. A leading Chinese scholar in Tangut studies, Nie Hongyin 聂鸿音, points out that lhwe is a different type of language, hence a Russian scholar Ksenia Kepping (Ксения Кеппинг) supposes that it is Tangut ritual language (probably the dichotomy lhwe – mi can be compared to wenyan [literary] – baihua [vernacular] in Sinitic).

More tentatively, Nikita feels that lhwe might be a separate language altogether, perhaps Ural-Altaic or even Turkic, since the ruling house of Xixia claimed to be descendants of Tuoba / Tabgach.  At this point, these are only speculations, so we should not hold Nikita responsible for asserting them as fact, although this is an area in which he intends to undertake future research.

The proceedings at the Yale workshop were mostly philological / Tangutological in nature, with masterful presentations by Kirill Solonin (who teaches at Renmin [People's] University in Beijing).  He walked us through the structure of the characters, the archeological discovery of the manuscripts and their contents, Tangut religion, and the history of Tangut Studies.  Most exciting of all, Kirill also guided us through the reading of several texts, commenting on the grammar and phonology of the language as he went.

There were also talks on Tangut art by Yong Cho and on Tangut household registers by Xin Wen that nicely complemented lectures and demonstrations by Kirill.  Since it was almost nonstop from start to finish, there was never a chance for one's attention to flag.  This was a typical Valerie Hansen workshop (she has presided over a whole series of such events dedicated to Central / Inner Asian and Silk Road Studies at Yale), jam-packed with the most authoritative speakers and audiences deeply involved with the subject matter at hand.

A somewhat more leisurely exemplar was the "Kitan-Language Crash Course" held at Yale from May 11-19, 2016.  The Kitan (also spelled Khitan) script and the people who wrote it bear many similarities to the Tangut script and people, though Kitan is a Mongolic language (or perhaps more accurately, as Juha Janhunen would classify it, a Para-Mongolic language), whereas Tangut is, as we have seen, Tibeto-Burman:

Between the 10th and 12th centuries, the nomadic Kitan dominated a large swath of Mongolia and Manchuria and created the Liao Dynasty, a rival to China’s Song Dynasty. The Kitans, who were the first people to make Beijing one of their capitals, originally did not have their own writing system. After the founding of their empire, the Kitans saw the need to invent their own script to define the Kitan identity. They created two scripts by borrowing from the Chinese and Uighur languages.


The Khitan language maven at the May, 2016 workshop was Daniel Kane, Professor of Chinese at Macquarie University in Sydney, Australia.

Here are relevant Wikipedia articles on Khitan:

Incidentally, it is from the Khitan that we get our word "Cathay" and Russian gets the word "Китай" for "China".

Free pdfs of two articles on Siniform scripts that discuss Tangut and Khitan:

A schedule of the Yale Tangut workshop may be found here.

YouTube videos of the Tangut workshop may be accessed here.

YouTube videos of the Yale Kitan workshop may be accessed here.


  1. David Marjanović said,

    February 3, 2018 @ 8:33 am

    One intriguing thing about the Tangut script is that the characters can't be divided into radicals and phonophores the way most Chinese characters can. Rather, characters with related meanings share one of their parts; rearrangements of these parts, each of which is in a way a "radical", were used to create characters.

    perhaps Ural-Altaic or even Turkic, since the ruling house of Xixia claimed to be descendants of Tuoba / Tabgach.

    A Ural-Altaic family hasn't been proposed in close to a hundred years; there are few if any indications that the uncontroversial Uralic family is particularly closely related to the highly controversial Altaic family or any of its proposed constituents (Turkic, Mongolic, Tungusic, Korean, Japonic). Last time I read anything about it, Tabghach was thought to be another Para-Mongolic language (i.e. closer to Mongolic than to any other extant family, but not descended from the last common ancestor of all of today's Mongolic languages).

  2. Andreas Johansson said,

    February 3, 2018 @ 10:20 am

    A funny thing about Ural-Altaic is its staying power in popular references. In the early 1990s, I had a school atlas whose linguistic map showed it as if as uncontroversial as Indo-European or Semitic. It must have been printed in the late '80s or very early '90s (before the dissolution of the USSR).

  3. Andreas Johansson said,

    February 3, 2018 @ 10:32 am

    Speaking of funny and school atlases, my father has a German school atlas from the '30s which, apparently on the idea that Ural-Altaic is a racial as well as linguistic unit, classifies the Finns as of Mongolic race.

  4. cliff arroyo said,

    February 3, 2018 @ 12:40 pm

    Do some Tangut characters look (to literate Chinese) like potential or possible Chinese characters the way 'lorbic' looks like a potential English word (AFAIK it isn't) or do they all look like txwvki or qlmnurei would look to English speakers (nonsense collections of random letters)?

  5. S Frankel said,

    February 3, 2018 @ 12:53 pm

    I bet the Finns stopped being Mongolic and turned "Aryan" in 1941.

  6. Dan said,

    February 3, 2018 @ 2:49 pm

    Cliff, it's more like txwvki or qlmnurei. Actually a better comparison would be how English speakers would view written Russian in the Cyrillic alphabet. The Tangut characters use some similar strokes, but they are put together in ways that never occur in Chinese characters.

  7. B.Ma said,

    February 3, 2018 @ 3:08 pm

    The Khitan script gives me the shivers – it just looks *wrong* like an M.C.Escher drawing or a Penrose triangle.

    It perhaps could be likened to an English speaker trying to read Scots transcribed into the Cyrillic alphabet reflected in a mirror. Or an English speaker who has grown up without ever coming across the concepts of accents, dialects or other languages, hearing West Frisian spoken for the first time.

  8. cliff arroyo said,

    February 3, 2018 @ 4:44 pm

    "Actually a better comparison would be how English speakers would view written Russian in the Cyrillic alphabet"

    Thanks! It might sound suspicious but that was the third possibility, which I didn't think of until shortly after hitting the submit button…

  9. Victor Mair said,

    February 3, 2018 @ 7:37 pm

    From Jichang Lulu, remarkable resources for the study of Central and Inner Asian scripts:

    Today's date in Tangut:

    (See the Tangut fonts!)

  10. Christopher Atwood said,

    February 3, 2018 @ 8:30 pm

    Thanks for the nice write-up of the workshop, Victor. If Lhwe were related to the Tabghach ("Tuoba") from whom the Tangut royal family claimed descent, then I would expect it to be in the Mongolic family, since Tabghach was certainly Mongolic (or Serbi-Mongolic, as Andrew Shimunek calls it in his new big study). But I raised that possibility with Kirill directly, and he didn't think it very likely. (And Tabghach seems close enough to what we know of other Mongolic languages that, if Lhwe were related to it, I expect that would be obvious to scholars by now.) So I guess Lhwe is probably not related to Tabghach and not Mongolic. Just another big mystery to solve!

  11. Andreas Johansson said,

    February 4, 2018 @ 4:55 am

    Zhou Youguang, "The Family of Chinese Character-Type Scripts: Twenty Members and Four Stages of Development", Sino-Platonic Papers, 28 (September, 1991), 1-11

    Is this article truncated? The last section begins with saying there are six syllabic and two phonemic alphabets, lists the former set, and never gets around to the phonemic ones.

  12. Guillaume Jacques said,

    February 4, 2018 @ 6:23 am

    Here are a few recent articles on the Tangut language, including the etymology of the Tangut imperial name:


    A more complete bibliography is available here:

    On Tabghach, see:

  13. SO said,

    February 4, 2018 @ 8:05 am

    @A.J.: You're right, the original paper this English version is apparently based upon* goes on for a little more and adds a few paragraphs on the phonemic ones, namely: Korean han'gŭl and bopomofo.

    * I.e. Zhōu Yǒuguāng 周有光, "Hànzì wénhuàquān de wénzì yǎnbiàn" 汉字文化圈的文字演变 [Script developments in the sinographic cultural sphere], Mínzú yǔwén 民族语文, 1989.1, 37–55.

  14. Wolfgang Behr said,

    February 4, 2018 @ 8:09 am

    Thanks for this nice report, Victor. Just a quick note on the alleged "complexity" of Tangut graphs, which keeps recurring in discussions of this writing system here and elsewhere.
    As Andrew West has nicely shown in 2009, the complexity of Tangut graphs in terms of the number and distribution of strokes per character is in fact considerably _smaller_ than that of unabbreviated (fanti) characters in Chinese, somewhere close to abbreviated (jianti) characters.


    This doesn't mean that Tangut writing as a whole is less complex than Chinese, of course, but that its complexity has more to do with a less transparent component structure and, possibly, phono-semantic division of labour, and less with the sheer density of strokes. If complexity is measured by a "length of description" (or a more sophisticated "Shannon entropy") kind of approach, I bet Tangut would come out still more complex than Chinese. (Or, for that matter, that we still haven't figured out what the compositional principles of its graphs _really_ are).
    To the sources Jichang Lulu has mentioned, let me add Guillaume Jacques' fantastic _Esquisse de phonologie et de morphologie historique du tangoute_ (Leiden: Brill, 2014), which, along with Marc Miyake's blog on amritas.com, is by far the best source on the language I am aware of and one of the finest books on an ancient Tibeto-Burman language ever.

  15. Victor Mair said,

    February 4, 2018 @ 8:51 am

    @Andreas Johansson

    Thanks for reading Zhou Youguang's article in SPP so closely that you sensed it must have been truncated.


    @SO

    Thanks for telling us exactly what ZYG's SPP article was based on and what was omitted from the original Chinese article.

    I went down to my basement storehouse and examined the paper version of ZYG's SPP paper (I hadn't looked at it for more than ten years) to verify that it is identical with the electronic version; they are the same.

    I have a vague recollection that, when ZYG was finishing the English adaptation of his paper, I asked him why he left off Hangul and Bopomofo, and he said something to the effect that they were outliers to the Family of Chinese Character-Type Scripts, so he decided to leave them out.

  16. Victor Mair said,

    February 4, 2018 @ 10:47 am

    Guillaume and Wolfgang,

    Many thanks for the valuable references and remarks.

    When all is said and done, a typical Tangut text looks denser than a typical Chinese text. Here are a few quotations from the article on the complexity of the Tangut script by Andrew West cited by Wolfgang:


    Appears crowded compared with Chinese, with few non-complex characters
    Most characters composed of two or three distinct components, and only a few characters are themselves elemental components

    A large proportion of high frequency Chinese characters have very few strokes (e.g. 一二三人女山火水大小中), and conversely Chinese characters with very many strokes tend to occur less frequently, with the result that normal Chinese text always has a large proportion of characters with few strokes. In contrast to the situation with Chinese, there does not appear to be any relationship between frequency and stroke count for Tangut characters, so that normal Tangut text is uniformly composed of characters with 12±6 strokes, with the result that it appears denser and more crowded than Chinese.

    many character components are very similar to each other, and when two or three such similar components… are combined together in different combinations to make different characters…, the results are confusingly confusable. [VHM: emphasis added]


    (For a palpable, visceral sense of the density of strokes of typical Tangut texts, see the samples illustrated in Andrew West's article.)

    The frequency-weighted average number of strokes of characters in typical Chinese texts can get down as low as around 7.2.

    On the other hand, the Chinese writing system as a whole has numerous monstrosities with as many as 64 strokes. Take a gander at this one:

    XX (this character doesn't show up in WordPress, so I'm substituting "XX" for it)

    At normal font size, this will just be a black blob, so if you want to see how it is composed, you'd better magnify it three or four hundred per cent. It consists of four dragons — lóng 龍 — each with 16 strokes.

    This XX is pronounced zhé and means "loquacious; verbose" (I guess somebody thought dragons are talkative when they made up this freakish character).




    XX is a variant of 讋 (simpl. 詟)

    In consequence of the above-outlined characteristics of the Chinese writing system as a whole, the average number of strokes in a Chinese character is around 12, but in typical running text, there are many characters with fewer strokes: 一的他目木母尼牛舌上下內如入女. Moreover, many of these characters having few strokes are of relatively high frequency, so a typical Chinese text looks far less dense than Tangut texts, where you will find few characters with fewer than nine strokes but many with twelve or more strokes.
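
    For readers who want to see why an average over the character inventory differs from a frequency-weighted average over running text, here is a minimal sketch in Python. The stroke counts are those of the simple characters quoted from Andrew West's article plus 龍; the occurrence counts are purely hypothetical, invented for illustration, and are not corpus data.

```python
# Stroke counts for the simple characters quoted above, plus 龍 (16 strokes).
STROKES = {
    "一": 1, "二": 2, "三": 3, "人": 2, "女": 3, "山": 3,
    "火": 4, "水": 4, "大": 3, "小": 3, "中": 4, "龍": 16,
}

# HYPOTHETICAL occurrence counts in an imaginary text (illustration only).
TOY_COUNTS = {
    "一": 40, "二": 10, "三": 10, "人": 30, "女": 5, "山": 5,
    "火": 5, "水": 10, "大": 25, "小": 15, "中": 30, "龍": 1,
}


def inventory_average(strokes):
    """Mean stroke count over the inventory: each character counts once."""
    return sum(strokes.values()) / len(strokes)


def text_average(strokes, counts):
    """Mean stroke count over running text: each character is weighted
    by how often it occurs."""
    total_occurrences = sum(counts.values())
    total_strokes = sum(strokes[ch] * n for ch, n in counts.items())
    return total_strokes / total_occurrences


if __name__ == "__main__":
    print("inventory average:", inventory_average(STROKES))
    print("running-text average:", round(text_average(STROKES, TOY_COUNTS), 2))
```

    When the simple characters are also the frequent ones, the running-text average drops below the inventory average. If frequency were unrelated to stroke count, as West suggests for Tangut, the two figures would tend to stay close together.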

  17. Andreas Johansson said,

    February 4, 2018 @ 11:49 am

    @SO, Victor Mair:

    Thanks :)

  18. Jichang Lulu said,

    February 5, 2018 @ 9:47 am

    Life used to be easier. Latin, Classical Chinese (q.v.) &c. and you were OK. Millennially, you can hardly be seen socially without your Tangut shirt. (The characters (which might not display properly) are (mji ·jwɨr dji (Li Fanwen via West) or 2mi4 1wyr4 2di4 (Miyake) or ミ ウィル ンディ mi wir[u] ndi (Messrs Mojimoji, purveyors of Tangut shirts)) and mean 'Tangut script'.)

  19. David Marjanović said,

    February 6, 2018 @ 5:06 am

    A funny thing about Ural-Altaic is its staying power in popular references. In the early 1990s, I had a school atlas whose linguistic map showed it as if as uncontroversial as Indo-European or Semitic. It must have been printed in the late '80s or very early '90s (before the dissolution of the USSR).

    Same in mine, which dates from 1993 (USSR dissolved on almost all maps, Czecho-Slovakia not yet dissolved).

    Has anybody yet tried (and stayed sane in the process) to find out if Lhwe is Yeniseian…?

  20. Andrew West said,

    February 6, 2018 @ 8:02 am

    David, here are the "Lhwe" numbers for two through seven for you to do some research with (eight is phonetically identical to the ordinary Tangut word for "eight", but written with a different character, so is probably a loan; and nine is not attested):

    2 = lọ (Miyake 2loq1)
    3 = lhejr (Miyake 2rer1)
    4 = ŋwər (Miyake 1ngwyr1)
    5 = tśjɨ̱r (Miyake 2chyr'3)
    6 = we (Miyake 1vi1)
    7 = ŋwər (Miyake 1ngwyr1)

    You will notice that the words for four and seven are phonetically identical according to current understanding of Tangut phonology.

    These numbers are all written with special Tangut characters (which I omitted because Language Log does not cope with astral Unicode characters), distinct from the ordinary Tangut number characters. Nikita Kuzmin's statement that "these two registers were expressed in the same Tangutgraphs" is not correct, as the two "registers" were written with largely different sets of characters (although some characters were used for both). Indeed, a large proportion of Tangut characters are only attested in the five surviving Odes (as well as in dictionaries and other linguistic or phonetic texts).
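
    For anyone wondering what "astral" means here: it refers to code points above U+FFFF, outside the Basic Multilingual Plane. The following is a minimal sketch in Python, using the fact that the Unicode Tangut block begins at U+17000; the helper names are made up for this illustration. It shows that such a character needs four bytes in UTF-8 and a surrogate pair in UTF-16, whereas an ordinary Chinese character such as 龍 does not.

```python
TANGUT_BLOCK_START = 0x17000  # first code point of the Unicode Tangut block


def is_astral(code_point):
    """True if the code point lies outside the Basic Multilingual Plane."""
    return code_point > 0xFFFF


def encoding_sizes(code_point):
    """Return (UTF-8 byte count, UTF-16 code unit count) for one code point."""
    ch = chr(code_point)
    utf8_bytes = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-be")) // 2
    return utf8_bytes, utf16_units


def surrogate_pair(code_point):
    """Compute the UTF-16 surrogate pair for an astral code point."""
    if not is_astral(code_point):
        raise ValueError("BMP code points have no surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


if __name__ == "__main__":
    for label, cp in [("Tangut U+17000", TANGUT_BLOCK_START), ("龍", ord("龍"))]:
        print(label, "astral:", is_astral(cp), "sizes:", encoding_sizes(cp))
```

    Software that assumes every character fits in three UTF-8 bytes or a single UTF-16 unit will mangle or drop Tangut characters, which is one plausible reason a blog platform might fail to display them.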

  21. A-gu said,

    February 6, 2018 @ 4:48 pm

    Really appreciate the video links!

  22. David Marjanović said,

    February 6, 2018 @ 5:13 pm

    No similarity to Yeniseian. I found the numbers 1–10 in 5 Yeniseian languages here: click on "List" and then search the page (Ctrl+F) for "Yeniseian".

  23. S Frankel said,

    February 6, 2018 @ 5:21 pm

    Ditto appreciation for the videos. Too bad the chalkboard was mostly invisible (off camera or behind somebody's head). You could count the strokes by sound, at least.

  24. Victor Mair said,

    February 9, 2018 @ 2:31 am

    From Petya Andreeva:

    With regard to the growing interest in Tangut, it is perhaps worth sharing one really wonderful resource which I was made aware of just now. The website contains numerous useful reference materials, dictionaries, shortcuts to databases, 同音 lookup tools, the 1999 catalogue by Kychanov etc. Here it is:


    One can even find the whole index to Clauson's "Skeleton Tangut Dictionary" (the whole book is available on Amazon).


    I think some of these materials could be very useful not just for the study of Tangut but also for those interested in visual materials, as some interesting Buddhist images are available in the catalogues.

