How to generate fake Chinese characters automatically


On the otoro blog, there is another amazing article about sinograms:

"Recurrent Net Dreams Up Fake Chinese Characters in Vector Format with TensorFlow" (12/28/15)

I say "another amazing article" because, just a week ago, in "Character building is costly and time consuming" (12/22/15), we looked at a fascinating report on the vast amount of labor necessary to build fonts made up of real Chinese characters.  Basically, the latter report examined the history of Chinese characters and then explained how typographers create new fonts comprising all the characters necessary for printing books, newspapers, magazines, advertising copy, and so forth.

The article under discussion goes in the opposite direction.  Instead of telling us how to produce a font of all currently existing characters, it explains how one could go about creating an unlimited number of hitherto unknown characters.  One might object that this is a whimsical and trivial pursuit, but if you read other posts on the same blog, you will see how it fits into the author's overall investigations, which have both theoretical and practical implications for design research.  Psychologically and personally, however, there is another set of motivations that prompted the author to undertake this particular project:

As a child growing up in a mostly English speaking country, my parents would force me to attend these dreadful Saturday morning classes where I was to be taught Chinese. There would be these dictation tests where the students have to write out full passages of memorised Chinese text from a textbook, usually indirectly exposing us to Confucian moral values. We would have to spend a lot of time during the weeknights memorising passages to prepare for the test on the following Saturday. A score less than perfection is frowned upon. This would go on for years. I still have nightmares about those dictation tests. I think that’s how most children learn Chinese as well via this rote learning method around the world. Maybe in some sense, Chinese language education resembles how LSTM’s are trained to reproduce sequences from training examples.

[VHM:  N.B.:  I have added the link on LSTM.]

I have written about these dreaded tīngxiě 听写 / 聽寫 ("dictation") tests before on Language Log.  See, for example:

"Spelling bees and character amnesia" (8/7/13)

"The future of Chinese language learning is now" (4/5/14)

At Penn, when it comes to language courses, for the first two decades of my career, I taught 3rd and 4th year Mandarin, and for the last two decades of my career, I have been teaching Classical Chinese, so I have not had to administer tīngxiě 听写 / 聽寫 ("dictation") tests, which normally are only given during the first two years of study.  But whenever I might mention tīngxiě 听写 / 聽寫 ("dictation") tests in passing, my students would groan and gasp, as though the tests were a nightmare from the past.  As a matter of fact, if I ever had to teach first- and second-year Chinese classes, I would not subject my students to this kind of mindless, rote memorization.  (David Moser and I have described many times on Language Log much more enlightened and benign ways to learn how to read and write Chinese [references available upon request].)

Be that as it may, I can sympathize with the author with regard to the bane of tīngxiě 听写 / 聽寫 ("dictation") tests.  There is a certain trauma associated with them that can scar a person for life.  In the case of otoro, the trauma has been turned to fruitful use in research on the fundamental nature of the sinograms at a very deep level.

It is interesting that otoro's investigations complement the work of innovative artists such as Xu Bing.  See, for example:

"The unpredictability of Chinese character formation and pronunciation" (2/6/12)

"Chinese characters formed from letters of the alphabet" (8/20/14)

Petya Andreeva, "From Xu Bing to Shu Yong: Linguistic Phenomena in Chinese Installation Art," in Victor H. Mair, ed., Language and Ideology in Nationalist and Communist China, being Sino-Platonic Papers, 256 (April, 2015).

The Chinese character system is essentially open-ended.  With the available elements (radicals, components, phonophores, strokes), it is possible to create an infinite number of different characters.  It is curious that some of the resultant characters in otoro's experiment look very much like possible legitimate characters or alternative characters.  For instance, see here and here.
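The combinatorial point can be made concrete with a small sketch. It uses Unicode's Ideographic Description Characters ⿰ (left to right) and ⿱ (above to below) to spell out component combinations. The six-component inventory and the function name combine are arbitrary illustrations of ours, not anything drawn from otoro's experiment, and the sketch counts descriptions, most of which correspond to no attested character.

```python
from itertools import product

# Unicode Ideographic Description Characters:
# U+2FF0 = left-to-right layout, U+2FF1 = above-to-below layout.
LEFT_RIGHT = "\u2ff0"  # ⿰
TOP_BOTTOM = "\u2ff1"  # ⿱

# A deliberately tiny, illustrative (hypothetical) component inventory.
components = ["扌", "氵", "木", "口", "音", "日"]


def combine(parts, operators=(LEFT_RIGHT, TOP_BOTTOM)):
    """Return every two-part description that can be built from `parts`."""
    return [op + a + b for op, a, b in product(operators, parts, parts)]


level1 = combine(components)
print(len(level1))                    # 2 * 6 * 6 = 72
print(LEFT_RIGHT + "扌音" in level1)  # True: ⿰扌音 describes the real character 揞

# Feeding the results back in as components allows arbitrarily deep nesting,
# so the space of possible descriptions has no upper bound.
level2 = combine(components + level1)
print(len(level2))                    # 2 * 78 * 78 = 12168
```

Each extra round of nesting squares the inventory again, which is why a finite stock of components is enough to make the system open-ended.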

The computer code seems not to position the strokes quite correctly (yet), but other than that, many of the characters generated by the program might well be mistaken for actual characters or variants of characters.
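What "vector format" might mean can be sketched as follows, under an assumption of ours rather than a description of otoro's code: each character is stored as a sequence of pen movements (dx, dy, end_of_stroke), and drawing it means adding up those relative offsets. The function name steps_to_strokes and the toy data are hypothetical. Because each point is the running sum of every earlier offset, one slightly wrong step shifts everything drawn after it, which is one plausible way strokes end up out of place.

```python
from typing import List, Tuple

# Hypothetical format: (dx, dy, end_of_stroke), where dx and dy are offsets from
# the previous pen position and end_of_stroke == 1 means the pen lifts afterwards.
Step = Tuple[float, float, int]
Point = Tuple[float, float]


def steps_to_strokes(steps: List[Step]) -> List[List[Point]]:
    """Accumulate relative offsets into absolute points, grouped into strokes."""
    strokes: List[List[Point]] = []
    current: List[Point] = []
    x, y = 0.0, 0.0
    for dx, dy, end_of_stroke in steps:
        x += dx
        y += dy
        current.append((x, y))
        if end_of_stroke:
            strokes.append(current)
            current = []
    if current:  # keep a trailing stroke that was never explicitly ended
        strokes.append(current)
    return strokes


# A toy "character": a horizontal stroke, then a vertical stroke crossing it.
toy = [(0, 0, 0), (10, 0, 1), (-5, -5, 0), (0, 10, 1)]
print(steps_to_strokes(toy))
# [[(0.0, 0.0), (10.0, 0.0)], [(5.0, -5.0), (5.0, 5.0)]]

# The same character with one offset slightly off (12 instead of 10):
# the second stroke, which was never changed, now lands 2 units to the right.
drifted = [(0, 0, 0), (12, 0, 1), (-5, -5, 0), (0, 10, 1)]
print(steps_to_strokes(drifted)[1])
# [(7.0, -5.0), (7.0, 5.0)]
```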

[Thanks to Rachel Kronick]


  1. Adrian Morgan said,

    December 30, 2015 @ 7:48 pm

    I get a "Bad Gateway" error on the link, but will try to remember to try again later. Meanwhile I have to say there is nothing objectionable about whimsical and trivial pursuits, for in such is the meaning of life…

  2. Michael Watts said,

    December 30, 2015 @ 8:26 pm

    I don't understand the terminology here. A dictation test is a test of whether, hearing an audio cue, you can write it down. (And the Chinese term, 听写 "listen-write", strongly suggests exactly the same idea.) In concept, there is no memorization involved at all.

    What's actually going on in these tests, and why are they called "dictation tests"?

  3. Victor Mair said,

    December 30, 2015 @ 8:39 pm

    You have to memorize how to write the characters or the phrases or sentences that the teacher reads out.

  4. Bathrobe said,

    December 30, 2015 @ 9:02 pm

    It is curious that some of the resultant characters in otoro's experiment look very much like possible legitimate characters or alternative characters.

    If you look through tables of Chinese GB standard characters you're likely to come across character after character that you can't recall ever seeing and can't even begin to place the meaning of. Otoro's creations don't seem much different…

  5. Victor Mair said,

    December 30, 2015 @ 11:13 pm

    And here are Japanese artists playing with kanji:

    Mesmerizing, captivating.

  6. The suffocated said,

    December 30, 2015 @ 11:53 pm

    Interestingly, in the first displayed image in the linked article, the sixth character on the first row, 揞, is NOT a fake character, but a real one that means "to cover something by hand". 《方言據》(c. 1600):「以手按物曰揞(烏感切),藏也,手覆也。」 ["Pressing on something with the hand is called 揞 (read by the fanqie spelling 烏感切); it means to hide, to cover with the hand."] I'm not sure what topolect the book 《方言據》 is based on, but the word 揞 is definitely still in daily use in Cantonese.

  7. John said,

    December 31, 2015 @ 12:24 am

    The author seems to be describing 默寫 ("writing from memory") tests, not 聽寫 ("dictation") tests. In 默寫, you are indeed expected to reproduce a memorized passage of text with no prompts at all.

  8. Joseph Lemien said,

    December 31, 2015 @ 12:45 am

    I'd love it if you could share a few links to articles or blog posts that cover the "more enlightened and benign ways to learn how to read and write Chinese". Could you please share some of those here?

  9. Victor Mair said,

    December 31, 2015 @ 1:58 am

    @Joseph Lemien:

    Here are a few posts on the subject:

    "How to learn to read Chinese" (5/25/08)

    "How to learn Chinese and Japanese" (2/17/14)

    "The future of Chinese language learning is now" (4/5/14)

    "Pinyin in practice" (10/13/11) (see the comments)

    "Spelling mistakes in English and miswritten characters in Chinese" (12/18/12)

  10. Victor Mair said,

    December 31, 2015 @ 10:47 am

    They are called tīngxiě 听写 / 聽寫 ("dictation") tests / contests because the teacher or judge dictates a character, word, phrase, or sentence, and the student / contestant writes down (xiě 写/寫) what he or she hears (tīng 听/聽).

    See the descriptions and explanations here:

    "Spelling bees and character amnesia" (8/7/13)

    "The future of Chinese language learning is now" (4/5/14)

  11. Randy Alexander said,

    December 31, 2015 @ 11:39 am

    In the quoted passage, the author doesn't mention a teacher reading anything out for the student to write by ear, but rather describes reproducing whole passages of text from memory. In my experience of well over a decade of daily contact with Chinese elementary school students and teachers, this reproduction of passages is called 默写 (mo4xie3) as John points out. Confusingly, the author calls this "dictation" and also includes a picture of copying characters over and over, which is another thing entirely.

  12. AM Thomson said,

    December 31, 2015 @ 12:43 pm

    I enjoyed that "Japanese artists playing with kanji" page, and it reminded me of a 2014 music video by the band nhhmbase for their song "ichirin no hane," in which the lyrics (kanji and hiragana) fall, float and spin into view, colliding with each other and with small geometric shapes, shattering or piling up as their vectors dictate.
    Nice tune, too:

  13. JQ said,

    January 1, 2016 @ 2:11 pm

    In tingxie, you don't need to memorize the passage, but you need to memorize how to write it.

    Or rather, the point that some people seem to be missing is that the passage (or list of words/phrases) to be written is known in advance, but the teacher will jog your memory by saying it, rather than you having to memorize the whole thing as well.

    Yes, strictly speaking 'dictation' might indicate that you don't know the passage in advance and you need to write down what the teacher says. There is another name for this in Chinese but I can't remember what it is. Also my elder relatives have told me that it was mainly used for Classical Chinese, which means that if you don't already know the passage, you won't be able to write it down just from hearing it anyway.

  14. Alan Shaw said,

    January 1, 2016 @ 11:23 pm

    @ the suffocated: I recall Xu Bing relating that after publication of 天書 A Book From the Sky, the occasional phony character therein was revealed to be a real one.

  15. hardmaru said,

    January 5, 2016 @ 9:50 am

    Thanks for writing about my work; I never knew I had readers here :)

    Stay tuned: I'm planning to publish some new demos soon.
