Pinyin for phonetic annotation


One more reason for me to love Wikipedia.

I just noticed in this article on Chinese honorifics that some example sentences are phonetically annotated with Pinyin.  Not only that, the annotation observes properly spaced word division, which must be technically difficult to achieve.  Furthermore, the Pinyin annotations are appropriately small, yet clear.

I don't know how widespread this usage has become in Wikipedia or elsewhere, but I can tell you that learning about it this morning brought me great joy.

It turns out that what I so joyfully discovered is annotation markup built into WP.

Many WP articles use ruby-zh-p.

Now somebody should figure out how to make this wonderful tool available to users outside of Wikipedia.

For more than three decades, I've been imploring the relevant educational, publishing, and cultural authorities to make this type of phonetic annotation system available in China.  As I have noted on numerous occasions, I learned much of my Chinese through bopomofo (Mandarin Phonetic Symbols) annotation.


"How to learn to read Chinese" (5/25/08)

"The future of Chinese language learning is now" (5/5/14)

"Bopomofo vs. Pinyin" (4/28/15)

"The end of the line for Mandarin Phonetic Symbols?" (3/12/18)

"Another use for Mandarin Phonetic Symbols" (3/29/18)

"Writing Sinitic languages with phonetic scripts" (5/20/16)

"Pinyin in practice" (10/13/11), esp. these two comments (here and here)

"Phonetic annotation of Chinese characters" (10/15/12)

"Pinyin for the Prez" (10/25/18)

"The uses of Hanyu pinyin" (5/22/16)

[Thanks to Michael Carr]


  1. Antonio L. Banderas said,

    October 27, 2018 @ 9:33 pm

    I suppose by "outside Wikipedia" you're referring to its use in blog posts; otherwise, I'd long heard about fonts for document editors displaying pinyin atop, even with proper spacing for words:

  2. Alex said,

    October 27, 2018 @ 10:25 pm

    It's only a matter of time. I think the momentum is picking up. It's not only technology causing character amnesia; I'm also slowly seeing that parents are getting it. If the 35 to 45 year old parents are beginning to understand the fundamental flaw with characters and their added burden, I'm sure that when the kids who are 15 now have kids in a decade or so, a change will happen. Perhaps my kids have had to endure some suffering, but I have hope their kids won't.

  3. John Roth said,

    October 27, 2018 @ 10:52 pm

    Since Ruby is part of the HTML 5 standard, if your editor allows raw HTML markup you can mark up ruby using it. There are discussions in the HTML 5 standards document under "ruby" using mostly Japanese, but there are a few Chinese examples as well.

    The markup looks fairly straightforward to me, but please remember that I'm a (retired) software developer.

    I suspect better input methods are needed; if I had to mark up anything beyond examples I'd go stark, staring mad in short order.
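
To make the comment above concrete, here is a minimal sketch of what HTML ruby markup looks like when generated programmatically. The function names `ruby` and `annotate` and the sample words are illustrative assumptions, not anything taken from Wikipedia or the HTML standard; only the `<ruby>`, `<rt>`, and `<rp>` elements themselves come from HTML.

```python
from html import escape


def ruby(base: str, reading: str) -> str:
    """Wrap a base word and its reading in HTML ruby markup.

    The <rp> elements supply fallback parentheses for browsers
    that do not render ruby annotations.
    """
    return (
        "<ruby>" + escape(base)
        + "<rp>(</rp><rt>" + escape(reading) + "</rt><rp>)</rp>"
        + "</ruby>"
    )


def annotate(words) -> str:
    """Join (base, reading) pairs into one annotated HTML string.

    Each pair is a whole word (cí), so the Pinyin keeps its word division.
    """
    return "".join(ruby(base, reading) for base, reading in words)


# Hypothetical sample: two words, each annotated as a unit.
sample = annotate([("中文", "Zhōngwén"), ("拼音", "pīnyīn")])
```

Annotating word by word rather than character by character is what keeps the Pinyin spacing correct, but it also means the word segmentation has to be supplied by the person or tool doing the markup.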

  4. Philip Taylor said,

    October 28, 2018 @ 3:51 am

    VHM — The article on Chinese honorifics is absolutely fascinating; many thanks for making its existence more widely known.

    John Roth — A macro pre-processor should allow one to express Ruby in a more human-friendly manner. Looking at the example at MDN, it would indeed be mind-numbingly boring to have to mark up any extensive document using such low-level tagging, but then the same is true of (e.g.,) MathML.
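
A macro pre-processor of the kind suggested above might look roughly like the following sketch. The `{base|reading}` shorthand is a made-up convention for illustration only; it is not Wikipedia's template syntax or part of any standard, and the name `expand` is likewise hypothetical.

```python
import re
from html import escape

# Hypothetical shorthand: {汉字|hànzì} means "base text | reading".
SHORTHAND = re.compile(r"\{([^{}|]+)\|([^{}|]+)\}")


def expand(text: str) -> str:
    """Replace every {base|reading} shorthand with HTML ruby markup.

    Text outside the shorthand is passed through unchanged.
    """
    def to_ruby(match: re.Match) -> str:
        base, reading = match.group(1), match.group(2)
        return f"<ruby>{escape(base)}<rt>{escape(reading)}</rt></ruby>"

    return SHORTHAND.sub(to_ruby, text)
```

With this in place, an author could type `我学{中文|Zhōngwén}。` and leave the low-level tagging to the script.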

  5. John said,

    October 28, 2018 @ 5:21 am

    It appears to be part of HTML (which I'd never heard of). It's the "ruby" tag:

    I've been playing with transliteration for years and this is going to be very useful.

  6. Mark S. said,

    October 28, 2018 @ 8:16 am

    About 13 years ago, before Ruby was well supported, I put up something on how Pinyin and Hanzi might be displayed together interlinearly on Web pages using CSS and no tables. But that's about as far as I ever went with that.

    Popup Chinese has long had a nice online feature that will take a text in Chinese characters and change it so words (cí, not just zì) will be annotated with Pinyin (and English) when moused over. This could probably be adjusted so all Pinyin and Hanzi would appear at the same time.

    But most notably, the amazing Key Chinese software, which deserves to be much better known, will convert a text in Chinese characters to word-annotated Pinyin above Hanzi (or Hanzi above Pinyin). This can then be printed as a PDF. Key also offers conversion of Hanzi texts to HTML with Pinyin Ruby.

    Paste Hanzi text into a blank Key document. Then choose File –> Interactive Web Publishing. Alternately, select the text then choose Format –> Hanzi with Pinyin; and choose the options for "Hanzi with Pinyin" and "Show all non-Hanzi symbols in Pinyin line". Key ends up outputting the Hanzi twice in the HTML; but this can be adjusted in a regex text editor. A bit of CSS could tweak the presentation; but, basically, the results are ready for Web use.

  7. Thitherflit said,

    October 28, 2018 @ 9:04 am

    Kind of like Japanese furigana– little kana "sprinkled" around hard-to-read kanji. Delightful :)

    And, Prof. Mair, it was delightful to hear you at the Tea conference in Ithaca this weekend as well :)

  8. Philip Taylor said,

    October 28, 2018 @ 11:08 am

    Mark S — What can you tell us about Key5 ? I visited the web site, but it was rather less than forthcoming, simply saying that the software is available "by special request". Elsewhere it speaks in terms of a 30-day free trial, but I can find no actual cost stated anywhere.

  9. cameron said,

    October 28, 2018 @ 11:38 am

    The Ruby language was developed in Japan, and was popular there for some time before becoming widespread elsewhere. It stands to reason that the Ruby community would have developed standard libraries and utilities for handling Japanese text, which could then be extended to support Chinese. Ruby, the language, is, I think, less popular here in the US than it was about ten or fifteen years ago (when it was a hot new thing) but the libraries and utilities written in Ruby can still be used by programmers writing in other languages.

  10. John said,

    October 28, 2018 @ 12:54 pm

    I've created a small tool that annotates any text you input using the ruby tag. It uses unidecode for generalness.

  11. Dan said,

    October 28, 2018 @ 3:23 pm

    John, you might want to fix your example in that link.

    什么 is shenme, NOT shiyao.

    The first character would be shi if it lacked the radical, and would then mean "ten".

    The second character looks quite similar to yao, typically written 幺, which means "insignificant" or "one".

  12. John said,

    October 28, 2018 @ 4:27 pm

    Dan, I'm not surprised, as the data came from Unidecode. The project's motto is "It's better than nothing!", which tells you a lot. I'd encourage anyone to take a look at the Unidecode project as it represents an interesting, and purely technical, take on the whole diacritic / romanisation debate.

    The author has this to say about the scope: "Text::Unidecode is meant to be a transliterator of last resort, to be used once you've decided that you can't just display the Unicode data as is, and once you've decided you don't have a more clever, language-specific transliterator available– or once you've already applied a smarter algorithm and now just want Unidecode to do cleanup."

    Suffice to say, I did not apply a smarter algorithm before resorting to Unidecode to do cleanup :o)

  13. Minhv said,

    October 28, 2018 @ 11:11 pm

    @Dan FYI “shiyao” is a perfectly valid pronunciation gloss for the characters 什么 if a machine wasn’t smart enough, and stuck to the ROC standard of characters. Firstly, 什 is the financial numeral (anti-forging) version of 十, and secondly 么 is the ROC standard of writing “yao1” (Hong Kong and PRC writes ㄠ). 么 is only “me” in the PRC; other regions write 麼, and apart from ROC, will treat 么 as a nonstandard variant of “yao1”.

  14. William McKee said,

    October 29, 2018 @ 10:09 am

    Philip Taylor – KEY5 is available for US$50, payable via PayPal. We offer it by special request because we send out a download link by email so that robots do not continually download our software from the website.

  15. Philip Taylor said,

    October 29, 2018 @ 1:05 pm

    Thank you William; I may treat myself to a copy as a Christmas present to myself :)

  16. Kirk Kittell said,

    October 29, 2018 @ 4:01 pm

    That "ruby" tag is interesting–never noticed that before. I've always just used a span tag with the pinyin in the title attribute around the hanzi. But you'd just have to *know* that there was something there else you'd never think to mouseover the word. This seems much more useful–definitely going to play with it.
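
For pages that already use the span-and-title approach described above, a conversion to ruby markup could be sketched as follows. This assumes the spans have exactly the form `<span title="pinyin">hanzi</span>` with no other attributes and no nested tags; real pages would likely need a proper HTML parser, and the name `span_to_ruby` is hypothetical.

```python
import re

# Matches only the simple form <span title="...">...</span>.
SPAN = re.compile(r'<span title="([^"]*)">([^<]*)</span>')


def span_to_ruby(html: str) -> str:
    """Turn title-attribute glosses into visible ruby annotations."""
    # \2 is the hanzi inside the span, \1 is the pinyin from the title.
    return SPAN.sub(r"<ruby>\2<rt>\1</rt></ruby>", html)
```

The result shows the Pinyin above the characters at all times, so the reader no longer has to know to mouse over the word.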

  17. JonathanZ said,

    October 29, 2018 @ 5:16 pm

    @cameron, with all due respect to the Ruby programming language, I'm pretty sure the name of the HTML tag comes from this meaning of the word.

  18. Kirk Kittell said,

    October 29, 2018 @ 8:30 pm

    That's great. I'd never heard of the "ruby" tag. To keep the pinyin and hanzi nearby, I typically just wrap the hanzi in a span, and use a title attribute for the pinyin. But it only works if you *know* to mouseover the word to get the pinyin. The Wikipedia layout is so much better–thanks for sharing that.

  19. B.Ma said,

    October 29, 2018 @ 10:16 pm

    Ruby annotation has been around for many years, so I'm not sure why it's a particular reason to "love Wikipedia". I use it frequently but only the first time I type an uncommon character. Of course a similar thing has been used in printed books for schoolchildren for decades, while standardization in HTML is a bit more recent.

    I once had a lecturer who spent an entire lecture talking about how the Wikipedia articles on his area of expertise were completely flawed; I later found that he had been giving the same talk for 5 years before me and has continued to do so. Some of the citations in the article are from his own papers, but it never seems to have occurred to him that he can edit the article himself.

    What I'm saying is that anyone can add pinyin to Wikipedia if they are so inclined; I sometimes do it on Wiktionary.
