Language Log

The unpredictability of Chinese character formation and pronunciation, pt. 2

February 11, 2019 @ 8:08 am · Filed by Victor Mair under Errors, Writing, Writing systems

Emma Knightley asks:

My background is that I grew up in Taiwan learning Traditional Chinese and now most of what I use in my professional life is in Simplified Chinese. How exactly should the character of hē, "to drink," be written?

I grew up learning that the character inside the bottom-right enclosure is 人. Now I see that it is mostly written as 匕. I don't know when this changed, and I don't think it's a matter of Traditional vs Simplified, either, as I see both versions in Traditional writing as well. This Wiktionary entry illustrates the confusion nicely. No one I know has noticed this change, which leads me to think that I'm either losing my mind or experiencing the Mandela Effect.

Here's the character that Emma is asking about: 喝 (traditional and simplified), but it is also written thus: [Whoops! No matter how hard I try, I can't get my browser to produce this character, even though I use the correct Unicode number, U+FA78. Whoops again! By tricking my browser, I think I can get it to appear here, , though I can't guarantee it'll still be there when I make this post (I won't attempt to type it again later in this post, but will simply refer to it as "the phantom character")]. It's the form with 人 in the bottom-right enclosure, not 匕. Actually, in the latter version, the stroke that starts at the top left, goes straight down vertically, then abruptly curves to the right before ending with an upward hook serves as the left side and bottom of the bottom right of this form of hē ("to drink"), so the only thing inside the "enclosure" at the bottom right is a short stroke that slants downward to the left, not 匕.

The Wiktionary entry for hē ("to drink") does indeed illustrate the problem nicely, since the character in the heading is the one my browser won't produce, but the examples for Chinese, Japanese, and Korean all have 喝, while the one for Vietnamese has the one that I can't call up with U+FA78.

As for the pronunciation of 喝 and the phantom character I can't produce, we have:

hē ("to drink")

hè ("shout loudly")

yè ("with a hoarse sound")

kài (listed in Wiktionary and other sources, but I haven't been able to track it down otherwise and assign a meaning to it, so I'm a bit dubious about its existence as a morphosyllable in MSM, though it may well exist in one or another topolect [e.g., Taiwanese], though I'm not sure where it comes from and I have no idea what it means) — FLASH!! Just found it in Hànyǔ dà zìdiǎn 漢語大字典 (Unabridged character dictionary of Sinitic; HDZ), 1.653b, where it says that, in this reading, 喝 = alas, can't copy this one either; it is the fourth rare variant from the left under yìtǐzì 异体字 — top center here (HDZ says, believe it or not, that it means "sound" [shēng 聲])

Aside from the phantom character that I can't type, variants of 喝 include 欱 and 哈, and there are about half a dozen others that I cannot type. The main point of this post, however, is that Emma and I, and doubtless millions of others, were not too long ago taught to write hē ("to drink") with a form of the character (U+FA78) that now barely exists. If we had written 喝, it would have been considered an error.

Readings

"The unpredictability of Chinese character formation and pronunciation" (2/6/12)
"How many more Chinese characters are needed?" (10/25/16)
"Chinese character inputting" (10/17/15)
"Is there a practical limit to how much can fit in Unicode?" (10/27/17)
"Character crises" (6/15/18)
"Ask Language Log: Looking up hanzi for ignoramuses" (11/29/17)
"Sinological suffering" (3/31/17)
"Writing characters and writing letters" (11/17/18)
"An immodest proposal: 'Boycott the Chinese Language'" (11/18/18)
"The wrong way to write Chinese characters" (11/28/18)
"Sinographs by the numbers" (1/22/19)

21 Comments

David Moser said,

February 11, 2019 @ 9:29 am

I'm confused. I think the version with the 匕 component is the Japanese version, no? That's certainly what the Wiktionary indicates. And my browser will produce both versions, as long as I cut-and-paste the 匕 component graph. The ASCII just seems to be for the Japanese version of the character.
unekdoud said,

February 11, 2019 @ 9:41 am

Character amnesia strikes! I read the first paragraph and mentally conjured up the character with 勾 in the corner.
Tom Gewecke said,

February 11, 2019 @ 9:53 am

There's only one character, U+559D, but two common forms, one for Chinese and one for Japanese. Which one appears on your computer depends on whether a Chinese or Japanese font is being used at the time. On a Mac or iPad/iPhone, which font gets used often depends on the order of these two languages in the Preferred Languages list in your preferences. If you are seeing the wrong one, changing that should fix it.

https://discussions.apple.com/thread/250126320
glasserc said,

February 11, 2019 @ 9:53 am

I learned the form with 人 from the 2003 printing of Integrated Chinese.

I'm not an expert but I think it's U+559D, from this Wiktionary article:

https://en.wiktionary.org/wiki/%E5%96%9D#Translingual
Victor Mair said,

February 11, 2019 @ 10:34 am

"I'm confused."

And well you should be.

The problem with hē 喝 ("to drink") is similar to that with mén 门 (not the form I want to type, but it's too much trouble to get the one I want to come up) ("door; gate"), which we have discussed at great length in "Foul Meat-gate" (7/29/14). I don't think anyone is happy to have to cut-and-paste to get the "correct" version of such common characters as those for "to drink" and "door; gate".
Tom Gewecke said,

February 11, 2019 @ 10:52 am

@Victor I think the mén problem perceived on computers is really the same thing, a case where people's machines are for one reason or another (most often their language preference settings) using a Japanese font (namely Hiragino on MacOS and iOS) when displaying Chinese text instead of a Chinese font.

https://m10lmac.blogspot.com/2011/09/odd-chinese-display-issue.html
Ollyver said,

February 11, 2019 @ 11:40 am

Further evidence for Tom's point: my browser displays the characters as described, but the RSS reader I first opened it in on the same device ("Read" on Android) displayed 喝 with 人 in the bottom right. This made the post quite confusing to read!
Philip Taylor said,

February 11, 2019 @ 3:40 pm

fileformat.info is happy to offer a graphic version of "the phantom character" for those for whom the Unicode glyph will not display.
Philip Spaelti said,

February 11, 2019 @ 7:12 pm

The reason why your character is a phantom is that FA78 is a "CJK compatibility character" and most people aren't going to have a font that can display it.

Actually I think such characters should really be called "CJK INcompatibility characters"
Alyssa said,

February 11, 2019 @ 7:40 pm

Thank you Tom Gewecke for the explanation, on my computer every instance of the character in this post shows up with the 人, so I was very confused.
B.Ma said,

February 12, 2019 @ 2:31 am

I'm still confused. I can find 3 versions: U+559D, U+FA36 and U+FA78.

The following links have images.

http://www.fileformat.info/info/unicode/char/559d/index.htm
http://www.fileformat.info/info/unicode/char/fa36/index.htm
http://www.fileformat.info/info/unicode/char/fa78/index.htm

The above website image shows 559D with 匕. I have never seen the character written like this before. As far as I'm concerned the FA78 and FA36 versions with 人 are "correct".

On my Windows PC, all instances of the character in the LL post appear with 人. I can't see the image in the post so am not sure which version it is supposed to display.

It looks like my fonts render the 559D code point using 人, and are unable to display FA78.
Frédéric Grosshans said,

February 12, 2019 @ 5:33 am

In the Unicode chart of “CJK unified ideographs” (35MB pdf), all versions of U+559D 喝 have the 人-form except the Japanese one, which has the 匕-form. The chart for the CJK compatibility ideographs (pdf) contains two code points canonically equivalent to U+559D: one inherited from a DPRK (North Korea) standard, U+FA78 喝, in the 匕-form, and one from a Japanese standard (JIS X 0213), U+FA36 喝, in the 人-form.

The modern way to distinguish the two in Unicode is through the use of ideographic variation sequences, 559D FE00 喝︀ for the 人-form and 559D FE01 喝︁ for the 匕-form.
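
The canonical equivalence described in this comment can be checked with Python's standard `unicodedata` module. The sketch below is an illustration only, not something from the thread: the helper name `code_points` is made up for this example, and the pairing of FE00 and FE01 with particular forms is taken from the comment above rather than verified independently. It hints at one possible reason the "phantom character" is so hard to keep in a post: any software that applies NFC normalization would be expected to turn U+FA78 or U+FA36 into plain U+559D, whereas the variation sequences should pass through unchanged. Whether the blog software actually normalizes text is an assumption.

```python
import unicodedata

UNIFIED = "\u559d"                      # the unified ideograph that standard fonts cover
COMPATIBILITY = ["\ufa36", "\ufa78"]    # the two compatibility ideographs named above


def code_points(text):
    """Hypothetical helper: list the code points in a string as U+XXXX."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Compatibility ideographs carry a canonical decomposition to the unified
# character, so NFC normalization replaces them.
for ch in COMPATIBILITY:
    normalized = unicodedata.normalize("NFC", ch)
    print(code_points(ch), "-> NFC ->", code_points(normalized))

# Variation sequences: the base character followed by a variation selector.
# Which selector maps to which form is as reported in the comment above.
vs1_sequence = UNIFIED + "\ufe00"
vs2_sequence = UNIFIED + "\ufe01"
for seq in (vs1_sequence, vs2_sequence):
    unchanged = unicodedata.normalize("NFC", seq) == seq
    print(code_points(seq), "unchanged by NFC:", unchanged)
```

Running `code_points` on text copied from a web page is also a quick way to tell which of the three code points a given 喝 really is, independent of how the font draws it.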
Tom Gewecke said,

February 12, 2019 @ 8:33 am

@Frederic Grosshans Are there some common fonts that actually use IVS to do this? As a practical matter, what everyone has are machines using separate fonts for Chinese and Japanese which have different forms for 559d.
Tom Gewecke said,

February 12, 2019 @ 10:03 am

@B.Ma Nobody uses FA78/FA36; standard fonts do not include them (which is why you can't see them). Instead everyone uses 559d. 559d has 2 versions, both of which are correct, one for Chinese and one for Japanese. Which one you see depends on what kind of font (Chinese or Japanese) your particular machine or app is using at the moment. If it always uses a Chinese font you will never see the Japanese version. The Fileformat image seems to be the Japanese version for some reason. If you click on the Browser Test Page link you should see the Chinese version too.
Tom Gewecke said,

February 12, 2019 @ 2:22 pm

If anyone is interested in other cases where this display problem could arise, the url below has a chart (the one named Same Code Point, Different Language Tags) which seems to work on my browsers, showing the different forms that would normally be produced by different language-specific fonts.

https://en.wikipedia.org/wiki/Variant_Chinese_character#Usage_in_computing
Moa said,

February 14, 2019 @ 2:57 pm

That's neat, I have never noticed the difference before. For all I know, any squiggle could take the place of 人 in 喝 and I would not notice. Now when I look closer, my browser sometimes displays one, sometimes another. Nice!
Emma said,

February 15, 2019 @ 12:42 am

Letter-writer here. Thank you to Prof. Mair for posting this and to all the kind folks who commented. This is a huge relief to me, honestly. I was confused by this for the longest time. Now I just have to figure out how to get my fonts to display correctly!! (I assume people using computers + browsers configured in sinophone countries do not have this problem. But I do wonder how many people learning Chinese outside of those areas just think that the Japanese version is the correct character because of things like this.)
Frédéric Grosshans said,

February 15, 2019 @ 8:21 am

@TomGewecke :

Actually, both IVS in my post above appear correctly on my computer (Ubuntu 18.10), both under Firefox and Chromium, so I guess the answer is yes, such fonts exist. However, I don’t know which font I actually use.
Magnus Henoch said,

February 15, 2019 @ 4:48 pm

I couldn't resist coming up with a little tool to try to get the character to display in different ways: http://קבצים.חנוך.se/characters.html

On my Ubuntu machine (not specially configured for any East Asian language), Firefox displays the variant with 人 when specifying any variant of Chinese, and the variant with 匕 for Japanese and when not specifying a language. Chromium on the other hand uses the 人 variant in all cases, though it does display the character 门 differently for Chinese and Japanese. I'm not sure if this reduces or increases the amount of confusion…
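
For readers who want to try the same experiment locally, here is a minimal sketch in Python that writes a small HTML test page showing U+559D under several language tags. It is not the code behind the tool linked above. The function name `build_page`, the output file name `he_variants.html`, and the particular list of tags are all assumptions chosen for illustration. Which glyph each line shows will depend on the browser and the installed fonts, as the comments in this thread describe.

```python
from html import escape

CHAR = "\u559d"                              # the unified ideograph for hē
LANG_TAGS = ["zh-Hant", "zh-Hans", "ja", "ko"]  # illustrative choice of tags


def build_page(char, lang_tags):
    """Hypothetical helper: return an HTML page with one line per language tag."""
    rows = "\n".join(
        f'    <p lang="{escape(tag)}">{escape(tag)}: '
        f'<span style="font-size:3em">{escape(char)}</span></p>'
        for tag in lang_tags
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        '<head><meta charset="utf-8"><title>U+559D by language tag</title></head>\n'
        f"<body>\n{rows}\n</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Write the page next to the script; open it in any browser to compare.
    with open("he_variants.html", "w", encoding="utf-8") as f:
        f.write(build_page(CHAR, LANG_TAGS))
```

The page relies only on the `lang` attribute, so any difference between the lines comes from the browser's own per-language font choice rather than from different code points.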
Tom Gewecke said,

February 15, 2019 @ 7:29 pm

@Magnus Very cool! Works fine on my Mac and iPad, also with Chrome.
Frédéric Grosshans said,

February 17, 2019 @ 11:13 am

Thanks for this nice tool! It also handles variation selectors correctly (by copy-pasting). If I could make a suggestion to improve it, it would be the possibility to add several codepoints, to play with variation selectors.
