Long kanji readings


SoraNews24 (4/20/17) has an article by Scott Wilson titled "W.T.F. Japan: Top 5 kanji with the longest readings【Weird Top Five】".  Before attempting to read and critique this article, we need to familiarize ourselves with some basic terms and concepts about the modern Japanese writing system.  It basically consists of thousands of kanji (Chinese characters) and kana (a syllabary of 48 symbols, of which there are two different types, cursive hiragana and angular katakana).  As the name "syllabary" indicates, each of the kana symbols is pronounced as a syllable, except for one, which indicates the sound "n".

The pronunciation of the kanji is much more complicated.  One kanji may have multiple readings, of which there are two main types:  on'yomi / ondoku (lit., "sound reading", i.e., Sino-Japanese reading) and kun'yomi / kundoku (lit., "instructional / exegetical reading", i.e., Japanese or native reading).  The former are one syllable in length, while the latter may have two, three, or more syllables.

In his article, Wilson makes frequent reference to the monumental Dai Kan-Wa Jiten 大漢和辞典 (The Great Chinese–Japanese Dictionary) by Tetsuji Morohashi, which students of Sinology and Japanology fondly refer to simply as "Morohashi" (e.g., "Check Morohashi" or "Morohashi will certainly have it").

Here's the nub, one that Wilson himself raises:  are some of the longer readings of certain kanji actually definitions rather than kunyomi?  Here is Wilson's position on this point:

The way that the Morohashi dictionary “confirms” a reading for a kanji is via the index volume (yes, the index is an entire volume… sometimes more, depending on the edition). If the kanji’s reading is in the “Japanese reading” index, then I say it’s fair game to label it a “reading.”

Here's what Linda Chance has to say about this matter:

I looked in my Morohashi, and indeed he calls the pages a kun index, but I don't think these are readings. When you look at the entries for the characters, they do not appear as readings. What I think he has done in the index is to give a phrase by which you might try to find a character you cannot recall, or search to see if there is a character for a particular expression.

Just now I consulted the first volume 凡例, the hanrei, which explains how to use the dictionary (and as all my students know, is the one part of any reference work you must not skip). In fact, Morohashi does not have a category of 訓読 in his instructions; 音読 is the only kind of reading you get. He calls the hiragana entries 訓議 [VHM: kungi, where -gi means "deliberation;  consultation;  debate;  consideration"]. Now very often these are the same as the kunyomi for a character, so for example the meaning of  石 appears as いし. But it also appears as なげいし [VHM:  a family name], and you would normally not read the character 石 that way without some kind of aid telling you to do so. The fact is, I suppose, that in Chinese quotations, 石 may mean なげいし. But in Japanese we would only read 投げ石 as なげいし. (And of course we would read 石 as なげいし if the writer glossed it as such.)

The writer of this entry does not understand how kanji readings work in the first place (if I may be so bold) as he or she writes that 食 is a one-syllable kanji, pronounced ta. That character cannot be read as "ta." It is read "ta+beru." As we know, the purpose of the beru ending (okurigana) in the example 食べる is to inform us of the intended inflection for the verb. Even without the okurigana, if the grammar requires the reading "taberu," the reader should supply it. So the kun reading of the character is a properly inflected form of the verb "taberu." Japanese language textbooks do not, of course, make this distinction clear.

The writer has pointed out that the examples are all extremely rare kanji, and this is the crux of the matter–I would venture that these graphs do not have kunyomi, only onyomi. In fact not a single one of these graphs appears in a desktop-size dictionary. As far as I can tell (and I have gone as far as I plan to on this subject) they were not used to write any Japanese words, for which the writer can be thankful.

Comment by an anonymous colleague:

I assume this site is related to the trend that brought us something called 'Why Japanese people?!,' which I still have never watched, but it's one of those 'what the heck is wrong with this bizarre culture' TV shows that invariably make my blood boil, because they usually don't know what they're talking about, and no Japanese will set them straight because it is nice to hear that you're special, even if he's saying 'Japanese are especially dumb.'

Given the title of the article, which we should not necessarily attribute to Wilson, the comment by the anonymous colleague may not be far from the truth.

[h.t. Ben Zimmer]


  1. Chris Brockett said,

    April 22, 2017 @ 10:14 pm

    Some of the entries cited may be "fingerprinting"–fake entries inserted by the publisher for the purposes of copyright protection.

  2. Jim Breen said,

    April 23, 2017 @ 12:17 am

    Much as I like Morohashi (I own a copy) it is probably the last reference I would use when exploring kunyomi of kanji. It's regarded as having a rather Chinese orientation. (And I too think the kun reading of 食 is たべる.)

  3. Max said,

    April 23, 2017 @ 1:10 am

    I wonder if there really is a difference between a "reading" and a "definition."

    After all, when China inevitably conquers the US and imposes characters on us, would "red" be a definition of 赤 or a reading?

    I was surprised to learn that Korean takes a somewhat different approach to Chinese characters than Japanese does. A character has a "formal definition," a native Korean word, but in text it can only have a Chinese-derived one-syllable reading. The formal definition can be said together with the reading "as a verbal means of identifying a character," according to A Guide to Korean Characters, but a Chinese character can't be a full replacement of a native word, as in Japanese.

    So maybe that's the difference.

  4. And O said,

    April 23, 2017 @ 1:32 pm

    The former [on'yomi] are one syllable in length

    Look over a list of the kanji taught in schools and you'll find lots of examples for two-syllable on-readings. Just going down the list a bit, both 悪 and 握 are glossed with アク aku, while later on 一 gets two readings: イチ ichi and イツ itsu.

    I had a whole tentative theory on the composition of your average on-reading worked out here, but the Wiki page got the scoop on me:

    [M]any Chinese syllables, especially those with an entering tone, did not fit the largely consonant-vowel (CV) phonotactics of classical Japanese. Thus most on'yomi are composed of two morae (beats), the second of which is either a lengthening of the vowel in the first mora (to ei, ō, or ū), the vowel i, or one of the syllables ku, ki, tsu, chi, fu (historically, later merged into ō), or moraic n, chosen for their approximation to the final consonants of Middle Chinese. It may be that palatalized consonants before vowels other than i developed in Japanese as a result of Chinese borrowings, as they are virtually unknown in words of native Japanese origin, but are common in Chinese.

    Is it just a coincidence that all of the morae that would add a second syllable to a given reading (i.e. "ku, ki, tsu, chi, fu") contain a close vowel (i or u), both of which are frequently devoiced in modern standard Japanese? Related: when did the whole devoicing business start in the first place?

  5. phspaelti said,

    April 23, 2017 @ 10:53 pm

    > Is it just a coincidence…
    No. The second syllable in such readings is the result of epenthesis. On-yomi are borrowed words from Chinese (more precisely 'a Chinese language'), and those characters had a final consonant which the Japanese could not pronounce without adding a vowel.
    One interesting thing about the vowel is that you can see the development of the Japanese epenthetic vowel as it switches from older 'i' to the more recent 'u'. (So for 日, older "nichi" had 'i' and more recent "jitsu" has 'u'). The exact choice is partly conditioned by the main vowel and partly by the consonant.

    I don't think the de-voicing of the vowel is crucial in any way, but of course devoiced vowels make 'better' epenthetic vowels (since they are less sonorous). I don't know when devoicing of vowels started in Japanese, or if there is even any way to determine when it might have.

  6. Victor Mair said,

    April 24, 2017 @ 3:44 am

    From Mark Liberman:

    As the name "syllabary" indicates, each of the kana symbols is pronounced as a syllable, except for one, which indicates the sound "n".

    And two others that are never syllabic:

    the chōonpu "ー" used to indicate vowel length, and the sokuon "ッ" used to indicate geminate consonants.

    Thus 「ポッキー」 "pocky" = pokkii is two syllables, not four.

    Also there are vowel-kana used as part of diphthongs, so that e.g. "tai" (= "red snapper") is just one syllable, though it would be written with two kana:

    たい or タイ

    Also palatal on-glides get their own kana. So Tokyo is とうきょう, with five kana for two syllables.

    For those reasons, hiragana and katakana are moraic writing systems rather than syllabaries.
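
The kana counts in the examples above can be checked mechanically. What follows is a minimal Python sketch, not a phonological analysis: it assumes that every kana is one mora except the small glide kana (ゃ, ゅ, ょ and the like), which merge with the kana before them. The names `SMALL_GLIDES`, `count_kana` and `count_morae` are made up for illustration, and the set of small kana is not exhaustive. Syllable counts are not attempted, since they depend on vowel length and diphthongs that the spelling alone does not settle.

```python
import unicodedata

# Small kana that attach to the preceding kana and do not start a new mora.
# Illustrative assumption: this set is not exhaustive.
SMALL_GLIDES = set("ゃゅょぁぃぅぇぉャュョァィゥェォ")


def count_kana(text: str) -> int:
    """Count written kana symbols, including ッ, ー and small kana."""
    # NFC keeps voiced kana such as ポ as a single code point.
    return len(unicodedata.normalize("NFC", text))


def count_morae(text: str) -> int:
    """Count morae under the simplifying assumption stated above.

    The sokuon (っ/ッ), the chōonpu (ー) and ん/ン each count as one mora;
    only the small glide kana are skipped.
    """
    normalized = unicodedata.normalize("NFC", text)
    return sum(1 for ch in normalized if ch not in SMALL_GLIDES)


for word in ["ポッキー", "とうきょう", "たい"]:
    print(word, "kana:", count_kana(word), "morae:", count_morae(word))
```

Under these assumptions ポッキー comes out as four kana and four morae, and とうきょう as five kana and four morae, against the two syllables each cited above.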

  7. Rodger C said,

    April 24, 2017 @ 6:47 am

    Wouldn't ポッキー be counted as four syllables in a haiku? (I know, it conjures up an interesting concept.)

  8. Ellen Kozisek said,

    April 24, 2017 @ 6:55 am

    @Rodger C

    Japanese Haiku count morae, not syllables. (My interest in poetry, and thus having read about haiku in Japanese, is how I know the concept of morae.)

  9. Rodger C said,

    April 24, 2017 @ 10:57 am

    @Ellen Kozisek: Thank you. I sort of knew that. Also thank you for knowing that "mora" is a Latin word. I've encountered writers who treat it as a Japanese word.

  10. Chris Button said,

    April 24, 2017 @ 3:36 pm

    @ Jim Breen

    And I too think the kun reading of 食 is たべる.

    What about when the verb is inflected for context (e.g. たべます, たべない etc….)? Isn't it far more economical just to treat this "kun" reading as most usually just a single syllable "ta-" with a hyphen denoting that it needs some "okurigana" afterwards in order to have this reading? Reading the character 食 on its own as たべる with no following okurigana is hardly common. Furthermore, how do we know not to say くう (食う) in such a case?
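
One way to picture the hyphenated "ta-" notation is as a lookup conditioned on the okurigana that follows the kanji. The Python sketch below is only an illustration of that idea: the table `KUN_RULES` and the function `read_word` are hypothetical, the listed contexts cover only a few inflected forms of 食べる and 食う, and no dictionary is claimed to store readings this way.

```python
# Hypothetical rule table: for each kanji, a list of
# (first kana of the okurigana, reading of the kanji in that context).
KUN_RULES = {
    "食": [
        ("べ", "た"),   # 食べる, 食べます, 食べない ...
        ("う", "く"),   # 食う
        ("わ", "く"),   # 食わない
        ("い", "く"),   # 食います
        ("っ", "く"),   # 食った
    ],
}


def read_word(kanji: str, okurigana: str) -> str:
    """Return the kana reading of kanji + okurigana, chosen by context."""
    for context, reading in KUN_RULES.get(kanji, []):
        if okurigana.startswith(context):
            return reading + okurigana
    # A bare kanji, or an okurigana string not in the table, gives no answer.
    raise LookupError(f"no reading recorded for {kanji} before {okurigana!r}")


print(read_word("食", "べます"))  # reading selected by the following べ
print(read_word("食", "う"))      # reading selected by the following う
```

In this sketch the okurigana is what selects between た and く, so a bare 食 with nothing after it raises an error rather than guessing.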

  11. Alyssa said,

    April 24, 2017 @ 11:26 pm

    This is the first I've heard that a verb like 食べる can be written as simply kanji without the kana inflection. In what situations would this be done? I would say I've never seen it, but of course I wouldn't be able to tell if I had.

  12. leoboiko said,

    April 25, 2017 @ 5:23 am

    I'd say both points above are right: 食 is just the ta- in taberu, in an important sense; under standard orthography, you need the "-べる" to write this word. On the other hand, 食 is the ta- in taberu, not elsewhere. Thus the traditional representation of kun'yomi as た(べる) or た-べる etc. (These definitions of kun'yomi are like linguists' notation for phonological rules: ta / _beru).

    Further, okurigana standardization is both new and "squishy". It's not hard to find e.g. things like 食る; I can find plenty of examples in Google Books (won't post counts because they're misleading) or in Twitter. And I'd say 食べる is a particularly stable example; other verbs' okurigana may vary even more. So we have to define kun logic as associating the character 食 primarily with the verb taberu, and secondarily with the mora ta. I see the secondary association as a consequence of a couple orthographic rules or transformations:

    0. Start with the verbal word-form (inflected verb) you want to represent.
    1. Morphographic principle: Use kanji for the radical and kana for the inflection (to the best approximation available under moraic writing).
    2. Intra-kanji disambiguation: If there are more kun'yomi readings associated with this kanji, and you've just created homographs, extract one kana to the okurigana, in order to disambiguate them.

    For example, 0. agar-u 1. あが-る aga-ru = 上る. Then 0. age-ru 1. あげ-る = 上る, 2. oops, now they're ambiguous, so let's have the tip of their radical dangle visibly: 上げる、上がる. (Notice, however, that it's still easy to find 上る=あがる in the wild, despite the fact that 上る is also an orthography for nobo-ru.)
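
The two rules and the 上 example can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of actual orthographic practice: the function name `okurigana_spellings` and its input format are invented here, each verb is supplied with the kana portion that rule 1 would hide under the kanji, and rule 2 is modeled by having every clashing form give up one kana at a time. The nobo-ru reading is deliberately left out, since the sketch has no way to decide which homograph keeps the short spelling.

```python
from collections import Counter


def okurigana_spellings(kanji: str, verbs: dict[str, str]) -> dict[str, str]:
    """Map each verb (in kana) to a kanji + okurigana spelling.

    verbs maps the full kana form to the part written with the kanji
    under rule 1, e.g. {"あがる": "あが"}.
    """
    covered = dict(verbs)  # kana currently hidden under the kanji

    def render(word: str) -> str:
        # Kanji for the covered part, kana for the rest.
        return kanji + word[len(covered[word]):]

    # Rule 2: while two verbs share a spelling, push one more kana
    # out of the kanji and into the okurigana for each clashing verb.
    while True:
        counts = Counter(render(word) for word in covered)
        clashing = [
            word for word in covered
            if counts[render(word)] > 1 and len(covered[word]) > 1
        ]
        if not clashing:
            break
        for word in clashing:
            covered[word] = covered[word][:-1]

    return {word: render(word) for word in covered}


print(okurigana_spellings("上", {"あがる": "あが"}))
print(okurigana_spellings("上", {"あがる": "あが", "あげる": "あげ"}))
```

With only あがる in the input, rule 1 alone yields 上る; adding あげる creates the homograph, and the loop produces 上がる and 上げる.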

    Regarding the use of okurigana-less orthographies for verbs: while I don't think I've seen verbal -u forms as single kanji (other than the trivial case of kanbun kundoku), it's easy to find nominalized forms (be they -i inflections or vowel stems) as single kanji. So 切 as well as 切り for kir-i, 待 and 待ち for mat-i, etc. I think this stems indirectly from the morphographic principle. The nominal form may be an inflection, and normally the morphographic principle wants to mark inflections with kana (thus the 切り orthography). But kir-i works like a noun, syntactically as well as semantically (it takes case particles, it references a conceptual entity etc.). Since non-verbal nouns (like hito) are uninflected, they don't have okurigana; which creates a sense of "no okurigana means nouny things". This derivative expectation that "nouns have no okurigana" generates the alternative form 切 for kir-i. Finally, because the linking form is identical with the nominal form, the linking form ends up with the okurigana-less orthography, too. So 切取る as well as 切り取る, even though in this case kiri- ain't a noun even conceptually.

  13. Chris Button said,

    April 25, 2017 @ 2:27 pm

    @ leoboiko

    From a modern usage perspective, the non-standard spelling 食る actually makes a lot of sense since たべ- rather than just た- is the unchanging component no matter how you inflect it. From an etymological perspective, the standard spelling 食べる better reflects its origin in 食ぶ.

  14. Chris Button said,

    April 27, 2017 @ 12:13 am

    "So the kun reading of the character is a properly inflected form of the verb "taberu."

    If I'm understanding Linda Chance's comment correctly, the suggestion is that the reading "ta" can only be the "ta" in "taberu" (or any other inflected form) and not the "ta" in any other unrelated Japanese word. As a result, we cannot really separate "ta" from its okurigana even if we write them separately from each other. Following that line of thought, would we then need to regard "-s", "-ted" and "-ting" as inseparable hypothetical okurigana for a kanji representing the verb "set" to avoid any association with an unrelated meaning of "a collection of things"? (granted "set" is a rather more meaningful element in English than "ta" is in the Japanese example)

  15. Chris Button said,

    April 27, 2017 @ 8:34 am

    Having mulled this over a little more (and noting that the past tense of "set" is "set" and that "sets" could be used as a 3rd person verb and a plural noun), I think the difference in approach could be best exemplified in English accordingly:

    1. We have a hypothetical kanji representing the morpheme "sit" (I'm using "sit" rather than "eat" because it works better). Following Linda Chance's approach, this could also be read as "set", "sat", "seat", "soot" or any other inflected form by replacing the "i" with other vowels (ignoring related forms like "sedate" etc for simplicity). This corresponds to treating a kanji like 食 as having the kun reading "taberu" (or more strictly an uninflected older reading "tabu") at its base.

    2. We have a hypothetical kanji representing "s_t". This could be read as "sit", "set", "sat", "seat", "soot" or any other inflected form by adding vowels to "s_t" as appropriate. This corresponds to treating a kanji like 食 as having the kun reading "ta_" at its base.

    While English "s_t" and Japanese "ta_" might not explicitly represent the meanings "sit" and "eat" in and of themselves, the fact that you cannot add for example "_ui_" to make unrelated "suit" or "_tsu" to make unrelated "tatsu" respectively means that the "semantic" concepts of "sit" and "taberu" (eat) are encoded regardless. As a result, in my humble opinion, claiming that 食 cannot be treated as "ta_" but only as "taberu" is overly prescriptive and frankly rather confusing.

  16. Quinn C said,

    April 27, 2017 @ 2:44 pm

    As for morae and syllables, one can distinguish them as Mark Liberman does, but it's common practice not to use the concept of a syllable at all in Japanese, and therefore, the word "syllable" becomes free to refer to a mora by a more familiar name.

    Rather than calling out the prevailing practice as wrong, first one should suggest that the syllable is a useful concept in Japanese, and insist on using "mora" only when that has been established.

  17. leoboiko said,

    April 27, 2017 @ 3:38 pm

    @Quinn C: What "prevailing practice"? I strongly dispute your assertion. I'm a linguist working with Japanese and in my circles no one uses the word "syllable" to mean "mora" when discussing the language. I don't think I've ever read a single phonology article or textbook using it in this way. Even literary texts (like, say, Robin D. Gill's Rise, Ye Sea Slugs!) are careful to distinguish the concepts, while in Japanese no one would confuse a haku (or, informally, a ji) with an onsetsu.

    There are many phenomena where the notion of "syllable" is necessary in Japanese. For example, the length of the moraic nasal depends on the vowel nucleus of its syllable (a distinct mora)—they affect the length of each other—even as the vowel nasalizes if its syllable (not mora) has a nasal coda. Some dialects will apply tones (pitch accent) exclusively on syllable boundaries, while in others the accentual domain is the mora, and in still others the word. So it's useful to distinguish "mora accent", "syllable accent" and "word accent" dialects. Syllables (as well as morae) have been shown to be implicated in native Japanese speech production, perception, language change and so forth. See Kawahara (2015), or just search for "Japanese syllables" on Google Scholar or wherever.
