The mystery of sóng (U+2AA0A) ("semen")


Matt Jenkins writes:

I am hoping you'll indulge a question that's been bugging me. I have been trying to improve my fluency by watching as many Chinese online dramas as possible, and sóng (U+2AA0A) comes up in show after show. But the character is always quite obviously "cut-and-pasted" into the subtitles. I'm (generally) familiar with the character as a simplified form of 㞞, and I know that people usually write 怂 instead. But why is the character practically completely absent from character sets and dictionaries? It's no more offensive than its progenitor 㞞, but 㞞 is far easier to find in character sets.

Jichang Lulu wrote about 㞞 on Language Log back in March [see "Selected readings" below], but that post didn't include any reference to the character at U+2AA0A.

If you want to see what this mysterious character looks like, you can find it here — with some really esoteric variants (N.B.).
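
For readers who would rather poke at the code point than hunt for a font, here is a minimal Python sketch. It assumes only the standard library; the helper names `can_encode` and `describe` are made up for illustration and appear nowhere in Matt's question. It shows that U+2AA0A sits outside the Basic Multilingual Plane (it is in CJK Unified Ideographs Extension C), so it takes four bytes in UTF-8 and a surrogate pair in UTF-16, and that the legacy GBK encoding cannot represent it, whereas the everyday substitute 怂 (U+6002) fits in GBK. That may be one practical part of why the character is hard to find, though it does not settle the question.

```python
# Illustrative sketch only: can_encode and describe are hypothetical helper names.

def can_encode(ch: str, codec: str) -> bool:
    """Return True if the character can be represented in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def describe(ch: str) -> dict:
    """Collect a few encoding facts about a single character."""
    cp = ord(ch)
    return {
        "codepoint": f"U+{cp:04X}",
        "in_bmp": cp <= 0xFFFF,  # Basic Multilingual Plane?
        "utf8_bytes": len(ch.encode("utf-8")),
        "utf16_units": len(ch.encode("utf-16-be")) // 2,
        "gbk": can_encode(ch, "gbk"),
        "gb18030": can_encode(ch, "gb18030"),
    }


# Compare the "mystery" character with the common substitute 怂.
for label, ch in [("U+2AA0A", "\U0002AA0A"), ("怂", "\u6002")]:
    print(label, describe(ch))
```

Whether the glyph actually shows up on screen is a separate matter: that depends on having a font that covers Extension C, not on the encoding.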

Matt asks a good question: why is the character practically completely absent from character sets and dictionaries? That's hard to say, especially since it is a heck of a lot simpler to write than 㞞, to which it is equivalent. One surmise I have is that the mystery character is somehow visually reminiscent of that very nasty character "cào 肏" ("f*ck") (graphically "enter" + "flesh / meat") as in "cào nǐ mā 肏你媽" ("f*ck your mom"), which is beyond the pale for all but the most hardened garbage mouths. Of course, one can always avoid "cào 肏", and one usually does, by punningly writing it as cāo 操 ("exercise").

Another possibility is that the character (U+2AA0A) echoes the shape of the notorious 屄 (vulgar "vulva and/or vagina"), which is similarly nearly always replaced by the homophones 逼 or B.

Whatever the reason people avoid writing the "real" glyph for U+2AA0A, the enigma of its absence from character sets, dictionaries, and fonts points to an inalterable verity about the Sinographic writing system: it is essentially open-ended, and the number of its potential discrete constituents is infinite. The artist Xú Bīng 徐冰 (b. 1955) proved that with his monumental, epochal installation "Tiān shū 天書 / 天书" ("Book from the Sky") (1988) and other art works (see also "Selected readings"). Anyone, anywhere, anytime can create his or her very own hanzi / kanji / hanja 漢字 / 汉字 ("Sinoglyph"), and many do. One can even advocate combining them with emoji.


Selected readings




  1. Victor Mair said,

    September 9, 2022 @ 2:04 pm

    From an anonymous colleague:

    Ooh, some dirty words! My mother had never allowed me to read them aloud.

    I just checked my dictionaries.

    入+肉 is in neither 辞海 nor 康熙字典.

    I wonder why it was thought of as cao (操). I always thought it was pronounced ri (日), since in everyday northwestern Chinese the dirty word is always ri ni ma, as in 日你妈 (one can find many such dirty words in 贾平凹's novels). In fact, "日" has now become the normal way to write the phrase.

    Given both 入 ru and 肉 rou, I think it makes more sense that 入+肉 is ri (日) instead of cao (操).

    I believe song 怂 is a northwestern Chinese dialect word (西北方言). I hear it used all the time by people from Shaanxi, Gansu, and Xinjiang (陕西、甘肃、新疆).

  2. Jonathan Smith said,

    September 9, 2022 @ 2:21 pm

    Presumably because "㞞" was not simplified in any official sense in the P.R.C. — that is, 尸 + 从 is not a Simplified form sensu stricto but just a "vulgar" 俗 analogical simplified form 类推简化字?

  3. Jonathan Smith said,

    September 9, 2022 @ 2:41 pm

    * Also in case it's not clear, sóng​ 'comes up in show after show' meaning weak~effete~pathetic, not 'semen' — whether the latter item is even part of modern Mandarin seems dubious.

    Also, @anonymous colleague's point makes sense because it seems likely that the northern word 'fuck' was in origin simply 'enter', with the vowel of 'enter' itself later shifting due to taboo avoidance (cf. niao3 'bird' with novel n- instead of original d-; I think in this case 'bird' and 'dick' remain homophones in e.g. [some?] Hakka).

  4. Asuitablecase said,

    September 9, 2022 @ 5:35 pm

    @Jonathan Smith

    Or even “cock”?

  5. Chris Button said,

    September 9, 2022 @ 5:58 pm

    The lack of a simplified character could simply come down to how simplification extends through compound graphs (i.e. ones using the same phonetic) only in “common use” (whatever that means) characters. It’s probably more evident in Japanese, though, where a Joyo kanji might have a simplified form, but a non-Joyo compound character using that form as a phonetic might use the traditional form.

  6. Matt Jenkins said,

    September 9, 2022 @ 6:45 pm

    As Jonathan Smith noted, in online dramas this comes up in the context of the "weak/pathetic" meaning ("别U+2AA0A啊"), rather than "semen." The puzzling thing for me is this: In the subtitles, rather than just rendering U+2AA0A as 怂, the subtitlers will literally "paste" in U+2AA0A, presumably from some more comprehensive typeface or possibly as an image file. So clearly someone thinks it's worth the effort to get this particular ("non-") character correct.

  7. Matt Jenkins said,

    September 9, 2022 @ 7:03 pm

    Sorry … thinking about this more, and following on @anonymous colleague's thoughts:

    In colloquial usage 怂 seems to be used more frequently than 㞞 for "weak/pathetic," even though all the dictionaries I have on hand show 怂 as meaning either "alarm" or "instigate" (but not "weak/pathetic"). So maybe the subtitlers use U+2AA0A to make clear that it's "weak/pathetic" rather than "alarm" or "instigate."

    But still the question remains: Why go to all the trouble to paste an oddball character into the subtitles rather than just rendering it 㞞? There has to be something more going on …

  8. Jonathan Smith said,

    September 9, 2022 @ 7:44 pm

    ^ same thing with Chinese more or less; tons of these analogical simplifications have become available in the Unicode extended blocks but many (like 尸 + 从 I assume) aren't officially sanctioned… see e.g. the 2013 通用规范汉字表 where the authors note

    "This table has performed a thoroughgoing vetting of analogically simplified characters in use by the public but not appearing on [previous officially promulgated simplified character charts] and has included only 226 characters, such as "闫", "" and "", that are in general use […blah blah…]"

  9. Jonathan Smith said,

    September 9, 2022 @ 7:50 pm

    ^ sorry, display issues for the characters at issue, of course…

    @Asuitablecase yeah, better parallel for 'bird' semantics (and more usually employed in discussions of such words on LL)…

  10. Jonathan Smith said,

    September 14, 2022 @ 8:50 am

    Oh I missed Matt Jenkins' comments here… as for why bother with 尸 + 从, my intuitive sense is that (1) "怂" = sǒngyǒng 怂恿~慫恿 'incite', so it feels like a phonetic borrowing if used to write 'weak'; (2) "㞞" is to the eye clearly a traditional character; the "從" "should be" simply "从" in a simplified version. Thus cut-and-paste, etc.

  11. Matt Jenkins said,

    September 15, 2022 @ 11:30 pm

    @Jonathan Smith Thanks for your latest comment, and apologies for continuing to drag this out, BUT (!) …

    Thanks especially for pointing out that for a native speaker (er, reader) 㞞 is clearly a traditional character. That bit had eluded me. Just for fun, I went and entered 㞞 in PLECO. When I toggled PLECO to Simplified, the "headword" character totally vanished — there was just a forlorn-looking "sóng" beneath where the character should have been. Which, I guess, was to be expected. But still kind of spooky to see (or rather, not see) a member of this strange class of wandering-ghost characters that have never been given official Simplified incarnations …

  12. Jonathan Smith said,

    September 16, 2022 @ 3:28 pm

    Yeah the funny thing is that analogical simplification is of course eminently reasonable, i.e., every character which used "鳥" really *should* use "鸟" and so forth, but this is incompatible with the way the script has been digitized.
    Since many such forms including this one are now in Unicode I suppose it is just a matter of time before applications like Pleco and the rest display them more readily… whether they'll get the official PRC stamp of approval is another (maybe largely irrelevant) matter.
