Seal Script in Unicode
Draft Minutes of UTC (Unicode Technical Committee) Meeting 185
Cupertino, California, United States — October 27-29, 2025
Hosted by Apple in Cupertino and virtually
UTC #185 Agenda
Revision date: November 26, 2025
https://www.unicode.org/L2/L2025/25226.htm#185-C3
As has been true since the beginning of Unicode (see Mair and Liu, Characters and Computers [1991]), of the total number of new code points to be added to Unicode, the proportion devoted to Sinoform characters is greater by an order of magnitude than that for all other scripts and symbols (cuneiform, Arabic, Armenian, Bengali, Devanagari, Hebrew, Kana, Latin, Mongolian, emojis, alchemy, mathematics, etc.) put together.
Hangul syllables, which derive their basic shape from sinographs, are also Unicode code point hungry.
Feast your eyes on these tables of Unicode blocks, and don't stop reading till you get to the bottom:
https://en.wikipedia.org/wiki/Unicode_block
Some people have suggested that all of Xu Bing's made-up characters in "Book from the Sky" should also be entered in Unicode, even though we don't know what any of them mean or how they are pronounced. That's 4,000 more meaningless code points right there. Other artists have made similar critiques and elaborations of the Chinese writing system. Since Xu Bing's work and most of the others like it, not to mention the already existing sinographic writing system itself, are open-ended, that means including them would subject Unicode to an infinity of additional sinoform code points.
D.1 Section 1.2 Seal Script
Discussion of the name of the block. Participants agreed that “Seal” is preferable to “Small Seal”. The ISO 15924 registrar noted that the English script name is “(Small) Seal”.
Discussion of the status of properties.
[185-C3] Consensus: UTC accepts 11328 code points U+3D000..U+3FC3F for encoding in a new Seal block based on WG 2 N5344R, for Unicode Version 18.0. Of the proposed properties, kSEAL_THXSrc, kSEAL_CCZSrc, kSEAL_DYCSrc, and kSEAL_QJZSrc are Normative. The others are Provisional. [Ref. 1.2 in L2/25-232R]
[185-A5] Action Item for Ken Whistler, RMG: Update the Pipeline to include 11328 Seal characters U+3D000..U+3FC3F based on WG 2 N5344R, accepted for Unicode Version 18.0. [Ref. 1.2 in L2/25-232R]
[185-A6] Action Item for V.S. Umamaheswaran, SAH: Update the roadmap to reflect accepted code points for the Seal script: U+3D000..U+3FC3F. [Ref. 1.2 in L2/25-232R]
[185-A7] Action Item for Michel Suignard, EDC: Update Table 4-8 in the Core Specification to include name derivation prefix for the Seal script, for Unicode Version 18.0. [Ref. 1.2 in L2/25-232R]
[185-A8] Action Item for Michel Suignard, EDC: Provide block description for the Seal block in the Core Specification, for Unicode Version 18.0. [Ref. 1.2 in L2/25-232R]
[185-A9] Action Item for Michel Suignard, SAH: In Unicode Standard Annex #60, “Data for non Han ideographic scripts”, add properties for Seal as described in WG 2 N5344R.
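For readers who want to double-check the figure in consensus item 185-C3, the range arithmetic works out; a trivial sketch (the endpoints are the ones quoted in the minutes, and both sit on Plane 3, the Tertiary Ideographic Plane):

```python
# Consensus 185-C3 quotes 11328 code points for U+3D000..U+3FC3F.
# The range is inclusive, so the count is end - start + 1.
start, end = 0x3D000, 0x3FC3F
count = end - start + 1
print(count)  # 11328

# Both endpoints lie on Plane 3 (U+30000..U+3FFFF):
print(start >> 16, end >> 16)  # 3 3
```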
Some advocates of the sinographs (hanziphiles), even famous university professors, think that the sinographic writing system is superior to all others because it has far more discrete elements than do alphabets, syllabaries, abjads, and so forth. Oy vey!
Selected readings
- "Unicode CJK Unified Ideographs Extension J and the nature of the sinographic writing system" (6/16/25)
- "Language is not script and script is not language, part 2" (7/10/22)
- "Language is not script and script is not language" (1/23/22)
- "How many more Chinese characters are needed?" (10/25/16)
- "Language vs. script" (11/21/16)
- "Triple review of books on characters and computers" (8/23/24)
- "Chinese character inputting" (10/17/15)
- "Sinographic inputting: 'it's nothing' — not" (2/22/21) — with lengthy bibliography
- Victor H. Mair and Yongquan Liu, eds., Characters and Computers (Amsterdam, Oxford, Washington, Tokyo: IOS, 1991)
- "Is there a practical limit to how much can fit in Unicode?" (10/27/17)
- "Sinographs by the numbers" (1/22/19)
- "Idiosyncratic stroke order" (11/23/18) — and the long list of earlier posts at the bottom
- "The wrong way to write Chinese characters" (11/28/18) — You may get all the strokes of a character, but if you don't write them in the correct order, you have miswritten the character, for example, the 29 strokes of this one:
yù 鬱 ("depression; blues; dense; despondent; dismal; dispirited; low-spirited; melancholy; sweet smelling")
Jonathan Smith said,
January 15, 2026 @ 1:44 pm
* in "Section C. Technical – Justification" of the proposal summary form, the answer to whether "information on the user community for the proposed characters (for example: size, demographics, information technology use, or publishing use) is included" is "NO", and the answer to "context of use for the proposed characters" is "RARE". So it is fun (?) that 11000+ "Seal Script" characters will be in Unicode, but it accomplishes practically nothing for practically no one.
* the new block involves a complicated consolidation of four (naturally there are more) versions of the c. 100 CE dictionary Shuowen jiezi. So it really should be called "Shuowen script", not "[Small] Seal script". Because if one wants to, say, publish an article about an actual Seal script inscription, one could not use this Unicode block because the characters… um, look different. One would need to present some kind of facsimile. So in addition to this block being not very useful for ordinary folks, it is not very useful for specialist work. I don't know exactly what it is useful for, but I guess dilettantism, like you want some decorative text on your say restaurant menu to be in Seal script? Which you could already do anyway easily once a decade or so when the issue came up?
Bybo said,
January 15, 2026 @ 5:20 pm
"If a seal could talk, we could not understand him."
~flow said,
January 15, 2026 @ 5:38 pm
I would like to present an opposing view to the ones presented in the post and the comment.
First of all, Unicode is—or more precisely, has become, against early opposition from what turned out to be a minority view—a catalogue of the inventories of the world's writing systems, modern and ancient. Its outlook is not confined to what Mr and Mrs Everybody will actively use on a daily basis, although if they read a book printed during the last few decades, use their smartphones, exchange documents at work, or do anything with their computers, they will inevitably be consumers of Unicode, which has been up to this job for a sizable portion of the world's population since at least version 1.0.1 from 1992, when some 20,000 CJK characters were added to the repertoire. If the minority view had convinced more people in the relevant bodies, then Unicode would probably have stopped around that time, for some people thought that only what is relevant in 20th-c. and future business and mass-market applications should ever be considered for inclusion in the Unicode repertoire. Fortunately, that did not happen.
Instead, a plethora of writing systems has been registered by the Unicode consortium during the past decades; among them are some forms of cuneiform writing, Ancient Egyptian hieroglyphs, and, of late, the Small Seal, taking specifically the forms recorded by the Shuowenjiezi as sources for the encoding and their, if you will, graphical prototypes, or, for the taxonomically inclined, the holotypes of those glyphs.
Now one can argue that the Seal script is nothing but an early form of the Sinographs (or CJK Ideographs, whatever) but I won't go into that and just assume that they are, bona fide, a precursor to modern writing that is distinct enough to earn the title of Writing System, rather than a stylistic variant. FWIW this is very much the same discussion that we face with the language / dialect dichotomy, and the same dilemma led the creators of Unicode to establish a unified Greek and Coptic block which later was split into Greek and Coptic as it was felt that the two are not mere variants.
In the same vein, one could unify Latin A with Cyrillic A and Greek A with the stipulation that their lowercase forms are a and α, depending on language, but I think we're lucky this didn't happen; also, scripts like Arabic have many variant forms for the different shapes that letters assume depending on their written context, and separate codepoints were added for these so-called presentation forms because they do appear outside of their regular context in specific settings, such as alphabet tables in teaching materials. We can see that overall, Unicode has become more inclusive, more diverse, and less abstract, often erring on the side of adding codepoints where, from a maximalist reductive viewpoint, existing codepoints could have been pressed into service (not unlike the Greek β which in 1970s German-language user instructions printed in Japan was routinely used as a stand-in for German ß, itself BTW the result of discussion in Germany's late-1800s printing houses: is it a separate letter or just a ſ+s ligature?).
It is an inevitable fact that once all the most widely used codepoints had been defined, all remaining codepoints had to share their attention among smaller and smaller audiences; this has certainly been true for Egyptian hieroglyphs. Because of this and because of the increasingly niche expertise required to formulate the necessary standards documents, the Unicode consortium is undeniably facing diminishing returns for increased efforts. However, I would argue the efforts are very much worth it. For one thing, it would be very hard to argue that we don't need Unicode at all, or that all of Unicode after version 2 is fluff, or that the pre-Unicode world of computing worked very well without over 100,000 CJK encoded characters and without any standard to encode hieroglyphs, or that like in the 80s and 90s every region and every specialist field should continue to use their own in-house, proprietary and regionally limited, mutually rather incompatible encodings. None of this is true.
What is true is that once you assign a codepoint in a widely accepted standard like Unicode, you give the world a hook on which to hang their scientific papers, their specialist literature, their communications, and their popular science hard cover publications. You enable data people to fill databases with registers of archaeological finds, complete with epigraphic data. In 2009, 1,071 Egyptian hieroglyphs were added to Unicode v5.2, representing an increase of pretty much exactly 1% of Unicode's glyph repertoire at the time. Four years later the so-called Diary of Merer was unearthed in a remote part of Egypt. Suddenly a papyrus written 4,600 years ago that chronicles a small part of the works necessary to build the Great Pyramid of Giza could be—as far as it could be deciphered—entered into 21st-century web pages, databases, and scholarly papers; not only as embedded pictures, but as text that is, technically speaking, no different from a Latin letter A or a Greek letter Alpha. This is truly revolutionary.
For those who balk at the burden of having to put up with Egyptian hieroglyphs or Seal Script being encoded in Unicode, I can confidently tell you: It is very much a zero-cost abstraction, which for software people means: You only pay (in terms of processing time and storage) for what you use.
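The "pay only for what you use" claim can be made concrete at the storage level: in UTF-8, a text spends bytes only on the characters it actually contains, no matter how many blocks Unicode defines. A minimal sketch (the sample code points are mine, chosen to span the byte-length tiers):

```python
# UTF-8 byte cost per code point: blocks a document never uses cost it nothing.
samples = [
    0x41,     # 'A', Basic Latin: 1 byte
    0xE9,     # 'é', Latin-1 Supplement: 2 bytes
    0x6F22,   # '漢', BMP CJK: 3 bytes
    0x3D000,  # first code point of the accepted Seal range (Plane 3): 4 bytes
]
for cp in samples:
    print(f"U+{cp:05X} -> {len(chr(cp).encode('utf-8'))} byte(s)")
```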
Personally, in December I finished proofreading the transcription of an early Ming text for which there is a version on ctext.org and some modern normalized versions; almost all of the variants of the original print edition could be faithfully represented in Unicode, with the remaining few odd characters either missing or not yet available in the metadata I had at my fingertips at the time. The problem with normalizing texts like this is of course that not only every translation, but also every transcription is an interpretation, meaning that replacing all the "weird variants" that "nobody uses" and that "unnecessarily burden the standard" with their modern-day equivalents is an interpretation that strays further from the original than necessary, and maybe neglects variants that indeed were used for distinct purposes by some authors in the past. A good number of early German manuscripts have that problem when they survive only in some well-meaning normalizing 19th-c. edition but not the original: all the distinctions that specific letter forms or specific but rare letter combinations could have implied are forever lost for these documents. It is for reasons like this that there is an ongoing Medieval Unicode Font Initiative (MUFI, https://folk.uib.no/hnooh/mufi/) that aims to add codepoints to Unicode that are relevant for the faithful storage of historical documents.
And let's not fall into the trap of thinking an encoding is useless because in this inscription and that inscription (using Seal script, using hieroglyphs, using medieval Textura) the characters look different anyway—it's not true, or rather, that's nothing new, like not at all new, and happens all over the place with scripts modern and ancient. It's inevitable. First of all, the Unicode standard states quite clearly in the accompanying materials that the depictions of individual glyphs are only to be understood as suggestions needed for proper identification, not as literal molds to cast your fonts from (Chinese standards seem to get received that way, which is why there are so many teaching materials from the PRC that insist on using the one-storied small letter a instead of the more legible and more common two-storied form: it happened to be used in the standard's publication). Yes, using the Shuowenjiezi as the source for a script and/or a font faces the problem that the forms we find in that book, and have reason to believe are close to the author's intent, do look distinct from like all the other sources for Seal characters (see http://tonan.seesaa.net/article/519713878.html and related pages by 大熊肇). But it is also a culturally exceedingly eminent source, one that has been continuously quoted from for 2,000 years. And to decry the effort as useless because hey, my characters look different misses the point entirely, as it is always the specific font that determines how a given codepoint is rendered: assigning a codepoint just means we agree on the codepoint's semantics and the appearance of its holotype, not what a given font will make out of it.
AntC said,
January 15, 2026 @ 8:30 pm
Hangul syllables, which derive their basic shape from sinographs, …
The derivation (if any) of Hangul seems to be contentious — especially in Korea. The Indo-Tibetan hypothesis (which notes King Sejong was at least sympathetic to Buddhism) was promulgated in a movie 'The King's Letters'. This might be the most hated film in Korea, says Julesy in her latest YT. [And a few days before, Julesy reviewed it being kept alive as "Women's Script" — I've been waiting for Prof Mair to post them here.]
Chris Button said,
January 15, 2026 @ 9:21 pm
I'm still waiting for the right side of 暵.
~flow said,
January 16, 2026 @ 12:34 am
@Chris Button how about this one https://zi.tools/zi/%F0%A6%B0%A9?secondary=search
U+26C29, in the CJK Unified Ideographs Extension B block
In the variants shown on the linked page it does look like the right one or a close match at least.
In addition to what I wrote above, let me say that I believe that any and all clear-cut components of already encoded characters should get their own codepoints, too, if only so dictionary authors can write their equivalents of Xu Shen's 从A从B derivations. Clearly the frequent and important character 漢 is composed of two parts; the Shuowen says the right-hand side is from 難 shortened, and 難 in turn has 堇 as its phonophore. From what I could gather on zi.tools, 𦰩 (U+26C29) fits that bill.
David Marjanović said,
January 16, 2026 @ 9:37 am
Or a ſ+ʒ ligature for that matter…
Yes, but fitting it into squares is undisputedly from Hanja.
Jonathan Smith said,
January 16, 2026 @ 12:21 pm
Re: ~flow’s comments: the problem isn’t literal storage space or the fact that CJK comprises 2/3 and growing or whatever of Unicode (though maybe these are flags.) Neither are man-hours an issue — if you lika the jus, by all means squeeze. The problem (not an objection or call to action really) is conceptual.
The "digital" nature of writing systems precedes Unicode-type encoding: *conceptually*, writing systems are closed and manageably delimited "sets" of discrete, abstract "characters" which have varying "glyphic" incarnations and which exist in relation to real languages and textual corpora. But the CJK component of Unicode expands as if none of this matters — we are literally collecting squiggles.
Take ~flow’s examples: in the case of the early Ming text, previous standards are suggested to be insufficient for faithful representation of what might well be "variants […] used for distinct purposes by some authors in the past" — that is, what on some interpretations are "glyphic variants" are here better encoded as separate "holotypes." Differently, in the case of Seal Script inscriptions, the new standard is suggested to be sufficient for representation of variation of exactly the same (in reality of a more extreme) kind — that is, what might be separate "holotypes" are now to be seen as mere "variants."
So who can adjudicate? Or more to the point, why is adjudication impossible?
Why, because no one does or can have any idea what a "holotype" vs. a "variant" actually is, because the other end of the would-be symbol — the language end — is here unknown, there mutable (despite the nonsense plugged into "kDefinition"), and because corpora are not meaningfully bounded. Taken to its logical extreme (which it is), hungry digitization results — with respect to "characters" — in no meaningful reduction of analog input.
So I no lika the jus: I can’t think of a single situation where a Unicode representation of an early inscriptional character or text would be preferable to an image (now also generally digital in its own — vastly superior — way of course.)
Philip Taylor said,
January 16, 2026 @ 12:33 pm
All makes perfect sense, Jonathan, but when you reach your conclusion ("I can’t think of a single situation where a Unicode representation of an early inscriptional character or text would be preferable to an image"), is not one possible answer "because Unicode provides an unambiguous way to refer to that character (or to include it in a text, if one has the appropriate font)"? What do you see as the equivalent mechanism if all "early inscriptional character[s]" were stored as images ?
Jonathan Smith said,
January 16, 2026 @ 1:03 pm
@Philip Taylor
Haha I'm afraid my "conclusion" is too extreme in more than one way — a more careful phrasing might be that Unicode-type representations are useful *only to the extent* one knows something/a lot about the character-set / language at issue. So e.g. I can search for or refer to trivial (?) stuff ("王") in a modern CJK representation of say an Oracle Bones text, but can't meaningfully search for or refer to less secure stuff (and often even "secure" stuff is less secure than I imagine.) I had better resort to images. Or to take ~flow's early Ming text, to the extent they modify the encoding to respect "variants" which might be important, they also undermine my ability to search it unless I am deeply "in the know" — Ctext is now full of such issues.
Chris Button said,
January 17, 2026 @ 7:20 am
@ ~flow
Thanks for trying. But the top component in 𦰩 (U+26C29), as I think you note above when you say "close match", is still different from the one in 暵.
Chris Button said,
January 17, 2026 @ 7:32 am
I recall providing some materials on the two Pau Cin Hau scripts to a person wanting to include them in Unicode.
He managed to get the alphabetic script in.
But, despite a proposal, I don't think the "logographic" script has made it in yet.
Philip Taylor said,
January 17, 2026 @ 7:59 am
I totally mis-parsed Wikipedia's introduction to the Pau Cin Hau alphabetic script, Chris. It reads :
which led me to believe that Pau Cin Hau was the creator of the Unicode block, whereas he was (of course) actually the founder of the Laipian religion. Now the prose does indeed say this, but I think it would be a lot clearer to the naïve reader (such as myself) if the "which was" were omitted, and thus if the prose were to read :
What do you think ? And yes, I know I could just edit the page, but I would very much welcome an expert opinion before even considering doing so …
~flow said,
January 17, 2026 @ 8:12 am
I disagree with the assessment that Unicode is being reduced to mere squiggles when we strive to add the (near) totality of what has come down to us in the form of unearthed artifacts and documents from centuries long past. I also do not agree with the sentiment that using true-to-the-source variant characters undermines the ability to search (and find!, importantly) in such documents. It is in both cases quite the opposite.
Especially when you look at the oracle bone inscriptions, I think the conclusion is that many, but by far not all, characters (and orthographic means) found on these finds have direct descendants in modern Chinese writing. So, apart from there being many instances where the experts have not agreed on an interpretation, while you can partially transcribe sections of many OBIs using modern CJK characters (and more meaningfully so than into any other modern writing system), there is always a remainder of signs where assigning a modern equivalent would be misleading. I think we can agree that just as Etruscan and Eastern Classical Greek are ancestral to, but separate writing systems from, the Roman script, so the OBI script is distinct from modern CJK and deserves its own encoding. I say that without touching either the question of whether such an encoding should become part of Unicode (I think it should, but that's not my point), or whether that tells us anything about how to handle Seal script (I guess it doesn't).
The same considerations that apply to OBI also apply to a book printed a few centuries ago. In the text I mentioned above, the replacements I made include:
于▶於, 制▶製, 為▶爲, 弦▶絃, 葉▶叶, 眾▶衆, 款▶窽, 橐答▶槖籥, 為▶謂, 準▶凖, 真▶眞, 參▶㕘, 逆▶𨒫, 淺▶欲, 松▶鬆
Some of these are down to simple errors on the part of the person who did the original transcription that I worked from. In other cases, it is just another way of writing what is undoubtedly the same word (or morpheme, as the case may be). Yet one cannot discount the consideration that in replacements like 制▶製, 弦▶絃, 葉▶叶, 于▶於, 松▶鬆 there is in fact some intentional choice on the part of the author—or maybe they were only following the orthographic habits of their time and place, which would be interesting in its own right.
As for the difficulty of searching a text with encoded variants: You are not wrong but as someone who's advocating the use of images rather than encoding this is a, shall we say, remarkable hill to die on. First, no encoding of any text for the purposes of storing it on a hard drive in a retrievable way has ever prevented anyone from adding pictures to the data collection. Never. It's a complete strawman to say encoding is futile, it's all squiggles, we need pictures, fire everybody who's doing the encoding work, we do photos and line art. We can and should do all of that when documenting, discussing, preserving artifacts, ancient and recent.
Yes, searching a non-normalized text is more difficult than searching a normalized text, and everybody knows that. That's why in some countries some parts of names (e.g. Mc, Mac, …) get normalized so as to better find them in the phone book. Data normalization is a well-known everyday technique that is applied, like, everywhere. Many search engines can do it with Simplified and Traditional Chinese, and some can find text written in CJK from Pinyin-only queries as well. It's not like this is a problem with no known, practical solutions.
As a concluding remark, let me add that you can always normalize a given text in the sense that you can remove distinctions deemed irrelevant for search; for example, you reduce all McKennys and MacKennies to just 'mackenny' (normalizing vowel use, letter case, and so on), or whatever is closest to the users' expectations. What you cannot do, on the other hand, is add lost detail to a normalized text in the absence of whatever was your data source. You can always normalize a version of a text by replacing all occurrences of 製 with 制 to make it more accessible (more searchable). But if you regard all occurrences of 製 as squiggles, maybe worth a picture but not an encoding, you both fail to do the work of identifying the characters correctly (which, as you rightly say, we conceptualize as distinct entities with their own idiosyncratic realizations) and also keep people from searching your data collection for that variant orthography other than by browsing through photo albums. I think we can do better.
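The kind of query-time folding ~flow describes can be sketched in a few lines. The variant pairs are taken from the list earlier in this comment; which member of each pair counts as canonical is an arbitrary choice here, purely for illustration:

```python
# Fold variant characters to one canonical form at query time, so a
# search for either member of a pair finds both spellings.
VARIANT_MAP = str.maketrans({
    "製": "制", "爲": "為", "絃": "弦", "衆": "眾", "眞": "真",
})

def normalize(text: str) -> str:
    return text.translate(VARIANT_MAP)

def contains(haystack: str, needle: str) -> bool:
    # Compare both sides in normalized form.
    return normalize(needle) in normalize(haystack)

print(contains("以絃爲本", "弦為"))  # True: the variants fold together
```

The original, non-normalized text stays untouched on disk; only the comparison is done through the folding map, which is exactly why normalization for search loses nothing.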
Tom Bishop said,
January 17, 2026 @ 10:56 am
People have made some valid points here.
I don’t think anybody could reasonably be expected to say exactly where to draw the line about what characters should be encoded. There’s no simple formula and there’s no single authority. A cost-benefit analysis would be very difficult and subjective. One factor is, who’s willing to do the many kinds of hard work needed to get a set of characters encoded.
I heard that some people proposed encoding the characters in “Book from the Sky” by Xu Bing, but I don't know if they were seriously wanting to use them, or were just proposing it to make a point. If they want it badly enough, and are numerous/vocal/persistent enough, maybe they could succeed, who knows! It sounds fun, but life is short. People have worked pretty hard to get Klingon script encoded, but so far they are still using private-use code points.
There are working groups who study proposals they receive, and then decide what, if anything, to do with them, based on various criteria. For example, they probably won’t agree with encoding a sinograph that varies from an already-encoded one only in the precise thickness or curvature of a stroke, if it’s only a stylistic variant without any clear semantic distinction.
Chas Belov said,
January 18, 2026 @ 12:36 am
Interesting to read this discussion as someone who virtually never has use for rarely used scripts or obscure variations, but who has spent hours poring through Unicode code charts and is deeply disappointed when my Mac's symbol viewer displays tofu in place of a Unicode glyph.
@Chris Button: ¿What is the concern regarding 暵 (U+66B5)?
~flow said,
January 18, 2026 @ 6:03 am
@Chas Belov You should definitely have a look at BabelMap (https://www.babelstone.co.uk/Software/BabelMap.html — the current screenshot shows emoji, but it comprises all of Unicode, currently at v16, with v17 supported by a beta version). It's a Windows executable that I run with Wine on Linux. One great feature is its configurable font display: you can select a separate font for each Unicode block from a list that also tells you how many codepoints each font provides. The site also advertises BabelStone Han (https://www.babelstone.co.uk/Fonts/Han.html), a font with over 60,000 CJK characters which should provide a good baseline for configuring your system default fonts. Speaking of which, it's a total mess under Linux, with a completely botched, misguided piece of system software called fontconfig which is hardly documented at all and is apparently only there so people can pretend "you can configure your own default fonts", which you, meaningfully and without spending a year solely on this issue, can't. Once in 50 years I have managed to get one single computer (a Windows machine at the time) to have a satisfactory system-level font configuration with all of CJK covered; currently I try to make do with having the terminal, my text editor, and web pages that I have control over display correct fonts. Speaking of which, not only can you produce HTML pages that display text using your choice of fonts using CSS Unicode ranges, you can also inject those CSS rules into any web page in your browser using an extension (I currently use "Magic CSS editor"). Other than that, I have no idea how things are run on MacOS, which I ditched for good some years ago (mainly because it's "approximately Linux but not quite Linux", with a parade of annoyingly outdated basic system commands), but ideally dropping a font like BabelStone Han into your local font folder should do the trick, like almost, mostly.
~flow said,
January 18, 2026 @ 6:23 am
As for the Xu Bing Book from the Sky characters, they stand little chance of ever being included in Unicode, because they are not used as part of a writing system—they are more like fantasy words invented for a poem. I'd add that emoji are IMHO already on the fringe of what I'd allow in Unicode, but maybe it's a good thing for popularizing Unicode. On the upside, many emoji are composed of multiple codepoints beyond U+FFFF (so outside the Basic Multilingual Plane), which makes them great for testing the correctness of text processing tools, if nothing else.
Come to think of it, the way emoji are handled by modern operating systems is very much how I think CJK characters should be handled, by which I mean that we should have ways to display Ideographic Description Sequences as single characters, so you could encode ⿰土⿱田介 to get 堺 displayed dynamically. IDS formulas are not unique (meaning there are often multiple ways to encode the same character, e.g. 堺 could also be encoded as ⿰土界), but they can be normalized (circling back to the subject of an earlier post), enabling better search. Dissemination of such a device would IMO also lessen the pressure on the Unicode Committee to encode more rare characters and more variants. FWIW Hangeul and—guess what—Egyptian hieroglyphs already work like that (even if I have yet to see a working implementation for hieroglyphs outside of one single specialized text editor).
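The normalization step mentioned here can be sketched by expanding every encoded character into its description sequence and comparing the fully expanded strings. The two-entry decomposition table below is a toy sample covering this one character (real IDS databases, such as those maintained by the CHISE project, run to tens of thousands of entries):

```python
# Toy IDS normalizer: recursively expand encoded characters into their
# Ideographic Description Sequences, so ⿰土界 and ⿰土⿱田介 compare equal.
IDS = {
    "堺": "⿰土界",   # sample decompositions; a real table is far larger
    "界": "⿱田介",
}

def expand(s: str) -> str:
    # Replace each character by its decomposition, recursively, until
    # only atomic components and IDS operators (⿰, ⿱, …) remain.
    return "".join(expand(IDS[ch]) if ch in IDS else ch for ch in s)

print(expand("堺"))                             # ⿰土⿱田介
print(expand("⿰土界") == expand("⿰土⿱田介"))  # True
```

Matching on the fully expanded form is what makes the non-uniqueness of IDS formulas harmless for search.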
Chris Button said,
January 18, 2026 @ 7:45 am
@ Chas Belov
Not just 暵. I was using that as an example of the right-side phonetic component in all such characters, which is not available standalone in Unicode.
Jonathan Smith said,
January 19, 2026 @ 12:43 pm
Well… far from "fire everybody", a paraphrase of my thoughts above would rather be "go on with your bad selves." And I don't know about what "everybody knows"… but my sense is that no, "everybody" does not have a good sense of the Chinese-specific dimension of this problem. So, final thoughts FWIW —
In Chinese-type writing, unlike e.g. alphabet writing with its stark dual patterning, it is the many thousands of ≈word-mapping elements ("characters") which are primary and within which glyphic variation is irrelevant: so if confronting, e.g., (wildly variable!) handwriting in contemporary Chinese, we need only know/code abstractly that "this is a '參', a '為', etc."
But as we get further and further away from languages/mappings we know (note my personal interest is largely early inscriptions etc.), we increasingly lack grounds for establishing what kind of variation is or isn't (literally) meaningful.
E.g., ~flow's "㕘" is one of 30+ so-called "Yiti" versions of "參" collected in resources like MOE's 異體字字典. But why only 30 something? There are not discrete types of "參", even on attempted descriptions in terms of component parts and their relative sizes/positions/etc.: characters are conceptual gestalts; variation is unconstrained. To large extent, it's handwriting.
Maybe compare the ampersand/"et" ligature: I see the Unicode block "Ornamental Dingbats" has the following (which might not display): U+1F670 SCRIPT LIGATURE ET ORNAMENT; U+1F671 HEAVY SCRIPT LIGATURE ET ORNAMENT; U+1F672 LIGATURE OPEN ET ORNAMENT; U+1F673 HEAVY LIGATURE OPEN ET ORNAMENT; U+1F674 HEAVY AMPERSAND ORNAMENT; U+1F675 SWASH AMPERSAND ORNAMENT
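(For what it's worth, this list can be checked against Python's bundled Unicode character database; these ornaments date to Unicode 7.0, so any recent interpreter should resolve the names even if no installed font displays the glyphs.)

```python
import unicodedata

# Print the formal character names of the "et"/ampersand ornaments
# in the Ornamental Dingbats block.
for cp in range(0x1F670, 0x1F676):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```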
So a lot of CJK expansion amounts to going through historical handwritten manuscripts in early Englishy (?) languages and expanding sets like this (but larger) to account for "variation." But there are not discrete types of "&"; you can find as many as you like.
Then multiply this by every Chinese character (an open set of course.) And also — we note grimly that people engaged in these projects aren't in general really concerned with whether some variant was or wasn't "used for distinct purposes by some authors in the past" (if that were possible to figure out), but rather think it is cool to be able to display the correct-looking decorative "et".
Chas Belov said,
January 20, 2026 @ 12:58 am
@~flow: Thank you for the ref. I'm actually on a Mac and gave up on Wine years ago. Guess I'll just have to keep perusing the code charts.
@Chris Button: Ah, I see. Thank you for the clarification.
~flow said,
January 20, 2026 @ 10:44 am
@Chas Belov Yeah, I guess Wine is more straightforward on Linuxes than on Macs; other than that, I used to use virtual machines in the past (VirtualBox, Parallels Desktop); those also have a mode where you can run an app without a full desktop (it would just occupy whatever rectangular area the application window needs).