The ultimate Chinese character input method
Never mind that it doesn't work, this is the supreme pipe dream for inputting Chinese characters on electronic communication and information processing devices. Of the many thousands of Chinese character inputting systems (see also here and here) that have been devised, some work fairly well and some barely function at all, but this one has to take the cake for being the most ridiculous of all. It is all the more preposterous that initially it was intended for smartwatches with their tiny glass surfaces.
The name of the system gives it away, that is, yībǐyīzì 一筆一字 ("one stroke one character").
Since the average Chinese character has twelve strokes, and many characters have twenty or more strokes, it would be utterly impossible to input the thousands of different characters with just one stroke.
You can find an index to the characters by total stroke count here.
Here's a bilingual introduction to "Ibeezi". Note that it doesn't really tell you how the system works, and it doesn't give any examples.
The official site of "Ibeezi" contains a video purporting to show you how to use the application.
Here's a comment by a gullible graduate student from China who almost got snookered:
I tried the application myself, but haven't been able to master it. Some words can be really hard to find.
It seems it's a combination of pinyin input and radical input, and I haven't found a way to input the characters by one stroke.
'Nuff said.


Keith said,
December 28, 2015 @ 3:10 am
According to the article, for existing input methods either
Is this true?
Vilinthril said,
December 28, 2015 @ 3:25 am
I've been taught a stroke-based lookup mechanism, which classifies the characters by basic shape and then orders them by stroke count. Seems to work quite well.
leoboiko said,
December 28, 2015 @ 5:29 am
> yībǐyīzì 一筆一字 ("one stroke one character").
Ohh, I think I've seen that method! It's called 草書, right? #rimshot
flow said,
December 28, 2015 @ 8:53 am
Not quite sure where the number 12 as the average of strokes in Chinese characters comes from, but I'd say this number is a bit too high. Chih-Hao Tsai (http://technology.chtsai.org/charfreq/) gives the following estimates:
Frequency-weighted average number of strokes:
For the most frequently used 2,965 characters: 9.10;
For the most frequently used 1,253 characters: 8.91;
For the most frequently used 733 characters: 8.65.
This is for 'traditional' character forms as encoded in the Big5 scheme; PRC simplified characters should have even fewer.
To appreciate these numbers, one must know what counts as 'one stroke'; in this case, the most common way of counting was used, which has been one of the pillars of the "radical plus strokecount" system that became widespread due to the Kangxi dictionary. Common as the system is, one must say that it does count complex strokes (bending strokes) as single units, so e.g. 乙 is, surprisingly, a single stroke as one does not lift the pen intermittently. Talking about ease of writing and visual complexity, this should probably be counted as three units; hence, the above figures would have to be somewhat bigger.
Randy Alexander said,
December 28, 2015 @ 10:25 am
This shows how the Chinese input is done:
http://ibeezi.com/quick-start-guide/
leoboiko said,
December 28, 2015 @ 10:40 am
@Randy Alexander: Judging from the manual, the input method seems to be a fairly straightforward application of the "ring menu" concept (aka "pie menu", "radial menu" etc.), as in the 1993 videogame Secret of Mana, using both pīnyīn and component information to narrow down the hànzì. If, instead of all that fluff, they had just said "a radial-menu approach to Chinese input", I'd have grokked it instantly.
Victor Mair said,
December 29, 2015 @ 12:44 am
Many years ago, I looked up statistics on the average number of strokes per Chinese character, and the figure I derived from many sources hovered around 12.
I'm travelling right now, so I can't check in my books at home or in the office to tell you exactly which sources I used that gave those figures. I recall, however, that they were computed on the basis of a total of around 8,000 to 12,000 characters.
I think that computing the average number of strokes per character on the basis of a very small subset of the characters, e.g., between 700 and 3,000 characters (figures which are often cited in these discussions) is quite misleading for a variety of reasons. One reason why such low figures are misleading is that full literacy requires mastery of, or at least familiarity with, more than those low numbers of characters. Another reason is that the most frequent 700 to 3,000 characters in general have fewer strokes than less frequent characters (those above 3,000 in frequency lists), while very low frequency characters tend to have many strokes. See:
"Complexity of Chinese Characters", by Tatsuo TOGAWA, Kimio OTSUKA, Shizuo HIKI, and Hiroko KITAOKA.
http://www.scipress.org/journals/forma/pdf/1504/15040409.pdf
To gain a fair and reasonable appreciation of the average number of strokes in Chinese characters, one should work with a body of at least 8,500 characters, which is roughly the amount in Xīnhuá zìdiǎn 新华字典 (trad. 新華字典) (New China character dictionary), which is the standard character dictionary for students in China (it has sold over 400,000,000 copies). See:
"A movie about a dictionary" (7/9/15)
http://languagelog.ldc.upenn.edu/nll/?p=19906
If you go above 20,000 to 40,000 characters, the average number of strokes per character becomes progressively larger than 12. I also recall that, even if you add in all the simplified characters, the average number of strokes per character will not be much below 11.5 if you have a base of 10,000 or so characters.
Because I'm unable at the present time to consult my library at home, Stephan Stiller kindly compiled the following data, which are based on a total of 9,933 characters and are highly instructive in the light of our present discussion:
======
With frequencies from Jun Da's "modern Chinese" frequency data
http://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=MO
(amended by merging compatibility character U+F9E7 into U+88CF (裏))
and strokes from Unihan's kTotalStrokes field (data downloaded in 2009), I get the following frequency-weighted average stroke counts, counting only characters from 1 to n:
n=1000: 6.98
n=2000: 7.20
n=3000: 7.28
n=4000: 7.30
n=5000: 7.31
n=6000: 7.31
n=7000: 7.32
n=8000: 7.32
n=9000: 7.32
n=9932: 7.32
Jun Da's list includes simplified characters in the beginning (and oddly some unsimplified ones towards the end), but those low-frequency characters don't seem to make a difference in the calculation. It doesn't make sense to go much below 3000 in the calculation. So, for simplified characters, 7.3 is a good number to quote…. For traditional characters, I would expect the count of 9.1 to not change much if we go higher, because the analogous number 7.3 doesn't change much for simplified characters beyond n=3000 or n=4000.
The same calculation that I just did (for simplified characters) yields the following unweighted average stroke counts:
n=1000: 8.05
n=2000: 8.94
n=3000: 9.60
n=4000: 10.11
n=5000: 10.47
n=6000: 10.77
n=7000: 11.15
n=8000: 11.62
n=9000: 12.08
n=9932: 12.44
Counts for traditional characters will be a bit higher.
So this is probably where the number 12 comes from. The numbers are ever-increasing because they're not frequency-weighted. I can't tell whether your source was based on simplified or traditional characters, since, given that the numbers don't converge on a limit, things really depend on the cutoff n that was used.
======
The latter figures are clearly what I was thinking of. If we want to get an idea of the average number of strokes per character, I do not think that we should begin by weighting our calculations toward a relatively small number of high frequency characters.
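For readers who want to see why the two sets of figures above diverge so sharply, here is a minimal sketch of both calculations in Python. It is not Stephan Stiller's actual script: the function names and the five-character table are hypothetical, the frequencies are invented for illustration, and the stroke counts are the commonly cited ones rather than values read from the Unihan kTotalStrokes field.

```python
from typing import List, Tuple

# (character, corpus frequency, total strokes), sorted by descending frequency.
# Toy data: the frequencies are made up; a real run would use a frequency
# list such as Jun Da's together with Unihan's kTotalStrokes field.
TOY_DATA: List[Tuple[str, int, int]] = [
    ("的", 1000, 8),
    ("一", 800, 1),
    ("是", 600, 9),
    ("龍", 5, 16),
    ("鬱", 1, 29),
]


def unweighted_average(rows: List[Tuple[str, int, int]], n: int) -> float:
    """Mean stroke count of the n most frequent characters, each counted once."""
    top = rows[:n]
    return sum(strokes for _, _, strokes in top) / len(top)


def weighted_average(rows: List[Tuple[str, int, int]], n: int) -> float:
    """Mean stroke count of the n most frequent characters, weighted by frequency."""
    top = rows[:n]
    total_freq = sum(freq for _, freq, _ in top)
    return sum(freq * strokes for _, freq, strokes in top) / total_freq


if __name__ == "__main__":
    for n in range(1, len(TOY_DATA) + 1):
        print(f"n={n}: unweighted={unweighted_average(TOY_DATA, n):.2f}, "
              f"weighted={weighted_average(TOY_DATA, n):.2f}")
```

Even in this toy table, adding rare, stroke-heavy characters pushes the unweighted figure up steadily while barely moving the weighted one, which is the same pattern as in the two lists of numbers above.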