Language Log

The ultimate Chinese character input method

December 27, 2015 @ 11:38 pm · Filed by Victor Mair under Information technology, Writing systems

Never mind that it doesn't work, this is the supreme pipe dream for inputting Chinese characters on electronic communication and information processing devices. Of the many thousands of Chinese character inputting systems (see also here and here) that have been devised, some work fairly well and some barely function at all, but this one has to take the cake for being the most ridiculous of all. It is all the more preposterous that initially it was intended for smartwatches with their tiny glass surfaces.

The name of the system gives it away, that is, yībǐyīzì 一筆一字 ("one stroke one character").

Since the average Chinese character has twelve strokes, and many characters have twenty or more strokes, it would be utterly impossible to input the thousands of different characters with just one stroke.

You can find an index to the characters by total stroke count here.

Here's a bilingual introduction to "Ibeezi". Note that it doesn't really tell you how the system works, and it doesn't give any examples.

The official site of "Ibeezi" contains a video purporting to show you how to use the application.

Here's a comment by a gullible graduate student from China who almost got snookered:

I tried the application myself, but haven't been able to master it. Some words can be really hard to find.

It seems it's a combination of pinyin input and radical input, and I haven't found a way to input the characters by one stroke.

'Nuff said.

12 Comments

Keith said,

December 28, 2015 @ 3:10 am

According to the article, with existing input methods

"you have to memorise either the traditional stroke orders, or Pinyin, the phonetic forms, to be able to type a Chinese character."

Is this true?

Vilinthril said,

December 28, 2015 @ 3:25 am

I've been taught a stroke-based lookup mechanism, which classifies the characters by basic shape and then orders them by stroke count. Seems to work quite well.

leoboiko said,

December 28, 2015 @ 5:29 am

> yībǐyīzì 一筆一字 ("one stroke one character").

Ohh, I think I've seen that method! It's called 草書, right? #rimshot

flow said,

December 28, 2015 @ 8:53 am

Not quite sure where the number 12 as the average of strokes in Chinese characters comes from, but I'd say this number is a bit too high. Chih-Hao Tsai (http://technology.chtsai.org/charfreq/) gives the following estimates:

Frequency-weighted average number of strokes:

For the most frequently used 2,965 characters: 9.10;
For the most frequently used 1,253 characters: 8.91;
For the most frequently used 733 characters: 8.65.

This is for 'traditional' character forms as encoded in the Big5 scheme; PRC simplified characters should have even fewer.

To appreciate these numbers, one must know what counts as 'one stroke'; in this case, the most common way of counting was used, which has been one of the pillars of the "radical plus strokecount" system that became widespread due to the Kangxi dictionary. Common as the system is, one must say that it does count complex strokes (bending strokes) as single units, so e.g. 乙 is, surprisingly, a single stroke as one does not lift the pen intermittently. Talking about ease of writing and visual complexity, this should probably be counted as three units; hence, the above figures would have to be somewhat bigger.

Randy Alexander said,

December 28, 2015 @ 10:25 am

This shows how the Chinese input is done:

http://ibeezi.com/quick-start-guide/

leoboiko said,

December 28, 2015 @ 10:40 am

@Randy Alexander: Judging from the manual, the input method seems to be a fairly straightforward application of the "ring menu" concept (aka "pie menu", "radial menu" etc.), as in the 1993 videogame Secret of Mana, using both pīnyīn and component information to narrow down the hànzì. If, instead of all that fluff, they had just said "a radial-menu approach to Chinese input", I'd have grokked it instantly.

Victor Mair said,

December 29, 2015 @ 12:44 am

Many years ago, I looked up statistics on the average number of strokes per Chinese character, and the figure I derived from many sources hovered around 12.

I'm travelling right now, so I can't check in my books at home or in the office to tell you exactly which sources I used that gave those figures. I recall, however, that they were computed on the basis of a total of around 8,000 to 12,000 characters.

I think that computing the average number of strokes per character on the basis of a very small subset of the characters, e.g., between 700 and 3,000 characters (figures which are often cited in these discussions) is quite misleading for a variety of reasons. One reason why such low figures are misleading is that full literacy requires mastery of, or at least familiarity with, more than those low numbers of characters. Another reason is that the most frequent 700 to 3,000 characters in general have fewer strokes than less frequent characters (those above 3,000 in frequency lists), while very low frequency characters tend to have many strokes. See:

"Complexity of Chinese Characters", by Tatsuo TOGAWA, Kimio OTSUKA, Shizuo HIKI, and Hiroko KITAOKA.

http://www.scipress.org/journals/forma/pdf/1504/15040409.pdf

To gain a fair and reasonable appreciation of the average number of strokes in Chinese characters, one should work with a body of at least 8,500 characters, which is roughly the amount in Xīnhuá zìdiǎn 新华字典 (trad. 新華字典) (New China character dictionary), which is the standard character dictionary for students in China (it has sold over 400,000,000 copies). See:

"A movie about a dictionary" (7/9/15)

http://languagelog.ldc.upenn.edu/nll/?p=19906

If you go above 20,000 to 40,000 characters, the average number of strokes per character becomes progressively larger than 12. I also recall that, even if you add in all the simplified characters, the average number of strokes per character will not be much below 11.5 if you have a base of 10,000 or so characters.

Because I'm unable at the present time to consult my library at home, Stephan Stiller kindly compiled the following data, which are based on a total of 9,933 characters and are highly instructive in the light of our present discussion:

======

With frequencies from Jun Da's "modern Chinese" frequency data

http://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=MO

(amended by merging compatibility character U+F9E7 into U+88CF (裏))
and strokes from Unihan's kTotalStrokes field (data downloaded in 2009), I get the following frequency-weighted average stroke counts, counting only characters from 1 to n:

n=1000: 6.98
n=2000: 7.20
n=3000: 7.28
n=4000: 7.30
n=5000: 7.31
n=6000: 7.31
n=7000: 7.32
n=8000: 7.32
n=9000: 7.32
n=9932: 7.32

Jun Da's list includes simplified characters in the beginning (and oddly some unsimplified ones towards the end), but those low-frequency characters don't seem to make a difference in the calculation. It doesn't make sense to go much below 3000 in the calculation. So, for simplified characters, 7.3 is a good number to quote…. For traditional characters, I would expect the count of 9.1 to not change much if we go higher, because the analogous number 7.3 doesn't change much for simplified characters beyond n=3000 or n=4000.

The same calculation that I just did (for simplified characters) yields the following unweighted average stroke counts:

n=1000: 8.05
n=2000: 8.94
n=3000: 9.60
n=4000: 10.11
n=5000: 10.47
n=6000: 10.77
n=7000: 11.15
n=8000: 11.62
n=9000: 12.08
n=9932: 12.44

Counts for traditional characters will be a bit higher.

So this is probably where the number 12 comes from. The numbers are ever-increasing because they're not frequency-weighted. I can't tell whether your source was based on simplified or traditional characters, since, given that the numbers don't converge on a limit, things really depend on the cutoff n that was used.

======

The latter figures are clearly what I was thinking of. If we want to get an idea of the average number of strokes per character, I do not think that we should begin by weighting our calculations toward a relatively small number of high frequency characters.
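
The difference between the two kinds of average in the data above can be sketched in a few lines of Python. This is only an illustration, not the script Stephan Stiller used: the function name `cumulative_averages` and the four-row toy table are assumptions made up for the example, whereas the real calculation would read frequencies from Jun Da's list and stroke counts from Unihan's kTotalStrokes field.

```python
def cumulative_averages(rows):
    """Compute running stroke-count averages over a frequency-ranked list.

    rows: list of (frequency, stroke_count) pairs, most frequent first.
    Returns a list of (n, weighted_average, unweighted_average) for n = 1..len(rows).
    """
    results = []
    freq_sum = 0        # total frequency of the first n characters
    weighted_sum = 0    # sum of frequency * strokes over the first n characters
    stroke_sum = 0      # plain sum of strokes over the first n characters
    for n, (freq, strokes) in enumerate(rows, start=1):
        freq_sum += freq
        weighted_sum += freq * strokes
        stroke_sum += strokes
        results.append((n, weighted_sum / freq_sum, stroke_sum / n))
    return results


# Toy data (hypothetical): frequent characters are simple, rare ones complex.
toy_rows = [(100, 1), (50, 8), (10, 9), (1, 24)]
toy_results = cumulative_averages(toy_rows)

for n, weighted, unweighted in toy_results:
    print(f"n={n}: weighted {weighted:.2f}, unweighted {unweighted:.2f}")
```

With the toy table, adding the rare 24-stroke character barely moves the weighted average but pulls the unweighted one up sharply, which is the same pattern as in the two lists above.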

APOLLO WU said,

December 29, 2015 @ 10:16 pm

I wonder if yibiyizi utilizes the Pinyin initial input method. Sogou has such an approach: for example, input dht and you get 大会堂 and other options; sqsj can be readily converted to 山穷水尽; zg provides the options 中国, 这个, 最高, 战国, and so on. Each option can be selected by keying in the numeral beside it.
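
The initial-letter lookup described here can be sketched as follows. This is a minimal illustration in Python and not how any real input method is implemented: the tiny `LEXICON`, the function names, and the ordering of candidates are assumptions; only the example words and key sequences come from the comment above.

```python
from collections import defaultdict

# Tiny hypothetical lexicon: (word, pinyin syllables without tones).
LEXICON = [
    ("大会堂", ["da", "hui", "tang"]),
    ("山穷水尽", ["shan", "qiong", "shui", "jin"]),
    ("中国", ["zhong", "guo"]),
    ("这个", ["zhe", "ge"]),
    ("最高", ["zui", "gao"]),
    ("战国", ["zhan", "guo"]),
]


def abbreviation(syllables):
    """Key sequence for a word: the first letter of each syllable."""
    return "".join(syllable[0] for syllable in syllables)


def build_index(lexicon):
    """Map each key sequence to the words that share it, in lexicon order."""
    index = defaultdict(list)
    for word, syllables in lexicon:
        index[abbreviation(syllables)].append(word)
    return index


INDEX = build_index(LEXICON)


def candidates(keys):
    """Return the numbered options a user would see for a key sequence."""
    return list(INDEX.get(keys, []))


def pick(keys, number):
    """Select an option by the numeral shown beside it (1-based)."""
    return candidates(keys)[number - 1]


print(candidates("zg"))
print(pick("zg", 2))
```

A real system would rank the candidates by frequency and context rather than by their order in a hand-written list.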

flow said,

December 30, 2015 @ 5:09 pm

I got interested whether I could pull out some useful strokecount and frequency data from my database and have published the preliminary results as https://github.com/loveencounterflow/jizura-cjk-strokecounts. TL;DR: more frequent characters tend to have fewer strokes, and, indeed, the average strokecount of around 15,000 characters used in the PRC, Japan, and/or Taiwan is 12.7 according to this.

James Bradbury said,

December 31, 2015 @ 2:32 am

I’ve long had an idea similar to this one that would actually allow for the vaunted 一筆一字 (unlike this, which requires four taps per character). The keyboard display would have one button per pinyin initial, with more common initials located more centrally and with bigger buttons. To input a syllable, the user would swipe from the initial in a direction corresponding to the medial glide (i, u, ü, or none), and finish the "stroke" with a shape corresponding to the final. I even have sketched out somewhere a whole arrangement of stroke patterns that I think might work. This would all be in addition to a strong predictive language model, so that the user can input the most likely next character with a given initial simply by tapping the initial and select less-likely characters with a menu similar to existing pinyin input schemes.

ibeezi webmaster said,

January 2, 2016 @ 6:42 am

Thanks to all for your comments on the iBeezi method. Here is how it actually works.
The patented iBeezi algorithm consists of at most four steps, combining phonetic and semantic selection.
The first two steps are phonetic: the user selects the pinyin syllable by entering its initial and then its final. Steps three and four are semantic: in step three the user selects the head of the partition of Chinese radicals to which the target Chinese character (CC) belongs, and step four presents a reduced list of candidate characters to choose from. The unique iBeezi algorithm is what makes it possible to reach any CC in at most four steps.
All sub-steps are arranged so that only six selections are possible, each selection being a button or its equivalent direction. Hence, for each CC the iBeezi method provides a unique, guided path to the target, which will always be found at the same geometrical position (allowing the user to build up muscle memory). Traversing this path in one continuous movement, with only changes of direction, and lifting the finger when the target CC is found is what allows us to claim "one stroke, one character".
This unique keyboard concept is available free for testing on iPhone, Android phones, and Android Wear. It is currently available in button mode only; stroke mode will be released soon. Do not hesitate to test it and send us your comments.
The iBeezi team.
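
The idea of a fixed path of direction choices can be sketched roughly as follows. This is not the iBeezi algorithm, which is not described in detail here; the sketch only assumes that each selection is one of six equal sectors around the starting point and that the sequence of selections serves as a lookup key. The function names, the sector numbering, and the placeholder table are all hypothetical.

```python
import math

N_DIRECTIONS = 6  # six selections per sub-step, as stated in the comment above


def direction(dx, dy, n=N_DIRECTIONS):
    """Quantize a movement vector into one of n equal sectors.

    Sector 0 is centered on the positive x axis and sectors are numbered
    counter-clockwise (mathematical orientation; on a screen whose y axis
    points down, dy would have to be negated first).
    """
    angle = math.atan2(dy, dx) % (2 * math.pi)
    width = 2 * math.pi / n
    return int((angle + width / 2) // width) % n


def path_of(segments):
    """Turn the straight segments of one continuous movement into a path key."""
    return tuple(direction(dx, dy) for dx, dy in segments)


# Hypothetical table: each target sits at the end of exactly one path.
# The entries are placeholders, not the real iBeezi layout.
TARGETS = {
    (0, 1, 3, 0): "target A",
    (0, 1, 3, 1): "target B",
}


def lookup(segments):
    """Return the target at the end of the path, or None if nothing is there."""
    return TARGETS.get(path_of(segments))


# Right, up-right, left, right again: the same gesture always ends at the same target.
gesture = [(1.0, 0.0), (0.5, 0.87), (-1.0, 0.1), (1.0, -0.1)]
print(path_of(gesture), lookup(gesture))
```

Because the path alone determines the target, a slightly sloppy movement still lands on the same entry as long as each segment stays within its sector, which is one way to read the muscle-memory claim.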

Stephan Stiller said,

January 3, 2016 @ 10:03 pm

Now with unweighted average stroke counts for traditional characters:
Rank range     Simplified   Traditional
1 to 1000:        8.05        10.34
1 to 2000:        8.94        11.13
1 to 3000:        9.60        11.67
1 to 4000:       10.11        12.06
1 to 5000:       10.47        12.39
1 to 6000:       10.77        12.65
1 to 7000:       11.15        12.92
1 to 8000:       11.62        13.24
1 to 9000:       12.08        13.54
1 to 9932:       12.44        13.77
The middle column shows the previously calculated numbers for simplified characters. The right-hand column shows stroke counts for traditional characters. The traditional characters used in the calculation were converted from the simplified ones from the same source. That is, the table reflects real Mainland usage for simplified characters and imaginary Mainland usage for traditional characters. (I am assuming that Jun Da's frequency data used Mainland-Chinese/simplified sources; see here/here, Table 2.)

Because the original list of "simplified" characters already contained some traditional characters (at higher rank numbers), the stats for higher ranks are not as representative. Anyways, it's fair to say that there are on average about 2 more strokes for traditional-character writing.
