Handwriting recognition
The phys.org website has a new article that piqued my interest:
"96.7% recognition rate for handwritten Chinese characters using AI that mimics the human brain" (9/17/15)
It begins:
Fujitsu today announced the development of the world's first handwriting recognition technology utilizing AI technology modeled on human brain processes to surpass the human-equivalent recognition rate of 96.7% that was established at a conference. Fujitsu had previously achieved top-level accuracy in this field, as demonstrated by taking first place, with a recognition rate of 94.8%, at a handwritten Chinese character recognition contest held at the International Conference on Document Analysis and Recognition (ICDAR), a top-level conference in the document image processing field.
However, in order to further increase recognition accuracy, a new mechanism for studying the diversity of character deformations was required. Now, with a focus on a hierarchical model of expanded connections between neurons, a model based on the human brain which grasps the features of the characters, Fujitsu has developed a technology to automatically create numerous patterns of character deformation from the character's base pattern, thereby "training" this hierarchical neural model….
Ordinarily, while humans can easily recognize media such as characters, images and sounds, for computers this recognition is much more difficult, due to both the many variations in shape, brightness and so on of the object to be recognized, as well as the existence of similar objects. This has become a central problem in artificial intelligence research. Fujitsu has decades of experience in character recognition, with commercialized technologies used in such areas as Japan's finance and insurance fields for Japanese language, as well as a Chinese character recognition technology used by the Chinese government for 800 million handwritten census forms. Fujitsu started research using artificial intelligence based on deep learning for character recognition in 2010….
I have many questions:
1. How does this 96.7% recognition rate for handwritten Chinese characters compare with voice recognition?
2. Would it work equally well for the recognition of other types of handwriting than Chinese?
3. Is the current level of recognition of any practical utility?
4. This is a type of OCR, is it not? Does it differ from already established OCR paradigms and techniques in significant ways?
5. Is the use of the terms "neurons" and "neural" metaphorical or literal?
This research is undoubtedly interesting and may lead to useful applications, but I wonder how far they can go with it, and what the major obstacles confronting them are.
[h.t. Carolyn Lye]
Dave Orr said,
September 19, 2015 @ 10:28 am
When he refers to neurons, this is a reference to neural networks, a standard machine learning technique that has advanced quite a bit in recent years. Neural networks have revolutionized speech recognition in recent years, and are making headway in other tasks as well.
Handwriting recognition is good enough for a variety of applications, like automatic check processing at ATMs or address recognition for the postal service. I have no idea if any of that is good enough in Chinese.
What seems to be interesting here is that they took their training data and found a technique to greatly expand it by automatically creating variations of characters. This sort of automatic generation of training data has been used to great effect in other neural-net applications; it works because neural nets have a huge model capacity and can learn from a lot of data.
AI that mimics the human brain is really overselling this though.
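The data-expansion idea Dave Orr describes can be sketched in miniature: start from one labeled glyph bitmap and generate many deformed copies with small random affine warps. The deformation ranges and the toy bitmap below are illustrative assumptions, not Fujitsu's actual method:

```python
import numpy as np

def deform(glyph, angle=0.1, shear=0.1, rng=None):
    """Apply a small random affine deformation (rotation plus shear) to a
    2-D glyph bitmap, using nearest-neighbor sampling. A toy stand-in for
    automatic training-data expansion from a base character pattern."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = glyph.shape
    a = rng.uniform(-angle, angle)   # random rotation angle (radians)
    s = rng.uniform(-shear, shear)   # random horizontal shear
    # Affine matrix: rotation composed with shear, applied about the center.
    m = np.array([[np.cos(a), -np.sin(a) + s],
                  [np.sin(a),  np.cos(a)]])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse-map each output pixel back to a source coordinate.
    coords = np.stack([ys - cy, xs - cx]).reshape(2, -1)
    src = np.linalg.inv(m) @ coords
    sy = np.clip(np.rint(src[0] + cy), 0, h - 1).astype(int)
    sx = np.clip(np.rint(src[1] + cx), 0, w - 1).astype(int)
    return glyph[sy, sx].reshape(h, w)

# Expand one labeled example into many deformed variants.
base = np.zeros((32, 32))
base[8:24, 15:17] = 1   # a crude vertical stroke
variants = [deform(base) for _ in range(10)]
```

Each variant keeps the original label, so one hand-labeled character yields many training examples for free.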
Josh said,
September 19, 2015 @ 11:18 am
The "modeled on human brain processes" part sounds like a journalist who's never heard of a neural network before trying to summarize half an hour of Wikipedia reading. I don't know where they got "world's first handwriting recognition technology" from.
For comparison, this site claims an accuracy of "over 97 percent" by character for (I assume) English handwriting OCR.
In theory, the system from the article would only have to be fed new training data to work with other writing systems, but as Dave says the novelty here is the automated variation in the samples, which I'd guess would be far more difficult to adapt to another system because the bits of the character you're allowed to vary without it becoming another character may be different.
Douglas Bagnall said,
September 20, 2015 @ 12:13 am
When I want to read about handwriting recognition using recurrent neural networks, I look to see what Alex Graves has been up to at http://www.cs.toronto.edu/~graves/ and http://arxiv.org/find/cs/1/au:+Graves_A/0/1/0/all/0/1. It seems he has moved on to slightly different things, but the old papers are still there.
Alexis Van Gestel said,
September 23, 2015 @ 5:49 am
However interesting, this sounds like a battle for the improvement of a technology on its way to obsolescence, especially when it comes to writing Chinese characters.
I will explain this with a simple example. When using handwriting recognition technologies for encoding Chinese characters, one needs to draw all the strokes of the character, which are often numerous, no matter how fast one "draws" them.
A recent (touch or) stroke-based technology from a startup called iBeezi (see their website: http://www.ibeezi.com) makes it possible to reach almost any Chinese character with a very limited number of strokes, following an intuitive logic; meaning that 1) this technique is much faster than current handwriting recognition technologies, 2) it allows for quick muscle-memory building and thus quick mastery of the technique, and 3) it has the unparalleled advantage of being 100% accurate.
Furthermore, handwriting recognition technologies are also limited by the quality and the size of the touch sensitive screens they are used on, whereas the technology developed by iBeezi is compatible with virtually any screen size and quality, thanks to the simplicity of its interface and encoding logic.
As to the comparison of written and spoken communication technologies, my personal belief is that spoken communication is indeed very time- and effort-efficient, but – and this is an important one – not appropriate in a large number of use cases. Therefore, my belief is that written and spoken communication methods are complements to each other rather than competitors/substitutes.
Keith Trnka said,
September 28, 2015 @ 6:51 pm
Speech recognition tends to be around 5-10% word error rate for low-noise datasets, and that's probably without much variation in pronunciation. I don't know anything about this competition, so I can't say how much variation is allowed in sloppiness of characters and additional noise.
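Word error rate, as used above, is just edit distance computed over words and normalized by the reference length; a minimal sketch (the example sentences are made up):

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: Levenshtein distance between the reference and
    hypothesis word sequences, divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

print(word_error_rate("the cat sat on the mat",
                      "the cat sat in the hat"))  # 2 errors / 6 words ≈ 0.33
```

Character error rate for handwriting works the same way, just over characters instead of words, which is one reason the two figures aren't directly comparable.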
Current levels of recognition are sometimes used for mobile keyboard input instead of pinyin. I've heard it's more popular with older and/or rural folks because it's easier to remember. On Android, Swype has it and Google has a downloadable keyboard with it. Probably others too. Handwriting recognition is also seeing adoption in the automotive space (don't know why, though).
Generally, handwriting recognition gets the timing information of the strokes, so the strokes come already separated. That makes the problem much easier, because you have a reliable way to tell 5 strokes from 6, for instance. A system might also use timing to tell when a character is complete, so characters are separated too. In OCR the system only has an image, so it can be harder.
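A toy illustration of that timing advantage: with timestamped pen samples, strokes can be segmented by pen-up pauses alone. The 0.25-second threshold and the sample data below are arbitrary assumptions:

```python
def split_strokes(samples, pause=0.25):
    """Group timestamped pen samples (t, x, y) into strokes: a new stroke
    starts whenever the time gap between consecutive samples exceeds
    `pause` seconds (a hypothetical pen-lift threshold)."""
    strokes, current = [], []
    last_t = None
    for t, x, y in samples:
        if last_t is not None and t - last_t > pause:
            strokes.append(current)
            current = []
        current.append((x, y))
        last_t = t
    if current:
        strokes.append(current)
    return strokes

# Two strokes separated by a 0.4-second pen lift:
pts = [(0.0, 0, 0), (0.05, 1, 0), (0.45, 5, 5), (0.5, 6, 5)]
print(len(split_strokes(pts)))  # 2
```

An offline OCR system gets no such signal and has to infer stroke boundaries from pixels alone.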
Yao Liu said,
October 1, 2015 @ 12:01 pm
Both iOS and Android have built-in Chinese handwriting input methods that are pretty good (I don't have any figure to compare with the 96.7%). I don't think the number of strokes matters, as I can do cursive, but the order of strokes is important (I know someone who writes 成 with an unconventional order, and he could never get it on his iPad until I showed him the "correct" order).
Though not as fast as pinyin (especially with long common phrases, which also beat English), I'd prefer to handwrite whenever I can. Not only is the new generation rapidly losing the ability to write characters, but we already have a hard time reading manuscripts from 100 years ago that are mildly cursive, which is a problem on top of the unfamiliar lexicon. Well, one may expect that the technology will one day be able to read all preserved manuscripts by means of "big data" (and human help, as in Google Translate).