Exotic letter in Taipei
Paul M. sent in this photograph of the front of a fashion shop on Yongkang Street, Da’an District, Taipei City, Taiwan:
Read the rest of this entry »
Photograph of a campaign billboard in Taiwan showing President Tsai Ing-wen, who is up for reelection on January 11, with one of her two beloved cats:
Read the rest of this entry »
Following up "Shelties On Alki Story Forest" (11/26/2019) and "The right boot of the warner of the baron" (12/6/2019), here's some recent testimony from engineers at Google about the brittleness of contemporary speech-to-text systems: Arun Narayanan et al., "Recognizing Long-Form Speech Using Streaming End-To-End Models", arXiv 10/24/2019.
The goal of that paper is to document some methods for making things better. But I want to underline the fact that considerable headroom remains, even with the massive amounts of training material and computational resources available to a company like Google.
Modern AI (almost) works because of machine learning techniques that find patterns in training data, rather than relying on human programming of explicit rules. A weakness of this approach has always been that generalization to material different in any way from the training set can be unpredictably poor. (Though of course rule- or constraint-based approaches to AI generally never even got off the ground.) "End-to-end" techniques, which eliminate human-defined layers like words, so that speech-to-text systems learn to map directly between sound waveforms and letter strings, are especially brittle.
Read the rest of this entry »
The vast majority of people, both inside and outside of China, input characters on cell phones, computers, and other electronic devices via Hanyu Pinyin or another phonetic script. Naturally, this has had a huge impact on the relationship between users of the Chinese script and their command of the characters, since they are no longer directly writing the characters through neuro-muscular coordination and effort. Instead, their electronic devices do the writing of the characters for them by converting the Pinyin or other phonetic input to the desired characters, resulting in the widely lamented phenomenon of "character amnesia", which we have touched upon in dozens of LL posts.
There has in recent years been a lot of stuff and nonsense bandied about concerning how Chinese character inputting led to the development of predictive typing, whereas the actuality is that the extreme cumbersomeness of the Chinese writing system necessitated the development of one kind of predictive typing (other predictive algorithms were already in use long before) to rescue the characters from hasty extinction.
Read the rest of this entry »
In the discussion of Boris Johnson's misperceived phrase ("Was it 'people of colour' or 'people of talent'?", 12/6/2019), several people expressed the opinion that "people of talent" is an unexpected way to refer to the group that he wants to welcome. Thus Rose Eneri:
My question is why does Mr. Johnson use such as odd phrase. Why does he not say, "talented people" or "people with skills we need?" I don't know of any other use of the phrase, "people of…" This fracas demonstrates the perils of using one.
Actually there are quite a few other possible values for X in "people of X", where the phrase means something like "people who have X": faith, goodwill, conscience, influence, integrity, character, means, authority, importance, intelligence, vision, quality, . . .
Read the rest of this entry »
When I was learning Mandarin over half a century ago, the more grammatically minded Chinese language teachers argued that historically and functionally there were multiple "le" particles that just happened to end up being written with the simple two-stroke character 了. Then a contrary movement set in, and linguists tried to prune down all the "le" into two or even one, claiming that all of the different 了 developed out of an ur-了.
The irony of it all is that, before the 20th century, there was no established, systematic, explicit grammar for Sinitic languages in indigenous sources.
See, inter alia, Victor H. Mair (1997), "Ma Jianzhong and the Invention of Chinese Grammar," in Chaofen Sun, ed., Studies on the History of Chinese Syntax. Monograph Series Number 10 of Journal of Chinese Linguistics, 5-26. (available on JSTOR)
Mǎshì wéntōng 馬氏文通 (conventionally rendered as "Ma's Grammar", though it would probably be closer to the original meaning in Chinese to translate it as "Written Language Unobstructedness"; 1898)
Just as we have seen in a recent post, before the 20th century there was no Chinese concept of "word":
"HouseHold GarBage" (12/6/19)
Which leads to the question: can you have grammar without words?
There have been countless papers, articles, dissertations, and monographs on le 了. Here I'm going to introduce two dissertations on le 了 written within the last few decades and the latest monograph on le 了 as representative of what has been happening with regard to the conceptualization of this protean particle in recent times.
Read the rest of this entry »
Jim Waterson, "Channel 4 apologises over subtitle error on viral Boris Johnson clip (Tory anger after tweet claims PM said ‘people of colour’ instead of ‘people of talent’)", The Guardian 12/6/2019:
Channel 4 News has apologised after a subtitling error wrongly claimed Boris Johnson had discussed whether “people of colour” should be allowed into the UK, prompting the Conservatives to accuse staff at the channel of being campaigners rather than journalists.
In a clip of the prime minister uploaded to Channel 4’s social media accounts, Johnson was captioned as saying: “I’m in favour of having people of colour come to this country but I think we should have it democratically controlled and have it done that way.”
In reality, Johnson said he was in favour of having “people of talent” come to the UK, and did not discuss race.
The falsely subtitled clip went viral on Friday, prompting Channel 4 to issue a correction: “Boris Johnson says ‘people of talent’ not ‘people of colour’. Our earlier tweet was a mistake. We misheard and we apologise.”
Some people who had shared the clip continued to wrongly insist the prime minister had said the word “colour”. This suggested it may be an example of people’s hearing being influenced by visual cues – similar to the known phenomenon of the McGurk effect. It also echoes the confusion at the end of last year over whether a voice in a short audio clip was saying the word “laurel” or “yanny”.
Read the rest of this entry »
Dick Margulis saw this in a waiting room at the University of Hong Kong Shenzhen Hospital:
Read the rest of this entry »
Annie Ropeik, "N.H. Defends Laconia Law Barring Female Nudity In U.S. Supreme Court Appeal", New Hampshire Public Radio 12/6/2019:
New Hampshire has filed a response with the U.S. Supreme Court in the so-called “Free the Nipple” case of three women arrested for going topless at Weirs Beach in 2016.
The high court had asked to hear from the state, which an attorney for the women appealing says shows at least one justice may be interested in the issue.
The women say the Laconia ordinance under which they were convicted is unconstitutional and discriminates based on gender.
Read the rest of this entry »
Here at the UNESCO LT4All conference, I've noticed that many participants assert or imply that the problems of human language technology have been solved for a few major languages, especially English, so that the problem on the table is how to extend that success to thousands of other languages and varieties.
This is not totally wrong — HLT is a practical reality in many applications, and is being rapidly spread to others. And the problem of digitally underserved speech communities is real and acute.
But it's important to understand that the problems are not all solved, even for English, and that the remaining issues also represent barriers for extensions of the technology to other communities, in that the existing approximate solutions are far too hungry for data and far too short on practical understanding and common sense.
Read the rest of this entry »
I'm in Paris for the UNESCO International Conference Language Technologies for All (LT4All), which happens to coincide with another event in France, a national strike that (among other things) created a very long trip from the airport. In my hotel's elevator there was a sign whose first sentence taught me a couple of new words:
Ce Jeudi 5 Décembre aura lieu une grève interprofessionelle reconductible indéterminée.
There was also an English translation:
This Thursday, December 5th, will be an indefinite, renewable interprofessional strike.
Responding to this recent post about machine analysis of grammar, "Literary Sinitic / Classical Chinese dependency parsing" (11/27/19), Nicholas Morrow Williams writes:
That reminds me tangentially of something I just heard about, an effort to transcribe Japanese "kuzushiji" (cursive-like) script using AI. This article, which contains some striking illustrations, is about a huge international competition to devise a better method, won apparently by a Chinese team.
Read the rest of this entry »