Archive for Language and computers
January 29, 2017 @ 8:38 am· Filed by Victor Mair under Borrowing, Diglossia and digraphia, Language and computers, Style and register, Topolects, Translation
The inability of Google Translate, Microsoft Translator, Baidu Fanyi, and other translation services to correctly render jī nián dàjí 鸡年大吉 ("may the / your year of the chicken be greatly auspicious!") in various languages points up a vital distinction that I have long wanted to make, and now is as good a time as any. Namely, just as you could not expect these translation services to handle Cantonese, Shanghainese, Taiwanese, etc. (unless specifically and separately programmed to do so), we should not expect them to deal with Literary Sinitic / Classical Chinese (LS / CC).
Read the rest of this entry »
Permalink
December 10, 2016 @ 3:07 pm· Filed by Victor Mair under Language and computers, Typography
Somebody asked Mark Swofford to help her devise a speedy, easy way to locate all the Chinese characters in a book-length manuscript that she was working on. Mark set to work on the problem, and this is what he came up with:
"How to find Chinese characters in an MS Word document" (12/10/16)
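For readers who would rather work outside Word, here is a rough Python sketch of the same general task. It is not Mark's method (his works inside MS Word and is described in the linked post); it assumes the manuscript text has already been extracted to a plain string, and the function name `find_han_runs` and the chosen code point ranges are illustrative assumptions that cover only the most common CJK ideograph blocks.

```python
import re

# Assumed, partial set of Unicode blocks containing Chinese characters:
#   U+3400-U+4DBF    CJK Unified Ideographs Extension A
#   U+4E00-U+9FFF    CJK Unified Ideographs
#   U+F900-U+FAFF    CJK Compatibility Ideographs
#   U+20000-U+2A6DF  CJK Unified Ideographs Extension B
HAN_RUN = re.compile("[\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\U00020000-\U0002a6df]+")


def find_han_runs(text):
    """Return (offset, run) pairs for each unbroken run of Chinese characters."""
    return [(m.start(), m.group()) for m in HAN_RUN.finditer(text)]


if __name__ == "__main__":
    sample = "abc 鸡年 def 大吉"
    for offset, run in find_han_runs(sample):
        print(offset, run)
```

Characters in later extension blocks, or CJK punctuation, would need additional ranges.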
December 9, 2016 @ 10:51 pm· Filed by Victor Mair under Language and computers, Language and culture, Language and food
My son sent me this wonderful, learned post called "The best bits" from the "Old European culture" blog (12/7/2015). It begins:
Offal, also called variety meats or organ meats, refers to the internal organs and entrails of a butchered animal. The word does not refer to a particular list of edible organs, which varies by culture and region, but includes most internal organs excluding muscle and bone.
The word shares its etymology with several Germanic words: Frisian ôffal, German Abfall (offall in some Western German dialects), afval in Dutch and Afrikaans, avfall in Norwegian and Swedish, and affald in Danish. These Germanic words all mean "garbage", or —literally— "off-fall", referring to that which has fallen off during butchering. However, these words are not often used to refer to food with the exception of Afrikaans in the agglutination afvalvleis (lit. "off-fall-meat") which does indeed mean offal. For instance, the German word for offal is Innereien meaning innards. According to the Oxford English Dictionary, the word entered Middle English from Middle Dutch in the form afval, derived from af (off) and vallen (fall).
December 5, 2016 @ 10:57 am· Filed by Geoffrey K. Pullum under Announcements, Errors, Language and computers, Language and technology, WTF
Almost every day, when looking through the headlines on Google News, I see one or two stories where what's meant to be a snippet from the first paragraph of the story contains not a single word from the story but instead says this:
This is a modal window. This modal can be closed by pressing the Escape key or activating the close button. Close Modal Dialog. This is a modal window.
November 5, 2016 @ 2:05 pm· Filed by Victor Mair under Language and computers, Writing systems
We have looked at the Chinese typewriter again and again:
"Chinese Typewriter" (6/30/09)
"Chinese typewriter, part 2" (4/17/11)
"Chinese character inputting" (10/17/15)
By now we are thoroughly familiar with this unwieldy contraption. Given that it has long since been consigned to the museum, where it properly belongs, it is strange that some folks continue to tout it as the wave of the future in information processing.
October 25, 2016 @ 6:23 pm· Filed by Victor Mair under Language and computers, Writing systems
I was stunned when I read this op-ed piece in the NYT yesterday (10/24/16): "China's Digital Soft Power Play". In it, the author, Jing Tsu (a professor of Chinese literature and culture at Yale), writes:
This month, the Chinese government plans to introduce codes for some 3,000 Chinese characters as part of a grand project, known as the China Font Bank, to digitize 500,000 characters previously unavailable in electronic form. Until now, only 80,388 characters have been encoded in the international computing standard, Unicode.
The project highlights 100,000 characters from the country’s 56 ethnic minorities, and another 100,000 rare and ancient characters from China’s written corpus. Deploying almost 30 companies, institutions and universities, it’s the largest state-funded digitization project ever undertaken.
October 15, 2016 @ 8:21 pm· Filed by Victor Mair under Diglossia and digraphia, Language and computers, Tones, Transcription, Writing systems
A father speaks
[This is a guest post by Alex Wang, following up his remarks in "Learning to read and write Chinese" (7/11/16).]
The more Chinese I learn in order to teach my younger son to read and write it, the more I realize, for lack of a better word, how “ridiculous” it is for a “significant / modern” country to use such a reading and writing system. Perhaps I may be wrong because I’m not informed.
To provide some background, I grew up speaking only Chinese in the house. I went to Saturday school for a few years to learn a little bit of reading and writing but mostly forgot all of it by the time I came to Shenzhen 9 years ago. I did not learn pinyin; I was taught Bopomofo which I have forgotten entirely. I say this so that you understand my relative fluency in the spoken language. On reading characters, I can now recognize perhaps several hundred.
October 14, 2016 @ 8:39 pm· Filed by Victor Mair under Diglossia and digraphia, Language and computers, Language and education, Writing, Writing systems
This is a photograph of a page from an essay written by a third grade student at an elementary school in Suining, Sichuan Province, China:
October 1, 2016 @ 6:40 am· Filed by Geoffrey K. Pullum under Humor, Information technology, Insults, Language and advertising, Language and computers, Swear words, Taboo vocabulary, Words words words
To access an article in the Financial Times yesterday I found myself confronted with a short market-research survey about laptops, tablets, and smartphones. Answer three or four layers of click-the-box questions, and I could get free access to the article I wanted to look at. A reasonable bargain: clearly some company was prepared to pay the FT for access to its online readers' opinions. And at the fourth layer down I faced a question which asked me to choose a single word that comes into my mind when I think of a certain Microsoft product.
My choice, from all the tens of thousands of words at my disposal, and the word I picked would go straight into the market research department of the one corporation, above all others, for whose products I have the greatest degree of contempt. Just choose that one evocative word and type it in, and I would be through to my article. A free choice. Which word to pick?
September 29, 2016 @ 4:07 pm· Filed by Geoffrey K. Pullum under Computational linguistics, Humor, Language and computers, Prescriptivist poppycock, Usage advice
Let me explain, very informally, what a predictive text imitator is. It is a computer program that takes as input a passage of training text and produces as output a new text that is composed quasi-randomly except that it matches the training text with regard to the frequencies of word or character sequences up to some fixed finite length k.
(There has to be such a length limit, of course: the only text in which the word sequence of Melville's Moby-Dick is matched perfectly is Melville's Moby-Dick, but what a predictive text imitator trained on Moby-Dick would do is to produce quasi-random fake-Moby-Dickish gibberish in which each sequence of not more than k units matches Moby-Dick with respect to the transition probabilities between adjacent units.)
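To make the idea concrete, here is a minimal sketch of such an imitator in Python. It is an illustration only, not Jamie Brew's program: the names `build_model` and `imitate` and the parameter `order` (the number of preceding units used as context) are assumptions made for this example. With a context of `order` units, the output reproduces the training text's transitions over sequences of `order + 1` units.

```python
import random
from collections import defaultdict


def build_model(units, order):
    """Map each run of `order` consecutive units to the units that follow it.

    Followers are stored with repetition, so picking one uniformly at random
    reproduces the transition frequencies of the training text.
    """
    model = defaultdict(list)
    for i in range(len(units) - order):
        context = tuple(units[i:i + order])
        model[context].append(units[i + order])
    return model


def imitate(units, order, length, seed=None):
    """Generate `length` units of quasi-random text in the style of `units`."""
    if len(units) <= order:
        raise ValueError("training text must be longer than the context length")
    rng = random.Random(seed)
    model = build_model(units, order)
    contexts = list(model)
    output = list(rng.choice(contexts))  # start from a context seen in training
    while len(output) < length:
        followers = model.get(tuple(output[-order:]))
        if followers:
            output.append(rng.choice(followers))
        else:
            # Dead end: this context occurs only at the very end of the
            # training text, so restart from a randomly chosen context.
            output.extend(rng.choice(contexts))
    return output[:length]


if __name__ == "__main__":
    training = "call me ishmael some years ago never mind how long".split()
    print(" ".join(imitate(training, order=1, length=12, seed=0)))
```

The units can be words, as here, or single characters (pass `list(text)` instead of `text.split()`). A larger `order` makes the output hew closer to the training text, and a large enough one simply reproduces stretches of it.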
I tell you this because a couple of months ago Jamie Brew made a predictive text imitator and trained it on my least favorite book in the world, William Strunk's The Elements of Style (1918). He then set it to work writing the first ten sections of a new quasi-randomly generated book. You can see the results here. The first point at which I broke down and laughed till there were tears in my eyes was at the section heading 'The Possessive Jesus of Composition and Publication'. But there were other such points too. Take a look at it. And trust me: following the advice in Jamie Brew's version of the book won't do your writing much more harm than following the original.
September 22, 2016 @ 7:57 pm· Filed by Victor Mair under Changing times, Language and computers
[This is a guest post by Nathan Hopson]
NHK reported yesterday on the recently released results of the Agency for Cultural Affairs' annual survey of the changing uses of Japanese. This year, the survey of 3500 men and women 16 and up received responses from 54%. The most interesting results reflected the impact of online and SMS language use by young people.
September 5, 2016 @ 4:59 am· Filed by Geoffrey K. Pullum under Language and computers, Nerdview
Adam Rosenthal told me in an email recently:
While trying to enter my address into American Airlines' horribly designed phone app, I was asked to wait, because "States/Provinces are still populating for the first time".
What the hell was going on? I'm sure you regular readers will be able to guess.
August 31, 2016 @ 1:53 am· Filed by Geoffrey K. Pullum under Errors, Grammar, Humor, Information technology, Language and business, Language and computers, Language and technology
A phishing spam I received today from "Europe Trade" (it claims to be in Wisconsin but its address domain is in Belarus) said this:
Good Day sir/madam,
I am forwarding the attached document to you as instructed for confirmation,
Please kindly do the needful and revert
Best regards
Sarah Griffith
There were two attachments, allegedly called "BL-document.pdf" and "Invoice.pdf"; they were identical. Their icons said they were PDF files of size 21KB (everyone trusts PDF), but viewing them in Outlook caused Word Online to open them, whereupon they claimed to be password-protected PDF files of a different size, 635KB. However, the link I was supposed to click to open them actually led to a misleadingly named HTML file, which doubtless would have sucked me down to hell or sent all my savings to Belarus or whatever. I don't know what you would have done (some folks are more gullible than others), but I decided I would not kindly do the needful, or even revert. Sorry, Sarah.