Archive for Information technology

Easy versus exact

Ever since people started inputting Chinese characters in computers, I've had an intense interest in how they do it, which systems are more efficient, and why they choose the particular ones they adopt.  For the first few decades, because all inputting systems presented significant obstacles and challenges, I remained pretty much of an onlooker because I didn't want to waste my time struggling with cumbersome methods.  It's only after I discovered how simple and fast it is to use Google Translate as my chief inputting method that I became very active in entering Chinese character texts.

Awesome / sugoi すごい!

Information content of text in English and Chinese

Terms and concepts related to "letters" and "characters" were used at spectacularly crossed purposes in many of the comments on Victor Mair's recent post "Twitter length restrictions in English, Chinese, Japanese, and Korean". I'm not going to intervene in the tangled substance of that discussion, except to reference some long-ago LLOG posts on the relative information content of different languages/writing systems. The point of those posts was to abstract away from the varied, complex, and (here) irrelevant details of character sets, orthographic conventions, and digital encoding systems, and to look instead at the size ratios of parallel (translated) texts in compressed form. The idea is that compression schemes try precisely to get rid of those irrelevant details, leaving a better estimate of the actual information content.

My conclusions from those exercises are two:

  1. The differences among languages in information-theoretic efficiency appear to be quite small.
  2. The direction of the differences is unclear — it depends on the texts chosen, the direction of translation, and the method of compression used.

See "One world, how many bytes?", 8/5/2005; "Comparing communication efficiency across languages", 4/4/2008; "Mailbag: comparative communication efficiency", 4/5/2008; "Is English more efficient than Chinese after all?", 4/28/2008.
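Readers who want to try the compression comparison for themselves can use something like the minimal Python sketch below. It is not the procedure used in the posts cited above. It assumes UTF-8 encoding and uses three standard-library compressors (zlib, bz2, lzma), so that, in the spirit of the second conclusion, you can check whether the ratio shifts with the compression method. The function names are illustrative only.

```python
import bz2
import lzma
import zlib


def compressed_sizes(text: str) -> dict[str, int]:
    """Compressed size in bytes of `text` (UTF-8 encoded) under three compressors."""
    data = text.encode("utf-8")
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


def size_ratios(text_a: str, text_b: str) -> dict[str, float]:
    """Ratio of compressed sizes of two parallel (translated) texts, per compressor."""
    a = compressed_sizes(text_a)
    b = compressed_sizes(text_b)
    return {name: a[name] / b[name] for name in a}
```

Feed it long parallel texts, such as a translated book and its original. On short strings, each compressor's fixed header overhead swamps any difference between the languages.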

Veggies for cats and dogs

This video was passed on by Tim Leonard, who remarks, "real-time video translation at its best":

More Sinological suffering

[This is a guest post by Brendan O'Kane. See "Sinological suffering", 3/31/17, for background.]


I snapped this picture at the library today:

Siri and flatulence

An acquaintance of mine has a new iPhone, which he carries in a pocket that is (relevantly) below waist level. He has discovered something that dramatically illustrates the difference between (i) responding to speech and (ii) responding to speech as humans do, on the basis of knowing that it is speech.

The miracle of reading and writing Chinese characters

We have the testimony of a colleague whose ability to write Chinese characters has been adversely affected by her not being able to visualize them in her mind's eye.  See:

"Aphantasia — absence of the mind's eye" (3/24/17)

This prompts me to ponder:  just how do people who are literate in Chinese characters recall them?

Pick a word, any word

To access an article in the Financial Times yesterday I found myself confronted with a short market-research survey about laptops, tablets, and smartphones. Answer three or four layers of click-the-box questions, and I could get free access to the article I wanted to look at. A reasonable bargain: clearly some company was prepared to pay the FT for access to its online readers' opinions. And at the fourth layer down I faced a question which asked me to choose a single word that comes into my mind when I think of a certain Microsoft product.

My choice, from all the tens of thousands of words at my disposal! And the word I picked would go straight into the market research department of the one corporation, above all others, for whose products I have the greatest degree of contempt. Just choose that one evocative word and type it in, and I would be through to my article. A free choice. Which word to pick?

Kazakhstan HQ for the Buffett Foundation

I received an exciting email this afternoon from Perry Alexis, the chief accountant for the Warren Buffett Foundation. It seems I have been picked to receive a $1,500,000 donation — not a grant for research or anything, but a donation. And I notice it came from an email address in Kazakhstan.

Kindly do the needful

A phishing spam I received today from "Europe Trade" (it claims to be in Wisconsin but its address domain is in Belarus) said this:

Good Day sir/madam,

I am forwarding the attached document to you as instructed for confirmation,

Please kindly do the needful and revert

Best regards
Sarah Griffith

There were two attachments, allegedly called "BL-document.pdf" and "Invoice.pdf"; they were identical. Their icons said they were PDF files of size 21KB (everyone trusts PDF), but viewing them in Outlook caused Word Online to open them, whereupon they claimed to be password-protected PDF files of a different size, 635KB. However, the link I was supposed to click to open them actually led to a misleadingly named HTML file, which doubtless would have sucked me down to hell or sent all my savings to Belarus or whatever. I don't know what you would have done (some folks are more gullible than others), but I decided I would not kindly do the needful, or even revert. Sorry, Sarah.

Clueless Microsoft language processing

A rather poetic and imaginative abstract I received in my email this morning (it's about a talk on computational aids for composers) contains the following sentence:

We will metaphorically drop in on Wolfgang composing at home in the morning, at an orchestra rehearsal in the afternoon, and find him unwinding in the evening playing a spot of the new game Piano Hero which is (in my fictional narrative) all the rage in the Viennese coffee shops.

There's nothing wrong with the sentence. What makes me bring it to your notice is the extraordinary modification that my Microsoft mail system performed on it. I wonder if you can see the part of the message that it felt it should mess with, in a vain and unwanted effort at helping me do my job more efficiently?

Spamferences thrive; junk journals prosper

I was recently moved (screaming and struggling, as four strong men held me down by my arms and legs) to a new web-based university email system designed and run by Microsoft: Office 365. Naturally, it's ill-designed slow-loading crap, burdened by misfeatures and pointless pop-ups that I do not want popping up, and it fails to allow various elementary operations that I often need (every upgrade is a downgrade). But that is not my topic today. I want to note one special sad consequence of moving to an entirely new system: all my previous email system's Bayesian machine learning about spam classification has been lost. The Office 365 system has had hardly any data to learn from as yet, so I am seeing some of the stuff that would have been coming to me all along if it had not been caught by machine learning and dumped in the spam bin. And what has truly amazed me is the daily flow of advertising for spamferences and junk journals.
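A toy model shows why lost training data matters. The sketch below is a generic word-level naive Bayes classifier with add-one (Laplace) smoothing, written in Python. It is an assumption for illustration only. It does not describe how Office 365 or any earlier university mail system actually filters spam, and the class and method names are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy word-level naive Bayes spam filter (hypothetical illustration)."""

    LABELS = ("spam", "ham")

    def __init__(self) -> None:
        # Per-label word frequencies and message counts; empty until trained.
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    def train(self, text: str, label: str) -> None:
        """Record one message the user has marked as 'spam' or 'ham'."""
        if label not in self.LABELS:
            raise ValueError(f"label must be one of {self.LABELS}")
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        """Posterior probability that `text` is spam, with add-one smoothing."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in self.LABELS:
            # Smoothed prior: an untrained filter starts at 50/50.
            prior = (self.message_counts[label] + 1) / (total_messages + 2)
            total_words = sum(self.word_counts[label].values())
            score = math.log(prior)
            for word in text.lower().split():
                # Add-one smoothing over the known vocabulary plus one slot for unseen words.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Numerically stable logistic of the spam-vs-ham log-odds.
        diff = log_scores["ham"] - log_scores["spam"]
        if diff >= 0:
            z = math.exp(-diff)
            return z / (1.0 + z)
        return 1.0 / (1.0 + math.exp(diff))
```

With no training at all, every message scores exactly 0.5, which is the blank-slate situation described above. Each message the user marks as spam or not-spam shifts the scores of future messages.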

AI for youth: success and failure

Success: Xiaoice is a Microsoft chatbot program that has become popular in China.  Her name is written in various ways:

"Xiaoice" 42,400 ghits (that's pronounced "xiǎo ice")
"小冰" 362,000 ghits (that's pronounced "xiǎo bīng")
"小ice" 11,200 ghits (that's pronounced "xiǎo ice")
"Little Bing" 16,000 ghits (she's obviously named after Microsoft's search engine*)
"Little Ice" for the chatbot doesn't work, because that's the name of Ice-T's son.

Not all of these ghits are to the Chinese chatbot program; some are for Facebook and Twitter monikers, etc., but most do refer to the Microsoft chatbot.
