Lawrence Evalyn wrote to me saying that he received the official communication below about a new student card that is being issued by his university. He was perplexed by all the Chinese characters that got inserted in the text. They seem to appear consistently in certain places and for certain letters. [N.B.: The communication has been anonymized for posting on Language Log.]
What is the [new student ID card]?
Offering the services of multiple cards in one, the [card] will be your student identification card. [card] is:
慍our [local bus pass] (if eligible)
慍our access to [the gym]
嵩ash and [this card] are the ONLY accepted forms of payment at the newly renovated [cafe] opening in September 2014
慍our on-campus debit card for tap-and-go payment at all University Food Services locations (including the newly renovated [cafe] in the University Centre)
慍our University Food Services Meal plan
In the near future, you will be able to use your [card] for:
慈ther on-campus vendors, including the Bookstore and the [university-affiliated] businesses
惹nacks and drinks from on-campus vending machines
[The card] is more streamlined, flexible and secure than the current [card] system, and your student number will not change.
Beat the September rush!
Once you are registered for courses in the 2014 Fall Term, visit the [card] office (prev. Photo ID Centre) in the University Centre lobby to get your new [card].
You will need:
慈ne piece of current government-issued photo identification that clearly shows your full, legal name and your date of birth
慍our [university] student number
嫂void the long lines and come in soon!
Lawrence wondered what sort of glitch would cause this kind of garbling.
Tom Bishop and Richard Cook, both of whom are Chinese specialists associated with the Unicode Consortium, commented thus:
It looks to me like some kind of "bullet" character codes got reinterpreted as Hanzi, due to mistaken encoding identification (luanma). For example, something like "•Your" became "慍our". Probably at some point along the line of communication, a non-Unicode encoding (e.g., Latin1 or Big5) was used. Unfortunately, a lot of software still doesn't support Unicode properly.
The text was written as one encoding and then read as another encoding. Without studying this example carefully, it seems that bullets at the line start are being misinterpreted. Anyway, if the text isn't too damaged, if you open it in Wenlin you can probably figure out what the original encoding is.
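The suggestion to work out the original encoding can be tried by brute force: take a garbled fragment, turn it back into bytes under each encoding it might have been misread as, then decode those bytes under each encoding it might originally have been written in. The Python sketch below is only an illustration. The two candidate lists and the helper name `candidate_originals` are assumptions made for this example, and the "printable text" filter is a crude heuristic rather than a real detector.

```python
# Encodings the text might have been wrongly READ as (an assumed shortlist).
READ_AS = ["big5", "gbk", "shift_jis", "euc_kr"]
# Encodings the text might originally have been WRITTEN in (also assumed).
WRITTEN_AS = ["cp1252", "latin-1"]

def candidate_originals(garbled):
    """Return (read_as, written_as, text) for each pair that yields printable text."""
    results = []
    for read_as in READ_AS:
        try:
            # Recover the raw bytes that were misinterpreted.
            raw = garbled.encode(read_as)
        except UnicodeEncodeError:
            continue  # the character does not exist in this encoding
        for written_as in WRITTEN_AS:
            try:
                text = raw.decode(written_as)
            except UnicodeDecodeError:
                continue  # these bytes are not valid in this encoding
            if text.isprintable():  # drop results containing control characters
                results.append((read_as, written_as, text))
    return results

for row in candidate_originals("慍our"):
    print(row)
```

Any row whose text reads as plain English, such as a mid-dot followed by "Your", points to a plausible pair of encodings. Other pairs may also survive the filter, so the output still needs a human eye.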
I don't have much to add to what Tom and Richard have said, except to note that 5-10 years ago this sort of thing used to happen a lot more than it does now. I used to be very annoyed when I would receive English-language documents with random Chinese characters scattered throughout the text, not just at the beginnings of lines as here, and often I had to spend a considerable amount of time removing them individually. That almost never happens to me anymore.
Here's one documented case and its solution:
For those who wish to delve more deeply into the technical details of how this happened in the above quoted letter, here is an expert explanation from Silas S. Brown:
慍 encoded in Big5 is a byte B7 + a letter Y. B7 is mid-dot (·) in Windows-1252 / ISO-8859-1. (A 'real' bullet • is not available in that codepage [i.e., in ISO-8859-1; Windows-1252 does have one, at byte 95].) Somebody typed a mid-dot immediately followed by a Y, encoded it into Windows-1252 or 8859-1 (Western European), and then some other program interpreted these 2 bytes as a Big5 code and converted it to Unicode 慍. Similarly for 嵩 (Big5 code = byte B7 + letter C). Hence 慍our is a luanma'd ·Your, and 嵩ash is a luanma'd ·Cash. Not sure why the original writer didn't put a space after the mid-dot, but still.
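Silas's account can be checked in a few lines of Python. The sketch below prints the two Big5 bytes behind each stray character in the letter, then reproduces and reverses the garbling. The function names `garble` and `repair` are made up for illustration, and the sketch assumes that Python's built-in `big5` and `cp1252` codecs behave like whatever software mangled the original message.

```python
# The five stray characters that appear in the university's message.
STRAY = "慍嵩慈惹嫂"

# Each one is two bytes in Big5: a lead byte, then a byte that is also an ASCII letter.
for ch in STRAY:
    lead, trail = ch.encode("big5")
    print(ch, f"{lead:02X}", chr(trail))
# 慍 B7 Y
# 嵩 B7 C
# 慈 B7 O
# 惹 B7 S
# 嫂 B7 A

def garble(text):
    """Write text as Windows-1252, then misread the bytes as Big5."""
    return text.encode("cp1252").decode("big5")

def repair(text):
    """Undo garble(): recover the bytes via Big5, then read them as Windows-1252."""
    return text.encode("big5").decode("cp1252")

print(garble("·Your access to [the gym]"))   # 慍our access to [the gym]
print(repair("嵩ash and [this card]"))        # ·Cash and [this card]
```

Plain ASCII text passes through both codecs unchanged, which is why only the mid-dot and the letter right after it are affected. `repair` will raise an error on text containing characters that Big5 cannot encode, so it is a demonstration rather than a general-purpose fixer.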
I've also found myself amused by the many, many ways the university has been attempting to convince all of us that this card is somehow more convenient for the student body (at least for graduate students, all of these features were already offered by our student ID cards, and the cafes used to accept credit and debit), but there's no real linguistic innovation in corporate speak insisting that something is "more streamlined!" when they really mean "more streamlined for us."
Anyway, it does seem that each character is consistently used for a specific letter, but I have no idea how each letter came to be replaced by the characters in question! If there's some kind of logic behind it, I'd love to hear it.
If this ever happens to you, it's all right to get upset, but don't make the slightest effort to decipher these annoying graphs. Detached from a Chinese context and arbitrarily inserted in an English text, they convey no intelligible meaning or logic whatsoever.
[Thanks to John Rohsenow]