Archive for Orthography

Information content of text in English and Chinese

Terms and concepts related to "letters" and "characters" were used at spectacularly crossed purposes in many of the comments on Victor Mair's recent post "Twitter length restrictions in English, Chinese, Japanese, and Korean". I'm not going to intervene in the tangled substance of that discussion, except to reference some long-ago LLOG posts on the relative information content of different languages/writing systems. The point of those posts was to abstract away from the varied, complex, and (here) irrelevant details of character sets, orthographic conventions, and digital encoding systems, and to look instead at the size ratios of parallel (translated) texts in compressed form. The idea is that compression schemes try precisely to get rid of those irrelevant details, leaving a better estimate of the actual information content.

My conclusions from those exercises are two:

  1. The differences among languages in information-theoretic efficiency appear to be quite small.
  2. The direction of the differences is unclear — it depends on the texts chosen, the direction of translation, and the method of compression used.

See "One world, how many bytes?", 8/5/2005; "Comparing communication efficiency across languages", 4/4/2008; "Mailbag: comparative communication efficiency", 4/5/2008; "Is English more efficient than Chinese after all?", 4/28/2008.
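The comparison described above (the size ratio of parallel texts after compression) can be sketched roughly as follows. This is a minimal illustration, not the procedure used in the cited posts: the function names `compressed_size` and `size_ratio`, the choice of zlib and bz2 as compressors, and the default UTF-8 encoding are all assumptions made for the example.

```python
import bz2
import zlib


def compressed_size(text: str, method: str = "zlib", encoding: str = "utf-8") -> int:
    """Return the size in bytes of `text` after encoding and compressing it.

    `method` and `encoding` are illustrative knobs: the conclusions above say
    the outcome can depend on the compression method, so it is worth varying.
    """
    raw = text.encode(encoding)
    if method == "zlib":
        return len(zlib.compress(raw, 9))
    if method == "bz2":
        return len(bz2.compress(raw, 9))
    raise ValueError(f"unknown method: {method}")


def size_ratio(text_a: str, text_b: str, method: str = "zlib",
               encoding: str = "utf-8") -> float:
    """Ratio of compressed sizes of two parallel (translated) texts."""
    return (compressed_size(text_a, method, encoding)
            / compressed_size(text_b, method, encoding))
```

To use it, one would load a long English text and its Chinese translation (or the reverse) and call `size_ratio(english, chinese)` with several methods and encodings. Very short inputs are dominated by compressor overhead, so the ratio only starts to mean something on substantial texts. The raw byte counts differ a great deal with encoding (a common Chinese character takes three bytes in UTF-8 and two in UTF-16), which is exactly the kind of detail compression is meant to factor out.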

 


You April fools!

Many Language Log readers have been complaining about the absence of any recognition of April Fool's Day at this site. I can only lament your lack of perceptiveness. There have been pranks all over the place and you simply didn't see them because you are too gullible.

The primary linguistic one was Victor Mair's amusing spoof post "Sinological suffering", about an imaginary Chinese character that couldn't be found in dictionaries no matter what lookup method you tried. It was cunningly posted on March 31st so as to be there when you read Language Log on Saturday morning, April 1st.

Do you really think a writing system could survive if it were so brain-wrenchingly complex, arcane, and impossible to document that there would be written characters that Victor Mair, one of the greatest experts on Asian languages on this planet, could not track down or translate?

Read the rest of this entry »


Sinological suffering

Since I became a Sinologist in 1972, hardly a day has passed when I didn't spend an hour or two vainly searching for a character or expression in my vast arsenal of Chinese reference works.  The frustration of not being able to find what I'm looking for is so agonizing that I sometimes simply have to scream at the writing system for being so complicated and refractory.

Read the rest of this entry »


Hyphenation with words containing capital letters

A truly startling (and surely unintended) hyphenation in the print edition of The Economist (March 11th) suggests that some updating of word-breaking algorithms is in order in the light of the fairly recent practice of inventing product and brand names that have word-internal upper-case letters. An article about juvenile delinquency, reporting that kids are less involved in crime in part because they're indoors playing video games, ends with this paragraph (I reproduce the line breaks and hyphens of the UK print edition exactly, though not the microspacing that justifies the right-hand margin; the only thing I'm interested in is the end of the penultimate line):

    The decline in crime among the young
bodes well for the future. A Home Office
study in 2013 found that those who com-
mitted their first crime aged between ten
and 17 were nearly four times more likely to
become chronic offenders than those who
were aged 18-24, and 11 times more likely
than those who were over 25. More PlayS-
tation, less police station.
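One way such an update to a word-breaking algorithm might look is a filter applied after the hyphenator has proposed its break points. The sketch below is purely hypothetical: the names `camel_boundaries` and `filter_breaks` are invented for illustration, and the policy (in a word with internal capitals, break only at a lower-to-upper-case boundary) is an assumption, not a description of any real typesetting system.

```python
def camel_boundaries(word: str) -> set[int]:
    """Indices i where word[i] is upper-case and word[i - 1] is lower-case."""
    return {
        i for i in range(1, len(word))
        if word[i].isupper() and word[i - 1].islower()
    }


def filter_breaks(word: str, candidates: list[int]) -> list[int]:
    """Filter a hyphenator's candidate break points for one word.

    A break at index i means the line would end with word[:i] + "-".
    Assumed policy: words without internal capitals are left alone; words
    with internal capitals may only break at a camel-case boundary.
    """
    boundaries = camel_boundaries(word)
    if not boundaries:
        return list(candidates)
    return [i for i in candidates if i in boundaries]
```

Under this policy, a candidate break at index 5 of "PlayStation" (the "PlayS-tation" split) is rejected, a candidate at index 4 ("Play-Station") is kept, and an ordinary word such as "committed" keeps whatever breaks the hyphenator proposed. A real system might prefer simply never to hyphenate such names.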

Read the rest of this entry »


Topolectal traffic sign

This has apparently been around for a while, but I'm seeing it now for the first time:

Read the rest of this entry »


Donlad's mispellings

Dana Milbank, "Shoker! Rediculous chocker Trump attaks and dishoners English with ever-dummer spellings", Washington Post 2/7/2017:

The English language was unprepared for the attak. It was destined to loose. And, inevitably, it chocked.

The Trump White House on Monday night, attempting to demonstrate that the media had ignored terrorism, released a list of 78 “underreported” attacks. The list didn’t expose anything new about terrorist attacks, but it did reveal a previously underreported assault by the Trump administration on the conventions of written English.

Twenty-seven times, the White House memo misspelled “attacker” or “attackers” as “attaker” or “attakers.” San Bernardino lost its second “r.” “Denmark” became “Denmakr.”

Sounds like one of my LLOG posts before readers step in to help me out.

Read the rest of this entry »


He comfortable! He quickly dry!

A neighbor of mine, a respectable woman retired from medical practice, set a number of friends of hers a one-question quiz this week. The puzzle was to identify an item she recently purchased, based solely on what was stated on the tag attached to it. The tag said this (I reproduce it carefully, preserving the strange punctuation, line breaks, capitalization, and grammar, but replacing two searchable proper nouns by xxxxxxxx because they might provide clues):

ABOUT xxxxxxxx
He comfortable
He elastic
He quickly dry
He let you unfettered experience and indulgence. Please! Hurry up
No matter where you are. No matter what you do.
Let xxxxxxxx Change your life,
Become your friends, Partner,
Part of life

Read the rest of this entry »


…"such matters as Opinion, not real worth, gives a value to"

Recently, a series of serendipitous connections led me to read Mary Astell's work, A serious proposal to the ladies, for the advancement of their true and greatest interest, first published in 1694.  And this experience led me to two questions, the first of which is, Why in the world are Mary Astell's works not available in a readable plain text form, from sources like Project Gutenberg and Wikisource?

Astell's Wikipedia entry explains that she "was one of the first English women to advocate the idea that women were just as rational as men, and just as deserving of education." And she is important enough to merit an entry in the Stanford Encyclopedia of Philosophy, which describes at length her contributions to metaphysics and epistemology.

I know that the first-order reason for this lacuna is that OCR is still pathetically incapable of dealing with 17th-century printing, and that no volunteers have stepped forward to transcribe her writings from the available paper or image sources. But this doesn't really answer the question; it just moves it back a step.

Anyhow, my second question is one that I've wondered about before, without ever trying to find an answer: Why did authors from Astell's time distribute initial capital letters in the apparently erratic way that they did?

Read the rest of this entry »


"Spelling" errors in Chinese

A smart and generally careful graduate student from China recently handed in an English-to-Chinese translation.  In checking over his work, I noticed several mistakes, from which I select here a couple of examples.  Except in two cases, I won't point out the problems with inappropriate word choice and grammar, but will focus on a particular category of error associated with contemporary Chinese writing.

Read the rest of this entry »


Degemination

If you think about it, "home made" is pronounced the same way as "homade" would be if it was a word:

And maybe "homade" *is* a word?


Colloquial contractions in Mandarin

I've mentioned my old friend Liu Yongquan in various posts and comments — see, inter alia, here, here, and here, where I wrote:

A colleague, Liu Yongquan 刘永泉, who spent most of his life working in Beijing as an applied linguist (especially concerned with machine translation and computer applications) and spoke quite good MSM, referred to people who speak "like that" (as I have described colloquial Pekingese in the above paragraph) as méi xiūyǎng 没修养 ("lacking cultivation"). I'm not sure where Liu originally came from, though I think it was from somewhere in the northeast. He had a curious speech mannerism: whenever he said zhè'er / zhèr 这儿 ("here") and nà'er / nàr 那儿 ("there"), they always came out as zhèher and nàher. For the first few months when I heard him talk like that, I thought that it was an affectation, but later I heard the same pronunciation from a few other people, so I suppose it has some basis in a regional variety of Mandarin.

Read the rest of this entry »


Ask Language Log: -er vs. -or

From Matthew Yglesias:

A few of us at work were talking about why it's adviser and protester but professor and auditor and after bullshitting around for 10 minutes I thought "maybe I should ask a linguist." Have you ever blogged on this?

I don't think that we have, though you can find well-informed discussions elsewhere, e.g. here or here/here. The executive summary is that -er is (originally) Germanic while -or is (basically) Latin, often via French.

But this doesn't help much with the particular examples you cite, since all four words are from Latin via French. Like most things about English morphology and spelling, the full answer is complicated, and also more geological than logical. But the OED seems to have the whole story — lifted from the depths of the discussion, the key point is that

Many derivatives [formed with -er as an agentive suffix] existed already in Old English, and many more have been added in the later periods of the language. In modern English they may be formed on all vbs., excepting some of those which have [Latin- or French-derived] agent nouns ending in -or, and some others for which this function is served by ns. of different formation (e.g. correspond, correspondent). The distinction between -er and -or as the ending of agent nouns is purely historical and orthographical.

For a (much) longer treatment — you have been warned — press onward.

Read the rest of this entry »


Ask Language Log: Iowa mystery image

David Donnell:

A friend in Ames, Iowa, sent me this photo of a small framed picture she purchased at a garage sale in her town. She is curious what the language is, and what it says…in English.

She added, “I got the impression from the other items at this woman's sale that she had done some traveling and picked up souvenirs from all over the world. (I could be wrong, though!)”

Myself, I am clueless about what language it is, and clueless how to even google it! (I tried a Google image search and got nothing useful, and googling the word “Capamoba” also didn’t help.)

Read the rest of this entry »
