From Down Under, Valerie Syverson sent in the following photograph taken at a storefront in Sydney's Chinatown:
As she notes, the sign is advertising what appear to be leek turnovers as "Bradysia homozygous". Bradysia is the scientific name of a genus of fungus gnats; a homozygous individual has identical alleles of a given gene on both homologous chromosomes. How did we get from leek turnovers to the genetics of insects?
First of all, let's establish what the English translation should be: jiǔcài hézi 韭菜合子 ("leek turnover").
(Technically, as we shall see below, the third character is "miswritten", but it is a very common "miswriting", so we won't worry about it for now.)
I suspect that Valerie was standing in front of Yee King Noodles at 408 Sussex Street in the Haymarket section of Sydney when she took that photograph, since John (self-declared "gourmand & ex-chef") described the same delicacy last August on his blog, heneedsfood.com.
In fact, when I entered "Bradysia homozygous" in Google, the only hit I received was for John's article about the Yee King Noodles restaurant. I'm not surprised that I only received that single hit, however, since "Bradysia homozygous" is not an actual species name, but a terminological Frankenstein's monster combining an insect genus (Bradysia) with a genetic abstraction (homozygous).
When I entered "bradysia" into Google Translate, the Chinese translation it returned was jiǔcài 韭菜 "leek", and Baidu Fanyi does exactly the same thing. How did this happen? Well, there is one particular species of fungus gnat, Bradysia odoriphaga, whose maggots are particularly fond of leek, onion, garlic and similar plants (the allium family; plants that Buddhist monks stay away from). In leek fields, Bradysia odoriphaga reproduces six generations per year and is capable of causing great damage to the crops.
So without knowing the details, we can guess that these statistical machine translation programs have digested parallel text rich in leeks and Bradysia, and have managed to get confused about which is the vegetable and which is the maggot.
This still doesn't explain the mapping in the other direction, since Google Translate and Baidu Fanyi both now give "leek" as the English translation of 韭菜 jiǔcài. But perhaps the bradysia -> 韭菜 mapping preserves an association that went both ways in an older state of the software.
Now, how to explain "homozygous"? When I ask Google Translate about the sign's version of "turnover", hézǐ 合子, lo and behold out comes "zygote". Baidu Fanyi yields "The zygote". We can't blame the translation engines too much in this case, since hézǐ 合子 (lit., "zygote") is actually a common "miswriting" for hézi 盒子 ("box", i.e., "turnover"). This orthographic malapropism is common enough that a cooperative translation system should take it into account, however.
The "homo" part of the translation is less easy to excuse. Homozygous should be chún hézǐ 純合子 ("pure / unmixed zygote"). Without knowing the details, we can guess that the statistical translation software has been fooled by a common collocational association in parallel text (chún hézǐ 純合子 = "homozygous") into a mistaken alignment (hézǐ 合子 = "homozygous").
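To make that guess concrete, here is a minimal sketch of how a naive co-occurrence aligner could go wrong. It is not a description of how Google Translate or Baidu Fanyi actually work, which I don't know. The four-sentence "parallel corpus" is invented for illustration, the function names are hypothetical, and the sketch assumes a segmenter that splits 純合子 into 純 + 合子. Each Chinese token is simply paired with the English word it shares the highest Dice coefficient with.

```python
from collections import Counter

# Invented toy data (not real parallel text): each pair is
# (segmented Chinese tokens, English tokens). The segmentation of
# 純合子 into 純 + 合子 is an assumption made for this illustration.
TOY_CORPUS = [
    (["純", "合子", "個體"], ["homozygous", "individual"]),
    (["純", "合子", "突變"], ["homozygous", "mutation"]),
    (["純", "合子", "小鼠"], ["homozygous", "mice"]),
    (["合子", "發育"], ["zygote", "development"]),
]


def dice_scores(corpus, source_token):
    """Dice coefficient between source_token and every English word
    that shares at least one sentence pair with it."""
    source_count = 0            # sentence pairs containing source_token
    target_counts = Counter()   # sentence pairs containing each English word
    cooccurrence = Counter()    # sentence pairs containing both
    for source_tokens, target_tokens in corpus:
        target_set = set(target_tokens)
        target_counts.update(target_set)
        if source_token in source_tokens:
            source_count += 1
            cooccurrence.update(target_set)
    return {
        word: 2 * joint / (source_count + target_counts[word])
        for word, joint in cooccurrence.items()
    }


def best_translation(corpus, source_token):
    """Return the English word with the highest Dice score, or None."""
    scores = dice_scores(corpus, source_token)
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    for token in ("純", "合子"):
        print(token, "->", best_translation(TOY_CORPUS, token))
```

In this toy corpus 合子 keeps company with "homozygous" three times and with "zygote" only once, so the aligner settles on "homozygous" for 合子 alone, which is the kind of mistaken alignment suspected above.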
In this case, the big ABC Chinese-English Comprehensive Dictionary does a better job than Google Translate and Baidu Fanyi:
¹hézi 盒子 1. box; case; casket 2. Mauser pistol
²hézi 合子 1. a kind of meat pie 2. box, see also ²hézǐ
²hézǐ 合子 zygote, see also ²hézi
[I should mention that an enhanced, enlarged version of the ABC Chinese-English Comprehensive Dictionary will go online in the not-too-distant future, although there are still a lot of details to be worked out before that becomes a reality.]
hézi 合子, with the meaning "box", has been used in exactly this form since the Tang period (618-907).
Another meaning of hézǐ 合子 is as an alternate name for kēténgzi 榼藤子 (Entada phaseoloides or St. Thomas Bean), which is used in traditional Chinese medicine.
As a matter of fact, more people write jiǔcài hézi 韭菜合子 ("leek zygote –> turnover") than write the "proper" form, jiǔcài hézi 韭菜盒子 ("leek turnover"):
韭菜合子 154,000 ghits
韭菜盒子 139,000 ghits
This fits with the pattern we've seen in numerous Language Log posts where the sounds of characters are more important than their strict, "correct" form, e.g., "Kung-fu (Gongfu) Tea" (7/20/2011), "Google me with a fire spoon" (7/28/2011).
So far as I know, hé 合 ("close; shut; join; unite; combine; contract; whole; total; fitting; corresponding; suitable; appropriate; equal to; to match; to meet; together with; a region; note on a musical scale; a surname"; pronounced gě, the same character can mean "a unit of measure for grain; a container for measuring grain" — both meanings from ancient times) is not an officially recognized simplification of 盒 ("box; covered container"). However, the two characters are homophonous and clearly cognate.
Moreover, 合 originally meant (guess what?) "box; covered container". Indeed, the earliest form of the graph on the oracle bones (circa 1200 BC) was a pictograph showing a conical lid over a round container. In the following centuries, this hé 合 character acquired so many extended and bleached meanings, as well as being borrowed for other, unrelated morphemes, that in the late 3rd century BC, a new character, hé 盒, was created to disambiguate the original meaning ("box; covered container" [--> "close; shut"]). The new character, hé 盒, means precisely "box; covered container", and was formed by placing the old 合 above mǐn 皿 ("utensil; vessel; container"; radical [semantic classifier] no. 108 in the Kangxi system).
The story I've narrated for the relationship between hé 合 and hé 盒 is typical for tens of thousands of Chinese characters. Here's basically how the process goes:
1. there is a word in the spoken language
2. a pictogram or ideogram is devised to stand for that spoken word
3. for more complex concepts, ideogrammatic compounds are created
4. phonetic loans (often somewhat confusingly referred to as "rebuses" in the Sinological literature) are utilized for more subtle, abstract meanings
5. phono-semantic compounds are invented; these account for the vast bulk of Chinese characters, probably around 85% of the total
6. a rare category known as "derivative cognates" accounts for a handful of characters
Many of the phono-semantic characters are like our 盒 in that they were invented to disambiguate the original or basic meaning of a graph after it had become attenuated, extended, and bleached beyond recognition. Hence there are thousands of characters like hú 鬍 ("beard") and xū 鬚 ("mustache") that were devised to restore the original meaning of hú 胡 and xū 須 after the latter two characters were co-opted for numerous other meanings and purposes (respectively "non-Sinitic nomads; reckless; unreasonable; why; for what" and "must; have to; ought; should; wait; await"; part of a disyllabic word meaning "moment; instant"). It is curious that the current, official simplified forms of hú 鬍 ("beard") and xū 鬚 ("mustache") are 胡 and 须, so we're right back where we started from more than two thousand years ago! Proof once more that the basic phonologically determined written forms keep (re-)exerting themselves against the more semantically determined, elaborated, later forms.
So, most people prefer to write jiǔcài hézi 韭菜合子 ("leek zygote" –> "turnover") rather than jiǔcài hézi 韭菜盒子 ("leek turnover") because it still gets the sounds across perfectly well and saves them five brush / pen / pencil / stylus strokes.
By the way, my mother-in-law used to make jiǔcài hézi 韭菜盒/合子 ("leek turnovers") for me, and they were incredibly delicious (my mouth waters just thinking of them), but I always thought that the name sounded funny: "leek box" (it doesn't look much like a box). And I certainly don't want to connect them with homozygous fungus gnat maggots! If I think of 韭菜盒/合子 as "leek turnovers", then I'm happy as a clam when I'm lucky enough to get one.