In response to my post on the relative difficulty of learning to read in English ("Ghoti and choughs again", 8/16/2008), Mark Seidenberg sent a note raising an interesting question about the relationship between writing systems and the morphology of the languages they represent:
It is my informal observation that the shallow orthographies are associated with languages that have relatively complex morphology (inflectional and/or derivational). Classic examples would be Serbo-Croatian, Russian, Finnish and German (though of course these languages aren't all morphologically complex in the same way). I mean complex relative to other languages like English. The deep orthographies are associated with languages such as English and Chinese, which have relatively simple morphological systems. Perhaps this observation is correct (though mixed systems such as Japanese present a potential challenge); perhaps your readers would be able to generate counterexamples. Still, if the general trend holds, the question would then be why properties of the writing system trade off against properties of the language.
I'm not sure whether the generalization is really true — Japanese, for example, certainly does look like a morphologically-complex language whose standard written form is anything but phonologically shallow. But if the generalization were true, I'd be tempted to try a functional explanation as follows. In speech, quite a bit of homophony (different morphemes being pronounced the same way) is tolerable, because prosody and context usually make it clear enough how to put morphemes together syntactically and how to interpret them. In writing, though, a corresponding amount of homography (different morphemes being written the same way) is more problematic, because the prosodic clues are missing; and therefore some phonologically-opaque differentiation of the written form of words is helpful. In a morphologically complex language, this motivation is reduced, because the inflections continue to provide high-redundancy clues about how to fit the morphemes together syntactically, so that there's less reason to create (or preserve) arbitrary differences in how to write them.
Note that this hypothesis doesn't make any claims about the relative (or absolute) amounts of homophony (or redundancy), so it's not easy to disprove. But I'd caution that (as a matter of practical experience) post hoc functional explanations rarely hold up, at least in linguistics. And the social history of orthographic systems is so full of apparently-arbitrary top-down political choices that it's hard to see how functional pressures (even if real) could have a consistent evolutionary impact.
Mark half-seriously suggests an alternative functional explanation, in terms of a sort of conservation of learning difficulties:
A related issue: reading researchers look at reading. But, what is happening with the children's spoken language acquisition? Serbian speakers get the spelling-sound correspondences essentially for free, because they are highly consistent. It would be a good bar mitzvah language. However, the inflectional system is very complex and speakers continue learning it well after the spelling-sound correspondences are second nature. So my question for people who find differences in ease of learning to read is: what is the state of the child's knowledge of the corresponding language? Maybe it's a good thing that Finnish orthography is shallow: the children can then spend a little extra time mastering the spoken language.
By "a good bar mitzvah language", Mark means a language whose written form is easy to learn how to recite, whether or not you understand any of it or even recognize the words. This is a reference to the fact that some Jewish children learn only enough Hebrew to be able to read a Torah or Haftarah passage out loud at their bar mitzvah (or bat mitzvah) ceremony. If the diacritical signs representing vowels are present, written Hebrew is phonologically transparent enough that it's fairly easy to learn to read it in this way, without knowing much (or even any) of the language. (And the cantillation signs provide a stylized form of phrasing and intonation…) Mark's point is that it's also easy to learn to "recite" Serbian, in this sense of pronouncing the written form (though obviously with an accent if you're not a native speaker [– and minus vowel length and word accent, according to Boris Blagojević's comment below]). It's much harder to learn to recite English (or Chinese).
Returning to the question of the relative difficulty of learning to read English in a more general sense, Mark observes that
The key paper here is by Nick Ellis, now at Michigan, comparing learners of Welsh and English, which differ in orthographic depth. Welsh, shallow, wins. But, the question is, what was the state of their knowledge of Welsh vs. English grammar?
That's Nick C. Ellis and A. Mari Hooper, "Why learning to read is easier in Welsh than in English: Orthographic transparency effects evinced with frequency-matched tests", Applied Psycholinguistics 22: 571-599, 2001. The abstract:
This study compared the rate of literacy acquisition in orthographically transparent Welsh and orthographically opaque English using reading tests that were equated for frequency of written exposure. Year 2 English-educated monolingual children were compared with Welsh-educated bilingual children, matched for reading instruction, background, locale, and math ability. Welsh children were able to read aloud accurately significantly more of their language (61% of tokens, 1821 types) than were English children (52% tokens, 716 types), allowing them to read aloud beyond their comprehension levels (168 vs. 116%, respectively). Various observations suggested that Welsh readers were more reliant on an alphabetic decoding strategy: word length determined 70% of reading latency in Welsh but only 22% in English, and Welsh reading errors tended to be nonword mispronunciations, whereas English children made more real word substitutions and null attempts. These findings demonstrate that the orthographic transparency of a language can have a profound effect on the rate of acquisition and style of reading adopted by its speakers.
Ellis and Hooper show some interesting effects beyond percent correct, such as an apparently-paradoxical difference in overall latency (how long it took to read a word) and in the slope of the function relating latency to (word) frequency:
The naming latencies for correct responses are shown in Figure 2, which plots the mean reaction times (RTs) for each frequency-matched pair of English and Welsh test items from item 1 (frequencies > 54,000 per million) down to item 100 (frequencies = 1 per million). Note that this is a graph of the main effects of frequency and language; the individual data points reflect a wide variety of other lexical influences, including word length, imageability, orthographic regularity, and sound–spelling consistency. Overall, the latencies are greater for Welsh than for English: English M = 1.41 s, SD = 1.13; Welsh M = 1.85 s, SD = 1.11; t(82) = 2.70, p < .01, and, as can be seen in Figure 2, the difference between the two groups increases as an inverse function of word frequency.
Why were the children tested in English "faster readers", on average, given that the children tested in Welsh were "better readers" as judged by the percent of test words correctly read? Ellis and Hooper give this explanation:
… it should be remembered that there are fewer correct responses in the English group; indeed there were 17 words for which there were no correct responses for the English group (hence the df of 82). Thus, the Welsh function is penalized by the presence of a greater number of low-frequency data points for Welsh. Over the 30 most frequent items in the languages for which there was, on average, greater than 95% accuracy of responding in both groups, there was no significant difference in the latency of responding… Over the next 30 items (31–60), on which responding was 42% correct in the English group and 50% correct in the Welsh group, the latencies were still not significantly different in the two groups… It is only in the last bin of 40 items (61–100) that the latencies differed significantly: English M = 1.734 s, SD = 1.13; Welsh M = 2.26 s, SD = 0.93; t(22) = 2.5, p < .05, on the 23 word pairs where there is some correct naming in both groups, although English accuracy was only 6.8%, whereas Welsh accuracy was 17.3%.
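The degrees of freedom quoted in this passage can be checked with a little arithmetic. The sketch below assumes that the reported t statistics come from paired comparisons over word pairs, which is what the reported df values suggest but which the excerpt does not state outright. The function name `paired_df` and the constants are illustrative labels of mine, not anything taken from the paper.

```python
def paired_df(n_pairs: int) -> int:
    """Degrees of freedom for a paired t-test over n_pairs matched items.

    Assumption: the item analysis is a paired comparison, so df = n - 1.
    """
    if n_pairs < 2:
        raise ValueError("a paired t-test needs at least two pairs")
    return n_pairs - 1


# Figures taken from the quoted passage.
TOTAL_ITEMS = 100        # frequency-matched English/Welsh word pairs
ENGLISH_ALL_WRONG = 17   # pairs with no correct response in the English group
BIN_SIZES = (30, 30, 40)  # items 1-30, 31-60, 61-100

# The three frequency bins should cover the whole test.
assert sum(BIN_SIZES) == TOTAL_ITEMS

# Pairs that have a latency in both languages, and the matching df.
usable_pairs = TOTAL_ITEMS - ENGLISH_ALL_WRONG
print(usable_pairs, paired_df(usable_pairs))

# Last bin: 23 pairs with some correct naming in both groups.
print(paired_df(23))
```

On these assumptions, dropping the 17 pairs leaves 83, which matches the reported t(82), and the 23 usable pairs in the last bin match the reported t(22).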
Mark's note also argues for applying a consistent psychological model of reading across types of orthographic system:
… there is a lot of research now, like Seymour's, suggesting that English is harder to learn to read than some other writing systems. There is a recent article in Psych Bulletin by David Share suggesting that reading research has been thrown off by its emphasis on English, which he believes is highly atypical. I think he's missed the boat. People's brains are alike. There are only so many ways to read. You want a general theory out of which differences between writing systems fall out. Whether English is "typical" or not depends on how you do the normative comparison. That's what our models attempt: there's one architecture (determined mainly by the nature of the reading task and how it relates to spoken language), and one set of principles about how knowledge is represented, learned, processed, etc. There are then differences across writing systems in the "division of labor" between components. So it's easy to place English in a broader context that includes other writing systems.
The article that Mark mentions is David L. Share, "On the Anglocentricities of current reading research and practice: The perils of overreliance on an 'outlier' orthography", Psychological Bulletin 134(4): 584-615, 2008.
Mark's own proposals can be found in various papers in his publications list, for example in M.S. Seidenberg and D.C. Plaut, "Progress in understanding word reading: Data fitting versus theory building", in S. Andrews (Ed.), From inkmarks to ideas: Current issues in lexical processing, 2006.