[This is a guest post by Silas S. Brown]
It seems a few native Cantonese speakers employed in the production of Cantonese language courses are quite happy to read out Mandarin vocabulary with Cantonese pronunciation, rather than the actual native Cantonese versions of the words, and I can't help wondering why.
One recent example among many is cantoneseclass101.com, for instance their "vocabulary builder" recordings, such as a recent one on face-related nouns. I have been unable to get a response from them about why, for example, they pronounced the Cantonese word for "ear" as yi5doh2 (from 耳朵) instead of yi5jai2 (from 耳仔), among several other "Mandarin-isms".
My Cantonese wife was keen to correct these mistakes, but was unable to suggest a possible reason why they got past the native Cantonese speaker involved in the recording. There might have been non-native speakers on the production team, but surely the actual recording artist who presents herself as "Nicole" would notice?
Assuming that Nicole and others who do this are (1) real native speakers and (2) not under pressure to rush through their recording jobs without asking questions, the only explanations I can think of for their failing to pick up on this are: (a) Cantonese people are used to reading written Mandarin, so working from a script may somehow flip them into a special "reading written text" mode instead of "normal spoken Cantonese" mode, and they won't alter the Mandarin-isms while in "reading mode"; or (b) the people involved in this kind of work think that foreigners somehow "need" to learn the Mandarin-inspired version of their language instead of actual Cantonese (the old language-teaching fallacy of trying to teach someone to read before they can speak).
Surprisingly, I have been unable to find any negative reviews of the content presented by cantoneseclass101.com.
Either they have a very good reputation management system, or I'm the only person who's tried their material in the presence of a real Cantonese speaker (or perhaps the world is full of Cantonese speakers who think learners need a Mandarinized version of their language—and/or don't care about learners—and therefore don't say anything).
Once again, cantoneseclass101 are not the only ones to do this. This is just the latest one to come to our attention, and we're mentioning them only as an example.
The (online-only, non-downloadable) dictionary at http://www.cantonese.sheik.co.uk/ gives both Mandarin and Cantonese readings for both 耳朵 and 耳仔, but it clearly marks 耳朵 as Mandarin-only and 耳仔 as Cantonese-only. On the other hand, it was less helpful in distinguishing its words for "cheek".
耳仔 is currently in the English Wiktionary but not in the Chinese Wiktionary, nor in CC-CEDICT, Pristine, Yahoo Dictionary or the other services listed on Wenlin web links. It's a pity that Mandarin-oriented Chinese dictionaries don't include more Cantonese-derived words (marked as Cantonese, of course), especially from the point of view of 'word segmentation': if you want software to segment written text that could have been influenced by any topolect, you need data on which combinations of characters make words, regardless of topolect.
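To make the segmentation point concrete, here is a minimal sketch in Python. It assumes a simple greedy "forward maximum matching" segmenter (one common textbook technique, not necessarily what any particular tool uses) and two tiny hypothetical dictionaries tagged by topolect; the names MANDARIN, CANTONESE, merge_dictionaries and segment are illustrative only. With a Mandarin-only lexicon, the Cantonese sentence 我頭痛，耳仔都痛 ("my head hurts, and my ear hurts too") has 耳仔 split into two single characters; merging in the Cantonese entries, while keeping the topolect tags, recovers it as one word.

```python
from typing import Dict, List, Set

# Hypothetical toy dictionaries for illustration: word -> topolect tag.
MANDARIN = {"耳朵": "Mandarin", "頭痛": "Mandarin"}
CANTONESE = {"耳仔": "Cantonese", "頭痛": "Cantonese"}


def merge_dictionaries(*tagged: Dict[str, str]) -> Dict[str, Set[str]]:
    """Merge several tagged dictionaries, remembering every topolect a word belongs to."""
    merged: Dict[str, Set[str]] = {}
    for dictionary in tagged:
        for word, topolect in dictionary.items():
            merged.setdefault(word, set()).add(topolect)
    return merged


def segment(text: str, lexicon: Dict[str, Set[str]]) -> List[str]:
    """Greedy forward maximum matching: at each position take the longest
    dictionary word that matches, falling back to a single character."""
    max_len = max((len(word) for word in lexicon), default=1)
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


if __name__ == "__main__":
    sentence = "我頭痛，耳仔都痛"
    print(segment(sentence, merge_dictionaries(MANDARIN)))
    # ['我', '頭痛', '，', '耳', '仔', '都', '痛']
    merged = merge_dictionaries(MANDARIN, CANTONESE)
    print(segment(sentence, merged))
    # ['我', '頭痛', '，', '耳仔', '都', '痛']
    print(sorted(merged["頭痛"]))  # ['Cantonese', 'Mandarin']
```

Keeping the tags in the merged dictionary means a tool can still flag 耳朵 as Mandarin-only and 耳仔 as Cantonese-only, rather than silently blending the two vocabularies.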
It would be interesting to see what the forthcoming ABC Cantonese-English dictionary's take is. If it manages to steer clear of (or clearly mark) the Mandarin-isms that creep into other sources, that would be a valuable selling point.
Meanwhile, I think we have a warning for anybody involved in employing native Cantonese speakers to record audio: they might not see the need to point out any 'Mandarin-isms' in the script unless you specifically ask them to do so (though if you do, you may need to watch out for over-correction).
We also have a cause for concern in text segmentation: it might be necessary to merge dictionaries from different topolects if we don't know where the source text came from (writing in "real Cantonese" seems to be a little more popular than it used to be, although it's still on the fringe).