Mandarin phonetic annotation for English
The PRC uses hànyǔ pīnyīn 汉语拼音 ("Sinitic spelling") for phonetic annotation; Taiwan uses zhùyīn fúhào 注音符號 ("phonetic symbols") for the same purpose. Since we are well acquainted with pīnyīn, but not very familiar with zhùyīn fúhào, I will focus on the latter in this post:
Mark Swofford, "If you ever find yourself stuck on how to pronounce English", Pinyin News (5/7/23):
Here are some lyrics from a popular song, “Count on Me,” by Bruno Mars, with a Mandarin translation. The interesting part is that a Taiwanese third-grader has penciled in some phonetic guides for him or herself, using a combination of zhuyin fuhao (aka bopo mofo) (sometimes with tone marks!), English (as a gloss for English! and English pronunciation of some letters and numbers), and Chinese characters (albeit not always correctly written Chinese characters — not that I could do any better myself). Again, this is a Taiwanese third-grader and so is someone unlikely to know Hanyu Pinyin.
Here is Mark's transcription of the third-grader's pencilled marks in bopomofo, sinographs, and the roman alphabet:
“If you ever find yourself stuck”
If | ㄧˊㄈㄨˊ | yífú |
you | | |
ever | ㄟㄈㄦ | ei-f’er |
find | 5 | five |
yourself | Uㄦㄒㄧㄦㄈㄨ | U’er xi’erfu |
stuck | ㄙ打可 | s-dake |
“I’ll be the light to guide you.”
I’ll | ㄞㄦ | ài’er |
be | ㄅㄧ | bi |
the | ㄌ | l[e] |
light | 賴特* | laite |
to | 兔 | tu |
guide | 蓋 | gai |
you | you | you |
“Find out what we’re made of”
Find | ㄈㄞˋ | fài |
out | ㄠㄊㄜ | ao-t’e |
what | 花得 | huade |
we’re | ㄨㄧㄚ | wi’a |
made | 妹的 | meide |
of | 歐福 | oufu |
“When we are called to help our friends in need”
When | 花 | hua |
we | ㄨㄧ | wi |
are | ㄚ | a |
called | 扣 | kou |
to | 兔 | tu |
help | 嘿ㄜㄆ | hei’e-p[e] |
our | ㄠㄦ | ao’er |
friends | ㄈㄨㄌㄣˇ的ㄙ | fulen-de-s |
in | 硬 | ying |
need | [?] | [?] |
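For readers who want to experiment with annotations like these, here is a minimal Python sketch of a symbol-by-symbol romanization of the bopomofo in the table above. It is not Mark's method and not a real zhuyin-to-pinyin converter: the table `ZHUYIN_TO_LATIN` covers only the symbols that appear above, the function name `romanize` is hypothetical, rendering tone marks as digits is an assumption of this sketch, and pinyin spelling conventions (such as writing ㄨㄧ as "wi" or inserting apostrophes) are ignored.

```python
# Illustrative only: a partial symbol-by-symbol table, NOT a full
# zhuyin-to-pinyin converter. It covers just the symbols used in the
# third-grader's annotations transcribed above.
ZHUYIN_TO_LATIN = {
    # initials
    "ㄅ": "b", "ㄆ": "p", "ㄈ": "f", "ㄊ": "t", "ㄌ": "l",
    "ㄒ": "x", "ㄙ": "s",
    # medials and finals
    "ㄧ": "i", "ㄨ": "u", "ㄚ": "a", "ㄜ": "e",
    "ㄞ": "ai", "ㄟ": "ei", "ㄠ": "ao", "ㄣ": "en", "ㄦ": "er",
}

# Assumption of this sketch: show tone marks as tone numbers.
TONE_MARKS = {"ˊ": "2", "ˇ": "3", "ˋ": "4"}


def romanize(annotation: str) -> str:
    """Replace each known zhuyin symbol or tone mark; pass everything else through."""
    out = []
    for ch in annotation:
        if ch in ZHUYIN_TO_LATIN:
            out.append(ZHUYIN_TO_LATIN[ch])
        elif ch in TONE_MARKS:
            out.append(TONE_MARKS[ch])
        else:
            # Sinographs, roman letters, and digits are left untouched.
            out.append(ch)
    return "".join(out)
```

With this table, `romanize("ㄈㄞˋ")` gives `fai4`, while `romanize("ㄙ打可")` leaves the sinographs alone and gives `s打可`, which shows how much of the child's mixed notation falls outside bopomofo proper.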
I'm impressed, both by the third-grader's resourcefulness in annotating the sounds of Bruno Mars' lyrics with three different systems and by Mark's ability to transcribe his / her faint pencil marks.
Selected readings
- "Bopomofo vs. Pinyin" (4/28/15)
- "The end of the line for Mandarin Phonetic Symbols?" (3/12/18)
- "Another use for Mandarin Phonetic Symbols" (3/29/18)
Peter Cyrus said,
December 1, 2024 @ 3:16 am
The world is crying out for a Universal Phonetic Alphabet, and the IPA is not up to the task. But this is too big a topic for this forum: http://www.upa.bet
David Marjanović said,
December 1, 2024 @ 8:12 am
The homepage of upa.bet is shifted to the left, so the left edge of the page is beyond the screen, and there's no way to scroll left. All I can do is copy the whole text and paste it elsewhere. Fixing that is the first step to creating a Universal Phonetic Alphabet that isn't the Uralic one.
Next, any phonetic alphabet would have to be learned like the IPA. Such things can be made a bit more intuitive, but not much; "a well-designed phonetic notation whose basics anybody could learn in an hour" and which is nonetheless applicable to all languages seems evidently impossible to me. Even a Korean-style featural script would have that problem, and it would run into space issues.
By the way, do I really have to create an account just to read the posts? I can't find any posts, just several pages that tell me there are 26 posts somewhere.
Philip Taylor said,
December 1, 2024 @ 12:16 pm
All looks centred using Firefox 133 under Windows 11 Enterprise, David, even with uBlock Origin active.
ulr said,
December 1, 2024 @ 2:58 pm
Well, the only thing I see on that page is the message that my browser (Pale Moon) is unsupported. Whoever is responsible for that site can't even write standard HTML. I wouldn't trust them to write a sensible phonetic alphabet, let alone one better than IPA.
Jason said,
December 1, 2024 @ 3:32 pm
I taught English in Taiwan for several years and saw some students doing this. My boss at the Bushiban did not like the students doing this as it made them pronounce English with a Taiwanese accent instead of pronouncing it properly.
Keith Ivey said,
December 1, 2024 @ 5:40 pm
I have the problem David Marjanović mentioned, but not on the home page. The problem appears on the page I get when I click "Join the discussion". I'm using Firefox on Android.
Peter Cyrus said,
December 1, 2024 @ 7:02 pm
There's a mirror of the UPA home page on the Musa site at musa.bet/upa, but the Discourse site should work. Discourse is well-regarded, but this WordPress log is much easier to use.
Rather than speculate on how easy a phonetic alphabet could possibly be, why not just take a look, for example at musa.bet/ipacharts.htm? Or do the Quick Start, or take the lessons, or watch the video.
Philip Taylor said,
December 2, 2024 @ 5:08 am
https://upa.bet/ — "Unfortunately, your browser is unsupported. Please switch to a supported browser to view rich content, log in and reply". — Sigh.
Rodger C said,
December 2, 2024 @ 11:11 am
I can see it. It reminds me of Shavian: cute but useless. (Bouncier than Shavian, though.)
Terry K. said,
December 2, 2024 @ 11:55 am
For me, the upa.bet website shows fine when my browser is full screen (Windows computer), on a large monitor. However, when I reduce the window size, it has the problem that David Marjanović mentioned. Interestingly, it shows fine on my Android. (Chrome; first page is all I checked.)
Terry K. said,
December 2, 2024 @ 12:07 pm
Two things strike me as problematic about the idea of a phonetic alphabet where "the same sound would always use the same letter, and the same letter would always stand for the same sound".
First, vowels at least don't neatly separate out like that. More of a continuum. Might be true in some cases for consonants too.
Also, it's quite natural to use a phonetic alphabet as a phonemic alphabet, and when you use phonetic symbols to represent phonemes, they quite often don't always represent the same sound, due to both allophonic variation and accent/dialectal variations.
Daniel Barkalow said,
December 2, 2024 @ 1:31 pm
The problem with any phonetic alphabet for use by non-experts is that it needs to represent the sounds that speakers think they're making (or hearing) and the sounds that speakers need to try to make in order to get a particular result, rather than the sounds that people actually make. The human brain has an optimized processor for converting between what makes sense in the language and what is feasible to do with human vocal tracts, and it takes half a year of intensive practice to learn to route around this processor and identify and produce the actual sounds specified, as opposed to getting them automatically transformed, and, even if you learn to do it, it's slow and reveals that the actual sounds of the language don't make sense directly.
Humans have evolved to have this special speech processing system, and to use it automatically, and to not think about the fact that they are using it, that they need it, or even that they have it.
David Marjanović said,
December 2, 2024 @ 3:42 pm
Ah, that could explain things. I can't install the latest Firefox version on my hardware (long story).
musa.bet works! It gives me a splash page and then an introductory page in German. (I disagree with a few of the claims there, but I digress.)
https://musa.bet/upa/ is indeed the same as upa.bet, except it's all on the screen and the empty right sidebar is gone. :-) But the "Join the discussion" button leads to upa.bet. :-(
https://musa.bet/ipacharts.htm works, but I note the vowel chart simply makes fewer distinctions than the IPA because it simply ignores the central vowels. If I understand it correctly, it doesn't let you distinguish [y] from [ʉ] or even [ɜ] from [ʌ] – I say "even" because from my German starting point [ɜ] would be lumped with ö, but [ʌ] with a! Good luck trying to convince people that's not an important distinction…!
The script is featural, but without the iconicity of Korean; the shapes themselves are as arbitrary as IPA symbols, but they're simpler, so they're easier to confuse. Likewise arbitrary are all the choices of which vowel symbols to reuse for which two themselves unrelated consonant features ("here's a legend" at the bottom). This most certainly cannot be learned in an hour.
It also lacks some of the versatility of IPA; I've mentioned the central vowels, and aspirated fricatives (admittedly phonemic only in about 10 languages worldwide) seem difficult or impossible to represent as well.
The most useful page of the whole site seems to be the (literal) table of contents: https://musa.bet/contents.htm
David Marjanović said,
December 2, 2024 @ 4:32 pm
Oh, never mind the central vowels – it turns out they can be written, as digraphs. "In normal Musa text, vowels are never adjacent, so there's no ambiguity." There are languages that allow vowels to stand directly next to each other, without forming a diphthong, without inserting allophonic consonants, without a stress difference even. (Southern German does that a lot, for example.)
The same page shows that affricates that aren't in the chart are written as digraphs. For Pacific Northwest languages at the very least, that doesn't work.
Also never mind the aspirated fricatives and the preaspirates; they're here – the solution is a digraph with h. Well, let's hope none of that handful of languages distinguishes [ʰ] from [sh] (across syllable boundaries perhaps)… I see there is "a letter for an unexpected syllable break", but it's the same as the length symbol, assuming long consonants haven't simply been forgotten.
David Marjanović said,
December 2, 2024 @ 8:38 pm
[sʰ] from [sh], I mean.
Peter Cyrus said,
December 3, 2024 @ 3:10 pm
Thank you all for taking such a close look at the Musa project, and for taking the time to comment in depth. As the designer of the Musa Alphabet, I’ll try to explain the design choices made. I apologize in advance if my response is long-winded.
Vowels
As Terry K. notes, the vowel space is a continuum, and any notation system is going to divide it up into zones of “close enough”. The IPA division is very dense in the center – lots of zones – compared to the periphery (where ironically the most used vowels lie). When the IPA was first developed, the only scheme available imagined points of articulation in the height × backness plane, and they did the best they could.
But now – 140 years later – we have the technology to study vowel acoustics, as babies do, and to plot our zones in the log F1 × F2 plane, and the results are quite different: the central vowel symbols aren’t needed to provide an even and complete coverage of the vowel space for practical use (“allophonic transcription”).
Having said that, Musa provides linguists with a mechanism for more specificity (“orthophonic transcription”): vowel digraphs. The idea is to specify [ʉ], when needed, as “[y] but adjusted towards [u]”: [yu]. Hochdeutsch doesn’t have a [ɜ] phoneme, but if you needed to zero in on an open-mid central unrounded vowel, you could write it as [ʌɛ], as Musa does, or [ʌe] or [ʌæ] to raise or lower it without diacritics. In most cases, this digraph mechanism offers much more precision than the IPA. Adjacent vowels that are not orthophonic digraphs are separated by a written hiatus.
Iconicity: to use that same row of the vowel chart as an example, the four columns contain a triangle, an arch, a square, and a circle, all open to the left. I’m surprised that David Marjanović finds them easy to confuse, but doesn’t mind IPA [eɘəɵ]. I tried to make the Musa vowels iconic: closed vowels are closed, open vowels are open, rounded vowels are rounded, etc. I also find the shapes pretty iconic when used for consonants: stops are angled, fricatives and sibilants are rounded, rhotics are zigzagged, laterals look like a tongue. But if you don’t see those graphical features as iconic, I guess they wouldn’t help you learn Musa, or remember it.
Consonants
In allophonic transcription – orthography for a language – aspirated fricatives are written with the same letters as ejective fricatives: both are rare, and it seemed to me unnecessary to dedicate separate letters to them. In orthophonic transcription, there’s a suffix for “slacker phonation”, which would indicate aspiration for an unvoiced fricative letter. Musa suffixes combine with the preceding letter in many fonts to form a ligature, which looks like a normal Musa consonant but with three shapes instead of two: http://www.musa.bet/suffixes.htm#ligatures.
Even though there’s no involved in this example, it might occasionally happen that the same letter is used both as a suffix and as the second element of a cluster, for example in the sequence /kw/ in Standard Chinese versus the “labialized” /kʷ/ in Cantonese. I’m told they even contrasted in PIE. For cases like those, Musa has a mechanism for preventing the former from forming a ligature.
Musa has single letters for homorganic affricates, but heterorganic affricates are written as digraphs or ligatures. In comparison, the IPA writes all affricates as digraphs – how can they handle PNW languages?
Phonemes
Obviously, a universal alphabet has to be phonetic: phones are universal, while phonemes belong to a single language. Phonemes are also less “real” than phones: different analyses can posit different systems of phonemes that explain the phonology equally well. Phonemes are artifacts created by orthographers and phoneticians, and we learn them when we learn to read and write – I don’t believe they correspond to mental entities, as Daniel Barkalow suggests. Are [ŋ] or [ʒ] phonemes of English? Are initial [n] and final [n] the same phoneme in Chinese? Musa doesn’t care.
I have to confess that when I first learned Musa, I had trouble remembering to use a flap t in English, and relied on rules. But now I can hear the flap, and writing it comes naturally. In Roman, I have to remember the meanings and spellings of homophones like ladder/latter, medal/metal, or caret/carat/karat/carrot in order to write them correctly – in Musa, I don’t. And I’ve never seen any research that suggests that deeper orthographies are easier to learn – it seems quite the contrary. Yes, Musa sometimes forces you to make a distinction that isn’t phonemic, and people will struggle with the fine points – the final t is unreleased in fight and fort but not font or fast. But phonemic orthography doesn’t work for non-natives, at all. That’s why guidebooks recommend that Anglophones pronounce French Reims like “France” without the F.
So Musa writes phones, not phonemes, as does the IPA. Terry K. thinks that would make it more difficult for phonologists to write phonemes, but the best phonemic transcription systems are bespoke, like AHD/enPR or Arpabet. Using the IPA to write phonemes is, IMHO, a disaster – look at Oxford’s /traɪt/ – and even the IPA Handbook recommends using the Roman alphabet instead of the IPA for phonemes. Please write your phonemes with the current alphabets, not with Musa! :)
The Big Picture
The world’s languages use a huge variety of sounds, and Musa is trying to offer a letter for each of them: as of today, 285 consonants! There’s no way anyone is going to learn 300+ letters, and so the only roads to a solution are either to give letters multiple interpretations (the bad way) or to abstract a level and ask people to learn features. Musa has 22 graphical features that correspond to a vowel when used alone, or to manners and positions of articulations when combined. We also have four keys on the keyboard for punctuation and tone/intonation. So the two key Musa technologies are:
1. Generate the many consonants by combining a manner with a position, graphically, and
2. Use the same set of shapes for vowels, manners, and positions.
The result, hopefully, is to make this huge inventory of letters manageable by humans, and with as little attention as possible.
The IPA has fewer than half as many consonants, and thus relies on diacritics, superscripts, digraphs, and decorations. Did you know that the original sixth principle of the IPA said “Diacritic marks should be avoided, being trying for the eyes and troublesome to write”? The first two principles aspired to universality and consistency, and Musa also embodies them better than the current IPA. Worst, the IPA was originally founded by pedagogues, and the original alphabet was designed to be used by the public. That goal has also been abandoned. That’s why, in my opinion, we need a better Universal Phonetic Alphabet.
I’m sorry that some of you are having trouble accessing the UPA or Musa sites. They seem to work for most people, and the solutions are above my pay grade. I prefer to talk about the linguistics stuff, as you see. Thanks for reading all the way here!
Philip Taylor said,
December 4, 2024 @ 9:15 am
An excellent and informative response, Peter, but one (trivial) question, if I may ? When you write " That’s why guidebooks recommend that Anglophones pronounce French Reims like “France” without the F", that presumably assumes that (a) said Anglophones will pronounce "France" as in English rather than as in French, and (b) that they have northern (British) topolects, otherwise the vowel will tend to emerge as /ɑː/ rather than /æ/.
Philip Taylor said,
December 4, 2024 @ 9:20 am
P.S. On the "above my pay grade" stuff, having content shifted to the left is clearly sub-optimal but saying "your browser is not supported" is not — no web site (or cloud-based proxy) should presume to judge which browsers are capable of rendering content successfully and which are not. Let the web site pass W3C validation for HTML & CSS, let Javascript be used only to add value, and caveat lector if his/her browser then does not render the content in an acceptable manner.
Philip Taylor said,
December 4, 2024 @ 9:39 am
-> […] having content shifted to the left is clearly sub-optimal but possibly acceptable, but saying "your browser is not supported" is not acceptable at all […]
Peter Cyrus said,
December 4, 2024 @ 5:02 pm
My point with the Reims example was just to illustrate what it would be like to try to indicate pronunciation in French using English phonemes. The example is not original with me, and I can't vouch for its efficacy. Americans do pronounce "France" with [æ], though.
If you can see the UPA page on the Musa site, then my HTML is OK. It's the Discourse server that is having trouble displaying HTML, and complaining about non-compliant browsers. And I am just a client: I just upload the HTML.
Choosing Discourse was my fault, though. I wanted a service that would manage the nuts and bolts of chats and forums – I don't want to manage login data, remind you of your password, etc. Language Log uses WordPress for the same reasons, but I thought WordPress might not handle Musa text. Let's try it:
Terry K. said,
December 4, 2024 @ 5:13 pm
Peter Cyrus wrote:
"…Terry K. thinks that would make it more difficult for phonologists to write phonemes…"
I don't see how you got that from what I wrote, which didn't mention phonologists, nor did it talk about anything being easy or difficult.
Jonathan Smith said,
December 4, 2024 @ 6:13 pm
Re: "Phonemes are artifacts created by orthographers and phoneticians": plug dozens of flavors of (say) bilabial plosive into the template ≈[_æt] or whatever and ask (monolingual) speakers of English what they hear — answer is "pat" OR "bat", the two sets representing the phonemes /p/ and /b/ of English… entities every bit as psychically real as the color blue or the letter 'a', regarding either of which we could also say "no such thing, color is a spectrum and every individual instance of written 'a' is slightly different."
So yes alphabetic writing chases this lodestar ("represent the contrasting sound classes of my language"), as it should. As pointed out already above, showing tons of phonetic detail (like four kinds of bilabial plosive for the new English spelling proposed on this system) is impractical (e.g., the "system" breaks every ten years or ten miles) and massively counterintuitive for users — indeed, it is rather the units of such phonetic representations that are unreal, as the aural instantiations depicted exhibit no such segmentation.
Whether a better phonetic alphabet than IPA could/should be designed for specialist use, I really DK, but no, practical alphabets are going to be phonemic to some approximation.
Peter Cyrus said,
December 5, 2024 @ 5:39 am
Terry K.: sorry I misunderstood. You mentioned the transcription of phonemes, and I let my imagination run. :(
Jonathan Smith: I agree that practical alphabets will interpret the e.g. bilabial plosives into a discrete set of sounds. Is that limited to two such sounds, or three in Thai, or four in Hindi? I chose the word "allophonic" for the level of detail that Musa offers/demands, because it's not minimal, as phonemes are.
In a Musa world, French speakers would learn the voiced and tenuis letters, Chinese speakers would learn the tenuis and aspirated letters, Thai and English speakers would learn all three plus the unreleased letter, while Hindi speakers would add the breathy letter. I think people are capable of that :)
When learning to read and write in English, we teach children that the plosives in span Stan scan are fortis p t k, but they actually pattern better as lenis b d g. We often write k as ck in final position but never initially. This level of detail seems feasible, especially if it corresponded better to the actual sound.
So I think Musa's "allophonic" level IS your "phonemic to some approximation".
David Marjanović said,
December 5, 2024 @ 7:37 am
Just to say that I appreciate the long reply. My reply will have to be similarly long, and I'm unusually busy at the moment…
I will say that phonemes are 1) real, 2) quite a bit more complex an issue than most people who've heard of the concept think, 3) the IPA has its own problems with all that, in part because of its 19th-century legacy, 4) actual practice of IPA usage routinely makes that quite a bit worse.
A long string of placeholder glyphs for me, but evidently that's a font issue.
Peter Cyrus said,
December 6, 2024 @ 4:48 am
Before we let this thread disappear into well-deserved oblivion, may I ask you guys a question? Have any of you come around to viewing Musa in a positive light as a result of this discussion?
I'd also like to invite you to continue debating the topic on Discord, on Reddit, on Facebook, or via email.
Thanks for all the comments!
Peter Cyrus said,
December 6, 2024 @ 4:49 am
Sorry: I see that my email is not displayed: it's pcyrus@musa.bet
Philip Taylor said,
December 6, 2024 @ 9:37 am
Peter — Even if I download and install the font "RomanMusaTrans.otf", the string of hieroglyphics displayed in your earlier comment (and echoed by David Marjanović) resolutely refuse to display correctly. If I set my default browser (Firefox 133) sans serif font to RomanMusaTrans, I can see most of the glyphs correctly but not all (the bad one is "") — do you have any idea wherein the problem may lie ?
Positive light : the Musa script is first-language-agnostic.
Negative light : my mind cannot "see" any of the phones that I encounter on a daily basis when I look at the Musa script.
Peter Cyrus said,
December 7, 2024 @ 4:20 am
Philip: please email me, so we can both find out what the problem is. pcyrus@musa.bet
Peter Cyrus said,
December 7, 2024 @ 6:04 am
One last comment on the topic of phonemes (but we really should switch to email):
Phonemes are now defined as minimal: an analysis with fewer proposed phonemes is to be preferred. But this criterion doesn't arise from any other considerations of the roles of phonemes. There's no requirement for mental entities to be minimal, nor for the set of phonemes that requires the least phonology – the fewest computational steps to phones – to be minimal.
So I wonder if those of you who favor an interpretation founded on phonemes would accept a larger set of phonemes: a non-minimal set. If so, we may not disagree.
Peter Cyrus said,
December 14, 2024 @ 6:20 am
David Marjanović, I'm still looking forward to hearing your thoughts (but there's no hurry). Perhaps via email? Thanks :) pcyrus@musa.bet