Test for dialect relatedness: especially for Northeast topolect groupies
Several of my PRC M.A. students have told me that the following tool for the computation of dialect closeness has become quite popular in China:
fāngyán yīnxì xiāngsì dù cèshì 方言音系相似度測試 ("Dialect phonological similarity test"), V3.2.358
(source)
It has recently become popular and spread widely on Chinese social media as a simple tool/quiz that helps you locate and identify your local dialect/accent.
Just select pairs of characters that share the same initials, finals, and tones, respectively.
And the website will give you a result listing related dialects along with their probabilities.
I did not find which person or group created this website or what academic source it was based on, but it seems to work pretty well.
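Since neither the creator nor the method behind the site is known, the following is only a hypothetical sketch of how yes/no answers on character-pair items could be turned into per-dialect probabilities. It assumes a naive-Bayes model with a uniform prior and a fixed error rate; the variety names and their profiles are invented placeholders, not real dialect data, and this is not claimed to be what the site actually does. The character pairs are ones mentioned in the comments.

```python
import math

# Test items: (character 1, character 2, feature being compared).
# The pairs are taken from the discussion; everything else is illustrative.
ITEMS = [
    ("去", "口", "initial"),
    ("困", "孔", "final"),
    ("脚", "甲", "final"),
    ("剥", "八", "final"),
]

# HYPOTHETICAL profiles: True means "this variety pronounces the pair alike
# for the given feature". Toy values only, NOT real dialect data.
PROFILES = {
    "Variety A": [False, False, False, False],
    "Variety B": [True, False, True, True],
    "Variety C": [True, True, False, False],
}


def posterior(answers, profiles=PROFILES, error_rate=0.1):
    """Return P(variety | answers) under a naive-Bayes model, uniform prior.

    `answers` lists True/False in the same order as ITEMS. Each answer is
    assumed to agree with a variety's profile with probability
    1 - error_rate, independently of the other answers.
    """
    log_scores = {}
    for name, profile in profiles.items():
        log_p = 0.0
        for said_same, expected_same in zip(answers, profile):
            p = 1 - error_rate if said_same == expected_same else error_rate
            log_p += math.log(p)
        log_scores[name] = log_p
    # Normalize in log space so long questionnaires do not underflow.
    top = max(log_scores.values())
    weights = {name: math.exp(s - top) for name, s in log_scores.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    result = posterior([True, False, True, True])
    for name, prob in sorted(result.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {prob:.3f}")
```

With the answers `[True, False, True, True]`, "Variety B", whose toy profile agrees with all four answers, gets by far the highest probability. A real tool would need empirically sampled profiles over many more items, and the error rate here is an arbitrary placeholder.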
[Thanks to Yizhi Geng, Diana Shuheng Zhang, and Xinyi Ye]
wgj said,
December 20, 2025 @ 7:54 am
I'm surprised that PRC users have no problem using the test, given that it's in traditional Chinese. I'm also skeptical about whether the creator has reliable sampling data, if they are from outside Mainland China.
Jerry Packard said,
December 20, 2025 @ 9:22 am
I hope the output graph has better resolution than the one here, which is virtually unreadable. I can’t really tell what the result output is giving us. It seems to be a dialect identifier rather than a dialect relatedness estimate.
Philip Taylor said,
December 20, 2025 @ 9:54 am
Is this any better, Jerry ?
Jonathan Smith said,
December 20, 2025 @ 10:21 am
Typical wrong thinking: "fangyan" simply consist of variant "readings" of a single set of "characters" bestowed upon "the Chinese" by "Huangdi". Why should these characters point unambiguously to a particular word, or to any word at all, of a given Chinese language?
Chris Button said,
December 20, 2025 @ 3:09 pm
@ Jonathan Smith
If I understand your point correctly (and perhaps I don't), then I don't think it matters.
To take a random example from further afield, consider how Portuguese speakers might refer to a pineapple:
– a person in São Paulo would use "abacaxi"
– a person in Lisbon would use "ananás"
But if you asked the person in São Paulo to say "ananás" instead, they would still pronounce it differently from how the person in Lisbon pronounces it.
So wouldn't the test still work?
Chris Button said,
December 20, 2025 @ 3:17 pm
And then to bring it back to Chinese, we could randomly assign the characters 鳳梨 to "abacaxi" and 菠蘿 to "ananás".
It doesn't really matter whether 鳳梨 or 菠蘿 is more natural to you. You can still pronounce the other one and betray your background.
Michael Watts said,
December 20, 2025 @ 4:00 pm
This question doesn't make any sense. Why should the string "apothecary" point unambiguously to any particular word of a given English dialect?
The test doesn't even assume that the characters represent words, only that they represent pronunciations. That's how the writing system works; it's not so much an "assumption" as a fact.
Jerry Packard said,
December 20, 2025 @ 5:13 pm
Yes Philip thank you. I see that what the app does is to place the speaker within a dialect defined by geographic place. It does so by figuring out which initials, finals and tones (as perceived by the speaker) are most likely in given geographic areas.
Jonathan Smith said,
December 20, 2025 @ 5:26 pm
@Chris Button @Michael Watts
I am assuming you read the thread title and took this test to concern "dialects" on some commonsense understanding of that word to speakers of English — thus the comparisons to dialects of Portuguese and of English respectively.
Whereas I read the Chinese tool name ("方言音系相似度測試") and looked at the tool. It concerns what are called fangyan in modern Mandarin; that is, to an approximation, Sinitic languages.
Actually, what this tool could or couldn't achieve if it were *really* concerned with *dialects* sensu stricto — say, Mandarin as spoken across ≈rural Hebei+Shandong — is a complicated and interesting question. Maybe further comments can come back to it… and in the process shed some light on your thoughts re: Portuguese ‹abacaxi›/‹ananás› and English ‹apothecary›, etc.
But the main point for now is, it's not. So depending on your Sinitic language along with your "Chinese character life experience," character X (so ‹眼› or ‹書› or ‹子›…) could suggest zero, one, or multiple words of your language and zero, one, or multiple (kinda associated?) "pronunciations." (And NB: in the real world, characters will often mostly suggest something Mandarinesque that you learned in school or somewhere.)
John Rohsenow said,
December 21, 2025 @ 4:17 am
Philip Taylor said (December 20, 2025 @ 9:54 am): "Is THIS any better, Jerry ?"
Apparently that link worked for Jerry, but not for me. Pls post the full LINK address again? Thank you.
Philip Taylor said,
December 21, 2025 @ 6:26 am
By all means, John — https://www.dropbox.com/scl/fi/7e1e7gy7t16cdwg46gmqi/Fullscreen-capture-20122025-145204.jpg?rlkey=piynxnpkwfleuoyt6wnoh5bm3&e=1&dl=0
Peter Cyrus said,
December 21, 2025 @ 7:35 am
There are a number of similar resources for dialects of North American English (and probably also for other zones). The input questions include both vocabulary ("what do you call x?") and pronunciation ("does x rhyme with y?"), which makes perfect sense, as both vary by region.
What stood out for me in the sample above is how easy it is to specify meanings using characters. An American quiz might ask if you rhyme COT with CAUGHT, or FATHER with FARTHER, because our spelling is imperfect (and thus indicates the meaning), but it's harder to ask whether, for example, you pronounce "BOW" tie like "BOW of a ship" or like "BOW and arrow"; we need to use paraphrases. As a committed advocate of phonetic spelling, I find it hard to admit that semantic spelling has its advantages.
The American quizzes also include questions like "What do you call the strip of grass between the sidewalk and the street?", a term which varies across dialects. But since the possible answers are completely unrelated expressions, I don't see how the Chinese quiz could ask them in the same format they use for pronunciation.
Chris Button said,
December 21, 2025 @ 8:42 am
@ Jonathan Smith
I thought your issue was this: "Why should these characters point unambiguously to a particular word, or to any word at all, of a given Chinese language?"
I don't think your answer addresses that.
So, to return to Portuguese, Brazilian and European Portuguese differ far more than, say, US vs. British English. Written Portuguese brings them much closer, albeit still with noticeable differences.
But at that point, you may as well bring in Spanish because if you are literate in either Portuguese or Spanish, you are also able to read the majority of things written in either language. And with that, you might compare something written specifically for Cantonese speakers versus something written for Mandarin speakers.
Michael Watts said,
December 21, 2025 @ 4:49 pm
Why are you making that assumption? It's crazy. The tool is meant to be informative as to 方言, and it is well suited to that task. There was no point at which I imagined that the purpose of asking whether 去 and 口 are pronounced with the same onset, or whether 困 and 孔 are pronounced with the same rime, was to distinguish two varieties of Mandarin from each other.
This is not a credible objection. Words, as I already mentioned, aren't at issue. It's possible for characters to have multiple pronunciations, but the way to address that issue is to use well-chosen test items. This isn't a test of obscure character knowledge. It's a test of pronunciation of common characters. All that's necessary is to choose pairs that are informative.
From a Mandarin perspective, you might note that 脚 has two pronunciations, but that neither of them matches with the "rime" of 甲, which is the focus of the 脚 item. The same thing happens with 剥 and 八. In general, I assume that if an ambiguity did arise somewhere, people would answer "yes" if there was any match, and "no" otherwise. (The instructions say "please select the character pairs which share the same 声母 / 韵母 / 声调".)
But I should ask. Do you actually see a problem somewhere on the test? You seem to believe that somebody somewhere has made a mistake. Why?
Jonathan Smith said,
December 22, 2025 @ 12:42 am
@Michael Watts
So if I understand rightly, you are saying one can indeed simply take this list of characters (or a similar one) around to speakers of various Chinese languages and have them "read" the characters off in their home language? This doesn't work or even make sense for Many Reasons which I've tried to gesture at above but am not feeling up to writing an essay about ATM. (This is, however, kind of the way people tried to do "dialectology" throughout much of the 20th century… until they realized it doesn't work or even make sense.) When newer work includes "readings" so elicited, they are (hopefully) notated as e.g. du2zi4 "讀字" — meaning roughly that the survey participant vocalized thus when visually presented with the character but I (the author) DK what that means if anything regarding their home language.
The reason I suggested this survey might kinda sorta work for closely related dialects (NB of Mandarin!) is that such lexicons would be broadly comparable and literate folks may (!) to some degree (!) produce a home language when "reading" from text depending on Factors. (And re: "去"/"口", etc., I'm afraid you have underestimated variation inside "Mandarin," as there are plenty of places where e.g. 'go' [in some or all linguistic contexts] begins with kh-.)
Chris Button said,
December 22, 2025 @ 5:41 pm
To me, it seems akin to something like the following:
– Please pronounce "triste" (meaning sad) in various dialects of Spanish, Portuguese, French, Italian, Romanian …
– Please pronounce "miel / mel / miele / miere" (meaning honey) in various dialects of Spanish, Portuguese, French, Italian, Romanian …
In the second example, the spelling superficially confuses things. A single 蜜-esque "m(i)el/r(e)" would be helpful in such cases, although "miel" occurs alongside "miere" in Romanian but with an entirely different meaning.
Chris Button said,
December 22, 2025 @ 11:12 pm
When looking at the basic lexicon of a group of several related "languages/dialects", speakers may produce the expected lexeme based on established sound laws.
Or they may say something totally different.
When prompted for the expected lexeme based on established sound laws (here is where I suppose a commonly known Chinese character would help if the language were a "Chinese" one), they may say one of the following:
– yes we have it, but it has this (clearly related) meaning
– yes we technically have it, but it's not really used anymore
– yes we technically have it, but you sound like you are from this region instead
– no we don't have it, but it would probably sound something like the way you just said it if we did have it.
@ Jonathan Smith: is the fourth answer above what you are getting at? If so, isn't a Chinese character a good way of getting at that for a "Chinese" language?
Chris Button said,
December 22, 2025 @ 11:27 pm
It seems to me like the survey is asking:
"How do you sound when you say this?"
That is not the same question (nor does it have the same objective) as the question:
"Do you even have this word in your language/dialect?"
Jonathan Smith said,
December 23, 2025 @ 2:57 am
@ Chris Button
Re: 'honey' in Romance, etc., yes this seems to be roughly the creators' notion: "a written character maps to a cognate set thus people can/will produce their home language cognate when prompted to 'read' the character."
But it doesn't work. Largest-scale problem is sociological: in (most) people's experience, writing period is (mostly) associated with Mandarin-driven "Standard Written Chinese." So even if someone's language happens to have a straightforward cognate of a particular etymon, a written character generally won't elicit it. And how could it? That would involve a kind of "reading" that (most) people have no experience with.
Or put otherwise, whereas literate users of Mandarin varieties feel no space between the words and word-pieces of their language and the conventionally-associated characters (with the word zi4 for such folks really pointing to this ontological union), words of non-standard languages aren't generally felt by speakers to be associated with written characters at all… or at best maybe in goofy, ad hoc ways.
Then of course the thicket of strictly linguistic problems which you begin to name: even if you get to a Mandarin word~pronunciation and further to the idea that you are supposed to generate an associated (?) / equivalent (?) / "cognate" (?) item of your local language… there might be no such word… there might be two, three, or more such words (say a true cognate and one or more later loans)… there might marginally be such a Mandarin-esque word or words in certain "literary" contexts… there might just be a bunch of words that sound and/or mean about right (and who knows what "cognate" means anyway)… or maybe you're even acquainted with some vernacular writing tradition in which the character in question is used in an entirely different way(s)… etc.
Many cases like all of the above can be found in this list… or better to say most of these characters present such problems. Random example "母"; see e.g. article entitled "閩南方言「母」多重音讀的層次辨析" (NB the title's framing of the issue in terms of "readings" is upside-down.)
—
Random thought why "喫" in Initials Column 2 / Row 4… someone (wrongly) thought this was how to write Mand. chi1 吃 'eat' in some contemporary "traditional" script variant? The weird character seems to have caused the survey-taker to check this pair as sharing an onset which is surprising at least to me… or something funny is really going on in Yutian / Tangshan / Hebei? Speaking of which I could not generate a 100% Putonghua result goofing twice… someone show me the error of my ways?
Jonathan Smith said,
December 23, 2025 @ 3:06 am
re: your last comment, this can't be the point given the design, but yeah for many people the only natural thing to do would be to read the characters~words more-or-less-well in their very best "Mandarin"/"Zhongwen"/"Guoyu"…
Chris Button said,
December 23, 2025 @ 2:13 pm
That's interesting to hear. I hadn't realized there was such a rigid association of individual character readings with Mandarin. Not being an expert in such matters, I had assumed many people would be similar to, for example, Cantonese speakers in associating non-Mandarin readings with the characters as appropriate.
Jonathan Smith said,
December 23, 2025 @ 6:39 pm
^ I will go out on a limb (glad indeed to be corrected) and guess that Standard-ish Cantonese is the only other Chinese language where a spoken vernacular is found committed more-or-less directly to writing in non-negligible contexts at non-negligible scale… e.g. handwritten signage on display in the other post… and people text-messaged this way at least when I was living in Guangzhou ~20 years ago. This written register is kind of one end of a long sliding scale, the upper end of which is HK-type "Standard Written Chinese" (which incidentally people by no means regard as equivalent to e.g. the spoken language of Beijing.)
So it seems words inevitably leak down the registers. E.g. this survey has "找", which a Mandarin brain reads as zhao3 'look for'… whereas 'look for' in everyday Cantonese is (surely?) wan2 with its written representation "揾"… but there is also zaau2 找 in Cantonese "literary" (?or otherwise elevated?) contexts. Many others — see e.g. "喫[吃]" "喝" — from this chart which are associated (of late!! from a Mandarin POV!!) with distinctively northern words are similarly "readable in Cantonese" given its peculiar history. But again if this is all the chart can elicit, it is hardly redeemed; such "literary registers" have special phonological patterns and are in some respects dreamed up.
Chas Belov said,
December 25, 2025 @ 9:27 pm
@Peter Cyrus:
I have no idea. I guess I'd flunk that test.
@Chris Button:
Indeed. I remember an L1 Cantonese co-worker telling me her L1 Mandarin husband could read the news section of the San Francisco Chinese newspaper, which was written in standard written Chinese, but was completely baffled by the entertainment section, which was written in colloquial Cantonese. This was over 25 years ago, so I don't know whether it is still the case. (Although I could answer that by obtaining a copy, as, while I can't read either, I can distinguish them.)
@Jonathan Smith:
¿ATM? ¿Automated teller machine? ¿Adobe Type Manager? ¿Asynchronous transfer mode? (I can't think of a language-related phrase that might apply to.) ¿What is this ATM you don't feel like writing an essay about?
Philip Taylor said,
December 26, 2025 @ 5:12 am
I suspect (but cannot prove) "at the moment", Chas.
Chas Belov said,
December 28, 2025 @ 4:03 pm
@Philip Taylor: I suspect you are correct. Preceding "ATM" with "about" certainly garden paths it.
Jonathan Smith said,
December 28, 2025 @ 8:21 pm
Oops, yes "at the moment" — no garden path b/c grammar, but indeed confusing given that (1) lowercase "atm" seems upon reflection to be normative GenZ textese and (2) I did kind of end up writing an essay atm.
Chas Belov said,
January 6, 2026 @ 12:59 am
Garden path because I took ATM to be a noun phrase rather than a prepositional one, with "about" being a forward reference rather than a back one.