The classification of [nan] Chinese (Min Nan)


[Serendipitously, right while we are in the midst of energetic discussions over the classification of and terminology for the languages of Taiwan, I received a communication from the international body that is charged with such matters for all the languages of the world, namely, an arm of the ISO.

The following (after the page break) is a guest post by Janell Nordmoe, Registrar of the ISO 639-3 Language Coding Agency. For those who are not familiar with it, "ISO 639 is a standard by the International Organization for Standardization (ISO) concerned with representation of languages and language groups." (source)

There have been significant changes with the publication of 639:2023, including that the decision on CRs rests with the Maintenance Agency, not SIL as Language Coding Agency for 639-3.

This link describes the four sets within ISO 639 and the Maintenance Agency.

At the link to the info about the 639 standard, the public reports link is at the bottom of the page, under Public Reports from the Maintenance Agency.]

——————————————————-

New language code proposals for Taiwanese

While researching Taiwanese, I encountered your work in several places including Language Log, which led me to write to you. The short question I'm requesting your comment on is, how is Taiwanese distinct from Min Nan Chinese/Hokkien [nan] in terms of literature and ethnolinguistic identity?

The long version: In 2021 the Registration Authority for ISO 639-3, SIL International, received two requests to create codes for Taiwanese in the comprehensive set of codes for world languages. They can be found at Taigi 2021-044 and Taiwanese 2021-045 (the latter is part of a proposed 11-way split of [nan] Chinese, Min Nan). The consideration of these two requests was delayed due to the expected revision of ISO 639 (which was finally completed at the end of 2023) and is now underway.

Both change requests lack sufficient scholarly evidence for creating a new language code for Taiwanese as distinct from [nan] Min Nan Chinese. Both Ethnologue and Glottolog currently list Taiwanese varieties as dialects of [nan] (in the case of Glottolog, Taipei Hokkien is a sub-dialect of the Quan-zhang dialect).

According to the ISO 639:2023 standard, the distinction between a language and a dialect is based on the criteria below. In the case of Taiwanese, we have not found scholars making the case that Taiwanese is not mutually intelligible with Hokkien/Min Nan/[nan], as criterion 1 would require. The best case seems to rest on the distinct identity and distinct literature basis of criterion 3:

  1. Two related language varieties are normally considered to belong to the same individual language if speakers of each language variety have inherent understanding of the other language variety at a functional level (they can understand each other based on knowledge of their own language variety without needing to learn the other variety)

  2. Where spoken intelligibility is marginal, the existence of a common literature or common ethnolinguistic identity with a central language variety that both speaker communities understand is a strong indicator that they should nevertheless be considered varieties of the same individual language

  3. Where there is enough intelligibility between language varieties to enable communication, they can nevertheless be treated as different individual languages when they have long-standing, distinctly named ethnolinguistic identities coupled with established linguistic normalization and literatures that are distinct
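Read as a decision procedure, the three criteria above can be sketched roughly as follows. This is only an illustrative reading, not part of the standard: the names `Intelligibility`, `VarietyPair`, `Verdict`, and `classify` are hypothetical, the boolean fields flatten judgments that in practice are matters of degree, and the outcome for cases the criteria do not address (marginal or no intelligibility without a shared literature or identity) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Intelligibility(Enum):
    FUNCTIONAL = "functional"  # speakers understand each other without study
    MARGINAL = "marginal"
    NONE = "none"


class Verdict(Enum):
    SAME_LANGUAGE = "same individual language"
    DIFFERENT_LANGUAGES = "different individual languages"


@dataclass
class VarietyPair:
    """Hypothetical summary of the evidence about two related varieties."""
    intelligibility: Intelligibility
    shared_literature_or_identity: bool = False   # criterion 2
    distinct_named_identities: bool = False       # criterion 3
    distinct_norms_and_literatures: bool = False  # criterion 3


def classify(pair: VarietyPair) -> Verdict:
    if pair.intelligibility is Intelligibility.FUNCTIONAL:
        # Criterion 3: intelligible varieties may still be separate languages
        # when identity, normalization, and literature are all distinct.
        if pair.distinct_named_identities and pair.distinct_norms_and_literatures:
            return Verdict.DIFFERENT_LANGUAGES
        # Criterion 1: functional intelligibility normally means one language.
        return Verdict.SAME_LANGUAGE
    if pair.intelligibility is Intelligibility.MARGINAL:
        # Criterion 2: a common literature or identity keeps them together.
        if pair.shared_literature_or_identity:
            return Verdict.SAME_LANGUAGE
    # Assumption: the criteria leave these cases open; treated here as distinct.
    return Verdict.DIFFERENT_LANGUAGES
```

On this reading, a request that does not dispute intelligibility would have to establish both of the criterion 3 fields, which is why the questions below concern literature and ethnolinguistic identity.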

Would you care to comment, for the benefit of the 639 Set 3 Language Coding Agency and for the 639 Maintenance Agency (MA) voting members, on the distinctiveness of Taiwanese from [nan] Min Nan in terms of 

  1. Literature

  2. Ethnolinguistic identity

Articles and blogs describing Taiwanese tend to compare it with Mandarin, or fail to distinguish between Taiwan and the parts of mainland China where [nan] is spoken.

 

12 Comments

  1. Philip Taylor said,

    July 27, 2024 @ 1:22 pm

    Is there any reason why the hyperlinks in this particular thread are all protected by https://urldefense.com/, whereas hyperlinks in all other threads that I have checked are not so protected ?

  2. Jonathan Smith said,

    July 27, 2024 @ 10:08 pm

    Victor, I am hoping you checked with the sender and with the authors of these proposals to make sure it was OK to post this email and associated info here? Folks may prefer not to have the particulars of these requests / their personal info exposed in this way even when in principle publicly available.

    Re: the topic at hand, I'll just say that at the very least it should be easy to determine from the academic literature that ISO's/Ethnologue's "nan" (among others) is not remotely an "individual language" and needs to be split in some manner.

    @Philip Taylor

    Because Penn uses email security tools (Proofpoint > URLDefense) that so wrap URLs for evaluation, and then the OP was copied directly from email including above the break (nerdview stuff like "the decision on CRs [?] rests with the Maintenance Agency, not SIL [?] as Language Coding Agency for 639-3" comes from the email sender.)

  3. Victor Mair said,

    July 27, 2024 @ 10:53 pm

    @Jonathan Smith

    This guest post was approved by the author.

  4. Vampyricon said,

    July 28, 2024 @ 12:30 am

    Where there is enough intelligibility between language varieties to enable communication, they can nevertheless be treated as different individual languages when they have long-standing, distinctly named ethnolinguistic identities coupled with established linguistic normalization and literatures that are distinct

    This is clearly a reference to how one Balkan dialect was split up into four languages even though they're all completely mutually intelligible. A quick skim of Wikipedia says "[c]lear ethnic conflict between the Yugoslav peoples only became prominent in the 20th century", claiming a start date of the early 1920s, so if that's their standard of "long-standing, distinctly named ethnolinguistic identities", then the Taiwanese identity had been developing under Japanese rule for 30 years by that point, to the point that Taiwanese elderly are more comfortable with Japanese than any Chinese language, Taiwanese Hokkien aside.

    While we're at it, let's classify Hong Kong Cantonese as a separate language from Guangzhou Cantonese as well.

  5. John Rohsenow said,

    July 28, 2024 @ 2:08 am

    In the orig. post "There have been significant changes with the publication of 639:2023, including that the decision on CRs rests with the Maintenance Agency, not SIL as Language Coding Agency for 639-3", does "SIL" refer to the Summer Institute of Linguistics?

  6. Chas Belov said,

    July 28, 2024 @ 3:00 am

    SIL International is, according to their website, "a global, faith-based nonprofit that works with local communities around the world to develop language solutions that expand possibilities for a better life." Alas, they don't seem to expand the acronym, much like KFC nowadays.

  7. AntC said,

    July 28, 2024 @ 3:50 am

    let's classify Hong Kong Cantonese as a separate language from Guangzhou Cantonese as well.

    Yes please! And we'd better be quick before PRC suppresses any form of Cantonese altogether.

    to the point that Taiwanese elderly are more comfortable with Japanese than any Chinese language, Taiwanese Hokkien aside.

    It's 80 years since Japanese rule. Add another ten years for those elderly (in their youth) to have acquired language competence. I do know Taiwanese families whose elderly spoke Japanese (indeed some in government from Imperial days who stayed on at the end of the War), and a little rubbed off on their kids. Those elderly are all now dead.

    Your "aside" is rather cryptic: you mean they're more comfortable with Taiwanese Hokkien than with Japanese? Is this the same Taiwanese Hokkien for which SIL wants to set up a specific language code? Yes it does include Taiwan-specific idioms, typically from Japanese, or Dutch/Portuguese via Japanese. Then I don't understand your objections.

  8. HTI said,

    July 28, 2024 @ 4:05 am

    For anyone wondering, ISO 639-3 is used for IETF language tags (defined by "BCP 47", another conveniently opaque initialism and number!), amongst possibly other things. These are used for lang attributes in HTML, so that, for example, Taiwanese text written in POJ would be contained within an HTML element that has the attribute lang="nan-Latn-pehoeji"; meaning Southern Min text, in the Latin script, POJ Romanization.

    Most webpages make at least *some* use of the lang attribute so that UAs (user agents; usually a web browser acting as an agent on behalf of a user) aren't just left to guess. A lang tag can specify language, script, geographic region, and variants; so potentially a lot of information. You can imagine that a screen reader that encounters (to use the same example) POJ text is likely to emit unintelligible babble unless (perhaps) given a lang value like "nan-Latn-pehoeji" or similar so that it knows what's going on.
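To make the structure of such a tag concrete, here is a minimal Python sketch that splits a tag like "nan-Latn-pehoeji" into language, script, region, and variant subtags. It is a simplification under stated assumptions: the subtag shapes follow the general grammar of BCP 47 (RFC 5646), but extended language subtags, extensions, private-use subtags, and grandfathered tags are not handled, nothing is checked against the IANA subtag registry, and the names `LangTag` and `parse_lang_tag` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LangTag:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)


# Subtag shapes, simplified from the BCP 47 grammar.
_LANGUAGE = re.compile(r"[A-Za-z]{2,3}")          # e.g. "nan", "zh"
_SCRIPT = re.compile(r"[A-Za-z]{4}")              # e.g. "Latn", "Hant"
_REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")     # e.g. "TW"
_VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")  # e.g. "pehoeji"


def parse_lang_tag(tag: str) -> LangTag:
    """Split a simple language tag into its parts (shape check only)."""
    subtags = tag.split("-")
    if not _LANGUAGE.fullmatch(subtags[0]):
        raise ValueError(f"unsupported primary language subtag in {tag!r}")
    result = LangTag(language=subtags[0].lower())

    rest = subtags[1:]
    # Script and region are optional and, when present, appear in this order.
    if rest and _SCRIPT.fullmatch(rest[0]):
        result.script = rest.pop(0).title()
    if rest and _REGION.fullmatch(rest[0]):
        result.region = rest.pop(0).upper()
    # Everything left must look like a variant subtag.
    for subtag in rest:
        if not _VARIANT.fullmatch(subtag):
            raise ValueError(f"unexpected subtag {subtag!r} in {tag!r}")
        result.variants.append(subtag.lower())
    return result
```

For example, `parse_lang_tag("nan-Latn-pehoeji")` yields language "nan", script "Latn", no region, and the single variant "pehoeji". A real user agent would also consult the subtag registry to confirm that each subtag exists.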

    The value of the lang attribute also affects visual display, by the way. Modern browsers will choose fonts and glyph variants based on lang. For example, some Sinographs have the same Unicode codepoint for their simplified and traditional forms, leaving the lang tag as (potentially) the only indicator of which form to display to the user.

    My understanding is that nan means "Southern Min" (a.k.a. Minnan), so that it includes not just Hokkien lects, but also Teochew, Hainanese, etc. thus making it apparently quite diverse (although I leave this one to Prof. Mair as I am not a Sinologist). A similar situation exists with the subtag cmn, which means "Mandarin", thus including Standard Mandarin, but also Lower Yangtze Mandarin, Southwestern Mandarin, etc. (although Dungan notably gets its own subtag dng)

  9. /df said,

    July 28, 2024 @ 6:21 am

    Is that paragraph (3.) quoted by @Vampyricon not just ISO-speak for "a language is a dialect with an army and navy" (aka "ethnolinguistic identity")?

  10. Philip Anderson said,

    July 28, 2024 @ 8:02 am

    /df
    That would help establish a distinct identity, and works for some Scandinavian languages, although American English and Austrian German are not recognised as languages in their own right, whereas ISO 639 has different codes for Bokmål and Nynorsk, as well as an inclusive Norwegian one, even though Norway only has one army and navy. So no.

  11. Rodger C said,

    July 28, 2024 @ 10:05 am

    Chas Belov: The Summer Institute of Linguistics was originally a program of Wycliffe Bible Translators. It became so famous in its own right that a lot of people started using SIL for the whole organization, and WBT eventually gave in to that, though apparently the old name is still official.

  12. KIRINPUTRA said,

    August 2, 2024 @ 9:50 pm

    Late to the thread, but there's this: "Reclassifying ISO 639-3 [nan]: An Empirical Approach to Mutual Intelligibility and Ethnolinguistic Distinction".

    PDF

    There is a section (pp. 19-24) reviewing the tendencies of the existing ISO 639-3 codes to split or lump related varieties in a range of situations.

    It's worth noting — as Ms. Nordmoe did, very briefly — that 2021-045 (which "Reclassifying ISO 639-3 [nan]" is part of) focuses on much more than just Taioanese!
