Language Log

The challenging importance of spacing in Korean

September 17, 2019 @ 7:59 pm · Filed by Victor Mair under Orthography, Parsing, Words words words

Fascinating article from BLARB (Blog // Los Angeles Review of Books):

"Our Language Battle: Korea’s Surprisingly Addictive Game Show of Vocabulary, Expressions, and Proper Spacing", by Colin Marshall (9/1/19)

This is the second paragraph of the article:

Having found myself living in the genuinely foreign country of Korea, I’ve lately also found myself watching Our Language Battle (우리말 겨루기), a game show that has aired every Monday evening on KBS since 2003. Though it occasionally invites celebrities, and this past July even brought on members of the National Assembly, it usually pits four everyday Koreans (or four teams of two, usually family) against each other in a test of their knowledge of the Korean language. It begins simply enough, with the contestants buzzing in to guess the words or phrases that fill in a crossword-style board, but soon the challenges get dramatically harder: separating folk spellings and regional variations from the officially standard, filling in words missing from old television and newspaper clips, and — most difficult of all, even for contestants who otherwise dominate the game — properly re-spacing a text whose words all run together.

Who'da thunk it? – spacing is the most difficult aspect of Korean writing. One might have thought it would be a simple task, that word spacing / separation is innate for all speakers of a given language. Apparently that is not so.

In Hanyu Pinyin, it is called fēncí liánxiě 分詞連寫 ("word division; parsing"). Of course, it has its problems, but we do have rules to guide us, viz., zhèngcífǎ 正詞法 ("orthography").

This morning in my "Language, Script, and Society in China" course, I embarked on a discussion of the difference between zì 字 ("character") and cí 詞 ("word"). Although this seems like a simple, straightforward question, it is always one of the most difficult topics encountered in the course — especially for students of Chinese background. It took me a whole semester to get the idea across to the 72 very smart students in my language studies class at the University of Hong Kong in 2002-2003. Even at the conclusion of the semester, there were still some of the students who just couldn't comprehend the distinction.

It's even harder for them than understanding the difference between yǔyán 語言 ("language") and fāngyán 方言 ("dialect" –> "topolect").

Readings

"Diacriticless Vietnamese on a sign in San Francisco" (9/30/18)
"Words in Vietnamese" (10/2/18)
"Homophonophobia" (2/7/15)
"Sino-Vietnamese poster" (12/4/17) (note the joined syllables on the poster)
"Prolific code-switching in Vietnamese" (4/14/16)
"Words in Mandarin: twin kle twin kle lit tle star" (8/14/12)
"Language vs. script" (11/21/16)
"Word, syllable, morpheme, phoneme" (10/6/18)
"First grade science card: Pinyin degraded, part 2" (4/14/19)
"First grade science card: Pinyin degraded" (4/11/19) — with extensive references to character amnesia, digraphia and diglossia, word division, the uses of Pinyin, etc., also applicable here
"Pinyin for phonetic annotation" (10/27/18)
"The uses of Hanyu pinyin" (5/22/16)

[h.t. Molly C. Des Jardin]

58 Comments

Bathrobe said,

September 17, 2019 @ 9:51 pm

I don't really find it surprising at all. Latin was originally written without spaces. I think it was Irish scribes who introduced spacing. And Latin is relatively easy to split into words because most of its suffixes and prefixes are solidly attached to the stem. There isn't a great temptation to split them off.

This is different in languages like Chinese, where the writing system is mostly morphemic and those morphemes are, relatively speaking, free to combine together to form larger word-like units.

Japanese is difficult because inflexional morphemes tend to be more discrete than those of synthetic languages. That means that there is greater temptation to split them off. This isn't helped by the writing system they inherited from China. Moreover, Japanese has large amounts of Sinitic vocabulary, which, as in Chinese, is difficult to separate into 'words'.

I'm not totally familiar with Korean, but I suspect that it has the same problems as Japanese. Perhaps someone among our commenters can throw more light on this.
Doctor Science said,

September 18, 2019 @ 12:36 am

One might have thought it would be a simple task, that word spacing / separation is innate for all speakers of a given language. Apparently that is not so.

Even as a child, I noticed that, while in books the point-of-view character can transcribe speech in a language they don't know broken up properly into words, in real life unfamiliar language just sounds like a bunch of syllables (at best).

As @Bathrobe points out, word separation was introduced into Latin writing by *non-native speakers*.

Has anyone studied whether native speakers leave larger gaps between words than between syllables within a word? Does it vary by language or type of language?

I'm reminded of the argument that alphabets started in Semitic languages because their triconsonantal roots made it easier for people to perceive consonants separate from vowels. Clearly, the structure of e.g. Korean makes it difficult to see word separations, while the structure of e.g. English makes it relatively easy.

@Victor Mair:
Can you describe what the mental block seems to be? e.g. Why do they say 大夫 dàifu "doctor" and think "clearly that is two words, wtf 教授"?
Michael Watts said,

September 18, 2019 @ 1:50 am

I'm reminded of the argument that alphabets started in Semitic languages because their triconsonantal roots made it easier for people to perceive consonants separate from vowels.

Without knowing all that much about it, it seems like a strong counterargument would be "alphabets started in Semitic languages because the people who spoke Semitic languages were the same people who were writing stuff down".
~flow said,

September 18, 2019 @ 5:48 am

I think that we might approach these observations from an agnostic perspective, as it were; that is, we should not take for granted that there really 'are' phonological segments and words in speech. Sure, that is how speech is normally modeled, but even if I can demonstrate words and segments can be elegantly used to describe a given set of utterances (and one normally can), that doesn't mean those entities are 'there' as foundational, functional parts of the language-producing machinery (although I guess they probably are).

What's more, those parts of language—words and segments—might or might not be as easily accessible to speakers as other aspects of their language, like, say, syllables, which seem to be both fundamental and accessible to naive speakers (first-graders can be instructed to clap their hands and chant a text split into syllables (or moras, as the case may be) in unison, and they seemingly get it intuitively).

It would seem that to many speakers, it is not so much their mental a-priori model of their native tongue that drives writing; rather, the acquired, a-posteriori ways of whatever writing system they have absorbed seem to deliver their mental model of their language (it shows: most people have a hard time talking about speech sounds as cleanly separated from letters). If that is so, we should expect speakers/listeners/writers/readers of languages with a non-segmental orthography to be rather more unsure about the segmental sounds of speech, and people who have grown up with an orthography that does not split words (Chinese, Japanese, but also Thai, Tibetan—others?) to be more unsure about the lexemes in their language and/or the lexemes in a given utterance. This *seems* to be true for speakers of Chinese (based on scant anecdotal evidence).

The above invites a few questions:

* Modern Hangeul orthography puts spaces between words; why then is that task apparently felt to be a difficult one (if it can be shown to be true)? Shouldn't we assume that the daily intake and output of written native material should provide enough exercise so the task can be accomplished in a mostly correct, mostly facile way for a majority? (This assumption *should* hold when you look at the difficult orthographies (English, Chinese, Japanese, Thai, Tibetan) of this world, which require users to remember thousands of items with deviations big and small from what would be the simplest way of spelling a given item.)

* Is there a lot of vacillation in usage when it comes to inter-word spaces in modern Hangeul orthography? Do users omit or misplace spaces in personal writing? Are error rates in this respect significantly different from orthographies like, say, English or German that likewise have many ambiguous cases?

* Is the story about Irish monks inventing word dividers true? I remember Roman monumental inscriptions having lots of center dots instead of an unbroken stream of letters. Also, aren't there word breaks in quite a few orthographies? Arabic appears to have had them from some point onward.

* What's the story in Chinese lexicography? Kangxi, Shuowen, Erya, Fangyan (?) all deal with characters only. What are the earliest Chinese and European glossaries, i.e. lists of what are unambiguously 'words'?
Victor Mair said,

September 18, 2019 @ 6:40 am

jiàoshòuqǐngjiàoshòuyǔgāojiàn

教授請教授予高見

Jiàoshòu qǐngjiào, shòuyǔ gāojiàn.

教授請教, 授予高見.

"Professor, permit me to consult you; give me your exalted opinion."
~flow said,

September 18, 2019 @ 7:00 am

That's a nice garden path there. Does segmentation as in 教授請教, 授予高見 come naturally to native Chinese speakers? At the least the short pause and inflection symbolized by the comma *should* be obvious to anyone.
Philip Anderson said,

September 18, 2019 @ 7:26 am

@Michael Watts
In the ancient world, lots of different peoples were writing their own languages in a variety of non-alphabetic scripts, not just speakers of Semitic (or related) languages: Sumerians, Hittites, Hurrians, Mycenaean Greeks. And the prolific Akkadians and Babylonians stuck with cuneiform.
David Morris said,

September 18, 2019 @ 8:02 am

In 3 1/2 years living in Korea, and 10 1/2 years living in Australia with one, two or three Koreans, I hadn't ever seen this and no-one had ever mentioned it. KBS's website (linked in the LARB story) doesn't allow me to view the videos in Australia, but my niece suggested searching Youtube. Searching for 'Our Language Battle' brings up nothing relevant, searching for 우리말 겨루기 brings up various videos, some of which are titled 'Woorimal battle', which is an awkward combination rendering in my view.

Our niece said she knows the program but doesn't watch it as it's too hard for her. My wife said 'That program used to be on television'.

With no help from them, I've been trying to figure out the rules. The writing of hangeul in syllable blocks leads to different kinds of puzzles than in English.
David Morris said,

September 18, 2019 @ 8:04 am

https://www.youtube.com/user/KBSLife/playlists?view=50&sort=dd&shelf_id=33
Bill Benzon said,

September 18, 2019 @ 8:24 am

This speaks to one of my current hobby-horses. I’m exploring the idea that language is the simplest thing humans do that involves computation. I thus reject the view that individual neurons are the basic elements of mental computation, as was suggested by McCulloch and Pitts 1943, “A logical calculus of the ideas immanent in nervous activity.” Of course neurons and neuronal circuits can be simulated by a computer, but that’s something else. Computers can simulate atomic explosions too, but we don’t take that as evidence that atomic explosions are computational phenomena.

In my current view, whatever it is that goes on in the brain of a chimpanzee, a chameleon, or a roundworm, for example, it isn’t computation. Just what it is, that’s not my concern at this point. It follows as well that linguistic computation is grounded in something else, likely several things, none of which are my direct concern here.

But what do I mean by computation? I mean something like a Turing machine, albeit one of somewhat limited capacity. As language is first of all speaking, that’s where we start. The vocal system writes to the tape while the auditory system reads from it; taken together they are the head of the device. The brain contains that table of instructions – I’m indifferent at this point as to whether those instructions are symbolic, pre-symbolic, or both – and the state register. The speech stream itself is the tape.

The proper segmentation of the speech stream is essential to linking the speech stream to the proper items in the table of instructions, that is, to parsing. In writing, spacing should indicate the proper segmentation. By the time one is fluent in speech that segmentation is automatic in the vocal-auditory system. But translating it to the (visual) orthographic system seems rather problematic. I wonder why?
Francois Lang said,

September 18, 2019 @ 8:33 am

I know nothing about Korean, let alone Korean spacing, but this page was instructive:

https://keytokorean.com/motivation/30-day-challenge/30-day-challenge-day-15-proper-spacing-%EB%9D%84%EC%96%B4%EC%93%B0%EA%B8%B0-in-korean-can-save-a-persons-life/

This topic also reminds me of my long-ago foray into learning Sanskrit writing, in which (as I dimly recall) not only are there no spaces, but word endings and/or beginnings are munged to be phonologically compatible.
Neil Kubler said,

September 18, 2019 @ 8:41 am

In English, word boundaries are indicated not only by space but also by punctuation and capitalization. Punctuation is also an indication of word boundaries in East Asian languages. A useful principle for writing Romanized Chinese or Japanese is this: what must always be said together should be written together and what can be said alone with the intended meaning should be written apart. This works fairly well for speech that is written down but it begins breaking down when trying to determine word boundaries for formal, Classical Chinese-influenced terms and phrases.
Andy Stow said,

September 18, 2019 @ 9:35 am

Theenemyisnowhere.
Thepenismightierthanthesword.
Christian Weisgerber said,

September 18, 2019 @ 10:20 am

One might have thought it would be a simple task, that word spacing / separation is innate for all speakers of a given language.

While undoubtedly different from the problems of the Asian languages, correct spacing is also a notable orthographic difficulty in German, e.g.: zu Grunde gehen, zugrunde gehen, zugrundegehen? The 1996 spelling reform—a minor modernization, really—tried to simplify this, but the rules are still too complex and too difficult to apply.
Scott P. said,

September 18, 2019 @ 10:20 am

Computers can simulate atomic explosions too, but we don't take that as evidence that atomic explosions are computational phenomena.

That is, in fact, the extended Church-Turing thesis.
Victor Mair said,

September 18, 2019 @ 11:12 am

From an East Asian librarian whose main expertise is Japanese, but who must also manage the Korean collection:

Thanks for sharing this! The comments are a bit far in the weeds for me but Korean spacing has been the bane of my existence since I started managing the collection here. Endless confusion for all. I didn't know whether to laugh or cry at native speakers being similarly frustrated by it.
Jonathan said,

September 18, 2019 @ 2:59 pm

Adding my anecdotal evidence to that of Francois Lang: my long-ago stab at Sanskrit, where my text said, more or less, "You're going to be spending a lot of time and energy extracting words from the stream of letters".
ktschwarz said,

September 18, 2019 @ 5:27 pm

A fictional example: in Ted Chiang's story "The Truth of Fact, the Truth of Feeling" (online here) an African boy is taught to write by a missionary, and it takes him some time to figure out where to put spaces.

It was only many lessons later that Jijingi finally understood where he should leave spaces and what Moseby meant when he said "word."
You could not find the places where words began and ended by listening. The sounds a person made while speaking were as smooth and unbroken as the hide of a goat's leg, but the words were like the bones underneath the meat, and the space between them was the joint where you'd cut if you wanted to separate it into pieces. By leaving spaces when he wrote, Moseby was making visible the bones in what he said.

Jijingi realized that, if he thought hard about it, he was now able to identify the words when people spoke in an ordinary conversation. The sounds that came from a person's mouth hadn't changed, but he understood them differently; he was aware of the pieces from which the whole was made. He himself had been speaking in words all along. He just hadn't known it until now.

Ted Chiang is well-known to Language Log readers as the author of "Story of Your Life"/Arrival.
Bathrobe said,

September 18, 2019 @ 8:02 pm

We are yet to receive an explanation of what the problem is. Is it an artifact of the Korean writing system, just problems in applying rules? Is it partly due to Sinitic-style compound words, which is just an aspect of the larger 'characters as morphemes' issue? Or is it something else? I'm also curious when spacing was introduced into Korean. Was it there from the start, or did it come in with other aspects of Westernisation?

Without knowing what the actual problem is, however, we are just groping in the dark.
Ronan Maye said,

September 18, 2019 @ 10:37 pm

Korean wouldn't even need spacing if it kept Hanja. I learned a bit of Korean out of curiosity, but I realized that Korean is a lot less fun to learn for me than Japanese because many of the cognates with Chinese are "hidden" by the alphabet. Thinking about this also made me realize that Japanese would still work if they kept all of the kanji for the onyomi but just used hiragana for the kunyomi, because I believe that is how Koreans wrote earlier in the 20th century (after all, why bother writing Chinese characters for words that didn't even come from Chinese).
Victor Mair said,

September 19, 2019 @ 5:27 am

From Ross King:

Best source is:

Martin, Samuel E. (1968) Korean Standardization: Problems, Observations, and Suggestions. Ural-Altaische Jahrbücher 40(1-2):85-114.

Pages 94-95, section 1.3: “Problems of word division and punctuation”

“… People are divided on whether to write constructions like Prenoun (=Determiner) + Noun and Noun + Noun (+ Noun + …) as separate words; … There is considerable indecision on how to treat constructions of Numeral + Counter, Adverb + Verb, and those verb compounds in which the prior verb is in a ‘free’ (i.e., ending-attached) form—rather than just the base. … Unmarked Object + Verb are often run together, especially if the Object is short and the expression common (like pap mek- ‘eat food’).”

In general, North Korea loves to write long compound nouns without spaces where South Korea breaks them up with more spaces, and in general North Korean orthography is less generous with spaces.
michaelyus said,

September 19, 2019 @ 7:03 am

띄어쓰기 strikes fear into many Korean native speakers [and this is a case in point: 띄어 쓰기 or 띄어쓰기], but I think it's on the level of the controversy over hyphenation in English. The main issues are with noun "compounds" and serial verb constructions. There are some basic principles that have been prescribed (governmental guidelines first published in 1949), as far as I know. Prior to that, spacing in hangeul script was inconsistent, although on the increase (and of course in the 15th-18th centuries, writing in hangeul and writing in the Korean language was rather restricted in scope). The 1768 Songgang-gasa 송강가사 consistently uses continuous hangeul without spacing. The Tongnip Sinmun 독립신문, a very late 19th century newspaper, uses spacing consistently.
Victor Mair said,

September 19, 2019 @ 7:12 am

From a colleague:

Spacing–word division–assumes shared knowledge among users of what constitutes a language's words. This is not a trivial matter, and Korean linguists, lexicographers and publishers have been working the issue for decades.

The basic problem, as one of the commentators intimates, is that words, like (morpho)phonemic spelling, are an artifact of writing. They are not a given to be plucked from someone's brain. Orthography takes it upon itself to regularize (adjudicate) the intuitions users have about what constitutes the lexical units of their language, which are far from uniform and constantly shifting. Korean lacked that tradition and is catching up, although in a sense all written languages that use word division are continuously "catching up." I don't see it as a major problem, or a problem at all.

What I do find problematic in Asian languages is fluid "standards" for sentence representation, namely, where the period goes. This is not an issue (for me) in Korean, probably because the language does use word division, which enforces a discipline on writers that carries beyond the identification of (agreement on) word boundaries to one's whole approach to sentence structure. Chinese sentences–the text between periods–are often by western standards two sentences, five sentences, or partial sentences. Japanese writers also seem to have more liberty in this regard than a westerner would expect. Vietnamese sentences, in earlier novels at least, end or don't end seemingly at whim. And I question if Tibetans even have the concept of "sentence."

I've been out of this field for too long so my thinking may be dated. But there may be psycholinguistic issues at play here that merit serious study.
Bill Benzon said,

September 19, 2019 @ 8:40 am

I've just been thinking about this. And I'm wondering if the problem isn't similar to the problem that adolescent and post-adolescent second language learners have with pronunciation. I don't know what the current literature says about that, but in the past I've seen it attributed to a lack of neuro-plasticity. I don't find that terribly convincing. My intuition – and it's no more than that – is that the problem is more like conscious access. For some reason conscious access to (something in) the aural-motor channel has been, if not lost, somewhat degraded.

Could the same thing be going on in the transfer of segmentation from the aural-motor channel to the visuo-orthographic?
Bathrobe said,

September 19, 2019 @ 6:51 pm

Thanks to Ross King and michaelyus for the illuminating information. We now have a context. I find colleague's comment the most enlightening yet.

I think it safe to say that all speakers of all languages have some kind of intuition of how morphemes and structural elements clump together. Morphemes don't simply string along to create sentences; there is always a level where morphemes come together into intermediate "chunks" that are below the level of the broader sentence syntax. But the specifics of this can be fuzzy and inconsistent, and depend on the nature of the language. Despite Prof Mair's problems teaching Chinese students what words are, even Chinese speakers are somehow aware that some characters "belong together", like 教授 jiàoshòu 'professor', even if their instinct, heavily influenced by the orthography, is to regard the individual morpheme as the primary unit.

Noun combinations appear to be an issue in many languages. In Korean, 여행 가방을 yeohaeng gabang-eul 'travel bag+obj' is two words, even though the object marker 을 -eul applies to the whole unit rather than just 가방 gabang 'bag'. English and German adopt totally different approaches, resulting in long German noun chains (which laymen love to make fun of) as against the tendency in English to write each word separately, resulting in the ambiguity of 'French teacher' — an ambiguity that is mostly confined to the written language since prosody in the spoken language usually makes clear what is meant: 'French teacher' vs 'Frenchteacher'.

colleague's comments on sentences are also interesting. Chinese certainly favours very short "sentences" without overt markers to indicate logical links, leading to the need to decide whether or not they should be separate sentences. This is not unknown in English either, where 'To err is human, to forgive divine' is normally regarded as a single unit (with sticklers perhaps using a semi-colon), although the omission of the copula in the second part is the clincher in deciding that it is one sentence. But I personally regard the sentence as one of the more artificial units of language. The school-book definition is that a sentence should express a "complete thought", which is about as useless a characterisation as you could get. In most East Asian languages Western-style punctuation was a very late introduction and is in some ways artificial. It is not surprising that it is not applied in the same way in all languages.
Antonio L. Banderas said,

September 20, 2019 @ 3:17 am

@Bathrobe

What prosody are you intending to show by the following distinction?
'French teacher' vs 'Frenchteacher'.
Philip Taylor said,

September 20, 2019 @ 3:40 am

I would imagine "a teacher of French (language and/or literature)" vs "a teacher of French (nationality)".
Antonio L. Banderas said,

September 20, 2019 @ 4:29 am

According to "Practical English Usage"

Most noun + noun combinations have the main stress on the first noun. However, there are quite a number of exceptions: a garden chair, a fruit pie.

The difference between noun modifiers and adjectival modifiers is sometimes shown by stress:

a 'German teacher (noun modifier: a person who teaches German)
a German 'teacher (adjective modifier: a teacher who is German)

To be sure of the stress on a particular combination, it is necessary to check in a good dictionary. Note that there are occasional British-American differences.
Chris Button said,

September 20, 2019 @ 7:06 am

I suspect this largely comes down to the subtleties of Korean intonation more than anything else. Hopefully a specialist can chime in.

@ Antonio L. Banderas

Intonation and stress aren't readily distinguished in the IPA (notions of primary versus secondary stress further confuse it). Technically "Germ" and "teach" are both stressed regardless of whether we are dealing with nationality or language. The distinction comes from which one of the two stressed syllables receives the falling tone in the statement.
Victor Mair said,

September 20, 2019 @ 10:57 am

From Bob Ramsey:

I'm not sure how much I have to add to what's been said; after all, spacing amounts to a kind of structural parsing that's difficult in any and every language, I suspect. Note that spacing is also a perennial problem in English, too, isn't it? Especially when we consider the differences between British and American orthography.

But since you ask about Korean, I will say that Korean usage seems to have peculiarities all its own. First of all, there's the important point Ross King made about the differences between North and South decisions about word spacing. The North is far less generous with spacing than the South. I think Ross also mentioned the much more liberal use of spaces in Sam Martin's Romanization; for example, Martin advocated that for maximal clarity particles should be separated from the noun to which they're attached. Korean rule-makers don't agree, of course; their instinct is to consider the noun plus particle to be a single word and to write it that way.

But a more important point (and I don't think Ross mentioned this) is that Koreans are notorious for tinkering with the orthographic rules and for changing them all too frequently for many people's taste. The result is often frustration. Few South Koreans, even the well-educated, can keep up with the rules for where spaces are supposed to be used. And there's not always consistency in how those word-spacing–and spelling–decisions are made. In typing Korean you need to be sure you're using an up-to-date word-processing program to make sure your spell-check reflects the rules that are current!

I'll just add one more note about Ross's important observation about North-South differences. What we see going on in Seoul now is a tendency to use fewer spaces in writing, that is, more solid forms than before. It may simply be an accident, but it sure seems like South Korean orthographic usage is moving toward spacing more like North Korean practice. Maybe the arbiters simply see fewer spaces as more streamlined, more efficient. But I'm only guessing. In any case, the fact remains that the orthographic words in South Korean writing are getting longer.
Jerry Packard said,

September 20, 2019 @ 7:17 pm

~flow said,

…. doesn't mean those entities are 'there'

Well spoken!
Philip Taylor said,

September 21, 2019 @ 12:25 pm

In general, there are very few places in (British) English where I might vacillate about whether or not to insert a space, but there is one example that immediately springs to mind : "insofar as". This is, I think, the canonical form, but I write it as "in so far as", completely failing to understand why "insofar" is run together but "as" is not. It is not as if "insofar" means something different to "in so far" — as far as I can see, they mean exactly the same thing — so why is it that "insofar" is normally run together but the "as" is not ?
David Marjanović said,

September 21, 2019 @ 12:53 pm

Has anyone studied whether native speakers leave larger gaps between words than between syllables within a word? Does it vary by language or type of language?

You mean actual pauses in speaking? Those only exist between larger units ("intonational phrases" or whatever – more like clauses), neither between words nor between syllables, except in reading aloud very slowly or in some kinds of ritual chanting or the like.

Why do they say 大夫 dàifu "doctor" and think "clearly that is two words, wtf 教授"?

Well, "great master" is two words; that's the literal meaning of 大夫 and the etymological meaning of dàifu, and the character 夫 doesn't tell you that it has lost its tone.

(The literal meaning of doctor is likewise "teacher". Medical doctors are so called because of their academic degree that theoretically/historically entitles them to teach in certain circumstances. But to figure this out, you need to know a bit of Latin; just looking at the spelling doesn't help.)

I think that we might approach these observations from an agnostic perspective, as it were; that is, we should not take for granted that there really 'are' phonological segments and words in speech. Sure, that is how speech is normally modeled, but even if I can demonstrate words and segments can be elegantly used to describe a given set of utterances (and one normally can), that doesn't mean those entities are 'there' as foundational, functional parts of the language-producing machinery (although I guess they probably are).

I would say they exist in the heads of speakers and the heads of listeners, but not in between – they don't exist in what the speaker actually pronounces and what the hearer actually hears. There's interpretation involved; that means that what the speaker and the hearer have in their minds doesn't always have to be exactly the same, and that, of course, is one mechanism of language change.

What's more, those parts of language—words and segments—might or might not be as easily accessible to speakers as other aspects of their language, like, say, syllables, which seem to be both fundamental and accessible to naive speakers (first-graders can be instructed to clap their hands and chant a text split into syllables (or moras, as the case may be) in unison, and they seemingly get it intuitively).

Counting syllables is generally easy (there are cases of reduction where it's not clear if a consonant is syllabic or not, or this may vary by speed of speaking or between speakers or whatever). But determining where exactly the boundaries between the syllables are is difficult in languages that are generous enough with consonant clusters, like many in Europe. For English it's so hard that it has become its own research topic.

* Is the story about Irish monks inventing word dividers true? I remember Roman monumental inscriptions having lots of center dots instead of an unbroken stream of letters. Also, aren't there word breaks in quite a few orthographies? Arabic appears to have had them from some point onward.

Irish monks invented specifically spaces between words. Other word dividers were occasionally used earlier, but manuscripts from Antiquity (in Latin, Greek, Aramaic, Hebrew, Gothic…) always lack them altogether as far as I'm aware.

Word dividers, including spaces in cuneiform, have been invented several times independently for other scripts.

Arabic writes article + noun as a single word. So does Hebrew (but not Yiddish), and so did Greek in the Cypriot syllabary.

(Greek in Linear B was also written with word dividers, but didn't have articles yet.)

A useful principle for writing Romanized Chinese or Japanese is this: what must always be said together should be written together and what can be said alone with the intended meaning should be written apart.

Ah, but sometimes a sequence of morphemes assumes a new meaning that cannot be predicted from the meanings of its parts. At what point does such a sequence become a thing that cannot be divided?

This question is also, of course, the main driver of uncertainty about spaces and hyphens in English and German. In the example above, has zu Grunde become a verb prefix (thus zugrundegehen), or is it still a metaphor composed of a preposition and a noun in the dative case (thus zu Grunde gehen), or is it a whole new lexeme (thus zugrunde gehen)? Personally, I'm for noun incorporation, but the committee that reformed the orthography disagreed as far as I know.
David Marjanović said,

September 21, 2019 @ 1:01 pm

determining where exactly the boundaries between the syllables are is difficult in languages that are generous enough with consonant clusters

Actually, drawing lines through clusters is not the main issue, though it is one; a more important one is the restrictions on which vowels can occur in what may or may not be open vs. closed syllables, and another is what allophones consonants have in what may or may not be syllable-final positions.
Jerry Packard said,

September 21, 2019 @ 5:27 pm

Well said.
Dave Cragin said,

September 21, 2019 @ 10:47 pm

When it's one's own language, hearing boundaries between syllables is likely easier. Just this evening, I was trying to help a Chinese friend say "tool". It's a one syllable word that is close to 2 syllables. She said it with 1 syllable, but with too brief of a 1 syllable. I had to teach her how to draw it out so it sounded natural. (I'm not a formal language teacher – we're just language partners/friends).
Victor Mair said,

September 22, 2019 @ 10:17 am

From Young-Key Kim-Renaud:

The spacing issue is as hard as defining what a word is in any language. When the writing system has a pretty close fit to the spoken language and shows word boundaries by spacing, as hankul does, then immediately, different parsing could become an issue. Furthermore, it is the individual speaker's decision on the matter of ongoing change or complete restructuring, called grammaticalization. For example, what used to be an NP could be relexicalized as a plain N, in which case the space that existed before disappears in the new form. Obviously there is variation and ongoing change, and although some over-enthusiastic grammarians are all of a sudden making a big fuss about spacing these days, in practice no one is considered "uneducated" or ignorant because of "wrong spacing", for this reason, I believe.

To give just one example, when you say 여러 분 yele pun (in Yale romanization), Det+N, it literally means 'various honored people', but because 여러분 has been used more often in addressing a group of people in a formal situation, the noun phrase has become just one mass noun, 'Ladies-and-Gentlemen', and is therefore written with no space within the word. At the same time, in a certain context where 'various people' needs to be expressed, the NP (with a space in between) still remains with the original meaning. This also occurs more often when one of the units is monosyllabic, and since the two parts are pronounced so much closer together, especially in fast speech, they are easier to merge. So parsing, speed, and frequent usage in a particular context all contribute to grammaticalization and result in conflicting spacing analyses.
Andrew Usher said,

September 22, 2019 @ 12:31 pm

Dave Cragin:
Presumably the Chinese girl said something that sounded like a rhyme for 'full'. This muddling of lax and tense vowels is a classic foreigners' error; few non-Germanic languages have the same kind of distinction. Teaching them to say 2 syllables may be the safer course in this case (tool = too + syllabic l), but the vowels still must be mastered to sound even close to native.

Re David Marjanovic:
The repeated invention of word dividers shows that word boundaries are in fact salient in one's own language, and that at least for most languages, and most cases, there is not much doubt about where they should go (to native speakers).

You didn't reveal which of the 'zugrundegehen' options you'd pick, but there's one case where it seems English has the edge: fusion of that sort hasn't been productive for a long time, and the equivalent of 'zu Grunde gehen' would be the only possibility (as with our actual phrases 'to go to …'). It's really German that is anomalous with its 'run-together' writing influencing what speakers actually feel to be the word boundaries.

k_over_hbarc at yahoo.com
John Swindle said,

September 22, 2019 @ 5:57 pm

German "zugrundegehen" (zu Grunde gehen, zugrunde gehen) and English "going to ground" are a nice example of false friends.
Victor Mair said,

September 22, 2019 @ 6:09 pm

From Young-Key Kim-Renaud:

One thing I want to add is that the occasional changes on spacing and other issues in orthography demonstrated by the ROK national language institute (국립국어연구원) are not as whimsical or arbitrary as some might think. They constantly conduct sociolinguistic research, especially on variation and change, and decide whether a change has been more or less complete. When this is determined to be so (of course, this is a difficult decision to make, as statistics do not always give you a definite answer), a new orthographic guideline is submitted. This is an important way to keep the pretty good linguistic fit that Korean writing has been able to maintain so far, but as you know, this is a lofty goal as language is so dynamic! Just see how English orthography has become so difficult because it has remained static for so long.
Bathrobe said,

September 22, 2019 @ 10:50 pm

The repeated invention of word dividers shows that word boundaries are in fact salient in one's own language, and that at least for most languages, and most cases, there is not much doubt about where they should go (to native speakers).

What is your basis for saying this? Established orthography is often the foundation for native speaker judgements, not untutored native-speaker intuition. If you look at the way that many Japanese 'naturally' use spaces to divide up words in romanisation, you will find a lot of diversity. Your 'not much doubt' simply doesn't hold water. Dividing speech into chunks is natural, but how people actually divide it may be as culturally determined as it is 'natural'.
Andrew Usher said,

September 23, 2019 @ 6:21 am

It may be; but how would you show it? In a normal type of language speakers do have an intuitive notion of what a word is; though, yes, they will not agree perfectly on where to divide them. But I'm pretty sure they'd agree a lot more than chance, established orthography or no.

And I know that I have a sense of word division that's not just based on orthography; this is hard to show for English as there is so little ambiguity (except the contractions we all recognise as such), but I will consistently write 'can not', 'none the less', 'not withstanding', instead of the prescribed fused forms that feel inappropriate. Honestly I'd also like to split 'forever', maybe a better parallel to 'zugrunde' as it's clearly become a single lexeme while still pronounced as two words – e.g. one can still say [ˌfoɹ ˈʔɛvɹ̩] (non-rhotic [ˌfɔː ˈʔɛvə]) as would be impossible for a single word.

John Swindle:
I did think of 'go to ground' (which has a different, specialised meaning) but my ellipses were intended to cover things like 'go to ruin' and 'go to pieces' closer in meaning to the German. They and variants with the same structure would never be thought of as inappropriately split.
Bathrobe said,

September 23, 2019 @ 7:35 am

I will consistently write 'can not', 'none the less', 'not withstanding'

How do you know that your practice has not been conditioned by your familiarity with the written language? The opposite also (or perhaps you'd prefer 'all so') applies, where people write 'underway' instead of 'under way'.
Philip Taylor said,

September 23, 2019 @ 8:16 am

Some of us write "under weigh" (or "under-weigh"), believing (probably incorrectly) that that spelling better reflects the true etymology of the expression.
B.Ma said,

September 24, 2019 @ 2:41 am

At my grandfather's funeral Mass I had to give a bible reading in Cantonese. Apparently one is meant to read these slowly and with lots of pauses between phrases or words (詞), even though there are no spaces in the written text. Knowing where to insert a pause is a bit of an art. I had not done this before but afterwards I was congratulated for inserting pauses in all the correct places…
Victor Mair said,

September 24, 2019 @ 5:30 am

From Irene Do:

Spacing words is not really relevant to the way we speak.

Sometimes if you mis-space the words in writing, it may be a bit hard to read. Or, sometimes, the seemingly awkward spacing would actually be the right way to write it.

It is indeed quite difficult to fully know the spacing rules in Korean! They often have quiz-contest tv programs for it because it's so difficult!
Andrew Usher said,

September 24, 2019 @ 7:01 am

It seems that the Asian languages have particular trouble with this. Unsurprisingly, that is connected to the fact that their traditional script does not (and can not without added spacing) divide words in the natural way, and not due to the languages themselves. At least Japanese I think I know to have word boundaries as salient as any other.

Bathrobe, Philip Taylor:
'Under way' is the best spelling, although the expression's not being completely transparent is why some people are confused. It is under + way, though, despite not using the most common sense of either.

No, I'd not write 'all so' for also. I do know when something is a single word even if it was once otherwise. Criteria that I can articulate for it: that the separate words make no sense (or clearly mean something different), and that the questionable word has just one stressable syllable left. In the above post I used both 'may be' and 'maybe'; both differences apply there, and they are separate lexical items. The same for 'all right' and 'alright', as any objective person would have to recognise, even though the first spelling is prescribed from willfully ignorant conservatism. Again, 'under way' is of the first type: intelligible as separate, and double-stressed.
John Swindle said,

September 24, 2019 @ 3:11 pm

Has there already been discussion of English "that's as may be" versus "that's as maybe"? Americans write the former, Britons the latter, I'm told, with the same meaning. For me as an American the derivation of the first is clear (think "be that as it may") and the second is odd.
John Swindle said,

September 24, 2019 @ 4:06 pm

@Andrew Usher: Indeed.
Andrew Usher said,

September 25, 2019 @ 7:04 am

What are you agreeing with? I made several points (as usual), so a bare 'indeed' is not really informative.

I looked and, yes, "that's as maybe" does seem to occur in British with a frequency too great to be a typo. I am not sure it's actually standard there, though; I can't imagine why it should be.
Philip Anderson said,

September 25, 2019 @ 7:23 am

Did the Irish monks introduce spaces to write their own language (Irish) or the Latin they had learned? Similarly with word dividers in cuneiform, often used to write dead Sumerian. It seems to me that native speakers have less problem with missing spaces, whereas it’s more of a problem for learners, just as learners often struggle to extract recognisable “words” from a continuous stream of speech.
Alyssa said,

September 25, 2019 @ 10:41 am

I don't think English word boundaries are really all that much easier for native speakers to get "correct" all the time (see the discussion above about "underway"); I think English speakers are just more likely to accept that there are multiple valid ways of doing it, probably a side effect of lacking an accepted authority on the language. Personally, I had a lot of confusion growing up about "all right" vs "allright" vs "alright", and "a lot" vs "alot".

Also, I think the reason English is so resistant to merging compound words into one is likely that our orthography is so dependent on position in the word. When you turn "none the less" into "nonetheless", you've now got an awkward silent 'e' in the middle of your word, making it look like it should have an extra syllable. You have to mentally put the spaces back in, in order to pronounce it properly!
John Swindle said,

September 25, 2019 @ 3:33 pm

@Andrew Usher: Sorry. I was agreeing that "going to ground" in English has a specialized meaning rendering it less useful for discussion. "That's as maybe" was something I found in a mystery novel by British writer Elly Griffiths and then looked up also. I suppose it's a reanalysis to include the common word "maybe," regardless of meaning.
Andrew Usher said,

September 25, 2019 @ 5:56 pm

Well, those words on which native speakers might get it wrong are edge cases. Most of us get it right, without need of prescriptive orthography, almost all the time.

Perhaps "that's as maybe" arose because the stress actually shifted for some Brits? While I could only stress 'be' in that phrase, putting it on 'may' – which doesn't seem out of the question – would point toward the misinterpretation.
John Swindle said,

September 26, 2019 @ 5:16 am

There are more examples in English: health care/healthcare, every day/everyday, meeting house/meetinghouse, and all the words I can't remember whether to hyphenate. Be that as it may, the Korean challenge, the greater Chinese challenge, and the lesser English challenge are not just challenges to writers but also challenges to the sacred status of the word, as others have mentioned.
Ellen Kozisek said,

September 26, 2019 @ 10:41 am

Frankly, Andrew Usher, I rarely have opportunity to get spacing right without the help of prescriptive orthography, since how I write is so totally influenced by prescriptive orthography. (In English, mostly, sometimes Spanish.)
Philip Anderson said,

September 26, 2019 @ 3:35 pm

@Andrew Usher
“That’s as maybe” is standard British English, with the stress on “may” as for any other use of the adverb.
//www.lexico.com/en/definition/maybe
Andrew Usher said,

September 26, 2019 @ 6:18 pm

Well, thanks, seems you're right. I'm shocked there's a transatlantic difference like that that I've never noticed, but there are probably more.

The American version clearly must be the older, so do we have a new kind of eggcorn (reanalysis) here?
