Alphabetical storage, ordering, and retrieval


We just had a good discussion about a Sinitic language written with an alphabet:

"The look, feel, and sound of Dungan language" (10/15/20)

Additional earlier posts about writing Sinitic languages with Romanization are listed under "Selected readings" below.

One of the major advantages of the alphabet over a morphosyllabic / logographic ideopicto-phonetic writing system like the Sinographic script is that it is very easy to order and find / retrieve the entire lexicon with the former, whereas carrying out these tasks with the latter is toilsome at best and torturesome at worst.  See:

Victor H. Mair, "The Need for an Alphabetically Arranged General Usage Dictionary of Mandarin Chinese: A Review Article of Some Recent Dictionaries and Current Lexicographical Projects", Sino-Platonic Papers, 1 (February, 1986), pp. 1-31.

Now there is a new book that puts the ordering and retrieval capabilities of the alphabet in a historical, lexicographical, and epistemological context:

"'A Place for Everything' Review: Ordering the Universe", by Katherine A. Powers, WSJ (10/16/20)

From the alphabetical index to the filing cabinet, the informational tools we take for granted once caused a stir.

The true subject of this fascinating though relentlessly detailed book is the history of information retrieval, chiefly in Europe and America, up to the age of the computer.

According to Powers, Ms. Flanders points out:

…we only have evidence of a single invention of phonetic writing, this by traders and mercenaries in the Middle East sometime in the second or third millenium B.C. These travelers, sharing no common tongue, communicated in a form of creole whose words were then disassembled into phonetic characters which could be written as sounds required. Like money, the alphabet was a radical abstraction, for letters have no value or meaning in themselves, but stand for something that does.

 According to Ms. Flanders, the alphabet showed up in Europe as early as the 11th century B.C. The Latin word “alphabeto” came into use around the year A.D. 200, with “alphabet” common by the fourth century. Still, the origin of the set ordering of its letters, the aspect which makes it so useful for the organization of information, is lost to history, and it wasn’t until the 13th century that it was widely used for this purpose. As such, it often met with hostility or perplexity: “Alphabetical order,” Ms. Flanders observes, “looked like resistance, even rebellion, against the order of divine creation. Or possibly ignorance: an author who placed angeli, angels, before deus, God, simply because A comes before D, was an author who had failed to comprehend the order of the universe.”

As it happens, the arbitrariness, as it might be called, of arranging subjects alphabetically still bothered some people well into the 19th century. Coleridge, unsurprisingly, was one naysayer, railing against the recently-instituted, alphabetically arranged Encyclopaedia Britannica. The work, he wrote, was a “huge unconnected miscellany . . . in an arrangement determined by the accident of initial letters.” Meanwhile, across the Atlantic, Yale College did not abandon the practice of listing its students by their families’ social status until 1886, when it finally adopted an alphabetical arrangement.

Ms. Flanders proceeds to trace the gradual acceptance, and eventually dominion, of the alphabet over the storage, organization, and retrieval of information and knowledge.

In her preface, Ms. Flanders writes, “In all the millennia of reading and writing, only one major sorting system has evolved that requires no previous knowledge from the searcher: alphabetical order.” Today, however, the dominion of alphabetical order is waning: Computers and search engines with capabilities once granted only to God have usurped it. There is something enfeebling about this: Information is stored in computer code as indecipherable to most of us as ancient runes, while in the world of books, many libraries’ off-site storage facilities identify and retrieve books by barcode, a system as inscrutable without an electronic instrument as the genome. The upshot is that centuries of effort to render knowledge universally accessible have culminated in a system that stores information in a form which no human eye, unaided by machine, can decipher.

Westerners introduced Romanization of Sinitic languages to China starting around 1600.  They soon adopted it for the ordering of entries in Sinitic-foreign language dictionaries for themselves, but this idea did not catch on among Chinese until the first half of the twentieth century, and I know from personal experience among Chinese students and scholar friends that there was strong resistance to the alphabetical ordering of dictionaries until the second half of the twentieth century.  Among diehards, the resistance to the alphabetical ordering of dictionaries, indices, and so forth continues to the present day at the beginning of the third decade of the twenty-first century. 

Before the advent of the alphabet in China, it was common to organize words in Chinese dictionaries and other reference works according to concepts ("heaven", "earth", etc.), rhyme categories, and so forth.  Does the seemingly unavoidable triumph of alphabetical principles for inputting, dictionary ordering, indices, etc. indicate that the rise of digraphia will continue unabated?

 

Selected readings

etc. (this is just a tiny sampling of the scores of LL posts relevant to Chinese character inputting and ordering of words in Chinese dictionaries)

[h.t. June Teufel Dreyer]



29 Comments

  1. Philip Taylor said,

    October 18, 2020 @ 7:58 am

    I assume that if I were able to access the full text of the article, all would become clear, but from the prose above I am completely unable to determine what rôle Ms Flanders plays in the narrative, and why she is consistently accorded the courtesy title of "Ms" while Ms (or possibly Dr) Powers is referred to simply by surname.

    There is also what I perceive as a considerable ambiguity at "According to Powers, Ms. Flanders points out" — did Ms Flanders point out that Powers wrote what follows, or did Powers state that Ms Flanders wrote what follows ?

  2. Victor Mair said,

    October 18, 2020 @ 8:09 am

    Powers is the author of the WSJ review.

    She refers to the author of the book she is reviewing as Ms. Flanders. That's American (New York) newspaperese.

    I added "According to Powers" at the last minute (just before posting) to avoid ambiguity.

    Please discuss the substance of the post.

  3. Michael Watts said,

    October 18, 2020 @ 12:30 pm

    we only have evidence of a single invention of phonetic writing, this by traders and mercenaries in the Middle East sometime in the second or third millenium B.C. These travelers, sharing no common tongue, communicated in a form of creole whose words were then disassembled into phonetic characters which could be written as sounds required.

    This doesn't seem to accord with my impression of the development of writing?

    I thought there was a not-too-well-supported theory that the Phoenician script developed out of Egyptian hieroglyphics. Those hieroglyphics certainly don't originate as a medium of communication between travelers sharing no common tongue.

    And in the alternative event that Phoenician script is unrelated to Egyptian hieroglyphics, the claim that phonetic writing was only invented once falls on its face, since hieroglyphics are used that way in what appears to be a straightforward indigenous development.

  4. John from Cincinnati said,

    October 18, 2020 @ 12:58 pm

    Thank you for this interesting post. There's one claim that perhaps oversimplifies things.

    one major sorting system has evolved that requires no previous knowledge from the searcher: alphabetical order.

    Some time ago, a friend of mine was studying library science and had to recognize a collection of special cases for alphabetization of names and titles. Ordinary mortals scanning for a particular author or title on a shelf in a library or a bookstore, or searching through library index cards, would have needed (and perhaps still need) expert assistance in these instances.

    I don't have the friend's old study list, but I did find
    G100.pdf at the Library of Congress web site (sorry, couldn't get the comment software to accept a URL). It discusses a number of different past and current conventions, and is itself only a subset of the full Library of Congress Filing Rules (LCFR).

    * Letter by letter or word by word. That is, does "A Wrinkle in Time" come before or after Aardvark?
    * Names with a prefix. Are Mc, Mac, and M' all treated as though spelled Mac? And do they all belong before any other M words, or in among them?
    * What about prefixes De, Del, and D'? Does it matter if they are followed by a space? So, for example, does De Lange come before or after DeAndrea?
    * Names with a suffix, such as III, 10th, Jr. The LOC puts them in that order, but would you have expected that?
    * Numbers expressed as words. The LOC files seventy-six trombones under "S", not under "7".
    * Numbers expressed as digits (e.g. 1243) or other notation (e.g. XXVII) filed according to their numeric value and arranged before letters of the alphabet. Except that before the year 1981 numerals in titles were arranged as if spelled out, and in the language of the rest of the title.
    * and so forth

    My point is that "no previous knowledge" has more than a few exceptions.
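The first item in the list above, letter-by-letter versus word-by-word filing, can be sketched in a few lines of Python. This is only an illustration, not the Library of Congress rules: the function names and the sample titles other than the two from the comment are assumptions, and real filing rules often also skip an initial article such as "A", which this sketch does not do.

```python
import re

def letter_by_letter_key(title: str) -> str:
    # Hypothetical helper: drop spaces and punctuation, then compare
    # the remaining letters and digits as one continuous string.
    return re.sub(r"[^a-z0-9]", "", title.lower())

def word_by_word_key(title: str) -> list[str]:
    # Hypothetical helper: compare one word at a time, so a shorter
    # first word files ahead of a longer one that starts the same way.
    return re.findall(r"[a-z0-9]+", title.lower())

# "New York" and "Newark" are extra sample titles, not from the comment.
titles = ["Aardvark", "A Wrinkle in Time", "New York", "Newark"]

letter_order = sorted(titles, key=letter_by_letter_key)
word_order = sorted(titles, key=word_by_word_key)

print(letter_order)
print(word_order)
```

Under these assumptions the two conventions disagree on both pairs: letter by letter puts Aardvark before "A Wrinkle in Time" and Newark before New York, while word by word reverses both.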

  5. David Marjanović said,

    October 18, 2020 @ 1:10 pm

    …we only have evidence of a single invention of phonetic writing, this by traders and mercenaries in the Middle East sometime in the second or third millenium B.C. These travelers, sharing no common tongue, communicated in a form of creole whose words were then disassembled into phonetic characters which could be written as sounds required.

    I'm not aware of any evidence for such a… pidgin, I would guess (a creole forms when a pidgin becomes the native language of children). The oldest alphabetic inscriptions, short as they are, are all in grammatical Northwest Semitic. The places where they are found, mines in the Sinai Peninsula in an Egyptian context, argue for prisoners of war used as mineworkers rather than for traders or mercenaries – though traders and possibly mercenaries may well have spread it later.

    What the alphabet is, originally, is a straightforward reinterpretation of a small number of hieroglyphs by the initial consonants of the Semitic, rather than Egyptian, words for the objects they depict.

    The abstraction had happened in Egyptian long before. Some hieroglyphs stand for one consonant, others for a sequence of two or three (with any vowels in between); they were sometimes used as logograms for the objects or actions they depicted, but that required a diacritic. One-consonant signs were available for every contemporary Egyptian consonant something like a thousand years before the alphabet was invented.

    Not spelling the vowels out is actually the last remnant of the logographic principle: in the Semitic languages and the fairly closely related Egyptian, the consonant sequence is more or less the lexical item, and the vowels belong to the grammar rather than the vocabulary (an oversimplification, but not by much).

    The Latin word “alphabeto” came into use around the year A.D. 200, with “alphabet” common by the fourth century.

    *throwing up hands in despair* Whoever wrote this has no idea of Latin.

  6. cameron said,

    October 18, 2020 @ 2:14 pm

    Ivan Illich wrote a book (with Barry Sanders) in 1989 about the technology of alphabetization and its place in intellectual history. It's called ABC: The Alphabetization of the Popular Mind. I read it in the early 90s and remember it as being interesting, but rather puzzling.

  7. maidhc said,

    October 18, 2020 @ 8:06 pm

    I am currently reading Daidalos and the Origins of Greek Art by Sarah Morris. It was published in 1992, so perhaps new information has come to light since then. But there is much discussion about trade in the eastern Mediterranean during the Bronze Age. According to the author, rather than the creation of a pidgin, language difficulties were addressed by the establishment of small colonies of interpreters–Greeks in Syria, Phoenicians in Greece and so on. These are supported by archaeological evidence.

    A wax-covered writing tablet was found in a shipwreck dated to about the late 14th century BC, suggesting that Phoenicians brought their alphabet for their own use, and Greeks may have known about it, but the first known alphabetic writing in Greek was not until the 8th century.

  8. Chas Belov said,

    October 18, 2020 @ 10:39 pm

    In my 1989 visit to Singapore, I purchased Hanyu Cidian, a Chinese-only dictionary arranged alphabetically by Pinyin. While I can't read Chinese, I used it to verify the existence of characters that weren't in my Chinese-English dictionary, as it was much more complete.

    @John from Cincinnati: Then there are the digraphs from other languages such as LL in Spanish or Welsh and IJ in Dutch which are, I believe, alphabetized as if they were single letters distinct from L, I, or J. Wikipedia has an article about this sort (pun not intended) of thing.

  9. Michael Watts said,

    October 18, 2020 @ 11:44 pm

    a Chinese-only dictionary arranged alphabetically by Pinyin. While I can't read Chinese, I used it to verify the existence of characters that weren't in my Chinese-English dictionary, as it was much more complete.

    I don't understand this. Do you mean that you would have a candidate character in mind ("目 on the left, and 女 on the right"), and check the dictionary to see whether it existed or not? That would work fine in a dictionary organized in the traditional way, but it would appear to be impossible in a dictionary organized in alphabetical Pinyin order. How would you know where to look?

  10. Bob Ladd said,

    October 19, 2020 @ 1:30 am

    John from Cincinnati and Chas Belov –
    There's also the question of whether you ignore diacritics (e.g. French, German) or treat letter-with-diacritic as separate from the same letter without diacritic (Swedish, Romanian).
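The two treatments of diacritics can be contrasted with a small Python sketch. It is a simplification under stated assumptions: the helper names and the word list are made up for illustration, the 29-letter string stands in for the Swedish alphabet, and a real application would more likely rely on a locale-aware collation library than on hand-built keys.

```python
import unicodedata

def strip_diacritics(word: str) -> str:
    # Decompose each character, then drop the combining marks,
    # so that "ä" is compared as if it were plain "a".
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Assumed stand-in for the Swedish alphabet: å, ä, ö follow z.
SWEDISH = "abcdefghijklmnopqrstuvwxyzåäö"

def separate_letter_key(word: str) -> list[int]:
    # Treat each letter-with-diacritic as its own letter with its own rank.
    composed = unicodedata.normalize("NFC", word.lower())
    return [SWEDISH.index(ch) for ch in composed]

words = ["zebra", "äpple", "apa", "öl", "orm"]

ignoring = sorted(words, key=lambda w: strip_diacritics(w).lower())
separate = sorted(words, key=separate_letter_key)

print(ignoring)
print(separate)
```

With diacritics ignored, "äpple" files among the a-words; with them treated as separate letters, it moves to the end of the list, after "zebra".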

  11. Twill said,

    October 19, 2020 @ 5:39 am

    It's very odd to extol alphabetical order itself when it is simply a very arbitrary ordering of our graphs. Hanzi collations are quite logical, generally counting the number of strokes of the first written radical and of the rest. The great difficulty, as always, lies in the graphs themselves, as it is simply impossible (even in principle) to assign, much less memorize, a fixed order for hanzi, whereas every child learns in preschool how to collate Latin letters.

  12. Philip Taylor said,

    October 19, 2020 @ 6:19 am

    Indeed. I think that Twill has hit the nail on the head. There is no logical reason whatsoever for placing "b" between "a" and "c" — it is simply a long-established convention.

  13. Victor Mair said,

    October 19, 2020 @ 6:54 am

    Duh….

    What about the ordering of numbers? Any logic there? Everybody knows it, accepts it, and uses it without even thinking about it. 10 numbers.

  14. Rodger C said,

    October 19, 2020 @ 6:55 am

    While we're at it:
    …we only have evidence of a single invention of phonetic writing, this by traders and mercenaries in the Middle East sometime in the second or third millenium [sic] B.C.

    Maya consonant-vowel signs? Well, I assume all these oddities are Power's and not Flanders's.

  15. Philip Taylor said,

    October 19, 2020 @ 7:04 am

    The ordering of the natural numbers (qua) numbers is 100% logical, and as in most (but probably not all) languages, each number is implicitly associated with one or more names ("more" in cases such as "zero" / "nought"), the ordering of those names therefore follows.

    But to be picky, there are not ten numbers — there are ten digits in the decimal system, 16 in the hexadecimal system, and so on, but there are an infinity of numbers.

  16. Rodger C said,

    October 19, 2020 @ 7:13 am

    *Powers's's

  17. Bob Ladd said,

    October 19, 2020 @ 8:12 am

    Exactly – what's arbitrary about the numbers (or digits) is their NAMES, not their order. (And in some simple number systems, even the names are not entirely arbitrary, e.g. the word for 5 being related to the word for 'hand', or the words for 6-9 being transparently 1+5, 2+5, etc.). With the letters of the alphabet, it's the opposite – the acrophonic names of letters in Greek, Hebrew, etc. are in origin not arbitrary, but their order is, it appears, an arbitrary convention that serves the practical reasons discussed in the O/P.

  18. Kristian said,

    October 19, 2020 @ 8:59 am

    It's precisely because there isn't any "logical" reason for the alphabet to be in any particular order that it is a brilliant invention to give it an order (from the point of view of linguistics, it might be nice if the letters were arranged by type, but it doesn't make any practical difference).

    The order of the alphabet is one of the first things children learn, so when I studied Russian, I was "surprised" to notice that I could learn the Cyrillic alphabet without learning the order, which is only important (for me) for looking up words in a dictionary. In fact I still don't know the order of the Cyrillic alphabet with 100% accuracy.

    Like money, the alphabet was a radical abstraction, for letters have no value or meaning in themselves, but stand for something that does.

    Off topic for this forum, but this seems like an odd characterization of how money works.

  19. Antonio L. Banderas said,

    October 19, 2020 @ 9:12 am

    @VictorMair

    Ad hoc, yes, but thoughts on it?

    Order of "numbers" according to the number of corners in their graphs
    https://i.imgur.com/KrHrAll.png

  20. Michael Watts said,

    October 19, 2020 @ 10:25 am

    it is simply impossible (even in principle) to assign, much less memorize, a fixed order for hanzi, whereas every child learns in preschool how to collate Latin letters

    What a ridiculous thing to say. Not only is it possible in principle to assign a fixed order to hanzi, doing so is the purpose of the Unicode project. That ordering is what allows you to communicate hanzi from one computer to another.

    And while ordering the characters is not a goal of traditional Chinese dictionaries in the same way that it is a goal of Unicode, it was necessary; any such dictionary uses, and therefore assigns, an ordering.
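The point that an encoding implies a fixed order can be seen with a minimal Python sketch, which sorts a few characters by their Unicode code points. The sample characters are chosen for illustration only, and code point order is an encoding order; it is not necessarily the order a given dictionary or collation standard would use.

```python
# Python compares strings by code point, so a plain sort
# follows the order in which Unicode encodes the characters.
chars = ["女", "目", "一", "人"]

by_code_point = sorted(chars)

for ch in by_code_point:
    # Print each character with its code point in U+XXXX form.
    print(ch, f"U+{ord(ch):04X}")
```

The order is fixed and unambiguous for a machine, which is a separate matter from whether a person could keep it in mind.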

  21. Victor Mair said,

    October 19, 2020 @ 11:30 am

    The question is whether you can keep the order readily in mind.

  22. Michael Watts said,

    October 19, 2020 @ 1:14 pm

    No, when you mention "assigning a fixed order" separately from "memorizing [the same] fixed order" — within the same sentence! — it has to be assumed that you meant to distinguish the two concepts.

    I don't question that it is essentially impossible to keep the order in mind.

  23. Twill said,

    October 19, 2020 @ 6:49 pm

    @Michael Watts I was particularly referring to the open-ended nature of hanzi, which at least means it is impossible to comprehensively collate graphs. By any traditional system that would also rule out any possibility of assigning a fixed position to any graph, but you are of course right that if you eschew those principles it is perfectly possible to come up with an (incomplete and unnavigable in itself) fixed order. Even so, I would submit that the nightmare that is Han unification means that you don't have a system that could unambiguously collate graphs in Unicode anyway.

    It's not a given that "alphabetic" orders have to be an arbitrary, "e is between d and f because that's where it's always been" affair. Japanese syllabaries, e.g., can either be ordered according to a structured table of onset + nucleus pairs, or according to a pretty poem. There are really a lot of things we could find fault with about the Latin or English alphabet (that so many letters rhyme that everyone who has communicated them has invented an alternative system of readings to avoid ambiguities is one).

  24. Chris Button said,

    October 20, 2020 @ 11:51 am

    Given that there are 30-odd distinct strokes used when writing Chinese characters, I've always wondered why more recent dictionaries weren't arranged according to those strokes on the basis of the standardized script (I'm not talking about existing systems with a handful of abstract strokes). In that sense it wouldn't be so different from arranging things according to the letters (broadly strokes) of an alphabet, albeit a little more time consuming. Minor regional differences in stroke order would really just be akin to minor differences in spelling conventions with alphabets.

  25. rozele said,

    October 20, 2020 @ 5:41 pm

    >> What about the ordering of numbers? Any logic there?

    this is making me think about the various uses of letter-signs to indicate numbers, in various writing systems: latin alphabet [MCIX = 1109]; jewish abjad (hebrew, aramaic, &c) / alphabet (yiddish, judezmo, &c) [לו=36]; etc

    is there evidence for when that usage begins, in relation to the fixing of a canonical order of graphs? or thinking on whether there's a relationship there?

  26. rozele said,

    October 20, 2020 @ 6:23 pm

    also:

    several folks have pointed out problems with flanders' linguistics.

    i can add at least one regular old untruth, after looking at the 1813, 1818, and 1877 editions of the Yale College "Catalogue" (scanned at https://elischolar.library.yale.edu/yale_catalogue/), which was the school's standard official list of students and faculty until 1940.

    powers writes, presumably on flanders' authority:

    >>Yale College did not abandon the practice of listing its students by their families’ social status until 1886, when it finally adopted an alphabetical arrangement.

    i'd hoped seeing how exactly they did that ordering would give me an interesting glimpse of how exactly Yale understood and explicitly described social hierarchy in the u.s. but alas, it's just not so. students' names are alphabetical all the way back to the first published edition. (faculty aren't, but that's not the claim here.)

    makes me rather skeptical of anything in the book, i gotta say.

  27. ktschwarz said,

    October 20, 2020 @ 8:51 pm

    rozele, good catch on the Yale catalog. That appears to be the book reviewer's oversimplification of a couple of sentences in Flanders' book. In the Amazon preview, I see: (bold added)

    [chap. 9] "… the earliest surviving lists of students at Harvard and Yale Colleges show them ranked not according to their own merits, by examination results or by their conduct, but by their families' social status. It was not until 1886 that Yale began to list graduating students in alphabetical order."

    (This is footnoted to a secondary source.) Also, the bit about "the Latin word 'alphabeto’" is entirely the book reviewer's error: the book itself says

    It was around 200 CE that the theologian Hippolytus of Rome wrote the phrase "ex graecorum alphabeto," meaning "from the Greek alphabet"—the first surviving use of the word. By the fourth century St. Jerome was using the word as though it were* well known: his aim was "alphabet hebraicum discerem," "to learn the Hebrew alphabet."

    So you can probably get a fairer idea of the book by downloading the preview from Amazon, or reading somebody else's review.

    *Can't resist: "were" is a hypercorrection, since that's a factual inference, not a counterfactual.

  28. Chas Belov said,

    October 20, 2020 @ 10:58 pm

    @Michael Watts

    I don't understand this. Do you mean that you would have a candidate character in mind ("目 on the left, and 女 on the right"), and check the dictionary to see whether it existed or not? That would work fine in a dictionary organized in the traditional way, but it would appear to be impossible in a dictionary organized in alphabetical Pinyin order. How would you know where to look?

    Can't seem to find it but IIRC, there is an index, or rather, multiple indexes in the back. Radical index pointing to a radical-stroke index. Those are the ones I used. I think there was also a four-corner index that I never grokked.

  29. ktschwarz said,

    October 21, 2020 @ 4:43 am

    @Michael Watts, David Marjanović: since I'm looking at the Amazon preview, Flanders' book does say that phonetic writing was derived from Egyptian hieroglyphics, as David described; the book review left out that connection. The reference to "traders and mercenaries" comes from a 2005 publication on rock inscriptions along what was once a major trade route in Egypt: Darnell et al., "Two Early Alphabetic Inscriptions from the Wadi el-Ḥôl: New Evidence for the Origin of the Alphabet from the Western Desert of Egypt" (JSTOR). If I understand correctly, those authors argue that these inscriptions are closer to the origin of the alphabet than the ones in the Sinai mines, because the army would have employed Semitic-speaking scribes who had the opportunity to learn hieroglyphics and adapt them for their own language. I don't know where Flanders is getting "creole".

    @Rodger C, Flanders briefly mentions Maya glyphs, only to dismiss them as insufficiently phonetic.

    Anyway, this book is not about the history of writing or the alphabet (there are already enough popular books on those topics), it's about the history of indexing. Going from the concept of the alphabet to the concept of an alphabetically ordered catalog wasn't an obvious step.
