Digitizing specialized language dictionaries

[The following is a guest post by David Dettmann.  The "Schwarz Uyghur dictionary" to which he refers in the third paragraph is this:  Henry G. Schwarz, An Uyghur-English dictionary (Bellingham, Washington:  Center for East Asian Studies, Western Washington University, 1992).]

It is a bit of a nerdy obsession of mine to customize my computers to comfortably use languages that I've studied.

About 10 years ago, I became relatively proficient with optical character recognition (OCR) software and scanner hardware. Any time I found an essential dictionary for one of the languages I studied, I converted it to a Unicode OCR scan in PDF format (i.e., I converted images of pages to text). I later used that data to create dictionary content files that would work together with the Mac OS dictionary application. I went through this process with several dictionaries that I found essential while I studied Kazakh, Uzbek, and Uyghur.
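As a rough illustration of the second step (turning OCR output into a dictionary content file), here is a minimal Python sketch. It is not the author's actual workflow. It assumes the OCR text has already been cleaned up into one entry per line, with a tab between headword and definition, and that the target is the XML source format used by Apple's Dictionary Development Kit (the `d:dictionary`, `d:entry`, and `d:index` elements). The function names and the two sample entries are hypothetical and are not taken from any of the dictionaries mentioned.

```python
from xml.sax.saxutils import escape, quoteattr


def parse_entries(ocr_text):
    """Split cleaned-up OCR text into (headword, definition) pairs.

    Assumption: one entry per line, headword and definition separated by a tab.
    Lines without a tab (page headers, stray OCR noise) are skipped.
    """
    entries = []
    for line in ocr_text.splitlines():
        line = line.strip()
        if not line or "\t" not in line:
            continue
        headword, definition = line.split("\t", 1)
        entries.append((headword.strip(), definition.strip()))
    return entries


def to_dictionary_xml(entries):
    """Render entries as XML in the style of Apple's Dictionary Development Kit."""
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<d:dictionary xmlns="http://www.w3.org/1999/xhtml" '
        'xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">',
    ]
    for n, (headword, definition) in enumerate(entries, start=1):
        title = quoteattr(headword)  # escapes the value and adds the quotes
        parts.append(f'<d:entry id="entry_{n}" d:title={title}>')
        parts.append(f'  <d:index d:value={title}/>')  # the string lookups match on
        parts.append(f'  <h1>{escape(headword)}</h1>')
        parts.append(f'  <p>{escape(definition)}</p>')
        parts.append('</d:entry>')
    parts.append('</d:dictionary>')
    return "\n".join(parts)


# Illustrative sample input only.
sample = "kitab\tbook\n\nsu\twater\nstray page header"
print(to_dictionary_xml(parse_entries(sample)))
```

Because lookups go through the index values rather than the printed page order, a file built this way sidesteps whatever alphabetical order the paper dictionary uses. The resulting XML would still need to be compiled with Apple's own tools before the dictionary application could load it.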

This process was particularly useful for working with the Schwarz Uyghur dictionary. I could not get used to the alphabetical order that he favored (which was different from both typical Latin order and Uyghur Arabic script order). As a result, any lookup would just take forever. That said, the formatting of each page was quite pleasant, and there were some nice illustrations of plants used in traditional Uyghur medicine as well as handy keys at the bottom of each page to explain abbreviations.

"Frequency illusion" in the OED

The latest batch of updates to the online edition of the Oxford English Dictionary includes a term that originated right here on Language Log, in a 2005 post by Arnold Zwicky. The term is frequency illusion, first attested in Arnold's classic post, "Just Between Dr. Language and I." Here is the OED treatment, an addition to the main entry for frequency:

frequency illusion n. a quirk of perception whereby a phenomenon to which one is newly alert suddenly seems ubiquitous.
Also called the Baader-Meinhof phenomenon (see Baader-Meinhof phenomenon at BAADER-MEINHOF n. 2).
2005   A. ZWICKY Lang. Log 7 Aug. in (blog, Internet Archive Wayback Machine 10 Sept. 2005)    Another selective attention the Frequency Illusion: once you've noticed a phenomenon, you think it happens a whole lot, even ‘all the time’.
2018   R. J. HILTON in J. Marques & S. Dhiman Engaged Leadership (e-book, accessed 25 June 2018) xiv. 244   The frequency illusion occurs when you buy a new car, and suddenly you see the same car everywhere. Or when a pregnant woman suddenly notices other pregnant women all over the place.

Corpora and the Second Amendment: “arms”

An introduction and guide to my series of posts "Corpora and the Second Amendment" is available here. The corpus data that is discussed can be downloaded here. That link will take you to a shared folder in Dropbox. Important: Use the "Download" button at the top right of the screen.

New URL for COFEA and COEME:

This post on what arms means will follow the pattern of my post on bear. I’ll start by reviewing what the Supreme Court said about the topic in District of Columbia v. Heller. I’ll then turn to the Oxford English Dictionary for a look at how arms was used over the history of English up through the end of the 18th century, when the Second Amendment was proposed and ratified. And finally, I’ll discuss the corpus data.

Justice Scalia’s majority opinion had this to say about what arms meant:

The 18th-century meaning [of arms] is no different from the meaning today. The 1773 edition of Samuel Johnson’s dictionary defined ‘‘arms’’ as ‘‘[w]eapons of offence, or armour of defence.’’ Timothy Cunningham’s important 1771 legal dictionary defined ‘‘arms’’ as ‘‘any thing that a man wears for his defence, or takes into his hands, or useth in wrath to cast at or strike another.’’ [citations omitted]

As was true of what Scalia said about the meaning of bear, this summary was basically correct as far as it went, but was also a major oversimplification.

A corpus-linguistic take on "emolument(s)" (updated)

From the Washington Post:

The study is a corpus analysis performed by Jesse Egbert, a corpus linguist at Northern Arizona University, and Clark Cunningham, a law professor who did work in law and linguistics from the late 1980s through the mid-1990s (link, link, link, link), including co-authoring an article with Chuck Fillmore that was what really opened my eyes to the power of linguistics in analyzing issues of word meaning.

Corpora and the Second Amendment: “bear”

An introduction and guide to my series of posts "Corpora and the Second Amendment" is available here. The corpus data that is discussed can be downloaded here. That link will take you to a shared folder in Dropbox. Important: Use the "Download" button at the top right of the screen.

New URL for COFEA and COEME:

Starting with this post, I’m (finally) getting to the meat of what I’ve called “the coming corpus-based reexamination of the Second Amendment.” The plan, as I’ve said before, is to more or less mirror the structure of the Supreme Court’s analysis of keep and bear arms. This post will focus on bear, and subsequent posts will focus separately on arms, bear arms, and keep and bear arms; I won’t be separately discussing keep arms because I have nothing to say about it. [Update: If you're confused about why I'm following this approach, as one of the commenters was, I've offered an explanation at the end of the post.]

In discussing the meaning of the verb bear, Justice Scalia’s majority opinion in District of Columbia v. Heller said, “At the time of the founding, as now, to ‘bear’ meant to ‘carry.’’’ That statement was backed up by citations to distinguished lexicographic authority—Samuel Johnson, Noah Webster, Thomas Sheridan, and the OED—but evidence that was not readily available when Heller was decided shows that Scalia’s statement was very much an oversimplification. Although bear was sometimes used in the way that Scalia described, it was not synonymous with carry and its overall pattern of use was quite different.

A new and useful dictionary of Sinographs

We have often noted how much easier it is to learn Chinese now than it was just ten or twenty years ago.  That's because of all the new digital resources that have become available in recent years:

Of course, there are a lot of quick-fix programs out there, and one should be wary of them:

But every so often a really good resource comes along, and I should like to introduce one such in this post.

The concept of word in Sinitic

In the following posts, we've been tackling the thorny, multifaceted question of whether Vietnamese has words and lexemes, as opposed to having syllables and morphemes:

During the course of our discussions, the parallel question of whether Sinitic had words or not also came up.  Let me put it this way:  although there was no concept of "word" in Sinitic before the 20th century, there were Sinitic words, going all the way back to the oracle bone inscriptions (the first stage of Chinese writing) more than three thousand years ago, as documented in these posts and dozens of others:

Corpora and the Second Amendment: Heller

[An introduction and guide to my series of posts "Corpora and the Second Amendment" is available here. The corpus data that is discussed can be downloaded here. That link will take you to a shared folder in Dropbox. Important: Use the "Download" button at the top right of the screen.]

Before I get into the corpus data (next post, I promise), I want to set the stage by talking a bit about the Heller decision. Since the purpose of this series of posts is to show the ways in which the corpus data casts doubt on the Supreme Court's interpretation of keep and bear arms, I'm going to review the parts of the decision that are most relevant to that purpose. I'm also going to point out several ways in which I think the Court's linguistic analysis is flawed even without considering the corpus data. Although that wasn't part of my plans when I began these posts, this project has led me to read Heller more closely than I had done before and therefore to see flaws that had previously escaped my notice. And I think that being aware of those flaws will be important when the time comes to decide whether and to what extent the data undermines Heller's analysis.

The Second Amendment's structure

As is well known (and as has been discussed previously on Language Log here, here, and here), the Second Amendment is unusual in that it is divided into two distinct parts, which the Court in Heller called the "prefatory clause" and the "operative clause":

Corpora and the Second Amendment: Weisberg responds to me; plus update re OED

An introduction and guide to my series of posts "Corpora and the Second Amendment" is available here. The corpus data that is discussed can be downloaded here. That link will take you to a shared folder in Dropbox. Important: Use the "Download" button at the top right of the screen.

New URL for COFEA and COEME:

Two quick updates.

First, David Weisberg has replied to my response to his post on the Originalism Blog, but he doesn't address the point that I made, which was that I disagreed with his framing of the issue.

Weisberg also notes that I didn't respond to the second point in his original post (which dealt with a purely legal issue), and he goes on to say this:

Many people (and I think Goldfarb is one) believe the correct sense of the 2nd Amend is this: “The right of the people to keep and bear Arms, for use in a State’s well regulated Militia, shall not be infringed.” But, if that is what the framers meant, why isn’t that what they wrote? I think that is a very fair question to ask, and it merits an answer. After all, 5 words would have been saved. Will corpus linguistics provide an answer?

I'm not going to offer any views in this series of posts about how I think the Second Amendment as a whole should be interpreted; I'm focusing only on Heller's interpretation of the phrase keep and bear arms. So I'm not going to say whether Weisberg is correct in his speculation about what I think on that score. Weisberg then asks why, if the framers had intended to convey the meaning he posits, they didn't write the amendment in those terms. Although Weisberg thinks that is "a very fair question to ask," I don't think it's a question that's relevant to the issue as the Court framed it in Heller, which had to do with how the Second Amendment's text was likely to have been understood by members of the public, not with what the framers intended. Nevertheless, I'll say that the question to which Weisberg wants an answer is not one that can be answered by corpus linguistics.

Really weird sinographs

Scott Wilson has written an entertaining, and I dare say edifying, article on "W.T.F. Japan: Top 5 strangest kanji ever 【Weird Top Five】", SoraNews24 (10/6/16) — sorry I missed it when it first came out.  Wilson refers to the "Top 5 strangest kanji", but he actually treats nearly three times that many.  The reason he emphasizes "5" is so that he can stick with his theme of W.T.F., cf.:

Scott Wilson, "W.T.F. Japan: Top 5 most difficult kanji ever【Weird Top Five】", SoraNews24 (8/4/16)

Scott Wilson, "W.T.F. Japan: Top 5 kanji with the longest readings【Weird Top Five】", SoraNews24 (4/20/17)

Webster’s Second and Webster’s Third: Editors going against stereotype

One of the best-known pieces of lexicographic history is the controversy that greeted the publication of Webster’s Third New International Dictionary. Whereas the predecessor of W3, Webster’s Second New etc., had been regarded as authoritatively prescriptive, W3 was condemned in the popular media for its descriptive approach, the widespread perception of which can be boiled down to “anything goes.” (For the details, see The Story of Webster’s Third by Herbert Morton and The Story of Ain’t by David Skinner.)

I recently came across two articles that seem to be largely unknown but deserve wider attention: one by the General Editor of W2 (Thomas Knott), and the other by the Editor-in-Chief of W3 (Philip Gove). Each article is notable by itself because it fleshes out the author’s attitude toward usage and correctness, and does so in a way that undermines the stereotype that is associated with the dictionary each one worked on. And when the two articles are considered together, they suggest that despite the very different reputations of the two dictionaries, the authors’ attitudes toward usage and correctness probably weren’t far apart.

OED on the language of sexual and gender identity

On Twitter, Katherine Connor Martin (Head of U.S. dictionaries at Oxford University Press) writes:

In the latest @oed update, dozens of entries relating to sexual and gender identity were revised, the first phase of a project to revisit this rapidly changing segment of the English lexicon.

She links to the lengthy Release Notes, of which the following is just the introduction:

Ask Language Log: Looking up hanzi for ignoramuses

From Mark Meckes:

I'm a regular Language Log reader, completely ignorant of Chinese languages.  I was just wondering whether there exist worthwhile online tools to help someone like me figure out the meaning of something written only in hanzi.  (The question is occasioned by my looking at a package of tea given to me by a Chinese student; the writing on the package is mostly hanzi, with a little English and no pinyin.)  I'm perfectly competent to use Google Translate and similar tools (and know how much skepticism to approach the results with) for the last stage of the process.  But starting from written hanzi on a physical object, I first need some way to translate that image into either pinyin, Unicode, English, or something equivalent to one of the above — and something that relies on no knowledge of the meaning or pronunciation of the characters, or knowledge of the structure of Chinese characters in general.  Do you have any suggestions?

