Language Log

Living fossils: Taiwan tea and salmon

December 31, 2019 @ 7:31 pm· Filed by Victor Mair under Classification, Language and biology, Language and food

Two articles in Chinese (here and here) recently brought news of an indigenous type of tea and referred to it as a rare type of salmon. Trying to figure that out led to two linguistic puzzles:

1. Making sense of the unusual name for the salmon: yīnghuā gōu wěn guī 櫻花鉤吻鮭 (lit., "cherry-hook-kiss / mouth-salmon"; i.e., the Formosan landlocked salmon).

2. Understanding how, even metaphorically, a kind of tea would be referred to as a type of salmon.


Throes?

December 30, 2019 @ 4:38 pm· Filed by Mark Liberman under Words words words

"Dave Barry's Year in Review 2019"

… which begins with the federal government once again in the throes (whatever a “throe” is) of a partial shutdown, which threatens to seriously disrupt the lives of all Americans who receive paychecks from the federal government.

Consulting the OED on throe (entry updated 2017), we learn that its orthographic history is interesting:

Of uncertain origin. Perhaps a variant or alteration of another lexical item. […]
The range of forms attested for this word is difficult to account for. […]
The current standard spelling throe […] is a 16th-cent. alteration of throw, throwe […] (compare with similar alteration the current forms of roe (earlier row , rowe ), hoe (earlier how , howe ), etc.), perhaps motivated by a desire to differentiate this word from throw.


New Years party themes

December 30, 2019 @ 11:11 am· Filed by Mark Liberman under Language and history, Linguistics in the comics

Today's xkcd:

The mouseover title: "'Off-by-one errors' isn't the easiest theme to build a party around, but I've seen worse."


An 8th-century Chinese epitaph written by a Japanese courtier

December 30, 2019 @ 3:15 am· Filed by Victor Mair under Epigraphy, Language and history

Here's news of a remarkable discovery:

"Ancient Chinese epitaph penned by Japanese found in China", THE ASAHI SHIMBUN (December 26, 2019 at 19:00 JST).

The article includes a photograph of a rubbing of the last line of the epitaph with the following kanji:

日本國朝臣備書

I can read that easily as Sino-Japanese "Nihonkoku chōshin Bi sho", which would mean "written by the Japanese courtier [Ki]bi". The article says that the last line of the epitaph reads "Nihonkoku Ason Bi Sho", so it would appear that I am reading "朝臣" incorrectly as "chōshin" instead of as "ason".


Meanest pun of the year

December 29, 2019 @ 7:15 pm· Filed by Mark Liberman under Humor

From "Who's Bill This Time", Wait Wait…Don't Tell Me! 12/21/2019:

Peter Sagal: Mayor- Mayor Pete has been getting some heat.
I don't know if you saw this.
He attended a big fundraiser in Napa
at a winery with a, quote, "wine cave."
And everybody was so mad that he did this.
But why would you be mad about a wine cave?
It celebrates the two things Democrats are known for, whining and caving.


Sweethoney dessert

December 29, 2019 @ 8:48 am· Filed by Victor Mair under Diglossia and digraphia, Language and business, Signs

Maidhc Mac Roibin sent in this photograph of the front of a dessert shop in Cupertino from Fintano's flickr site:


Standardized Project Gutenberg Corpus

December 28, 2019 @ 11:16 am· Filed by Mark Liberman under Computational linguistics

Martin Gerlach and Francesc Font-Clos, "A standardized Project Gutenberg corpus for statistical analysis of natural language and quantitative linguistics", arXiv 12/19/2018:

The use of Project Gutenberg (PG) as a text corpus has been extremely popular in statistical analysis of language for more than 25 years. However, in contrast to other major linguistic datasets of similar importance, no consensual full version of PG exists to date. In fact, most PG studies so far either consider only a small number of manually selected books, leading to potential biased subsets, or employ vastly different pre-processing strategies (often specified in insufficient details), raising concerns regarding the reproducibility of published results. In order to address these shortcomings, here we present the Standardized Project Gutenberg Corpus (SPGC), an open science approach to a curated version of the complete PG data containing more than 50,000 books and more than 3×10⁹ word-tokens. Using different sources of annotated metadata, we not only provide a broad characterization of the content of PG, but also show different examples highlighting the potential of SPGC for investigating language variability across time, subjects, and authors. We publish our methodology in detail, the code to download and process the data, as well as the obtained corpus itself on 3 different levels of granularity (raw text, timeseries of word tokens, and counts of words). In this way, we provide a reproducible, pre-processed, full-size version of Project Gutenberg as a new scientific resource for corpus linguistics, natural language processing, and information retrieval.
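To make the three levels of granularity concrete, here is a minimal sketch of how raw text can be reduced to a sequence of word tokens and then to word counts. This is not the SPGC code: the function names `tokenize` and `count_tokens`, the sample sentence, and the letters-only tokenizer are all assumptions made for illustration, and the corpus's actual pre-processing is described in the paper and its repository.

```python
import re
from collections import Counter


def tokenize(raw_text: str) -> list[str]:
    """Level 2 (hypothetical): the ordered sequence of word tokens.

    Deliberately crude: lowercase the text and keep runs of ASCII letters.
    A real pipeline would handle accents, hyphens, numbers, and so on.
    """
    return re.findall(r"[a-z]+", raw_text.lower())


def count_tokens(tokens: list[str]) -> Counter:
    """Level 3 (hypothetical): word counts, discarding token order."""
    return Counter(tokens)


# Level 1: raw text (an invented sample, not taken from Project Gutenberg).
raw = "The cat sat on the mat. The mat was flat."

tokens = tokenize(raw)
counts = count_tokens(tokens)

print(tokens)
print(counts.most_common(2))
```

Each level throws away information that the previous one kept: the token sequence drops punctuation and case, and the counts drop word order. That is presumably why the corpus is published at all three levels rather than only the most compact one.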


Robot calligraphy

December 27, 2019 @ 10:34 am· Filed by Victor Mair under Language and computers, Writing, Writing systems

People's Daily video posted on illegal Twitter:

Chinese calligraphy, an artistic expression of human language in a tangible form, is not exclusively for mankind anymore. This robot not only masters the art, but it is also designed to preserve the character calligraphy culture. pic.twitter.com/IrFZ2rlOaZ

— People's Daily, China (@PDChina) December 25, 2019


Beneath modern Melbourne lie(s) clues

December 27, 2019 @ 8:53 am· Filed by Mark Liberman under Historical linguistics, Syntax

Bob Ladd sent in a screenshot from the Guardian, with the message:

I think this suggests that, except with auxiliary verbs, subject-verb inversion is not really something that is fully a part of English speakers' competence any more. The agreement discrepancy of "clues" and "lies" would be instantly detectable in most other contexts, but not when it's required by residual English verb-second constraints.

He notes that the screenshot came "from first thing this morning UTC, but it was still up and uncorrected at mid-afternoon UTC". And he suggests that things would be very different with a copula or auxiliary verb, e.g. "Beneath modern Melbourne is two of the richest hoards of pirate gold ever found".


The semiotics of an East Asian hand gesture

December 26, 2019 @ 10:45 pm· Filed by Victor Mair under Semiotics, Sign language

These days people make all sorts of public and private hand gestures to convey a wide variety of information. Innocent though they may seem, for various reasons many of them become controversial (e.g., the sign for "OK", which has recently been classified by some organizations as a symbol of hate).

These students at a university in China are making the sign of bǐxīn 比心. According to their professor, it means "love you" or "give you my heart".


So

December 26, 2019 @ 10:12 am· Filed by Mark Liberman under Words words words

When I was skimming the transcript of the 12/19 Democratic presidential debate for "Warren vocal stereotypes", I noticed that several of the candidates started some of their answers to questions with "so". Among the dozen examples:

WOODRUFF: Senator Warren, why do you think — why do you think more Americans don't agree that this is the right thing to do? And what more can you say?
WARREN: So I see this as a constitutional moment.

WOODRUFF: Brief answers — brief responses from Mr. Steyer and Mr. Buttigieg.
STEYER: So let me say that I agree with Senator Warren in much of what she says.

WOODRUFF: Welcome back to the PBS NewsHour Democratic debate with Politico. And now it's time for closing statements. Each have 60 seconds, beginning with Mr. Steyer. […] Mayor Buttigieg?
BUTTIGIEG: So the nominee is going to have to do two things: defeat Donald Trump and unite the country as president.
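A count like "a dozen examples" can be checked mechanically. The sketch below is one way it might be done, assuming a transcript with one turn per line in the form `SPEAKER: text`; the function name `so_initial_turns` and the speaker-label pattern are illustrative assumptions, and the sample lines are copied from the excerpts above rather than from the full transcript.

```python
import re
from collections import Counter

# Assumed turn format: an all-caps speaker label, a colon, then the turn text.
TURN = re.compile(r"^([A-Z]+):\s*(.*)$")


def so_initial_turns(lines):
    """Count, per speaker, the turns whose first word is 'so' (any case)."""
    counts = Counter()
    for line in lines:
        m = TURN.match(line)
        # \b keeps words like "some" or "sorry" from being counted.
        if m and re.match(r"so\b", m.group(2), flags=re.IGNORECASE):
            counts[m.group(1)] += 1
    return counts


sample = [
    "WOODRUFF: And what more can you say?",
    "WARREN: So I see this as a constitutional moment.",
    "STEYER: So let me say that I agree with Senator Warren in much of what she says.",
    "BUTTIGIEG: So the nominee is going to have to do two things: defeat Donald Trump and unite the country as president.",
]

so_counts = so_initial_turns(sample)
print(so_counts)
```

On a real transcript, turns that span several lines would need to be joined first, and multi-word speaker labels would need a looser pattern.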


Semiotic lesson of the week

December 26, 2019 @ 8:47 am· Filed by Mark Liberman under Gesture

Five-year-old girl gives audience middle finger for 20 minutes while starring as angel in nativity play https://t.co/6sYPlbzrzC

— The Independent (@Independent) December 17, 2019


Badge of honor: Language Log is blocked in China

December 26, 2019 @ 8:41 am· Filed by Victor Mair under Language and business, Language and computers, Language and politics, Language and science

Two days ago, I received this message from a colleague in China:

Not sure if this should be a badge of honor or a disappointment, but a few days ago Language Log got blocked in China. (Source — GreatFire.org: Language Log is 100% censored)

This caps off a miserable year where we also lost Wikipedia (all languages), The Guardian, Al Jazeera, Hackernews, Imgur….

[VHM: Of course, Google, Facebook, Twitter, YouTube, and many other invaluable websites were already off-limits to Chinese citizens for years. The internet in China is severely decimated by the CCP government.]


