Sweethoney dessert

Maidhc Mac Roibin sent in this photograph of the front of a dessert shop in Cupertino from Fintano's flickr site:

201908-PSP-R4-33 Sweethoney Dessert, SJ CA



Standardized Project Gutenberg Corpus

Martin Gerlach and Francesc Font-Clos, "A standardized Project Gutenberg corpus for statistical analysis of natural language and quantitative linguistics", arXiv 12/19/2018:

The use of Project Gutenberg (PG) as a text corpus has been extremely popular in statistical analysis of language for more than 25 years. However, in contrast to other major linguistic datasets of similar importance, no consensual full version of PG exists to date. In fact, most PG studies so far either consider only a small number of manually selected books, leading to potential biased subsets, or employ vastly different pre-processing strategies (often specified in insufficient details), raising concerns regarding the reproducibility of published results. In order to address these shortcomings, here we present the Standardized Project Gutenberg Corpus (SPGC), an open science approach to a curated version of the complete PG data containing more than 50,000 books and more than 3×10^9 word-tokens. Using different sources of annotated metadata, we not only provide a broad characterization of the content of PG, but also show different examples highlighting the potential of SPGC for investigating language variability across time, subjects, and authors. We publish our methodology in detail, the code to download and process the data, as well as the obtained corpus itself on 3 different levels of granularity (raw text, timeseries of word tokens, and counts of words). In this way, we provide a reproducible, pre-processed, full-size version of Project Gutenberg as a new scientific resource for corpus linguistics, natural language processing, and information retrieval.



Robot calligraphy

People's Daily video posted on illegal Twitter:

 



Beneath modern Melbourne lie(s) clues

Bob Ladd sent in a screenshot from the Guardian, with the message:

I think this suggests that, except with auxiliary verbs, subject-verb inversion is not really something that is fully a part of English speakers' competence any more. The agreement discrepancy of "clues" and "lies" would be instantly detectable in most other contexts, but not when it's required by residual English verb-second constraints.

He notes that the screenshot came "from first thing this morning UTC, but it was still up and uncorrected at mid-afternoon UTC". And he suggests that things would be very different with a copula or auxiliary verb, e.g. "Beneath modern Melbourne is two of the richest hoards of pirate gold ever found".



The semiotics of an East Asian hand gesture

These days people make all sorts of public and private hand gestures to convey a wide variety of information.  Innocent though they may seem, for various reasons many of them become controversial (e.g., the sign for "OK", which has recently been classified by some organizations as a symbol of hate).

These students at a university in China are making the sign of bǐxīn 比心.  According to their professor, it means "love you" or "give you my heart".



So

When I was skimming the transcript of the 12/19 Democratic presidential debate for "Warren vocal stereotypes", I noticed that several of the candidates started some of their answers to questions with "so". Among the dozen examples:

WOODRUFF: Senator Warren, why do you think — why do you think more Americans don't agree that this is the right thing to do? And what more can you say?
WARREN: So I see this as a constitutional moment.

WOODRUFF: Brief answers — brief responses from Mr. Steyer and Mr. Buttigieg.
STEYER: So let me say that I agree with Senator Warren in much of what she says.

WOODRUFF: Welcome back to the PBS NewsHour Democratic debate with Politico. And now it's time for closing statements. Each have 60 seconds, beginning with Mr. Steyer. […] Mayor Buttigieg?
BUTTIGIEG: So the nominee is going to have to do two things: defeat Donald Trump and unite the country as president.
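A search like the one described above can be roughly automated. The following is a minimal sketch, not the method actually used for this post: it assumes each turn sits on one line in the form "SPEAKER: text", and the names `TURN`, `SO_INITIAL`, and `so_initial_turns` are hypothetical. The sample abridges the exchanges quoted above.

```python
import re

# Assumed transcript layout: one turn per line, "SPEAKER: text",
# with the speaker label in capitals (as in the excerpts above).
TURN = re.compile(r"^([A-Z][A-Z .'-]*):\s*(.*)$")
# Turn-initial "so" as a whole word, so "Some" or "Sorry" do not match.
SO_INITIAL = re.compile(r"^so\b", re.IGNORECASE)

def so_initial_turns(transcript):
    """Return (speaker, text) pairs for turns whose first word is 'so'."""
    hits = []
    for line in transcript.splitlines():
        match = TURN.match(line.strip())
        if match and SO_INITIAL.match(match.group(2)):
            hits.append((match.group(1), match.group(2)))
    return hits

# Abridged from the debate excerpts quoted above.
sample = """WOODRUFF: And what more can you say?
WARREN: So I see this as a constitutional moment.
STEYER: So let me say that I agree with Senator Warren in much of what she says.
BUTTIGIEG: So the nominee is going to have to do two things: defeat Donald Trump and unite the country as president."""

for speaker, text in so_initial_turns(sample):
    print(speaker, "|", text)
```

A pattern match of this kind only finds candidates; whether a given "so" is a discourse marker or an ordinary connective still has to be judged by reading the turn.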



Semiotic lesson of the week




Badge of honor: Language Log is blocked in China

Two days ago, I received this message from a colleague in China:

Not sure if this should be a badge of honor or a disappointment, but a few days ago Language Log got blocked in China.  (Source — GreatFire.org:  Language Log is 100% censored)

This caps off a miserable year where we also lost Wikipedia (all languages), The Guardian, Al Jazeera, Hackernews, Imgur….

[VHM:  Of course, Google, Facebook, Twitter, YouTube, and many other invaluable websites were already off-limits to Chinese citizens for years.  The internet in China is severely decimated by the CCP government.]



Tao vs. Dao: amazing restaurant sign near UPenn

I've eaten in this hot pot (huǒguō / WG huo3-kuo1 / IPA [xwò.kwó] 火锅 / 火鍋) restaurant at 3717 Chestnut St. on a number of occasions, and each time I go, I am struck by the creative sign out front:



Warren vocal stereotypes

A recent WSJ editorial ("A $900 Bottle of Hypocrisy", 12/20/2019) engages Democratic presidential candidates, and especially Elizabeth Warren, on the issue of money in politics:

Few political spectacles are more amusing than watching Democrats who are millionaires attempting to deny that they consort with other millionaires, much less with dastardly billionaires. This was on extended display at Thursday’s presidential debate, and it offers a lesson about money and politics.

South Bend Mayor Pete Buttigieg has been raising millions of dollars in Silicon Valley, New York, Hollywood and other well-to-do progressive enclaves. This has riled Elizabeth Warren, who used to be a favorite of the wealthy liberal class but as a presidential candidate has taken a vow of non-association with the rich. Ms. Warren accused the young mayor of holding a fundraiser “in a wine cave full of crystals” and $900-a-bottle wine.



Too tired to love: new set phrases in Pinyin

Literary Sinitic / Classical Chinese has an extreme propensity for elision, truncation, and abbreviation, which is one of the factors that make it so hard to read.

Yesterday, we looked at the current Chinese proclivity for acronyms and initialisms, made much easier to produce and apply due to the use of digital technology and pinyin as part of an emerging Sinitic digraphia.  See "Chinese acronyms" (12/22/19).

In recent years, a new kind of quadrisyllabic "set phrase" has arisen in internet usage, one not based on historical allusion or other traditional source.  Here are seven typical examples:



Multicultural pork buns

Emery Snyder spotted this sign in New York City's Chinatown:




Moon ultra parking
