Language Log

Tortured phrases, LLMs, and Goodhart's Law

June 20, 2023 @ 8:55 am · Filed by Mark Liberman under Uncategorized

A few years ago, I began to notice that the scientific and technical papers relentlessly spammed at me by academia.edu and similar outfits were becoming increasingly surrealistic. And I soon learned that the source of such articles was systems for "article spinning" by "rogeting": automatic random substitution of (usually inappropriate) synonyms. Those techniques were originally developed many years ago for spamdexing, i.e. generating "link farms" of fake pages, in order to fool search engine ranking systems by evading simple forms of content similarity detection.

And the same techniques also fool simple systems for plagiarism detection — though the incoherent results are not useful for student papers, at least in cases where instructors actually read the submissions. But the same time period saw the parallel growth of predatory publishing (and analogous developments among generally reputable publishers), and the use of mindless quantitative publication metrics to evaluate researchers, faculty and institutions. The result: an exponential explosion of "tortured phrases" in the scientific, technical, and scholarly literature: "talk affirmation" for "speech recognition", "straight expectation" for "linear prediction", "huge information" for "big data", "gullible Bayes" for "naive Bayes", "irregular woodland" for "random forest", "savvy home" for "smart home", and so on.
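To make the substitution pattern concrete, here is a toy sketch in Python. It is only a lookup over the six pairs quoted above, not a description of how any real spinning or screening tool works; the names `TORTURED`, `find_tortured`, and `untorture` are invented for illustration.

```python
import re

# Pairs quoted in the post: tortured phrase -> established term.
# Keys are lowercase so that matching can ignore case.
TORTURED = {
    "talk affirmation": "speech recognition",
    "straight expectation": "linear prediction",
    "huge information": "big data",
    "gullible bayes": "naive Bayes",
    "irregular woodland": "random forest",
    "savvy home": "smart home",
}

# One case-insensitive pattern covering every known phrase.
# Longer phrases come first so that any overlap resolves predictably.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(TORTURED, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_tortured(text):
    """Return (phrase as found, likely intended term) for each hit, in order."""
    return [(m.group(0), TORTURED[m.group(0).lower()]) for m in _PATTERN.finditer(text)]


def untorture(text):
    """Replace each known tortured phrase with its established term."""
    return _PATTERN.sub(lambda m: TORTURED[m.group(0).lower()], text)


sample = "We compare irregular woodland and Gullible Bayes on huge information."
hits = find_tortured(sample)
restored = untorture(sample)
```

A fixed list like this catches only phrases someone has already noticed, so it illustrates the pattern rather than solving the detection problem.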

Read the rest of this entry »

Saturn < Cronus (Κρόνος) ≠ Chronos (Χρόνος)

June 17, 2023 @ 11:50 pm · Filed by Victor Mair under Language and art, Language and politics

[This is a guest post by Jichang Lulu, with some minor modifications and additions by VHM]

You might have seen this: the PRC embassy in Poland has given Badiucao's forthcoming exhibition in Warsaw (co-organised by Sinopsis) some very welcome, completely unexpected publicity by trying to have it shut down. Lots of international reporting:

The Guardian, Sydney Morning Herald, &c., &c.

The ‘cannibalistic’ theme (picture, with Badiucao standing next to the poster featuring his art, via the Sydney Morning Herald) of course alludes to Cronus eating his sons, as in Hesiod:

Read the rest of this entry »

"Throw a photo" in South Florida English

June 17, 2023 @ 3:45 pm · Filed by Victor Mair under Borrowing, Language contact

Article by Phillip M. Carter in The Conversation (6/12/23):

"Linguists have identified a new English dialect that’s emerging in South Florida"

Beginning sentences:

“We got down from the car and went inside.”

“I made the line to pay for groceries.”

“He made a party to celebrate his son’s birthday.”

These phrases might sound off to the ears of most English-speaking Americans.

In Miami, however, they’ve become part of the local parlance.

According to my recently published research, these expressions – along with a host of others – form part of a new dialect taking shape in South Florida.

This language variety came about through sustained contact between Spanish and English speakers, particularly when speakers translated directly from Spanish.

Read the rest of this entry »

Ancient eggcorns

June 17, 2023 @ 8:29 am · Filed by Mark Liberman under Eggcorns

The word eggcorn was originally proposed in a LLOG post almost 20 years ago — "Egg corns: folk etymology, malapropism, mondegreen, ???", 9/23/2003. And the word is now recognized by most current English dictionaries and other relevant sources, which gloss it variously, e.g. —

the Oxford English Dictionary: ("An alteration of a word or phrase through the mishearing or reinterpretation of one or more of its elements as a similar-sounding word")
Merriam-Webster: ("a word or phrase that sounds like and is mistakenly used in a seemingly logical or plausible way for another word or phrase either on its own or as part of a set expression")
Wiktionary: ("A word or phrase that sounds like and is mistakenly used in a seemingly logical or plausible way for another word or phrase either on its own or as part of a set expression")
the Collins English Dictionary: ("a malapropism or misspelling arising from similarity between the sound of the misspelled or misused word and the correct one in the accent of the person making the mistake")
the American Heritage Dictionary: ("A series of words that result from the misunderstanding of a word or phrase as some other word or phrase having a plausible explanation")
Wikipedia: ("An eggcorn is the alteration of a phrase through the mishearing or reinterpretation of one or more of its elements, creating a new phrase having a different meaning from the original but which still makes sense and is plausible when used in the same context")

Those sources cite the examples eggcorn, to the manor born, old-timers' disease, ex-patriot, for all intensive purposes, feeble position, free reign, wipe board, card shark, and so on. Many more can be found at Chris Waigl's Eggcorn Database.

This morning, I'm appealing for help in answering two questions: What are some examples of eggcorns in other languages? And what are the earliest documented (or reconstructed) examples?

Read the rest of this entry »

Coors Light Bear

June 17, 2023 @ 4:28 am · Filed by Mark Liberman under Humor

An NFL policy prohibits players from endorsing alcoholic beverages. So Coors found a linguistic work-around:

Read the rest of this entry »

Rivers and lakes: quackery

June 16, 2023 @ 2:03 pm · Filed by Victor Mair under Language and literature, Language and medicine, Metaphors

Get ready to go a-wanderin'. I'll take you down to the rivers and lakes, and we shall lose ourselves in them, get lost from the hurlyburly hustlebustle of the mundane world. That's what jiānghú 江湖 ("rivers and lakes") is all about. It's where you go to xiāoyáo yóu 逍遙遊 ("wander freely / carefreely / leisurely").

The first occurrence of jiānghú 江湖 in traditional Chinese literature is to be found in the Zhuāng Zǐ 莊子 ("Master Zhuang") (late 4th to early 3rd century BC), which happens to be my favorite work of ancient Chinese literature:

Quán hé, yú xiāngyǔ chǔ yú lù, xiāng xǔ yǐ shī, xiāng rú yǐ mò, bùrú xiāngwàng yú jiānghú.

泉涸，魚相與處於陸，相呴以溼，相濡以沫，不如相忘於江湖。

"When springs dry up, fish huddle together on the land. They blow moisture on each other and keep each other wet with their slime. But it would be better if they could forget themselves in the rivers and lakes."

VHM, tr., Wandering on the Way: Early Taoist Tales and Parables of Chuang Tzu (New York: Bantam, 1994), p. 53.

Read the rest of this entry »

"Syllabolic"?

June 15, 2023 @ 4:58 pm · Filed by Mark Liberman under Words words words

On June 1 in Iowa, Donald Trump gave a speech in which he attacked Ron DeSantis from several angles. One of them was DeSantis' variation in pronunciation of his last name (see "Pronouncing 'DeSantis'", 6/3/2023), which Trump characterized as "changing his name", while introducing a puzzling (but promising?) new linguistic term, "syllabolic":

But uh he's going around saying "oh well I can serve for eight years
it takes eight years to fix it".
No he made a big mistake —
uh just like you don't change your name
in the middle of a uh election.

Changed his name in the middle of the election, you don't do that.
You do it before, or after, but ideally you don't do it at all.

I liked it before anyway, I liked his name better before,
I don't like the name change, shall we tell him that?

uh but uh most people don't know what I mean,
no he's actually sort of changed a name.

It's uh syllabolic, they call it,
wants a syllabolic name.

Read the rest of this entry »

The Cantophone and the state

June 15, 2023 @ 7:19 am · Filed by Victor Mair under Announcements, Language and literature, Language and politics, Topolects

Cantonese — its nature, its status, its past, present, and future, its place in the realm of Sinitic languages and in the world — has been one of the chief foci of Language Log. Consequently, it is my great pleasure to announce the publication of the three-hundred-and-thirty-fourth issue of Sino-Platonic Papers:

“The Concept of the Cantophone: Memorandum for a Stateless Literary History,” by Wayne C. F. Yeung.

https://sino-platonic.org/complete/spp334_cantophone.pdf

This is a landmark work of scholarship that penetratingly probes the position of Cantonese — and thereby all "Chinese" topolects — in the complex mix of language, literature, nation, politics, and culture.

Read the rest of this entry »

"Tortured syllables"?

June 14, 2023 @ 10:16 am · Filed by Mark Liberman under Phonetics and phonology, Variation

"Language change (about to be?) in progress" (6/12/2023) linked to media commentary on divergent features of Northeast Philadelphia speech, e.g. "Side effect of the highway collapse: A perfect example of Northeast Philly hoagiemouth", Billy Penn 6/11/2023. Some of the characterization was extremely evaluative:

Philadelphians have perfected torturing vowels like medieval Europe perfected torturing people. Every syllable is drawn and quartered, chained to the breaking wheel, boiled alive. https://t.co/frLFfwG3NR

— Erin "MY BOXES" Ryan (@morninggloria) June 11, 2023

The Billy Penn article was gentler and more descriptive:

You can really hear the accent in the elongated roundness of all the “ooo” words he speaks, the way he drags out the end of others, and how he softens each and every consonant (“phouen,” “tex messagessss,” “schreenshoz”).

But in fact, none of the commentary describes this man's speech in an accurate way.

Read the rest of this entry »

Indigenous languages of Taiwan

June 14, 2023 @ 6:35 am · Filed by Victor Mair under Language and religion, Language extinction, Language preservation

How many are there?

Taiwan’s unrecognized indigenous tribes are reviving dead languages to achieve recognition

There are currently 16 officially recognized indigenous peoples in Taiwan. The Pingpu — which comprise 10 groups on the island’s lowlands — are lobbying to make that number 17, and they’re doing it by reviving lost languages and culture.

By Jordyn Haime, The China Project (6/5/23)

In contemporary Mandarin, many of the speakers of these languages are called shāndì tóngbāo 山地同胞 ("mountain countrymen / compatriots"), which meshes well with the opening paragraph of Haime's article:

Long before Chinese settlers came to the flat, sprawling lands of the Pingtung plain — the southern Taiwanese county now known for its pineapple and mango production — the area was inhabited by Pingpu (plains indigenous) tribes like the Makatao. Waves of colonization pushed indigenous tribes from their ancestral lands and closer to the mountains, or in some cases, to the other side of the island.

Read the rest of this entry »

Victorious Secret

June 13, 2023 @ 5:25 am · Filed by Mark Liberman under Humor, Language and gender, Language and music

The next event in the Salon Sanctuary concert series is "Victorious Secret: Love Gamed and Gender Untamed in the Sparkling Courts of the Baroque":

Before the bars of gender binaries caged the mainstream operatic imagination, a golden age of fluidity guided the vocal soundscape. Virility declared itself with the castrato’s clarion high notes, while femininity spoke in earthy tessiture that plunged to shimmering depths.

Texts of the period revel in ambiguity, unfurling genderless narratives of anonymous lovers and unnamed beloveds. Stories of active pursuit and passive reverie remain alike at loose ends, with neat resolutions many movements away.

Please join us for this special program in honor of Pride Month, as the music of the past reveals a golden underground of nonbinary riches, accompanying us in our witness to a new Renaissance.

Read the rest of this entry »

Is it a rat's head or a duck's neck?

June 12, 2023 @ 11:20 pm · Filed by Victor Mair under Language and food, Proverbs

Main dish served as part of a college cafeteria lunch in Nanchang, China:

Read the rest of this entry »

Old Sinitic "wheat" and Early Middle Sinitic "camel"

June 12, 2023 @ 3:56 pm · Filed by Victor Mair under Borrowing, Etymology, Phonetics and phonology

[This is a guest post by Chris Button]

OC uvulars tended to condition rounding (e.g. OC q- becoming EMC kw-). In the case of ʁ-, we sometimes get m- (for a modern-day example, note how 惟, which also had a ʁ- onset in Old Chinese, gives an m- reflex in Fuzhou Min). The classic example is 卯, where Pulleyblank once postulated ʁ- and Li Fang-kuei notes lack of evidence for a cluster, such as ml- or mr-, in its Tai loan. Unfortunately Li’s Tai evidence tends to either be ignored (e.g. 丑 ^hr- is often erroneously reconstructed with a nasal ^hn- based on misleading xiesheng evidence) or overly literally interpreted (e.g. 戌 χ- being treated as something like sm-).

Read the rest of this entry »
