AI based on Xi Jinping Thought

It's hard to believe they're serious about this:

China rolls out large language model based on Xi Jinping Thought

    Country’s top internet regulator promises ‘secure and reliable’ system that is not open-sourced
    Model is still undergoing internal testing and is not yet available for public use

Sylvie Zhuang in Beijing
Published: 7:57pm, 21 May 2024

It's the antithesis of open-sourced, i.e., it's closed-source. What are the implications of that for a vibrant, powerful system of thought?


Brazilian eggcorns

From André Vítor Camargo De Toledo:

Original: "Ideia de Jerico" (An ass's idea)
Eggcorn: "Ideia de Girino" (A tadpole's idea)
Why it happened: "Jerico" is almost a fossil word, and, to most people, only ever shows up when used in that idiom. It's an old word for "ass", which, as an animal, is associated with intellectual dullness here, so the idiomatic expression translates to "a dumb idea." Its meaning is preserved in the misheard version, as one would suppose tadpoles aren't much brighter than asses.

Original: "internet discada" (dial-up internet)
Eggcorn: "internet de escada" ("staircase internet")
Why it happened: Millennials like me tend to use the term "dial-up internet" to refer to any kind of bad internet connection. Younger generations, not knowing what dial-up internet is, interpret it as "staircase internet", which makes sense, as people generally walk up staircases much more slowly than they walk on level ground.

Original: "Não é da minha alçada" (not of my jurisdiction)
Eggcorn: "Não é da minha ossada" (not from my skeleton)
Why it happened: just a misheard expression. It means "that trouble doesn't belong to me" in both cases; one is a legal analogy while the other is an anatomical analogy, perhaps influenced by the idea that Eve was originally one of Adam's bones.


Language Log asks: Mari Sandoz

In preparation for my run across Nebraska during the month of June, I'm boning up on the land, culture, and history of the state. It wasn't long in my researches before I encountered the esteemed writer Mari Sandoz (1896-1966). Hers is one of the most touching stories about a writer, nay, a human being, that I have ever read. She has much to tell us about her language background and preferences, and how she had to struggle with her publishers to retain them in the face of standardization.

She became one of the West's foremost writers, and wrote extensively about pioneer life and the Plains Indians.

Marie Susette Sandoz was born on May 11, 1896 near Hay Springs, Nebraska, the eldest of six children born to Swiss immigrants, Jules and Mary Elizabeth (Fehr) Sandoz. Until the age of 9, she spoke only German. Her father was said to be a violent and domineering man, who disapproved of her writing and reading. Her childhood was spent in hard labor on the home farm, and she developed snow blindness in one eye after a day spent digging the family's cattle out of a snowdrift.


Bloom filters

Today's xkcd:

According to Wikipedia,

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". […]
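For readers who want to see the "possibly in set" / "definitely not in set" behavior in action, here is a minimal sketch in Python. It is a toy under stated assumptions, not Bloom's original construction or any particular library's API: the class name, the method names, the bit-array size, and the use of salted SHA-256 to derive the hash positions are all choices made for illustration.

```python
import hashlib


class BloomFilter:
    """A toy Bloom filter: a bit array plus k hash functions (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # num_bits and num_hashes are arbitrary defaults chosen for this sketch.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Adding an item sets all k of its bit positions.
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True means "possibly in set"; False means "definitely not in set".
        return all(self.bits[pos] for pos in self._positions(item))


words = BloomFilter()
for w in ["eggcorn", "snowclone", "mondegreen"]:
    words.add(w)
```

Every word that was added will always be reported as possibly present, which is why false negatives cannot occur. A word that was never added is usually reported as absent, but it can be reported as possibly present if its bit positions happen to have been set by other words; that is the false positive case.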


Linguistic capture errors

Back in 2008, Arnold Zwicky described a category of typos that he called "completion errors":

…a "completion error", a typo that results when you start writing or typing a word and then drift part-way into another word. I do this all too often with -ation and -ating words — starting the verb COOPERATING but ending up with COOPERATION, for instance. And several people have reported on the American Dialect Society mailing list that their intention to type LINGUISTS frequently leads them into LINGUISTICS, which then has to be truncated. (This discussion on ADS-L followed my typing "original Broadway case", with CASE instead of CAST, and commenting on it.)

26 years earlier, David Rumelhart and Donald Norman used the term "capture errors" for this phenomenon ("Simulating a skilled typist: A study of skilled cognitive-motor performance", Cognitive Science, 1982):

This category of error occurs when one intends to type one sequence, but gets "captured" by another that has a similar beginning (Norman, 1981). Examples include:

efficiency → efficient
incredibly → incredible
normal → norman


Google AI Overview has a ways to go

…or maybe I should say, "is deeply stupid, so far".

At least, that's the verdict from my first encounter with this heralded innovation.

I updated a Chromebook, re-installed Linux, and thought (incorrectly) that I might need to add repositories in order to install some non-standard apps like R and Octave and Emacs. (Never mind if that's all opaque to you — AI supposedly knows its way around basic tech stuff…)

So I googled "how to install R in linux on a chromebook", and got this:


Linguistic evidence for migration to the Americas from Siberia

1st Americans came over in 4 different waves from Siberia, linguist argues:  The languages of the earliest Americans evolved in 4 waves, according to one expert.

By Kristina Killgrove, Live Science (May 3, 2024)

Killgrove reports:

Indigenous people entered North America at least four times between 12,000 and 24,000 years ago, bringing their languages with them, a new linguistic model indicates. The model correlates with archaeological, climatological and genetic data, supporting the idea that populations in early North America were dynamic and diverse.

Nearly half of the world's language families are found in the Americas. Although many of them are now thought extinct, historical linguistics analysis can survey and compare living languages and trace them back in time to better understand the groups that first populated the continent.


One more for the "passive voice" files

There have been many LLOG posts on misuse of the term "passive voice", going back to 2003. As far as I can tell, the most recent post was "'Is it the passive voice you don't like?'", 8/11/2021.

In "'Passive Voice' — 1397-2009 — R.I.P", I wrote that

the traditional sense of passive voice has died after a long illness. It has ceased to be; it's expired and gone to meet its maker, kicked the bucket, shuffled off this mortal coil, rung down the curtain and joined the choir invisible. It's an ex-grammatical term.

Its ghost walks in the linguistics literature and in the usage of a few exceptionally old-fashioned intellectuals. For everyone else, what passive voice now means is "construction that is vague as to agency".

Today, Ambarish Sridharanarayanan sent me a link to a piece of writing that illustrates the issue perfectly:

The press release makes heroic use of the passive voice to obscure the actors: “an unprecedented sequence of events whereby an inadvertent misconfiguration during provisioning of UniSuper’s Private Cloud services ultimately resulted in the deletion of UniSuper’s Private Cloud subscription.”



Retraction watch: Irish roots of "french fries"?

It's been a while since we had a post in the Prescriptivist Poppycock category. This example is more a case of badly-researched etymology, but we'll take what we can get, courtesy of Florent Moncomble, who writes:

In the May update of the prescriptive « Dire, ne pas dire » section of their website, in a post condemning « carottes fries » (for « carottes frites », as the past participle should go), they contend that the ‘French’ of ‘French fries’ has nothing to do with France but comes from an ‘Old Irish verb’ meaning ‘to mince’.

Sensing that that was absolute nonsense, I debunked the assertion on X in a thread that you can find here.

Specialists in Old Irish on X have joined in my (to remain polite) bemusement. Evidently the Immortels trusted the first page of a Google search and did not bother to actually fact-check this (apparently popular) myth. These are the people, paid with tax money, who we trust the official dictionary of the French language with.


Peevable words and phrases: journey

They mostly start out clever, cute, and catchy:  e.g., "curated".  The problem is that they soon go viral, and then just never go away, even after they have become banal and overused, as with "perfect storm":

I'm campaigning to have "perfect storm" added to peeve polls in the future. As in "at the end of the day it was a perfect storm." It's not unheard of for a book title to turn into a catch[22]phrase, and maybe perfect storm will become a permanent part of the language, but it smacks of fad to me. I feel like I hear it at least three times a week in NPR interviews.

[Comment by Dick Margulis to "'Annoying word' poll results: Whatever!" (10/9/09)]

That was 2009, but "perfect storm" is still with us, and so is "curated", which begins to appear with increasing frequency in the early 70s and really takes off in the 80s.

Now we're facing a veritable onslaught from "journey":


Political bias in economics

Zubin Jelveh, Bruce Kogut, and Suresh Naidu, "Political language in economics", The Economic Journal:

Abstract: Does academic writing in economics reflect the political orientation of economists? We use machine learning to measure partisanship in academic economics articles. We predict observed political behavior of a subset of economists using the phrases from their academic articles, show good out-of-sample predictive accuracy, and then predict partisanship for all economists. We then use these predictions to examine patterns of political language in economics. We estimate journal-specific effects on predicted ideology, controlling for author and year fixed effects, that accord with existing survey-based measures. We show considerable sorting of economists into fields of research by predicted partisanship. We also show that partisanship is detectable even within fields, even across those estimating the same theoretical parameter. Using policy-relevant parameters collected from previous meta-analyses, we then show that imputed partisanship is correlated with estimated parameters, such that the implied policy prescription is consistent with partisan leaning. For example, we find that going from the most left-wing authored estimate of the taxable top income elasticity to the most right-wing authored estimate decreases the optimal tax rate from 84% to 58%.


Nonword literacy

Upon first hearing, the very idea sounded preposterous, but when I searched the internet, I found it all over the place as "nonword reading / repetition", "nonsense words", "non word phonics / fluency", "non-word decoding", "pseudowords", etc.  In other words (!), it's a real thing, and lots of people take the concept seriously as a supposedly useful device in reading theory and practice, justifying it thus:

"as a tool to assess phonetic decoding ability" (here)

"contribute to children's ability to learn new words"  (here)

"a true indicator of the alphabetic principle and basic phonics" (here)

etc., etc., etc.

I would not have taken the topic of nonwords seriously and posted on it, had not AntC pointed out that it is actually being applied in the classroom in New Zealand.


Brown Revisited

A couple of months ago, I told you about a project to recreate the Supreme Court oral arguments associated with Brown v. Board of Education ("Spontaneous SCOTUS", 3/2/2024):

Years ago, Jerry Goldman (then at Northwestern) created the Oyez website as

 a multimedia archive devoted to making the Supreme Court of the United States accessible to everyone. It is the most complete and authoritative source for all of the Court’s audio since the installation of a recording system in October 1955. Oyez offers transcript-synchronized and searchable audio, plain-English case summaries, illustrated decision information, and full-text Supreme Court opinions

He rescued decades of tapes and transcripts from the National Archives, digitized and improved them, and arranged the website's interactive presentations of the available recordings. Jiahong Yuan and I played a role by devising and validating a program to identify which justice was speaking when (see "Speaker Identification on the Scotus Corpus", 2008).

More recently, Jerry has inspired an effort to recreate oral arguments from famous cases that took place before the recording system was installed, starting with Brown v. Board of Education. Rejecting the idea of producing "deep fakes" using the existing transcripts and extant recordings of the justices involved, he and his colleagues decided to create what we might call "shallow fakes", where actors will perform (selections from) the transcripts, and a voice morphing system will then be used to make their recordings sound like the target speakers. The recreated clips will be embedded in explanatory material.

All the scripts have been written, and in a few months, you'll be able to hear the results — which I expect will be terrific.

And here it is, at!
