Archive for Computational linguistics


Knowing how much I like to invent terms for things that have no name ("topolect", "character amnesia", etc.), and needing a word for the parlance produced by ChatGPT-4 and kindred AI chatbots, Conal Boyce asked me to coin a term for it.  I instantly obliged him by coming up with "pablumese" to designate the sort of language that is unremittingly neutral and takes no stance on any subject or topic it addresses.

Conal liked my invention and responded:

Here's one of the problems with ChatGPT and its brethren: Not only does it spew what Victor calls 'pablumese' but for technical questions it then mixes its pablumese with quantitative nonsense, creating a truly creepy kind of output.

I was curious to see how it would handle the question of how many copper atoms fit into the cross-section of a typical copper wire. It responded in a way that made it sound very knowledgeable, breaking everything down into tiny (sometimes condescending) steps, and yet, at the very end of its perfect logic, it botched its answer, because it was unable to do a conversion between millimeters and picometers correctly.

But here's the kicker: What makes this stuff maximally odious is that the creeps who design it will succeed in taking over the world anyway, because this week "version 4 is astonishingly better than the beta ChatGPT!!!" and version 5 next week will be astonishingly better than…. etc. etc. until they've improved it enough that it really will threaten the jobs of 3/4 of the human race. It must be an absolutely sickening time to be a young person, trying to plan one's career.
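
Conal's copper-wire example is easy to check by hand. The sketch below is a back-of-envelope version under assumed figures (a 1 mm wire diameter and a ~256 pm copper atomic diameter, neither taken from his actual exchange); the point is simply the conversion ChatGPT botched, namely that one millimeter is 10^9 picometers:

```python
# Rough check of the millimeter-to-picometer conversion:
# how many copper atoms span the diameter of a typical wire?
# Assumed figures (not from the original exchange): a 1 mm wire,
# and a copper atomic diameter of ~256 pm (twice the ~128 pm radius).

PM_PER_MM = 1e9  # 1 mm = 1e-3 m and 1 pm = 1e-12 m, so 1 mm = 1e9 pm

wire_diameter_pm = 1.0 * PM_PER_MM   # 1 mm expressed in picometers
atom_diameter_pm = 256.0             # one copper atom, in picometers

atoms_across = wire_diameter_pm / atom_diameter_pm
print(f"{atoms_across:,.0f} atoms across")  # about 3.9 million
```

Under those assumptions, roughly four million atoms fit across the wire's diameter; an error in the mm-to-pm step throws the answer off by orders of magnitude.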

Read the rest of this entry »

Comments (25)

The mind of artificial intelligence

Sean Carroll's Preposterous Universe Podcast #230

Raphaël Millière on How Artificial Intelligence Thinks, March 20, 2023 / Philosophy, Technology, Thinking

Includes a transcript of the two-hour podcast.

Welcome to another episode of Sean Carroll's Mindscape. Today, we're joined by Raphaël Millière, a philosopher and cognitive scientist at Columbia University. We'll be exploring the fascinating topic of how artificial intelligence thinks and processes information. As AI becomes increasingly prevalent in our daily lives, it's important to understand the mechanisms behind its decision-making processes. What are the algorithms and models that underpin AI, and how do they differ from human thought processes? How do machines learn from data, and what are the limitations of this learning? These are just some of the questions we'll be exploring in this episode. Raphaël will be sharing insights from his work in cognitive science, and discussing the latest developments in this rapidly evolving field. So join us as we dive into the mind of artificial intelligence and explore how it thinks.

[The above introduction was artificially generated by ChatGPT.]

Read the rest of this entry »

Comments (5)

ChatGPT-4: threat or boon to the Great Firewall?

"The practical value of LLMs is high enough that it will induce Chinese to seek out the best systems, and they will not be censored by China."

"Yes, the Chinese Great Firewall will be collapsing"

by Tyler Cowen, Marginal Revolution (March 21, 2023)

Something that the PRC censors had not predicted:

As framed from China:

Fang Bingxing, considered the father of China’s Great Firewall, has raised concerns over GPT-4, warning that it could lead to an “information cocoon” as the generative artificial intelligence (AI) service can provide answers to everything.

Fang said the rise of generative AI tools like ChatGPT, developed by Microsoft-backed OpenAI and now released as the more powerful ChatGPT-4 version, poses a big challenge to governments around the world, according to an interview published on Thursday by Red Star News, a media affiliate of the state-backed Chengdu Economic Daily.

“People’s perspectives can be manipulated as they seek all kinds of answers from AI,” he was quoted as saying.

Fang, a computer scientist and former government official, is widely considered the chief designer of China’s notorious internet censorship and surveillance system. He played a key role in creating and developing the Great Firewall, a sophisticated system of internet filters and blocks that allows the Chinese government to control what its citizens can access online.

Comments (4)

This is the 4th time I've gotten Jack and his beanstalk

Bill Benzon shares the response he got from ChatGPT to the prompt, "Tell me a story."

Read the rest of this entry »

Comments (30)

ChatGPT writes VHM

[This is a guest post by ChatGPT]

Write a Language Log blog post by Victor H. Mair

Title: "The Wonders of the Chinese Language"

As one of the world's oldest and most complex languages, Chinese has captivated linguists and language enthusiasts for centuries. From its unique writing system to its vast vocabulary and intricate grammar, Chinese is a fascinating and challenging language that continues to surprise and delight those who study it.

Perhaps the most distinctive feature of Chinese is its writing system. Chinese characters are visual symbols that convey meaning, and they are used in both written and spoken language. Unlike the phonetic alphabets used in many other languages, Chinese characters do not correspond to specific sounds, but rather to concepts or ideas. This makes the learning process much more complex, as students must memorize thousands of characters and learn their meanings and pronunciations.

Read the rest of this entry »

Comments (22)

Vignettes of quality data impoverishment in the world of PRC AI

Some snippets:

Limited data sets a hurdle as China plays catch-up to ChatGPT

Lack of high-quality Chinese texts on Internet a barrier to training AI models.

Ryan McMorrow, Nian Liu, Eleanor Olcott, and Madhumita Murgia, FT, Ars Technica (2/21/23)

Baidu struggled with its previous attempt at a chatbot, known as Plato, which analysts said could not even answer a simple question such as: “When is Alibaba co-founder Jack Ma’s birthday?”

Analysts point to the lack of high-quality Chinese-language text on the Internet and in other data sets as a barrier for training AI software.

GPT, the program underlying ChatGPT, sucked in hundreds of thousands of English academic papers, news articles, books, and social media posts to learn the patterns that form language. Meanwhile, Baidu’s Ernie has been trained primarily on Chinese-language data as well as English-language data from Wikipedia and Reddit.

Read the rest of this entry »

Comments (11)

Uh-oh! DeepL in the classroom; it's already here

Yesterday in my Classical Chinese class, we were reading Ouyang Xiu's (1007-1072) "Discussion on 'Biographies of Eunuchs'" in the New History of the Five Dynasties (written 1036-1039, published 1072).  Here's the relevant passage:

Móu zhī ér bùkě wéi. Wéi zhī ér bùkě chéng. Zhì qí shén zé jù shāng ér liǎng bài. ——“Xīn wǔdài shǐ huàn zhě chuán lùn”

謀之而不可為。為之而不可成。至其甚則俱傷而兩敗。 ——《新五代史宦者傳論》 

[Because of the special circumstances of this post, I will not adhere to my usual custom of providing Pinyin Romanization, Hanzi transcription, and English translation all three together.]

Read the rest of this entry »

Comments (8)

ChatGPT: Theme and Variations

[This is a guest post by Conal Boyce]

Here I’ll recount some recent exchanges I had with ChatGPT. Given the scope of ChatGPT, and the fact that it’s in a self-described intermediate state, our various impressions of it as of February 2023 must be like those of the three blind men examining an elephant — except the elephant is running. In the heart of the professional programmer, ChatGPT creates existential dread since it can spit out in a few seconds a page of code which would have required hours or days for him/her to write and debug — and that only after a lifetime of coding. For the rest of us, for the moment at least, it just provokes curiosity perhaps.

Read the rest of this entry »

Comments (21)

Vocalizations of wolves and justices

Tessa Koumoundouros, "Adorable Study Tests How Dogs Respond to Wild Wolf Calls – And, Yes, There's Footage", ScienceAlert 2/12/2023:

Without convenient access to phones or pens for letter-writing, wolves must rely on howls to communicate long distances. These woeful wails allow the social mammals to maintain their territories as well as keep track of and stay in synchrony with other pack members. […]

A new study exposes family dogs to wolf howls to better understand why some of our canine companions no longer seem to bother with this seemingly important form of dog communication.

Read the rest of this entry »

Comments (9)

Bing gets weird — and (maybe) why

For weeks, everyone was talking about how great the Large Language Model (LLM) ChatGPT is, or else showing that it can make serious mistakes of fact or logic. But since the alliance between OpenAI and Microsoft added (a version of) this LLM to (a version of) Bing, people have been encountering weirder issues. As Mark Frauenfelder pointed out a couple of days ago at BoingBoing, "Bing is having bizarre emotional breakdowns and there's a subreddit with examples". The cited subreddit, r/bing, has examples going back to the start of the alliance. And today, Kevin Roose posted a long series of strikingly strange passages from his own interactions with the chatbot, "Bing's A.I. Chat: 'I Want to Be Alive'", NYT 2/16/2023.

Read the rest of this entry »

Comments (24)

GLM-130B: An Open Bilingual Pre-Trained Model

Description of a General Language Model (GLM) project based at Tsinghua University in Beijing, but with users and collaborators around the world.

Homepage (August 4, 2022)

This prospectus is difficult for outsiders to understand because of its large number of unexplained acronyms, abbreviations, initialisms, and other insider terminology.

GLM-130B is an open bilingual (English & Chinese) bidirectional dense model with 130 billion parameters, pre-trained using the General Language Model (GLM) algorithm. It is designed to support inference tasks with the 130B parameters on a single A100 (40G * 8) or V100 (32G * 8) server. As of July 3rd, 2022, GLM-130B has been trained on over 400 billion text tokens (200B each for Chinese and English) and exhibits the following unique features:

    • Bilingual: supports both English and Chinese.
    • Performance (EN): better than GPT-3 175B (+5.0%), OPT-175B (+6.5%), and BLOOM-176B (+13.0%) on LAMBADA and slightly better than GPT-3 175B (+0.9%) on MMLU.
    • Performance (CN): significantly better than ERNIE TITAN 3.0 260B on 7 zero-shot CLUE datasets (+24.26%) and 5 zero-shot FewCLUE datasets (+12.75%).
    • Fast Inference: supports fast inference on both SAT and FasterTransformer (up to 2.5X faster) with a single A100 server.
    • Reproducibility: all results (>30 tasks) can be easily reproduced with open-sourced code and model checkpoints.
    • Cross-Platform: supports training and inference on NVIDIA, Hygon DCU, Ascend 910, and Sunway.

The model checkpoints of GLM-130B and code for inference are publicly available at our GitHub repo. The code for pre-training and fine-tuning as well as the research paper are coming soon.

Read the rest of this entry »

Comments off

Detecting LLM-created essays?

As I observed in "Alexa down, ChatGPT up?" (12/8/2022), there's reason to fear that LLMs ("Large Language Models") like ChatGPT will force major changes in writing education, by offering a cheap and easy way to complete essay assignments. A small sample of the extensive published discussion:

Stephen Marche, "The College Essay is Dead", The Atlantic 12/6/2022
Daniel Lametti, "A.I. Could Be Great for College Essays", Slate 12/7/2022
Daniel Herman, "ChatGPT will end High School English", The Atlantic 12/9/2022
Beth McMurtrie, "AI and the Future of Undergraduate Writing: Teaching experts are concerned, but not for the reasons you think", The Chronicle of Higher Education 12/13/2022

Of course, various other forms of cheating have been common for hundreds of years, starting with simple plagiarism and ghost-written submissions. The internet has made it easier to find texts to copy or ghostwriters to hire — but modern technology has also brought us plagiarism-detection systems, which catch at least the simplest cases. Will we see effective LLM-detection software?

Read the rest of this entry »

Comments (16)

Alexa down, ChatGPT up?

Two recent developments seem to point in opposite directions. On one hand, there are R&D cutbacks as voice assistants are seen as failures. On the other hand, there's widespread enthusiasm for the impressive capabilities of ChatGPT, including suggestions that it will take over internet search (Ben Cost, "Rise of the bots: ‘Scary’ AI ChatGPT could eliminate Google within 2 years", NY Post 12/6/2022), destroy writing education (Stephen Marche, "The College Essay is Dead", The Atlantic 12/6/2022), and more.

Read the rest of this entry »

Comments (20)