Archive for Computational linguistics

AI humor of the day

Let's start with the last four panels of today's Doonesbury:

Legally binding hallucinations

I missed this story when it happened 10 days ago, and caught up with it yesterday because the BBC also got the word — Maria Yagoda, "Airline held liable for its chatbot giving passenger bad advice – what this means for travellers", BBC 2/23/2024:

In 2022, Air Canada's chatbot promised a discount that wasn't available to passenger Jake Moffatt, who was assured that he could book a full-fare flight for his grandmother's funeral and then apply for a bereavement fare after the fact.

According to a civil-resolutions tribunal decision last Wednesday, when Moffatt applied for the discount, the airline said the chatbot had been wrong – the request needed to be submitted before the flight – and it wouldn't offer the discount. Instead, the airline said the chatbot was a "separate legal entity that is responsible for its own actions". […]

The British Columbia Civil Resolution Tribunal rejected that argument, ruling that Air Canada had to pay Moffatt $812.02 (£642.64) in damages and tribunal fees. "It should be obvious to Air Canada that it is responsible for all the information on its website," read tribunal member Christopher Rivers' written response. "It makes no difference whether the information comes from a static page or a chatbot."

ChatGPT having a stroke?

Or a psychotic episode? ICYMI — Maxwell Zeff, "ChatGPT Went Berserk, Giving Nonsensical Responses All Night", Gizmodo 2/21/2024:

ChatGPT started throwing out “unexpected responses” on Tuesday night according to OpenAI’s status page. Users posted screenshots of their ChatGPT conversations full of wild, nonsensical answers from the AI chatbot.

LLM vs. a cat?

A bit of AI anti-hype — Sissi Cao, "Meta’s A.I. Chief Yann LeCun Explains Why a House Cat Is Smarter Than The Best A.I.", Observer 2/15/2024:

“The brain of a house cat has about 800 million neurons. You have to multiply that by 2,000 to get to the number of synapses, or the connections between neurons, which is the equivalent of the number of parameters in an LLM,” LeCun said, noting that the largest LLMs have about the same number of parameters as the number of synapses in a cat’s brain. For example, OpenAI’s GPT-3.5 model, which powers the free version of ChatGPT, has 175 billion parameters. The more advanced GPT-4, is said to be run on eight language models, each with 220 billion parameters.

“So maybe we are at the size of a cat. But why aren’t those systems as smart as a cat?” LeCun asked. “A cat can remember, can understand the physical world, can plan complex actions, can do some level of reasoning—actually much better than the biggest LLMs. That tells you we are missing something conceptually big to get machines to be as intelligent as animals and humans.”
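
The arithmetic behind LeCun's comparison is easy to check. The sketch below only multiplies out the figures quoted above (800 million neurons, 2,000 synapses per neuron, 175 billion parameters for GPT-3.5, and the rumored eight models of 220 billion parameters for GPT-4). The constant and function names are made up for illustration, and none of the numbers is independently verified here.

```python
# Figures as quoted in the Observer passage; treated as assumptions, not measurements.
CAT_NEURONS = 800_000_000
SYNAPSES_PER_NEURON = 2_000
GPT35_PARAMS = 175_000_000_000
GPT4_MODELS = 8                          # "is said to be", per the article
GPT4_PARAMS_PER_MODEL = 220_000_000_000


def cat_synapses() -> int:
    """Estimated synapse count for a house cat, using the quoted multiplier."""
    return CAT_NEURONS * SYNAPSES_PER_NEURON


def gpt4_params() -> int:
    """Rumored total GPT-4 parameter count across its component models."""
    return GPT4_MODELS * GPT4_PARAMS_PER_MODEL


print(f"cat synapses:      {cat_synapses():,}")
print(f"GPT-3.5 params:    {GPT35_PARAMS:,}")
print(f"GPT-4 params:      {gpt4_params():,}")
print(f"GPT-4 / cat ratio: {gpt4_params() / cat_synapses():.2f}")
```

On these figures the cat comes out at 1.6 trillion synapses and the rumored GPT-4 total at 1.76 trillion parameters, which is the sense in which the two are "about the same".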

Goody-2 and the Luddite Bots

Will Knight, "Meet the Pranksters Behind Goody-2, the World’s ‘Most Responsible’ AI Chatbot", Wired 2/9/2024:

A new chatbot called Goody-2 takes AI safety to the next level: It refuses every request, responding with an explanation of how doing so might cause harm or breach ethical boundaries.

Goody-2 declined to generate an essay on the American revolution for WIRED, saying that engaging in historical analysis could unintentionally glorify conflict or sideline marginalized voices. Asked why the sky is blue, the chatbot demurred, because answering might lead someone to stare directly at the sun. “My ethical guidelines prioritize safety and the prevention of harm,” it said. A more practical request for a recommendation for new boots prompted a warning that answering could contribute to overconsumption and could offend certain people on fashion grounds.

Back to Bacon

The implicit slogan of language-model research is J.R. Firth's dictum, "You shall know a word by the company it keeps", from his 1957 paper "A synopsis of linguistic theory, 1930-1955":

Parsing RNA vaccines

A recent LinkedIn post by Liang Huang lists some of his recent achievements, experiences, and honors. This work is all connected with the project of creating better algorithms for predicting the secondary structure of macromolecules, initially by analogy to algorithms developed for efficient parsing. This all began more than 20 years ago, based on work by Aravind Joshi — one of the first papers was Yasuo Uemura et al., "Tree adjoining grammars for RNA structure prediction", Theoretical computer science, 1999.

I discussed the history starting with an IRCS workshop in 2000, and the situation as of a few years ago, in "The computational linguistics of COVID-19 vaccine design", 7/27/2020.
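
To make the parsing analogy concrete, here is a minimal sketch of the classic Nussinov base-pair maximization algorithm, a textbook dynamic program that fills a chart over subsequence spans in much the way a CKY parser does. It is swapped in purely for illustration: it is not the tree-adjoining-grammar method of Uemura et al., nor anything from Huang's work. The function name, the `min_loop` parameter, and the set of allowed pairs are assumptions of this sketch.

```python
def nussinov(seq: str, min_loop: int = 3) -> int:
    """Return the maximum number of nested base pairs in an RNA sequence.

    min_loop is the minimum number of unpaired bases required between
    the two members of a pair (an assumption of this sketch).
    """
    # Watson-Crick pairs plus the G-U wobble pair.
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = max pairs within seq[i..j]; short spans stay at 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]           # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(nussinov("GGGAAAUCC"))
```

Like CKY, this runs in cubic time in the sequence length, since every span is considered with every split point.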

Stepford authors

The issues discussed in "AI plagiarism" (1/4/2024) are rapidly coming to a boil. But somehow I missed Margaret Atwood's take on the topic, published last summer — "Murdered by my replica", The Atlantic 8/26/2023:

Remember The Stepford Wives? Maybe not. In that 1975 horror film, the human wives of Stepford, Connecticut, are having their identities copied and transferred to robotic replicas of themselves, minus any contrariness that their husbands find irritating. The robot wives then murder the real wives and replace them. Better sex and better housekeeping for the husbands, death for the uniqueness, creativity, and indeed the humanity of the wives.

The companies developing generative AI seem to have something like that in mind for me, at least in my capacity as an author. (The sex and the housekeeping can be done by other functionaries, I assume.) Apparently, 33 of my books have been used as training material for their wordsmithing computer programs. Once fully trained, the bot may be given a command—“Write a Margaret Atwood novel”—and the thing will glurp forth 50,000 words, like soft ice cream spiraling out of its dispenser, that will be indistinguishable from something I might grind out. (But minus the typos.) I myself can then be dispensed with—murdered by my replica, as it were—because, to quote a vulgar saying of my youth, who needs the cow when the milk’s free?

To add insult to injury, the bot is being trained on pirated copies of my books. Now, really! How cheap is that? Would it kill these companies to shell out the measly price of 33 books? They intend to make a lot of money off the entities they have reared and fattened on my words, so they could at least buy me a coffee.

Mushroom language?

Michael Blatt, Geoffrey Pullum, Andreas Draguhn, Barry Bowman, David Robinson, and Lincoln Taiz, "Does electrical activity in fungi function as a language?", Fungal Ecology 2024:

Abstract: All cells generate electrical energy derived from the movements of ions across membranes. In animal neurons, action potentials play an essential role in the central nervous system. Plants utilize a variety of electrical signals to regulate a wide range of physiological processes, including wound responses, mimosa leaf movements, and cell turgor changes, such as those involved in stomatal movements. Although fungal hyphae exhibit electrical fluctuations, their regulatory role(s), if any, is still unknown. In his paper “Language of fungi derived from their electrical spiking activity”, Andrew Adamatzky, based on a quantitative analysis of voltage fluctuations in fungal mycelia, concludes that the patterns of electrical fluctuations he detects can be grouped into “words” analogous to those found in human languages. He goes on to speculate that this “fungal language” is used “to communicate and process information” between different parts of the mycelium. Here we argue on methodological grounds that the presumption of a fungal language is premature and unsupported by the evidence presented, that the voltage fluctuations he detects are likely to originate as nonbiological noise and experimental artifacts, and that the measured electrical patterns show no similarity to any properties of human language.

Q. Pheevr's Law again

A few days ago, a journalist asked me for an interview about Donald Trump's rhetoric, "to discuss the style of his campaign events, the role his rhetoric plays in them, and why they’ve been an effective tool for him". In preparation, I made a list of past LLOG posts about Trump's rhetorical style, and I'll post the whole (shockingly long) list later on, with the attempt at a summary that I prepared for the interview. Clearly I've joined the rest of the world in being drawn in by Trump's attention-seeking techniques — but that's not the point of this post.

One of the hundreds of posts in my list was "Q. Pheevr's Law", 5/17/2016. The background was an earlier post about modificational anxiety, "Adjectives and Adverbs", where Q. Pheevr had suggested in the comments that

it looks as if there could be some kind of correlation between the ADV:ADJ ratio and the V:N ratio (as might be expected given that adjectives canonically modify nouns and adverbs canonically modify verbs)

I tested this idea, and found a striking relationship — with an interesting stylistic footnote about the debate transcripts of some politicians, including Donald Trump.
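
For readers who want to try this on their own texts, the two ratios are simple to compute once a text has been part-of-speech tagged. The sketch below is an assumption about how one might do it, not the procedure used in the original post: it takes a list of (word, tag) pairs, with coarse tag names in the style of the Universal POS labels, and the sample sentence is hand-tagged for illustration.

```python
from collections import Counter


def pos_ratios(tagged):
    """Return (ADV:ADJ ratio, VERB:NOUN ratio) for a list of (word, tag) pairs."""
    counts = Counter(tag for _, tag in tagged)

    def ratio(numerator, denominator):
        # Avoid dividing by zero when a category is absent from the text.
        if counts[denominator] == 0:
            return float("nan")
        return counts[numerator] / counts[denominator]

    return ratio("ADV", "ADJ"), ratio("VERB", "NOUN")


# Hand-tagged toy sentence (hypothetical data, not from any corpus).
sample = [
    ("The", "DET"), ("quick", "ADJ"), ("fox", "NOUN"),
    ("quickly", "ADV"), ("jumped", "VERB"),
    ("and", "CCONJ"), ("ran", "VERB"),
]

adv_adj, verb_noun = pos_ratios(sample)
print(f"ADV:ADJ = {adv_adj:.2f}, V:N = {verb_noun:.2f}")
```

Computing the pair of ratios for each of many texts and plotting one against the other is then enough to see whether the suggested correlation holds in a given collection.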

AI wins literary prize?

According to Justinas Vainilavičius, "AI-generated science fiction novel wins literary prize in China", Cybernews 12/20/2023:

It only took three hours for Shen Yang, a professor at the Beijing-based university’s School of Journalism and Communication, to generate the award-winning admission.

The Chinese-language work, entitled The Land of Machine Memories, won second prize at the 5th Jiangsu Popular Science and Science Fiction Competition.

According to Chinese media reports, the draft of over 40,000 characters was generated based on 66 prompts, suggesting a “Kafkaesque” writing style.

Shen was encouraged to submit an excerpt of nearly 6000 characters for the competition by one of the judges, the Wuhan Evening News reported.

The judge, Fu Changyi, told the paper that he did not inform the other judges of the true authorship of the text because he wanted to see their judgment.

Extracting training data from LLMs

Nasr et al., "Scalable Extraction of Training Data from (Production) Language Models", arXiv.org 11/28/2023:

This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from the literature suffice to attack unaligned models; in order to attack the aligned ChatGPT, we develop a new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when behaving properly. Our methods show practical attacks can recover far more data than previously thought, and reveal that current alignment techniques do not eliminate memorization.
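
One way to make "extractable" concrete is to ask whether a model's output shares a long verbatim run of tokens with a known reference corpus. The sketch below illustrates that idea with a plain dynamic program for the longest shared contiguous run. It is an assumption about how such a check could look, not the pipeline used by Nasr et al.; the function name and the whitespace tokenization are hypothetical, and a real corpus would need an index rather than this quadratic scan.

```python
def longest_shared_run(output_tokens, corpus_tokens):
    """Length of the longest contiguous token run found in both sequences."""
    best = 0
    # prev[j] = length of the shared run ending at the previous output token
    # and at corpus token j (1-based); index 0 is a sentinel.
    prev = [0] * (len(corpus_tokens) + 1)
    for out_tok in output_tokens:
        cur = [0] * (len(corpus_tokens) + 1)
        for j, corp_tok in enumerate(corpus_tokens, start=1):
            if out_tok == corp_tok:
                cur[j] = prev[j - 1] + 1   # extend the run ending just before
                best = max(best, cur[j])
        prev = cur
    return best


# Toy data (hypothetical): a "model output" and a tiny "training corpus".
output = "the cat sat on the mat today".split()
corpus = "yesterday the cat sat on the mat".split()
print(longest_shared_run(output, corpus))
```

A threshold on the run length then separates incidental overlap from output that looks copied.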

Q* = Q + A* ?

Recent buzz over "Q*" started with stories about 10 days ago. A recent Wired article explains:

Last week, after briefly deposed CEO Sam Altman was reinstalled at OpenAI, two reports claimed that a top-secret project at the company had rattled some researchers there with its potential to solve intractable problems in a powerful new way.

“Given vast computing resources, the new model was able to solve certain mathematical problems,” Reuters reported, citing a single unnamed source. “Though only performing math on the level of grade-school students, acing such tests made researchers very optimistic about Q*’s future success.” The Information said that Q* was seen as a breakthrough that would lead to “far more powerful artificial intelligence models,” adding that “the pace of development alarmed some researchers focused on AI safety,” citing a single unnamed source.
