Archive for Artificial intelligence

AI copy editing crapping?

"Evolution journal editors resign en masse to protest Elsevier changes", Retraction Watch 12/27/2024:

All but one member of the editorial board of the Journal of Human Evolution (JHE), an Elsevier title, have resigned, saying the “sustained actions of Elsevier are fundamentally incompatible with the ethos of the journal and preclude maintaining the quality and integrity fundamental to JHE’s success.” […]

Among other moves, according to the statement, Elsevier “eliminated support for a copy editor and special issues editor,” which they interpreted as saying “editors should not be paying attention to language, grammar, readability, consistency, or accuracy of proper nomenclature or formatting.” The editors say the publisher “frequently introduces errors during production that were not present in the accepted manuscript:”

"In fall of 2023, for example, without consulting or informing the editors, Elsevier initiated the use of AI during production, creating article proofs devoid of capitalization of all proper nouns (e.g., formally recognized epochs, site names, countries, cities, genera, etc.) as well italics for genera and species. These AI changes reversed the accepted versions of papers that had already been properly formatted by the handling editors. This was highly embarrassing for the journal and resolution took six months and was achieved only through the persistent efforts of the editors. AI processing continues to be used and regularly reformats submitted manuscripts to change meaning and formatting and require extensive author and editor oversight during proof stage."

Read the rest of this entry »

Comments (12)

Grok (mis-)counting letters again

In a comment on "AI counting again" (12/20/2024), Matt F asked "Given the misspelling of ‘When’, I wonder how many ‘h’s the software would find in that sentence."

So I tried it — and the results are even more spectacularly wrong than Grok's pitiful attempt to count instances of 'e', where the correct count is 50 but Grok answered "21".
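
For comparison, getting an exact count is a one-liner in ordinary code. Here's a minimal sketch using a stand-in sentence (the actual prompts are in the posts linked above, not reproduced here):

```python
# Exact letter counts for a stand-in sentence; the real prompts given to Grok
# are in the linked posts, so the numbers below are only illustrative.
sentence = "When in the Course of human events, it becomes necessary..."
print(sentence.lower().count("e"))   # occurrences of 'e'
print(sentence.lower().count("h"))   # occurrences of 'h'
```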

Read the rest of this entry »

Comments (14)

AI counting again

Following up on "AIs on Rs in 'strawberry'" (8/24/2024), "Can Google AI count?" (9/21/2024), "The 'Letter Equity Task Force'" (12/5/2024), etc., I thought I'd try some of the great new AI systems accessible online. The conclusion: they still can't count, though they can do lots of other clever things.

Read the rest of this entry »

Comments (4)

The Knowledge

Chatting with my London cabbie on a longish ride, I was intrigued by how he frequently referred to "the Knowledge".  He did so respectfully and reverently, as though it were a sacred catechism he had mastered after years of diligent study.  Even though he was speaking, it always sounded as though it came with a capital letter at the beginning.  And rightly so, because it is holy writ for London cabbies.

Read the rest of this entry »

Comments (24)

More AI satire

Read the rest of this entry »

Comments (4)

The "Letter Equity Task Force"

Previous LLOG coverage: "AI on Rs in 'strawberry'", 8/28/2024; "'The cosmic jam from whence it came'", 9/26/2024.

Current satire: Alberto Romero, "Report: OpenAI Spends Millions a Year Miscounting the R’s in ‘Strawberry’", Medium 11/22/2024.

OpenAI, the most talked-about tech start-up of the decade, convened an emergency company-wide meeting Tuesday to address what executives are calling “the single greatest existential challenge facing artificial intelligence today”: Why can’t their models count the R’s in strawberry?

The controversy began shortly after the release of GPT-4 in March 2023, when users on Reddit and Twitter discovered the model’s inability to count the R’s in strawberry. The responses varied from inaccurate guesses to cryptic replies like, “More R’s than you can handle.” In one particularly unhinged moment, the chatbot signed off with, “Call me Sydney. That’s all you need to know.”

Read the rest of this entry »

Comments (19)

AI Overview (sometimes) admits that it doesn't have an answer

When I first encountered AI Overview (AIO) about half a year ago, I was amazed by how it would whirl and swirl while searching for an answer to whatever query I had entered into the Google search engine.  It would usually find a helpful answer within a second.

As the months passed, the response time became more rapid (usually instantaneous), the answers better organized and almost always helpful, but sometimes AIO would simply not answer.

About a week ago, I was stunned when occasionally AIO — after thinking for a split second — would declare that it didn't have an answer for what I had asked about.

Read the rest of this entry »

Comments (16)

The humanities as preparation for the End Times

Comments (13)

Searle's "Chinese room" and the enigma of understanding

In this comment to "'Neutrino Evidence Revisited (AI Debates)' | Is Mozart's K297b authentic?" (11/13/24), I questioned whether John Searle's "Chinese room" argument was intelligently designed and encouraged those who encounter it to reflect on what it did — and did not — demonstrate.

In the same comment, I also queried the meaning of "understand" and its synonyms ("comprehend", and so forth).

Both the "Chinese room" and "understanding" had been raised by skeptics of AI, so here I'm treating them together.

Read the rest of this entry »

Comments (21)

"Neutrino Evidence Revisited (AI Debates)" | Is Mozart's K297b authentic?

[This is a guest post by Conal Boyce]

Recently I watched a video posted by Alexander Unzicker, a no-nonsense physicist who often criticizes Big Science (along the same lines as Sabine Hossenfelder — my hero). But in this case (link below) I was surprised to see Unzicker play back a conversation between himself and ChatGPT, on the subject of the original discovery of neutrinos — where the onslaught of background noise demands very strict screening procedures and care not to show "confirmation bias" (because one wants so badly to be the first one to actually detect a neutrino, thirty years after Pauli predicted them). It is a LONG conversation, between Unzicker and ChatGPT, perfectly coherent and informative, one that I found very pleasant to listen to (he uses the audio option: female voice interleaved with his voice).
 
[VHM note: This conversation between Unzicker and GPT is absolutely astonishing.  Despite the dense technicality of the subject, GPT understands well what he is saying and replies accordingly and naturally.]

Read the rest of this entry »

Comments (51)

Nazca lines

For basic facts, see below.

Thanks to AI and our Japanese colleagues, the study of Peru's mysterious Nazca lines has made a quantum leap forward.

AI Revealed a New Trove of Massive Ancient Symbols
The 2,000-year-old geoglyphs offer clues to ancient Nazca people and their rituals
By Aylin Woodward, Science Shorts, WSJ (Nov. 6, 2024)

Anthropologists have spent decades documenting a mysterious collection of symbols etched into the Peruvian desert, depicting everything from human decapitation and domesticated animals to knife-wielding orcas.

In the past century or so, 430 of these geoglyphs have been found. Now, an analysis using artificial intelligence has nearly doubled the number in just six months.

Constructed primarily by ancient South American people known as the Nazca millennia ago, the geoglyphs, which can be as long as a football field, are concentrated on a roughly 150-square-mile area called the Nazca Pampa. The Nazca people created the geoglyphs in an area unsuitable for farming, removing the black stones that pepper the desert to reveal a layer of white sand beneath. The contrast between tones yielded the geoglyphs.

Much of their mystery lies in how challenging it is to spot them.

“These geoglyphs have been around for at least 2,000 years, during which time dust has accumulated on the white lines and areas, causing their colors to fade,” said Masato Sakai, a professor of anthropology at Yamagata University in Japan and lead author of a study published in the journal Proceedings of the National Academy of Sciences detailing the new discoveries.

The symbols fall into two categories. Larger figurative geoglyphs, known as the Nazca Lines, average about 300 feet in length, Sakai said, while smaller ones, akin to marble reliefs, average just 30 feet.

Read the rest of this entry »

Comments (8)

Psychotic Whisper

Whisper is a widely-used speech-to-text system from OpenAI — and it turns out that generative AI's hallucination problem afflicts Whisper to a surprisingly serious extent, as documented by Allison Koenecke, Anna Seo Gyeong Choi, Katelyn X. Mei, Hilke Schellmann, and Mona Sloane, "Careless Whisper: Speech-to-Text Hallucination Harms", in The 2024 ACM Conference on Fairness, Accountability, and Transparency, 2024:

Abstract: Speech-to-text services aim to transcribe input audio as accurately as possible. They increasingly play a role in everyday life, for example in personal voice assistants or in customer-company interactions. We evaluate Open AI’s Whisper, a state-of-the-art automated speech recognition service outperforming industry competitors, as of 2023. While many of Whisper’s transcriptions were highly accurate, we find that roughly 1% of audio transcriptions contained entire hallucinated phrases or sentences which did not exist in any form in the underlying audio. We thematically analyze the Whisper-hallucinated content, finding that 38% of hallucinations include explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority. We then study why hallucinations occur by observing the disparities in hallucination rates between speakers with aphasia (who have a lowered ability to express themselves using speech and voice) and a control group. We find that hallucinations disproportionately occur for individuals who speak with longer shares of non-vocal durations—a common symptom of aphasia. We call on industry practitioners to ameliorate these language-model-based hallucinations in Whisper, and to raise awareness of potential biases amplified by hallucinations in downstream applications of speech-to-text models.
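
For readers who want to poke at this themselves, here is a minimal sketch using the open-source whisper Python package — not the authors' evaluation pipeline. The audio file name and the reference transcript are placeholders, and a bare word-overlap check is far cruder than the paper's thematic analysis; it just surfaces output words with no counterpart in the reference.

```python
# Rough do-it-yourself check (not the paper's method): transcribe a clip with
# the open-source Whisper model and list output words that never occur in a
# reference transcript. "clip.wav" and the reference string are placeholders.
import whisper

model = whisper.load_model("base")
result = model.transcribe("clip.wav")
hypothesis = result["text"].lower().split()

reference = set("the quick brown fox jumps over the lazy dog".split())
inserted = [w.strip(".,!?") for w in hypothesis if w.strip(".,!?") not in reference]
print("words with no counterpart in the reference:", inserted)
```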

Read the rest of this entry »

Comments (12)

AI Hyperauthorship

This paper's content is interesting — Mirzadeh, Iman, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar. "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models." arXiv preprint arXiv:2410.05229 (2024). In short, the authors found that small changes in Grade-School Mathematics benchmark questions, like substituting different numerical values or adding irrelevant clauses, caused all the tested LLMs to do worse. You should read the whole thing for the details, to which I'll return another time.
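
To make the perturbation idea concrete, here is a toy sketch — my own illustration, not the GSM-Symbolic code or templates. The same word problem is re-instantiated with fresh numerical values, or padded with an irrelevant clause, while the ground-truth answer stays mechanically computable; the finding is that such changes degrade LLM accuracy even though the reasoning required is unchanged.

```python
# Toy illustration of GSM-Symbolic-style perturbations (not the authors' code):
# instantiate a word-problem template with new numbers, optionally appending an
# irrelevant clause, while the correct answer remains mechanically known.
import random

TEMPLATE = ("{name} picks {a} apples on Monday and {b} apples on Tuesday. "
            "How many apples does {name} have in total?")
IRRELEVANT = " Five of the apples are slightly smaller than the rest."

def make_variant(add_irrelevant_clause=False):
    a, b = random.randint(2, 50), random.randint(2, 50)
    question = TEMPLATE.format(name="Mia", a=a, b=b)
    if add_irrelevant_clause:
        question += IRRELEVANT      # should not change the correct answer
    return question, a + b          # (question text, ground-truth answer)

print(make_variant())
print(make_variant(add_irrelevant_clause=True))
```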

Read the rest of this entry »

Comments (3)