Archive for Artificial intelligence

Reading Old Turkic runiform inscriptions with the aid of 3D simulation

"Augmenting parametric data synthesis with 3D simulation for OCR on Old Turkic runiform inscriptions: A case study of the Kül Tegin inscription", Mehmet Oğuz Derin and Erdem Uçar, Journal of Old Turkic Studies (7/21/24)

Abstract

Optical character recognition for historical scripts like Old Turkic runiform script poses significant challenges due to the need for abundant annotated data and varying writing styles, materials, and degradations. The paper proposes a novel data synthesis pipeline that augments parametric generation with 3D rendering to build realistic and diverse training data for Old Turkic runiform script grapheme classification. Our approach synthesizes distance field variations of graphemes, applies parametric randomization, and renders them in simulated 3D scenes with varying textures, lighting, and environments. We train a Vision Transformer model on the synthesized data and evaluate its performance on the Kül Tegin inscription photographs. Experimental results demonstrate the effectiveness of our approach, with the model achieving high accuracy without seeing any real-world data during training. We finally discuss avenues for future research. Our work provides a promising direction to overcome data scarcity in Old Turkic runiform script.
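To make the pipeline in the abstract concrete, here is a toy 2D sketch of the synthesis step it describes: perturb a grapheme's distance field parametrically, then composite the result onto a noisy background as a stand-in for the full 3D rendering with textures and lighting. Everything in the sketch (function names, parameter ranges, the thresholding) is an assumption for illustration, not the authors' implementation.

    # Toy 2D sketch of the synthesis idea: a perturbed grapheme distance field
    # composited onto a noisy background. The paper's pipeline renders glyphs in
    # simulated 3D scenes; this is only a schematic stand-in.
    import numpy as np

    rng = np.random.default_rng(0)

    def grapheme_sdf(size=64):
        """Toy signed distance field for a vertical stroke (stand-in for a real glyph SDF)."""
        _, xs = np.mgrid[0:size, 0:size]
        return np.abs(xs - size // 2).astype(float) - 4.0  # negative inside the stroke

    def synthesize_sample(size=64):
        sdf = grapheme_sdf(size)
        sdf -= rng.uniform(-2.0, 2.0)                    # parametric stroke-thickness jitter
        sdf += rng.normal(0.0, 0.8, sdf.shape)           # surface-degradation noise
        glyph = (sdf < 0).astype(float)                  # rasterize the perturbed field
        background = rng.uniform(0.4, 0.7, sdf.shape)    # stand-in for a rendered stone texture
        contrast = rng.uniform(0.1, 0.4)                 # weak incised-vs-surface contrast
        return np.clip(background - contrast * glyph, 0.0, 1.0)

    batch = np.stack([synthesize_sample() for _ in range(8)])  # e.g. one training mini-batch
    print(batch.shape)  # (8, 64, 64)

A classifier such as the Vision Transformer mentioned in the abstract would then be trained on batches like this, with each grapheme's label carried along from the synthesis step.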

Read the rest of this entry »

Comments (1)

Government dampers on AI in the PRC, part 2

"China deploys censors to create socialist AI:  Large language models are being tested by officials to ensure their systems ‘embody core socialist values’", by Ryan McMorrow and Tina Hu in Beijing, Financial Times (July 17 2024)

Chinese government officials are testing artificial intelligence companies’ large language models to ensure their systems “embody core socialist values”, in the latest expansion of the country’s censorship regime.

The Cyberspace Administration of China (CAC), a powerful internet overseer, has forced large tech companies and AI start-ups including ByteDance, Alibaba, Moonshot and 01.AI to take part in a mandatory government review of their AI models, according to multiple people involved in the process.

The effort involves batch-testing an LLM’s responses to a litany of questions, according to those with knowledge of the process, with many of them related to China’s political sensitivities and its President Xi Jinping.

Read the rest of this entry »

Comments (9)

Government dampers on AI in the PRC

"China Puts Power of State Behind AI—and Risks Strangling It:  Government support helps China’s generative AI companies gain ground on U.S. competitors, but political controls threaten to weigh them down", by Lia Lin, WSJ (7/16/24)

Most generative AI models in China need to obtain the approval of the Cyberspace Administration of China before being released to the public. The internet regulator requires companies to prepare between 20,000 and 70,000 questions designed to test whether the models produce safe answers, according to people familiar with the matter. Companies must also submit a data set of 5,000 to 10,000 questions that the model will decline to answer, roughly half of which relate to political ideology and criticism of the Communist Party.

Generative AI operators have to halt services to users who ask improper questions three consecutive times or five times total in a single day.
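Read literally, the suspension rule is a small state machine: track consecutive and daily totals of flagged prompts, and cut the user off when either threshold is reached. The toy sketch below is just one way to read that wording, not any operator's actual implementation.

    # Toy reading of the rule quoted above: halt after three consecutive
    # "improper" prompts, or five in total within one day. Illustrative only.
    class DailyModerationCounter:
        def __init__(self):
            self.consecutive = 0
            self.total_today = 0

        def record(self, prompt_is_improper: bool) -> bool:
            """Return True if service should be halted for this user."""
            if prompt_is_improper:
                self.consecutive += 1
                self.total_today += 1
            else:
                self.consecutive = 0
            return self.consecutive >= 3 or self.total_today >= 5

    counter = DailyModerationCounter()
    for flagged in [True, False, True, True, True]:  # example day of prompts
        if counter.record(flagged):
            print("halt service")
            break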

Read the rest of this entry »

Comments (6)

OpenAI blocks API traffic from China

Screenshot of emails circulating on social media:


Read the rest of this entry »

Comments

Singing Presidents (a triumph of Chinese AI)

Read the rest of this entry »

Comments (1)

Are LLMs writing PubMed articles?

Kyle Orland, "The telltale words that could identify generative AI text", ars technica 7/1/2024

In a pre-print paper posted earlier this month, four researchers from Germany's University of Tübingen and Northwestern University said they were inspired by studies that measured the impact of the COVID-19 pandemic by looking at excess deaths compared to the recent past. By taking a similar look at "excess word usage" after LLM writing tools became widely available in late 2022, the researchers found that "the appearance of LLMs led to an abrupt increase in the frequency of certain style words" that was "unprecedented in both quality and quantity."

To measure these vocabulary changes, the researchers analyzed 14 million paper abstracts published on PubMed between 2010 and 2024, tracking the relative frequency of each word as it appeared across each year. They then compared the expected frequency of those words (based on the pre-2023 trendline) to the actual frequency of those words in abstracts from 2023 and 2024, when LLMs were in widespread use.

The results found a number of words that were extremely uncommon in these scientific abstracts before 2023 that suddenly surged in popularity after LLMs were introduced. The word "delves," for instance, shows up in 25 times as many 2024 papers as the pre-LLM trend would expect; words like "showcasing" and "underscores" increased in usage by nine times as well. Other previously common words became notably more common in post-LLM abstracts: the frequency of "potential" increased 4.1 percentage points; "findings" by 2.7 percentage points; and "crucial" by 2.6 percentage points, for instance.
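The "expected vs. actual" comparison behind those numbers is easy to reproduce in miniature: fit a trendline to a word's pre-2023 frequencies, extrapolate it into 2023 and 2024, and see how far observed usage exceeds it. In the sketch below the per-year frequencies are invented, and the linear trendline is an assumption rather than necessarily the paper's exact model.

    # Back-of-the-envelope "excess word usage": compare observed 2023-2024
    # frequencies with a pre-2023 trendline. Frequencies here are made up;
    # the paper computes them from ~14 million PubMed abstracts.
    import numpy as np

    years = np.arange(2010, 2025)
    observed = np.array([0.0010, 0.0010, 0.0011, 0.0011, 0.0012, 0.0012, 0.0013,
                         0.0013, 0.0014, 0.0014, 0.0015, 0.0015, 0.0016,
                         0.0090, 0.0400])  # fraction of abstracts containing the word

    pre = years < 2023
    slope, intercept = np.polyfit(years[pre], observed[pre], deg=1)  # pre-LLM linear trend
    expected = slope * years + intercept

    for y in (2023, 2024):
        i = int(np.where(years == y)[0][0])
        print(f"{y}: observed {observed[i]:.4f}, expected {expected[i]:.4f}, "
              f"excess ratio {observed[i] / expected[i]:.1f}x")

For already common words like "potential", the percentage-point figures quoted above correspond to the difference between observed and expected frequency rather than the ratio.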

Read the rest of this entry »

Comments (12)

Astonishing new Google Translate, with the help of generative AI

Google Translate adds Cantonese support, thanks to AI advancement: “Cantonese has long been one of the most requested languages for Google Translate. Because Cantonese often overlaps with Mandarin in writing, it’s tricky to find data and train models,” Google said. By Tom Grundy, Hong Kong Free Press (June 30, 2024).

The Google Translate app has been expanded to include Cantonese, thanks to generative Artificial Intelligence (AI) advancements.

In 2022, Google began using Zero-Shot Machine Translation to expand its pool of supported languages. The machine learning model learns to translate into another language without ever seeing an example, Google said in a Thursday blog post. Now it is using AI to expand the number of supported languages.

It added 110 new languages this week, in its largest-ever expansion, thanks to its PaLM 2 large language model.

Read the rest of this entry »

Comments (9)

Stochastic parrots extended

Philip Resnik, "Large Language Models are Biased Because They Are Large Language Models", arXiv.org 6/19/2024:

This paper's primary goal is to provoke thoughtful discussion about the relationship between bias and fundamental properties of large language models. We do this by seeking to convince the reader that harmful biases are an inevitable consequence arising from the design of any large language model as LLMs are currently formulated. To the extent that this is true, it suggests that the problem of harmful bias cannot be properly addressed without a serious reconsideration of AI driven by LLMs, going back to the foundational assumptions underlying their design.

Read the rest of this entry »

Comments (33)

Unknown language #19

Inscribed sandstone known as the "Singapore Stone", Singapore, 10th–14th century:


Collection of the National Museum of Singapore

(Source; also includes an animated photo that can be rotated 360° in any direction and enlarged or reduced to any size)

Read the rest of this entry »

Comments (7)

AI plagiarism again

Along with concerns about hallucinations and learned bias, there's increasing evidence that generative AI systems sometimes commit what would obviously be plagiarism if a human did it. One particularly striking example is discussed in a recent article by Randall Lane, editor of Forbes Magazine: "Why Perplexity’s Cynical Theft Represents Everything That Could Go Wrong With AI", 6/11/2024:

For most of this year, two of our best journalists, Sarah Emerson and Rich Nieva, have been reporting on former Google CEO Eric Schmidt’s secretive drone project, including a June 6 story detailing the company’s ongoing testing in Silicon Valley suburb Menlo Park as well as the frontlines of Ukraine. The next day, Perplexity published its own “story,” utilizing a new tool they’ve developed that was extremely similar to Forbes’ proprietary article. Not just summarizing (lots of people do that), but with eerily similar wording, some entirely lifted fragments — and even an illustration from one of Forbes’ previous stories on Schmidt. More egregiously, the post, which looked and read like a piece of journalism, didn’t mention Forbes at all, other than a line at the bottom of every few paragraphs that mentioned “sources,” and a very small icon that looked to be the “F” from the Forbes logo – if you squinted. It also gave similar weight to a “second source” — which was just a summary of the Forbes story from another publication.

Read the rest of this entry »

Comments (2)

ChatGPT is bullshit

So say Michael Townsen Hicks, James Humphries & Joe Slater — "ChatGPT is bullshit", Ethics and Information Technology 2024.

The background is Harry Frankfurt's philosophical definition of the term in his essay "On Bullshit":

What bullshit essentially misrepresents is neither the state of affairs to which it refers nor the beliefs of the speaker concerning that state of affairs. Those are what lies misrepresent, by virtue of being false. Since bullshit need not be false, it differs from lies in its misrepresentational intent. The bullshitter may not deceive us, or even intend to do so, either about the facts or about what he takes the facts to be. What he does necessarily attempt to deceive us about is his enterprise. His only indispensably distinctive characteristic is that in a certain way he misrepresents what he is up to.

This is the crux of the distinction between him and the liar. Both he and the liar represent themselves falsely as endeavoring to communicate the truth. The success of each depends upon deceiving us about that. But the fact about himself that the liar hides is that he is attempting to lead us away from a correct apprehension of reality; we are not to know that he wants us to believe something he supposes to be false. The fact about himself that the bullshitter hides, on the other hand, is that the truth-values of his statements are of no central interest to him; what we are not to understand is that his intention is neither to report the truth nor to conceal it. This does not mean that his speech is anarchically impulsive, but that the motive guiding and controlling it is unconcerned with how the things about which he speaks truly are.

Read the rest of this entry »

Comments (20)

Povinelli et al. on "Reinterpretation"

In yesterday's "AI deception?" post, I proposed that we ought to apply to AI an analogy to the philosophical evaluation of "theory of mind" issues in animals. And one of the clearest presentations of that evaluation is in Daniel Povinelli,  Jesse Bering, and Steve Giambrone, "Toward a science of other minds: Escaping the argument by analogy" (2000). You should read the whole thing — and maybe look through some of the many works that have cited it. But today I'll just present some illustrative quoted passages.

Read the rest of this entry »

Comments (3)

AI deception?

Noor Al-Sibai, "AI Systems Are Learning to Lie and Deceive, Scientists Find", Futurism 6/7/2024:

AI models are, apparently, getting better at lying on purpose.

Two recent studies — one published this week in the journal PNAS and the other last month in the journal Patterns — reveal some jarring findings about large language models (LLMs) and their ability to lie to or deceive human observers on purpose.

Read the rest of this entry »

Comments (11)