Language Log

Unknown language #19

June 16, 2024 @ 10:44 pm · Filed by Victor Mair under Artificial intelligence, Decipherment, Philology

Inscribed sandstone known as the "Singapore Stone", Singapore, 10th–14th century:

Collection of the National Museum of Singapore

(Source; also includes an animated photo that can be rotated 360º in any direction and enlarged or reduced to any size)

"The Language On This 1,000 Year Old Stone is a “Glyph Breaker’s Nightmare.” Scientists Want to Use AI to Crack It", MJ Banias, The Debrief (6/14/24)

For centuries, the Singapore Stone has stood as one of Southeast Asia’s most enigmatic artifacts. Discovered in 1819 at the mouth of the Singapore River, this sandstone slab, inscribed with an unknown script, has baffled historians and linguists alike. Dubbed a “glyph breaker’s nightmare,” researchers are turning to artificial intelligence to unlock the secrets of this ancient relic.

The Singapore Stone, originally a large boulder, was partially destroyed by the British in 1843 for use in building a stone fort. The remaining fragment, now housed in the National Museum of Singapore, bears an inscription in a script that has yet to be deciphered. Scholars have speculated that the stone dates back to between the 10th and 14th centuries, possibly linked to the Majapahit empire or a South Indian rajah. Despite various efforts, the script remains a mystery, with no known parallels in other historical records.

…

“To decipher an undeciphered writing system (as well as to crack a cipher we do not have the key for), the essential requirement is to have enough text available. The Ancient Egyptian Hieroglyphs were deciphered because Champollion was a genius, but also because they were everywhere in Egypt. Same for the Cuneiform writing system and Henry Rawlinson and Edward Hincks,” Dr Francesco Perono Cacciafoco, the lead researcher on this project told The Debrief.

“With the Singapore Stone, we have just a small fragment, plus the reproductions of two other (for now) lost fragments, plus some reproductions of the whole slab, before it was blown up, with not very clear characters and entire sections missing because of erosion. Therefore, the amount we have is very little. Moreover, its text/script is unique and never found anywhere else in the world.”

In 1837, several years before the British blew the stone up, it was hand-drawn by the politician William Bland and philologist James Prinsep. Later, Sir Stamford Raffles, the British East India Company’s administrator and the “founder” of Singapore, attempted to decode it. However, when it was blown up, only three recovered fragments were graphically reproduced before being sent to India.

After the stone and its fragments languished for several centuries in a museum collection, Cacciafoco and his team are now trying to leverage the power of AI to unravel the mysterious script.

“Our work is mainly aimed, for now, at a digital restitution and/or recovery of the full text of the Stone (a possibly reasonable version of it), to have a consistent starting point for frequency analyses, comparisons, and pattern recognitions,” Perono Cacciafoco explained.

Their project, based at Nanyang Technological University and now continuing at Xi’an Jiaotong-Liverpool University in China, employs an AI tool named Read-y Grammarian. This tool is designed to analyze the stone’s text using advanced computational methods, including computer vision, artificial neural networks, and deep learning.

Perono Cacciafoco’s team knows, however, that this is not a simple task. You can’t just ask AI to decode the text. The sheer volume of data that needs to be processed is immense. The original stone, when intact, measured approximately 3 meters by 3 meters and contained 50 to 52 lines of text. However, the fragments and reproductions of the stone are insufficient for comprehensive frequency analyses and pattern recognition. There just isn’t enough stone left, nor are there any other examples out in the world for cross-referencing. So, the team needs to feed the AI model other known languages from the geographic areas it may be from, such as Kawi and Pali, which may or may not be related to the script of the Singapore Stone but can be used as reference points. It’s a lot of data points.

Read-y Grammarian, the AI tool at the heart of this project, is a “prediction machine” developed by Colin Loh, a former engineer and mathematician. The algorithm has been further refined by Dr. Perono Cacciafoco’s colleagues. To simplify the science here, it analyzes various parameters, including the shape, size, and width of the extant characters, the degree of erosion on the stone’s surface, and the length and position of repeated symbol clusters. By comparing these features with well-known writing systems, the AI tool generates possible lines of missing text, which can then be further analyzed for frequency and pattern recognition.
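To make "frequency and pattern recognition" concrete, here is a minimal sketch of the two simplest counts involved: how often each sign occurs, and where clusters of signs repeat. It is not the Read-y Grammarian algorithm, whose internals the article does not describe. It assumes the inscription has already been transcribed into lines of sign labels; the labels (`S1`, `S2`, ...), the sample lines, and the function names are all made up for illustration.

```python
from collections import Counter, defaultdict


def sign_frequencies(lines):
    """Count how often each sign label occurs across all lines."""
    return Counter(sign for line in lines for sign in line)


def repeated_clusters(lines, n):
    """Return every n-sign cluster that occurs more than once,
    mapped to the (line, offset) positions where it starts."""
    positions = defaultdict(list)
    for row, line in enumerate(lines):
        for col in range(len(line) - n + 1):
            positions[tuple(line[col:col + n])].append((row, col))
    return {cluster: pos for cluster, pos in positions.items() if len(pos) > 1}


# Hypothetical transcription: each inner list is one line of sign labels.
sample = [
    ["S1", "S2", "S3", "S1", "S2"],
    ["S4", "S1", "S2", "S3"],
]

freqs = sign_frequencies(sample)
clusters = repeated_clusters(sample, 3)
```

With so little surviving text, counts like these are statistically fragile, which is why the team wants a plausible reconstruction of the full inscription before running them.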

“This process is what Philologists do ‘by hand’ with ancient manuscripts, trying to fill the gaps in text based on the contents of a work and on the lines and writing,” Dr. Perono Cacciafoco said. “The ‘machine’ can produce a mountain of mistakes and negative results, and no text is ‘final’, but its ‘products’ are unbiased and based only on data. This is fundamental in an exercise in Crypto-linguistics like the one we are performing. We cannot allow our own ideas and postulations on the script of the Stone to compromise our analysis.”

Using high-definition 3D scans of the fragments and the drawn images of the stone, they built a digital model that the AI will use to “learn.” As it moves through the process of deciphering, each read of the stone’s text will hopefully improve its ability to make predictions as to what fits where. With a reasonable reconstruction in hand, the team can begin to try to decipher what the symbols mean.

…

Under such circumstances, in order for a decipherment to be successful, researchers must have at least some clue linking the inscription to a known language.

One of the most significant hurdles in deciphering the Singapore Stone is the “unknown writing system” and “unknown language” dichotomy. This combination is a nightmare for crypto linguists, as it provides no phonetic or linguistic clues. However, history has shown that human ingenuity can prevail in such situations. The decipherment of Egyptian hieroglyphs by Jean-François Champollion and Linear B by Michael Ventris are prime examples of breakthroughs achieved through a combination of genius and fortunate discoveries.

Dr. Perono Cacciafoco’s team hopes for a similar “lucky match”—a recognizable name or phrase that could provide a key to unlocking the script. This could be the name of a king, a deity, or a place, which, once identified, could help decode other parts of the text. This creates a sort of rainfall moment, and suddenly, that one match leads to other matches, and a cascade occurs. This assumes the symbols on the stone aren’t completely unique.
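The "lucky match" idea can be sketched in a few lines: slide a candidate name along the text, keep the places where the signs repeat in the same pattern as the name's syllables, and carry the resulting tentative sign values over to the rest of the text. This is only an illustration under strong assumptions, namely that the script is syllabic and already transcribed into sign labels. The sign labels, the crib syllables, and the function names are all hypothetical, and a pattern match alone is weak evidence that scholars would still have to test.

```python
def pattern(seq):
    """Repetition pattern of a sequence, e.g. A B A -> (0, 1, 0)."""
    seen = {}
    return tuple(seen.setdefault(item, len(seen)) for item in seq)


def crib_matches(signs, syllables):
    """Yield (offset, mapping) wherever a window of signs repeats
    in the same pattern as the syllables of the candidate name."""
    n = len(syllables)
    target = pattern(syllables)
    for i in range(len(signs) - n + 1):
        window = signs[i:i + n]
        if pattern(window) == target:
            yield i, dict(zip(window, syllables))


def apply_mapping(signs, mapping):
    """Read the text with the tentative values; unknown signs become '?'."""
    return [mapping.get(sign, "?") for sign in signs]


# Hypothetical text and a made-up three-syllable crib.
text = ["S5", "S1", "S2", "S1", "S3", "S2", "S4"]
crib = ["ka", "ta", "ka"]

matches = list(crib_matches(text, crib))
offset, mapping = matches[0]
decoded = apply_mapping(text, mapping)
```

Each sign value fixed this way narrows the readings of neighbouring words, which is the cascade the article describes.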

…

AI is under scrutiny. Does it have any sense of values or of what's good and bad? Or does it simply attempt to solve problems with the data it has mastered? In the case of deciphering an unknown language like that on the Singapore Stone, will it cheat and claim it has done so when it really has not? Will it admit defeat and say, "This is beyond me. I cannot read this unless you give me more information"?

Selected readings

"Unknown language #18" (6/3/24)
"(Linear) A/B testing" (5/18/19)

[Thanks to Don Keyser]

7 Comments

Chris Button said,

June 17, 2024 @ 4:36 am

… unknown writing system …

But as noted in the article, the script is known to be Kawi (Old Javanese). And even I can clearly see the similarities with Burmese, Khmer, Mon, Pyu, etc.

Victor Mair said,

June 17, 2024 @ 5:41 am

Right you are, Chris.

David Marjanović said,

June 17, 2024 @ 11:14 am

So the text can be read just fine, it's just the language that isn't understood?

AI is under scrutiny. Does it have any sense of values or of what's good and bad?

…no, why? Did anyone claim it did?

Or does it simply attempt to solve problems with the data it has mastered?

It doesn't attempt to solve problems. It's just the next level of automatic text prediction.

In the case of deciphering an unknown language like that on the Singapore Stone, will it cheat and claim it has done so when it really has not?

It will not claim anything. It will produce a text without comment, because it isn't able to comment. And then it'll be up to the linguists to figure out if the text is plausible.

Will it admit defeat and say, "This is beyond me. I cannot read this unless you give me more information."

It is entirely incapable of doing that.

Chris Button said,

June 17, 2024 @ 8:09 pm

@ David Marjanović

Straying off topic, but it's worth bearing in mind that predictive AI and generative AI are different, and that transformer-based genAI is neither restricted to generative pre-trained transformers (GPTs) nor is it the only kind of genAI.

Benjamin E. Orsatti said,

June 18, 2024 @ 7:30 am

Chris B.,

Eh, for those of us playing along at home, what is "transformer-based genAI" (other than that it's evidently "more than meets the eye")? My understanding of AI in general is that you feed the machine enough internet until it learns how to bullshit convincingly. If that's "generative AI", then what's "predictive AI"?

Chris Button said,

June 18, 2024 @ 9:41 pm

I think you are referring to GPTs specifically. GPTs are what have grabbed the headlines.

Sort of like cryptocurrencies (like bitcoin) have done with the much broader area of cryptoassets in general.

Benjamin Ernest Orsatti said,

June 19, 2024 @ 7:49 am

Thanks, Chris B. I wish the media would be more precise about these kinds of things so us laymen can keep our LLM's and GPT's straight!
