Language Log

ChatGPT does cuneiform studies

May 21, 2023 @ 5:32 am · Filed by Victor Mair under Artificial languages, Romanization, Transcription

We have seen ChatGPT tell stories (and variants of the stories it tells), fancify Coleridge's famous poem on Xanadu, pose a serious challenge to the Great Firewall of China, mimic VHM, write Haiku, and perform all manner of amazing feats. In a forthcoming post, we will witness its efforts to translate Chinese poetry. Today, we will watch ChatGPT make a credible foray into Akkadiology.

Translating old clay tablet by using chatGPT

Jan Romme, Jan's Stuff (5/15/23)

The author commences:

You might have heard how I asked chatGPT to pose as a Jehovah’s Witness, write a “witnessing letter” with 2 or 3 bible scriptures in it, and then translate that letter into an English rap song, Eminem style. Or you might have missed that news. My point is, I like to play with AI’s.

I’m increasingly stupefied by how much AI models like OpenAI’s chatGPT, Google’s BARD, and Facebooks LLaMMa and others are capable of.

Romme tells how in 2008 he saw a clay tablet at the British Museum's “Babylon: Myth and Reality” exhibition. At the time, he failed to take a photograph because, as he says, a Karen prevented him from doing so. Her intervention so flummoxed him that he did not even take down basic notes about the tablet. Later, upon recollecting that the tablet pertained to the Biblical account of King Manasseh of Judah (716–662 B.C.E.), he felt compelled to track it down.

The Bible, in the second book of Chronicles 33:10, 11 tells us:

10 Jehovah kept speaking to Manasseh and his people, but they paid no attention.

11 So Jehovah brought against them the army chiefs of the king of Assyria, and they captured Manasseh with hooks and bound him with two copper fetters and took him to Babylon.

At the time, the city of Babylon was under the control of Assyria. The clay tablet that I mentioned above is only making a reference to Manasseh.

The tablet was written, or more likely, dictated, by an Assyrian prince who happens to be in jail with Manasseh in an adjacent cell. And this prince is complaining, because he is still in prison while this rebellious has-been king that was in the cell next to his, has now been released from prison and on his way back to his country, back to king-ing again!

Why is this tablet so fascinating?

Second Chronicles chapter 33 continues:

12 In his distress, he begged Jehovah his God for favor and kept humbling himself greatly before the God of his forefathers.

13 He kept praying to Him, and He was moved by his entreaty and heard his request for favor, and He restored him to Jerusalem to his kingship. Then Manasseh came to know that Jehovah is the true God.

So King Manasseh does a remarkable thing while in prison! He feels genuinely sorry for his sins and repents. He keeps praying to his God, till he is restored as king in Judah.

Assyrians were never that lenient. Never. But this time, they were. And there is evidence in that clay tablet.

That is why Romme felt obliged to find that tablet, prompting him to call upon ChatGPT for assistance.

There are multiple archives and search engines that make an effort in indexing all known cuneiform clay tablets from long-gone civilizations like Mitanni, Assyria, Babylonia, Elam and so forth.

Romme tried his luck with the Cuneiform Digital Library Initiative (CDLI), but hit a brick wall because he got 0 hits. Apparently, although many tablets have been digitized, there is either no digital translation of the tablet Romme was after, or it might be copyrighted and thus not on an open platform like the one he was searching.

So he realized that he needed the Akkadian cuneiform spelling of “Manasseh” in order to search the database he was using.

He writes, "I try to guide chatGPT into the right state of mind, by starting easy."

From here on, for most of the rest of the post, Romme provides screenshots of his dialog with ChatGPT, with him asking leading questions, such as "Do you have access to catalogues of ancient clay tablets from Babylonia, Assyria and Mitanni?" and ChatGPT providing informed and detailed, but honest and straightforward responses.

No breakthroughs right off the bat, but things get a bit more exciting when Romme asks: "Can you speculate on how the biblical name of king Manasseh of Judah would have been written in Babylonian Cuneiform?"

Here ChatGPT begins to display its erudition, including command of certain materials in cuneiform. It comes up with various relevant proposals and variant spellings. ChatGPT's reasoning in all of this is quite sophisticated.

Romme attempts to take into account the difference between Sumerian and Akkadian, and proceeds to interrogate ChatGPT: "Can you tell me the meaning of the akkadian word Lugal ?"

This leads to a serious discussion of kingship in Mesopotamian societies, but doesn't help solve the problem of the identity (spelling) of King Manasseh of Judah on the elusive clay tablet, because, as Romme admitted when someone pointed it out to him, "Lugal" is a Sumerian word, not Akkadian.

At least, though, with the help of ChatGPT, he now has three variant spellings of the king's name:

me-na-si-i
ma-na-si-i
ma-an-si-i

By using these spellings, Romme gets a lot of hits when he submits them to CDLI.
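The variant-spelling search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it scans transliterated lines that are already held locally, it does not call CDLI or any other real search service, and the names `VARIANTS`, `build_pattern`, and `find_hits` are hypothetical.

```python
import re

# The three variant spellings quoted above.
VARIANTS = ["me-na-si-i", "ma-na-si-i", "ma-an-si-i"]


def build_pattern(variants):
    """Compile one regex that matches any variant as a complete sign sequence."""
    alternatives = "|".join(re.escape(v) for v in variants)
    # The lookarounds reject a match that starts or ends in the middle of a
    # longer hyphenated word (an assumption about what should count as a hit).
    return re.compile(rf"(?<![\w-])(?:{alternatives})(?![\w-])", re.IGNORECASE)


def find_hits(lines, variants=VARIANTS):
    """Return (line_number, matched_spelling, line) for every hit in `lines`."""
    pattern = build_pattern(variants)
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in pattern.finditer(line):
            hits.append((number, match.group(0).lower(), line.strip()))
    return hits
```

Whether a spelling embedded in a longer word should count as a hit is a judgment call; this sketch takes the stricter reading, so a broader search would need to loosen the lookarounds.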

Romme presents the first hit, nine lines of romanized Akkadian, to ChatGPT and asks the bot whether it can translate the text into English.

ChatGPT replies: "Certainly! Here is the translation of the provided text", and rattles it off in respectable English, then apologizes: "Please note that this translation is based on my understanding of the Akkadian language and grammar. Variations in transliteration, content, and interpretation may exist" — prompting Romme to exclaim:

Holy Moly!

It can translate not just words but whole Akkadian texts as well?!?

Too bad this is not the King Manasseh of Judah, but instead "Menasî of Assur".

I don’t have time now to go through all these clay tablets and translate them with chatGPT, but I’m dreaming of a plugin that automatically translates and publishes the translation of ancient cuneiform clay tablets.

These draft translations can then be proofread by actual translators who are knowledgable in Akkadian, Sumerian, Elamite and all the other long-dead languages that once wrote in cuneiform. Imagine a chatGPT version 5 or 6 that has been trained on these refined and checked translations. One day, we could potentially outsource most if not all translating to machines and we, humans, could do the much more fun work of reading and commenting on these long silent texts.

Keep dreaming kids, and keep building.

PS: After reading some comments on my beloved HN, I changed the text around "Lugal" a bit and added the next to last paragraph. In no way do I believe chatGPT can today translate Akkadian (or other texts with small training sets) correctly, but I hope that in a future day, it will be.

What I see happening is that ChatGPT can already do much of the grunt work and draft translations, lessening the burden on human experts. Furthermore, if it can do this passably well for esoteric languages like Akkadian, Sumerian, and Elamite, and somewhat profitably for mind-numbingly difficult languages like Literary Sinitic / Classical Chinese (I have some test cases from Jing Hu and David Cowhig), it is not surprising that it produces close to perfect translations of Modern Standard Mandarin (if I find time I'll give some examples in the coming weeks) and other contemporary languages with large, well-documented databases.

We are only beginning to feel the potential impact of ChatGPT and similar tools. I, for one, welcome their awesome power, but caution that we must keep an eye on them, lest they go astray and make mischief that will be hard to correct later on.

Selected articles

"An example of ChatGPT 'hallucinating'?" (4/16/23)
"Desultory philological, literary, and historical notes on Xanadu" (4/4/23)
"Hallucinations: In Xanadu did LLMs vainly fancify" (4/3/23)
"Detecting LLM-created essays?" (12/20/22)
"Alexa down, ChatGPT up?" (12/8/22)
"Bing gets weird — and (maybe) why" (2/16/23)
"ChatGPT-4: threat or boon to the Great Firewall?" (3/21/23)
"ChatGPT writes VHM" (2/28/23)
"ChatGPT: Theme and Variations" (2/21/23)
"GLM-130B: An Open Bilingual Pre-Trained Model" (1/25/2023)
"ChatGPT writes Haiku" (12/21/22)
"Artificial Intelligence in Language Education: with a note on GPT-3" (1/4/23)
"DeepL Translator" (2/16/23)
"Uh-oh! DeepL in the classroom; it's already here" (2/22/23)
"This is the 4th time I've gotten Jack and his beanstalk" (3/15/23)

[Thanks to Hiroshi Kumamoto]

7 Comments

Peter Grubtal said,

May 21, 2023 @ 6:19 am

Purely an aside, but I have now learnt what "a Karen" is: but from Wikipedia, not ChatGPT.
Museums sometimes have restrictions on photo-taking, but they are difficult to enforce in the days of mobiles.

The article is otherwise fascinating. ChatGPT's achievement is obviously based on the work of many Assyriologists over the years, and one feels there's something in the jibe "high-tech plagiarism". I have loads of respect and gratitude for those who work in the field: they probably need a solid grounding in Semitic languages and knowledge of the unrelated Sumerian, a truly arduous apprenticeship I imagine. All that to be made redundant by a computer.

Carl said,

May 21, 2023 @ 6:45 am

This story was discussed on Hacker News, and they found the quality of translation poor. In general, ChatGPT struggles to stay factual when it has a small corpus. It’s easy to say “looks good” but if you don’t check it meticulously, you can be fooled. It’s a good tool, but using it as a Google replacement leads to disappointment.

Carl said,

May 21, 2023 @ 6:48 am

See the analysis in the comments here: https://news.ycombinator.com/item?id=35954259

Victor Mair said,

May 21, 2023 @ 4:06 pm

From Jane Hickman:

I know there may not be enough examples of Linear A, but this opens up such possibilities for translations.

Victor Mair said,

May 21, 2023 @ 4:08 pm

Brilliant thought, Jane!

Elizabeth Barber said,

May 23, 2023 @ 10:21 am

As I showed in my 1974 book, Archaeological Decipherment, there is a mathematical algorithm showing how much text one needs to PROVABLY accomplish a decipherment for what sort of script. Since 1974, we haven't added enough new text to our pile of LINEAR A to make it over the hump, if the language it hides is unrelated to anything we already know (or if the hidden language, like Semitic, "cross-classifies" its morphemes between consonants and vowels, since each phonological sign in Linear A represents one C and one V). And if it IS hiding some language we already have a linguistic handle on, we are still scarcely up to the top of the hump. So what language, or language family might one try? We already know that Linear A shows virtually nothing in the way of suffixing or other inflection, so it looks very UN-Indo-European.
And note that I said "provably" deciphered. Below that threshold, one can neither prove NOR DISPROVE any purported decipherment!
(A prime case of unprovability is the Phaistos disc, "deciphered" many times, my favorite being as a "double hymn to Zeus and the Minotaur"!!)

David Marjanović said,

May 24, 2023 @ 10:12 am

See the analysis in the comments here: https://news.ycombinator.com/item?id=35954259

Read the whole page!

…though the very best part is right at the beginning: "From 'do robots dream' to 'how do we make them stop'"
