Language Log

Archive for Decipherment

AI to the rescue of a Greek philosopher's work buried by Vesuvius

October 15, 2025 @ 4:43 pm· Filed by Victor Mair under Artificial intelligence, Decipherment, Language and philosophy

A year and a half ago, we learned of the initial AI-assisted decipherment of a charred scroll from the city of Herculaneum that had been buried for nearly two millennia under volcanic ash from the eruption of Mt. Vesuvius in AD 79: "AI (and human ingenuity) to the rescue" (2/6/24).

Since then, researchers have continued to work on the scroll, and they have now identified the precise text on it:

Lost Work of Greek Philosopher Philodemus Unearthed from Herculaneum Scroll
By Tasos Kokkinidis, Greek Reporter (May 6, 2025)

Read the rest of this entry »


AI for reconstructing degraded Latin text

August 9, 2025 @ 6:38 am· Filed by Victor Mair under Artificial intelligence, Decipherment

AI Is Helping Historians With Their Latin
A new tool fills in missing portions of ancient inscriptions from the Roman Empire

By Nidhi Subbaraman Aug. 6, 2025

In recent years, we have encountered many cases of AI assisting (or not) in the decipherment of ancient manuscripts in diverse languages. See several cases listed in the "Selected readings". Now it's Latin's turn to benefit from the ministrations of artificial intelligence.

People across the Roman Empire wrote poetry, kept business accounts and described their conquests and ambitions in inscriptions on pots, plaques and walls.

The surviving text gives historians a rare glimpse of life in those times—but most of the objects are broken or worn.

“It’s like trying to solve a gigantic jigsaw puzzle, only there is tens of thousands more pieces to that puzzle, and about 90% of them are missing,” said Thea Sommerschield, a historian at the University of Nottingham.

Now, artificial intelligence is filling in the blanks.

An AI tool designed by Sommerschield and other European scientists can predict the missing text of partially degraded Latin inscriptions made hundreds of years ago and help historians estimate their date and place of origin.

Read the rest of this entry »


Unknown language #20

July 4, 2025 @ 2:57 pm· Filed by Victor Mair under Conlanging, Decipherment

From Rebecca Turner in Seattle:

Read the rest of this entry »


Decipherment of the Indus script: new angles and approaches, part 4

April 17, 2025 @ 7:17 pm· Filed by Victor Mair under Alphabets, Decipherment, Language and archeology, Writing systems

These are remarks by Ron Vara from here:

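ᱮᱞᱚᱱ ᱨᱤᱣ ᱢᱩᱥᱠ ( /ˈiːlɒn/ EE-lon; ᱡᱟᱱᱟᱢ ᱡᱩᱱ ᱒᱘, ᱑᱙᱗᱑) ᱩᱱᱤ ᱫᱚ ᱢᱤᱫ ᱵᱮᱯᱟᱨᱤᱭᱟᱹ ᱠᱟᱱᱟᱭ ᱚᱠᱚᱭ ᱫᱚ ᱩᱱᱤᱭᱟᱜ ᱢᱩᱲᱩᱫ ᱵᱷᱩᱢᱤᱠᱟ Tesla, Inc., SpaceX, ᱟᱨ ᱴᱩᱭᱴᱚᱨ (ᱡᱟᱦᱟᱸ ᱩᱱᱤ ᱮᱠᱥ ᱞᱮᱠᱟᱛᱮ ᱧᱩᱛᱩᱢ ᱵᱚᱫᱚᱞ ᱮᱱᱟ) ᱨᱮ ᱵᱟᱰᱟᱭᱚᱜ ᱠᱟᱱᱟ᱾

[Roughly: "Elon Reeve Musk (/ˈiːlɒn/ EE-lon; born June 28, 1971) is a businessman known for his leading roles in Tesla, Inc., SpaceX, and Twitter (which he renamed X)."]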

This is the first sentence of the Elon Musk article written in the Santali alphabet (Ol Chiki). Yes, it's an alphabetic writing system, not an abugida. What makes the Santali alphabet really elusive is that it resembles the shapes of the undeciphered Indus Valley script. Soviet archaeologists once tried to decipher Indus Valley Civilization (IVC) seals using the Santali alphabet. It sounds ridiculous, but it's a sad truth that Santali is a unique language to which little or no academic attention has been paid.
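The claim that Ol Chiki is an alphabet rather than an abugida can be checked roughly with Python's standard `unicodedata` module: in an alphabet every vowel is a letter in its own right, whereas in an abugida such as Devanagari a vowel following a consonant is usually written as a combining sign attached to it. The sketch below compares ᱮᱞᱚᱱ, the first word of the Santali sentence above, with the Devanagari spelling एलोन, which is chosen here purely for illustration (an assumption, not taken from the source). Unicode general categories beginning with "M" mark combining signs; this is a quick heuristic, not a full typology of scripts.

```python
import unicodedata


def describe(word):
    """Return (character, Unicode general category) pairs for each code point."""
    return [(ch, unicodedata.category(ch)) for ch in word]


def vowel_signs(word):
    """Count combining marks (categories Mn/Mc/Me), i.e. vowels written as diacritics."""
    return sum(1 for ch in word if unicodedata.category(ch).startswith("M"))


OL_CHIKI_ELON = "ᱮᱞᱚᱱ"    # first word of the Santali sentence quoted above
DEVANAGARI_ELON = "एलोन"  # illustrative Devanagari spelling (assumption)

for label, word in [("Ol Chiki", OL_CHIKI_ELON), ("Devanagari", DEVANAGARI_ELON)]:
    print(label, describe(word))
    print(f"  {len(word)} code points, {vowel_signs(word)} combining vowel sign(s)")
```

In the Ol Chiki word all four code points are ordinary letters (the vowels e and o stand alone), while the Devanagari spelling writes its o as a combining sign on the consonant.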

Read the rest of this entry »


Are all writing systems equally easy / hard?

April 17, 2025 @ 12:15 pm· Filed by Victor Mair under Alphabets, Decipherment, Philology, Writing systems

Some folks seem to think so, but not Benjamin James, who wrote this letter to the London Review of Books, 47.6 (April 3, 2025), p. 4:

Simple Script

In his fascinating article on the recent decipherment of Linear Elamite, Tom Stevenson finds it difficult to accept that 'the Latin or Greek writing systems are simpler or "more precise" than mostly logographic writing systems like written Chinese' (LRB, 6 March). Does he really believe Chinese script is just as suited as Latin to the rendering of foreign words? 'Tom Stevenson' is far simpler and more phonetically precise than 汤姆·史帝文森, 'Tangmu Shidiwénsen', which adds two syllables, six tones and six individual character meanings. The Committee for Language Reform in China acknowledged the relative simplicity of the Latin script as one of the factors behind its abandonment in 1956 of the attempt to develop a phonetic script based on Chinese characters.

Read the rest of this entry »


Decipherment of the Indus script: new angles and approaches, part 3

March 28, 2025 @ 6:53 pm· Filed by Victor Mair under Decipherment, Epigraphy, Language and archeology

Martin Schwartz called my attention to the Jiroft culture:

The Jiroft culture, also known as the Intercultural style or the Halilrud style, is an early Bronze Age (3rd millennium BC) archaeological culture, located in the territory of present-day Sistan and Baluchestan and Kermān provinces of Iran.

The proposed type site is Konar Sandal, near Jiroft in the Halil River area. Other significant sites associated with the culture include Shahr-e Sukhteh (Burnt City), Tepe Bampur, Espiedej, Shahdad, Tal-i-Iblis and Tepe Yahya.

The grouping of these sites as an "independent Bronze Age civilization with its own architecture and language", intermediate between Elam to the west and the Indus Valley civilization to the east, was first proposed by Yusef Majidzadeh, head of the archaeological excavation team in Jiroft (south central Iran). The hypothesis is based on a collection of artifacts that have been formally excavated and recovered from looters by Iranian authorities; accepted by many to have derived from the Jiroft area (as reported by online Iranian news services, beginning in 2001).

(Wikipedia)

The late Irene Good (PhD University of Pennsylvania, 1999; Harvard University Peabody Museum, 2001; later Oxford University Research Laboratory for Archaeology and the History of Art) worked at a number of Jiroft sites in the years just after they were discovered, especially on the important textiles that were preserved there. Her investigations of the "palaeo-environmental perspective" on ancient textiles were instrumental in helping us understand the networks of trade, technology, and cultural transmission among Europe, Mesopotamia (MP), Iran, and the Indus Valley (IV). See "Selected readings" for a biographical note on Irene.

Read the rest of this entry »


Decipherment of the Indus script: new angles and approaches, part 2

March 22, 2025 @ 12:26 pm· Filed by Victor Mair under Decipherment, Language and culture, Language and history, Language and literature, Language and religion, Language and travel

In the first part of this inquiry, I stressed the connection between Mesopotamian and Indus Valley (IV) civilizations. My aim was to provide support for a scriptal and lingual link between the undeciphered IV writing system and the well-known languages and writing systems of Mesopotamia (MP), which tellingly is translated as liǎng hé liúyù 兩河流域 ("valley / drainage basin of two rivers") in contemporary Sinitic. The point is to detach IV from Indo-European (IE), which is a red herring and a distraction from productive efforts to decipher the IV script. If we concentrate on the civilization, languages, and writing systems of MP, it should be easier to crack the IV code.

Read the rest of this entry »


Decipherment of the Indus script: new angles and approaches

March 6, 2025 @ 5:58 pm· Filed by Victor Mair under Decipherment, Language and archeology, Language and religion, Writing systems

Want a Million Dollars? Get Busy Deciphering This Ancient Script. A prize offered by an Indian state leader is intended to shed light on a Bronze Age civilization — and settle a cultural battle.
By Pragati K.B., NYT (2/1/25)

The Indus Valley civilization, also called the Harappan civilization, is seen by experts as on a par with the better-known ones of Egypt, Mesopotamia and China.

One of the earliest, it flourished on the banks of the Indus and Saraswati Rivers during the Bronze Age. It had planned townships, water management and drainage systems, huge fortified walls and exquisite pottery and terra cotta artistry.

Read the rest of this entry »


Enigmatic writing from the Republic of Georgia

December 12, 2024 @ 12:04 am· Filed by Victor Mair under Decipherment, Writing

"Mysterious tablet with unknown language unearthed in Georgia", by Dario Radley, Archeology News (12/4/24)

Tablet with inscription in an unknown language, discovered in Georgia.
Credit: R. Shengelia et al., Journal of Ancient History and Archaeology

Read the rest of this entry »


Yet again the Voynich manuscript

September 11, 2024 @ 11:32 am· Filed by Victor Mair under Alphabets, Decipherment, Language and technology, Manuscripts, Writing

Since perhaps as early as 1640, decipherers have tried practically everything to decode the maddeningly frustrating Voynich manuscript. So far it has resisted all efforts to identify the language in which it was presumably written. About the only way to make further progress in cracking the code is to apply some new technology. As described in the following reports, it seems that a type of digital enhancement has become available and has been used to fill in some of the gaps in the manuscript.

The first is the primary document, "Multispectral Imaging and the Voynich Manuscript", which appears on Lisa Fagin Davis' blog, Manuscript Road Trip (9/8/24). She begins with an explanation of what the technology consists of.

Read the rest of this entry »


Reading Old Turkic runiform inscriptions with the aid of 3D simulation

July 21, 2024 @ 4:41 am· Filed by Victor Mair under Artificial intelligence, Decipherment, Philology

"Augmenting parametric data synthesis with 3D simulation for OCR on Old Turkic runiform inscriptions: A case study of the Kül Tegin inscription", Mehmet Oğuz Derin and Erdem Uçar, Journal of Old Turkic Studies (7/21/24)

Abstract

Optical character recognition for historical scripts like Old Turkic runiform script poses significant challenges due to the need for abundant annotated data and varying writing styles, materials, and degradations. The paper proposes a novel data synthesis pipeline that augments parametric generation with 3D rendering to build realistic and diverse training data for Old Turkic runiform script grapheme classification. Our approach synthesizes distance field variations of graphemes, applies parametric randomization, and renders them in simulated 3D scenes with varying textures, lighting, and environments. We train a Vision Transformer model on the synthesized data and evaluate its performance on the Kül Tegin inscription photographs. Experimental results demonstrate the effectiveness of our approach, with the model achieving high accuracy without seeing any real-world data during training. We finally discuss avenues for future research. Our work provides a promising direction to overcome data scarcity in Old Turkic runiform script.
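The first two synthesis steps named in the abstract, producing distance field variations of a grapheme and then randomizing the parameters, can be illustrated with a toy sketch. This is not the authors' pipeline: the glyph is a hypothetical 7×7 bitmap rather than a real runiform grapheme, the signed distance field is computed by brute force in pure Python, and the only randomized parameter is a stroke-weight offset. The 3D rendering (textures, lighting, environments) and Vision Transformer stages are omitted entirely.

```python
import math
import random

# Hypothetical toy glyph, not a real Old Turkic grapheme: '#' = ink, '.' = background.
GLYPH = [
    ".......",
    "..###..",
    "..#.#..",
    "..###..",
    "...#...",
    "...#...",
    ".......",
]


def signed_distance_field(glyph):
    """Brute-force Euclidean signed distance: negative inside ink, positive outside."""
    h, w = len(glyph), len(glyph[0])
    ink = [(r, c) for r in range(h) for c in range(w) if glyph[r][c] == "#"]
    bg = [(r, c) for r in range(h) for c in range(w) if glyph[r][c] != "#"]
    sdf = []
    for r in range(h):
        row = []
        for c in range(w):
            inside = glyph[r][c] == "#"
            others = bg if inside else ink  # nearest pixel of the opposite class
            d = min((math.hypot(r - rr, c - cc) for rr, cc in others), default=math.inf)
            row.append(-d if inside else d)
        sdf.append(row)
    return sdf


def render(sdf, threshold):
    """Re-threshold the field: threshold > 0 thickens strokes, < 0 thins them."""
    return ["".join("#" if v <= threshold else "." for v in row) for row in sdf]


def random_variant(glyph, rng, low=-1.5, high=1.0):
    """Parametric randomization (assumed range): draw a stroke-weight offset from [low, high]."""
    t = rng.uniform(low, high)
    return render(signed_distance_field(glyph), t), t


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        variant, t = random_variant(GLYPH, rng)
        print(f"offset = {t:+.2f}")
        print("\n".join(variant), end="\n\n")
```

A threshold of 0 reproduces the input exactly, so every variant is a controlled perturbation of the same underlying shape; a real pipeline would randomize many more parameters (warping, erosion, surface texture) before rendering.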

Read the rest of this entry »


Unknown language #19

June 16, 2024 @ 10:44 pm· Filed by Victor Mair under Artificial intelligence, Decipherment, Philology

Inscribed sandstone known as the "Singapore Stone", Singapore, 10th–14th century:

Collection of the National Museum of Singapore

(Source; also includes an animated photo that can be rotated 360° in any direction and enlarged or reduced to any size)

Read the rest of this entry »


Unknown language #10, part 2

June 8, 2024 @ 7:31 pm· Filed by Victor Mair under Decipherment, Names, Transcription, Translation

[This is a guest post by Martin Schwartz.]

"Unknown language #10" (12/1/17) left all stumped, including a broad range of superb scholars of many languages. I have no Rosetta Stone for it, but have something that may be called a Russetta or Rusetta (as in ruse) Bone.

First, here is the mystery text that was the focus of Language Log's "Unknown language #10", reproduced exactly as it was transmitted there:

Ukhant karapet qulkt kirlerek
Iqat ighun chapuq sireleq,
Poghtu Paghytei Piereleq
Azlayn qoghular eliut karapet.

Now, alongside the above, I give a bit of verse found in Aleksandr Kuprin's Russian novel Jama ('The Pit', 1909–1915):

U Karapeta est' bufet
Na bufete est' konfet,
Na konfete est' portret
Ètot samyj Karapet.

'Karapet has a buffet
On the buffet is a bonbon (vel sim.)
On the bonbon is a portrait,
It's the very same Karapet.'

Read the rest of this entry »
