Voynich code cracked?


Since high school, the Voynich manuscript has been something I puzzle over from time to time.  What language and script is it written in?  What's it about?  No one has been able to read the manuscript since Wilfrid Voynich, the Polish-Samogitian bibliophile and book dealer, first brought it to light more than a century ago.  Even so, its evocative illustrations and the mysteries swirling around it have prompted many fruitless attempts at decipherment.

Now a British academic, writing in the journal Romance Studies, declares that it was a manual for nuns written in unencrypted proto-Romance:

"Bristol academic cracks Voynich code, solving century-old mystery of medieval text", University of Bristol (May 15, 2019).

A University of Bristol academic has succeeded where countless cryptographers, linguistics scholars and computer programs have failed—by cracking the code of the 'world's most mysterious text', the Voynich manuscript.

Although the purpose and meaning of the manuscript had eluded scholars for over a century, it took Research Associate Dr. Gerard Cheshire two weeks, using a combination of lateral thinking and ingenuity, to identify the language and writing system of the famously inscrutable document.

That's pretty impressive!  It only took him two weeks, relying on "lateral thinking and ingenuity", to succeed where countless others had failed.

Here are the title and abstract of Cheshire's peer-reviewed paper:

"The Language and Writing System of MS408 (Voynich) Explained", Romance Studies.  Published online: 29 Apr 2019

[VHM:  MS 408 is the call number under which the Voynich manuscript is catalogued in Yale University's Beinecke Rare Book and Manuscript Library, to which it was donated by Hans P. Kraus in 1969.]

Manuscript MS408 (Voynich) is unusual in a number of respects: 1. It uses an extinct language. 2. Its alphabet uses a number of unfamiliar symbols alongside more familiar symbols. 3. It includes no dedicated punctuation marks. 4. Some of the letters have symbol variants to indicate punctuation. 5. Some of the symbol variants indicate phonetic accents. 6. All of the letters are in lower case. 7. There are no double consonants. 8. It includes diphthong, triphthongs, quadriphthongs and even quintiphthongs for the abbreviation of phonetic components. 9. It includes some words and abbreviations in Latin. As a result, identifying the language and solving the writing system required some ingenuity and lateral thinking, but both were duly revealed. The writing system is rather more singular and less intuitive than modern systems, which may explain why it failed to become culturally ubiquitous and ultimately became obsolete. On the other hand, a significant vestige of the language has survived into the modern era, because its lexicon has been sequestered into the many modern languages of Mediterranean Europe. Here, the language and writing system are explained, so that other scholars can explore the manuscript for its linguistic and informative content.

Just reading through the oddly worded, weakly reasoned abstract made me start to have misgivings about this "peer-reviewed paper".  Digging into the paper itself did not dispel them.  It is long and detailed, with numerous tables and figures and extensive notes and references, but the fact that it is undergirded by the following kinds of assumptions and assertions gave me pause:

Unbeknown to the scholarly community, the manuscript was written in an extinct and hitherto unrecorded language as well as using an unknown writing system and with no punctuation marks, thereby making the problem triply difficult to solve. Furthermore, some of the manuscript text uses standard Latin phrasing and abbreviations, only adding a fourth dimension of difficulty.

Thus, without knowledge of this information it was quite impossible for anyone to even begin to fathom the meaning of the symbols and apprehend the words, the phrases and the sentences they spelled out. When a connection between the lost language and the writing system was explored, in May 2017, the solution duly emerged by elucidating both the language and the writing system in unison: i.e. both revealed themselves in the process, rather like patiently unravelling a tangle of chains. Thus, the solution was found by employing an innovative and independent technique of thought experiment.

Perhaps inevitably, and certainly ironically, the manuscript has revealed itself to be far more interesting and informative than imagined by the aforementioned scholars. It was written by an entirely unknown and ordinary figure from the past, and without any deliberate code but a language and writing system that were in normal and everyday use for their time and place, yet the linguistic and historic information it holds are of unparalleled importance. So it turns out that the manuscript is remarkable after all, but in academic ways rather than sensationalistic and fantastical ways.

I came away wondering just what the credentials of the author are and how the article passed through the peer review process of an academic journal in the field of Romance linguistics.

At the very end of the paper, we find this note on "Author information":

Dr. Gerard Cheshire has recently completed his doctorate, expounding an adaptive theory for human belief systems, and is now a Research Associate with University of Bristol. The solution to the codex of MS408 was developed over a 2-week period in May 2017 after he came across the manuscript for the first time whilst conducting research for his PhD dissertation. Having deciphered the writing system, he subsequently realized the significance of the manuscript to Romance linguists and Mediaeval historians, and so decided to publish the information.

Well, that made me wonder all the more.

It was reassuring to find my doubts echoed by a careful, long-time researcher on the Voynich manuscript such as Nick Pelling:

"Gerard Cheshire, Vulgar Latin, and the siren call of the polyglot…", Cipher Mysteries (11/10/17)

That was a year and a half ago, when Cheshire was making his initial forays into the study of the Voynich manuscript.  He did not heed Pelling's warnings but went ahead with his project.  Now that Cheshire has gone public with the full monty, he has to face meticulous, determined rebuttals such as these:

"Cheshire reCAsT", J. K. Petersen, The Voynich Portal (5/7/19)

"Cheshire Reprised", J. K. Petersen, The Voynich Portal (5/16/19)

"No, someone hasn’t cracked the code of the mysterious Voynich manuscript.  Medieval scholar: "Sorry, folks, 'proto-Romance language' is not a thing.""  Jennifer Ouellette, Ars Technica (5/15/19)

What's particularly poignant is that this morning we find evidence of more critical scholarship on Twitter than in Romance Studies.

When all is said and done, what I see is a Cheshire Cat smiling enigmatically back at me.


"Mystery Language" (12/17/14)

"From the American Association for the Advancement (?) of Science (?)" (5/25/13)

"Translation as cryptography as translation" (11/19/12)

"Postcard language puzzle" (11/23/12)

"Voynich and midfix" (7/3/04)

"Neil deGrasse Tyson on linguists and Arrival" (3/3/17)

"Latin, Hebrew … proto-Romance? New theory on Voynich manuscript:  Researcher claims to have solved mystery of 15th-century text but others are sceptical", Esther Addley, The Guardian (5/15/19)

[h.t. Bryan Van Norden, Kyle Olbert, Ben Zimmer, and GKP]


  1. Rube said,

    May 16, 2019 @ 11:51 am

    Glad to see this discussed here. I've been seeing the original story uncritically shared on Facebook by people who should know better. It seems like Voynich is solved every week, with each "solution" being forgotten when the next one comes along.

  2. janwo said,

    May 16, 2019 @ 12:10 pm

    What bothers me about the actual paper is the completely abstruse use of terminology: grapheme sequences are referred to as “Quadraphthongs” [sic!], and the author refers to a type of cursive as “Proto Italic”, which is more than unusual. Whoever greenlighted the paper in peer review, please be ashamed and please visit a Graphematics 101 class together with the author of the study!

  3. Joe said,

    May 16, 2019 @ 12:42 pm

    "This is just more aspirational, circular, self-fulfilling nonsense."

  4. Kyle said,

    May 16, 2019 @ 1:10 pm

    Even as an amateur, I find this particular proposed translation incredibly shaky. "Proto-Romance"? Unless the American public schooling system has led me seriously astray, "Proto-Romance" would be Latin. Maybe some later medieval form, Vulgar Latin instead of Church Latin or Classical Latin, but still something distinctly Latin, and Latin is a language we know well.

    Using a novel writing system for it doesn't pass the sniff test – if all other forms of Latin used the Latin alphabet, and nearly all* Romance languages use the Latin alphabet, why would some intermediate step use a new writing system that disappears with no trace in its descendants? Maybe if it were trying to be deliberately obfuscated, but that's an additional complexity to have to explain… and the paper actually raises that point and *refutes* it! It makes the specific claim that this hitherto-unknown intermediate language used a hitherto-unknown script, then disappeared with no evidence besides a single manuscript. I just cannot imagine that happening.

    Even the terminology seems very odd, again from the perspective of a guy with no formal linguistic training, just too much time spent on Wikipedia. Extending -phthong to four and five already seems unlikely, given that up to now I've only heard "triphthong" used rarely, using them for graphemes instead of phonemes is confusing at best, and using them for consonant clusters is just clearly wrong. Even had I first encountered it from an article that believed it, I'd have been suspicious. (Fortunately, I first saw it on Ars, then was glad to see both their and my suspicions confirmed here.)

    I think, given the calculated age of the Voynich manuscript, it cannot be using a "normal" writing system, by which I mean that whatever it's written in was not a normal way of writing in whatever time and place it was made. It's just too recent for it to be a common writing system, without some other texts to survive with it. It could be a code, it could be asemic writing, it could be some really early attempt at an artlang, it could be fraud, or any of a dozen other explanations. But any purported translation claiming it was the standard way of writing some language has a lot of hard questions to answer.

    * Save Romanian historically using Cyrillic, Aljamiado texts using Arabic to write Spanish and related languages, and maybe a few other obscure things I've never heard about, every Romance language used some derivative of the Latin alphabet.

  5. Christian Weisgerber said,

    May 16, 2019 @ 1:34 pm

    Quoting what I wrote on sci.lang:

    I started skimming [through the paper] but halfway through I have lost all patience.

    This is garbage.

    The author provides a character mapping that he uses to transcribe some fragments. He then proceeds to interpret these fragments by equating the words with identical/similar words randomly picked from across all Romance languages plus Latin or even loanwords in other languages.

    I don't think the author understands what "Proto-Romance" means. There is no way this was reviewed by anybody with an understanding of historical Romance linguistics.

    I assume [Romance Studies is] a junk journal, given that they saw it fit to publish this turd.

  6. Christian Weisgerber said,

    May 16, 2019 @ 1:54 pm


    "Proto-Romance" would be Latin. Maybe some later medieval form, Vulgar Latin instead of Church Latin or Classical Latin

    Proto-Romance is, by definition, the last common ancestor of the Romance languages. Vulgar Latin by the time of the dissolution of the Western Roman Empire at the end of the 5th century CE is a good candidate, although, depending on how much dialectal variation you are willing to accept in a unified proto-language, you may want to push that to a few centuries earlier.

    Proto-Romance is certainly not Classical Latin. This is most apparent in phonology, where the Romance languages clearly preserve the reflexes of a seven vowel system (i u e o ɛ ɔ a) at odds with the ten vowel phonemes of Classical Latin (i ī u ū e ē o ō a ā).

    And, to get back to this wretched paper, a proto-language is most emphatically not a random collection of words from its daughter languages.

  7. Gwen Katz said,

    May 16, 2019 @ 2:10 pm

    And he just needs a little bit more grant money to translate the whole thing! Presumably he'll have it finished around the time his expedition finds the Ark in the mountains of Turkey.

  8. Guy Plunkett said,

    May 16, 2019 @ 2:56 pm

    Not my field, by a long shot, but after reading the paper I must assume other fields have differing standards of per review? As to "lateral thinking and ingenuity," many years ago as an undergraduate I came up with the term bialaca, which means "by intuition and luck and cheating also."

  9. Guy Plunkett said,

    May 16, 2019 @ 2:56 pm

    Peer review, not per review …

  10. Peter Erwin said,

    May 16, 2019 @ 5:10 pm

    Romance Studies turns out to be published by Taylor & Francis, so it's not necessarily a junk journal. But a quick glance at the (English-language) articles from recent issues suggests it's a grab-bag of literary/cultural studies (with a common theme of "stuff written or performed in Romance languages", ranging from Dante to contemporary Spanish comic books), so it's easy to imagine the editors not having a clue about how to find reviewers competent in medieval linguistics, manuscript studies, etc.

  11. maidhc said,

    May 16, 2019 @ 5:46 pm

    There's a lot more comment on this thread https://www.metafilter.com/180913/Voynich-Decoded

    Most of it is on the skeptical side.

    Ars Technica has some discussion of the last few times this manuscript has been decoded.

  12. David Marjanović said,

    May 16, 2019 @ 5:48 pm

    There was no way anybody could have known Proto-Romance in the 15th century. That's easily a thousand years too late.

    Lingua franca would be a more plausible option, because that was a mixture of words from different Romance languages… but, yeah, what Gwen Katz said.

    the Romance languages clearly preserve the reflexes of a seven vowel system (i u e o ɛ ɔ a)

    Except Sardinian, whose five vowels derive from the Classical Latin system by simple loss of length without the cross-length mergers (e.g. i, ē > e) found elsewhere. And while all others can indeed be derived from a seven-vowel system, that stage must have arisen in three different ways in the west, the east (Romanian) and a few dialects on Corsica.

    The usual solution is to postulate a short-lived nine-vowel system for Proto-Romance, where the a and ā of Classical Latin had merged but all others were still distinct.
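The appeal of the nine-vowel hypothesis is that a single intermediate inventory can feed both the Sardinian five-vowel outcome and the western seven-vowel outcome. Here is a minimal Python sketch of that reasoning. It assumes only the textbook correspondences for stressed vowels, ignores the Romanian and Corsican developments, and uses hypothetical dictionary names chosen purely for illustration. Plain letters stand for short vowels and macrons mark long ones.

```python
# Hypothetical illustration: stressed-vowel correspondences as plain mappings.
# Plain letters are short vowels; macrons mark long vowels.
CLASSICAL_LATIN = ["i", "ī", "e", "ē", "a", "ā", "o", "ō", "u", "ū"]

# Hypothesized nine-vowel stage: only a and ā have merged.
TO_NINE = {v: ("a" if v in ("a", "ā") else v) for v in CLASSICAL_LATIN}

# Sardinian: length is simply lost, with no cross-length mergers.
NINE_TO_SARDINIAN = {
    "i": "i", "ī": "i", "e": "e", "ē": "e", "a": "a",
    "o": "o", "ō": "o", "u": "u", "ū": "u",
}

# Western Romance: length becomes vowel quality, with the
# cross-length mergers i, ē > e and u, ō > o.
NINE_TO_WESTERN = {
    "ī": "i", "i": "e", "ē": "e", "e": "ɛ", "a": "a",
    "o": "ɔ", "ō": "o", "u": "o", "ū": "u",
}


def outcome(vowel, second_step):
    """Send a Classical Latin stressed vowel through the nine-vowel stage."""
    return second_step[TO_NINE[vowel]]


def inventory(second_step):
    """Return the distinct vowels a daughter system ends up with."""
    return sorted({outcome(v, second_step) for v in CLASSICAL_LATIN})


if __name__ == "__main__":
    print(len(set(TO_NINE.values())), "vowels at the intermediate stage")
    print("Sardinian:", inventory(NINE_TO_SARDINIAN))
    print("Western:  ", inventory(NINE_TO_WESTERN))
```

Running it shows nine vowels at the intermediate stage, five in the Sardinian inventory, and seven in the western one. It also shows that short i and long ē fall together only in the western branch.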

  13. Alex said,

    May 16, 2019 @ 5:55 pm

    Perhaps this should be the test of whether something is true AI: can it determine whether this is a hoax or not?

  14. Cwæþ said,

    May 16, 2019 @ 6:17 pm


  15. Bathrobe said,

    May 16, 2019 @ 6:36 pm

    The article "Gerard Cheshire, Vulgar Latin, and the siren call of the polyglot…" ends in a resounding denunciation of historical linguistics as a discipline.

  16. Elias said,

    May 16, 2019 @ 11:00 pm

    Living in Mormon Utah, I could not help but compare this thinking, in part, to Joseph Smith's claims to have translated the Book of Mormon "gold plates" from the nonexistent "Reformed Egyptian."

  17. CD said,

    May 17, 2019 @ 3:22 am

    The Sun's headline is hilarious, though: "Voynich manuscript dubbed ‘world’s most mysterious text’ FINALLY decoded by UK genius – revealing sex tips and abortion advice".

    Stay tuned for my forthcoming "Harappan Erotica."

  18. David Marjanović said,

    May 17, 2019 @ 4:18 am

    I just noticed the "paper" is in open access – I didn't expect that from Taylor & Francis!

    Yup, this "Proto-Romance" consists of random words randomly drawn from random Romance languages. It's not a language.

  19. NW said,

    May 17, 2019 @ 5:15 am

    Perhaps it's open access because it's of more interest than the average Romance Studies article. Their second-most-viewed article has been viewed 1720 times since February 2014. This erotic bathing tips one is up to 62 303 already.

  20. David Morris said,

    May 17, 2019 @ 5:21 am

    I once read somewhere that a newspaper headline in the form of a question almost certainly has the answer 'no'.

  21. Victor Mair said,

    May 17, 2019 @ 6:54 am

    From an old, old friend of Language Log:

    Part of Cheshire's translation of a passage about volcanoes reads: "to look it is man not mouse and marry and embrace an opening thus you go carefully to the queen to avoid not getting wet with seawater".

    Well, that just about convinces me that he's cracked it. Sorry, I typed that wrong: I meant that just about convinces me that he's cracked.

  22. Victor Mair said,

    May 17, 2019 @ 7:18 am

    From an anonymous correspondent:

    Actually, it might be worth reflecting for a few minutes on what a Freudian literary analyst would say about this purported sentence about volcanoes:

    to look
    it is man not mouse and
    marry and
    embrace an opening
    thus you go carefully to the queen to avoid
    not getting wet . . .

    How long have you been having these fantasies about getting wet, going carefully to the queen, and embracing her opening, Dr Cheshire?

    I think he has a secret inner life (a man, not a mouse — a volcano of erotic energy), and the Voynich is contributing no more than a sort of Rorschach blot presentation.

  23. Victor Mair said,

    May 17, 2019 @ 7:49 am

    "University backtracks on disputed Voynich manuscript theory: Bristol distances itself from academic who claims to have solved century-old mystery", Esther Addley, The Guardian (5/17/19)


  24. AntC said,

    May 17, 2019 @ 7:59 am

    a resounding denunciation of historical linguistics as a discipline.

    random words randomly drawn from random Romance languages.

    Dominican nuns.

    The resonance of not-quite-inchoate strings of words.

    This is all weirdly familiar … where have I seen it before? Oh yes, Edo Nyland: all the world's languages were concocted by Benedictine monks; using random bits of Basque (aka proto-Saharan) chopped into segments and re-assembled. It's obvious; and it's only the global conspiracy of historical linguistics seeking to protect the original monks that is blinding everybody with science.

    convinces me that he's cracked

    I've seen Nyland described as "the crackpots' crackpot". Seems he has competition.

  25. Rube said,

    May 17, 2019 @ 8:21 am

    From the Guardian article Professor Mair links to:

    "Asked for his reaction, Cheshire told the Guardian he felt “no disappointment at all” at the university’s backtracking. “It was inevitable and expected, given the passion that the manuscript arouses, that a marginal group would find it difficult to accept new evidence,” he said.
    “The paper has been blind peer-reviewed and published in a highly reputable journal, which is the gold standard in scientific corroboration. Thus, all protocol was followed to the letter and the work is officially supported. Given time, many scholars will have used the solution for their own research of the manuscript and published their own papers, so the small tide of resistance will wane.”"

    As far as I can make out, the "marginal group" is everyone with expertise in the subject matter, but maybe I am missing something.

  26. NW said,

    May 17, 2019 @ 8:35 am

    Cheshire's always so cool and reasonable, isn't he? My favourite comment from him is this:

    Whilst you bitch and prattle and peck and spit feathers, someone else
    will be quietly using their mind for better things. Take my word.

    That was from somewhere in the "siren call of the polyglot" thread when he was posting as a third person, whose name coincidentally was an anagram of his own. (Did he think self-styled VMS experts couldn't even crack an anagram?)

  27. Stephen Hart said,

    May 17, 2019 @ 8:47 am

    Here's the permanent link to the xkcd comic:

  28. Belial Issimo said,

    May 17, 2019 @ 1:43 pm

    "to look it is man not mouse and marry and embrace an opening thus you go carefully to the queen to avoid not getting wet with seawater"

    I believe we have identified the first known instance of proto-Romance misnegation.

  29. Steve said,

    May 17, 2019 @ 2:01 pm

    @Belial Pshaw. Proto-Romance obviously had negative concord, so it’s not misnegation.

  30. BobW said,

    May 17, 2019 @ 9:40 pm

    Did Cheshire publish on April 1?

  31. Christopher J. Henrich said,

    May 18, 2019 @ 12:58 am

    "a combination of lateral thinking and ingenuity"…
    Well, "lateral thinking" could be a calque of "paranoia."

  32. Norval Smith said,

    May 18, 2019 @ 3:41 am

    And to think I was hoping to find a new use for my copy of Hall's Proto-Romance Phonology!

  33. Henry Grodsk said,

    May 18, 2019 @ 4:08 am

    "Their second-most viewed article has been viewed 1720 times since February 2014. This erotic bathing tips one is up to 62 303 already."

    Perhaps it's a spoof aimed at demonstrating the unsuitability of citation-counting etc. for awarding grants, promotions, chairs, and jobs in academia.

  34. boynamedsue said,

    May 18, 2019 @ 9:44 am

    I read the article all the way through and marvelled at the horrors that I saw within. The "these are all words from Latin languages which I have assembled into a nearly sentence" was very refreshing indeed.

    However, the bit about zodiac signs and month names actually looked quite convincing, does anyone know what the flaws in his argument are there? I'm sure there must be some, but I couldn't pick them myself.

  35. Christian Weisgerber said,

    May 18, 2019 @ 2:40 pm


    However, the bit about zodiac signs and month names actually looked quite convincing,

    That's because those parts are actual Romance month names written in Latin letters.

  36. D.N.O'Donovan said,

    May 19, 2019 @ 9:39 am

    Thanks to all who contributed comments. Here I was thinking that refuting Cheshire would be left to two or three dedicated writers such as Pelling (an historian) or Koen Gheuens (majored in historical linguistics)… but there you all are, sound as a bell and twice as bright.

    And Guy Plunkett – thanks so much for the word "bialaca".

  37. boynamedsue said,

    May 19, 2019 @ 10:19 am

    @Christian W.

    Thanks, I thought the March one was in Voynich script, but now re-reading I can see it's all "proto-Italics"

  38. Sergi Turiella said,

    May 19, 2019 @ 12:35 pm

    Looking at the zodiac images, the text is familiar to me because it's in Catalan, so it's the language of the Kings of Naples. I think it would also be interesting to look at the oral transcription of this manuscript.

  39. John Kozak said,

    May 19, 2019 @ 7:55 pm

    It's "Pelling", not "Pilling".
