The indecipherability of the Voynich manuscript


Less than half a year ago, we were treated to yet another among countless claims for the decipherment of the mysterious Voynich manuscript (henceforth "Vm"):  "Voynich code cracked?" (5/16/19).  I was skeptical then and am even more skeptical now after having read this article:

Peter Bakker, "The Voynich manuscript: the decipherment of ms. 408", Lingoblog (9/10/19)

I like the way Bakker's article begins:

Last year I was contacted by someone who claimed to have deciphered the Voynich manuscript. This manuscript is one of the big enigmas of medieval history and, for that matter, linguistics. No one has yet been able to decipher it, and many have tried. It is written in a totally unknown script in an unidentified language.

The manuscript is more than 500 years old. It has been publicly available for a century, and now it is also available online. Nobody has been able to translate the manuscript; there have been many proposals, but all have been rejected. People have claimed it could be written in a form of Hebrew, in a Romance language, in an earlier form of Romani, an Indic language, or even in a language from another planet. Medievalists are at a loss. Cryptographers—specialists in secret writing—have broken their brains on it. Linguists have tried as well, but all in vain.

In this contribution I will argue that the manuscript is in fact not interesting at all for language nerds.

I have to agree with him.

In laying out a rational way to decipher an unknown language like that of the Vm, Bakker applies the following types of analysis:

1. Is it an alphabet, an abugida, an abjad, or other type of writing system?

2. Average word length.

3. Frequency of speech sounds.

4. Distribution of sounds within words.

5. Application of Zipf's law (see also here for a 21:04 video).

6. Could it be a complicated cipher, "in which each letter is replaced by another letter, but where the form of the letter is adjusted according to some rule"?

7. Does it reflect a creole?

While the writing system of the Vm seems most likely to be an alphabet, it does not conform comfortably, distinctively, and naturally to what these analyses would lead us to expect, suggesting that it does not reflect a functional language, whether invented from scratch or transliterated / transcribed from a real language.  (The Vm does conform to Zipf's law, but this does not yield a convincing match with any known language.)
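For readers who want to try the simpler statistical checks themselves, here is a minimal sketch in Python of two of them: average word length and a rough Zipf test (the slope of log frequency against log rank, which is close to -1 for a Zipfian text). The function names `word_stats` and `zipf_slope` are my own illustrative choices, not anything taken from Bakker's article, and the sketch assumes the text has already been transliterated into space-separated "words". A slope near -1 by itself shows only that the text is Zipfian, not that it is language.

```python
import math
from collections import Counter


def word_stats(text):
    """Return (average word length, list of (word, count) sorted by frequency)."""
    words = text.split()  # assumption: words are whitespace-separated
    counts = Counter(words)
    avg_len = sum(len(w) for w in words) / len(words)
    return avg_len, counts.most_common()


def zipf_slope(ranked):
    """Least-squares slope of log(frequency) against log(rank).

    `ranked` is a list of (word, count) pairs, most frequent first,
    with at least two distinct words.
    """
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for _, count in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Toy input only; a real test needs a full transliteration of the text.
    sample = "the cat and the dog and the bird saw the cat"
    avg_len, ranked = word_stats(sample)
    print("average word length:", avg_len)
    print("log-log slope:", zipf_slope(ranked))
```

On a toy sample like the one above the slope is not meaningful; the measure only becomes informative on a corpus of many thousands of words.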

Bakker's conclusion:

Indeed, in this article I have thus far not included any links to anything having to do with the manuscript; people should just not waste their time trying to decipher it, as it is most likely a clever hoax. Who did it, and when and why, is what is interesting about it. I would however like to make an exception for this article, which presents a nice, objective and down-to-earth overview.

Almost all other things you find on the net are written by cranks. And there is a lot out there.

If you ask me, I believe that it is a very clever practical joke, a hoax, probably from the 15th century, the same date as the vellum and the ink. If it would have been a real language, in a rational and regular writing system, experts would have figured it out by now. There is so much text available, there are illustrations, such as the signs of the Zodiac, that provide clues to the contents. It should be easy to crack it. The mere fact that it has not been decoded, means that it is not decodable. It is simply a fake text.

I am inclined to concur.


  1. Scott P. said,

    September 11, 2019 @ 8:06 am

    What of the hypothesis that it is an attempt to create a 'natural' language, where concepts are expressed in terms of definitional categories, rather than an attempt to represent any actual spoken language of the time? There certainly are plenty of examples from the period; one of the motivations for Europeans to study Sinitic was the mistaken idea that it formed such a natural language.

  2. Cervantes said,

    September 11, 2019 @ 8:19 am

    Well, if it is nothing but a massive goof, one helluva lot of work went into it. It seems to me there must be a substantial motive, a scam of some sort, that would justify thousands of hours of work. Note that somebody had to learn to write fluently in the script, and to consistently use the apparent prefixes and root words to make it look like a real language. It's hard to imagine what the purpose would be.

  3. Christopher Barts said,

    September 11, 2019 @ 8:58 am

    Henry Darger wrote something a lot bigger than the Voynich Manuscript when he created "The Story of the Vivian Girls, in What Is Known as the Realms of the Unreal, of the Glandeco-Angelinian War Storm, Caused by the Child Slave Rebellion" by himself. That was 15,145 pages, whereas the Voynich Manuscript was 272 pages at its longest. Darger made his magnum opus while holding down a day job, whereas the unknown author of the Voynich Manuscript could well have been a noble with little or nothing to do.

    My point is that art is its own reason, and a compelling reason at that for some people, so saying "it must have taken a lot of work" is not a good argument against something being a pure piece of art with no semantic value.

  4. JB said,

    September 11, 2019 @ 9:04 am

    Between the entire corpus of so-called Outsider Art and transgressive post-modern trash literature available, not to mention the verbal logorrhoea enabled by the internet and plentiful case studies of the insane, one would think that someone would have finally realised it is the work of some schizophrenic monk banged up in a scriptorium somewhere, with ready access to an encyclopaedia of herbs as well as a map of the skies.

  5. Kyle Gorman said,

    September 11, 2019 @ 9:18 am

    Bakker's discussion of Zipf's law is hopelessly confused. First off, we know that the mere fact that something shows Zipfian statistical behaviors is meaningless because so do, for instance, the size of cities, and the size of cities is not writing. See Sproat 2014 (in Language) for a fuller critique on this note.

    Secondly, what he calls Zipf's law is a well-known observation made by Zipf, but is absolutely not what the phrase "Zipf's Law" refers to; that's another, even more widely known observation.

  6. Frédéric Grosshans said,

    September 11, 2019 @ 9:42 am

    “It is most likely a clever hoax. Who did it, and when and why, is what is interesting about it. ”

    Another interesting question, not unrelated to the three questions above, is how this clever hoax was done. Several recent works proposed simple algorithms, doable with medieval technologies, reproducing the statistical properties of the “text”, including the ones setting it apart from natural languages texts.

    Gordon Rugg proposed a method based on Cardan grilles, refined in a 2016 paper with Gavin Taylor. This year, Thorsten Timm and Andreas Schinner proposed a simpler method, based on “self-citation”.

  7. Cervantes said,

    September 11, 2019 @ 10:00 am

    Well maybe. But Darger's work is not gibberish. He didn't share it during his lifetime but it does consist of real, illustrated stories, in English. The intention of the Voynich manuscript as art, if the text is indeed meaningless, would be quite puzzling. But I suppose it could be the product of some form of delusion or psychosis.

  8. Scott P. said,

    September 11, 2019 @ 10:37 am

    Between the entire corpus of so-called Outsider Art and transgressive post-modern trash literature available, not to mention the verbal logorrhoea enabled by the internet and plentiful case studies of the insane

    The question is whether those categories are meaningful in the 15th century. There was no internet, no 'art for art's sake,' no post-modernism.

    Can you point to any analogous work of the period that would make a good comparandum?

  9. Victor Mair said,

    September 11, 2019 @ 10:59 am

    The paintings of Pieter Bruegel the Elder (1525-1569) are almost as zany as those on the Vm.

  10. Trogluddite said,

    September 11, 2019 @ 1:49 pm

    @Scott P.
    I'm not sure that the prior existence of such categories, or even the intent to have a particular effect on the audience, are necessary. For example, autistic people very often report creating elaborate visual art, music, literature, even computer programs, with not the slightest intent of sharing them, nor any consideration for how others might perceive the end product. Such activities may be performed solely for the effect on the creator's own mind during the creative process, and the results might even be discarded as having no significance once complete. If the creator of the Vm experienced forms of synaesthesia, they may even have been aiming for perceptual effects which are completely invisible to anybody else.

    There is no reason to believe that such people did not exist in the 15th century, and while "art for art's sake" may not have been recognised as a category pertaining to the economic or social value of artistic creations, I don't believe that this excludes the private production of such works. Before speculating about hoaxes and practical jokes, we may have to consider that the Vm was the product of a deeply personal obsession and was never intended to be presented to an audience at all.

  11. Cervantes said,

    September 11, 2019 @ 1:55 pm

    Yes Trog, that seems possible. But if the text was meaningful to the author it might still, in principle, be decipherable. There might be little point in doing so, however.

  12. Richard Sproat said,

    September 11, 2019 @ 5:43 pm

    The would-be decipherer mentioned in Bakker's piece spammed the editorial board of Written Language and Literacy at least twice with his discovery, before it finally appeared in print (and of course in the popular science press).

    The problem with injunctions such as Bakker's (or Victor's) that it is not worth trying to decipher something because it is probably a hoax, or there isn't enough text for Shannon unicity, or whatever, is that such appeals universally fail to have any stopping power on the enthusiast. And why should they? If the existence of dozens or hundreds of equally plausible previous "decipherments" of a corpus fail to dissuade them, why should other considerations?

    Witness the hundreds of attempts to decipher the Phaistos Disk, or the Indus Valley corpus.

  13. John said,

    September 11, 2019 @ 5:54 pm

    Once again, the xkcd:

    Druids and dicotyledons.

  14. Richard Hershberger said,

    September 11, 2019 @ 5:55 pm

    For that matter, consider Tolkien spending years inventing Middle Earth, complete with invented languages. It's not quite the same thing, but the point is that people have odd hobbies with little or no practical purpose. I have spent endless hours reading old newspaper baseball coverage. I have a book and several papers out of it, but these have no professional benefit for me, and very little financial benefit. Or how about a Victorian gentleman scientist chasing down butterflies, sticking them with pins in a case, and carefully constructing taxonomies. The point is that, given the resources, particularly of leisure time, people will find some way to use it. Most will do so in ways uninteresting to the broader world, but some will do something interesting, or even gloriously eccentric. I applaud the impulse.

  15. maidhc said,

    September 11, 2019 @ 6:28 pm

    We know about Tolkien's languages because he published books using them. But Tolkien said that when he was in WWI he met an enlisted man whose hobby was inventing languages, and no one knows anything more about that person. So such people do exist.

    But in inventing your own private language, it would be difficult to avoid influences from existing languages, to such an extent that not even a structure can be discerned.

    It might possibly be something related to

    The article refers to some linguistic studies showing that glossolalia has no underlying language structure.

  16. Scott P. said,

    September 11, 2019 @ 7:45 pm

    The paintings of Pieter Bruegel the Elder (1525-1569) are almost as zany as those on the Vm.

    To an extent, but they fit well into their chronological and cultural context, and we know they were popular at his time, with collectors already competing for his works during his lifetime. He was no loner; we know his patrons quite well. So his work isn't really a mystery.

    we may have to consider that the Vm was the product of a deeply personal obsession and was never intended to be presented to an audience at all.

    Well there are a lot of questions to be answered before we can get to that point, I think. Was there a singular author/illustrator? How did they get access to the materials used to make the MS? Why was it bound and preserved if it wasn't intended for circulation?

  17. AG said,

    September 11, 2019 @ 10:34 pm

    Invented languages and private ravings are all well and good, but isn't it also still very likely that the VM was created, like almost all art back then, to impress and/or scam some rich guy?

    Europe used to be lousy with court alchemists and so on, all sucking up to royals with claims of knowing ancient secrets. I think creating a fake grimoire could have been just part of someone's elaborate con job on a Habsburg or whoever.

    That way, when he asks why you need more diamonds, you can just point to your manuscript. "This part says 'add diamonds', your majesty."

  18. loonquawl said,

    September 12, 2019 @ 2:13 am

    Bakker seems to concentrate on the aspect of "can't do it now, couldn't do it yesterday, therefore it's impossible". I'd like to see the 'technique' applied double-blind to some examples of chiffre-obscured dead language corpus – the kind of dead language that is read by leveraging inference heaped upon inference heaped upon slightly-but-not-quite-but-possibly-kinda-similar-kings'-names. I'd wager those would come out with the verdict 'gibberish' as well.

    "We can't know for certain, if not some more examples and a Rosetta stone turn up"? – Sure. Not all chiffres are breakable, for instance one-time pads.
    "We don't know now therefore it's nonsense"? – Bah.

  19. Keith said,

    September 12, 2019 @ 4:20 am

    I don't think that it is a prop from a 500 year old role-playing game.

    However, the idea that it is a hoax is in my mind very plausible: carefully crafted and presented as being extremely valuable and full of arcane alchemical and astrological knowledge and sold to some rich nobleman 500 years ago…

  20. Alex Woods said,

    September 12, 2019 @ 12:19 pm

    I read through everything linked in this article, and some of the stuff Bakker linked to as well. Clearly, most of what is written about the Voynich manuscript is garbage, but not all. Bakker's piece is very interesting but clearly a shot from the hip, albeit a very well-informed one. I don't think his conclusion that it's a hoax is warranted. Seems that all the evidence he marshals points towards it being an actual text in an unknown language. Which leads me to my question. I'm surprised to see no references here or elsewhere to what I think is the best stab at the problem, a short piece by someone named Stephen Bax, which concludes, tentatively, that the Voynich is in a Turkic language, but written in a European-inspired script. He proposes translations for a handful of words, based on a careful analysis of some of the astrological and botanic translations. PDF here:
    But I'm not a pro, just an interested layman. If any of you pros have the time and inclination to read Bax, I would love to hear your thoughts.



  21. Sean M said,

    September 12, 2019 @ 4:13 pm

    Alex Woods: I don't know if any fellow linguists have responded to Bax, but Nick Pelling of is pretty good at criticizing other people's theories (like everyone else, he has a theory which somehow does not convince anyone else). Dr. Bax died a few years ago.

  22. Philip Taylor said,

    September 12, 2019 @ 4:57 pm

    I have just read the Bax paper for the first time, but lack either the linguistic or the botanical skills necessary in order to be able to offer meaningful comment. However, one apparent anomaly did strike me: the author writes "In order to avoid this danger [the danger of adopting the big-theory approach], the current paper deliberately avoids advancing, or subscribing to, any overarching theory concerning the manuscript, apart from the basic notions that it is probably a 15th century document with apparent European elements […]. It seeks on that basis alone to examine the linguistic evidence piece by piece, and only when a certain amount of evidence has been assembled and analysed does it attempt, towards the end, to offer some broad and highly tentative proposals about the manuscript’s possible provenance and purpose (see page 49 et seq.)". Now "the current paper" may well avoid advancing, or subscribing to, any overarching theory concerning the manuscript, but the author must surely already have reached "some broad and highly tentative proposals about the manuscript’s possible provenance and purpose" before the paper was written, so even though the paper itself may carefully adduce evidence before reaching conclusions, the conclusions that were ultimately to be reached were (I would respectfully suggest) already in the author's mind.

  23. Adam F said,

    September 13, 2019 @ 5:56 am

    "not interesting at all for language nerds"—I guess it depends on how you define "interesting" and "language nerds". If the VM is not really language, then it's not properly, technically, of interest to linguists, but a lot of language nerds certainly seem to show an interest in it.

  24. Nick Pelling said,

    September 18, 2019 @ 5:57 am

    When Bakker writes "The mere fact that it has not been decoded, means that it is not decodable. It is simply a fake text" – and this is the totality of his argument – do you not read that as an admission of laziness, poor logic, and sloppy thinking all rolled into one?

    Voynichese is not – as the Friedmans already knew 50 years ago – a simple language, just as it is also not a simple cipher. But to use that as an excuse for throwing your hands up and giving in to a hoax argument is just useless.

  25. Mike S. said,

    September 20, 2019 @ 6:45 pm

    Bakker's criticism of the crank who got published (arguing that the language is a Romance creole) is harsh but on target. But the essay does not remotely exhaust everything that might be said about the Voynich MS, nor does it warrant its final conclusion that the thing is a hoax. A hoax is one possibility; another is some difficult sort of encryption — perhaps the Voynich "words" are indexes into a lost code book, without which decryption might be practically impossible. We can't say for sure.
