The birth of Spanish

New article by Johnson in The Economist (4/23/22):

On the origin of languages
It is tempting to think that they have clear beginnings. They don’t

First two paragraphs:

IN A CHURCH hewn out of a mountainside, just over a thousand years or so ago, a monk was struggling with a passage in Latin. He did what others like him have done, writing the tricky bits in his own language between the lines of text and at the edges. What makes these marginalia more than marginal is that they are considered the first words ever written in Spanish.

The “Emilian glosses” were written at the monastery of Suso, which was founded by St Aemilianus (Millán, in Spanish) in the La Rioja region of Spain. Known as la cuna del castellano, “the cradle of Castilian”, it is a UNESCO world heritage site and a great tourist draw. In 1977 Spain celebrated 1,000 years of the Spanish language there.

Then come the complications:

First, while “Castilian” and “Spanish” are synonymous for most Spanish-speakers, philologists argue that what the anonymous monk wrote is closer to the Aragonese than to the Castilian variety of Romance (the name for the range of dialects that continued their wayward development when Rome retreated from most of Europe after the fifth century AD). In any case, the Suso monk’s scribblings have been pipped by the discovery in nearby Burgos province of writings that may be two centuries older.

Even those are not the origin of Spanish. The very idea treats languages like a person, with a name, birth date and birthplace. But languages are not like an individual. They are much more like a species, gradually diverging from another over many years. It would be as accurate to describe such jottings as degenerate Latin as it is to call them early Spanish—but that would probably not draw as many tourists.

Most accurate would be to call the monk’s prose an intermediate form: words like sieculos (centuries) in the text are almost perfectly halfway between Latin’s saecula and modern Spanish’s siglos. In its way, the church in which the glosses were written is a mirror of such evolution. It includes arches in Visigothic, Mozarabic (Moorish-influenced) and more recent styles, added as it was expanded. As many visitors to an ancient site find, it can be hard to date buildings in use for centuries. Little of the original remains; all is layers upon layers.

We may compare the birth of a specific language to the emergence of a biological species, but even here we must be wary, since "the neat nodes on a palaeobiologist’s tree of life are just simplifications of a messy continuum."

Most people think that Basque is much, much older than Spanish, though Basque "evolved gradually from some now-unknown ancestor".  In fact, "the first known words written in Old Basque—just six of them—also appear in the Emilian glosses"!

The article ends on a cautionary, yet positive, note:

Legal entities like the United States of America really do have a birth date. But languages do not. American English, Castilian Spanish and all other products of slow, disorderly change do not lend themselves to neat origin stories. Remembering this is a good thing, reminding people of their membership in a common family. The need for stories of a glorious past is part of human nature, too. But like “Beowulf” or “La Chanson de Roland”, these are often best seen as literature, not history.

That's why I always refer to myself as a "professor of language and literature", not one or the other, and why there are so many Language Log posts under that rubric.

[Thanks to Don Keyser]


  1. Ferdinand Cesarano said,

    April 23, 2022 @ 7:10 pm

    Is it not true that a meeting was called by Charlemagne soon after he had been named emperor, around 800? This meeting would have been attended by representatives from throughout the empire, all of whom thought of themselves as speaking Latin. However, when these people got together, they found that they could not understand one another very well. My perception is that the birth of all the Romance languages dates to this event.

  2. Neil said,

    April 23, 2022 @ 7:11 pm

    Would it be correct to say all languages are equally old?

    What comes to mind is the frequent claim about Tamil being a “classical” language versus Hindi being described essentially as some sort of young upstart. I understand where the claim comes from, as Hindustani per se didn’t coalesce around Khadi Boli until the [19?]th century. But I’ve always thought this doesn’t make sense, as Hindi didn’t emerge from nowhere; it would have come from Braj, Awadhi, etc.

  3. Michael Watts said,

    April 23, 2022 @ 7:45 pm

    Would it be correct to say all languages are equally old?

    No. , for example, gives us an example of the spontaneous creation of a language ex nihilo in the late 20th century.

    But for pretty much any language you've heard of, yes, they must all by definition be equally old.

  4. Jason Green said,

    April 23, 2022 @ 9:26 pm


    It’s traditional to label the Serments de Strasbourg (842 CE) as the earliest French (as distinct from Latin) text, although doing so falls into the trap Johnson seeks to avoid, these being the ritual components of an alliance between Charles and Louis, two of Charlemagne’s grandsons, against their brother Lothair.

  5. Carl said,

    April 23, 2022 @ 9:44 pm

    Creoles and pidgins can have definite birthdates as well.

  6. S Frankel said,

    April 23, 2022 @ 9:56 pm

    Not all languages are equally old. Creole languages have a definite date before which they didn't exist. And many languages are made up, typically in order to provide a common language (written or spoken) for an area which lacked one. Dante made up Italian based on Tuscan shorn of "local" characteristics. Indonesian was a standardized form of Malay based on local varieties in the archipelago. Lots of standard languages were made up for writing the Bible, some of which eventually developed into a spoken standard (German) and some of which remained strictly literary (Welsh). These were made of pre-existent components, but they were at least new synthesis.

  7. Michael Watts said,

    April 24, 2022 @ 12:23 am

    Lots of standard languages were made up for writing the Bible, some of which eventually developed into a spoken standard (German) and some of which remained strictly literary (Welsh). These were made of pre-existent components, but they were at least new synthesis.

    But this is a meaningless distinction; every language is a "new synthesis" combining an overwhelming amount of old material with a modest amount of new material. The development of literary Welsh out of oral Welsh is no different from the development of modern oral Basque out of ancient oral Basque; it makes no sense to call literary Welsh "newer".

    On the other hand, we can certainly say that classical Latin is an older language than Hindi is, in a different sense of the word "old". In the sense being used in this comment thread, we can invert that to make the correct claim that modern Hindi is an older language (more history) than classical Latin is.

  8. ~flow said,

    April 24, 2022 @ 3:20 am

    Relevant XKCD:

    I cannot agree with the statement that all languages are equally old. Take Latin as an example. Say you have a good grasp of the language as it was used 2000 years ago, and you travel through time to put it to the test. You will find a time window, somewhere between –500 and +500, in which stepping out of your Tardis somewhere in Italy or elsewhere in Mediterranean Europe and talking to people on the street is much easier for both sides than in the centuries before and after. You can still land your Tardis in the Vatican City of 2022 and have a good chat with one of the chaps over there, but as for the reach of the vernacular, that's about it.

    So while languages do not typically have precisely drawn borders in space and time, they also do not stretch indefinitely across those dimensions. That "Latin never died, it evolved into the Romance languages" is a true statement does not mean that French is older than ~1500 years. If you tell me that, because it evolved from Latin, its history stretches back to at least 700 BC, I'll say yes, if you will. But then who's to stop me from claiming that French is really 13.8 billion years old? Like the rest of the world, it has a history that goes back to that primordial point in time.

    It's like grains of sand vs heaps of sand. You can't tell the moment at which a growing assembly of countable, single grains turns into a heap, but you also know that there *are* single grains and there *are* heaps. A 'heap' is just a figment of the human mind, but then so are grains.

  9. Michael Watts said,

    April 24, 2022 @ 4:09 am

    ~flow, you have to be careful what you mean by "old". Some senses of the word are opposite to others, as I noted in my earlier comment.

    You can say that Latin is older than French in the sense that Latin came into existence first. You can say that French is older than Latin in that French has existed for longer.

    But you can't say that Basque is older than Spanish, or that Tamil is older than Hindi. That is the type of claim (same as "French is older than Latin") that people want to make when they say that one language is older than another, and it is logically incoherent when applied to two living languages. All living languages are equally old; none of them has existed for longer than any other. The fact that you can't draw boundaries between stages of a language is sufficient for this.

  10. M. Paul Shore said,

    April 24, 2022 @ 4:52 am

    I think the best formulation would be to say that all living languages, with the exception of certain recently created pidgins and creoles and certain recently artificially created literary dialects, are approximately equally old, namely about four to six centuries. (There are certain circumstances in which the overall rate of change of a language into something no longer comprehensible, as opposed to the rate of change of only certain of its component parts, can be faster or slower than average. For example, for various reasons Middle English came and went pretty fast. On the other hand, perhaps there’s some validity to the notion that Icelandic has been changing more slowly than average—though not, I hasten to add, any validity to the urban myth that Icelandic has been in complete stasis for the last thousand-plus years.)

    By the way, some of the arguments voiced in the preceding posts are on shaky ground because they conflate two or more languages under a single name: for example, conflating Old French, Middle French, and Modern French under the single name “French”.

  11. ~flow said,

    April 24, 2022 @ 5:08 am

    "All living languages are equally old; none of them has existed for longer than any other. The fact that you can't draw boundaries between stages of a language is sufficient for this."

    This is not true. Take Latin, English, and Afrikaans. Latin is still a living language in the sense that there is a small number of people that more or less regularly use it for writing, broadcasting, and conversation. It is largely the same language with almost the same spelling and pronunciation as was used by the likes of Cicero and Caesar. Compare that to English, a language that has changed so much in the past 500 years that one would be hard-pressed to understand a contemporary of Shakespeare. Go back another thousand years or so and it is a whole nother language, more similar to Dutch or German than to modern English. Afrikaans is at most 350 years old.

    To claim that "all living languages are equally old" is the same as saying that "English and German are in principle the same language", that "there is no Afrikaans because it evolved from Dutch", or that "there are no heaps of sand; the fact that you can't draw a boundary between one, two, three grains of sand and a heap of sand is sufficient for this".

    None of these statements makes any practical sense beyond pointing out the obvious.

    When I order an elliptical cake and you offer me a circular one, I will not give you my money, and I won't appreciate it when you exclaim "Actually! A circle is a kind of ellipse!". Likewise, if I ask you to be my interpreter for English and you turn my modern German into Anglo-Saxon English, you've just lost your job. I cannot draw the line that separates Anglo-Saxon from modern English, but my ears know that one is not like the other.

  12. ~flow said,

    April 24, 2022 @ 5:26 am

    @M. Paul Shore

    Yes! We have no very good reason to assume that rate of linguistic change is a constant across languages and millennia.

    This discussion reminds me of the discussions about historical maps of Europe. When people look at France or Great Britain with their large uniform patches, they feel reassured that those countries have been in existence for so long, uniformly; when they look at the motley patchwork quilt that is the Holy Roman Empire, they shout "clusterf*". However, there were many local authorities in France and England and, as in Germany, there were strong local traditions, including different units of length and regional vernaculars. Germany, for its part, had an emperor and a parliament too, so it was not without an overarching unity and not without a sense of a common nationality. Those colors on the map are ultimately not very real; they enshrine a cartographical convention more than anything else.

    It's a matter of labeling. At some point you could call the language of France either late Frankish or early French; depending on your choice people will then claim different ages for the French language.

    Some will claim French has no definite age at all because, you know, it evolved from Latin.

  13. DJL said,

    April 24, 2022 @ 6:32 am

    "Dante made up Italian based on Tuscan shorn of "local" characteristics"?

    Much more accurate to say that Dante sometimes wrote in his own language, this variety (or vernacular) eventually became the most prestigious literary language in the Italian peninsula, and much, much later this vernacular became the national language, though the modern standard was largely shaped by rather recent events.

  14. Cervantes said,

    April 24, 2022 @ 7:43 am

    Sure. So-called Olde English is incomprehensible to modern English speakers. So why is it designated "English"? Modern English is actually a creole, owing almost as much to French as it does to Olde English. In fact English grammarians used to try to impose Latin grammatical principles on English diction. The point is that languages can blend as well as diverge. Those events can be fairly discrete in time, as previous commenters have noted. In the case of English, while the beginning of the process may be identified as 1066, it was really gradual over the following centuries. In this regard languages are not like species, as only closely related species can hybridize.

  15. S Frankel said,

    April 24, 2022 @ 9:24 am

    @Michael Watts – There's something different about people self-consciously creating "new" literary languages, that's not like natural language evolution. You're obviously right that both have ancestries of indeterminate age, but they also don't seem to be the same phenomena.

    @DJL – Dante explicitly did not write in his own dialect. He wrote (or attempted to write) in a vernacular that was common to all Italian vernaculars. In practice, this was a Tuscan with (what he considered to be) local characteristics removed. Translation of his treatise on the subject here:

    @Cervantes. Why is it called English? We missed an opportunity. If it had been called "Saxon," we'd all be saxophones.

  16. Antonio L. Banderas said,

    April 24, 2022 @ 10:55 am

    @S Frankel

    No, you'd be Saxophones.

  17. DJL said,

    April 24, 2022 @ 11:20 am

    I think you are conflating the philosophical position Dante argues for in the De vulgari eloquentia with his actual writing; the language used in the Commedia is mostly his own Tuscan dialect and certainly not the "illustrious language" he went on about in the De vulgari, and I think it makes little sense to claim that he was even attempting to write in a language common to all Italian vernaculars – he actually uses loads of words from all sorts of languages/vernaculars, often from outside the Italian peninsula, which would have been an odd way to write a common Italian language.

    Not that such an attempt would have made any sense in the 13th century, and Dante would have been the first to know this. In the very De vulgari, he claims that there were thousands of vernaculars in his region, many of them incommensurable with each other, and, to make this point, has some fun telling the anecdote about how in some quarters of Bologna the local Bolognese vernacular is not actually understood.

    (Yes, Dante did talk of some 14 major vernaculars in the Italian peninsula in De vulgari, but went on to say that 'if we wished to calculate the number of primary, and secondary, and still further subordinate varieties of the Italian vernacular, we would find that, even in this tiny corner of the world, the count would take us not only to a thousand different types of speech, but well beyond that figure' (I, x, 9)).

    I think it's just a mistake to assume that what he says in De vulgari is what he actually attempted in his literary production.

  18. Scott P. said,

    April 24, 2022 @ 11:25 am

    All of the debate here seems to involve spoken languages, but I always understood the Glosas Emilianenses to be the first written Spanish (not spoken), and a written language can certainly have a first attested appearance.

    The description of them as 'degenerate Latin' seems to call back to the old thesis that in early Medieval Iberia there were two spoken languages — Latin, spoken by the educated, and Romance, spoken by the uneducated, but I think Roger Wright punctured that theory pretty effectively.

  19. Terry K. said,

    April 24, 2022 @ 12:29 pm

    Good point about sign languages (any of them, I'd say) not being as old as other languages (both compared to each other, since there are multiple origins, and compared to spoken language).

    If one is thinking of some particular standard form (including especially written standards), then there might be some sense in saying one language is older than another. Though, it seems to me, we would want to choose the approximate date when the standard developed, not the first appearance in writing of some form of the language. This is complicated, however, by the fact that written language standards do change over time.

    I suppose, going back to the original post, that the significance of the text as far as the beginnings of Spanish are concerned is that it's written language that departs from the Latin written standard.

  20. Chris Button said,

    April 24, 2022 @ 2:38 pm

    New article by Johnson in The Economist (4/23/22):

    Minor quibble, but this should really say “in Johnson” rather than “by Johnson” because the column is named after Samuel Johnson.

  21. S Frankel said,

    April 24, 2022 @ 4:07 pm


    "I think you are conflating the philosophical position Dante argues for in the De vulgari eloquentia with his actual writing." I think you're right. It's kind of uncomfortable if he didn't even try to follow his own program, though.

    "[T]he language used in the Commedia is mostly his own Tuscan dialect" – but that "mostly" is interesting. Are there any specific Tuscan characteristics that he filtered out?

    "I think it makes little sense to claim that he was even attempting to write in a language common to all Italian vernaculars – he actually uses loads of words from all sorts of languages/vernaculars, often from outside the Italian peninsula, which would have been an odd way to write a common Italian language."

    So, maybe he was trying to mix things up, and not write in strict Florentine? He did, in fact, manage to establish a common Italian language, whatever the source. The question is whether it was strictly Florentine dialect or something a bit – and deliberately – different, in which case it might deserve to be called "new."

    (Thanks for your patience in explaining all this.)

    @Antonio L. Banderas – A useful reminder that English nouns (and adjectives) still have cases, even if only upper and lower.

  22. DJL said,

    April 24, 2022 @ 4:30 pm

    @S Frankel

    Dante invented a literary language as much as most writers at the time, to be fair – there were hardly any standards around – but he didn't really establish a common Italian language – not in any strict sense, anyway. His vernacular became the most prestigious literary language in the peninsula, but this was a long process and many other writers contributed to this. Florentine only became the common language very recently indeed, in the 19th century, and by then it was rather different (I'm an Italian speaker and find it rather hard to read the Commedia).

    Also, have just had a look at my copy of The Cambridge Companion to Dante's Commedia, and a chapter entitled 'Dante's Language and Linguistic Thought from the Vita Nova to the Commedia' has this to say:

    "Dante wrote in Florentine, the language spoken and written by the Florentines of his generation, throughout his career. Indeed, the phonological and morphological system present in his work coincides with that documented in non-literary texts written in the city during the last quarter of the thirteenth century…The Commedia – the culmination of Dante’s entire intellectual and literary trajectory – is his most Florentine work, the one that incorporates unfiltered Dante's native vernacular."

    The chapter goes on to chronicle some of the linguistic features of Dante's Florentine, along with many of the innovations he introduced.

  23. S Frankel said,

    April 24, 2022 @ 10:53 pm

    "The Cambridge Companion to Dante's Commedia" …
    Looking around for this. Unfortunately, my local academic library is still Corona-closed. Thanks for the reference. I stand corrected re: Dante.

