Was PIE SOV?
Danny L. Bate has a new article declaring that "PIE was not SOV" (2/20/25), in which he attempts to demonstrate under three objections why "SOV" is not a useful term for describing and summarizing the word order of Proto-Indo-European clauses: 1. "clausal bias", 2. "changing the subject", 3. "discourse dominates".
Before we delve into the details of Bate's anti-SOV argument, let's look at the dimensions of what PIE became before its daughters spread to the New World and elsewhere across the globe:
It does not bother me that there are three conspicuous blanks in the Eurasian IE palette: Finland, Hungary, Turkey. The majority languages of the first two together constitute Finno-Ugric languages, which are European latecomers: Hungarian entered the Carpathian Basin at the tail end of the 9th century, and Finnish and its congeners arrived in the Baltic region around the same time, both ultimately from the southern Urals (hence Uralic). Turkey was formerly populated by speakers of numerous Anatolian (ergo, IE) languages, including Lydian, Carian, and Hittite, the first IE tongue, which were overlaid by Turkic speakers from the distant east beginning in the 11th century. For me, the most glaring absence is Tocharian, which was historically located in the Tarim Basin. It is linguistically datable by manuscripts to the 5th-8th centuries, but other types of evidence (cultural, archeological, anthropological, etc.) locate its speakers in Eastern Central Asia as early as the 1st and 2nd millennia BC, having impinged from the northwest.
In attempting to understand such grossen Fragen as the typology of PIE, determining the nature and spatiotemporal dispersal of its constituent descendants is vital, i.e., how and when the IE languages got where they are (that includes, of course, what other languages they came in contact with during their peregrinations). For that reason, I invested a lot of effort in conceiving and creating what I call "Die Sprachamöbe". See Victor H. Mair, "Die Sprachamöbe: An archeolinguistic parable" in The Bronze Age and Early Iron Age Peoples of Eastern Central Asia, 2 vols. (Washington, D.C.: The Institute for the Study of Man; Philadelphia: The University of Pennsylvania Museum, 1998), pp. 835-855.
Here are key passages from Bate's article:
1. The first objection is more of an appeal for clarification and specification. The view that early Indo-European languages, and therefore their common ancestor, ‘were SOV’ works better for some types of clause than others. The label of SOV seems to be rooted in what we call declarative main clauses – combinations of verbs, nouns and other words that together express simple statements of fact, like the sky is blue or cats are great. If we first specify that this is the type of clause that we have in mind, then yes, SOV does appear to be the norm.
2. The second objection stems from my squeamishness over the very terms subject and object, and an undefined sense of uncertainty over their relevance for Proto-Indo-European. The subject of a sentence is one of those linguistic terms (along with word) that elude easy definition when you try to pin them down. This is not to say that subject is a useless grammatical label, but rather that it may mask a variety of phenomena, and vary in usefulness across different languages.
3. The final objection, which I consider the most serious, is a matter of context and conversational flow. It is generally agreed that individual early Indo-European languages like Latin and Ancient Greek adapted the word order of their sentences according to the larger conversation to which a given sentence belongs. This is usually referred to in the literature as discourse information.
Conclusions:
In producing the documented word order of Latin, Ancient Greek, Sanskrit and the rest, it was the Topic-Focus-Verb system that had the final say, or at least considerable power. It is the output of this system that we have to use to get back to the word order of Proto-Indo-European, and the presence of this schema across its elder daughters indicates that the proto-language operated through it too.
Perhaps then it would be fruitful to retire the label of ‘SOV’ for Proto-Indo-European, or pause the hunt for its basic arrangement of those clausal components. Considering it to have been principally TFV, instead of SOV, could put the whole endeavour of tracking the developmental paths of Indo-European syntax on firmer foundations. From this departure point, I feel we can better understand the interesting and distinctive changes that the ancestral syntax would then undergo across the expanding family of languages, such as in its later Celtic, Germanic and Romance branches.
Although Hittite, the PIE ursprache, so to speak (!), massively uses topicalization and focus and other means to tie discourse together, its fundamental word order is verb-final.
Bate's bold thesis is certain to meet with opposition. One thing his critics are likely to tell him is that he needs to balance off variation against syntax.
Perhaps some such judicious assessment as this by Hiroshi Kumamoto will help the author refine his thesis and make it more acceptable:
The author seems to be trying to address the question that was plentifully discussed half a century ago. It's not a bad thing as long as he's based on the actual language data, albeit limited. But perhaps he might find a look at the researches in linguistic typology beneficial. When in the 70's some non-IE languages with exotic structures became known and fully described, such as the ergative construction and the focus system of the Philippine languages, and the traditional concepts of the "subject" and "object" became shaken, scholars / researchers gathered in a conference and produced the volume entitled Subject and Topic. A New Typology of Language, edited by Charles N. Li (the co-author with Sandra Thompson of Mandarin Chinese), 1976, Academic Press, which became the de-facto standard for this kind of discussion. This trend continues to Aikhenvald, Dixon & Onishi eds., Non-canonical Marking of Subjects and Objects, John Benjamins 2001. Here's the advertisement for the book:
In some languages every subject is marked in the same way, and also every object. But there are languages in which a small set of verbs mark their subjects or their objects in an unusual way. For example, most verbs may mark their subject with nominative case, but one small set of verbs may have dative subjects, and another small set may have locative subjects. Verbs with noncanonically marked subjects and objects typically refer to physiological states or events, inner feelings, perception and cognition. The Introduction sets out the theoretical parameters and defines the properties in terms of which subjects and objects can be analysed. Following chapters discuss Icelandic, Bengali, Quechua, Finnish, Japanese, Amele (a Papuan language), and Tariana (an Amazonian language); there is also a general discussion of European languages. This is a pioneering study providing new and fascinating data, and dealing with a topic of prime theoretical importance to linguists of many persuasions.
And another quarter century has seen more along similar lines of studies, too many to be listed here.
As Asko Parpola put it to me: I think its title is overshooting, "PIE was basically a SOV language" is good enough.
Despite all objections that may be raised, I should note that I enjoyed reading Bate's essay because it is full of good humor, such as this caption for an intriguing illustration of a body of deliberating Roman politicians: "Catiline (ca. 108-62 BC) waits patiently for Cicero (106 BC-43 BC) to reach the main verb".

Let the debate begin!
Selected readings
- "SOV emerges in the Negev" (2/3/05)
- "Word-order 'universals' are lineage-specific?" (4/15/11)
- "Korean oralization of Literary Sinitic" (4/23/24)
[h.t. Pamela Kyle Crossley; thanks to Donald Ringe and Craig Melchert]
Jonathan Smith said,
March 5, 2025 @ 2:59 pm
The "SOV" label for "PIE" is rooted in (parallels to) e.g. the sky is blue or cats are great? Color me skeptical.
Returning to something I know at least something about, conventional wisdom re: "Sinitic" unless it has changed lately is that Chinese languages (+ Karenic) are "SVO" in contrast to "TB" which is basically "SOV". But I am pretty sure Chinese languages are not "SVO". Like at all. Or maybe "these labels are just not very meaningful" / "it depends what you're saying and how and why" or something… which may be part of the point of the paper above.
Cervantes said,
March 5, 2025 @ 3:39 pm
It's interesting why this changes. Most modern Romance and Germanic languages (including the hybrid of both we're using now) mostly have SVO, with exceptions. E.g. in Spanish pronoun objects generally precede the verb, although in these situations the subject is typically marked by the verb and not even present. Lo tomó, I drank it, vs. Tomó el agua. In English of course you get a Yoda-like effect which seems quite strange. The water I drank actually strikes me as a bit less weird than I the water drank, perhaps because the former can exist as a noun phrase so at least we've heard it before.
Anyway this seems to me a much more drastic shift than the drift of word meanings and forms over time, so it's a little hard to imagine how it happens. Someone in a SOV community who suddenly started using SVO would appear deranged, I think, although maybe it could be the marker of an in-group, like slang? I suppose we can only speculate.
Chris Button said,
March 5, 2025 @ 6:45 pm
@ Cervantes
Yes, and even just within the history of individual languages without needing to look further afield.
Although, conversely, I would argue that this haphazardness is also precisely what makes it uninteresting.
I always avoided syntax because I don't believe there are any precise explanations to be found.
Nelson Goering said,
March 6, 2025 @ 2:13 am
"Someone in a SOV community who suddenly started using SVO would appear deranged"
At least in early Indo-European, and even in a lot of languages with supposedly more rigid word order, there's room for variation (for IE, quite a lot of room) based on things like information structure. As long as you have variation, you have the possibility for reanalysis. Not too different from phonology or morphology in that respect. The surface variation in the messy input children hear gives opportunities to draw different generalizations.
David Marjanović said,
March 6, 2025 @ 4:40 am
Yes.
No, they arrived in the Bronze Age. They actually brought the Bronze Age to Neolithic Estonia.
That's why there are all these pre-Germanic and very early Balto-Slavic loanwords in Proto-Finnic. This is not controversial, there's a huge amount of literature on it.
While I'm at it, all speakers of Anatolian probably shifted to Greek long before they shifted to Turkish; there's no evidence for Anatolian languages after Emperor Zeno (a native speaker of one).
This is not a map of where IE languages are, let alone ever were, spoken. It's a map of where an IE language happens to have official status today. That's why there's no white spot for Basque or any Sámi languages spoken outside Finland, why the other three white spots coincide exactly with the borders of Hungary, Estonia and Finland, and why all of Russia plus Kazakhstan, Turkmenistan and even Kirgizstan are in at least pastel green.
I like that.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I was explicitly taught Standard Mandarin has Topic-Verb-Comment (or -Focus if you prefer) as the default word order.
Comment-Topic-Verb, BTW, Yodaspeak is. Traditionally as Object-Subject-Verb described it has been, but empirically simply false that is.
Almost all occurrences of this are short for "the water that/which I drank". Very few are OSV for emphasis on the object, or Topic-Comment for emphasis on the comment ("drank").
David Marjanović said,
March 6, 2025 @ 5:35 am
It is in part. The full-size version does have a pale spot for Basque and another for Sámi, along with one for Hungarian in central Romania and suchlike, and of course the shown distribution of Indo-Iranian doesn't correlate with official status very well.
Still, painting all of formerly Soviet Asia pale green makes less sense than painting the entire Congo red for French would. And the pale green spot next to Finland tracks the borders of the Republic of Karelia awfully closely; likewise Greek with Greece and Cyprus.
In any case it's a map of the present; no Tocharian, no Anatolian, no Iranic north of Ossetia and Tajikistan, no Greek in Turkey or Alexandria…
cervantes said,
March 6, 2025 @ 7:05 am
Of course I meant tomé, not tomó. There's no easy way to enter Spanish characters here and I pasted the wrong one. But you get the idea. It works the same way in third person anyway. If you did specify a subject, it would indeed go first.
Jim Unger said,
March 6, 2025 @ 12:47 pm
Well, if pIE were an SOV language, then how have so many IE languages come to have relative pronouns, and why do those pronouns and related words appear to have so many cognates in not immediately related languages of the phylum?
More generally, I am skeptical of the value of imposing concepts that seem natural in synchronic analysis to entities we work with in diachronic linguistics (historical stages of languages and languages that we believe are related but were spoken in different times and places). I recall a remark Erica Reiner made in a lecture in an Introduction to Linguistics course at the University of Chicago in the late 1960s: "all linguistics are historical linguistics." At the time, given her specialization in Assyriology and the then surging interest in Chomsky's ideas, it sounded mostly like reactionary griping, but given the deficiencies of the Saussurean model of language change as a sort of illusion like cinematic motion, I have long since come to appreciate her insight.
Tom Recht said,
March 6, 2025 @ 1:39 pm
I'm less sure about other early IE languages but for Greek at least there indeed seems to be no good basis for positing any specific underlying, or even "default", constituent order at the clause level — it's basically all discourse pragmatics. I'd even go a bit further than Danny and say that "Topic-Focus-Verb" isn't quite right for the same kinds of reasons, namely that verbs can themselves be topical or focus and their placement then reflects that.
Chris Button said,
March 6, 2025 @ 1:55 pm
Outside of phonetics and associated spectrogram analyses of recordings (and, to my comment, leaving syntax entirely out of consideration), I agree with that. As soon as phonetics blurs with phonology, you're into the comparative historical domain.
Conversely, those working in the comparative historical domain will likely struggle without a basic familiarity in phonetics and how languages are actually spoken.
Jonathan Smith said,
March 6, 2025 @ 3:05 pm
Re: Mandarin, the notion of a (or more than one) sentence-initialish "Topic(s)" does some work, but lots of other factors are in play that affect specifics. IDK exactly what kind of sentence would be covered by the template "Topic-Verb-Comment" as the stuff people tend to characterize as "Comment" subsumes verb where one is present.
J.W. Brewer said,
March 6, 2025 @ 4:09 pm
More or less by definition the SOV/SVO/VSO/etc. schema doesn't work so well when applied to what wikipedia is now calling a topic-prominent language (https://en.wikipedia.org/wiki/Topic-prominent_language), where syntactic phenomena like word order are based on a topic/comment structure rather than a subject-predicate structure. Bate seems at least in part to be arguing that some of the earliest-attested IE languages should be thought of as topic-prominent, at least as to word order even if not as to e.g. the case system (no inflectional ending functionally equivalent to the Japanese "wa" particle, for example).
That said, he seems to largely accept without appropriate skepticism the idea that PIE word order can be confidently reconstructed from the apparent word order of the earliest IE languages we have texts in, even though that earliest evidence is several millennia after the breakup of PIE and even though it's a partial and not-necessarily-representative sample of the daughter languages that existed then. I'm more skeptical about the possibility of reconstruction, period.
Chris Button said,
March 6, 2025 @ 4:13 pm
Takashima & Yue "Evidence of possible dialect mixture in oracle-bone inscriptions" compares two opposing word-order patterns and then makes a comparison with the evolution of modern Hong Kong Cantonese syntax.
R. Fenwick said,
March 21, 2025 @ 8:04 pm
An aspect that doesn't often seem to be touched on in these sorts of word-order discussions is that of how subject/object and topic/focus systems might also be further confounded by agent/patient relationships.
There's an intriguing instance of this in Wichita, where both the agent/patient and subject/object categories are required to describe the full morphological workings of the polysynthetic verb: for instance, person marking of the verb's arguments follows S/O and surface word order in declarative sentences without incorporated nouns is normally SOV or OVS, but noun incorporation is governed by A/P, with only patient NPs (but both subject and object patients) able to be incorporated in the verb. (Verbal number marking adds another wrinkle in that third-person plurality follows A/P but non-third-person plurality follows S/O.) Despite the consistent use of OV word orders when O isn't incorporated, it appears that the deep structure of Wichita is best described by a system in which the A/P distinction is basic and S/O is derived from it.
David Rood's paper on the topic is an interesting read:
Rood, D. S. 1971 Agent and Object in Wichita. Lingua 28: 100-107.