Language Log

Implementing Pāṇini's grammar

December 1, 2023 @ 6:16 pm · Filed by Victor Mair under Computational linguistics, Grammar, Language and computers, Morphology

[Here's the conclusion to the hoped-for trifecta on things Indian; see the preface here. It comes in the form of a guest post by Arun Prasad.]

The cornerstone of traditional Sanskrit grammar is Pāṇini's Aṣṭādhyāyī, which in around 4,000 short rules defines a comprehensive system for generating valid Sanskrit expressions. It continues to prompt vigorous discussion to this day, some of which has featured in Language Log before.

As a professional software engineer and amateur Sanskritist, my lens is more pragmatic: if we could implement the Aṣṭādhyāyī in code and generate an exhaustive list of Sanskrit words, we could create incredibly valuable tools for Sanskrit students and scholars.

To that end, I have implemented just over 2,000 of the Aṣṭādhyāyī's rules in code, with an online demo here. These rules span all major sections of the text that pertain to morphology, including: derivation of verbs, nominals, secondary roots, primary nominal bases, and secondary nominal bases; compounding; accent; and sandhi.

Implementing Pāṇini's grammar in code is an obvious idea, but actually doing so is challenging for several reasons.

The first is that a core difficulty of using the grammar lies in deciding which rule to apply next in a derivation. I wish to make clear that I've sidestepped this critical point of theory by manually ordering rules in whatever way would produce valid output, which is why I say that I'm implementing Pāṇini's grammar and not simulating it. One consolation of this approach is that the resulting program is much simpler and much faster.
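To make "manually ordering rules" concrete, here is a minimal Python sketch. It is not the project's actual code: the function names, the list-of-morphemes state, and the ASCII transliteration (where "U" stands for long ū) are all assumptions made for illustration. Each toy rule is only loosely modelled on a real sūtra, and the rule numbers in the comments are given for orientation.

```python
from typing import Callable, List

# A derivation state is just a list of morphemes, e.g. ["bhU", "ti"].
# ASCII transliteration is assumed here: "U" stands for long u.
Terms = List[str]
Rule = Callable[[Terms], Terms]

VOWELS = "aAiIuUeo"


def add_vikarana(terms: Terms) -> Terms:
    """Insert the stem vowel "a" between root and ending (cf. 3.1.68)."""
    if len(terms) == 2:
        return [terms[0], "a", terms[1]]
    return terms


def apply_guna(terms: Terms) -> Terms:
    """Strengthen a root-final long u to o (cf. 7.3.84)."""
    root = terms[0]
    if root.endswith("U"):
        return [root[:-1] + "o"] + terms[1:]
    return terms


def apply_av(terms: Terms) -> Terms:
    """Replace a root-final o with av before a vowel (cf. 6.1.78)."""
    root = terms[0]
    if root.endswith("o") and len(terms) > 1 and terms[1][0] in VOWELS:
        return [root[:-1] + "av"] + terms[1:]
    return terms


# The order is fixed by hand. The program never has to decide which
# rule should apply next; it simply walks the list once.
ORDERED_RULES: List[Rule] = [add_vikarana, apply_guna, apply_av]


def derive(root: str, ending: str) -> str:
    """Run every rule once, in the hand-chosen order, and join the result."""
    terms: Terms = [root, ending]
    for rule in ORDERED_RULES:
        terms = rule(terms)
    return "".join(terms)


print(derive("bhU", "ti"))  # bhavati
print(derive("pac", "ti"))  # pacati
```

The trade-off is visible even at this scale: a fixed list is trivial to run and to reason about, but the knowledge of why one rule precedes another lives in the programmer's head rather than in the program.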

The second is that it is not always obvious (to amateurs like me, anyway) which outputs are correct and which are not. Thankfully, the traditional grammatical literature comments meticulously on each rule in the grammar and provides extensive examples and counterexamples. These examples become our unit tests and integration tests to help check for program correctness.
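As a rough sketch of how commentary examples and counterexamples can turn into tests, consider the Python below. The `join` function is a toy stand-in for the real generator, the example pairs are illustrative rather than quoted from any commentary, and the transliteration is a simplified ASCII scheme; all of these are assumptions.

```python
from typing import List, Tuple


def join(first: str, second: str) -> str:
    """Toy vowel sandhi: a/A + i/I -> e, a/A + u/U -> o; otherwise concatenate."""
    guna = {"i": "e", "I": "e", "u": "o", "U": "o"}
    if first and second and first[-1] in "aA" and second[0] in guna:
        return first[:-1] + guna[second[0]] + second[1:]
    return first + second


# Examples: the output we must produce for a given input.
EXAMPLES: List[Tuple[Tuple[str, str], str]] = [
    (("deva", "indra"), "devendra"),
    (("gaNgA", "udaka"), "gaNgodaka"),
    (("tat", "phala"), "tatphala"),
]

# Counterexamples: an output we must NOT produce for a given input.
COUNTEREXAMPLES: List[Tuple[Tuple[str, str], str]] = [
    (("deva", "indra"), "devaindra"),
]


def run_suite() -> List[str]:
    """Return a list of human-readable failures; an empty list means all passed."""
    failures = []
    for args, expected in EXAMPLES:
        got = join(*args)
        if got != expected:
            failures.append(f"{args}: expected {expected}, got {got}")
    for args, forbidden in COUNTEREXAMPLES:
        if join(*args) == forbidden:
            failures.append(f"{args}: produced forbidden form {forbidden}")
    return failures


print(run_suite())  # []
```

Keeping examples and counterexamples as plain data means that adding a new case from the commentaries is a one-line change.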

The third is that the Aṣṭādhyāyī's rules are deeply interconnected, such that an innocuous change in one part of the grammar can have major effects elsewhere. This is where the main merit of a fast program reveals itself: a faster program can be checked against more examples than a slower one, which means that software bugs can be found more easily and more cheaply. This extensive test suite also means that if we wish to change a certain design decision in the overall program, we can do so with assurance that the overall system will not break.
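One way to catch such ripple effects is to compare the program's output before and after a change across many inputs. The Python sketch below assumes two hypothetical versions of a toy generator (`generate_v1` and `generate_v2`, neither taken from the real project) and reports every input whose output changed.

```python
from typing import Callable, Dict, Iterable, Tuple


def generate_v1(root: str) -> str:
    """Hypothetical old behaviour: naively append "ati" to every root."""
    return root + "ati"


def generate_v2(root: str) -> str:
    """Hypothetical new behaviour: roots ending in long u ("U") become -avati."""
    if root.endswith("U"):
        return root[:-1] + "avati"
    return root + "ati"


def snapshot(generate: Callable[[str], str], inputs: Iterable[str]) -> Dict[str, str]:
    """Record the output of `generate` for every input."""
    return {item: generate(item) for item in inputs}


def diff_snapshots(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Map each input whose output changed to its (old, new) pair."""
    return {
        key: (before[key], after[key])
        for key in before
        if key in after and before[key] != after[key]
    }


roots = ["pac", "nam", "bhU"]
changed = diff_snapshots(snapshot(generate_v1, roots), snapshot(generate_v2, roots))
print(changed)  # {'bhU': ('bhUati', 'bhavati')}
```

If the diff contains only the forms the change was meant to affect, the change is probably safe; anything else in the diff is a lead worth chasing. The faster the generator, the larger the input list this check can afford.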

A journey through the Aṣṭādhyāyī at this minute level has been a great joy (as long as I'm not stuck on a strange bug). Pāṇini's grammar is meticulous and exhaustive, and it's a rich source of information on the Sanskrit idiom of the time, both in the rules themselves and in the commentary upon them. I've also become increasingly humbled at the thought of those pandits who are in full command of the rules of the grammar.

In the long term, I hope that this program will become "morphologically complete" and generate all valid word forms that the grammar allows. In addition to its pedagogical value, I believe such a program could become an invaluable tool for anyone who wishes to further explore Pāṇini's wondrous system.

Since this is Language Log and not Software Slog, I've kept the implementation details light. But I'll briefly say that this is a Rust project with bindings to WebAssembly and Python. The code is free and open-source, and it can be found on GitHub here.

Selected readings

"Sanskrit is far from extinct" (11/29/23)
"Spelling and intuition" (11/30/23)
"'In Pāṇini We Trust'" (12/15/22)

1 Comment

Peter Ludemann said,

December 10, 2023 @ 3:41 pm

Pāṇini's 4000 rules are a stunning intellectual feat — I've written grammars for programming languages that are only a few hundred rules, and I've had to use software tools to verify that my grammars were unambiguous; although from what I've read, Pāṇini's formalism seems to use a "first match" rule to avoid ambiguities, similar to how "PEG"s (Parsing Expression Grammars) work.
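For readers unfamiliar with PEGs, the "first match" idea can be sketched in a few lines of Python. This illustrates PEG-style ordered choice only and makes no claim about how Pāṇini's system resolves rule conflicts; the combinator names `literal` and `choice` are made up for this example.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the position after the
# match, or None if it fails to match at that position.
Parser = Callable[[str, int], Optional[int]]


def literal(s: str) -> Parser:
    """Match the exact string `s` at the current position."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def choice(*alternatives: Parser) -> Parser:
    """Ordered choice: try alternatives left to right; the first success wins."""
    def parse(text: str, pos: int) -> Optional[int]:
        for alternative in alternatives:
            end = alternative(text, pos)
            if end is not None:
                return end  # later alternatives are never consulted
        return None
    return parse


short_first = choice(literal("a"), literal("ab"))
long_first = choice(literal("ab"), literal("a"))

print(short_first("ab", 0))  # 1: "a" matched first, so "ab" was never tried
print(long_first("ab", 0))   # 2: "ab" matched first
```

Because the first successful alternative always wins, a given input has at most one parse, which is why no separate ambiguity check is needed; the price is that the order of alternatives becomes part of the grammar's meaning.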

[Programming languages typically use "context free grammar" or Chomsky type-2 whereas (I think) the Sanskrit rules are an "unrestricted grammar" or type-0.]

Abramson and Dahl's "Logic Grammars" (Chapter 10 – Discontiguous Grammars) has an example of handling free word order in Sanskrit or Latin, and contrasts it with Pullum's "augmented phrase structure". This book is fairly old (1989); has there been much work subsequently in this area? – and does Pāṇini's grammar use any of these techniques?

At the other end of the grammar hierarchy – regular grammars or Chomsky type-3 – I've been surprised at how well dictionaries with inflections can be encoded using finite state morphology (https://web.stanford.edu/group/cslipublications/cslipublications/site/1575864347.shtml).
