LLMs for judicial interpretation of "ordinary meaning"
Kevin Newsom serves as a United States circuit judge on the United States Court of Appeals for the Eleventh Circuit (one of 13 federal courts of appeals across the country; since they sit one level below the Supreme Court, their practices and opinions are of great importance).
Judge Suggests Courts Should Consider Using "AI-Powered Large Language Models" in Interpreting "Ordinary Meaning", Eugene Volokh, The Volokh Conspiracy | 5.30.2024
That's from Judge Kevin Newsom's concurrence yesterday in Snell v. United Specialty Ins. Co.; the opinion is quite detailed and thoughtful, so people interested in the subject should read the whole thing. Here, though, is the introduction and the conclusion:
I concur in the Court's judgment and join its opinion in full. I write separately … simply to pull back the curtain on the process by which I thought through one of the issues in this case—and using my own experience here as backdrop, to make a modest proposal regarding courts' interpretations of the words and phrases used in legal instruments.
Here's the proposal, which I suspect many will reflexively condemn as heresy, but which I promise to unpack if given the chance: Those, like me, who believe that "ordinary meaning" is the foundational rule for the evaluation of legal texts should consider—consider—whether and how AI-powered large language models like OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude might—might—inform the interpretive analysis. There, having thought the unthinkable, I've said the unsayable.
Now let me explain myself….
I think that LLMs have promise. At the very least, it no longer strikes me as ridiculous to think that an LLM like ChatGPT might have something useful to say about the common, everyday meaning of the words and phrases used in legal texts….
Let's see how this plays out in actual practice and in terms of precedent.
Selected readings
- "The non-culpability of ChatGPT in legal cases" (2/2/24)
- "AI percolates down through the legal system" (12/16/23) — with lengthy bibliography on AI, LLM, etc.
- "AI and the law" (10/15/23)
- "AI and the law, part 2" (10/19/23) — another long bibliography
[h.t. Kent McKeever]
AntC said,
May 30, 2024 @ 6:14 pm
Yeah. Count me amongst the "many". LLM hallucinations would be my first worry.
"words and phrases used in legal instruments." are often _not_ used in "ordinary meaning", but rather in specialist/technical legal sense (no different in principle to technical language in any specialism). Furthermore many statutes, and especially the USA's founding Constitutional documents are in historical language, not using current "ordinary meaning".
OTOH, interpreting legal meaning does require applying (contemporary) "ordinary meaning" judiciously [ha!].
I see no evidence that LLMs can be trained to be 'aware' of these subtleties. Furthermore, judicial process requires that an expert witness's opinion and their supporting evidence be testable/cross-examinable, whereas an LLM's 'reasoning' (if that's even the applicable term) seems opaque.
That it's a so-called "Judge" proposing this reinforces my grave doubts about the competence of the US judicial system, with its political appointees.
J.W. Brewer said,
May 30, 2024 @ 8:08 pm
In this particular case, the relevant legal rule is apparently that the word "landscaping" in the insurance policy has its ordinary common meaning if there is no definitional language in the policy giving it a different or more carefully specified meaning. Insurance companies in the U.S. can and do give very precise and sometimes somewhat counterintuitive definitions to common words in their policy language, but if they don't do so, they run the risk of finding out in court what the common meaning of the word is thought (by some relevant decision-maker) to be. And the question would be what "landscaping" was generally understood to mean (and not to mean) as of the not-too-long-ago time when the insurance policy was issued, so the problems associated with determining the original meaning of words in a centuries-old document do not arise. That said, since the insured was a small business specializing in "landscaping" work, it does seem possible to me that the focus ought to be on what persons engaged in the landscaping trade typically think of as within the general scope of their business, which might plausibly include some ancillary work outside a layman's sense of "landscaping" in the strict sense that customers frequently ask them to do because it makes more sense than hiring another vendor/contractor for the purpose. But maybe that's not how Alabama insurance law would think about the question.
I would have liked a better and more detailed explanation of why Judge Newsom thinks LLMs would provide more useful insight than good corpus linguistics might. Part of his argument seems to be that LLMs are "trained" on vastly larger amounts of prior usage than a typical corpus contains, but still. Judge Newsom's overall point, that dueling sets of cherry-picked dictionary definitions are not the optimal way of figuring out what a word in a contract means (when the parties predictably and self-servingly disagree as to what it means), seems correct, and it would be an improvement to use better ways, even if they may have their own limitations and imperfections.
The selection of U.S. federal appellate judges has become increasingly politically contentious in recent decades. For what it's worth, Judge Newsom was in the "least contentious" 25% of the appellate judges confirmed by the Senate during the presidency in question, as measured by how many votes he attracted (largely meaning in practice how many Senators from the other political party voted for him). Now, in some cases judges confirmed by very close votes were "contentious" because their nomination got mixed up politically with some other ongoing controversy or power play and it's not necessarily a reflection on their personal merits, but being non-contentious is usually a sign of perceived competence for the role.
Viseguy said,
May 30, 2024 @ 9:23 pm
The problem with resorting to LLMs is that "ordinary meaning" refers to what reasonable people (or the "reasonable reader") would understand a word or phrase to mean, not what AI robots, however "thoughtful", might "think" it means. So action on Judge Newsom's double-em-dash-qualified proposal should probably await a scientific consensus on whether and to what extent AI-based judgments on such matters are a reliable proxy for human judgments — a time, it seems to me, that could be a long way off. Still, one can appreciate the attractiveness to a judge of having a purportedly objective way of determining ordinary meaning, given that in practice the process often boils down to a choice between dueling dictionary definitions offered in service of an outcome favorable to one or the other party to a dispute.
KeithB said,
May 31, 2024 @ 8:25 am
Seems to me that a search of the Google corpus would be just as fruitful and allow for the interpretation to be done by the judge, not the LLM.
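
For the curious, here is a minimal sketch of what KeithB's corpus-search alternative might look like in practice. It uses Python with NLTK's Brown corpus purely as a stand-in for whatever corpus a court might actually consult; the corpus choice and output format are illustrative assumptions, not anything from the opinion. The point is that the raw concordance lines go to the human interpreter, not to a model.

```python
# A minimal sketch of the corpus-search route: pull keyword-in-context
# lines for a disputed term out of a general-purpose corpus so that a
# human reader can inspect the usage directly. NLTK's Brown corpus is
# only a stand-in here; a court would presumably want something far
# larger and more current (the term may not even occur in Brown).
import nltk
from nltk.corpus import brown
from nltk.text import Text

nltk.download("brown", quiet=True)

term = "landscaping"  # the disputed policy term in Snell
text = Text(brown.words())

# Print concordance (keyword-in-context) lines; deciding what these
# occurrences show is left entirely to the human reader.
text.concordance(term, width=100, lines=25)

# Rough frequency information for the same term.
freq = nltk.FreqDist(w.lower() for w in brown.words())
print(f"'{term}' occurs {freq[term]} times in ~{freq.N():,} tokens")
```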
GH said,
June 1, 2024 @ 8:37 am
I would not condemn the idea out of hand on principle. We shouldn't consider LLMs as an oracle overruling the judgment of humans, but as a tool – one tool among others – to analyze the "ordinary meaning" of a word, based on a very large corpus of text written (hopefully) by "ordinary humans." The structure of these models enables them to capture patterns that could be missed by more traditional corpus analysis.
Of course, because they are presented conversationally "as if" by a human, there is a dangerous temptation to use the model's responses as direct answers to the question, rather than as merely plausible response strings. As AntC says, the ability of LLMs to actually perform and report analyses of the data they contain is suspect at best (just look at their complete inability to count properly).
So rather than outright asking the model "What is the ordinary meaning of X?" we should probably have a number of "conversations" with it about X, and analyze those to figure out what it implicitly "understands" by X: what the model has captured about the meaning of the term from its training data.
What I think is a stronger argument against the proposal is that these models are not neutral algorithms. They have been trained, tuned, tweaked, and in some cases outright censored, in order to produce responses considered desirable by their creators. This introduces an unknown bias. A suitable model would have to be outside the control of any company or organization with a potential interest in the outcome, at the very least.
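
For concreteness, here is a rough sketch of GH's probe-and-tabulate suggestion: rather than asking a model point-blank what a term ordinarily means, sample many independent yes/no usage judgments across scenarios and leave the interpretive step to a human. It assumes OpenAI's Python client; the model name, prompts, and scenarios are illustrative placeholders rather than anything drawn from the case, and the results would of course inherit the training and tuning biases GH flags above.

```python
# A rough sketch of probing an LLM's implicit sense of a term instead of
# asking it directly for "the ordinary meaning." Repeatedly sample yes/no
# usage judgments for several scenarios and tabulate them; the human
# reader, not the model, decides what the pattern of answers shows.
# Assumes the OpenAI Python client (openai >= 1.0); the model name and
# scenarios below are illustrative placeholders.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

TERM = "landscaping"
SCENARIOS = [  # hypothetical scenarios, not facts of the Snell case
    "planting shrubs and laying sod around a house",
    "installing an underground irrigation system",
    "repaving a commercial parking lot",
]

def ask_once(scenario: str) -> str:
    """Elicit a single yes/no usage judgment from the model."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[{
            "role": "user",
            "content": (
                f"Would an ordinary English speaker describe the following "
                f"as '{TERM}'? Scenario: {scenario}. Answer yes or no."
            ),
        }],
        temperature=1.0,  # keep sampling variability so repeats are informative
    )
    return resp.choices[0].message.content.strip().lower()

for scenario in SCENARIOS:
    votes = Counter(
        "yes" if ask_once(scenario).startswith("yes") else "no"
        for _ in range(10)  # ten samples per scenario, purely for illustration
    )
    print(f"{scenario!r}: {dict(votes)}")
```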