Law Sites: Stanford Will Augment Its Study Finding that AI Legal Research Tools Hallucinate in 17% of Queries, As Some Raise Questions About the Results

Ambrogi writes:

Stanford University will augment the study it released last week of generative AI legal research tools from LexisNexis and Thomson Reuters, in which it found that they deliver hallucinated results more often than the companies say, as others have raised questions about the study’s methodology and fairness.

The preprint study by Stanford’s RegLab and its Human-Centered Artificial Intelligence research center found that these companies overstate the extent to which their products are free of hallucinations. While both hallucinate less than a general-purpose AI tool such as GPT-4, they nevertheless each hallucinate more than 17% of the time, the study concluded.

The study also found substantial differences between the LexisNexis (LN) and Thomson Reuters (TR) systems in their responsiveness and accuracy: the LN product delivered accurate responses on 65% of queries, while the TR product responded accurately just 18% of the time.

But the study has come under criticism from some commentators, most significantly because it effectively compared apples and oranges. For LN, it studied Lexis+ AI, the company’s generative AI platform for general legal research.

But for TR, the study did not review the company’s AI platform for general legal research, AI-Assisted Research in Westlaw Precision. Rather, it reviewed Ask Practical Law AI, a research tool that is limited to content from Practical Law, a collection of how-to guides, templates, checklists, and practical articles.

The authors acknowledged that Practical Law is “a more limited product,” but said they used it because Thomson Reuters denied their “multiple requests” for access to the AI-Assisted Research product.

“Despite three separate requests, we were not granted access to this tool when we embarked on this study, which illustrates a core point of the study: transparency and benchmarking is sorely lacking in this space,” Stanford Law professor Daniel E. Ho, one of the study’s authors, told me in an email today.

Thomson Reuters has now made the product available to the Stanford researchers and Ho confirmed that they will “indeed be augmenting our results from an evaluation of Westlaw’s AI-Assisted Research.”

Ho said he could not provide concrete timing on when the results would be updated, as the process is resource-intensive, but he said the researchers are working on it expeditiously.

“It should not be incumbent on academic researchers alone to provide transparency and empirical evidence on the reliability of marketed products,” he added.

Apples to Oranges

With respect to TR, the difference between the AI capabilities of Westlaw Precision and Practical Law is significant.

Read full report