Trust But Verify: The Lawyer's Guide to AI-Driven Legal Research
AI Tools Have Many Uses in the Legal Sphere, But Are They Ready to Handle Research?
In March 2023, Goldman Sachs published a report estimating that 44% of legal tasks could be done by AI. In the year since, they have completely backed away from that puffed-up, premature statement. Yet AI-driven legal research tools are still promising to dramatically increase efficiency, allowing lawyers to quickly find relevant cases, statutes, and legal principles. Recently, major legal tech companies have been quick to tout their AI products, claiming to have solved, or at least significantly reduced, longstanding issues like "hallucinations": instances where AI generates false or misleading information. For example, LexisNexis claimed to deliver "100% hallucination-free linked legal citations." That claim refers specifically to the citations to case law. In other words, Lexis+ will return citations to actual cases, but that does not guarantee that the case linked is the correct case for the query. As the study discussed below notes, simply responding with "Brown v. Board of Education" to every query could qualify as 100% hallucination-free under that definition, because Brown v. Board of Education is an actual case, but that is far from useful. Thomson Reuters stated they "avoid [hallucinations] by relying on the trusted content within Westlaw and building in checks and balances." Casetext, whose AI product is CoCounsel, claimed their system "does not make up facts, or 'hallucinate,' because we've implemented controls to limit CoCounsel to answering from known, reliable data sources."
But how well do these claims hold up under scrutiny? A recent study conducted by researchers from Stanford University offers a sobering look at the current state of AI in legal research. This article examines their findings and discusses the implications for legal practitioners and the wider legal tech industry.
Understanding AI-Driven Legal Research Tools
The study focused on three prominent AI-driven legal research tools: Lexis+ AI from LexisNexis, and Westlaw AI-Assisted Research and Ask Practical Law AI, both from Thomson Reuters. These tools utilize a technology called Retrieval-Augmented Generation (RAG), which combines large language models (LLMs) with access to vast databases of legal documents. The study also used GPT-4 as a benchmark to test how a general-purpose model would compare to the legal-specific tools.
In theory, RAG should allow these AI systems to provide accurate, up-to-date legal information by retrieving relevant documents and then generating responses based on that information. This approach should prevent the LLM from returning fictional information (say, a case from a John Grisham novel) as well as out-of-date information. It plays to the strengths of AI in processing and synthesizing large amounts of data while grounding responses in authoritative legal sources.
If you're not familiar with RAG, here's how it works and why it helps reduce, if not eliminate, hallucinations. AI engineers or a programming team build a database of cases, each verified as valid; no cases from fiction or cinema. Because of the size of our legal canon, the technology can also filter or limit the dataset based on the user's prompt, such as "only cases from the 9th Circuit." When the user submits a prompt, the RAG engine retrieves the relevant cases through a set of queries to the database. The text of those cases is then added to the user's prompt and presented to the LLM, usually with an instruction to use only information from the database. The LLM then reads the cases to "reason" a response. In addition, the technology can look up each citation in the LLM's response and confirm that it actually exists in the database. This technique should essentially eliminate citation errors, but it does not guarantee that reasoning errors are eliminated. However, reasoning errors should also be reduced as vendors build more capable legal LLMs trained only on precisely curated facts.
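If it helps to see the moving parts, here is a minimal sketch of that pipeline in Python. It is purely illustrative and rests on assumptions: the Case class and the retrieve_cases, build_prompt, llm_generate, and verify_citations functions are hypothetical stand-ins, not any vendor's actual implementation.

```python
# Illustrative RAG sketch. Every name here is a hypothetical stand-in for
# whatever document store, retrieval engine, and LLM a real vendor uses.
import re
from dataclasses import dataclass

@dataclass
class Case:
    citation: str   # e.g., "347 U.S. 483 (1954)"
    court: str      # e.g., "9th Cir."
    text: str       # opinion text, verified as coming from a real case

def retrieve_cases(query: str, database: list[Case],
                   circuit: str | None = None, top_k: int = 5) -> list[Case]:
    """Step 1: pull the most relevant verified cases, optionally filtered by circuit."""
    candidates = [c for c in database if circuit is None or c.court == circuit]
    # A production system would rank by semantic similarity (embeddings);
    # simple keyword overlap stands in for that here.
    def score(case: Case) -> int:
        return sum(1 for word in query.lower().split() if word in case.text.lower())
    return sorted(candidates, key=score, reverse=True)[:top_k]

def build_prompt(query: str, cases: list[Case]) -> str:
    """Step 2: attach the retrieved case text and constrain the model to it."""
    sources = "\n\n".join(f"[{c.citation}] {c.text}" for c in cases)
    return ("Answer the question using ONLY the cases provided below, "
            "and cite each case you rely on in brackets.\n\n"
            f"CASES:\n{sources}\n\nQUESTION: {query}")

def llm_generate(prompt: str) -> str:
    """Step 3: stand-in for the vendor's LLM call (hypothetical)."""
    raise NotImplementedError("Replace with a real LLM API call.")

def verify_citations(answer: str, database: list[Case]) -> list[str]:
    """Step 4: flag any bracketed citation that does not exist in the verified database."""
    known = {c.citation for c in database}
    return [c for c in re.findall(r"\[(.+?)\]", answer) if c not in known]
```

Note that the final verification step can only confirm that a cited case exists in the database; it says nothing about whether that case actually supports the proposition, which is exactly the gap the study exposes.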
Key Findings of the Study
Despite the ambitious claims of legal tech companies, the study found that AI-driven legal research tools still produce a significant number of hallucinations. The hallucination rates varied across the tools:
Lexis+ AI: 17%
Westlaw AI-Assisted Research: 33%
Thomson Reuters's Ask Practical Law AI: 17%
GPT-4: 43% (included as the control in the experiment)
However, the hallucination rate is only part of the issue. The study also looked at how often the legal research tools provided incomplete answers:
Lexis+ AI: 18%
Westlaw AI-Assisted Research: 25%
Thomson Reuters's Ask Practical Law AI: 63%
GPT-4: 8%
When completeness and hallucinations are both taken into account, Practical Law was accurate only 20% of the time, less than half of GPT-4's 49% accuracy rate. (The math roughly works out if you treat accurate, hallucinated, and incomplete as covering all responses: 100% minus 17% hallucinated minus 63% incomplete leaves about 20% for Practical Law, while 100% minus 43% minus 8% leaves about 49% for GPT-4.)
All of the legal-specific tools hallucinate less than a general-purpose model like GPT-4, at least if you count incomplete answers as acceptable. But these rates are still alarmingly high for tools meant to assist in legal research, where accuracy is paramount. It seems like a risky choice to me, but LexisNexis has touted this same study as showing its accuracy advantage over its competitors. Note that both Westlaw AI-Assisted Research and Ask Practical Law AI are owned by Thomson Reuters.
Common Errors
In reviewing the AI tools, the researchers identified several common types of errors and organized them into four distinct categories:
1. Misunderstanding case holdings: AI systems often misinterpreted the actual rulings of courts. Legal cases can be difficult to understand and nuanced in their applicability, and the AI tools were not always able to determine how or why a ruling mattered to a specific area of law.
2. Confusing litigant arguments with court rulings: The AI sometimes presented arguments made by parties as if they were the court's decision. Yikes!
3. Misapplying legal authority hierarchies: The systems struggled with properly applying the complex hierarchies of legal authority in the U.S. legal system. So although citations from these tools might not be completely erroneous, they might just be irrelevant or only mildly persuasive.
4. Citing irrelevant or overruled cases: In many instances, the AI cited cases that were either not applicable or had been overturned. Even with the best legal-specific tools, you must verify your cases.
Causes of AI Errors in Legal Research
The study identified several root causes for these errors:
Naïve retrieval: The AI often failed to find the most relevant sources for a given query, leading to responses based on tangentially related or irrelevant information.
Inapplicable authority: Even when relevant documents were retrieved, the AI sometimes cited sources from the wrong jurisdiction or time period, or that had been overruled.
Reasoning errors: In some cases, the AI made logical errors in interpreting the information it retrieved, leading to incorrect conclusions.
These issues highlight the limitations of current AI in truly understanding the nuances and complexities of legal reasoning. While AI can process vast amounts of information quickly, it still struggles with the contextual understanding that is crucial in legal analysis. This is likely to change as AI platforms are refined (one possible mitigation for inapplicable authority is sketched below), but until then it is imperative to keep these errors top of mind while using AI tools, even the most ideally positioned, legal-specific ones.
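To make that mitigation concrete: the inapplicable-authority failures are partly a metadata problem, and a vendor could filter retrieved documents on jurisdiction, date, and overruled status before the model ever sees them. The sketch below is an assumption about how such a filter might look, not a description of how any of these products actually works; the RetrievedCase fields and the filter_applicable function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class RetrievedCase:
    citation: str
    jurisdiction: str   # e.g., "9th Cir.", "Cal.", "U.S."
    decided: date
    overruled: bool     # in practice this flag would come from a citator service

def filter_applicable(cases: list[RetrievedCase], jurisdiction: str,
                      decided_after: date | None = None) -> list[RetrievedCase]:
    """Drop overruled cases and cases from the wrong jurisdiction or era
    before they reach the LLM."""
    keep = []
    for c in cases:
        if c.overruled:
            continue  # overturned authority is worse than none
        if c.jurisdiction != jurisdiction:
            continue  # naive equality check; ignores binding vs. persuasive hierarchy
        if decided_after is not None and c.decided < decided_after:
            continue  # stale law is risky for fast-moving areas
        keep.append(c)
    return keep
```

Even a filter like this leaves the harder problem untouched: deciding whether an authority is binding or merely persuasive, which is exactly where the study found these tools struggling.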
Implications for Legal Practice
The findings of this study have significant implications for lawyers using AI-driven research tools. These implications touch on ethical obligations, practical considerations, and best practices for integrating AI into legal research workflows.
Ethical Obligations
1. Duty of Competence: Under Rule 1.1 of the ABA Model Rules of Professional Conduct, lawyers have a duty to provide competent representation. This includes understanding the benefits and risks of relevant technology. Using AI tools without comprehending their limitations could breach this duty.
2. Duty of Supervision: Rule 5.3 requires lawyers to properly supervise non-lawyer assistance to ensure compliance with professional obligations. This extends to the use of AI tools, which must be adequately supervised and their outputs verified.
3. State-Specific Guidance: Some state bar associations, including those in New York, California, and Florida, have issued specific guidance on AI use in legal practice. Lawyers must be aware of and comply with these evolving standards.
The Verification Dilemma
1. Efficiency vs. Accuracy: AI tools promise significant time savings in legal research. However, the high rate of hallucinations necessitates thorough verification of AI-generated information. This creates a dilemma: How much verification is necessary, and at what point does it negate the efficiency gains of using AI?
2. Hidden Costs: The time spent verifying AI outputs represents a hidden cost of using these tools. Lawyers and firms must factor this into their assessments of AI's value and efficiency.
3. Liability Concerns: Relying on unverified AI-generated information could lead to malpractice claims if it results in errors in legal advice or court submissions. And even if it doesn't lead to outright errors, as it did for the "ChatGPT lawyer" who cited completely fabricated cases, relying on a tool that returns overruled cases or otherwise incomplete research can leave you with weaker briefs and arguments than your opponents'.
Best Practices for Using AI in Legal Research — Balancing Innovation and Caution
While the study reveals significant challenges with current AI legal research tools, it doesn't suggest abandoning them entirely. Instead, lawyers should strive for a balanced approach that takes advantage of AI's strengths while guarding against its weaknesses.
To navigate these challenges, lawyers should follow these best practices:
1. Selective Use: Use AI legal research as a starting point. Treat AI-generated research as a preliminary step, not a final product; imagine all of its output as the work of a 2L intern. Use it to identify potentially relevant sources and legal concepts, but always conduct further investigation.
2. Always Verify Key Propositions: Independently verify any significant legal claims or citations provided by AI tools. This is especially crucial for central arguments or novel legal theories.
3. Cross-Reference Multiple Sources: Don't rely solely on AI-generated research. Cross-reference findings with traditional legal research methods and authoritative secondary sources. Share your research with your AI legal tools, using your preferred case law and commentary to help guide the LLM.
4. Be Wary of Using AI-Led Research In Certain Areas: AI tools may struggle to keep up with very recent legal changes. For areas of law that are new, novel, or in flux, always supplement AI research with up-to-date sources.
5. Hybrid Approach: Use AI for legal research in conjunction with, not as a replacement for, human legal expertise. Critical thinking and professional judgment remain essential. As always, verify all output from your AI legal research tool.
6. Document AI Use: When using AI tools, take a moment to document the process, including any verification steps you took. This can help demonstrate due diligence if questions arise later.
7. Stay Informed: Keep up with developments in legal AI and any relevant ethical guidelines or court rules regarding its use.
8. Understand Tool Limitations: No AI tool is free of bias, and none is bulletproof when conducting legal research. Familiarize yourself with the specific limitations and potential biases of the AI tools you use, and be especially cautious in complex or novel legal areas.
9. Client Communication: Consider whether and how to communicate your use of AI tools to clients, especially if it might affect billing or the scope of work. Here is an article to explain why and how to talk with your clients about your use of AI. And here is an article explaining why you should address AI use with your clients as soon as possible.
10. Ongoing Training: Have an AI policy at your firm, and require training of all of the attorneys and legal assistants. Then invest in ongoing training for yourself and your team on the proper use and limitations of AI research tools. This can help mitigate risks and maximize benefits.
11. Expect Improvements and Innovation: This space has changed dramatically in just 18 months. Expect that vendors and technologists will continue to improve and evolve the technology, so it's important to stay up to date on the latest developments.
The Long Road Ahead for AI Companies
The study's findings also raise important considerations for companies developing and marketing legal AI tools:
Potential legal exposure: Companies making broad claims about their AI's capabilities or performance could face liability for false or misleading advertising, although the companies most focused on the legal space have been fairly savvy about making verifiable claims. There is also a small but growing discussion about potential tort liability for AI-inflicted harms, including the negligent release of products with known defects.
Need for transparency: The researchers emphasize the importance of transparent benchmarking and public evaluations of legal AI tools. This transparency is crucial for responsible integration and oversight of AI in the legal profession.
The Future of AI in Legal Research
Despite the challenges identified, AI in legal research shows promise. These tools can offer value as a first step in the research process, potentially uncovering relevant sources more efficiently than traditional keyword searches.
However, significant improvements are needed, particularly in:
- Accurately understanding and applying legal hierarchies
- Distinguishing between different types of legal statements (holdings, dicta, party arguments)
- Properly identifying and applying relevant authority
Ongoing research and evaluation will be crucial in refining these tools and realizing their potential to enhance legal practice. With the millions of dollars that have already been poured into making AI useful in the legal field, and more money set to be spent on it, we can be hopeful that these tools will improve and evolve.
Conclusion
AI-driven legal research tools represent a significant advancement in legal technology, but they are far from infallible. Approach these tools with a balanced perspective, taking advantage of their benefits while remaining vigilant about their limitations. AI tools are especially well-suited for document summarization, discovery, and basic drafting. But this study shows the tools have a ways to go before they are completely trustworthy in conducting legal research.
The legal profession has a good track record of adapting to technological change, and AI presents both opportunities and challenges. Only by staying informed about the capabilities and limitations of AI tools, and by critically evaluating their outputs, can we use these tools effectively while avoiding their pitfalls.