Why high data quality is fundamental for Legal AI

How poor data quality leads to hallucinations and why generic language models fail in legal practice.

A lawyer submits a brief to the Higher Regional Court of Cologne. The citations are precise, the references authoritative: "Meyer-Götz, in: Hauß/Gernhuber, Familienrecht, 6th ed. 2022, Section 1671 margin note 33." Edition number, paragraph, margin note – all professionally formatted.

All fabricated.

The court's finding: the quoted source doesn't exist. A chatbot had apparently stitched together three different publications, combining author names, anthology titles, and edition numbers without ever verifying that they belonged to the same work, and asserting legal principles that had never been established in literature or court decisions. The lawyer had submitted these plausible-sounding but entirely baseless citations without verification.

This isn't an isolated incident. In September 2025, the Regional Court of Frankfurt am Main ruled on a similar case: three alleged Federal Court of Justice decisions, all made up. The court responded unequivocally, accusing the lawyer of endangering the administration of justice. A clear signal. 

These cases reveal what really matters when using legal AI: It's not the speed of responses or the elegance of the user interface that's decisive, but the reliability of the underlying data. 



When AI hallucinates: Two cases from practice

The Frankfurt and Cologne cases show a common pattern: lawyers relied on AI-generated citations without verifying them. The result was briefs full of precise-sounding but entirely fabricated quotes. What's technically known as hallucination, the phenomenon where language models generate plausible-sounding but false information, had immediate legal consequences here. The courts branded this approach in harsh terms as abuse and endangerment of the administration of justice.

The problem doesn't lie solely with the lawyers who violated their duty of care, but fundamentally also with the technology they used. Generic language models, systems like ChatGPT or Claude trained on texts from the open internet, aren't designed to reliably process specialized legal content. Of their vast training data, only a minimal portion consists of legal texts, and of that, an even smaller fraction comprises German legal documents. They can recognize patterns and generate convincing-sounding text, but they understand neither the law's inherent systematic structure nor the subtle but critical semantic distinctions in legal language, nor the importance of precision in legal work.


 

What data quality really means in legal practice

Data quality in the legal field means far more than technical cleanliness or the sheer volume of available documents. It's about five central dimensions that must work together for legal AI to deliver reliable results.


  • Relevance. Not every decision matters for every question. A high-quality legal database must be able to distinguish between leading decisions and peripheral rulings, between current and outdated court decisions, between lower court decisions and supreme court precedents. 


  • Completeness. A single decision may be accurately reproduced, but what good is that if the opposing view is missing, if subsequent decisions aren't considered, if the doctrinal classification from literature is excluded? Legal work thrives on weighing, comparing, systematically penetrating a legal problem. An incomplete data foundation prevents exactly that. 


  • Precision. Case numbers, citations, margin notes – legal references follow strict conventions because only this ensures verifiability. A system that doesn't guarantee this precision is unusable for legal practice, no matter how seemingly eloquent the answers may appear at first glance. 


  • Contextualization. A court decision only unfolds its meaning in conjunction with literature, commentaries, legislative materials, and indication of whether it's been superseded by a newer decision. Processing case texts in isolation doesn't create legal certainty but a fragile knowledge base. 


  • Currency. The law continuously evolves. A decision valid yesterday may be outdated today. Legal AI must not only access current data but also recognize when, why, and how the legal situation has changed. 




Why court decisions alone aren't enough

Many legal AI providers focus on collecting and making as many court decisions searchable as possible. That sounds like a sensible approach, but it's only half the battle. A judgment isn't an isolated fact but part of a complex legal discourse.

Take an employment law wrongful dismissal case as an example. The relevant Federal Court of Justice ruling alone may provide insight into a specific point. But how does the literature classify this decision? Are there dissenting voices? What consequences does commentary literature draw for contract practice? Has a lower court decision implemented the Federal Court of Justice's guidelines or deviated from them? Only when all these questions can be answered does a complete picture of the legal situation emerge.

This is precisely where established legal publishers excel. Platforms like beck-online provide not just judgments but also commentaries, legislative materials, journal articles, and practice handbooks. Critical here is the ongoing editorial maintenance and linking of content by legal professionals. Only this structure allows court decisions to be understood in their doctrinal and practical context. 

 


The path to reliable Legal AI

The allure of generic language models is understandable. They're easily available, simple to use, and deliver an answer to almost any question. But they ignore the complex peculiarities of legal texts and are subject to a constant updating requirement they cannot meet without specialized maintenance.

Modern technologies like Retrieval-Augmented Generation (RAG) represent significant progress. Instead of generating answers exclusively from a language model's training material, RAG systems retrieve relevant documents from a database and integrate them into the answer. This significantly reduces hallucinations – but only if the underlying database is high-quality, current, and comprehensive.
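To make the retrieval step concrete, here is a deliberately minimal toy sketch. Real RAG systems use vector embeddings for semantic search and a language model for generation; in this sketch, simple word overlap over a tiny in-memory corpus stands in for retrieval, and the "sources" are explicitly fictitious placeholders.

```python
# Toy sketch of the retrieval step in a RAG pipeline.
# Word overlap stands in for semantic search; sources are fictitious.
CORPUS = [
    {"source": "fictitious judgment No. 1",
     "text": "a dismissal requires a prior warning in conduct-related cases"},
    {"source": "fictitious commentary entry",
     "text": "extraordinary termination demands a compelling reason"},
]

def retrieve(query: str, corpus: list[dict], top_k: int = 1) -> list[dict]:
    """Rank documents by how many query words they share, return the best."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda d: len(q_words & set(d["text"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def answer_with_sources(query: str) -> str:
    """Compose an answer from retrieved passages, each tagged with its source."""
    hits = retrieve(query, CORPUS)
    return "; ".join(f"{d['text']} [{d['source']}]" for d in hits)
```

Even in this toy version, the key property is visible: the answer can only contain material that actually exists in the corpus, each piece traceable to its source. Which is exactly why the quality of that corpus, not the generator, sets the ceiling on answer quality.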

A RAG system that only accesses an incomplete or poorly maintained court decision database may function technically flawlessly yet deliver unusable results. Data quality determines answer quality. No matter how sophisticated, an AI algorithm cannot compensate for missing or flawed content. 



How modern Legal AI creates reliability

Good Legal AI addresses exactly this point. It combines high-quality, editorially maintained specialized content with modern AI technology, creating answers that aren't just generated but verifiable. Every statement is documented with a concrete source – a judgment, a commentary, an article. Users can immediately trace where information comes from and decide for themselves whether to trust the source.
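The principle "every statement is documented with a concrete source" can be enforced mechanically. The sketch below is a hypothetical illustration of such a guard, not any vendor's actual API: statements without a source are rejected before they ever reach the user.

```python
# Hypothetical guard: refuse to show any statement lacking a concrete source.
# The dict structure is an illustrative assumption, not a real product API.
def verify_answer(statements: list[dict]) -> list[dict]:
    """Pass through only statements backed by a source; fail loudly otherwise."""
    unsourced = [s for s in statements if not s.get("source")]
    if unsourced:
        raise ValueError(
            f"{len(unsourced)} statement(s) lack a source and must not be shown"
        )
    return statements
```

A guard like this turns traceability from a stylistic preference into a hard invariant of the system.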

This transparency and traceability isn't a technical feature but a basic prerequisite for using legal AI in practice. Only when attorneys, legal departments, and courts can understand the basis on which AI provides its answers can trust develop. And only with this trust does legal AI become a tool that doesn't replace legal work but provides solid support.

The lesson from the Frankfurt and Cologne cases is clear: the path from court decisions to legal certainty runs through high-quality data, and through nothing else. Only when legal specialized content is continuously maintained, structured, and linked with AI methods such as RAG does an instrument emerge that reliably supports practice. Everything else remains patchwork and puts at stake not only clients' interests but trust in the rule of law.


Maximilian Detken


All rights reserved Noxtua AG ©