Legal Retrieval for Public Defenders

06 Feb 2026 (modified: 08 May 2026) · Under review for TMLR · CC BY 4.0
Abstract: AI tools are often proposed as solutions to help public agencies manage heavy workloads. In public defense, practitioners face especially taxing conditions: a constitutional right to counsel meets the complexities of law, overwhelming caseloads, and constrained resources. Yet there is little evidence of how AI could meaningfully support defenders' day-to-day work. In partnership with the anonymized Office of the Public Defender, we develop the anonymized BriefBank, a retrieval tool that surfaces relevant appellate briefs to streamline legal research and writing. We show that existing retrieval benchmarks fail to transfer to real public defense research; adding domain knowledge, however, improves retrieval quality. This includes query expansion with legal reasoning, domain-specific data, and curated synthetic examples. To facilitate further research, we release a taxonomy of realistic defender search queries and a manually annotated evaluation dataset for public defense retrieval. This benchmark is highly correlated with a proprietary retrieval dataset annotated by experienced public defenders. Our work improves on the status quo of realistic legal retrieval benchmarking and illustrates one approach to applying AI in a real-world public interest setting.
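As context for the abstract's query-expansion claim, the block below is a minimal sketch of query expansion followed by dense retrieval over a toy brief corpus, assuming the sentence-transformers library. The model name ("all-MiniLM-L6-v2"), the expansion text, and the example briefs are illustrative placeholders, not the paper's actual pipeline or data.

```python
# Minimal sketch: query expansion + dense retrieval over a toy brief bank.
# Model, briefs, and expansion text are placeholders, not the paper's setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

# Toy corpus standing in for the BriefBank appellate briefs.
briefs = [
    "Appellate brief arguing suppression of evidence from a warrantless search.",
    "Brief challenging jury instructions on accomplice liability.",
    "Brief on ineffective assistance of counsel during plea negotiations.",
]
brief_embeddings = model.encode(briefs, convert_to_tensor=True)

def retrieve(query: str, expansion: str = "", k: int = 2):
    """Rank briefs by cosine similarity to the (optionally expanded) query."""
    expanded = f"{query} {expansion}".strip()
    query_embedding = model.encode(expanded, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, brief_embeddings, top_k=k)[0]
    return [(briefs[h["corpus_id"]], h["score"]) for h in hits]

# "Query expansion with legal reasoning": append doctrine-level terms that a
# defender (or an LLM prompted for legal reasoning) might add to a raw query.
raw_query = "car stopped without warrant, drugs found"
legal_expansion = "Fourth Amendment warrantless vehicle search motion to suppress"
for brief, score in retrieve(raw_query, legal_expansion):
    print(f"{score:.3f}  {brief}")
```

Appending doctrine-level terms to the raw query is one simple stand-in for expansion produced by a model prompted for legal reasoning; the paper's actual expansion method is not reproduced here.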
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:

Abstract:
- "To facilitate further research, we release a new, realistic retrieval dataset, manually annotated by real public defenders, and provide a taxonomy of these realistic defender search queries. Together, our work improves on the status quo of realistic retrieval benchmarking and provides a starting point for leveraging AI in a real-world public interest setting." --> "To facilitate further research, we release a taxonomy of realistic defender search queries and a manually annotated evaluation dataset for public defense retrieval. This benchmark is highly correlated with a proprietary retrieval dataset annotated by experienced public defenders. Our work improves on the status quo of realistic legal retrieval benchmarking and illustrates one approach to applying AI in a real-world public interest setting."

Section 1, Introduction:
- Rewrote the paragraph about performance (previously starting with "On the other hand, using larger embedding models [...]", now starting with "On the other hand, we improve recall by (1) using more recent and larger embedding models").
- "we release a manually annotated dataset, fine-tuned models, and a taxonomy of defender search queries."\footnote{upon publication.} --> "we release a manually annotated dataset, fine-tuned models and replication code."\footnote{Anonymized replication package: https://anonymous.4open.science/r/anonymized-public-defender-retrieval-782E/README.md}

Section 2, Public Defense Retrieval:
- "such failure rates are unacceptable in practice. Moreover, generative models provide limited transparency about sources, which makes it difficult in practice for attorneys to verify output accuracy." --> "such failure rates are unacceptable in practice. Beyond accuracy, using commercial generative models raises confidentiality risks: case details submitted to proprietary APIs may fall outside attorney-client privilege and be subject to mandatory disclosure. Generative models also provide limited transparency about sources, making it difficult for attorneys to verify output accuracy."

Section 3.2, The Public Defense Dataset:
- Added a paragraph on inter-annotator agreement: "To compute inter-annotator agreement, [...]" (a hedged sketch of one common agreement metric follows this list).
- Added a paragraph on the anonymization procedure: "Lastly, we anonymize party-related personally identifiable information in the dataset [...]".
- Added a robustness check for anonymization: "We will further show that anonymization does not drive retrieval performance: the Spearman R between the anonymized and non-anonymized version is 1.0 in zero-shot settings (perfect correlation), and 0.99 (p=2.0e-16) for fine-tuning experiments."

Section 4, Taxonomy:
- Slightly reworded the paragraph around "One illustrative example for agentic queries is 'has Counterman v. Colorado been addressed in a published anonymized state opinion'".

Section 5, Experiments:
- Added anonymization robustness checks: "We also report and compare to retrieval results on a non-anonymized version of the PD dataset, and report results in Appendix Table [...]" (see the rank-correlation sketch after this list).
- Added a robustness check across older vs. newer documents: "Since OPD documents span 25 years, and our PD dataset only contains briefs from 2023-2025, we also verify that retrieval performance is robust across different time periods of the OPD dataset. We report Spearman R between the PD dataset and OPD subsets stratified by year in Appendix Table 6. Correlations between the released PD dataset and the OPD dataset are not driven by old OPD retrieval targets, but also generalize to more recent retrieval targets."

Section 6, Discussion:
- Rewrote Section 6.1, "Collaborations between Academia and Legal Institutions".

Conclusion:
- "Our results suggest that progress in legal retrieval for public defense is constrained less by model scale, but by domain mismatch and lack of realistic available data." --> "Our results suggest that progress in legal retrieval for public defense may be constrained less by model scale, but by domain mismatch and lack of available datasets."

Other:
- Added a Broader Impact Statement (as requested by both reviewers).
- Added an Appendix Table with the anonymization robustness checks and the prompts used to anonymize the dataset.
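On the inter-annotator agreement paragraph noted above: the changelog does not specify the metric used, so the following is a minimal, hedged sketch using Cohen's kappa from scikit-learn on hypothetical binary relevance labels; the dataset's actual labels and agreement metric may differ.

```python
# Illustrative inter-annotator agreement check on hypothetical labels.
# Cohen's kappa is one common chance-corrected metric; the paper's actual
# metric is not specified in the changelog.
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 0, 1, 1, 0, 1, 0, 0]  # hypothetical relevance labels (1 = relevant)
annotator_b = [1, 0, 1, 0, 0, 1, 0, 1]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # chance-corrected agreement in [-1, 1]
```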
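The anonymization and time-period robustness checks cited in the changelog compare retrieval results across dataset versions with Spearman rank correlation. Below is a minimal sketch using scipy.stats.spearmanr; the per-system scores are hypothetical placeholders, not the paper's reported numbers.

```python
# Sketch of the anonymization robustness check: correlate system rankings
# on anonymized vs. non-anonymized data. Scores below are hypothetical.
from scipy.stats import spearmanr

# Hypothetical per-system retrieval scores (e.g., recall@k) on both versions.
scores_anonymized     = [0.41, 0.55, 0.62, 0.48, 0.70]
scores_non_anonymized = [0.40, 0.57, 0.61, 0.49, 0.72]

rho, p_value = spearmanr(scores_anonymized, scores_non_anonymized)
print(f"Spearman R = {rho:.2f} (p = {p_value:.1e})")
```

A rank correlation near 1.0 indicates that anonymization (or document age) does not change which systems retrieve well, which is the form of the claims made in Sections 3.2 and 5.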
Assigned Action Editor: ~Huaxiu_Yao1
Submission Number: 7389