Deterministic Fuzzy Triage for Legal Compliance Classification and Evidence Retrieval

Published: 13 Dec 2025, Last Modified: 16 Jan 2026, AILaw26, CC BY-NC-SA 4.0
Keywords: legal compliance, contract clauses, dual encoder, fuzzy triage, deterministic models, ACORD, CUAD, evidence retrieval, graded relevance, human-in-the-loop
Paper Type: Full papers
TL;DR: Deterministic fuzzy triage for legal compliance clause retrieval
Abstract: Legal practitioners increasingly rely on machine learning systems to triage large volumes of contractual evidence, yet most deployed models are opaque, non-deterministic, and difficult to align with strict regulatory frameworks such as HIPAA or NERC CIP. We study a simple, reproducible alternative based on deterministic dual encoders and transparent fuzzy triage bands. Concretely, we train a RoBERTa-base dual encoder with a 512-dimensional projection and cosine similarity on the ACORD benchmark for graded clause retrieval, and then fine-tune it on a CUAD-derived binary compliance dataset. Across five random seeds (40--44) on a single NVIDIA A100 GPU, our model achieves ACORD-style retrieval performance of NDCG@5 $\approx 0.38$--$0.42$, NDCG@10 $\approx 0.45$--$0.50$, and 4-star Precision@5 $\approx 0.37$ on the test split. On the CUAD-derived binary labels, we obtain AUC $\approx 0.98$--$0.99$ and F$_1 \approx 0.22$--$0.30$, depending on the positive-class weight, substantially outperforming majority and random baselines in a highly imbalanced setting (positive rate $\approx 0.6\%$). On top of the scalar compliance scores, we introduce a simple fuzzy triage mapping that partitions the score axis into three regions: auto-noncompliant, auto-compliant, and human-review. We tune the lower and upper thresholds on validation data to maximize auto-decision coverage subject to a hard constraint on the empirical error rate (at most $2\%$) over auto-decided examples. This yields deterministic, seed-stable models whose behavior can be summarized by a small number of scalar parameters and reported consistently across runs.
We argue that this combination of deterministic encoders, calibrated fuzzy bands, and explicit error constraints offers a practical middle ground between hand-crafted rules and fully opaque large language models: it supports explainable evidence triage, enables reproducible audit trails, and provides a concrete interface for mapping scores and triage regions onto legal concepts such as access control, risk-based review, and residual-risk handling under regulatory frameworks like HIPAA.
Submission Number: 50