Peeking Inside LLMs: Disentangling Internal Signals In Legal Violation Prediction

ACL ARR 2026 January Submission 8300 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Legal AI, Interpretability, Legal Violation Prediction
Abstract: The use of Large Language Models (LLMs) has recently become ubiquitous in the legal field. Legal practitioners increasingly employ LLMs as a supportive tool, but with caution. One peril that concerns practitioners is the variability in performance across traditional legal NLP tasks, further exacerbated by a lack of transparency. Our work is, to our knowledge, the _first_ to apply interpretability analysis to LLMs used as legal assistants, focusing on the downstream task of legal violation prediction. We address the concern of whether general-purpose LLMs such as LLaMA or Gemma can be relied upon. In this work, we probe the model's latent representation space to evaluate whether the model's generated outputs align with its internal knowledge, thereby moving LLM utilization in the legal domain from a black box toward a moderately informed tool.
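
As a rough illustration of the probing setup the abstract describes, the sketch below trains a linear probe on hidden states of an open LLM to predict violation labels; if the probe recovers the labels better than the model's own generated verdicts, the internal representations encode knowledge the output fails to surface. This is a minimal sketch, not the authors' code: the checkpoint name, layer index, and toy examples are placeholder assumptions.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B"  # assumption: any open LLaMA/Gemma checkpoint
LAYER = 16                         # assumption: a mid-depth layer to probe

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    output_hidden_states=True,
    torch_dtype=torch.float16,
    device_map="auto",  # requires `accelerate`
)
model.eval()

@torch.no_grad()
def last_token_state(text: str) -> torch.Tensor:
    """Hidden state of the final token at the chosen layer."""
    ids = tok(text, return_tensors="pt", truncation=True).to(model.device)
    return model(**ids).hidden_states[LAYER][0, -1].float().cpu()

# Placeholder examples; a real study would use a labeled violation corpus
# and a held-out evaluation split.
texts = [
    "The landlord entered the unit without any prior notice.",
    "Both parties signed the agreement before the deadline.",
    "Personal data was shared with third parties without consent.",
    "The invoice was paid in full within thirty days.",
]
labels = [1, 0, 1, 0]  # 1 = violation, 0 = no violation

X = torch.stack([last_token_state(t) for t in texts]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)

# Comparing the probe's predictions against the model's generated verdicts
# indicates whether outputs align with the internal representations.
print("probe predictions:", probe.predict(X))
```
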
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: calibration/uncertainty, probing
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 8300