TathyaNyaya and FactLegalLlama: Advancing Factual Judgment Prediction and Explanation in the Indian Legal Context

ACL ARR 2025 May Submission6074 Authors

20 May 2025 (modified: 03 Jul 2025) | ACL ARR 2025 May Submission | License: CC BY 4.0

Abstract: In the legal domain, Fact-based Judgment Prediction and Explanation (FJPE) aims to predict judicial outcomes and generate grounded explanations using only factual information, mirroring early-phase legal reasoning. Motivated by the overwhelming case backlog in the Indian judiciary, we introduce **TathyaNyaya**, the first large-scale, expert-annotated dataset for FJPE in the Indian context. Covering judgments from the Supreme Court and multiple High Courts, the dataset comprises four complementary components (**NyayaFacts**, **NyayaScrape**, **NyayaSimplify**, and **NyayaFilter**) that facilitate diverse factual modeling strategies. Alongside the dataset, we present **FactLegalLlama**, an instruction-tuned LLaMA-3-8B model fine-tuned to generate faithful, fact-grounded explanations. While FactLegalLlama trails transformer baselines in raw prediction accuracy, it excels at generating interpretable explanations, as validated by both automatic metrics and legal expert evaluation. Our findings show that fact-only inputs and preprocessing techniques such as text simplification and fact filtering can improve both interpretability and predictive performance. Together, TathyaNyaya and FactLegalLlama establish a robust foundation for realistic, transparent, and trustworthy AI applications in the Indian legal system.

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: Fact based Judgment Prediction, Legal AI, Explanation Generation, Indian Legal System, Legal Natural Language Processing, Annotated Legal Datasets, Large Language Models, Interpretability in AI, Automated Legal Analysis

Contribution Types: Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources, Data analysis, Position papers

Languages Studied: English

Submission Number: 6074
