TathyaNyaya and FactLegalLlama: Advancing Factual Judgment Prediction and Explanation in the Indian Legal Context

ACL ARR 2025 July Submission343 Authors

27 Jul 2025 (modified: 25 Aug 2025), ACL ARR 2025 July Submission, CC BY 4.0

Abstract: In the legal domain, Fact-based Judgment Prediction and Explanation (FJPE) aims to predict judicial outcomes and generate grounded explanations using only factual information, mirroring early-phase legal reasoning. Motivated by the overwhelming case backlog in the Indian judiciary, we introduce TathyaNyaya, the first large-scale, expert-annotated dataset for FJPE in the Indian context. Covering judgments from the Supreme Court and multiple High Courts, the dataset comprises four complementary components (NyayaFacts, NyayaScrape, NyayaSimplify, and NyayaFilter) that facilitate diverse factual modeling strategies. Alongside the dataset, we present FactLegalLlama, an instruction-tuned LLaMa-3-8B model fine-tuned to generate faithful, fact-grounded explanations. While FactLegalLlama trails transformer baselines in raw prediction accuracy, it excels at generating interpretable explanations, as validated by both automatic metrics and legal expert evaluation. Our findings show that fact-only inputs and preprocessing techniques such as text simplification and fact filtering can improve both interpretability and predictive performance. Together, TathyaNyaya and FactLegalLlama establish a robust foundation for realistic, transparent, and trustworthy AI applications in the Indian legal system.

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: Fact based Judgment Prediction, Legal AI, Explanation Generation, Indian Legal System, Legal Natural Language Processing, Annotated Legal Datasets, Large Language Models, Interpretability in AI, Automated Legal Analysis

Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models, Data resources, Data analysis, Position papers

Languages Studied: English

Previous URL: https://openreview.net/forum?id=lXV2Be4b6b

Explanation Of Revisions PDF: pdf

Reassignment Request Area Chair: Yes, I want a different area chair for our submission

Reassignment Request Reviewers: Yes, I want a different set of reviewers

Justification For Not Keeping Action Editor Or Reviewers: We sincerely appreciate the valuable feedback and efforts of the previous action editor and reviewers. However, for this new submission, we have made substantial changes to the paper, including incorporating expert evaluations, conducting inter-annotator agreement analysis, adding new experimental results and ablation studies, and significantly reorganizing and rewriting parts of the manuscript. Given the extent of these changes, we believe a fresh perspective from a new action editor and reviewers may lead to a more balanced and updated evaluation of the revised version. This decision is made with due respect to the prior reviewers and to ensure that the revised paper is assessed on its current merits.

Software: zip

Data: zip

A1 Limitations Section: This paper has a limitations section.

A2 Potential Risks: N/A

B Use Or Create Scientific Artifacts: Yes

B1 Cite Creators Of Artifacts: Yes

B1 Elaboration: Subsection 4.1 Dataset Compilation and Statistics

B2 Discuss The License For Artifacts: No

B2 Elaboration: We will release the dataset, code, and models after acceptance of the paper.

B3 Artifact Use Consistent With Intended Use: No

B3 Elaboration: We intend to release the dataset for research.

B4 Data Contains Personally Identifying Info Or Offensive Content: N/A

B5 Documentation Of Artifacts: N/A

B6 Statistics For Data: Yes

B6 Elaboration: Section 4 Dataset

C Computational Experiments: Yes

C1 Model Size And Budget: Yes

C1 Elaboration: Appendix section A Experimental Setup and Hyper-parameters

C2 Experimental Setup And Hyperparameters: Yes

C2 Elaboration: Appendix section A Experimental Setup and Hyper-parameters

C3 Descriptive Statistics: Yes

C3 Elaboration: Section 7 Results and Analysis

C4 Parameters For Packages: Yes

C4 Elaboration: Appendix section A Experimental Setup and Hyper-parameters

D Human Subjects Including Annotators: Yes

D1 Instructions Given To Participants: Yes

D1 Elaboration: Subsection 4.2 Annotation Methodology and Quality Assurance

D2 Recruitment And Payment: Yes

D2 Elaboration: We assigned the annotation work to students as part of their academic assignments. No additional monetary payment was provided, as this task was integrated into their coursework and aligned with their academic learning objectives.

D3 Data Consent: No

D3 Elaboration: Openly available dataset

D4 Ethics Review Board Approval: N/A

D5 Characteristics Of Annotators: Yes

D5 Elaboration: Subsection 4.2 Annotation Methodology and Quality Assurance

E Ai Assistants In Research Or Writing: No

E1 Information About Use Of Ai Assistants: N/A

Author Submission Checklist: yes

Submission Number: 343
