TathyaNyaya and FactLegalLlama: Advancing Factual Judgment Prediction and Explanation in the Indian Legal Context

ACL ARR 2024 December Submission 2120 Authors

16 Dec 2024 (modified: 05 Feb 2025), ACL ARR 2024 December Submission, CC BY 4.0
Abstract: In Fact-based Judgment Prediction and Explanation (FJPE), reliance on factual data is essential for developing robust and realistic AI-driven decision-making tools. This paper introduces **TathyaNyaya**, the largest annotated dataset for FJPE tailored to the Indian legal context, encompassing judgments from the Supreme Court of India and various High Courts. Named after the Hindi terms **"Tathya"** (fact) and **"Nyaya"** (justice), **TathyaNyaya** focuses on factual statements rather than complete legal texts, reflecting real-world judicial processes where factual data drives outcomes. Complementing this dataset, we present **FactLegalLlama**, an instruction-tuned variant of the LLaMa-3-8B Large Language Model (LLM), optimized for generating high-quality explanations in FJPE tasks. Fine-tuned on the factual data in **TathyaNyaya**, **FactLegalLlama** pairs predictive accuracy with coherent, contextually relevant explanations, addressing the need for transparency and interpretability in AI-assisted legal systems. Our methodology combines transformer models for binary judgment prediction with **FactLegalLlama** for explanation generation, creating a robust framework for advancing FJPE in the Indian legal domain. **TathyaNyaya** surpasses existing datasets in scale and diversity and establishes a benchmark for building explainable AI systems in legal analysis. The findings underscore the importance of factual precision and domain-specific tuning for predictive performance and interpretability, positioning **TathyaNyaya** and **FactLegalLlama** as foundational resources for AI-assisted legal decision-making.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Fact based Judgment Prediction, Legal AI, Explanation Generation, Indian Legal System, Legal Natural Language Processing, Annotated Legal Datasets, Large Language Models, Interpretability in AI, Automated Legal Analysis
Contribution Types: Model analysis & interpretability, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 2120