IndLaw-QA: Fine-Tuned LLMs with RAG for Indian Legal QA

Aayush Badoni, Divyansh Anand Singh, Kapil Vuthoo, Shivansh Singh, Sonia Khetarpaul, L. Venkata Subramaniam

Published: 01 Jan 2026, Last Modified: 26 May 2026, Crossref, CC BY-SA 4.0

Abstract: Natural Language Processing tasks in the legal domain, such as information retrieval, question answering (QA), and cross-lingual processing, face significant challenges in resolving ambiguity and multiple interpretations and in handling contextually complex, jurisdiction-specific legal terminology. This study introduces IndLaw-QA, a question-answering framework tailored to legal corpora, with a primary focus on the Indian legal system. The proposed architecture combines Retrieval-Augmented Generation (RAG) with large language models (LLMs), cross-model validation across LLMs, and domain-specific fine-tuning to alleviate limitations in existing legal information retrieval pipelines. The system integrates recent LLMs such as GPT-4o, Llama 3.2, and Claude Haiku 3.5 with prompt optimization and iterative training strategies to enhance the semantic and contextual precision of the output. The framework also incorporates an evaluation protocol that uses a standardized test corpus of 100 legal QA pairs and performance metrics including BERTScore, precision, recall, and F1 score, combined with cross-LLM validation using real-time document embeddings. This methodology delivers a scalable, accurate, and semantically adaptive solution for legal information extraction and validation over Indian law corpora.
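
Illustrative sketch: the abstract names precision, recall, and F1 among the evaluation metrics but does not say how they are computed. One common convention for QA is token overlap between a generated answer and a reference answer; the Python below assumes that convention and is not taken from the paper. The function names `token_prf` and `mean_f1` and the sample strings are hypothetical.

```python
from collections import Counter


def token_prf(prediction: str, reference: str) -> tuple[float, float, float]:
    """Token-overlap precision, recall, and F1 for one QA pair.

    Assumption: lowercase whitespace tokenization; the paper's own
    scoring procedure may differ.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0, 0.0, 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mean_f1(pairs: list[tuple[str, str]]) -> float:
    """Average F1 over (prediction, reference) pairs, e.g. a test corpus."""
    if not pairs:
        return 0.0
    return sum(token_prf(p, r)[2] for p, r in pairs) / len(pairs)


# Made-up example pair, for illustration only.
p, r, f1 = token_prf(
    "article 21 protects life",
    "article 21 protects life and liberty",
)
print(round(p, 3), round(r, 3), round(f1, 3))
```

BERTScore, also named in the abstract, replaces exact token matching with similarity between contextual embeddings, so it can credit paraphrased answers that a token-overlap score like this one would penalize.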

External IDs: doi:10.1007/978-3-032-15134-6_3