Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhikDastaavej

ACL ARR 2025 July Submission 775 Authors

28 Jul 2025 (modified: 02 Sept 2025), ACL ARR 2025 July Submission, CC BY 4.0
Abstract: Automating legal document drafting can enhance efficiency, reduce manual workload, and streamline legal workflows. However, the structured generation of private legal documents remains underexplored, particularly in the Indian legal context, owing to limited public data and the difficulty of adapting models to the domain. We propose a Model-Agnostic Wrapper (MAW), a flexible two-stage generation framework that first produces section titles and then generates the content of each section using retrieval-based prompts. The wrapper decouples generation from any specific model, making it compatible with a range of open- and closed-source LLMs while promoting coherence and factual alignment and reducing hallucination. To enable practical use, we build a Human-in-the-Loop Document Generation System, an interactive interface in which users specify a document type, refine sections, and iteratively generate structured drafts. The tool supports real-world legal workflows and will be made publicly accessible upon acceptance, with privacy and security safeguards. Comprehensive evaluations, including expert-based assessments, demonstrate that the wrapper-based approach substantially improves document quality over baseline and fine-tuned models. Our framework establishes a scalable and adaptable path toward structured AI-assisted legal drafting in the Indian domain.
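The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`draft_document`, `toy_generate`, `toy_retrieve`), the prompt wording, and the one-title-per-line outline format are all assumptions made for illustration. The only idea taken from the abstract is that the wrapper first obtains section titles, then drafts each section from a retrieval-based prompt, while treating the model as an interchangeable "prompt in, text out" callable.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases: any LLM is reduced to "prompt in, text out",
# and any retriever to "query in, list of reference passages out".
GenerateFn = Callable[[str], str]
RetrieveFn = Callable[[str], List[str]]


def draft_document(
    doc_type: str, generate: GenerateFn, retrieve: RetrieveFn
) -> List[Tuple[str, str]]:
    """Two-stage drafting: section titles first, then content per section."""
    # Stage 1: ask the model for an outline, one title per line (assumed format).
    outline = generate(f"List the section titles for a {doc_type}, one per line.")
    titles = [line.strip() for line in outline.splitlines() if line.strip()]

    sections = []
    for title in titles:
        # Stage 2: build a retrieval-based prompt for this section only.
        passages = retrieve(f"{doc_type}: {title}")
        context = "\n".join(passages)
        prompt = (
            f"Document type: {doc_type}\n"
            f"Section: {title}\n"
            f"Reference passages:\n{context}\n"
            "Draft this section."
        )
        sections.append((title, generate(prompt)))
    return sections


# Stand-in model and retriever so the sketch runs without any external service.
def toy_generate(prompt: str) -> str:
    if prompt.startswith("List the section titles"):
        return "Parties\nTerm\n\nTermination"
    # Echo the "Section: ..." line of the stage-2 prompt as a fake draft.
    return "[draft for] " + prompt.splitlines()[1]


def toy_retrieve(query: str) -> List[str]:
    return [f"(sample clause about {query})"]


draft = draft_document("rental agreement", toy_generate, toy_retrieve)
for title, body in draft:
    print(title, "->", body)
```

Because `draft_document` depends only on the two callables, swapping in a different open- or closed-source model means passing a different `generate` function; nothing else in the sketch changes.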
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, NLP Applications, Machine Learning for NLP, Language Modeling, Interpretability and Analysis of Models for NLP, Generation, Human-Centered NLP
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models, Data resources, Data analysis, Position papers
Languages Studied: English
Previous URL: https://openreview.net/forum?id=nTzKGxQMU8
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: Yes, I want a different area chair for our submission
Reassignment Request Reviewers: Yes, I want a different set of reviewers
Justification For Not Keeping Action Editor Or Reviewers: We have made substantial structural and methodological revisions to the manuscript, including updated experimental results, new evaluation components (e.g., Inter-Annotator Agreement, ablation studies), improved dataset analysis, and expanded discussions in the Related Work and Limitations sections. Given these major changes, we believe a fresh review process would provide an unbiased evaluation of the current version. Retaining the previous reviewers may inadvertently anchor the assessment to the earlier draft, which differs significantly in scope and depth.
Software: zip
Data: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: No
A2 Elaboration: Ethics Statement
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: No
B1 Elaboration: We created new scientific artifacts, including a dataset of private legal documents, a domain-adapted legal language model, a Model-Agnostic Wrapper (MAW) for structured legal drafting, and a Human-in-the-Loop (HITL) Document Generation System. As these are novel contributions, there were no prior creators to cite. The artifacts will be publicly released upon acceptance for transparency and reproducibility.
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: We will release the dataset, code, and models after the paper is accepted.
B3 Artifact Use Consistent With Intended Use: N/A
B3 Elaboration: Section 4 Dataset
B4 Data Contains Personally Identifying Info Or Offensive Content: Yes
B4 Elaboration: 4.2 Data Anonymization and Ethical Considerations
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Section 4 Dataset
B6 Statistics For Data: Yes
B6 Elaboration: Section 4 Dataset
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 6 Experimental Setup
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 6.2 Hyperparameters
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 8 Results and Analysis
C4 Parameters For Packages: Yes
C4 Elaboration: Section 6 Experimental Setup
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: Yes
D1 Elaboration: Section 7 Evaluation Metrics
D2 Recruitment And Payment: Yes
D2 Elaboration: Section 7 Evaluation Metrics
D3 Data Consent: Yes
D3 Elaboration: Section 4.1 Dataset Composition and Diversity
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: Yes
D5 Elaboration: Section 7 Evaluation Metrics
E Ai Assistants In Research Or Writing: No
E1 Information About Use Of Ai Assistants: N/A
Author Submission Checklist: yes
Submission Number: 775