Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhikDastaavej

ACL ARR 2025 July Submission775 Authors

28 Jul 2025 (modified: 02 Sept 2025), ACL ARR 2025 July Submission, Readers: Everyone, License: CC BY 4.0

Abstract: Automating legal document drafting can improve efficiency, reduce manual workload, and streamline legal workflows. However, the structured generation of private legal documents remains underexplored, particularly in the Indian legal context, owing to limited public data and the difficulty of adapting models to this domain. We propose a Model-Agnostic Wrapper (MAW), a flexible two-stage generation framework that first produces section titles and then generates the content of each section using retrieval-based prompts. The wrapper decouples generation from any specific model, making it compatible with a range of open- and closed-source LLMs, and aims to ensure coherence and factual alignment while reducing hallucination. To enable practical use, we build a Human-in-the-Loop Document Generation System, an interactive interface in which users specify a document type, refine sections, and iteratively generate structured drafts. The tool supports real-world legal workflows and will be made publicly accessible upon acceptance, with privacy and security safeguards. Comprehensive evaluations, including expert assessments, show that the wrapper-based approach substantially improves document quality over baseline and fine-tuned models. Our framework offers a scalable and adaptable path toward structured AI-assisted legal drafting in the Indian domain.
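The two-stage idea in the abstract (titles first, then retrieval-prompted content per section, with the model hidden behind a wrapper) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ModelAgnosticWrapper`, `retrieve`, and `stub_llm`, the prompt wording, and the use of simple Jaccard word overlap for retrieval are all hypothetical stand-ins chosen for illustration. Any LLM is assumed to be reachable as a plain `prompt -> text` function, which is what makes the wrapper model-agnostic.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical abstraction: any open- or closed-source LLM is a prompt -> text function.
LLM = Callable[[str], str]


def _tokens(text: str) -> set:
    """Lower-cased word set used for a simple lexical similarity score."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, corpus: List[Tuple[str, str]], k: int = 2) -> List[str]:
    """Return bodies of the k exemplar sections whose titles best match the query.

    Assumption: Jaccard word overlap stands in for whatever retriever the paper uses.
    """
    q = _tokens(query)

    def score(item: Tuple[str, str]) -> float:
        t = _tokens(item[0])
        return len(q & t) / len(q | t) if q | t else 0.0

    ranked = sorted(corpus, key=score, reverse=True)  # stable sort keeps corpus order on ties
    return [body for _, body in ranked[:k]]


@dataclass
class ModelAgnosticWrapper:
    llm: LLM
    corpus: List[Tuple[str, str]]  # (section title, section text) exemplars

    def section_titles(self, doc_type: str) -> List[str]:
        # Stage 1: ask the model for the document's section titles, one per line.
        prompt = f"List the section titles of an Indian {doc_type}, one per line."
        return [line.strip() for line in self.llm(prompt).splitlines() if line.strip()]

    def section_content(self, doc_type: str, title: str, facts: str) -> str:
        # Stage 2: build a retrieval-augmented prompt for a single section.
        examples = "\n---\n".join(retrieve(title, self.corpus))
        prompt = (
            f"Document type: {doc_type}\nSection: {title}\nFacts: {facts}\n"
            f"Reference examples:\n{examples}\nWrite this section."
        )
        return self.llm(prompt)

    def generate(self, doc_type: str, facts: str) -> List[Tuple[str, str]]:
        titles = self.section_titles(doc_type)
        return [(t, self.section_content(doc_type, t, facts)) for t in titles]


# Usage with a stub model; a real deployment would pass an API or local-model call instead.
def stub_llm(prompt: str) -> str:
    if prompt.startswith("List the section titles"):
        return "Parties\nRent and Deposit\nTermination"
    title = prompt.split("Section: ")[1].split("\n")[0]
    return f"[draft of {title}]"


corpus = [
    ("Parties", "This agreement is made between ..."),
    ("Rent", "The tenant shall pay ..."),
    ("Termination", "Either party may ..."),
]
maw = ModelAgnosticWrapper(stub_llm, corpus)
draft = maw.generate("rent agreement", "Landlord: A; Tenant: B")
```

Because the wrapper depends only on the `LLM` callable, swapping models means passing a different function; the section-by-section loop is also a natural hook for a human-in-the-loop step that edits `titles` before stage 2 runs.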

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: Resources and Evaluation, NLP Applications, Machine Learning for NLP, Language Modeling, Interpretability and Analysis of Models for NLP, Generation, Human-Centered NLP

Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models, Data resources, Data analysis, Position papers

Languages Studied: English

Previous URL: https://openreview.net/forum?id=nTzKGxQMU8

Explanation Of Revisions PDF: pdf

Reassignment Request Area Chair: Yes, I want a different area chair for our submission

Reassignment Request Reviewers: Yes, I want a different set of reviewers

Justification For Not Keeping Action Editor Or Reviewers: We have made substantial structural and methodological revisions to the manuscript, including updated experimental results, new evaluation components (e.g., Inter-Annotator Agreement, ablation studies), improved dataset analysis, and expanded discussions in the Related Work and Limitations sections. Given these major changes, we believe a fresh review process would provide an unbiased evaluation of the current version. Retaining the previous reviewers may inadvertently anchor the assessment to the earlier draft, which differs significantly in scope and depth.

Software: zip

Data: zip

A1 Limitations Section: This paper has a limitations section.

A2 Potential Risks: No

A2 Elaboration: Ethics Statement

B Use Or Create Scientific Artifacts: Yes

B1 Cite Creators Of Artifacts: No

B1 Elaboration: We created new scientific artifacts, including a dataset of private legal documents, a domain-adapted legal language model, a Model-Agnostic Wrapper (MAW) for structured legal drafting, and a Human-in-the-Loop (HITL) Document Generation System. As these are novel contributions, there were no prior creators to cite. The artifacts will be publicly released upon acceptance for transparency and reproducibility.

B2 Discuss The License For Artifacts: Yes

B2 Elaboration: We will release the dataset, code, and models after the paper is accepted.

B3 Artifact Use Consistent With Intended Use: N/A

B3 Elaboration: Section 4 Dataset

B4 Data Contains Personally Identifying Info Or Offensive Content: Yes

B4 Elaboration: Section 4.2 Data Anonymization and Ethical Considerations

B5 Documentation Of Artifacts: Yes

B5 Elaboration: Section 4 Dataset

B6 Statistics For Data: Yes

B6 Elaboration: Section 4 Dataset

C Computational Experiments: Yes

C1 Model Size And Budget: Yes

C1 Elaboration: Section 6 Experimental Setup

C2 Experimental Setup And Hyperparameters: Yes

C2 Elaboration: Section 6.2 Hyperparameters

C3 Descriptive Statistics: Yes

C3 Elaboration: Section 8 Results and Analysis

C4 Parameters For Packages: Yes

C4 Elaboration: Section 6 Experimental Setup

D Human Subjects Including Annotators: Yes

D1 Instructions Given To Participants: Yes

D1 Elaboration: Section 7 Evaluation Metrics

D2 Recruitment And Payment: Yes

D2 Elaboration: Section 7 Evaluation Metrics

D3 Data Consent: Yes

D3 Elaboration: Section 4.1 Dataset Composition and Diversity

D4 Ethics Review Board Approval: N/A

D5 Characteristics Of Annotators: Yes

D5 Elaboration: Section 7 Evaluation Metrics

E Ai Assistants In Research Or Writing: No

E1 Information About Use Of Ai Assistants: N/A

Author Submission Checklist: yes

Submission Number: 775
