AutoStat: DSL-based Automated Statistical Modeling from Natural Language

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Large Language Models, Statistical Modeling
TL;DR: We introduce AutoStat, a DSL-centered framework that unifies statistical modeling workflows and enables reliable end-to-end automation with LLMs.
Abstract: Statistical modeling plays a critical role in data analysis across diverse domains. Despite its importance, existing workflows remain cumbersome: they rely on fragmented programming environments and domain-specific probabilistic programming languages that are verbose and difficult to use, especially for non-experts. Although many efforts have been made toward automated statistical modeling, existing methods still suffer from low accuracy, high computational cost, and heavy reliance on manual intervention. To address these challenges, we present \textbf{\textit{AutoStat}}, a novel Domain-Specific Language (DSL)-based framework for automating statistical modeling. AutoStat leverages \textbf{\textit{StatModelDSL}}, the first compact and structured DSL that specifies complete modeling tasks in a unified and portable form. AutoStat further enhances the automated process via interactive modeling by integrating two agents -- StatModelChatbot, which interactively refines underspecified user requirements, and StatModelCopilot, which generates executable DSL programs. With StatModelChatbot clarifying intent and StatModelCopilot emitting executable DSL, AutoStat compiles and executes the specification end-to-end, delivering complex statistical models directly from natural-language dialogue. We demonstrate that the proposed StatModelDSL affords both LLM amenability and practical usability: when instantiated with GPT-4o, it yields a \textbf{91.59\%} reduction in error rate and a \textbf{5.89\%} uplift in user preference over a Stan-based workflow. Meanwhile, AutoStat achieves a \textbf{100\%} syntax correctness rate for DSL generation and a \textbf{98.76\%} semantic passing rate, significantly surpassing previous methods. Our dataset, code, and models will be publicly released upon acceptance.
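The abstract's two-agent pipeline (StatModelChatbot clarifies intent, StatModelCopilot emits DSL, then AutoStat compiles and executes) can be sketched as follows. This is a minimal illustrative mock, not the paper's actual API: the function names, the invented three-line DSL syntax, and the placeholder logic are all assumptions, since neither the StatModelDSL grammar nor the agent interfaces are given here.

```python
# Hypothetical sketch of the AutoStat pipeline described in the abstract.
# All names and the DSL syntax below are illustrative assumptions.

def clarify_requirements(user_request: str) -> dict:
    """Stand-in for StatModelChatbot: turn an underspecified
    natural-language request into a structured modeling spec."""
    spec = {"task": "regression", "likelihood": "normal", "predictors": ["x"]}
    if "count" in user_request.lower():
        spec.update(task="count_regression", likelihood="poisson")
    return spec

def generate_dsl(spec: dict) -> str:
    """Stand-in for StatModelCopilot: emit a compact DSL program
    (the real StatModelDSL grammar is not shown, so this is invented)."""
    return "\n".join([
        f"model {spec['task']}",
        f"  likelihood: {spec['likelihood']}",
        f"  predictors: {', '.join(spec['predictors'])}",
    ])

def compile_and_execute(dsl_program: str) -> str:
    """Stand-in for the AutoStat compiler/executor; here it only
    validates the expected three-line shape and returns a mock result."""
    lines = dsl_program.splitlines()
    if len(lines) != 3 or not lines[0].startswith("model "):
        raise ValueError("invalid DSL program")
    return f"fitted: {lines[0].removeprefix('model ')}"

# End-to-end run from natural language to a (mock) fitted model.
spec = clarify_requirements("Model daily visitor counts from weather data")
result = compile_and_execute(generate_dsl(spec))
print(result)  # fitted: count_regression
```

The point of the sketch is the separation of concerns the abstract claims: requirement clarification, DSL generation, and compilation/execution are independent stages, so the compact DSL serves as the portable interface between the LLM agents and the statistical backend.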
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 9294