AN ITERATIVE PROMPTING FRAMEWORK FOR LLM-BASED DATA PREPROCESSING

ICLR 2026 Conference Submission 19091 Authors

19 Sept 2025 (modified: 08 Oct 2025)
Keywords: large language models (LLMs), data preprocessing, iterative prompting
Abstract: Data preprocessing plays a crucial role in machine learning, directly impacting model convergence and generalization, especially for simple yet widely used linear models. However, preprocessing methods are diverse, and there are no deterministic rules for selecting the most suitable method for each feature in a dataset. As a result, practitioners often rely on exhaustive manual search, which is both time-consuming and costly. In this paper, we propose an LLM-based iterative prompting framework that automates the selection of preprocessing methods. Our approach significantly reduces the number of iterations required to identify effective preprocessing strategies, thereby lowering human effort. We conduct an ablation study to analyze the contribution of each design component and provide extensive empirical evaluations. Results show that our method matches or surpasses baselines while substantially improving search efficiency. The discovered preprocessing methods also benefit downstream training, improving convergence speed, generalization performance, or both.
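The loop the abstract describes can be summarized as: the LLM proposes a per-feature preprocessing plan, the plan is scored by training a downstream linear model, and the score is fed back into the next prompt. The Python sketch below illustrates that loop under loud assumptions: the paper's actual prompts, candidate set, and LLM interface are not given in the abstract, so query_llm, the JSON plan format, the candidate list, and the logistic-regression probe are all hypothetical illustrations, not the authors' implementation.

import json
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler, QuantileTransformer, StandardScaler

# Illustrative candidate set; the paper's actual search space is not given here.
CANDIDATES = ["standardize", "min-max", "log", "quantile", "none"]

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in any chat-completion client.
    Assumed to return JSON mapping column index -> candidate method."""
    raise NotImplementedError

def apply_plan(X: np.ndarray, plan: dict) -> np.ndarray:
    """Apply one preprocessing method per column, as chosen by the LLM."""
    out = X.astype(float).copy()
    for j, method in plan.items():
        col = out[:, [j]]
        if method == "standardize":
            out[:, [j]] = StandardScaler().fit_transform(col)
        elif method == "min-max":
            out[:, [j]] = MinMaxScaler().fit_transform(col)
        elif method == "log":  # signed log1p handles non-positive values
            out[:, j] = np.sign(out[:, j]) * np.log1p(np.abs(out[:, j]))
        elif method == "quantile":
            out[:, [j]] = QuantileTransformer(
                n_quantiles=min(100, len(col))).fit_transform(col)
        # "none": leave the column unchanged
    return out

def iterative_search(X: np.ndarray, y: np.ndarray, n_iters: int = 5):
    history = []  # (plan, CV score) pairs fed back to the LLM each round
    for _ in range(n_iters):
        prompt = (
            f"Columns: {list(range(X.shape[1]))}\n"
            f"Candidate methods: {CANDIDATES}\n"
            f"Previous trials (plan, CV accuracy): {history}\n"
            "Propose a new per-column plan as JSON {column: method}."
        )
        plan = {int(k): v for k, v in json.loads(query_llm(prompt)).items()}
        score = cross_val_score(
            LogisticRegression(max_iter=1000), apply_plan(X, plan), y).mean()
        history.append((plan, round(float(score), 4)))
    return max(history, key=lambda t: t[1])  # best (plan, score) found

Feeding the accumulated (plan, score) history back into each prompt is what makes the search iterative rather than a one-shot query; the abstract's efficiency claim rests on the LLM needing fewer such rounds than an exhaustive manual search over per-feature method combinations.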
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 19091