Scaling Controllable Modeling Via Self-Evolving Feature Engineering

ICLR 2026 Conference Submission16867 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: LLMs, Artificial Superintelligence, Self Evolving AI, Interpretable AI, Feature Engineering
TL;DR: FEST automates feature engineering with LLMs and decision trees, narrowing the gap between accuracy and control in machine learning.
Abstract: Machine learning faces a fundamental dilemma: models that achieve high predictive performance are typically opaque, while models that provide control, the ability to understand and guide their outcomes, often sacrifice accuracy for transparency. This performance-control trade-off constrains ML adoption in critical domains where both capabilities are essential. Practitioners have historically addressed this challenge through manual feature engineering, embedding domain expertise into models to achieve reasonable accuracy while retaining some degree of control. However, this process is costly, time-consuming, and limited by human expertise, restricting scalability. We present FEST (Feature Engineering with Self-evolving Trees), a framework for automated controllable modeling that replaces manual feature design with an iterative, self-evolving process. FEST leverages large language models (LLMs) as feature discovery engines to generate plausible features from observational data by analyzing contrasting samples. These features are then semantically clustered, deduplicated, and validated for predictive performance using interpretable decision trees. The evolving trees refine feature sets over iterations, producing human-readable decision rules that practitioners can inspect, modify, and intervene upon, thus providing both accuracy and control. To demonstrate FEST's effectiveness in bridging the performance-control gap, we evaluate it against traditional interpretable models, neural networks, and LLM classifiers across diverse real-world tasks in social science, NLP, and marketing domains. We also introduce GLoRE, a controlled synthetic benchmark designed to test a model's ability to deduce outcomes from complex logical rule relationships embedded in natural language, with true features and their relationships unknown to LLMs. FEST recovers all of the target features.
These results show that automated, self-evolving feature engineering can make controllable modeling practical at scale, reducing reliance on costly manual design while narrowing the long-standing divide between performance and control in machine learning.
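The abstract describes an iterative loop: an LLM proposes candidate features from contrasting samples, the candidates are deduplicated, and decision trees validate which features to keep for the next round. A minimal sketch of that loop is below; the stubbed "LLM" proposal step, the value-equality deduplication, and the one-node stump used in place of a full decision tree are all hypothetical simplifications, not the paper's actual method.

```python
# Hypothetical sketch of a FEST-style iteration: propose -> dedup -> validate.
from typing import Callable, Dict, List

Sample = Dict[str, float]

def propose_features(contrast_pairs) -> Dict[str, Callable[[Sample], float]]:
    """Stand-in for the LLM feature-discovery step: FEST would prompt an LLM
    with contrasting samples; here we just return fixed candidates."""
    return {
        "income": lambda s: s["income"],
        "income_dup": lambda s: s["income"],       # redundant; should be pruned
        "age_over_40": lambda s: float(s["age"] > 40),
    }

def deduplicate(features, data: List[Sample]):
    """Toy proxy for semantic clustering: drop features whose values
    coincide on the data."""
    kept, seen = {}, set()
    for name, fn in features.items():
        sig = tuple(fn(s) for s in data)
        if sig not in seen:
            seen.add(sig)
            kept[name] = fn
    return kept

def stump_accuracy(values, labels):
    """Validate one feature with a one-node 'tree' (a decision stump):
    try every threshold and both orientations, return the best accuracy."""
    best = 0.0
    for t in set(values):
        for flip in (False, True):
            preds = [(v > t) != flip for v in values]
            best = max(best, sum(p == bool(y) for p, y in zip(preds, labels)) / len(labels))
    return best

def fest_iteration(data: List[Sample], labels, keep=2):
    """One round: propose, deduplicate, validate, keep the top features."""
    feats = deduplicate(propose_features(None), data)
    scored = sorted(
        ((stump_accuracy([fn(s) for s in data], labels), name) for name, fn in feats.items()),
        reverse=True,
    )
    return [name for _, name in scored[:keep]]

data = [{"income": 10, "age": 30}, {"income": 50, "age": 45},
        {"income": 60, "age": 50}, {"income": 20, "age": 35}]
labels = [0, 1, 1, 0]
print(fest_iteration(data, labels))  # the duplicate feature is pruned
```

In the full framework this loop would repeat, with surviving features and tree structure fed back into the next LLM proposal round; the resulting tree rules are what practitioners inspect and edit.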
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16867