Keywords: constraint solving, algorithm selection, LLM, combinatorial optimization, feature extraction
Abstract: Feature engineering remains a critical bottleneck in machine learning, often requiring significant manual effort and domain expertise. While end-to-end deep learning models can automate this process by learning latent representations, they do so at the cost of interpretability. We propose a gray-box paradigm for automated feature engineering that leverages Large Language Models for program synthesis. Our framework treats the LLM as a meta-learner that, given a high-level description of a constraint optimization problem, generates executable Python scripts that function as interpretable feature extractors. These scripts construct symbolic graph representations and compute structural properties, combining the generative power of LLMs with the transparency of classical features. We validate our approach on algorithm selection across 227 combinatorial problem classes. Our synthesized feature extractors achieve 58.8\% accuracy, significantly outperforming the 48.6\% achieved by human-engineered extractors, establishing program synthesis as an effective approach to automating the ML pipeline.
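Illustrative sketch (not taken from the submission): the abstract describes synthesized Python scripts that build symbolic graph representations of a constraint problem and compute structural properties. The example below shows what such an extractor could plausibly look like; the variable-constraint incidence graph, the specific features, and the function name `extract_features` are assumptions for illustration only.

```python
# Hypothetical example of an LLM-synthesized feature extractor.
# Assumes a constraint problem given as a list of variables and a list of
# constraint scopes (each scope = indices of the variables it touches).
import networkx as nx

def extract_features(variables, constraints):
    """Build a variable-constraint incidence graph and return structural features."""
    g = nx.Graph()
    g.add_nodes_from(f"v{i}" for i in range(len(variables)))
    for j, scope in enumerate(constraints):
        g.add_node(f"c{j}")
        g.add_edges_from((f"c{j}", f"v{i}") for i in scope)

    degrees = [d for _, d in g.degree()]
    return {
        "num_variables": len(variables),
        "num_constraints": len(constraints),
        "graph_density": nx.density(g),
        "mean_degree": sum(degrees) / len(degrees) if degrees else 0.0,
        "max_degree": max(degrees, default=0),
        "avg_clustering": nx.average_clustering(g),
    }
```

Such a script is transparent in the sense the abstract emphasizes: each returned feature is a named, human-readable structural property of the problem instance rather than a latent embedding.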
Supplementary Material: zip
Primary Area: optimization
Submission Number: 24962