A General Feature-Informed Crossover for Two-Stage Feature Selection in Symbolic Regression

Hengzhe Zhang, Qi Chen, Bing Xue, Wolfgang Banzhaf, Mengjie Zhang

Published: 2025, Last Modified: 07 Jan 2026, CEC 2025, CC BY-SA 4.0

Abstract: Symbolic regression based on genetic programming (GP) is a widely used machine learning technique, but its effectiveness can be limited as the number of input features increases. In GP, two-stage feature selection has been extensively applied to enhance performance when dealing with a large number of input features. Existing two-stage feature selection methods typically reinitialize GP trees from the selected features once feature selection is complete, which disrupts the building blocks accumulated during evolution. In this paper, we propose a crossover operator that is aware of the selected features and leverages the feature selection results, thereby bypassing the need for reinitialization. This operator guides the crossover process to prioritize selected features, gradually eliminating unimportant features while preserving evolved building blocks. Experimental results on 98 datasets validate the proposed method with three different feature-selection mechanisms, demonstrating its effectiveness and broad applicability across feature-selection strategies.
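The abstract does not say how the operator weights crossover points, so the following is only a minimal sketch of one way a crossover could prioritize selected features without reinitializing the population. It assumes expression trees stored as nested tuples and a simple two-level weighting. The names `feature_informed_crossover`, `penalty`, and the helper functions are hypothetical and are not the authors' implementation.

```python
import random


def features_in(tree):
    """Return the set of feature names (string leaves) used in a tree."""
    if isinstance(tree, str):
        return {tree}
    if isinstance(tree, tuple):
        used = set()
        for child in tree[1:]:  # tree[0] is the operator name
            used |= features_in(child)
        return used
    return set()  # numeric constant


def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))


def replace_at(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]


def weighted_pick(items, weights, rng):
    """Weighted random choice that falls back to uniform if all weights are zero."""
    if sum(weights) <= 0:
        return rng.choice(items)
    return rng.choices(items, weights=weights, k=1)[0]


def feature_informed_crossover(recipient, donor, selected, rng, penalty=0.1):
    """Assumed weighting scheme (not taken from the paper):
    - recipient points whose subtree contains an unselected feature get
      weight 1, all other points get `penalty`, so unimportant features
      are more likely to be overwritten;
    - donor subtrees that use only selected features get weight 1, all
      other subtrees get `penalty`, so selected features are propagated.
    The rest of the recipient tree is kept intact.
    """
    rec_nodes = list(subtrees(recipient))
    rec_weights = [1.0 if features_in(sub) - selected else penalty
                   for _, sub in rec_nodes]
    don_nodes = list(subtrees(donor))
    don_weights = [1.0 if features_in(sub) <= selected else penalty
                   for _, sub in don_nodes]

    target_path, _ = weighted_pick(rec_nodes, rec_weights, rng)
    _, donated = weighted_pick(don_nodes, don_weights, rng)
    return replace_at(recipient, target_path, donated)


if __name__ == "__main__":
    parent_a = ("add", "x0", ("mul", "x1", "x2"))
    parent_b = ("sub", "x0", ("mul", "x0", "x3"))
    child = feature_informed_crossover(
        parent_a, parent_b, selected={"x0", "x1"}, rng=random.Random(0)
    )
    print(child, features_in(child))
```

With `penalty` above zero, unselected features can still survive a crossover, which matches the gradual elimination described in the abstract. Setting `penalty=0.0` in this sketch makes the bias strict whenever at least one qualifying crossover point exists in each parent.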

External IDs: dblp:conf/cec/ZhangC0B025