A Semantic-Based Hoist Mutation Operator for Evolutionary Feature Construction in Regression

Published: 2024, Last Modified: 16 Oct 2025 | IEEE Trans. Evol. Comput. 2024 | License: CC BY-SA 4.0
Abstract: In recent years, genetic programming (GP) has achieved impressive results on evolutionary feature construction tasks. To increase search effectiveness, researchers have developed many semantic-based crossover and mutation operators to guide GP searches toward the target semantics. However, semantics has not yet been explored for the hoist mutation operator, which is an operator designed for controlling the bloat effect. Although the hoist mutation operator can significantly reduce model sizes, the most informative subtree may be disrupted by the randomness in mutation. To address this issue, we develop a semantic-based hoist mutation (SHM) operator in this article to preserve the most informative subtree that has the largest cosine similarity between its semantics and the target semantics. Experimental results on 98 regression datasets from the Penn Machine Learning Benchmark show that using this operator not only significantly reduces model size but also improves the test accuracy of features constructed by GP. A comparison with seven bloat control methods shows that the proposed operator achieves the best tradeoff between accuracy and model size. Moreover, an experiment on the state-of-the-art symbolic regression benchmark shows that GP with the SHM operator achieves the best test accuracy and competitive model sizes compared with 22 symbolic regression and machine learning algorithms.
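The selection rule described in the abstract, keeping the subtree whose semantics has the largest cosine similarity to the target semantics, can be illustrated with a minimal sketch. This is not the authors' implementation: the tuple-based tree encoding, the operator set, and the helper names (`semantics`, `subtrees`, `cosine`, `semantic_hoist`) are all hypothetical, and the sketch assumes the hoisted subtree simply replaces the whole tree, which the abstract does not specify.

```python
import numpy as np

# Hypothetical tree encoding (an assumption, not from the paper):
#   ("x", i)            -> input feature i
#   (op, left, right)   -> binary operator applied to two subtrees
OPS = {"add": np.add, "sub": np.subtract, "mul": np.multiply}


def semantics(tree, X):
    """Semantics of a tree: its output vector over all training rows of X."""
    if tree[0] == "x":
        return X[:, tree[1]]
    return OPS[tree[0]](semantics(tree[1], X), semantics(tree[2], X))


def subtrees(tree):
    """Yield the tree itself, then every subtree below it (pre-order)."""
    yield tree
    if tree[0] != "x":
        yield from subtrees(tree[1])
        yield from subtrees(tree[2])


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b / denom)


def semantic_hoist(tree, X, y):
    """Return the proper subtree most similar to the target semantics y.

    Assumption: the chosen subtree replaces the whole tree. A single-leaf
    tree has no proper subtree and is returned unchanged.
    """
    candidates = list(subtrees(tree))[1:]  # skip the root itself
    if not candidates:
        return tree
    return max(candidates, key=lambda t: cosine(semantics(t, X), y))
```

Because the result is always a proper subtree, the tree can only shrink, which matches the bloat-control role of hoist mutation; replacing the random subtree choice with the cosine-similarity criterion is what makes the operator semantic-based.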