Build Roadmap for Automated Feature Transformation: A Graph-based Reinforcement Learning Approach

25 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Automated Feature Transformation, Tabular Data, Multi-Agent Reinforcement Learning
Abstract: Feature transformation aims to generate high-value features by combining existing ones through mathematical operations, improving the performance of downstream machine learning models. Current methods typically rely on iterative sequence generation, where exploration is guided by performance feedback from downstream tasks. However, these approaches fail to effectively utilize historical decision-making experience and overlook potential relationships between generated features, limiting the flexibility of the exploration process. Additionally, the decision-making process lacks the ability to dynamically backtrack on inefficient decisions, which hinders adaptability and reduces overall robustness and stability. To address these issues, we propose a novel framework that uses a graph to track the feature transformation process, where each node represents a transformation state. In this framework, three cascading agents sequentially select nodes and mathematical operations to generate new nodes. This strategy leverages the graph structure's ability to preserve and reuse previously seen, valuable transformations, and it incorporates backtracking via graph pruning techniques, allowing the framework to rectify inefficient transformation paths. To validate the efficacy and flexibility of our approach, we conducted extensive experiments and detailed case studies, demonstrating superior performance across diverse datasets.
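The core loop the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the class and method names are invented, the three agent "policies" are replaced by random choices, and the downstream-task reward is a random stand-in. It only shows the shape of the process: three cascading selections (head node, operation, tail node) grow the transformation graph, and pruning removes low-value generated nodes to back-track inefficient paths.

```python
import random

# Stand-in for the set of mathematical operations the agents may apply.
BINARY_OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

class TransformationGraph:
    """Each node is a transformation state (a feature); edges record its derivation."""

    def __init__(self, base_features):
        # node name -> utility score (would come from downstream-task feedback)
        self.nodes = {name: 0.0 for name in base_features}
        self.edges = []  # (head, op, tail, new_node)

    def step(self):
        # Three cascading selections; random.choice stands in for the RL agents.
        head = random.choice(list(self.nodes))   # agent 1: head node
        op = random.choice(list(BINARY_OPS))     # agent 2: operation
        tail = random.choice(list(self.nodes))   # agent 3: tail node
        new_node = f"{op}({head},{tail})"
        # Hypothetical reward: a real system would evaluate a downstream model.
        self.nodes[new_node] = random.random()
        self.edges.append((head, op, tail, new_node))
        return new_node

    def prune(self, threshold):
        # Backtracking via pruning: drop generated (non-base) nodes with low utility.
        keep = {n for n, u in self.nodes.items() if u >= threshold or "(" not in n}
        self.nodes = {n: u for n, u in self.nodes.items() if n in keep}
        self.edges = [e for e in self.edges if e[3] in keep]

g = TransformationGraph(["f1", "f2"])
for _ in range(5):
    g.step()
g.prune(threshold=0.5)
```

Because useful intermediate nodes stay in the graph, later steps can reuse them as heads or tails, which is the "store and reuse valuable transformations" property the abstract attributes to the graph structure.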
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4841
