DIFER: Differentiable Automated Feature Engineering

25 Feb 2022, 12:35 (modified: 16 Jul 2022, 13:35)AutoML-Conf 2022 (Main Track)Readers: Everyone
Abstract: Feature engineering, a crucial step of machine learning, aims to extract useful features from raw data to improve model performance. In recent years, great efforts have been devoted to Automated Feature Engineering (AutoFE) to replace expensive human labor. However, all existing methods treat AutoFE as an optimization problem over a discrete feature space, whose huge search space leads to significant computational overhead. Unlike previous work, we perform AutoFE in a continuous vector space and propose a differentiable method called DIFER in this paper. We first introduce a feature optimizer based on the encoder-predictor-decoder framework, which maps features into the continuous vector space via the encoder, optimizes the embedding along the gradient direction induced by the predictor, and recovers better features from the optimized embedding by the decoder. Based on the feature optimizer, we employ a feature evolution method to search for better features iteratively. Extensive experiments on classification and regression datasets demonstrate that DIFER can significantly outperform the state-of-the-art AutoFE methods in terms of both model performance and computational efficiency. The implementation of DIFER is available at https://anonymous.4open.science/r/DIFER-3FBC/.
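The encoder-predictor-decoder loop described in the abstract can be sketched in miniature. This is a hypothetical toy illustration of the idea (gradient ascent on a feature embedding, then decoding back to a discrete feature), not the paper's actual implementation: the `encoder`, `predictor_grad`, and `decoder` below are simple stand-ins (a mean of token embeddings, a linear performance surrogate, and nearest-neighbor decoding), whereas DIFER trains learned models for each component.

```python
import numpy as np

# Toy sketch of DIFER's core idea: optimize a feature in a continuous
# embedding space along the gradient of a performance predictor, then
# decode the optimized embedding back to a discrete feature.
# All components here are hypothetical stand-ins, not the paper's models.

def encoder(feature_tokens, vocab):
    # Map a discrete feature (a sequence of operator tokens) to a
    # continuous embedding; here simply the mean of token embeddings.
    return np.mean([vocab[t] for t in feature_tokens], axis=0)

def predictor_grad(z, w):
    # Gradient of a linear surrogate of downstream model performance,
    # score(z) = w . z, with respect to the embedding z.
    return w

def decoder(z, vocab):
    # Recover a discrete feature from an embedding; here the token
    # whose embedding is nearest to z.
    names = list(vocab)
    dists = [np.linalg.norm(z - vocab[n]) for n in names]
    return names[int(np.argmin(dists))]

def optimize_feature(feature_tokens, vocab, w, eta=0.5, steps=10):
    # Encode, take gradient-ascent steps on predicted performance,
    # then decode the improved embedding into a (hopefully better) feature.
    z = encoder(feature_tokens, vocab)
    for _ in range(steps):
        z = z + eta * predictor_grad(z, w)
    return decoder(z, vocab)

# Tiny example with three unary transformation "features":
vocab = {
    "log": np.array([1.0, 0.0]),
    "sqrt": np.array([0.0, 1.0]),
    "square": np.array([1.0, 1.0]),
}
w = np.array([1.0, 1.0])  # surrogate says "square" scores highest
print(optimize_feature(["log"], vocab, w))  # -> square
```

In the full method, this single gradient step is wrapped in an evolutionary loop: promising decoded features are evaluated, added to the training data of the predictor, and re-optimized iteratively.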
Keywords: Automated Feature Engineering, Classification, AutoML
One-sentence Summary: We propose the first differentiable method to automate feature engineering efficiently.
Track: Main track
Reproducibility Checklist: Yes
Broader Impact Statement: Yes
Paper Availability And License: Yes
Code Of Conduct: Yes
Reviewers: Guanghui Zhu,zgh@nju.edu.cn Zhuoer Xu,zhuoer.xu@smail.nju.edu.cn
Main Paper And Supplementary Material: pdf
Code And Dataset Supplement: zip
CPU Hours: 0.3
GPU Hours: 0
TPU Hours: 0
Evaluation Metrics: Yes
Class Of Approaches: Gradient-based Methods, Evolutionary Methods
Datasets And Benchmarks: OpenML, Kaggle, UCIrvine
Performance Metrics: f1-score, relative absolute error