DAFEE: A Scalable Distributed Automatic Feature Engineering Algorithm for Relational DatasetsOpen Website

Published: 01 Jan 2020, Last Modified: 16 May 2023ICA3PP (2) 2020Readers: Everyone
Abstract: Automatic feature engineering aims to construct informative features automatically and reduce manual labor for machine learning applications. The majority of existing approaches are designed to handle tasks with only one data source, which are less applicable to real scenarios. In this paper, we present a distributed automatic feature engineering algorithm, DAFEE, to generate features among multiple large-scale relational datasets. Starting from the target table, the algorithm uses a Breadth-First-Search type algorithm to find its related tables and constructs advanced high-order features that are remarkably effective in practical applications. Moreover, DAFEE implements a feature selection method to reduce the computational cost and improve predictive performance. Furthermore, it is highly optimized to process a massive volume of data. Experimental results demonstrate that it can significantly improve the predictive performance by 7% compared to SOTA algorithms.
0 Replies

Loading