Dsa-PAML: a parallel automated machine learning system via dual-stacked autoencoder

Pengjie Liu, Fucheng Pan, Xiaofeng Zhou, Shuai Li, Pengyu Zeng, Shurui Liu, Liang Jin

Published: 01 Jan 2022, Last Modified: 05 Nov 2023. Neural Comput. Appl. 2022. Readers: Everyone

Abstract: Finding a high-performance machine learning pipeline (ML pipeline) for a supervised learning task is time-consuming. It requires many choices, including how to preprocess the dataset, which algorithms to select, how to tune hyperparameters, and how to ensemble candidate models. As the number of candidate pipelines grows, a combinatorial explosion arises. This work presents a new automated machine learning (AutoML) system called Dsa-PAML, which addresses this challenge by recommending, training, and ensembling suitable models for supervised learning tasks. Dsa-PAML is a parallel automated system based on a dual-stacked autoencoder (Dsa). First, meta-features of datasets and ML pipelines are used to alleviate the cold-start recommendation problem. Second, a novel dual-stacked autoencoder simultaneously learns the latent features of datasets and ML pipelines, efficiently capturing the collaborative relationships between them and recommending suitable ML pipelines for a new dataset. Third, Dsa-PAML trains the recommended ML pipelines on the new dataset in parallel, which substantially reduces the running time of the proposed method. Finally, a parallel selective ensemble system is embedded into Dsa-PAML. It selects base models from the candidate ML pipelines according to their runtime, classification performance, and diversity on the validation set, which enhances Dsa-PAML’s stability on most datasets. Extensive experiments on 30 UCI datasets show that our approach outperforms current state-of-the-art methods.
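The last two steps of the abstract (evaluating candidates in parallel, then selecting base models by runtime, classification performance, and diversity on the validation set) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the greedy selection rule, the pairwise-disagreement diversity measure, and the weights `w_div` and `w_time` are all assumptions made for illustration, and the toy "pipelines" are plain functions standing in for trained models. A thread pool is used only to keep the example self-contained; for CPU-bound training a process pool would be the more usual choice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import time


def evaluate(candidate, X_val, y_val):
    """Score one candidate on the validation set: predictions, accuracy, runtime."""
    name, predict = candidate
    start = time.perf_counter()
    preds = [predict(x) for x in X_val]
    runtime = time.perf_counter() - start
    acc = sum(p == y for p, y in zip(preds, y_val)) / len(y_val)
    return {"name": name, "preds": preds, "acc": acc, "runtime": runtime}


def disagreement(a, b):
    """Fraction of validation samples on which two prediction lists differ."""
    return sum(p != q for p, q in zip(a, b)) / len(a)


def select_ensemble(results, k, w_div=0.5, w_time=0.0):
    """Greedy selection (an assumed rule): start from the most accurate model,
    then repeatedly add the candidate with the best
    accuracy + w_div * mean disagreement with chosen models - w_time * runtime."""
    remaining = sorted(results, key=lambda r: r["acc"], reverse=True)
    chosen = [remaining.pop(0)]
    while remaining and len(chosen) < k:
        def score(r):
            div = sum(disagreement(r["preds"], c["preds"]) for c in chosen) / len(chosen)
            return r["acc"] + w_div * div - w_time * r["runtime"]
        best = max(remaining, key=score)
        remaining.remove(best)
        chosen.append(best)
    return chosen


def majority_vote(chosen):
    """Combine the chosen models' validation predictions by plurality vote."""
    n = len(chosen[0]["preds"])
    return [Counter(c["preds"][i] for c in chosen).most_common(1)[0][0] for i in range(n)]


def run(candidates, X_val, y_val, k=3, workers=4):
    """Evaluate all candidates in parallel, then build a selective ensemble."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: evaluate(c, X_val, y_val), candidates))
    chosen = select_ensemble(results, k)
    return [c["name"] for c in chosen], majority_vote(chosen)


# Toy validation set and hypothetical candidates (stand-ins for trained pipelines).
X_VAL = list(range(8))
Y_VAL = [x % 2 for x in X_VAL]
CANDIDATES = [
    ("parity", lambda x: x % 2),                        # always correct
    ("always0", lambda x: 0),                           # correct on even inputs only
    ("parity_low", lambda x: x % 2 if x >= 2 else 0),   # wrong only at x == 1
]

if __name__ == "__main__":
    names, votes = run(CANDIDATES, X_VAL, Y_VAL, k=2)
    print(names, votes)
```

Setting `w_time` above zero penalizes slow candidates; it defaults to zero here so that the selection does not depend on measured wall-clock time.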
