Abstract: Finding a high-performance machine learning pipeline (ML pipeline) for a supervised learning task is time-consuming. It requires many choices, including how to preprocess datasets, select algorithms, tune hyperparameters, and ensemble candidate models. As the number of candidate pipelines grows, a combinatorial explosion problem arises. This work presents a new automated machine learning (AutoML) system called Dsa-PAML that addresses this challenge by recommending, training, and ensembling suitable models for supervised learning tasks. Dsa-PAML is a parallel automated system based on a dual-stacked autoencoder (Dsa). First, meta-features of datasets and ML pipelines are used to alleviate the cold-start recommendation problem. Second, a novel dual-stacked autoencoder simultaneously learns the latent features of datasets and ML pipelines, efficiently capturing the collaborative relationships between them and recommending suitable ML pipelines for a new dataset. Third, Dsa-PAML trains the recommended ML pipelines on the new dataset in parallel, which substantially reduces the time cost of the proposed method. Finally, a parallel selective ensemble system is embedded into Dsa-PAML. It selects base models from the candidate ML pipelines according to their runtime, classification performance, and diversity on the validation set, enhancing Dsa-PAML’s stability on most datasets. Extensive experiments on 30 UCI datasets show that our approach outperforms current state-of-the-art methods.
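To make the selective ensemble step concrete, the following is a minimal sketch of how base models might be chosen from candidate pipelines using validation accuracy, pairwise diversity, and runtime. It is not the Dsa-PAML implementation: the function names, the greedy selection rule, the disagreement-based diversity measure, and the weights `w_div` and `w_time` are all illustrative assumptions, and a thread pool stands in for whatever parallel backend the system actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def accuracy(preds, y_true):
    """Fraction of validation labels predicted correctly."""
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def disagreement(preds_a, preds_b):
    """Fraction of validation samples on which two models differ (diversity proxy)."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def score_candidates(candidates, y_val, max_workers=4):
    """Score candidates in parallel.

    candidates: dict mapping a (hypothetical) pipeline name to a tuple
    (validation_predictions, runtime_seconds).
    """
    def score(item):
        name, (preds, runtime) = item
        return name, {"preds": preds, "runtime": runtime,
                      "acc": accuracy(preds, y_val)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(score, candidates.items()))


def select_ensemble(scored, k=2, w_div=0.5, w_time=0.2):
    """Greedily pick k base models (assumed rule, not taken from the paper).

    Utility = accuracy + w_div * mean disagreement with already chosen models
              - w_time * runtime normalised by the slowest candidate.
    """
    max_runtime = max(s["runtime"] for s in scored.values()) or 1.0
    chosen, remaining = [], set(scored)

    def utility(name):
        s = scored[name]
        if chosen:
            div = sum(disagreement(s["preds"], scored[c]["preds"])
                      for c in chosen) / len(chosen)
        else:
            div = 0.0
        return s["acc"] + w_div * div - w_time * s["runtime"] / max_runtime

    while remaining and len(chosen) < k:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=utility)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With these assumed weights, a slower candidate that largely duplicates an already selected model can lose to a slightly less accurate but faster and more diverse one, which is the kind of trade-off among runtime, performance, and diversity that the abstract describes.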