Bandit Limited Discrepancy Search and Application to Machine Learning Pipeline Optimization

Published: 14 Jul 2021, Last Modified: 05 May 2023
AutoML@ICML2021 Poster
Readers: Everyone
Keywords: automated machine learning, machine learning pipeline optimization, limited discrepancy search, multi-armed bandit
TL;DR: Optimizing machine learning pipelines using heuristic search
Abstract: Optimizing machine learning (ML) pipelines is an important topic in AI and ML. Despite recent progress, it remains a challenging problem because the number of candidate combinations can be very large and because training and validation are slow. We present the Bandit Limited Discrepancy Search (BLDS) algorithm, which selects algorithms (ML operations) within a fixed ML pipeline structure. BLDS performs multi-fidelity optimization to select ML algorithms trained with lower computational overhead, and controls its pipeline search using multi-armed bandits and limited discrepancy search. Our experiments on well-known benchmarks show that BLDS is superior to competing algorithms.
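As a rough illustration of the limited discrepancy search idea mentioned in the abstract, the sketch below enumerates pipelines for a fixed structure in order of how many steps deviate from a heuristic's top-ranked choice. It is not the BLDS algorithm itself: the bandit-based control and multi-fidelity evaluation are not shown, and the function name `lds_pipelines`, the stage lists, and the algorithm names are all hypothetical.

```python
from itertools import combinations, product


def lds_pipelines(stages, max_discrepancies):
    """Yield pipelines in order of increasing discrepancy count.

    `stages` is a list of lists. Each inner list holds the candidate
    algorithms for one pipeline step, ordered best-first by some heuristic
    (an assumption of this sketch). A discrepancy is a step where the
    pipeline does not use the top-ranked candidate.
    """
    n = len(stages)
    for k in range(max_discrepancies + 1):
        # Choose which k steps deviate from the heuristic's first choice.
        for positions in combinations(range(n), k):
            # Non-preferred alternatives available at each deviating step.
            alternatives = [stages[i][1:] for i in positions]
            for choice in product(*alternatives):
                pipeline = [stage[0] for stage in stages]
                for i, alg in zip(positions, choice):
                    pipeline[i] = alg
                yield tuple(pipeline)


# Hypothetical three-step pipeline structure: scaling, feature step, classifier.
example_stages = [
    ["scaler", "none"],
    ["pca", "select_k"],
    ["rf", "svm", "logreg"],
]
for candidate in lds_pipelines(example_stages, max_discrepancies=1):
    print(candidate)
```

With `max_discrepancies=1`, the generator first yields the pipeline that follows the heuristic at every step, then every pipeline that differs from it at exactly one step. A search procedure would evaluate candidates in this order and could stop once its budget is exhausted.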
Ethics Statement: Pipeline optimization has been an important AutoML topic in the machine learning community, since manually optimizing a pipeline is tedious and time-consuming. An efficient pipeline optimization algorithm is needed for AutoML to be used in a wider range of machine learning applications. Our empirical results on various open benchmark datasets show that our algorithm converges more quickly than competing algorithms. This implies that more accurate pipelines can be developed quickly and then deployed in real-world applications where machine learning pipelines are needed to perform classification tasks. Examples of such applications include credit fraud detection, fake news detection, and medical diagnosis. Our approach does not yet address the case where the dataset is biased. In this case, even if our approach generates a pipeline optimized for the dataset, the actual performance of the pipeline might not meet practitioners' expectations.