Keywords: unsupervised learning, meta-learning, optimal transport
Abstract: Automated machine learning has been widely researched and adopted for supervised tasks such as classification and regression. Unsupervised scenarios, which lack a ground truth to optimize against, are much harder to automate. We propose a novel zero-shot meta-learning approach that recommends which algorithms and hyperparameters to use on new unsupervised tasks by learning from prior supervised proxy datasets. Our premise is that the optimal choice of unsupervised algorithm depends on the inherent properties of the data distribution. We first build a large meta-dataset by evaluating many algorithms and hyperparameter settings on prior datasets, then leverage optimal transport to find the prior datasets whose underlying distribution is most similar to that of the new task, and finally recommend the (tuned) algorithm that proved to work best on that distribution. We evaluate the robustness of our approach on one particular task, outlier detection, and find that it outperforms state-of-the-art methods for unsupervised outlier detection.
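The recommendation pipeline described in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: the meta-dataset, configurations, and dataset names below are invented, and the exact 2-Wasserstein distance between two equal-size point clouds with uniform weights is computed by reducing optimal transport to a linear assignment problem.

```python
# Hedged sketch of zero-shot algorithm recommendation via optimal-transport
# dataset similarity. All names and configurations here are illustrative.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def ot_distance(X, Y):
    """Exact 2-Wasserstein distance between two equal-size point clouds
    with uniform weights, via optimal assignment."""
    C = cdist(X, Y, metric="sqeuclidean")   # pairwise squared-Euclidean costs
    rows, cols = linear_sum_assignment(C)   # optimal coupling (a permutation)
    return np.sqrt(C[rows, cols].mean())

def recommend(X_new, meta_dataset):
    """Recommend the tuned configuration of the prior dataset whose
    distribution is closest to the new task under optimal transport."""
    nearest = min(meta_dataset,
                  key=lambda name: ot_distance(X_new, meta_dataset[name]["data"]))
    return meta_dataset[nearest]["best_config"]

# Toy stand-in for the meta-dataset of evaluated prior datasets.
rng = np.random.default_rng(0)
meta_dataset = {
    "gaussian": {"data": rng.normal(0.0, 1.0, (64, 2)),
                 "best_config": ("LOF", {"n_neighbors": 20})},
    "uniform":  {"data": rng.uniform(-3.0, 3.0, (64, 2)),
                 "best_config": ("IsolationForest", {"n_estimators": 100})},
}
X_new = rng.normal(0.1, 1.0, (64, 2))  # new unsupervised task, no labels needed
print(recommend(X_new, meta_dataset))
```

In practice the paper's approach operates on a much larger meta-dataset of prior datasets and evaluated hyperparameter settings; the key property this sketch preserves is that no labels of the new task are required, only its feature distribution.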