Faster Training by Selecting Samples Using Embeddings

Santiago Gonzalez; Joshua Landgraf; Risto Miikkulainen

27 Sept 2018 (modified: 05 May 2023) · ICLR 2019 Conference Blind Submission

Abstract: Long training times have become an increasing burden for researchers, slowing the pace of innovation; some models take days or weeks to train. This paper presents a new, general technique that speeds up training by using a reduced (subsampled) training dataset. By leveraging autoencoders and the properties of their embedding spaces, the technique filters a training dataset so that it retains only the samples that matter most. Evaluation on the standard CIFAR-10 image classification task shows that the technique is effective: training times can be reduced with minimal loss in accuracy, and, conversely, given a fixed training-time budget, the technique improves accuracy by over 50%. It is therefore a practical tool for achieving better results with large datasets and limited computational budgets.
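The abstract does not state the exact selection criterion used to decide which samples "matter most". As a minimal sketch of what filtering a dataset in an embedding space can look like, assume each sample has already been mapped to a vector by an autoencoder's encoder, and use greedy farthest-point (k-center) selection as a stand-in rule. The function names `squared_distance` and `select_diverse_subset` and the k-center criterion itself are hypothetical illustrations, not the authors' method.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_diverse_subset(
    embeddings: List[Sequence[float]], budget: int, seed_index: int = 0
) -> List[int]:
    """Hypothetical stand-in selection rule: greedy farthest-point (k-center).

    Assumes `embeddings` were produced beforehand by some encoder (e.g. the
    encoder half of an autoencoder). Each new pick is the sample farthest
    from everything already selected, so near-duplicates in embedding space
    tend to be left out of the reduced training set.
    """
    n = len(embeddings)
    if budget <= 0 or n == 0:
        return []
    budget = min(budget, n)

    selected = [seed_index]
    # Distance from each sample to its nearest already-selected sample.
    nearest = [squared_distance(e, embeddings[seed_index]) for e in embeddings]
    nearest[seed_index] = -1.0  # mark as selected so it is never picked again

    while len(selected) < budget:
        # Pick the unselected sample that is worst covered by the subset.
        next_index = max(range(n), key=lambda i: nearest[i])
        selected.append(next_index)
        nearest[next_index] = -1.0
        # Update coverage distances for the remaining unselected samples.
        for i in range(n):
            if nearest[i] >= 0.0:
                d = squared_distance(embeddings[i], embeddings[next_index])
                if d < nearest[i]:
                    nearest[i] = d
    return selected


if __name__ == "__main__":
    # Two near-duplicate pairs plus one isolated point.
    points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
    print(select_diverse_subset(points, 3))
```

In this toy example, the three kept indices cover the three distinct regions, and the near-duplicates at indices 1 and 3 are dropped. A coverage-style rule is only one plausible way to "thin" a dataset; a criterion that favors hard or boundary samples would select differently.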

Keywords: Machine Learning, Embeddings, Training Time, Optimization, Autoencoders

TL;DR: Training is sped up by using a dataset that has been subsampled through embedding analysis.
