- Abstract: Long training times have increasingly become a burden for researchers by slowing down the pace of innovation, with some models taking days or weeks to train. In this paper, a new, general technique is presented that aims to speed up the training process by using a thinned-down training dataset. By leveraging autoencoders and the unique properties of embedding spaces, we are able to filter training datasets to include only those samples that matter the most. Through evaluation on a standard CIFAR-10 image classification task, this technique is shown to be effective. With this technique, training times can be reduced with a minimal loss in accuracy. Conversely, given a fixed training time budget, the technique was shown to improve accuracy by over 50%. This technique is a practical tool for achieving better results with large datasets and limited computational budgets.
- Keywords: Machine Learning, Embeddings, Training Time, Optimization, Autoencoders
- TL;DR: Training is sped up by using a dataset that has been subsampled through embedding analysis.