Test time cost sensitivity in machine learning

Gavin Gray

Published: 2019, Last Modified: 28 Apr 2023, Readers: Everyone

Abstract: The use of deep neural networks has enabled machines to classify images, translate between languages and compete with humans in games. These achievements have been enabled by the large and expensive computational resources that are now available for training and running such networks. However, such a computational burden is highly undesirable in some settings. In this thesis we demonstrate how the computational expense of a machine learning algorithm may be reduced. This is possible because, until recently, most research in deep learning has focused on achieving better statistical results on benchmarks, rather than targeting efficiency. However, the learning process is flexible enough for us to control the test-time computational expense that will be paid when the model is run in an application. To achieve this test-time cost sensitivity, a budget can be incorporated as part of the model. This budget expresses what costs we are willing to incur when we allocate resources at test time. Alternatively, we can prescribe the model size or computational resources we expect to have and use that to decide on the appropriate classification model. In either case, considering the resources available when building the model allows us to use it more effectively. In this thesis, we demonstrate methods to reduce the stored size, or the number of floating-point operations, of state-of-the-art classification models by an order of magnitude with little effect on their performance. Finally, we find that such compression can even be performed by simply changing the parameterisation of linear transforms used in the network. These results indicate that the design of learning systems can benefit from taking resource efficiency into account.
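As a rough illustration of how reparameterising a linear transform can reduce cost, the sketch below compares a dense transform with a low-rank factorisation of it. The abstract does not say which parameterisations the thesis uses, so the low-rank form is an assumption chosen for simplicity. The function names and the 512 x 512 layer size are hypothetical.

```python
# Illustrative sketch only: low-rank factorisation is an assumed example of
# "changing the parameterisation of linear transforms", not necessarily the
# method used in the thesis. All names and sizes here are hypothetical.

def dense_cost(d_in, d_out):
    """Parameters (and multiply-accumulates per input) of y = W x,
    where W has shape d_out x d_in."""
    return d_in * d_out


def low_rank_cost(d_in, d_out, rank):
    """Parameters (and multiply-accumulates per input) of y = U (V x),
    where V has shape rank x d_in and U has shape d_out x rank."""
    return rank * (d_in + d_out)


def max_rank_for_budget(d_in, d_out, budget):
    """Largest rank whose factorised cost fits within the budget
    (0 if even rank 1 does not fit)."""
    return budget // (d_in + d_out)


d_in, d_out = 512, 512
dense = dense_cost(d_in, d_out)

# Prescribe a budget of one tenth of the dense cost, then pick the rank to fit.
budget = dense // 10
rank = max_rank_for_budget(d_in, d_out, budget)
factored = low_rank_cost(d_in, d_out, rank)

print(f"dense: {dense}, rank: {rank}, factored: {factored}")
```

Here the budget is fixed first and the model size is derived from it, which mirrors the idea of prescribing resources up front. Whether accuracy survives such a reduction is an empirical question that this arithmetic does not address.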
