Keywords: ensemble learning, interpretability, loss function landscape, theoretical chemistry
Abstract: Minima of the loss function landscape (LFL) of a neural network are locally optimal sets of
weights that extract and process information from the input data to make outcome predictions.
In underparameterised networks, the capacity of the weights may be insufficient to fit all the relevant information.
We demonstrate that different local minima specialise in certain aspects of the learning problem and process the input
information differently. This effect can be exploited using a meta-network in
which the predictive power of multiple minima of the LFL is combined to produce a better
classifier. With this approach, we can increase the area under the receiver operating characteristic curve
(AUC) by around $20\%$ for a complex learning problem.
We propose a theoretical basis for combining minima and show how a meta-network can
be trained to select the representative minimum used to classify a
specific data item. Finally, we present an analysis of symmetry-equivalent
solutions to machine learning problems, which provides a systematic means to improve the
efficiency of this approach.
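To make the combination scheme concrete, the sketch below shows one plausible realisation of a meta-network over several frozen base classifiers, each standing in for a distinct LFL minimum. This is not the authors' implementation: the toy data, the layer sizes, and the use of a softmax gate (a soft relaxation of selecting a single representative minimum per data item) are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions throughout, not the paper's code):
# several frozen classifiers, each representing one LFL minimum, are combined
# by a gating meta-network that weights their predictions per data item.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_classifier(n_in=10, n_hidden=8):
    """One base classifier; each plays the role of a distinct LFL minimum."""
    return nn.Sequential(nn.Linear(n_in, n_hidden), nn.Tanh(),
                         nn.Linear(n_hidden, 1))

# Toy binary-classification data standing in for the real learning problem.
X = torch.randn(512, 10)
y = (X[:, 0] * X[:, 1] > 0).float().unsqueeze(1)

# Train each base classifier from a different random initialisation (so they
# land in different minima), then freeze its weights.
minima = [make_classifier() for _ in range(4)]
for clf in minima:
    opt = torch.optim.Adam(clf.parameters(), lr=1e-2)
    for _ in range(100):
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy_with_logits(clf(X), y)
        loss.backward()
        opt.step()
    for p in clf.parameters():
        p.requires_grad_(False)

# Meta-network: a softmax gate that decides, per data item, how much to
# trust each minimum's prediction.
gate = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, len(minima)))
opt = torch.optim.Adam(gate.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    logits = torch.cat([clf(X) for clf in minima], dim=1)   # (N, K)
    weights = torch.softmax(gate(X), dim=1)                 # (N, K)
    combined = (weights * logits).sum(dim=1, keepdim=True)  # (N, 1)
    loss = nn.functional.binary_cross_entropy_with_logits(combined, y)
    loss.backward()
    opt.step()
```

At inference time, taking the argmax of the gate output instead of the softmax average would select a single representative minimum per data item, matching the hard selection described in the abstract.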
One-sentence Summary: We provide evidence that different ensemble classifiers genuinely specialise in different parts of the input data and, based on this observation, propose a method for solving certain problems in the theoretical molecular sciences.