Keywords: metacognition, reliability
TL;DR: Information about whether a neural network's output will be correct or incorrect is somewhat present in the outputs of the network's intermediate layers.
Abstract: We show that information about whether a neural network's output will be correct or incorrect is present in the outputs of the network's intermediate layers. To demonstrate this effect, we train a new "meta" network to predict, from either the final output of the underlying "base" network or the output of one of the base network's intermediate layers, whether the base network will be correct or incorrect on a particular input. We find that, over a wide range of tasks and base networks, the meta network can achieve accuracies ranging from 65% to 85% in making this determination.
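The approach described in the abstract can be sketched as a simple probe: collect a base network's intermediate-layer activations, label each input by whether the base network's final prediction was correct, and train a small "meta" classifier on those pairs. The sketch below uses synthetic activations and a logistic-regression probe as a stand-in for the meta network; all names, sizes, and data are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: pretend we recorded a base network's intermediate-layer
# activations for 1000 inputs, plus a 0/1 label marking whether the base
# network's final answer on each input was correct. Everything here is a
# synthetic placeholder standing in for real recorded activations.
n, d = 1000, 32
activations = rng.normal(size=(n, d))
w_signal = rng.normal(size=d)
correct = (activations @ w_signal + 0.5 * rng.normal(size=n) > 0).astype(float)

# "Meta" network: here just a logistic-regression probe, trained by plain
# gradient descent to predict P(base network is correct) from activations.
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(activations @ w + b)))  # predicted P(correct)
    w -= lr * (activations.T @ (p - correct)) / n
    b -= lr * np.mean(p - correct)

acc = np.mean(((activations @ w + b) > 0) == correct.astype(bool))
print(f"meta-probe training accuracy: {acc:.2f}")
```

In practice the meta network would be evaluated on held-out inputs, and its probe accuracy compared across layers of the base network; the synthetic data above is constructed so a linear probe succeeds, which real activations do not guarantee.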