Abstract: It is typically understood that training a modern neural network is a process of fitting a probability distribution over the expected/desired outputs. However, recent paradoxical observations in a number of language generation tasks raise the question of whether this canonical probability-based explanation can really account for the empirical success of deep learning.
To resolve this issue, we propose an alternative *value-based explanation* of the standard supervised learning procedure in deep learning. The basic idea is to interpret the learned neural network not as a probability model but as a kind of *action-value function* (also called a Q-function), and to interpret the training of the neural network as a *value learning* process. In particular, we show that for all neural networks with softmax outputs, the learning dynamics of maximum likelihood estimation (MLE) can be seen as an iterative process that optimizes the neural network toward an optimal Q-function. This value-based interpretation explains several otherwise-paradoxical observations about the neural networks thus trained. Moreover, our value-based theory also entails an equation that transforms the learned Q-values back into a new kind of probability estimate, with which probability-compatible decision rules enjoy dramatic (double-digit) performance improvements.
Together, these pieces of evidence reveal a phenomenon of *value-probability duality* in terms of what modern neural networks are (truly) modeling: we thought they were one thing (probabilities) until the unexplainable showed up; changing mindset and treating them as another thing (action values) largely reconciles the theory, despite remaining subtleties regarding their original (probabilistic) identity.
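As a rough illustration of the interpretation sketched above (not the paper's actual construction or its Q-to-probability equation), one can read the pre-softmax logits of an MLE-trained classifier as action values Q(x, a) and decide by taking the argmax over them, rather than by reading the softmax output as a probability distribution. The toy model, data, and greedy decision rule below are illustrative assumptions.

```python
# Illustrative sketch (not the paper's method): train a softmax classifier by
# maximum likelihood (cross-entropy), then read its pre-softmax logits as
# action values Q(x, a) and act greedily on them.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: 2-D inputs, 3 classes (stand-ins for "actions").
X = torch.randn(512, 2)
y = (X[:, 0] + 2 * X[:, 1] > 0).long() + (X[:, 0] > 1).long()  # labels in {0, 1, 2}

model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 3))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()  # MLE objective for a softmax output

for step in range(200):
    opt.zero_grad()
    logits = model(X)          # probabilistic view: unnormalized log-probabilities
    loss = loss_fn(logits, y)  # negative log-likelihood
    loss.backward()
    opt.step()

with torch.no_grad():
    q_values = model(X[:5])                  # value view: one Q-value per action
    probs = torch.softmax(q_values, dim=-1)  # probabilistic view of the same numbers
    greedy_action = q_values.argmax(dim=-1)  # value-compatible decision rule
    print(q_values, probs, greedy_action, sep="\n")
```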