Abstract: Claude Shannon coined entropy to quantify the uncertainty of a random distribution for communication coding theory in 1948 (Shannon, 1948). Since then, entropy has been widely used and often treated as equivalent to information, and Shannon information theory has achieved enormous success across science and engineering. However, we observe that entropy's nature as an uncertainty measure also limits its direct use in mathematical modeling, such as prediction tasks: in the literature, entropy has served mainly as a measure of information gain or loss and has rarely been employed directly for modeling and prediction. We observe that a quantity measuring the certainty of a random distribution provides the directly usable information that such modeling requires. We therefore propose a new information concept, troenpy, as the canonical dual of entropy, to quantify the certainty of the underlying distribution. We establish the necessary concepts and properties for this new Information Theory of Certainty (ITC), an analogue of classical Shannon information theory. We demonstrate two important applications of troenpy in machine learning. First, for the classical supervised document classification task, we develop a troenpy-based weighting scheme that leverages the document class-label distribution and show that it can be used easily and very effectively for classification. Second, for the popular self-supervised language modeling task, we introduce a self-troenpy weighting scheme for sequential data and show that it can be incorporated directly into modern recurrent neural network based language models, achieving dramatic perplexity reduction. Beyond machine learning, ITC also has potential applications in quantum information theory: we generalize the idea and define quantum troenpy as the dual of the von Neumann entropy to quantify the certainty of quantum systems. In conclusion, we develop a new information theory that quantifies certainty in random systems and show that this information can be used easily, directly, and effectively in modern machine learning and neural network models. With this theory-supported, cheap, and effective way of extracting and representing useful information from data, ITC offers not only promising ways to leverage useful information to improve the performance of current machine learning models, but also possibilities for designing new machine learning and neural network models from the viewpoint of useful-information processing.
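The abstract does not reproduce the definition of troenpy, so the sketch below uses a simple stand-in for a certainty measure: the gap log K - H(p) between the maximum and actual entropy of a K-outcome distribution, which is zero for a uniform distribution and maximal for a point mass. The function names (certainty, class_certainty_weights) and the toy corpus are hypothetical, and the weighting shown is only an illustration of the supervised class-label-distribution idea described above, not the paper's actual scheme.

```python
import math
from collections import Counter, defaultdict

def entropy(probs):
    """Shannon entropy H(p) = -sum_i p_i log p_i (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def certainty(probs):
    """Illustrative certainty score: log K - H(p), with K = len(probs).

    Zero for a uniform distribution, maximal (log K) for a point mass.
    NOTE: this is an assumed stand-in; the paper's troenpy may be
    defined differently.
    """
    return math.log(len(probs)) - entropy(probs)

def class_certainty_weights(docs, labels):
    """Weight each term by the certainty of its class-label distribution.

    Terms concentrated in a single class receive high weight; terms
    spread evenly across classes receive weight near zero.
    """
    classes = sorted(set(labels))
    term_class_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        for term in set(doc.split()):
            term_class_counts[term][label] += 1
    weights = {}
    for term, counts in term_class_counts.items():
        total = sum(counts.values())
        # Probabilities over all classes; entropy() skips zero entries.
        probs = [counts[c] / total for c in classes]
        weights[term] = certainty(probs)
    return weights

# Toy usage: "ball" occurs only under "sports", so it gets the maximal
# weight log 2; "the" occurs evenly in both classes, so its weight is 0.
docs = ["the ball game", "the match score", "the stock market", "the bond price"]
labels = ["sports", "sports", "finance", "finance"]
print(class_certainty_weights(docs, labels))
```

Under this stand-in, the certainty weight plays a role loosely analogous to IDF in TF-IDF: it down-weights terms whose class-label distribution carries no discriminative signal and up-weights terms concentrated in one class.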