Keywords: Categorical Random Variables, Latent Random Variables, Probabilistic Methods
TL;DR: Motivated by information geometry, we propose catnat, a hierarchical alternative to softmax for parameterizing categorical random variables in ML. We demonstrate its effectiveness through both theoretical results and extensive empirical evidence.
Abstract: Latent categorical variables are frequently found in deep learning architectures. They can model actions in discrete reinforcement-learning environments, represent categories in latent-variable models, or express relations in graph neural networks. Despite their widespread use, their discrete nature poses significant challenges to gradient-descent learning algorithms. While a substantial body of work has offered improved gradient estimation techniques, we take a complementary approach. Specifically, we: 1) revisit the ubiquitous _softmax_ function and demonstrate its limitations from an information-geometric perspective; 2) replace the _softmax_ with the _catnat_ function, a function composed of a sequence of hierarchical binary splits; we prove that this choice offers significant advantages for gradient descent due to the resulting diagonal Fisher Information Matrix. A rich set of experiments — including graph structure learning, variational autoencoders, and reinforcement learning — empirically shows that the proposed function improves learning efficiency and yields models with consistently higher test performance. _Catnat_ is simple to implement and integrates seamlessly into existing codebases. Moreover, it remains compatible with standard training stabilization techniques and, as such, offers a better alternative to the _softmax_ function.
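To make the idea concrete, below is a minimal sketch of one way to parameterize a categorical distribution through a balanced binary tree of sigmoid splits, where each of the K-1 internal nodes divides its probability mass between two children. This is only an illustrative assumption based on the abstract's description of "hierarchical binary splits"; the paper's exact _catnat_ construction, node ordering, and handling of non-power-of-two K may differ.

```python
import torch

def hierarchical_split_probs(split_logits: torch.Tensor) -> torch.Tensor:
    """Map K-1 binary-split logits to K category probabilities via a
    balanced binary tree of sigmoid splits (K assumed to be a power of two).

    Hypothetical sketch, not the authors' reference implementation.
    """
    k = split_logits.shape[-1] + 1
    probs = split_logits.new_ones(split_logits.shape[:-1] + (1,))  # root holds all mass
    idx = 0  # index of the next internal node (level-order layout assumed)
    while probs.shape[-1] < k:
        n = probs.shape[-1]  # number of nodes at the current level
        p_right = torch.sigmoid(split_logits[..., idx:idx + n])
        # each node splits its mass between a left and a right child
        left, right = probs * (1.0 - p_right), probs * p_right
        probs = torch.stack((left, right), dim=-1).flatten(-2)
        idx += n
    return probs  # sums to 1 along the last dimension

# Usage: 3 split logits parameterize a distribution over 4 categories;
# zero logits give the uniform distribution [0.25, 0.25, 0.25, 0.25].
logits = torch.zeros(3, requires_grad=True)
print(hierarchical_split_probs(logits))
```

Because each category probability depends on a disjoint chain of independent sigmoid decisions, parameterizations of this kind can decouple the per-split parameters, which is consistent with the diagonal Fisher Information Matrix argument made in the abstract.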
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 19410