- Abstract: A distinct commonality between HMMs and RNNs is that they both learn hidden representations for sequential data. In addition, it has been noted that the backward computation of the Baum-Welch algorithm for HMMs is a special case of the back-propagation algorithm used for neural networks (Eisner (2016)). Do these observations suggest that, despite their many apparent differences, HMMs are a special case of RNNs? In this paper, we show that that is indeed the case, and investigate a series of architectural transformations between HMMs and RNNs, both through theoretical derivations and empirical hybridization. In particular, we investigate three key design factors—independence assumptions between the hidden states and the observation, the placement of softmaxes, and the use of non-linearities—in order to pin down their empirical effects. We present a comprehensive empirical study to provide insights into the interplay between expressivity and interpretability in this model family with respect to language modeling and parts-of-speech induction.
- TL;DR: Are HMMs a special case of RNNs? We investigate a series of architectural transformations between HMMs and RNNs, both through theoretical derivations and empirical hybridization and provide new insights.
- Keywords: rnns, hmms, latent variable models, language modelling, interpretability, sequence modelling