- Abstract: A distinct commonality between HMMs and RNNs is that they both learn hidden representations for sequential data. In addition, it has been noted that the backward computation of the Baum-Welch algorithm for HMMs is a special case of the backpropagation algorithm used for neural networks (Eisner, 2016). Do these observations suggest that, despite their many apparent differences, HMMs are a special case of RNNs? In this paper, we investigate a series of architectural transformations between HMMs and RNNs, both through theoretical derivations and empirical hybridization, to answer this question. In particular, we investigate three key design factors in order to pin down their empirical effects: independence assumptions between hidden states and observations, the placement of the softmax, and the use of non-linearity. We present a comprehensive empirical study to provide insights into the interplay between expressivity and interpretability with respect to language modeling and part-of-speech induction.
- Keywords: rnns, hmms, latent variable models, language modeling, interpretability, sequence modeling
- TL;DR: Are HMMs a special case of RNNs? We investigate a series of architectural transformations between HMMs and RNNs, both through theoretical derivations and empirical hybridization, and provide new insights.
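
The abstract's central observation, that an HMM's forward computation can be read as a recurrent update over a normalized hidden state, can be illustrated with a short numpy sketch. This is a minimal illustration under assumed names (`T`, `E`, `pi`, `hmm_cell`, `log_likelihood`), not the paper's implementation: one forward-algorithm step is written as an RNN-like cell whose "hidden state" is the filtering distribution over HMM states and whose non-linearity is renormalization.

```python
import numpy as np

# Illustrative sketch (not the paper's code): the HMM forward recurrence
# written as an RNN-style cell. All names below are assumptions for exposition.

K, V = 4, 10                      # number of hidden states, vocabulary size
rng = np.random.default_rng(0)

# Row-stochastic transition matrix: T[i, j] = P(z_t = j | z_{t-1} = i)
T = rng.dirichlet(np.ones(K), size=K)
# Emission matrix: E[j, w] = P(x_t = w | z_t = j)
E = rng.dirichlet(np.ones(V), size=K)
pi = rng.dirichlet(np.ones(K))    # initial state distribution

def hmm_cell(alpha_prev, x_t):
    """One step of the normalized forward algorithm viewed as a recurrent update:
    predict with the transition matrix, weight by the emission probability of the
    observed symbol, then renormalize (the 'non-linearity')."""
    alpha = (alpha_prev @ T) * E[:, x_t]
    Z = alpha.sum()               # per-step normalizer P(x_t | x_{<t})
    return alpha / Z, Z

def log_likelihood(x):
    """log P(x_1..x_n), accumulated from the per-step normalizers."""
    alpha = pi * E[:, x[0]]
    logp = np.log(alpha.sum())
    alpha /= alpha.sum()
    for x_t in x[1:]:
        alpha, Z = hmm_cell(alpha, x_t)
        logp += np.log(Z)
    return logp

print(log_likelihood(rng.integers(V, size=7)))
```

Written this way, the correspondence the abstract alludes to is visible directly: the cell is a linear map followed by an elementwise reweighting and a normalization, so the design factors studied in the paper (where the softmax/normalization sits, whether a non-linearity replaces it, and which independence assumptions hold) amount to local changes to this update rule.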