Consistent Bidirectional Language Modelling: Expressive Power and Representational Conciseness

Georgi Shopov, Stefan Gerdjikov

Published: 01 Jan 2024, Last Modified: 26 Nov 2024EMNLP 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: The inability to utilise future contexts and the pre-determined left-to-right generation order are major limitations of unidirectional language models. Bidirectionality has been introduced to address those deficiencies. However, a crucial shortcoming of bidirectional language models is the potential inconsistency of their conditional distributions. This fundamental flaw greatly diminishes their applicability and hinders their capability of tractable sampling and likelihood computation. In this work, we introduce a class of bidirectional language models, called latent language models, that are consistent by definition and can be efficiently used both for generation and scoring of sequences. We define latent language models based on the well-understood formalism of bisequential decompositions from automata theory. This formal correspondence allows us to precisely charaterise the abilities and limitations of a subclass of latent language models, called rational language models. As a result, we obtain that latent language models are exponentially more concise and significantly more expressive than unidirectional language models.