Simple Hardware-Efficient PCFGs with Independent Left and Right Productions

Wei Liu; Songlin Yang; Yoon Kim; Kewei Tu

Published: 07 Oct 2023, Last Modified: 01 Dec 2023

Venue: EMNLP 2023 Findings

Submission Type: Regular Short Paper

Submission Track: Syntax, Parsing and their Applications

Submission Track 2: Machine Learning for NLP

Keywords: grammar induction, unsupervised parsing, latent variable models

TL;DR: A simplistic PCFG formalism built upon a stronger independence assumption exhibits remarkable performance

Abstract: Scaling dense PCFGs to thousands of nonterminals via low-rank parameterizations of the rule probability tensor has been shown to be beneficial for unsupervised parsing. However, PCFGs scaled this way still perform poorly as language models, and even underperform similarly-sized HMMs. This work introduces SimplePCFG, a simple PCFG formalism with independent left and right productions. Despite imposing a stronger independence assumption than the low-rank approach, we find that this formalism scales more effectively both as a language model and as an unsupervised parser. We further introduce FlashInside, a hardware IO-aware implementation of the inside algorithm for efficiently scaling simple PCFGs. Through extensive experiments on multiple grammar induction benchmarks, we validate the effectiveness of simple PCFGs over low-rank baselines.
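
To illustrate why independent left and right productions can make the inside algorithm cheaper, here is a minimal sketch under one reading of the abstract: the score of a binary rule A -> B C is assumed to factor as left[A][B] * right[A][C]. The function names, the single shared symbol set, and the precomputed leaf scores are all hypothetical simplifications introduced for this example. This is not the FlashInside implementation, and it works in plain probability space with no numerical-stability handling.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def inside_factorized(leaf, left, right, root):
    """Inside score of a sentence under the assumed factorized rule scores.

    leaf[i][A]  : score of symbol A emitting word i (assumed precomputed)
    left[A][B]  : score of B as the left child of A
    right[A][C] : score of C as the right child of A
    root[A]     : score of A as the root symbol
    """
    n, num_symbols = len(leaf), len(root)
    beta, lproj, rproj = {}, {}, {}
    for width in range(1, n + 1):
        for i in range(n - width + 1):
            j = i + width
            if width == 1:
                beta[i, j] = list(leaf[i])
            else:
                # The sums over B and C were already folded into lproj and
                # rproj, so only the split point k is summed over here.
                beta[i, j] = [
                    sum(lproj[i, k][a] * rproj[k, j][a] for k in range(i + 1, j))
                    for a in range(num_symbols)
                ]
            # Project each finished span once and reuse it for larger spans.
            lproj[i, j] = matvec(left, beta[i, j])
            rproj[i, j] = matvec(right, beta[i, j])
    return sum(r * b for r, b in zip(root, beta[0, n]))


def inside_dense(leaf, left, right, root):
    """Reference version that sums over every (B, C) pair explicitly."""
    n, num_symbols = len(leaf), len(root)
    symbols = range(num_symbols)
    beta = {}
    for width in range(1, n + 1):
        for i in range(n - width + 1):
            j = i + width
            if width == 1:
                beta[i, j] = list(leaf[i])
            else:
                beta[i, j] = [
                    sum(
                        left[a][b] * right[a][c] * beta[i, k][b] * beta[k, j][c]
                        for k in range(i + 1, j)
                        for b in symbols
                        for c in symbols
                    )
                    for a in symbols
                ]
    return sum(r * b for r, b in zip(root, beta[0, n]))
```

Under this assumed factorization, the double sum over child symbols separates into two independent matrix-vector products per span, so the per-split work is linear in the number of symbols rather than cubic. The dense function is included only as a correctness reference for the factorized one.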

Submission Number: 5824
