Efficient Weighted Deduction Systems for Earley’s Algorithm

Anonymous

16 Jan 2022 (modified: 05 May 2023) | ACL ARR 2022 January Blind Submission | Readers: Everyone

Abstract: The parsing algorithm of Earley (1970), as originally presented, has a runtime complexity of $\mathcal{O}(N^3\lvert\mathcal{G}\rvert\lvert\mathcal{R}\rvert)$, where $N$ is the length of the sentence, $\lvert\mathcal{G}\rvert$ is the size of the grammar, and $\lvert\mathcal{R}\rvert$ is the number of productions in the grammar. This is unworkable for the large grammars that arise in natural language processing. Fortunately, the dynamic programming algorithm can be improved to run in time $\mathcal{O}(N^3\lvert\mathcal{G}\rvert)$, matching the complexity of running CKY on a binarized version of $\mathcal{G}$. Some of the necessary speed-ups have been presented, in part or in full, in various places in the literature. However, there has been no unified, formal treatment that is written as a deduction system or that covers the weighted case. We present such a treatment in terms of five proof rules that can be used in weighted deduction and that refine Earley's Predict, Scan, and Complete actions. We also provide a generalization of Earley's algorithm that uses a finite-state automaton to represent the grammar. Its runtime is proportional to the size of the automaton (together with the usual $\mathcal{O}(N^3)$ factor), or more precisely to the size of the portion of the automaton that is reached while parsing the input sentence. Further speed-ups can then be achieved by minimizing the automaton so that similar productions share transitions.
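For orientation, the three classic actions named in the abstract can be sketched as a small unweighted recognizer. This is a textbook-style illustration only, not the five-rule weighted deduction system the paper proposes, and it includes none of the speed-ups. The names `Item` and `earley_recognize`, the dictionary encoding of the grammar, and the restriction to grammars without empty right-hand sides are all assumptions made for this sketch.

```python
from collections import namedtuple

# Hypothetical item type: the rule head -> body with a dot position,
# plus the column where the item started. Its end is the column holding it.
Item = namedtuple("Item", "head body dot start")


def earley_recognize(grammar, start_symbol, tokens):
    """Return True if `tokens` can be derived from `start_symbol`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of `grammar` is treated as a terminal.
    Assumption: no right-hand side is empty.
    """
    n = len(tokens)
    columns = [set() for _ in range(n + 1)]
    for body in grammar[start_symbol]:
        columns[0].add(Item(start_symbol, body, 0, 0))

    for k in range(n + 1):
        agenda = list(columns[k])
        while agenda:
            item = agenda.pop()
            if item.dot < len(item.body):
                nxt = item.body[item.dot]
                if nxt in grammar:
                    # Predict: start every rule for the expected nonterminal.
                    for body in grammar[nxt]:
                        new = Item(nxt, body, 0, k)
                        if new not in columns[k]:
                            columns[k].add(new)
                            agenda.append(new)
                elif k < n and tokens[k] == nxt:
                    # Scan: consume the matching terminal, move to column k + 1.
                    columns[k + 1].add(item._replace(dot=item.dot + 1))
            else:
                # Complete: advance every item that was waiting for this head.
                for parent in list(columns[item.start]):
                    if (parent.dot < len(parent.body)
                            and parent.body[parent.dot] == item.head):
                        new = parent._replace(dot=parent.dot + 1)
                        if new not in columns[k]:
                            columns[k].add(new)
                            agenda.append(new)

    return any(
        it.head == start_symbol and it.start == 0 and it.dot == len(it.body)
        for it in columns[n]
    )


# Toy grammar (an assumption for illustration): S -> S + S | a
toy_grammar = {"S": [("S", "+", "S"), ("a",)]}
accepted = earley_recognize(toy_grammar, "S", ["a", "+", "a"])
rejected = earley_recognize(toy_grammar, "S", ["a", "+"])
```

In this sketch the Complete step loops over a whole earlier column for each completed item. That naive pairing is one of the places where an untuned Earley implementation loses time, which is the kind of cost the abstract's refined rules are meant to reduce.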

Paper Type: long
