Abstract: This paper introduces a new technique for phrase-structure parser analysis, categorizing possible treebank structures by integrating regular expressions into derivation trees. We analyze the performance of the Berkeley parser on OntoNotes WSJ and the English Web Treebank. This provides some insight into the evalb scores, and the problem of domain adaptation with the web data. We also analyze a “test-ontrain” dataset, showing a wide variance in how the parser is generalizing from different structures in the training material.
0 Replies
Loading