Abstract: In order to obtain a fine-grained evaluation of parser accuracy over naturally occurring text, we study 100 examples each of ten reasonably frequent linguistic phenomena, randomly selected from a parsed version of the English Wikipedia. We construct a corresponding set of gold-standard target dependencies for these 1000 sentences, operationalize mappings to these targets from seven state-of-the-art parsers, and evaluate the parsers against this data to measure their level of success in identifying these dependencies.
0 Replies
Loading