Abstract: The primary objective of this project is to develop a robust, high-performance parser for English by automatically extracting a grammar from an annotated corpus of bracketed sentences, called the Treebank. The project is a collaboration between the IBM Continuous Speech Recognition Group and the University of Pennsylvania Department of Computer Sciences. Our initial focus is the domain of computer manuals with a vocabulary of 3000 words. We use a Treebank that was developed jointly by IBM and the University of Lancaster, England, during the past three years.