A Corpus-Based Approach to Automatic Compound Extraction

Keh-Yih Su, Ming-Wen Wu, Jing-Shin Chang

1994 (modified: 16 Jul 2019), ACL 1994

Abstract: An automatic compound retrieval method is proposed to extract compounds from text. It uses n-gram mutual information, relative frequency count, and parts of speech as features for compound extraction. The task is modeled as a two-class classification problem based on the distributional characteristics of n-gram tokens in the compound and non-compound clusters. On a testing corpus of 49,314 words, the proposed approach achieves recall and precision of 96.2% and 48.2% for bigram compounds, and 96.6% and 39.6% for trigram compounds. A significant reduction in processing time has been observed.
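
To make two of the features named in the abstract concrete, the following is a minimal Python sketch of bigram mutual information and relative frequency, plus the recall and precision measures used in the evaluation. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the paper's exact n-gram mutual information formula, so the standard pointwise definition log2(P(x,y) / (P(x)P(y))) is assumed here, and the function names `bigram_features` and `precision_recall` are hypothetical. The part-of-speech feature and the two-class classifier are not shown.

```python
import math
from collections import Counter


def bigram_features(tokens):
    """Return {bigram: (mutual_information, relative_frequency)}.

    Assumption: mutual information is the standard pointwise form,
    log2(P(x, y) / (P(x) * P(y))), with probabilities estimated by
    simple maximum-likelihood counts over the token list.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of bigram positions

    features = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi          # relative frequency of the bigram
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        mi = math.log2(p_xy / (p_x * p_y))
        features[(x, y)] = (mi, p_xy)
    return features


def precision_recall(extracted, gold):
    """Precision and recall of extracted candidates against a gold set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    toy_tokens = "hard disk drive and hard disk space".split()
    for bigram, (mi, rel_freq) in bigram_features(toy_tokens).items():
        print(bigram, round(mi, 3), round(rel_freq, 3))
```

A high-recall, low-precision outcome like the one reported corresponds to an extractor that returns nearly every true compound along with many non-compound candidates; for example, `precision_recall(["a", "b", "c", "d"], ["a", "b"])` gives a precision of 0.5 and a recall of 1.0.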
