Automatic idiom identification in Wiktionary

Published: 18 Oct 2013, Last Modified: 14 Jan 2026, OpenReview Archive Direct Upload, Readers: Everyone, License: CC BY 4.0
Abstract: Online resources, such as Wiktionary, provide an accurate but incomplete source of idiomatic phrases. In this paper, we study the problem of automatically identifying idiomatic dictionary entries with such resources. We train an idiom classifier on a newly gathered corpus of over 60,000 Wiktionary multi-word definitions, incorporating features that model whether phrase meanings are constructed compositionally. Experiments demonstrate that the learned classifier can provide high quality idiom labels, more than doubling the number of idiomatic entries from 7,764 to 18,155 at precision levels of over 65%. These gains also translate to idiom detection in sentences, by simply using known word sense disambiguation algorithms to match phrases to their definitions. In a set of Wiktionary definition example sentences, the more complete set of idioms boosts detection recall by over 28 percentage points.
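To make the idea of a compositionality feature concrete, the following is a minimal sketch and not the paper's feature set. It assumes one plausible signal: the share of content words in a phrase's definition that also appear in the definitions of its constituent words. The function names, the stopword list, and the toy definitions are all hypothetical and are not taken from Wiktionary or from the paper.

```python
import re

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "or", "and", "is", "that", "with"}


def content_words(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def definition_overlap(phrase_definition, word_definitions):
    """Hypothetical compositionality feature.

    Returns the fraction of content words in the phrase definition that
    also occur in the definitions of the phrase's constituent words.
    A low value hints that the meaning is not built from the parts.
    """
    phrase_words = content_words(phrase_definition)
    if not phrase_words:
        return 0.0
    constituent_words = set()
    for definition in word_definitions:
        constituent_words |= content_words(definition)
    return len(phrase_words & constituent_words) / len(phrase_words)


# Made-up definitions for a literal phrase ("school bus").
literal_score = definition_overlap(
    "a bus that carries children to school",
    ["a large motor vehicle that carries passengers",
     "a place where children are taught"],
)

# Made-up definitions for an idiom ("kick the bucket").
idiom_score = definition_overlap(
    "to die",
    ["to strike with the foot", "a container for carrying liquid"],
)
```

In this toy setup the literal phrase shares some definition vocabulary with its parts while the idiom shares none, so a classifier could use such a score as one input among many. A real system would need proper tokenization, lemmatization, and far richer features than this single overlap ratio.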