Abstract: Online resources, such as Wiktionary, provide
an accurate but incomplete source of idiomatic
phrases. In this paper, we study the problem
of automatically identifying idiomatic dictionary
entries with such resources. We train
an idiom classifier on a newly gathered corpus
of over 60,000 Wiktionary multi-word
definitions, incorporating features that model
whether phrase meanings are constructed
compositionally. Experiments demonstrate
that the learned classifier can provide high
quality idiom labels, more than doubling the
number of idiomatic entries from 7,764 to
18,155 at precision levels of over 65%. These
gains also translate to idiom detection in sentences,
by simply using known word sense
disambiguation algorithms to match phrases
to their definitions. In a set of Wiktionary definition
example sentences, the more complete
set of idioms boosts detection recall by over
28 percentage points.