Abstract: Unsupervised constituency parsing has been explored extensively but is still far from solved, as current mainstream unsupervised constituency parsers capture only the unlabeled structure of sentences. Properties of constituent substitution make it possible to detect constituents of a particular label. We propose an unsupervised, training-free labeling procedure that leverages a newly introduced metric, Neighboring Distribution Divergence (NDD), which evaluates the semantic change caused by an edit. We extend NDD to Dual POS-NDD (DP-NDD) and build templates, called "molds", to extract labeled constituents from sentences. We show that DP-NDD labels constituents precisely and induces more accurate unlabeled constituency trees than all previous unsupervised methods. With two frameworks for labeled constituency tree inference, we set new state-of-the-art results for both unlabeled F1 and labeled F1. Further studies show that our approach can be extended to other span labeling tasks such as named entity recognition.
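The abstract does not spell out how NDD is computed; as a rough, hypothetical sketch of the general idea (a masked language model scores how much an edit perturbs the predicted distributions of neighboring tokens), the snippet below aggregates per-token KL divergences between the pre- and post-edit distributions. The model choice, the averaging scheme, and the helper names (`token_distributions`, `ndd_score`) are assumptions for illustration only, not the paper's exact formulation.

```python
# Hypothetical NDD-style score: mask each neighboring position, compare the
# masked-LM distributions before vs. after an edit, and average the KL terms.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
model.eval()


def token_distributions(words, positions):
    """Log-probability distribution predicted for each position, masked in turn."""
    dists = []
    for pos in positions:
        masked = list(words)
        masked[pos] = tokenizer.mask_token
        inputs = tokenizer(" ".join(masked), return_tensors="pt")
        mask_idx = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
        with torch.no_grad():
            logits = model(**inputs).logits[0, mask_idx]
        dists.append(torch.log_softmax(logits, dim=-1))
    return dists


def ndd_score(words, edited_words, positions, edited_positions):
    """Average KL divergence over neighboring tokens shared by both sentences."""
    p = token_distributions(words, positions)
    q = token_distributions(edited_words, edited_positions)
    kl = [torch.sum(lp.exp() * (lp - lq)).item() for lp, lq in zip(p, q)]
    return sum(kl) / len(kl)


# Toy example: deleting "quick brown". A low score would suggest the edit
# barely perturbs the neighbors' distributions, hinting the span is substitutable.
original = "The quick brown fox jumps over the lazy dog".split()
edited = "The fox jumps over the lazy dog".split()
# Compare distributions at "The" and "fox" (positions valid in both sentences).
print(ndd_score(original, edited, positions=[0, 3], edited_positions=[0, 1]))
```

In this sketch, a constituency-detection use would compare the scores of candidate edits (e.g., deleting or substituting spans) and prefer spans whose removal least disturbs their neighborhood; the paper's "molds" presumably formalize which edits and labels to test.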
Paper Type: long