Distributional Representation Clusters Complement Part-of-Speech Tags

Anonymous

22 May 2018 (modified: 22 May 2018). OpenReview Anonymous Preprint Blind Submission. Readers: Everyone
Abstract: Many works have successfully co-opted word clusters derived from distributional information, such as Brown clusters, as features in language processing tasks. We note that such clusters are not merely poor proxies for part-of-speech tags; they are in fact quite different from them. This paper investigates the gap between Brown clusters, clusterings in word embedding space, and part-of-speech tags across a range of languages. We find that, while word types clustered together may seem cohesive at a glance, distributionally derived clusters in fact strongly complement part-of-speech tags across many languages, suggesting a surprising amount of difference between the information contained in these representations.