Abstract: A central task of computational linguistics is to decide if two pieces of texts have similar meanings. Ideally, this depends on an intuitive notion of semantic distance. While this semantic distance is most likely undefinable and uncomputable, in practice it is approximated heuristically, consciously or unconsciously. In this paper, we present a theory, and its implementation, to approximate the elusive semantic distance by the well-defined information distance. It is mathematically proven that any computable approximation of the intuitive concept of semantic distance is "covered" by our theory. We have implemented our theory to question answering (QA) and performed experiments based on data extracted from over 35 million question-answer pairs. Experiments demonstrate that our initial implementation of the theory produces convincingly fewer errors in classification compared to other academic models and commercial systems.
0 Replies
Loading