Keywords: sentence embeddings, nearest neighbors, semantic similarity
TL;DR: We propose nearest neighbor overlap, a procedure which quantifies similarity between embedders in a task-agnostic manner, and use it to compare 21 sentence embedders.
Abstract: As distributed approaches to natural language semantics have developed and diversified, embedders for linguistic units larger than words (e.g., sentences) have come to play an increasingly important role. To date, such embedders have been evaluated using benchmark tasks (e.g., GLUE) and linguistic probes. We propose a comparative approach, nearest neighbor overlap (N2O), that quantifies similarity between embedders in a task-agnostic manner. N2O requires only a collection of examples and is simple to understand: two embedders are more similar if, for the same set of inputs, there is greater overlap between the inputs' nearest neighbors. We use N2O to compare 21 sentence embedders and show the effects of different design choices and architectures.
Data: [GLUE](https://paperswithcode.com/dataset/glue), [MultiNLI](https://paperswithcode.com/dataset/multinli), [SNLI](https://paperswithcode.com/dataset/snli)
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 4 code implementations](https://www.catalyzex.com/paper/arxiv:1909.10724/code)
Original Pdf: pdf
6 Replies
Loading