Explain-and-Test: An Interactive Machine Learning Framework for Exploring Text Embeddings

Published: 01 Jan 2023 · Last Modified: 15 Oct 2025 · IEEE VIS (Short Papers) 2023 · CC BY-SA 4.0
Abstract: Text embeddings, mappings of collections of text to points in high-dimensional space, are a common object of analysis. A classic method for visualizing these embeddings is to create a nonlinear projection to two dimensions and look for clusters and other structures in the resulting map. Explaining why certain texts cluster together, however, can be difficult. In this paper, we introduce a human-in-the-loop framework for applying machine learning (ML) to this challenge. The framework has two stages: (1) explain, in which we use ML to produce a description of a pattern; and (2) test, in which the user can verify the explanation by entering new text that fits the pattern and seeing where it appears on the map. If the new text is mapped to the original cluster, that is evidence in favor of the ML-generated explanation. We illustrate this process with a visualization application that provides two kinds of explanations: Natural Language Explanations and Contrastive PhraseClouds. Scenarios exploring academic papers and literary works showcase the benefits of our workflow in discovering related topics and analyzing thematic differences in text.
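The test stage relies on placing new, user-entered text into the same two-dimensional map as the original corpus. Below is a minimal sketch of that step; the abstract does not specify which embedding model or projection method the application uses, so the sentence-transformers encoder and UMAP projection here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the "test" stage: embed a new text and place it on an
# existing 2D map of the corpus. The encoder and projection method below
# are assumptions for illustration, not the paper's implementation.
from sentence_transformers import SentenceTransformer
import umap

corpus = [
    "Graph neural networks for molecular property prediction.",
    "Message passing improves chemical reaction modeling.",
    "Transformer architectures for long-document summarization.",
    "Abstractive summarization of scientific articles.",
    "Visualizing high-dimensional embeddings with nonlinear projections.",
    "Interactive cluster analysis of text corpora.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed encoder
embeddings = model.encode(corpus)                 # (n_texts, dim) array

# Fit the nonlinear projection once on the original corpus.
reducer = umap.UMAP(n_components=2, n_neighbors=3, random_state=42)
corpus_2d = reducer.fit_transform(embeddings)

# "Test": the user types a new text that fits a hypothesized pattern;
# we embed it and project it into the SAME map with transform().
new_text = ["Predicting molecule toxicity with graph networks."]
new_2d = reducer.transform(model.encode(new_text))

print("New text lands at:", new_2d[0])
# If this point falls inside the original cluster, that is evidence in
# favor of the ML-generated explanation of why those texts grouped together.
```

Reusing the already-fitted projection (rather than refitting on corpus plus new text) keeps the original map fixed, so the position of the new point can be read as a direct check on the explanation.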