Visual Topics via Visual Vocabularies

Shreya Havaldar; Weiqiu You; Lyle Ungar; Eric Wong

Published: 27 Oct 2023, Last Modified: 10 Nov 2023. Venue: NeurIPS XAIA 2023

TL;DR: We develop visual vocabularies as an interface between image datasets and topic modeling algorithms.

Abstract: Researchers have long used topic modeling to automatically characterize and summarize text documents without supervision. Can we extract similar structures from collections of images? To answer this, we propose visual vocabularies, a method for analyzing image datasets by decomposing images into segments and grouping similar segments into visual "words". These vocabularies of visual "words" enable us to extract visual topics that capture hidden themes distinct from what is captured by classic unsupervised approaches. We evaluate our visual topics using standard topic modeling metrics and confirm their coherence via a human study.

Submission Track: Full Paper Track

Application Domain: Computer Vision

Survey Question 1: We develop visual vocabularies as an interface between image datasets and topic modeling algorithms. Our resulting visual topics allow us to explain relationships in datasets that cannot be captured by traditional similarity-based clusters. Visual topics capture themes grounded in relatedness and are thus a complementary addition to classic unsupervised explanation techniques.

Survey Question 2: Topic modeling is a valuable explanation tool in NLP: words like "government", "president", and "America", though not similar in meaning, are related and co-occur in documents about politics. Clustering based on a similarity metric would not capture such relationships between words, but topic modeling does. We were inspired by the efficacy of topic modeling in NLP and wanted to create parallel visual topics that capture relationships in image datasets.

Survey Question 3: We use segmentation and k-means clustering to construct a visual vocabulary and visual documents, and Latent Dirichlet Allocation to create visual topics.
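
The pipeline described in Survey Question 3 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each image has already been segmented and each segment embedded as a feature vector, and it substitutes random synthetic features for real segment embeddings. The helper names (`build_visual_documents`, `fit_visual_topics`) and all sizes are hypothetical. It uses scikit-learn's `KMeans` and `LatentDirichletAllocation`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation


def build_visual_documents(segment_features, image_ids, n_words, seed=0):
    """Cluster segment features into visual 'words' and count them per image.

    Hypothetical helper: each k-means cluster index is treated as one visual
    word, and each image becomes a bag-of-visual-words count vector.
    """
    kmeans = KMeans(n_clusters=n_words, n_init=10, random_state=seed)
    word_ids = kmeans.fit_predict(segment_features)

    n_images = int(image_ids.max()) + 1
    counts = np.zeros((n_images, n_words), dtype=np.int64)
    # Increment counts[image, word] once for every segment.
    np.add.at(counts, (image_ids, word_ids), 1)
    return counts, kmeans


def fit_visual_topics(counts, n_topics, seed=0):
    """Fit LDA on the image-by-visual-word count matrix (hypothetical helper)."""
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    doc_topics = lda.fit_transform(counts)  # per-image topic proportions
    return doc_topics, lda.components_      # components_: topic-by-word weights


# Synthetic stand-in for segment embeddings (assumption: 8 segments per image).
rng = np.random.default_rng(0)
n_images, segments_per_image, feature_dim = 20, 8, 16
segment_features = rng.normal(size=(n_images * segments_per_image, feature_dim))
image_ids = np.repeat(np.arange(n_images), segments_per_image)

counts, _ = build_visual_documents(segment_features, image_ids, n_words=10)
doc_topics, topic_words = fit_visual_topics(counts, n_topics=3)
```

Here each row of `counts` plays the role of a visual document, and each row of `topic_words` weights the visual words within one visual topic. With real data, the synthetic features would be replaced by embeddings of actual image segments, and the vocabulary size and number of topics would need tuning.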

Submission Number: 68
