Keywords: Topic Models, ML for Social Science, Robustness
TL;DR: Environment-adjusted topic models (EATMs) are designed to uncover consistent topics across varying environments.
Abstract: Probabilistic topic models are a powerful tool for extracting latent themes from large text datasets.
However, when applied to data from diverse sources or environments, topic models can fail to capture consistent themes across different sources.
Recognizing this limitation, we propose environment-adjusted topic models (EATMs), designed to uncover consistent topics across varying environments.
EATMs are unsupervised probabilistic models that analyze text from multiple environments and can separate universal and environment-specific terms to learn consistent topics.
Through extensive experimentation on a variety of political content, from ads to tweets and speeches, we show that EATMs produce interpretable global topics and separate environment-specific words.
Importantly, EATMs retain higher performance on out-of-distribution data compared to strong baselines.
Primary Subject Area: Impact of data bias, variance, and drifts
Paper Type: Research paper: up to 8 pages
DMLR For Good Track: Participate in DMLR for Good Track
Participation Mode: Virtual
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Submission Number: 16