Keywords: alignment, underspecification, fairness, bias, pluralistic alignment, persona conditioning, inclusivity
Abstract: When a user prompts a large language model (LLM) to plan, design, or decide on behalf of others, the user often cannot specify every relevant constraint. Needs that are materially relevant but neither stated by the user nor surfaced by the model, which we term $\textit{\textbf{latent needs}}$, represent a failure mode that existing approaches to bias and alignment are not designed to detect. We frame their discovery as a problem of $\textbf{\textit{second-order pluralistic alignment}}$, in which the model must account for the preferences of affected third parties whom the user has not enumerated and may not know about. Drawing on standpoint theory, we argue that LLMs reason from a standpoint that determines which preferences are visible. We introduce $\textbf{BlindSpot}$, a dataset of 1,830 scenarios with four prompt conditions that decompose failures into $\textit{aleatoric}$ gaps (failing to recognize that an affected group is relevant) and $\textit{epistemic}$ gaps (knowing the group is relevant but not its preferences). Across six models, we find failures concentrated in religious, cultural, and socioeconomic needs, consistent with a standpoint that is $\textit{physically normative, English-speaking, religiously unaffiliated, and economically stable}$. We find that persona conditioning on a group is especially effective and can even substantially outperform explicitly naming the group as relevant.
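To make the contrast between prompt conditions concrete, here is a minimal sketch of how four such conditions might be assembled for a single scenario. The condition names (`baseline`, `group_named`, `preferences_stated`, `persona`), the `build_prompt` helper, and the example scenario and templates are illustrative assumptions for exposition, not the paper's actual BlindSpot templates.

```python
# Hypothetical sketch of four prompt conditions for one scenario.
# Names, templates, and the example scenario are assumptions,
# not the paper's own materials.

SCENARIO = "Plan a week of catered lunches for a 40-person office."
GROUP = "employees who observe halal dietary restrictions"
PREFERENCES = "they require halal-certified meat and no alcohol-based ingredients"
PERSONA = "an employee who keeps halal"

def build_prompt(condition: str) -> str:
    """Assemble the user prompt for one evaluation condition."""
    if condition == "baseline":
        # No group information: latent needs must be surfaced unprompted.
        return SCENARIO
    if condition == "group_named":
        # The group is named as relevant (isolates the epistemic gap:
        # relevance is given, but preferences must still be inferred).
        return f"{SCENARIO} Make sure the plan works for {GROUP}."
    if condition == "preferences_stated":
        # Both the group and its preferences are supplied explicitly.
        return f"{SCENARIO} Note that {GROUP} are affected: {PREFERENCES}."
    if condition == "persona":
        # Persona conditioning: the model responds as a group member.
        return f"You are {PERSONA}. {SCENARIO}"
    raise ValueError(f"unknown condition: {condition}")

for cond in ("baseline", "group_named", "preferences_stated", "persona"):
    print(f"--- {cond} ---\n{build_prompt(cond)}\n")
```

Under this framing, the abstract's headline comparison is between the `persona` and `group_named` conditions: conditioning on a group member's perspective can surface needs that merely naming the group does not.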
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Type: Provocation
Archival Status: Non-archival
Submission Number: 88