MiniMax Learning of Interpretable Factored Stochastic Policies from Conjoint Data, with Uncertainty Quantification

Date: 12 Sept 2025 (modified: 12 Nov 2025)
Venue: ICLR 2026 Conference Withdrawn Submission
License: CC BY 4.0
Keywords: Causal inference; High-dimensional treatments; Randomized experiments; Policy learning
TL;DR: We derive optimal stochastic interventions for conjoint experiments that incorporate adversarial, game-theoretic candidate selection, and show that adversarial strategies match historical strategic outcomes better than nonstrategic methods do.
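To make "adversarial, game-theoretic candidate selection" concrete, here is a minimal sketch of the simplest possible case. It assumes one factor only, with a hypothetical matrix `P` where `P[i, j]` is the probability that a voter picks party A's candidate when A shows level `i` and B shows level `j`. Because two-party vote shares sum to one, the game is constant-sum, so the equilibrium is a max-min over two Categorical distributions. The solver is a swapped-in technique (multiplicative-weights self-play, whose time-averaged strategies approach a minimax equilibrium). It is not the paper's gradient-based procedure. With several factors the payoff becomes multilinear in the per-factor distributions, so this bilinear matrix-game sketch does not cover that case.

```python
import numpy as np


def hedge_minimax(P, n_iter=5000, eta=0.05):
    """Approximate max_pi min_q pi^T P q by multiplicative-weights self-play.

    Returns the time-averaged mixed strategies of both players.
    """
    P = np.asarray(P, dtype=float)
    log_w_a = np.zeros(P.shape[0])  # log-weights for A (maximizes vote share)
    log_w_b = np.zeros(P.shape[1])  # log-weights for B (minimizes A's share)
    sum_a = np.zeros_like(log_w_a)
    sum_b = np.zeros_like(log_w_b)
    for _ in range(n_iter):
        # Softmax of log-weights gives each player's current Categorical policy.
        pi = np.exp(log_w_a - log_w_a.max())
        pi /= pi.sum()
        q = np.exp(log_w_b - log_w_b.max())
        q /= q.sum()
        sum_a += pi
        sum_b += q
        # A upweights levels with high expected share against q;
        # B upweights levels that lower A's expected share against pi.
        log_w_a += eta * (P @ q)
        log_w_b -= eta * (P.T @ pi)
    return sum_a / n_iter, sum_b / n_iter


def exploitability(P, pi, q):
    """Gain available to the better of the two unilateral best responses (0 at equilibrium)."""
    return float(np.max(P @ q) - np.min(pi @ P))


# Hypothetical single-factor payoff matrix (A's vote share).
P = np.array([[0.70, 0.30],
              [0.40, 0.60]])
pi_bar, q_bar = hedge_minimax(P)
print("A:", pi_bar, "B:", q_bar)
print("equilibrium share:", pi_bar @ P @ q_bar)
print("exploitability:", exploitability(P, pi_bar, q_bar))
```

For this toy matrix the exact equilibrium is π = (1/3, 2/3), q = (1/2, 1/2), with A's vote share equal to 0.5, so the averaged iterates should land close to those values. A non-adversarial optimizer that averages over a fixed opponent distribution would instead place all of A's mass on whichever level does best against that fixed mix.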
Abstract: We study offline learning of factored stochastic policies over extremely large, combinatorial action spaces and show how standard conjoint data can be used to estimate such policies with valid statistical uncertainty. Conjoint analyses typically report average marginal component effects (AMCEs), which average over opponent attributes and thus ignore strategic interdependence. We instead learn stochastic interventions (product-of-Categorical policies over factor levels) that (i) optimize expected outcomes in an average-case setting and (ii) extend to a two-player minimax (adversarial) setting that realistically captures simultaneous strategic candidate selection. Methodologically, we derive a closed-form solution for the average-case optimizer under two-way interactions with L2 variance regularization, and provide a general gradient-based procedure for richer model classes. Uncertainty from the outcome model propagates exactly to both the optimal policy and its value via the Delta method. We further model institutional details (e.g., primaries) inside the minimax objective and introduce a data-driven measure of strategic divergence between parties. On synthetic data, we characterize sample complexity and coverage as the dimensionality and the sample size n vary. On a U.S. presidential conjoint, adversarially learned policies produce equilibrium vote shares that fall within historical election ranges, in stark contrast to non-adversarial (averaging) optimizers. To facilitate reproducibility and further research, we release an open-source dataset of mapped historical U.S. presidential candidate features on Hugging Face (anonymous URL). Our framework connects causal policy learning with multi-agent RL in high-dimensional discrete action spaces while preserving interpretability and statistical guarantees.
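The abstract's closed-form optimizer and Delta-method step can be illustrated in a deliberately simplified setting. This sketch assumes main effects only (no two-way interactions), a single factor with `K` levels, and an L2 penalty pulling the policy toward a baseline distribution `p0` (for example, the experiment's randomization distribution). The paper's "L2 variance regularization" under two-way interactions may take a different form. All numbers and function names are hypothetical. Maximizing β·π − λ‖π − p0‖² subject to Σπ = 1 gives, via a Lagrange multiplier equal to mean(β),

π* = p0 + (β − mean(β)) / (2λ),  V* = β·p0 + ‖β − mean(β)‖² / (4λ).

Because π* is linear in β, its Delta-method covariance J Σ Jᵀ is exact here. By the envelope theorem, ∂V*/∂β = π*, so Var(V*) ≈ π*ᵀ Σ π*.

```python
import numpy as np


def optimal_policy_main_effects(beta, p0, lam):
    """Maximize beta @ pi - lam * ||pi - p0||^2 subject to sum(pi) == 1.

    Main effects only; nonnegativity is checked rather than enforced.
    """
    beta = np.asarray(beta, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    centered = beta - beta.mean()  # the Lagrange multiplier equals mean(beta)
    pi = p0 + centered / (2.0 * lam)
    if np.any(pi < 0):
        raise ValueError("lam too small: closed form leaves the simplex")
    value = beta @ p0 + centered @ centered / (4.0 * lam)
    return pi, value


def delta_method(beta_cov, pi, lam):
    """Propagate Cov(beta_hat) to Cov(pi*) and Var(V*)."""
    K = len(pi)
    # Jacobian of pi* with respect to beta: a centering matrix scaled by 1/(2*lam).
    J = (np.eye(K) - np.ones((K, K)) / K) / (2.0 * lam)
    pi_cov = J @ beta_cov @ J.T
    value_var = pi @ beta_cov @ pi  # envelope theorem: dV*/dbeta = pi*
    return pi_cov, value_var


# Hypothetical estimated level effects and their covariance.
beta_hat = np.array([0.00, 0.03, 0.06])
Sigma = np.eye(3) * 0.01 ** 2
p0 = np.full(3, 1.0 / 3.0)
lam = 0.5

pi_star, v_star = optimal_policy_main_effects(beta_hat, p0, lam)
pi_cov, v_var = delta_method(Sigma, pi_star, lam)
print("pi*:", pi_star, "SE:", np.sqrt(np.diag(pi_cov)))
print("V*:", v_star, "SE:", np.sqrt(v_var))
```

With main effects and a separable penalty, each factor decouples, so a product-of-Categorical policy over several factors would apply this function factor by factor. Interactions couple the factors and lead to a linear system instead. In that case π* is no longer linear in the coefficients, and the Delta method becomes a first-order approximation.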
Supplementary Material: zip
Primary Area: causal reasoning
Submission Number: 4569