Abstract: Individual human decision-makers may benefit from different forms of support to improve decision outcomes, but \textit{which} form of support will yield better outcomes? In this work, we propose the general problem of learning a \textit{decision support policy} that, for a given input, chooses which form of support to provide to decision-makers about whom we initially have no information. Using techniques from stochastic contextual bandits, we introduce \texttt{THREAD}, an online algorithm to personalize a decision support policy for each decision-maker. We further propose a variant of \texttt{THREAD} for the multi-objective setting to account for auxiliary objectives like the cost of support. We find that \texttt{THREAD} can learn a personalized policy that outperforms offline policies and, in the cost-aware setting, reduce the incurred cost with minimal degradation in performance. Our experiments include various realistic forms of support (e.g., expert consensus and predictions from a large language model) on vision and language tasks. We deploy \texttt{THREAD} with real users to show how personalized policies can be learned online and to illustrate the nuances of learning decision support policies in practice.
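To make the contextual-bandit framing concrete, below is a minimal sketch of learning which form of support to provide, using a standard LinUCB-style learner with one linear reward model per form of support. This is an illustration of the general technique, not the paper's \texttt{THREAD} algorithm; the arm names, context features, and reward signal are all hypothetical stand-ins.

```python
# Illustrative sketch: a contextual bandit over forms of decision support.
# NOT the paper's THREAD algorithm; arms, features, and rewards are assumed.
import numpy as np

ARMS = ["no_support", "expert_consensus", "llm_prediction"]  # hypothetical

class LinUCBSupportPolicy:
    """LinUCB with one ridge-regression reward model per form of support."""

    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward sums

    def select(self, x: np.ndarray) -> int:
        """Pick the form of support with the highest UCB for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                             # ridge estimate
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        """Observe the decision outcome (e.g., 1 if correct) and update."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Usage: contexts might encode the input instance and the decision-maker's
# interaction history; the reward here is a random stand-in for accuracy.
policy = LinUCBSupportPolicy(n_arms=len(ARMS), dim=5)
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.normal(size=5)
    arm = policy.select(x)
    reward = float(rng.random() < 0.5)
    policy.update(arm, x, reward)
```

For the cost-aware setting mentioned in the abstract, one natural adaptation of this sketch is to scalarize the objective, e.g., pass `reward - lam * cost(arm)` to `update` for some trade-off weight `lam`; the paper's multi-objective variant of \texttt{THREAD} may handle this differently.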
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- Section 3: clarified motivation for multi-class classification
- Section 6.4: added paragraph discussing limitations and practical challenges
- Appendix B: provided more details on proofs
Assigned Action Editor: ~Branislav_Kveton1
Submission Number: 1664