Offline Learning of Nash Stable Coalition Structures with Possibly Overlapping Coalitions

Published: 15 Feb 2026, Last Modified: 24 Mar 2026 · OpenReview Archive Direct Upload · CC BY 4.0
Abstract: Coalition formation concerns strategic collaborations of selfish agents that form coalitions based on their preferences. It is often assumed that coalitions are disjoint and preferences are fully known, which may not hold in practice. In this paper, we therefore present a new model of coalition formation with *possibly overlapping* coalitions under *partial information*, where selfish agents may be part of *multiple* coalitions simultaneously and their full preferences are initially unknown. Instead, information about past interactions and the associated utility feedback is stored in a fixed offline dataset, and we aim to efficiently infer the agents' preferences from this dataset. We analyze the impact of diverse dataset information constraints by studying two types of utility feedback that can be stored in the dataset: *semi-bandit* (agent-level) and *bandit* (coalition-level) feedback. For both feedback models, we identify assumptions under which the dataset covers sufficient information for an offline learning algorithm to infer preferences and use them to recover a partition that is (approximately) *Nash stable*, i.e., one in which no agent can improve her utility by unilaterally deviating. A further goal is to devise algorithms with *low sample complexity*, requiring only a small dataset to obtain a desired approximation to Nash stability. Under semi-bandit feedback, we provide a sample-efficient algorithm proven to obtain an approximately Nash stable partition under an assumption on the information covered by the dataset that is both *sufficient* and *necessary*. Under bandit feedback, however, we show that sample-efficient learning is possible only under a stricter assumption. Still, in multiple cases, our algorithms' sample complexity bounds carry ***optimality*** guarantees up to logarithmic factors. Finally, extensive experiments show that our algorithms converge to a low approximation level to Nash stability across diverse settings.
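To make the stability notion from the abstract concrete, the sketch below checks whether a partition of agents into disjoint coalitions is ε-approximately Nash stable: no agent can raise her utility by more than ε by unilaterally joining another coalition or going solo. This is an illustrative sketch, not the paper's algorithm; the `utility` function, the toy utilities `u`, and the restriction to disjoint coalitions with full (non-learned) preferences are all assumptions made for the example.

```python
from itertools import chain


def is_nash_stable(partition, utility, eps=0.0):
    """Check eps-approximate Nash stability of a disjoint partition.

    partition: list of sets of agents (disjoint coalitions)
    utility(agent, coalition): agent's utility for belonging to `coalition`
    Stable iff no agent gains more than eps by unilaterally deviating,
    i.e. joining any other coalition or forming a singleton coalition.
    """
    agents = set(chain.from_iterable(partition))
    for agent in agents:
        current = next(c for c in partition if agent in c)
        base = utility(agent, frozenset(current))
        # Candidate deviations: join each other coalition, or go solo.
        targets = [frozenset(c | {agent}) for c in partition if c is not current]
        targets.append(frozenset({agent}))
        if any(utility(agent, t) > base + eps for t in targets):
            return False  # some agent has a profitable unilateral deviation
    return True


# Toy utilities (hypothetical): both agents prefer the grand coalition.
u = {frozenset({1, 2}): 2.0, frozenset({1}): 1.0, frozenset({2}): 1.0}
utility = lambda agent, coalition: u[coalition]

print(is_nash_stable([{1}, {2}], utility))  # False: each agent gains by joining the other
print(is_nash_stable([{1, 2}], utility))    # True: no profitable deviation exists
```

With ε large enough (here ε ≥ 1.0), the singleton partition also counts as approximately stable, which mirrors the abstract's notion of recovering a partition within a desired approximation level of Nash stability.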