Abstract: Several important real-world problems involve multiple entities interacting with each other and can thus be modeled as multi-agent systems. Multi-agent systems are at the core of our society and, due to recent advances in big data and artificial intelligence, are rapidly permeating new application domains such as autonomous driving, e-commerce, and shared mobility. At the same time, this progress has raised significant challenges in the decision-making, learning, and efficiency of such systems, which remain less understood than their single-agent counterparts. In this thesis, we aim to partially address some of these challenges. The first part of the thesis investigates the problem of sample-efficient active data collection in multi-agent systems, i.e., how agents can acquire new data and learn about the underlying game without sacrificing performance. This problem, also known as the exploration vs. exploitation dilemma, has been extensively studied in single-agent problems but remains fairly unexplored in multi-agent domains. We propose a novel approach that uses previously observed data to exploit the correlations present in the game by means of statistical regression techniques. This allows the agents to build high-probability confidence intervals around the underlying game rewards and use these to improve their strategy via optimism in the face of uncertainty. We first instantiate this idea in normal-form games and then extend it to a newly defined class of contextual games (where agents observe contextual information before playing), Markov games, and sequential (Stackelberg) games. We provide theoretical regret bounds for the resulting algorithms, yielding provable convergence to equilibria. Moreover, we evaluate our methods in experimental case studies in traffic routing, autonomous driving interactions, and wildlife protection.
Our algorithms gradually learn about the underlying game and display significantly lower regret than existing baselines that rely solely on the observed game rewards (the so-called bandit feedback). Moreover, they often achieve performance comparable to methods that, unlike ours, require full information about the game. In the second part of the thesis, we study the system-level efficiency of multi-agent systems, i.e., the quality of their equilibria (arising from agents' selfishness) with respect to a system-level objective. There is a long history of research that upper-bounds this inefficiency, but it has mostly considered games with finite or discrete actions. We extend some of these results to a novel class of continuous-action games satisfying certain regularity conditions. Moreover, we provide more general efficiency bounds for time-varying contextual games and for settings with learning agents. Then, motivated by the obtained results and by emerging applications in shared mobility, we consider the problem of designing multi-agent systems to solve hard resource allocation problems (such as rebalancing a bike-sharing system) in a distributed fashion. We propose a novel algorithm for this task and, based on the results obtained in the previous chapters, we provide rigorous convergence and approximation guarantees.