Thompson Sampling for some decentralized control problems

Mukul Gagrani, Ashutosh Nayyar

CDC 2018 (modified: 07 Nov 2022)

Abstract: We consider a two-agent team learning problem over an infinite time horizon under two dynamics and information-sharing models: (i) decoupled dynamics with no information sharing, and (ii) coupled dynamics with one-step delayed information sharing. The state transition kernels are parametrized by an unknown but fixed parameter taking values in a finite space. We study a decentralized Thompson-sampling-based approach to learning the underlying parameter, in which each agent maintains a belief about the parameter. At each time step, each agent draws a sample from its belief and selects its action using the benchmark policy for the sampled parameter. We show that, under some assumptions on the state transition kernels, the regret achieved by Thompson sampling is upper bounded by a constant independent of the time horizon.
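The sample-then-act loop described in the abstract can be illustrated for a single agent with a minimal sketch. It is not the authors' algorithm: the array layout `kernels[k, u, x, x_next]`, the lookup table `policies[k, x]` standing in for the benchmark policy, and the function names are all assumptions made for illustration, and the one-step delayed information sharing of the coupled model is not modeled.

```python
import numpy as np


def bayes_update(belief, kernels, x, u, x_next):
    """Posterior over the finite parameter set after observing (x, u, x_next).

    kernels[k, u, x, x_next] is the (assumed) transition probability under
    the k-th candidate parameter.
    """
    likelihood = kernels[:, u, x, x_next]
    posterior = belief * likelihood
    return posterior / posterior.sum()


def thompson_sampling(kernels, policies, true_idx, horizon, rng):
    """Run one agent's Thompson sampling loop and return its final belief.

    policies[k, x] is a hypothetical table giving the benchmark action in
    state x when the parameter is the k-th candidate.
    """
    num_params = kernels.shape[0]
    num_states = kernels.shape[2]
    belief = np.full(num_params, 1.0 / num_params)  # uniform prior
    x = 0
    for _ in range(horizon):
        # Draw a parameter from the current belief.
        k = rng.choice(num_params, p=belief)
        # Act as the benchmark policy for the sampled parameter would.
        u = policies[k, x]
        # The environment evolves under the true (unknown) parameter.
        x_next = rng.choice(num_states, p=kernels[true_idx, u, x])
        # Update the belief with the observed transition.
        belief = bayes_update(belief, kernels, x, u, x_next)
        x = x_next
    return belief
```

In the decoupled model, each agent would plausibly run a loop of this kind on its own local state and action; how the beliefs are coordinated under delayed sharing is specified in the paper, not here.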
