Abstract: We consider dynamic spectrum access where distributed
secondary users search for spectrum opportunities
without knowing the primary traffic statistics. In each slot,
a secondary transmitter chooses one channel to sense and
subsequently transmits if the channel is sensed as idle. Sensing
is imperfect, i.e., an idle channel may be sensed as busy and
vice versa. Without centralized control, each secondary user
needs to independently identify the channels that offer the
most opportunities while avoiding collisions with both primary
and other secondary users. We address the problem within a
cooperative game framework, where the objective is to maximize
the throughput of the secondary network under a constraint on
collisions with the primary system. The performance of a
decentralized channel access policy is measured by the system
regret, defined as the expected total performance loss with respect
to the optimal performance in the ideal scenario where the
traffic load of the primary system on each channel is known
to all secondary users and collisions among secondary users
are eliminated through centralized scheduling. By exploring the
rich communication structure of the problem, we show that the
optimal system regret has the same logarithmic order as in the
centralized counterpart with perfect sensing. A decentralized
policy is constructed to achieve the logarithmic order of the
system regret. In a broader context, this work addresses imperfect
reward observation in decentralized multi-armed bandit
problems.
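
The regret benchmark described above can be illustrated with a minimal sketch. It is not the paper's formal definition: it assumes each channel is idle independently with probability theta in every slot, that the sensor has a fixed false-alarm probability (idle sensed as busy) and miss-detection probability (busy sensed as idle), and that a successful transmission in a slot is worth one unit of throughput. All function and parameter names are hypothetical.

```python
def expected_throughput(theta, false_alarm):
    """Per-slot success probability on one channel: the channel must be
    idle (theta) and also be sensed as idle (1 - false_alarm)."""
    return theta * (1.0 - false_alarm)


def collision_probability(theta, miss_detection):
    """Per-slot probability of colliding with the primary user: the
    channel is busy (1 - theta) but is sensed as idle (miss_detection)."""
    return (1.0 - theta) * miss_detection


def genie_throughput(thetas, num_users, false_alarm):
    """Per-slot throughput of the idealized benchmark: the idle
    probabilities are known and a central scheduler places the users on
    the num_users best channels, one user per channel (assumed model)."""
    best = sorted(thetas, reverse=True)[:num_users]
    return sum(expected_throughput(t, false_alarm) for t in best)


def system_regret(thetas, num_users, false_alarm, horizon, policy_reward):
    """Expected total loss over `horizon` slots relative to the genie
    benchmark, given the policy's expected total reward."""
    return horizon * genie_throughput(thetas, num_users, false_alarm) - policy_reward


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    thetas = [0.2, 0.5, 0.8]
    print(system_regret(thetas, num_users=2, false_alarm=0.1,
                        horizon=100, policy_reward=100.0))
```

Under these assumptions, a policy has logarithmic-order regret when the value returned by system_regret grows proportionally to the logarithm of the horizon, rather than linearly in it.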