Abstract: In fog-assisted IoT systems, it is common practice to cache popular content at the network edge to achieve high quality of service. Because of uncertainties such as unknown file popularities, the design of an effective cache placement scheme remains an open problem with two key challenges: 1) how to incorporate online learning into the cache placement process to minimize performance loss (a.k.a. regret), and 2) how to keep caching costs within budgets in the long run. In this paper, we formulate the content cache placement problem with unknown file popularities as a combinatorial multi-armed bandit (CMAB) problem with long-term time-average constraints. We adopt bandit learning methods and the virtual queue technique to deal with the exploration-exploitation tradeoff and the long-term time-average constraints, respectively. With an effective integration of online learning and online control, we devise a learning-aided cache placement scheme called CPB (Cache Placement with Bandit Learning). Our theoretical analysis and simulation results show that CPB achieves a tunable sublinear regret over a finite time horizon and keeps caching costs within budgets in the long run.
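To make the two ingredients named above concrete, the following is a minimal sketch of how a UCB-style bandit index can be combined with a virtual queue that tracks budget overshoot. It is not the CPB algorithm itself, whose details are not given in the abstract. The function name `simulate`, the single cost budget, the Bernoulli hit model, the exploration constant, and the tradeoff weight `V` are all assumptions made for illustration only.

```python
import math
import random


def simulate(num_files=6, capacity=2, budget=1.2, horizon=2000, V=10.0, seed=0):
    """Toy cache placement: UCB estimates plus a virtual queue (illustrative only)."""
    rng = random.Random(seed)
    # Hypothetical ground truth, hidden from the learner: hit probability per file.
    popularity = [rng.random() for _ in range(num_files)]
    # Hypothetical known cost of caching each file for one time slot.
    cost = [rng.uniform(0.5, 1.5) for _ in range(num_files)]

    counts = [0] * num_files    # number of slots each file has been cached
    means = [0.0] * num_files   # empirical hit rate of each file
    queue = 0.0                 # virtual queue: accumulated budget overshoot
    total_cost = 0.0

    for t in range(1, horizon + 1):
        scores = []
        for f in range(num_files):
            if counts[f] == 0:
                ucb = 1.0  # optimistic value for files never cached
            else:
                bonus = math.sqrt(1.5 * math.log(t) / counts[f])
                ucb = min(1.0, means[f] + bonus)
            # Weigh the optimistic reward against the queue-weighted cost.
            scores.append(V * ucb - queue * cost[f])

        # Cache at most `capacity` files, and only those with a positive score.
        ranked = sorted(range(num_files), key=lambda f: scores[f], reverse=True)
        chosen = [f for f in ranked[:capacity] if scores[f] > 0]

        spent = 0.0
        for f in chosen:
            hit = 1.0 if rng.random() < popularity[f] else 0.0
            counts[f] += 1
            means[f] += (hit - means[f]) / counts[f]  # running average
            spent += cost[f]

        # Queue grows when spending exceeds the budget and drains otherwise.
        queue = max(queue + spent - budget, 0.0)
        total_cost += spent

    return total_cost / horizon, queue, counts


if __name__ == "__main__":
    avg_cost, final_queue, counts = simulate()
    print(f"average cost per slot: {avg_cost:.3f}")
    print(f"final virtual queue:   {final_queue:.3f}")
    print(f"times cached per file: {counts}")
```

In this sketch the queue update implies that the time-average cost is at most `budget + queue / horizon`, so a queue that stays bounded keeps the average cost near the budget as the horizon grows. A larger `V` places more weight on the estimated reward and less on the queue backlog, which is one plausible reading of a tunable tradeoff, though the paper's actual parameterization may differ.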