Online learning algorithms for wireless scheduling

Song, Jianhan

Online learning algorithms for wireless scheduling

Access full-text files

SONG-PRIMARY-2024-1.pdf (8.56 MB)

Date

2023-12

Authors

Song, Jianhan

Abstract

Online learning, and more specifically, multi-armed bandit algorithms, has recently garnered significant interest across diverse fields. Within an online learning framework, agents can leverage past interactions with their environment to optimize future decisions, making it an ideal mechanism for use in applications such as recommendation systems. Driven by these advantages, we believe that the online learning approach can be effectively employed to address resource allocation and scheduling challenges in wireless systems, with the potential to enhance the adaptability and robustness of system performance. In this dissertation, we explore the applications of multi-armed bandit algorithms in various wireless settings, showcasing their efficacy through both theoretical analysis and empirical demonstrations. We first studied the multi-user scheduling problem for the wireless downlink with instantaneous channel rate and queue information. We introduced the concept of "meta-scheduling", which formulates the task of selecting an optimal wireless scheduler as a bandit problem, and proposed a UCB-type bandit algorithm designed to adapt to the dynamics of a queueing system. Expanding on the meta-scheduling concept, we then studied a model of hierarchical scheduling in the context of network slicing, in which the base station learns the optimal option among infinitely-many arms. Our approach involves formulating the problem as a blackbox optimization and addressing it using an HOO-type bandit algorithm adaptive to random queueing cycles. Lastly, we transitioned into a multi-agent setting, where decisions of learning agents in close proximity are coupled with each other through interference. Within this context, we identified a low-complexity structure termed the "weakly-coupled system", and developed a decentralized bandit algorithm to facilitate the learning of optimal collective actions. Throughout each of these segments, we presented rigorous theoretical proofs demonstrating that the proposed algorithms exhibit the desired sub-linear regret compared to an idealized genie. Furthermore, we validated the efficacy of the algorithms through a series of experiments using simulation.