Online learning for scheduling in wireless networks
Over the last few years, online learning has grown in importance as it allows us to build systems that can interact with the environment while continuously learning from past interactions to improve future decisions to maximize some objective. While online learning is used in several areas like recommendation systems, however, due to the complexity of wireless scheduling it is unclear how to utilize online learning. For instance, MU-MIMO scheduling involves the selection of a user subset and associated rate selection each time-slot for varying channel states (the vector of quantized channels matrices for each of the users) — a complex integer optimization problem that is different for each channel state. We propose that a low-dimensional structure is present in the wireless systems which can be exploited through online learning. For instance, channel-states "near" each other will likely have the same optimal solution.
In our first problem, we present a framework through which we formulate the wireless scheduling problem as a multi-armed bandit problem. We then propose an online algorithm that can cluster the channel-states and learn the capacity region of these clusters. We show that our algorithms can significantly reduce the complexity of online learning for wireless settings and provide regret guarantees for our algorithm.
In the second problem, we expand on our previous work and present (1) a framework that further exploits the low-dimensional structure present in the system by clustering users and (2) an online algorithm that utilizes the parameters learned by our previous algorithms to optimize the subset of users to be scheduled for given channel-state. We show that our algorithms can not only converge faster but also improve the overall throughput of the system. We also provide regret guarantees for the user clustering algorithm.