Online learning and decision-making from implicit feedback

dc.contributor.advisor: Shakkottai, Sanjay
dc.contributor.advisor: Vishwanath, Sriram
dc.contributor.committeeMember: Baccelli, Francois
dc.contributor.committeeMember: Zitkovic, Gordon
dc.contributor.committeeMember: Srikant, Rayadurgam
dc.creator: Krishnasamy, Subhashini
dc.date.accessioned: 2017-06-20T20:33:42Z
dc.date.available: 2017-06-20T20:33:42Z
dc.date.issued: 2017-05
dc.date.submitted: May 2017
dc.date.updated: 2017-06-20T20:33:43Z
dc.description.abstract: This thesis focuses on designing learning and control algorithms for emerging resource allocation platforms such as recommender systems, 5G wireless networks, and online marketplaces. These systems operate in environments that are only partially known, so the controllers must make resource allocation decisions based on implicit feedback that the environment provides in response to past actions. The goal is to sequentially select actions using incremental feedback so as to optimize performance while simultaneously learning about the environment. We study three problems that exemplify this setting. The first is an inference problem that requires identification of sponsored content in recommender systems. Specifically, we ask whether it is possible to detect the existence of sponsored content disguised as genuine recommendations using implicit feedback from a subset of users of the recommender system. The second problem is the design of scheduling algorithms for switch networks when the user-server link statistics are unknown (e.g., in wireless networks and online marketplaces). The scheduling algorithm must trade off scheduling the optimal links against obtaining sufficient feedback about all the links for accurate estimates. We observe that this problem is closely connected to the stochastic multi-armed bandit problem and analyze bandit-style explore-exploit algorithms for learning the statistical parameters while simultaneously assigning servers to users. The third is the joint problem of base station (BS) activation and rate allocation in an energy-efficient wireless network when the channel statistics are unknown. The controller observes instantaneous channel rates of activated BSs, and thereby sequentially obtains implicit feedback about the channel. Here again, there is a trade-off between learning the channel and optimizing the operation cost based on estimated parameters.
For each of these systems, we propose algorithms with provable asymptotic guarantees. These learning algorithms highlight the use of implicit feedback in online decision making and control.
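To illustrate the bandit-style explore-exploit idea the abstract describes, below is a minimal sketch (not from the thesis itself) of an epsilon-greedy scheduler for a single user choosing among candidate links with unknown Bernoulli success rates. All names, parameters, and the epsilon-greedy rule are illustrative assumptions; the `rates` vector is hidden ground truth used only to simulate the implicit feedback.

```python
import random

def epsilon_greedy_scheduler(rates, horizon, epsilon=0.1, seed=0):
    """Sketch of explore-exploit link scheduling (illustrative, not the
    thesis algorithm). `rates` holds the unknown per-link success
    probabilities, used here only to simulate feedback."""
    rng = random.Random(seed)
    n = len(rates)
    counts = [0] * n        # times each link has been scheduled
    estimates = [0.0] * n   # empirical success-rate estimates
    successes = 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            # Explore: schedule a uniformly random link to gather feedback.
            link = rng.randrange(n)
        else:
            # Exploit: schedule the link with the best current estimate.
            link = max(range(n), key=lambda i: estimates[i])
        # Implicit feedback: observe only whether this schedule succeeded.
        reward = 1 if rng.random() < rates[link] else 0
        counts[link] += 1
        estimates[link] += (reward - estimates[link]) / counts[link]
        successes += reward
    return estimates, successes

est, total = epsilon_greedy_scheduler([0.3, 0.8, 0.5], horizon=5000)
```

The key tension the abstract refers to is visible here: exploit-only scheduling can lock onto a suboptimal link whose estimate happened to look good early, while the exploration steps keep refining the estimates of every link.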
dc.description.department: Electrical and Computer Engineering
dc.format.mimetype: application/pdf
dc.identifier: doi:10.15781/T2P26Q88C
dc.identifier.uri: http://hdl.handle.net/2152/47285
dc.language.iso: en
dc.subject: Online learning
dc.subject: Resource allocation
dc.subject: Learning algorithm design
dc.subject: Implicit feedback
dc.subject: Incremental feedback
dc.subject: Sponsored content
dc.subject: Sponsored content detection
dc.subject: Scheduling algorithm design
dc.subject: Stochastic multi-armed bandit
dc.subject: Base station activation
dc.subject: Learning algorithms
dc.subject: Online decision-making
dc.subject: Online control
dc.title: Online learning and decision-making from implicit feedback
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.discipline: Electrical and Computer Engineering
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Access full-text files

Original bundle
Name: KRISHNASAMY-DISSERTATION-2017.pdf
Size: 1.65 MB
Format: Adobe Portable Document Format

License bundle
Name: LICENSE.txt
Size: 1.85 KB
Format: Plain Text