Browsing by Subject "Learning agents"
Item: Autonomous trading in modern electricity markets (2015-12)
Authors: Urieli, Daniel; Stone, Peter, 1971-; Mooney, Raymond; Ravikumar, Pradeep; Baldick, Ross; Kolter, Zico

The smart grid is an electricity grid augmented with digital technologies that automate the management of electricity delivery. The smart grid is envisioned to be a main enabler of sustainable, clean, efficient, reliable, and secure energy supply. One of the milestones in the smart grid vision will be programs for customers to participate in electricity markets through demand-side management and distributed generation; electricity markets will (directly or indirectly) incentivize customers to adapt their demand to supply conditions, which in turn will help to utilize intermittent energy resources such as solar and wind, and to reduce peak demand. Since wholesale electricity markets are not designed for individual participation, retail brokers could represent customer populations in the wholesale market, making a profit while contributing to the electricity grid's stability and reducing customer costs. A retail broker will need to operate continually and make real-time decisions in a complex, dynamic environment; it will therefore benefit from being implemented as an autonomous broker agent.

With this motivation in mind, this dissertation makes five main contributions to the areas of artificial intelligence, smart grids, and electricity markets. First, it formalizes the problem of autonomous trading by a retail broker in modern electricity markets. Since the trading problem is intractable to solve exactly, this formalization provides a guideline for approximate solutions. Second, it introduces a general algorithm for autonomous trading in modern electricity markets, named LATTE (Lookahead-policy for Autonomous Time-constrained Trading of Electricity). LATTE is a general framework that can be instantiated in different ways to tailor it to specific setups. Third, it contributes fully implemented and operational autonomous broker agents, each using a different instantiation of LATTE. These agents were successful in international competitions and controlled experiments and can serve as benchmarks for future research in this domain; detailed descriptions of the agents' behaviors, as well as their source code, are included in the dissertation. Fourth, it contributes extensive empirical analysis that validates the effectiveness of LATTE at different competition levels under a variety of environmental conditions, shedding light on the main reasons for its success by examining the importance of its constituent components. Fifth, it empirically examines the impact of Time-Of-Use (TOU) tariffs, which are proposed for demand-side management both in the literature and in the real world, in competitive electricity markets. The success of the different instantiations of LATTE demonstrates its generality in the context of electricity markets.

Ultimately, this dissertation demonstrates that an autonomous broker can act effectively in modern electricity markets by executing an efficient lookahead policy that optimizes its predicted utility; by doing so, the broker can benefit itself, its customers, and the economy.
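The central mechanism named in this abstract is a lookahead policy that optimizes predicted utility. The sketch below illustrates that general idea only; it is not LATTE's actual algorithm or API, and every interface in it (MarketState, simulate, the candidate actions) is a hypothetical assumption introduced for illustration.

from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class MarketState:
    """Hypothetical snapshot of what a retail broker might observe."""
    timeslot: int
    wholesale_prices: List[float]   # recent market clearing prices
    subscribed_demand_kwh: float    # demand of the broker's customers


def lookahead_policy(
    state: MarketState,
    candidate_actions: Iterable[str],
    simulate: Callable[[MarketState, str, int], float],
    horizon: int = 24,
) -> str:
    """Pick the candidate action whose simulated trajectory has the
    highest predicted utility.

    `simulate(state, action, horizon)` is assumed to roll a learned
    market model forward `horizon` timeslots and return predicted profit.
    """
    best_action, best_utility = None, float("-inf")
    for action in candidate_actions:
        utility = simulate(state, action, horizon)
        if utility > best_utility:
            best_action, best_utility = action, utility
    return best_action

A full broker would couple such a loop with learned models of customer demand and wholesale prices, and would run it under the real-time constraints the abstract describes; this sketch omits all of that.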
Item: Learning from human-generated reward (2012-12)
Authors: Knox, William Bradley; Stone, Peter, 1971-; Ballard, Dana; Breazeal, Cynthia; Love, Bradley C.; Mooney, Raymond J.

Robots and other computational agents are increasingly becoming part of our daily lives. They will need to be able to learn to perform new tasks, adapt to novel situations, and understand what is wanted by their human users, most of whom will not have programming skills. To achieve these ends, agents must learn from humans using methods of communication that are naturally accessible to everyone.

This thesis presents and formalizes interactive shaping, one such teaching method, in which agents learn from real-valued reward signals generated by a human trainer. In interactive shaping, a human trainer observes an agent behaving in a task environment and delivers feedback signals. These signals are mapped to numeric values that specify correct behavior to the agent. A solution to the problem of interactive shaping maps human reward to some objective such that maximizing that objective generally leads to the behavior the trainer desires.

Interactive shaping addresses the aforementioned needs of real-world agents: it allows human users to quickly teach agents the specific behaviors they desire, and humans can shape agents without programming skills or even detailed knowledge of how to perform the task themselves. In contrast, algorithms that learn autonomously from only a pre-programmed evaluative signal often learn slowly, which is unacceptable for some real-world tasks with real-world costs; such autonomous algorithms also have an inflexibly defined set of optimal behaviors, changeable only through additional programming. Through interactive shaping, human users can (1) specify and teach desired behavior and (2) share task knowledge when correct behavior is already indirectly specified by an objective function. Additionally, computational agents that can be taught interactively by humans provide a unique opportunity to study how humans teach, in a highly controlled setting where the agent's behavior is parametrized.

This thesis answers the following question: how, and to what extent, can agents harness the information contained in human-generated reward signals to learn sequential decision-making tasks? The contributions begin with an operational definition of the problem of interactive shaping. Next, I introduce the TAMER framework, one solution to the problem of interactive shaping, and describe and analyze algorithmic implementations of the framework within multiple domains. The thesis also proposes and empirically examines algorithms for learning from both human reward and a pre-programmed reward function within a Markov decision process (MDP), demonstrating two techniques that consistently outperform learning from either feedback signal alone.
Subsequently, the thesis shifts its focus from the agent to the trainer, describing two psychological studies in which the trainer is manipulated, either by changing the trainer's perceived role or by having the agent intentionally misbehave at specific times; we examine the effect of these manipulations on trainer behavior and on the agent's learned task performance. Lastly, I return to the problem of interactive shaping, examining a space of mappings from human reward to objective functions that differ in how much the agent discounts reward it expects to receive in the future. Through this investigation, a deep relationship is identified between discounting, the level of positivity in human reward, and training success. Specific constraints of human reward are identified (i.e., the "positive circuits" problem), as are strategies for overcoming these constraints, pointing towards interactive shaping methods that are more effective than the already successful TAMER framework.
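As a concrete illustration of the ideas above, here is a minimal sketch of TAMER-style learning: the agent regresses a model of human reward onto state-action features and acts greedily (myopically, i.e., with discount factor 0) with respect to that model. The class and names below are illustrative assumptions, not the thesis's implementation; real TAMER implementations also handle credit assignment for delayed human feedback, which is omitted here.

import numpy as np


class TamerSketch:
    """Linear model of human reward H(s, a), trained online."""

    def __init__(self, n_features: int, lr: float = 0.01):
        self.w = np.zeros(n_features)  # weights of the reward model
        self.lr = lr                   # SGD step size

    def predict(self, features: np.ndarray) -> float:
        """Predicted human reward for one state-action feature vector."""
        return float(self.w @ features)

    def update(self, features: np.ndarray, human_reward: float) -> None:
        """One gradient step on squared error against the trainer's signal."""
        error = human_reward - self.predict(features)
        self.w += self.lr * error * features

    def choose_action(self, actions, featurize):
        """Myopic selection: maximize predicted *immediate* human reward
        (discount factor 0), the regime the thesis relates to the typically
        positive bias of human reward."""
        return max(actions, key=lambda a: self.predict(featurize(a)))

The discounting investigation described above corresponds, roughly, to replacing the immediate prediction in choose_action with a discounted sum of predicted future rewards; with typically positive human reward, heavier weight on the future can reward repetitive loops of behavior, which is one way to read the "positive circuits" problem the abstract names.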