Title: Optimization Aspects of Temporal Abstraction in Reinforcement Learning
Abstract: Temporal
abstraction refers to the idea that complicated sequential decision-making
problems can sometimes be simplified by considering the "big picture"
first. In this talk, I will give an overview of some of my work on learning
such temporal abstractions end-to-end within the "option-critic"
architecture (Bacon et al., 2017). I will then explain how other related
hierarchical RL frameworks, such as Feudal RL by Dayan and Hinton (1993), can
also be approached under the same option-critic architecture. However, we will
see that this formulation leads to a so-called "bilevel"
optimization problem. While this is a more difficult problem, the good news is
that the literature on bilevel optimization is rich and many of its tools have
yet to be rediscovered by our community. Finally, I will show how
"iterative differentiation" techniques (Griewank
and Walther, 2008) can be applied to our problem while providing a new
interpretation to the "inverse RL" approach of Rust (1988).
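For readers unfamiliar with the terminology, here is a generic sketch, not the
talk's specific formulation: a bilevel problem couples an outer objective F with
an inner objective f whose solution depends on the outer variables, and
"iterative differentiation" estimates the outer gradient by differentiating
through the iterations of an inner solver \Phi (all symbols here are
illustrative placeholders):

\[
\min_{\theta} \; F\big(\theta, w^{\ast}(\theta)\big)
\quad \text{s.t.} \quad
w^{\ast}(\theta) \in \arg\min_{w} \; f(\theta, w).
\]

Approximating the inner solution by K steps of an update map,
w_{k+1} = \Phi(w_k, \theta), and unrolling the chain rule gives

\[
\frac{d w_K}{d \theta}
= \sum_{k=0}^{K-1}
\left( \prod_{j=k+1}^{K-1} \partial_w \Phi(w_j, \theta) \right)
\partial_\theta \Phi(w_k, \theta),
\]

assuming w_0 does not depend on \theta.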
Bio: Pierre-Luc Bacon
is a postdoc in Emma Brunskill's group. He
completed his PhD with Doina Precup
at McGill University in 2018. His research focuses on temporal abstraction and
representation learning in RL.