Title: Taming the Devil of Gradient-based Optimization Methods
with the Angel of Differential Equations
Abstract: In this talk, we use ordinary differential equations to
model, analyze, and interpret gradient-based optimization methods. In the first
part of the talk, we derive a second-order ODE that is the limit of Nesterov’s
accelerated gradient method for non-strongly convex objectives (NAG-C). The
continuous-time ODE is shown to allow for a better understanding of NAG-C and,
as a byproduct, yields a family of accelerated methods with similar
convergence rates.
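For orientation, the limiting ODE in question (as stated in the published paper
underlying this part of the talk; initial conditions included here only for
concreteness) reads

    \ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f(X(t)) = 0,
    \qquad X(0) = x_0, \quad \dot{X}(0) = 0,

whose solution satisfies f(X(t)) - f(x^*) = O(1/t^2), the continuous-time
counterpart of the O(1/k^2) rate of NAG-C.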
In the second part, we begin by recognizing that the existing ODEs in the
literature are too coarse to distinguish between two fundamentally different
methods: Nesterov’s accelerated gradient method for strongly convex functions
(NAG-SC) and Polyak’s heavy-ball method. In response, we derive
high-resolution ODEs as more accurate surrogates for the three aforementioned
methods. These novel ODEs can be integrated into a general framework that
allows for a fine-grained analysis of the discrete optimization algorithms
by translating properties of the amenable ODEs into those of their
discrete counterparts. As a first application of this framework, we identify
the effect of a term, referred to as the ‘gradient correction’, that is present
in NAG-SC but absent from the heavy-ball method, shedding light on why the
former achieves acceleration while the latter does not.
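To sketch what the gradient correction looks like, the high-resolution ODEs for
the two strongly convex methods take roughly the following form (here s denotes
the step size and \mu the strong convexity parameter; the display is a sketch,
up to the precise lower-order constants):

    heavy-ball:  \ddot{X} + 2\sqrt{\mu}\,\dot{X} + (1 + \sqrt{\mu s})\,\nabla f(X) = 0
    NAG-SC:      \ddot{X} + 2\sqrt{\mu}\,\dot{X} + \sqrt{s}\,\nabla^2 f(X)\,\dot{X} + (1 + \sqrt{\mu s})\,\nabla f(X) = 0

The extra Hessian-driven term \sqrt{s}\,\nabla^2 f(X)\,\dot{X} is the gradient
correction; it vanishes in the low-resolution limit s -> 0, which is why
earlier ODE models could not tell the two methods apart.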
Moreover, in the high-resolution ODE framework, NAG-C is shown to minimize the
squared gradient norm at an inverse cubic rate, which is the sharpest known
rate for NAG-C itself.
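As a purely illustrative companion to this claim (not code from the talk; the
quadratic test problem, step size, and iteration counts below are arbitrary
choices), one can run NAG-C on a smooth convex objective and watch the best
squared gradient norm seen so far decay at roughly the inverse cubic rate:

    import numpy as np

    # Illustrative sketch: NAG-C on a smooth convex (not strongly convex)
    # quadratic f(x) = 0.5 * x^T A x, tracking min_{i<=k} ||grad f(x_i)||^2.
    # If the inverse cubic rate holds, k^3 times this quantity stays bounded.

    rng = np.random.default_rng(0)
    m, n = 30, 50
    M = rng.standard_normal((m, n))
    A = M.T @ M / m                       # rank-deficient PSD matrix
    L = np.linalg.eigvalsh(A).max()       # smoothness constant of f

    def grad(x):
        return A @ x

    s = 1.0 / L                           # step size
    x = x_prev = rng.standard_normal(n)   # x_0 = x_{-1}
    best = np.inf

    for k in range(1, 10001):
        y = x + (k - 1) / (k + 2) * (x - x_prev)   # NAG-C momentum step
        x_prev, x = x, y - s * grad(y)             # gradient step from y
        best = min(best, float(np.linalg.norm(grad(x)) ** 2))
        if k in (10, 100, 1000, 10000):
            print(f"k = {k:6d}   k^3 * min ||grad||^2 = {k ** 3 * best:.3e}")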
Finally, by modifying the high-resolution ODE of NAG-C, we obtain a family of
new optimization methods that are shown to maintain the same accelerated
convergence rate as NAG-C for smooth convex functions. This is based on joint
work with
Stephen Boyd, Emmanuel Candès, Simon Du, Michael
Jordan, and Bin Shi.
Bio: Weijie Su is currently an Assistant Professor in the Department of
Statistics at the Wharton School,
University of Pennsylvania. His research interests are in statistical machine
learning, high-dimensional statistical inference, large-scale multiple testing,
and privacy-preserving data analysis. Prior to joining Penn in Summer 2016, he
obtained his Ph.D. in Statistics from Stanford University in 2016, under the
supervision of Emmanuel Candès, and received his
bachelor's degree in Mathematics from Peking University in 2011.