Title: When do neural networks have bad local minima, and when not?
Abstract:
To explain the recent success of neural networks, researchers have
conjectured that all local minima are global minima despite the non-convexity
of the problem. Is this really true? Is it merely hand-wavy intuition that holds
roughly in special cases, or can it be made rigorous in a broad setting?
In this talk, instead of explaining "why neural networks are good", we try to
understand "when neural networks are good, and when not", with "good"
restricted to mean "every local minimum is a global minimum". We focus on the binary
classification problem and discuss how architecture and data affect the
landscape. On the positive side, we prove that no bad local minima exist under
reasonable assumptions on the neuron types, the neural-net structure, the loss
function, and the dataset. On the negative side, we provide dozens of
counterexamples that show the necessity of most assumptions.
Our approach can be viewed as a game of
"local-min attack" and "defense". An attacker tries to
construct examples in which bad local minima exist, and the defender modifies the
setting to eliminate them. For instance, the attacker constructs
bad local minima for a one-hidden-layer ReLU network with
linearly separable data; the defender then proves that smooth versions of ReLU eliminate them. Finally, we present a strong defense
consisting of a special neuron and a special regularizer
that can eliminate bad local minima for a deep neural-net in the realizable
case.
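To make the attack/defense picture concrete, here is a minimal toy sketch (not taken from the talk; the data, parameters, and loss are my own illustrative choices). A one-hidden-neuron ReLU network f(x) = v * relu(w*x + b) on linearly separable 1-D data has a "dead" configuration where the pre-activation is negative on every data point: the numerical gradient there is zero although the loss is worse than at a sensible configuration, so it is a bad stationary point. At the same parameters, replacing ReLU with its smooth surrogate softplus makes the gradient nonzero, illustrating how smoothing can eliminate such points.

```python
import numpy as np

# Linearly separable 1-D toy data, labels in {-1, +1}.
X = np.array([-2.0, -1.0, 1.0, 2.0])
Y = np.array([-1.0, -1.0, 1.0, 1.0])

relu = lambda z: np.maximum(z, 0.0)
softplus = lambda z: np.log1p(np.exp(z))  # smooth surrogate for ReLU

def loss(params, act):
    """Mean logistic loss of f(x) = v * act(w*x + b)."""
    w, b, v = params
    out = v * act(w * X + b)
    return np.mean(np.log1p(np.exp(-Y * out)))

def num_grad(params, act, eps=1e-6):
    """Central-difference gradient of the loss w.r.t. (w, b, v)."""
    g = np.zeros(3)
    for i in range(3):
        p_plus, p_minus = params.copy(), params.copy()
        p_plus[i] += eps
        p_minus[i] -= eps
        g[i] = (loss(p_plus, act) - loss(p_minus, act)) / (2 * eps)
    return g

# "Dead" neuron: w*x + b = x - 3 < 0 on all data, so relu output is 0
# everywhere and small perturbations cannot revive it.
dead = np.array([1.0, -3.0, 1.0])
# A reasonable configuration that actually fits the positive class.
good = np.array([1.0, 0.0, 5.0])

print(np.linalg.norm(num_grad(dead, relu)))      # zero gradient at the dead point
print(loss(dead, relu), loss(good, relu))        # yet its loss is strictly worse
print(np.linalg.norm(num_grad(dead, softplus)))  # softplus: gradient is nonzero
```

The dead point is a (non-strict) bad local minimum: the loss is locally constant in every parameter direction, yet strictly higher than at the good configuration, matching the attack side of the game; the nonzero softplus gradient at the same point is the defense.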
Joint work with Shiyu Liang, Yixuan Li, Jason Lee and R. Srikant.
Bio:
Ruoyu Sun is an assistant professor in the Department of Industrial and
Enterprise Systems Engineering (ISE) and the Coordinated Science Laboratory
(CSL) at the University of Illinois at Urbana-Champaign. His recent research
interests lie in large-scale optimization and non-convex optimization in
machine learning. Before joining UIUC, he was a visiting scientist at Facebook
AI Research and a postdoctoral researcher at Stanford University. He
obtained his PhD in electrical engineering from the University of Minnesota and his B.S.
in mathematics from Peking University. He won second place in the INFORMS
George Nicholson Student Paper Competition and received an honorable mention in the
INFORMS Optimization Society Student Paper Competition.