Title: When do neural networks have bad local minima, and when not?
To explain the recent success of neural networks, researchers have conjectured that all local minima are global minima despite the non-convexity of the problem. Is this really true? Is it merely a hand-wavy intuition that holds roughly in special cases, or can it be made a rigorous result in a broad setting?
In this talk, instead of explaining "why neural-nets are good", we try to understand "when neural-nets are good, and when not", with "good" restricted to mean "every local-min is a global-min". We focus on the binary classification problem and discuss how architecture and data affect the landscape. On the positive side, we prove that no bad local minima exist under reasonable assumptions on the neuron types, the neural-net structure, the loss function, and the dataset. On the negative side, we provide dozens of counterexamples that show the necessity of most of these assumptions.
Our approach can be viewed as a game of "local-min attack" and "defense". The attacker tries to construct examples in which bad local minima exist, and the defender modifies the setting to eliminate them. For instance, the attacker constructs bad local minima for a 1-hidden-layer ReLU network with linearly separable data, and the defender then proves that smooth versions of ReLU eliminate them. Finally, we present a strong defense, consisting of a special neuron and a special regularizer, that can eliminate bad local minima for a deep neural-net in the realizable case.
Joint work with Shiyu Liang, Yixuan Li, Jason Lee and R. Srikant.
Ruoyu Sun is an assistant professor in the Department of Industrial and Enterprise Systems Engineering (ISE) and the Coordinated Science Lab (CSL), University of Illinois at Urbana-Champaign. His recent research interests lie in large-scale optimization and non-convex optimization in machine learning. Before joining UIUC, he was a visiting scientist at Facebook AI Research and a postdoctoral researcher at Stanford University. He obtained a PhD in electrical engineering from the University of Minnesota and a B.S. in mathematics from Peking University. He won second place in the INFORMS George Nicholson student paper competition and an honorable mention in the INFORMS Optimization Society student paper competition.