Abstract
Generalization is a fundamental problem in machine learning. For overparameterized deep neural network models, many solutions fit the training data equally well; the key question is which solution has better generalization performance, as measured by test loss (error). Here we report the discovery of exact duality relations between changes in activities and changes in weights in any fully connected layer of a feed-forward neural network. Using this activity–weight duality relation, we decompose the generalization loss into contributions from different directions in weight space. Our analysis reveals that two key factors, the sharpness of the loss landscape and the size (norm) of the solution, act together to determine generalization: in general, flatter and smaller solutions generalize better. Using the generalization loss decomposition, we show how existing learning algorithms and regularization schemes affect generalization by controlling one or both factors. Furthermore, by applying our framework to evaluate different algorithms for realistic large neural network models in the multi-learner setting, we find that decentralized algorithms have better generalization performance because they introduce additional landscape-dependent noise that leads to flatter solutions without changing their size.
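The duality for a single fully connected layer can be illustrated with a minimal numerical sketch. This is an assumption-laden toy (not the paper's exact construction): for a linear layer z = W a, a small weight change dW is absorbed into an equivalent activity change da satisfying W (a + da) = (W + dW) a, solved here with the Moore–Penrose pseudoinverse.

```python
import numpy as np

# Toy illustration of an activity-weight duality in one fully connected
# layer z = W a. A weight perturbation dW maps to an activity perturbation
# da with W (a + da) = (W + dW) a. One solution, exact when W has full
# row rank (n_in >= n_out), is da = pinv(W) @ (dW @ a).

rng = np.random.default_rng(0)
n_out, n_in = 4, 8                      # n_in >= n_out: full row rank (a.s.)
W = rng.standard_normal((n_out, n_in))  # layer weights
a = rng.standard_normal(n_in)           # input activities
dW = 0.01 * rng.standard_normal((n_out, n_in))  # small weight change

da = np.linalg.pinv(W) @ (dW @ a)       # dual activity change

# Perturbed weights acting on the original activity equal the original
# weights acting on the dually perturbed activity:
print(np.allclose((W + dW) @ a, W @ (a + da)))  # True
```

Because W @ pinv(W) is the identity when W has full row rank, W @ (a + da) = W @ a + dW @ a exactly; the variable names and the pseudoinverse construction are illustrative choices, not taken from the paper.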
A challenging problem in deep learning is developing theoretical frameworks to study generalization. Feng and colleagues uncover a duality relation between neuron activities and weights in deep neural networks, and use it to show that the sharpness of the loss landscape and the norm of the solution act together to determine its generalization performance.
Details
1 IBM T. J. Watson Research Center, Yorktown Heights, USA (GRID:grid.481554.9) (ISNI:0000 0001 2111 841X); Duke University, Department of Physics, Durham, USA (GRID:grid.26009.3d) (ISNI:0000 0004 1936 7961)
2 IBM T. J. Watson Research Center, Yorktown Heights, USA (GRID:grid.481554.9) (ISNI:0000 0001 2111 841X)