
Abstract

Generalization is a fundamental problem in machine learning. For overparameterized deep neural network models, many solutions fit the training data equally well. The key question is which solution generalizes better, as measured by test loss (error). Here we report the discovery of exact duality relations between changes in activities and changes in weights in any fully connected layer of a feed-forward neural network. Using the activity–weight duality relation, we decompose the generalization loss into contributions from different directions in weight space. Our analysis reveals that two key factors, the sharpness of the loss landscape and the size (norm) of the solution, act together to determine generalization: in general, flatter and smaller solutions generalize better. Using the generalization-loss decomposition, we show how existing learning algorithms and regularization schemes affect generalization by controlling one or both factors. Furthermore, by applying our analysis framework to evaluate different algorithms for realistic large neural network models in the multi-learner setting, we find that decentralized algorithms have better generalization performance, as they introduce additional landscape-dependent noise that leads to flatter solutions without changing their size.
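The duality the abstract refers to can be illustrated with a minimal numerical sketch. This is our own illustration, not the authors' construction: for a fully connected layer z = W·a, a small weight change ΔW shifts the output by ΔW·a, and an activity change Δa satisfying W·Δa = ΔW·a produces the identical output shift. When W has full row rank, one such dual activity change is Δa = W⁺·ΔW·a, with W⁺ the Moore–Penrose pseudoinverse:

```python
import numpy as np

# Hedged sketch of an activity-weight duality in one fully connected
# layer z = W @ a (our illustration, not the paper's exact derivation).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))           # layer weights; full row rank a.s.
dW = 0.01 * rng.normal(size=(3, 5))   # small weight perturbation
a = rng.normal(size=5)                # input activities to the layer

# Dual activity change: solves W @ da = dW @ a via the pseudoinverse.
da = np.linalg.pinv(W) @ (dW @ a)

# Perturbing the weights on the original activities gives the same
# layer output as the original weights on the dual activities.
assert np.allclose((W + dW) @ a, W @ (a + da))
```

This equivalence is what lets a perturbation analysis in weight space be recast as one in activity space, the starting point for the loss decomposition described above.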

A challenging problem in deep learning is the development of theoretical frameworks for studying generalization. Feng and colleagues uncover a duality relation between neuron activities and weights in deep neural networks, and use it to show that the sharpness of the loss landscape and the norm of the solution act together to determine generalization performance.

Details

Title
Activity–weight duality in feed-forward neural networks reveals two co-determinants for generalization
Author
Feng, Yu 1; Zhang, Wei 2; Tu, Yuhai 2

1 IBM T. J. Watson Research Center, Yorktown Heights, USA (GRID:grid.481554.9) (ISNI:0000 0001 2111 841X); Duke University, Department of Physics, Durham, USA (GRID:grid.26009.3d) (ISNI:0000 0004 1936 7961)
2 IBM T. J. Watson Research Center, Yorktown Heights, USA (GRID:grid.481554.9) (ISNI:0000 0001 2111 841X)
Pages
908-918
Publication year
2023
Publication date
Aug 2023
Publisher
Nature Publishing Group
e-ISSN
2522-5839
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2854124596
Copyright
© The Author(s), under exclusive licence to Springer Nature Limited 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.