
Abstract

Generalization is a fundamental problem in machine learning. For overparameterized deep neural network models, many solutions fit the training data equally well. The key question is which solution generalizes better, as measured by test loss (error). Here we report the discovery of exact duality relations between changes in activities and changes in weights in any fully connected layer of a feed-forward neural network. Using the activity–weight duality relation, we decompose the generalization loss into contributions from different directions in weight space. Our analysis reveals that two key factors, the sharpness of the loss landscape and the size (norm) of the solution, act together to determine generalization: in general, flatter and smaller solutions generalize better. Using the generalization-loss decomposition, we show how existing learning algorithms and regularization schemes affect generalization by controlling one or both factors. Furthermore, by applying our analysis framework to evaluate different algorithms for realistic large neural network models in the multi-learner setting, we find that decentralized algorithms have better generalization performance, as they introduce additional landscape-dependent noise that leads to flatter solutions without changing their size.
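The duality the abstract refers to can be illustrated with a minimal numerical sketch. This is our own illustration, not the authors' construction: for a fully connected layer z = W·a, a small weight change ΔW shifts the output by ΔW·a, and an activity change Δa satisfying W·Δa = ΔW·a produces the identical output shift. When W has full row rank, one such dual activity change is Δa = W⁺·ΔW·a, with W⁺ the Moore–Penrose pseudoinverse:

```python
import numpy as np

# Hedged sketch of an activity-weight duality in one fully connected
# layer z = W @ a (our illustration, not the paper's exact derivation).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))           # layer weights; full row rank a.s.
dW = 0.01 * rng.normal(size=(3, 5))   # small weight perturbation
a = rng.normal(size=5)                # input activities to the layer

# Dual activity change: solves W @ da = dW @ a via the pseudoinverse.
da = np.linalg.pinv(W) @ (dW @ a)

# Perturbing the weights on the original activities gives the same
# layer output as the original weights on the dual activities.
assert np.allclose((W + dW) @ a, W @ (a + da))
```

This equivalence is what lets a perturbation analysis in weight space be recast as one in activity space, the starting point for the loss decomposition described above.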

A challenging problem in deep learning is the development of theoretical frameworks for studying generalization. Feng and colleagues uncover a duality relation between neuron activities and weights in deep neural networks, and use it to show that the sharpness of the loss landscape and the norm of the solution act together to determine generalization performance.

Details

Title
Activity–weight duality in feed-forward neural networks reveals two co-determinants for generalization
Author
Feng, Yu 1; Zhang, Wei 2; Tu, Yuhai 2

1 IBM T. J. Watson Research Center, Yorktown Heights, USA (GRID:grid.481554.9) (ISNI:0000 0001 2111 841X); Duke University, Department of Physics, Durham, USA (GRID:grid.26009.3d) (ISNI:0000 0004 1936 7961)
2 IBM T. J. Watson Research Center, Yorktown Heights, USA (GRID:grid.481554.9) (ISNI:0000 0001 2111 841X)
Pages
908-918
Publication year
2023
Publication date
Aug 2023
Publisher
Nature Publishing Group
e-ISSN
2522-5839
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2854124596
Copyright
© The Author(s), under exclusive licence to Springer Nature Limited 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.