Abstract
Machine learning influences numerous aspects of modern society, powers new technologies, from AlphaGo to ChatGPT, and increasingly materializes in consumer products such as smartphones and self-driving cars. Despite the vital role and broad applications of artificial neural networks, we lack systematic approaches, such as network science, to understand their underlying mechanisms. The difficulty is rooted in the many possible model configurations, each with different hyper-parameters and weighted architectures determined by noisy data. We bridge this gap by developing a mathematical framework that maps a neural network's performance to the network properties of the line graph governed by the edge dynamics of the stochastic gradient descent differential equations. This framework enables us to derive a neural capacitance metric that universally captures a model's generalization capability on a downstream task, and to predict model performance using only early training results. Numerical results on 17 pre-trained ImageNet models across five benchmark datasets and one NAS benchmark indicate that our neural capacitance metric is a powerful indicator for model selection based only on early training results, and is more efficient than state-of-the-art methods.
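The "edge dynamics" above builds on the standard network-science formulation of coupled dynamics on a graph. As a hedged sketch (the paper's exact stochastic-gradient-descent equations are not reproduced in this record), the generic form for the state x_i of node i of the line graph reads:

```latex
\frac{\mathrm{d}x_i}{\mathrm{d}t} = f(x_i) + \sum_{j=1}^{N} A_{ij}\, g(x_i, x_j),
```

where A is the adjacency matrix of the line graph (whose nodes are the original network's trainable connections), f captures a node's self-dynamics, and g its pairwise coupling; the specific choices of f and g corresponding to SGD are given in the paper itself and are not assumed here.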
Understanding how artificial neural networks function, and how they effectively solve specific tasks, still requires a more rigorous analytical foundation. Using tools from network science and dynamical systems, the authors develop a framework for predicting the performance of artificial neural networks.
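The abstract does not spell out how the neural capacitance metric is computed, but the workflow it describes (fine-tune each candidate model for a few epochs, score its early learning curve, and rank the candidates) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: `estimate_capacitance` is a stand-in trend heuristic, not the authors' metric, and the model names and accuracy curves are toy data.

```python
import numpy as np

def estimate_capacitance(val_accuracies: np.ndarray) -> float:
    """Stand-in score from the first few validation accuracies.

    Hypothetical heuristic: fit a line to the early curve and add the
    trend to the current level, so fast, steady learners rank higher.
    This is NOT the paper's neural capacitance formula.
    """
    epochs = np.arange(len(val_accuracies))
    slope, _intercept = np.polyfit(epochs, val_accuracies, deg=1)
    return float(val_accuracies[-1] + slope)

def rank_models(early_curves: dict[str, np.ndarray]) -> list[str]:
    """Rank candidate pre-trained models, best first, by the score."""
    scores = {name: estimate_capacitance(curve)
              for name, curve in early_curves.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy usage: validation accuracy after each of 5 early fine-tuning epochs.
early_curves = {
    "resnet50":     np.array([0.52, 0.61, 0.66, 0.69, 0.71]),
    "mobilenet_v2": np.array([0.48, 0.55, 0.58, 0.60, 0.61]),
    "densenet121":  np.array([0.50, 0.60, 0.67, 0.72, 0.75]),
}
print(rank_models(early_curves))  # best first
```

In the paper's setting, the per-model score would instead be the neural capacitance computed from the line-graph dynamics, but the selection loop, short training runs followed by a single cheap score per model, is the same.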
Details
Jiang, Chunheng 1; Pedapati, Tejaswini 2; Chen, Pin-Yu 2; Sun, Yizhou 3; Gao, Jianxi 1
1 Rensselaer Polytechnic Institute, Network Science and Technology Center, Troy, USA (GRID:grid.33647.35) (ISNI:0000 0001 2160 9198); Rensselaer Polytechnic Institute, Department of Computer Science, Troy, USA (GRID:grid.33647.35) (ISNI:0000 0001 2160 9198)
2 IBM Thomas J. Watson Research Center, Yorktown Heights, USA (GRID:grid.481554.9) (ISNI:0000 0001 2111 841X)
3 University of California, Department of Computer Science, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718)