© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Analyzing deep neural networks (DNNs) via information plane (IP) theory has recently gained tremendous attention as a way to gain insight into, among other things, the generalization ability of DNNs. However, it is by no means obvious how to estimate the mutual information (MI) between each hidden layer and the input/desired output in order to construct the IP. For instance, hidden layers with many neurons require MI estimators that are robust to the high dimensionality of such layers. MI estimators should also be able to handle convolutional layers while remaining computationally tractable enough to scale to large networks. Existing IP methods have not been able to study truly deep convolutional neural networks (CNNs). We propose an IP analysis using the new matrix-based Rényi’s entropy coupled with tensor kernels, leveraging the power of kernel methods to represent properties of the probability distribution independently of the dimensionality of the data. Our results shed new light on previous studies concerning small-scale DNNs using a completely new approach. We provide a comprehensive IP analysis of large-scale CNNs, investigating the different training phases and providing new insights into the training dynamics of large-scale neural networks.

Details

Title
Analysis of Deep Convolutional Neural Networks Using Tensor Kernels and Matrix-Based Entropy
Author
Wickstrøm, Kristoffer K 1; Løkse, Sigurd 1; Kampffmeyer, Michael C 2; Yu, Shujian 3; Príncipe, José C 4; Jenssen, Robert 5

1 Machine Learning Group, Department of Physics and Technology, UiT The Arctic University of Norway, NO-9037 Tromsø, Norway; [email protected] (S.L.); [email protected] (M.C.K.); [email protected] (S.Y.); [email protected] (R.J.)
2 Machine Learning Group, Department of Physics and Technology, UiT The Arctic University of Norway, NO-9037 Tromsø, Norway; [email protected] (S.L.); [email protected] (M.C.K.); [email protected] (S.Y.); [email protected] (R.J.); Norwegian Computing Center, Department of Statistical Analysis and Machine Learning, 114 Blindern, NO-0314 Oslo, Norway
3 Machine Learning Group, Department of Physics and Technology, UiT The Arctic University of Norway, NO-9037 Tromsø, Norway; [email protected] (S.L.); [email protected] (M.C.K.); [email protected] (S.Y.); [email protected] (R.J.); Computational NeuroEngineering Laboratory, Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA; [email protected]; Department of Computer Science, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
4 Computational NeuroEngineering Laboratory, Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA; [email protected]
5 Machine Learning Group, Department of Physics and Technology, UiT The Arctic University of Norway, NO-9037 Tromsø, Norway; [email protected] (S.L.); [email protected] (M.C.K.); [email protected] (S.Y.); [email protected] (R.J.); Norwegian Computing Center, Department of Statistical Analysis and Machine Learning, 114 Blindern, NO-0314 Oslo, Norway; Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark
First page
899
Publication year
2023
Publication date
2023
Publisher
MDPI AG
e-ISSN
1099-4300
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2829794111