
Abstract

Deep learning methods operate in regimes that defy traditional computational and statistical mindsets. Despite the non-convexity of empirical risks and the enormous complexity of neural network architectures, stochastic gradient algorithms can often find an approximate global minimizer of the training loss and achieve small generalization error on test data. In recent years, an important research direction has been to theoretically explain this observed optimization efficiency and generalization efficacy of neural networks. This thesis tackles these challenges in the model of two-layer neural networks by analyzing their computational and statistical properties in various scaling limits.

On the computational side, we introduce two competing theories for neural network dynamics: mean field theory and tangent kernel theory. The two theories characterize the training dynamics of neural networks in different regimes, which exhibit different behaviors. In the mean field framework, the training dynamics are captured, in the limit of a large number of neurons, by a particular non-linear partial differential equation. This characterization allows us to prove global convergence of the dynamics in certain scenarios. In contrast, the tangent kernel theory characterizes the same dynamics in a different scaling limit and provides global convergence guarantees in more general scenarios.
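The distinction between the two regimes can be illustrated numerically (a minimal sketch of our own, not code from the thesis; the network width, dimension, and target value below are arbitrary choices). Under the 1/sqrt(N) tangent kernel parameterization with an O(1) learning rate, a single gradient step moves each neuron's weights by an amount that vanishes as N grows ("lazy" training), while under the 1/N mean field parameterization with the correspondingly accelerated O(N) learning rate, per-neuron movement stays of order one.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_step_displacement(N, scaling, d=50, y=10.0):
    """Average per-neuron weight movement after one gradient step on the
    squared loss for a two-layer ReLU net, under a given output scaling.
    The target y is placed far from the initialization so the residual
    (f - y) dominates and the comparison is stable."""
    x = rng.normal(size=d) / np.sqrt(d)       # one input, |x| ~ 1
    a = rng.choice([-1.0, 1.0], size=N)       # fixed second-layer signs
    W = rng.normal(size=(N, d))               # first-layer weights
    pre = W @ x
    act = np.maximum(pre, 0.0)                # ReLU activations
    if scaling == "ntk":                      # f = a . act / sqrt(N), lr O(1)
        f, lr, grad_scale = a @ act / np.sqrt(N), 1.0, 1.0 / np.sqrt(N)
    else:                                     # f = a . act / N, lr O(N)
        f, lr, grad_scale = a @ act / N, float(N), 1.0 / N
    # |grad of loss wrt w_i| = |f - y| * |a_i| * 1[pre_i > 0] * |x| * grad_scale
    per_neuron = np.abs(f - y) * np.abs(a) * (pre > 0) * np.linalg.norm(x) * grad_scale
    return lr * per_neuron.mean()

# Per-neuron displacement shrinks with N under NTK scaling,
# but stays of order one under mean field scaling.
d_ntk = [one_step_displacement(N, "ntk") for N in (100, 10_000)]
d_mf = [one_step_displacement(N, "mf") for N in (100, 10_000)]
```

The contrast is exactly the one the two theories formalize: in the tangent kernel limit the features barely move and the network behaves like a fixed kernel, while in the mean field limit the neuron distribution itself evolves.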

On the statistical side, we study the generalization properties of neural networks trained in the two regimes described above. We first show that, in the high-dimensional limit, neural tangent kernels are no better than polynomial regression, while neural networks trained in the mean field regime can potentially perform better. Next, we study in more detail the random features model, which is equivalent to a two-layer neural network in the kernel regime. We compute the precise asymptotics of its test error in the high-dimensional limit and confirm that it exhibits the interesting double-descent curve observed in experiments.
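The double-descent shape of the random features test error can be reproduced in a small simulation (our own illustrative sketch, not code or results from the thesis; the linear target, ReLU feature map, and all dimensions are arbitrary choices for the demonstration). Min-norm least squares on N random features typically peaks near the interpolation threshold N ≈ n and descends again in the overparameterized regime.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, n_test = 20, 100, 2000

# Noisy linear ground truth on Gaussian data
beta = rng.normal(size=d) / np.sqrt(d)
X_tr, X_te = rng.normal(size=(n, d)), rng.normal(size=(n_test, d))
y_tr = X_tr @ beta + 0.3 * rng.normal(size=n)
y_te = X_te @ beta

def rf_test_error(N):
    """Min-norm least squares on N random ReLU features."""
    W = rng.normal(size=(d, N)) / np.sqrt(d)
    F_tr = np.maximum(X_tr @ W, 0.0)
    F_te = np.maximum(X_te @ W, 0.0)
    theta, *_ = np.linalg.lstsq(F_tr, y_tr, rcond=None)
    return np.mean((F_te @ theta - y_te) ** 2)

# Sweep the number of features across the interpolation threshold N = n
errs = {N: rf_test_error(N) for N in (10, 50, 100, 400, 1600)}
```

The spike at N = n comes from the near-singular feature matrix at the interpolation threshold: the min-norm interpolator's variance blows up there, then shrinks again as N grows, which is the qualitative picture behind the precise asymptotics computed in the thesis.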

Details

Title
Computational and Statistical Theories for Large-Scale Neural Networks
Author
Mei, Song
Publication year
2020
Publisher
ProQuest Dissertations & Theses
ISBN
9798662510883
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
2431823586
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.