
Abstract

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms that inspired Adam are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
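
The abstract summarizes the update rule without stating it; the following is a minimal NumPy sketch of the Adam step as described in the paper (exponential moving averages of the first and second gradient moments with bias correction). The default hyper-parameter values follow the paper; the function and variable names here are illustrative, not taken from the source.

import numpy as np

def adam_step(theta, grad, m, v, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Update biased first- and second-moment estimates of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-correct the estimates (important in the first few steps).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update.
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)  # approaches [0, 0]

The AdaMax variant mentioned at the end of the abstract replaces the second-moment term with an exponentially weighted infinity norm, roughly u_t = max(beta2 * u_{t-1}, |grad_t|), and divides the first-moment estimate by u_t instead of sqrt(v_hat).
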

Details

Title
Adam: A Method for Stochastic Optimization
Publication title
arXiv.org; Ithaca
Publication year
2017
Publication date
Jan 30, 2017
Section
Computer Science
Publisher
Cornell University Library, arXiv.org
Source
arXiv.org
Place of publication
Ithaca
Country of publication
United States
University/institution
Cornell University Library arXiv.org
e-ISSN
2331-8422
Source type
Working Paper
Language of publication
English
Document type
Working Paper
Publication history
Online publication date
2017-01-31
Milestone dates
2014-12-22 (Submission v1); 2015-01-17 (Submission v2); 2015-02-27 (Submission v3); 2015-03-03 (Submission v4); 2015-04-23 (Submission v5); 2015-06-23 (Submission v6); 2015-07-20 (Submission v7); 2015-07-23 (Submission v8); 2017-01-30 (Submission v9)
First posting date
31 Jan 2017
ProQuest document ID
2075396516
Document URL
https://www.proquest.com/working-papers/adam-method-stochastic-optimization/docview/2075396516/se-2?accountid=40258
Copyright
© 2017. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2019-09-09
Database
Publicly Available Content Database