TY - JOUR
T1 - Hyperparameter-free optimizer of stochastic gradient descent that incorporates unit correction and moment estimation
AU - Yamada, Kazunori D.
N1 - Publisher Copyright:
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
PY - 2018/6/18
Y1 - 2018/6/18
AB - In the deep learning era, stochastic gradient descent is the most common method for optimizing neural network parameters. Among mathematical optimization methods, gradient descent is the most naive. Adjusting the learning rate is necessary for quick convergence, and with plain gradient descent this adjustment is normally done by hand. Many optimizers have been developed to control the learning rate and increase convergence speed; in general, they adjust the learning rate automatically in response to the learning status, and successive optimizers have improved on earlier ones by incorporating their effective aspects. In this study, we developed a new optimizer, YamAdam. Our optimizer is based on Adam, which utilizes the first and second moments of previous gradients. In addition to this moment estimation system, we incorporated an advantageous part of AdaDelta, namely its unit correction system, into YamAdam. In benchmark tests on several common datasets, our optimizer converged as fast as or faster than existing methods. YamAdam is thus a viable alternative optimizer for deep learning.
UR - http://www.scopus.com/inward/record.url?scp=85095632755&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095632755&partnerID=8YFLogxK
U2 - 10.1101/348557
DO - 10.1101/348557
M3 - Article
AN - SCOPUS:85095632755
JO - bioRxiv
JF - bioRxiv
ER -
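
The abstract describes YamAdam as combining Adam's first- and second-moment estimation with an AdaDelta-style unit correction in place of a hand-tuned learning rate. The sketch below only illustrates that combination; it is not the exact update rule from the preprint (see doi:10.1101/348557 for the actual algorithm). The function name adam_with_unit_correction_step, the state layout, and the default coefficients are assumptions made for this example.

import numpy as np

def adam_with_unit_correction_step(theta, grad, state, beta1=0.9, beta2=0.999, eps=1e-8):
    # One hypothetical update: Adam-style biased moment estimates, scaled by an
    # AdaDelta-style RMS ratio so that the step has the same units as the
    # parameters and no hand-set learning rate appears.
    m = beta1 * state["m"] + (1.0 - beta1) * grad        # first moment (Adam)
    v = beta2 * state["v"] + (1.0 - beta2) * grad ** 2   # second moment (Adam)
    # Unit correction: RMS of accumulated past updates over RMS of gradients.
    delta = -np.sqrt(state["u"] + eps) / np.sqrt(v + eps) * m
    u = beta2 * state["u"] + (1.0 - beta2) * delta ** 2  # accumulate squared updates
    return theta + delta, {"m": m, "v": v, "u": u}

# Usage on a toy quadratic f(x) = ||x||^2 / 2, whose gradient is x itself.
theta = np.array([5.0, -3.0])
state = {k: np.zeros_like(theta) for k in ("m", "v", "u")}
for _ in range(500):
    theta, state = adam_with_unit_correction_step(theta, grad=theta, state=state)

Starting from zero accumulated updates makes the first steps conservative, which is characteristic of AdaDelta-style unit correction; refer to the preprint for YamAdam's actual update equations.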