Differential dynamic programming with temporally decomposed dynamics

Akihiko Yamaguchi, Christopher G. Atkeson

Research output: Conference contribution

17 citations (Scopus)

Abstract

We explore a temporal decomposition of dynamics in order to enhance policy learning with unknown dynamics. There are model-free methods and model-based methods for policy learning with unknown dynamics, but both approaches have problems: in general, model-free methods have less generalization ability, while model-based methods are often limited by the assumed model structure or need to gather many samples to make models. We consider a temporal decomposition of dynamics to make learning models easier. To obtain a policy, we apply differential dynamic programming (DDP). A feature of our method is that we consider decomposed dynamics even when there is no action to be taken, which allows us to decompose dynamics more flexibly. Consequently learned dynamics become more accurate. Our DDP is a first-order gradient descent algorithm with a stochastic evaluation function. In DDP with learned models, typically there are many local maxima. In order to avoid them, we consider multiple criteria evaluation functions. In addition to the stochastic evaluation function, we use a reference value function. This method was verified with pouring simulation experiments where we created complicated dynamics. The results show that we can optimize actions with DDP while learning dynamics models.
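The abstract's core idea, chaining per-stage dynamics models (some with no action) and improving the actions by first-order gradient steps, can be illustrated with a minimal sketch. This is not the authors' implementation: the `LinearStage` class, the quadratic reward, the step size, and all numeric values are hypothetical stand-ins for learned models, and the sketch is deterministic, so it omits the stochastic evaluation function and the reference value function described above.

```python
import numpy as np


class LinearStage:
    """Hypothetical stand-in for one learned stage model: x' = A x (+ B a)."""

    def __init__(self, A, B=None):
        self.A = np.asarray(A, dtype=float)
        # B is None for a stage in which no action is taken.
        self.B = None if B is None else np.asarray(B, dtype=float)

    def step(self, x, a):
        nxt = self.A @ x
        if self.B is not None:
            nxt = nxt + self.B @ a
        return nxt


def rollout(stages, x0, actions):
    """Forward pass: propagate the state through every stage in order."""
    xs = [np.asarray(x0, dtype=float)]
    for stage, a in zip(stages, actions):
        xs.append(stage.step(xs[-1], a))
    return xs


def action_gradients(stages, xs, target):
    """Backward pass: chain rule for the reward R = -||x_N - target||^2."""
    g = -2.0 * (xs[-1] - target)  # dR/dx_N
    grads = [None] * len(stages)
    for k in reversed(range(len(stages))):
        if stages[k].B is not None:
            grads[k] = stages[k].B.T @ g  # dR/da_k
        g = stages[k].A.T @ g  # dR/dx_k
    return grads


def optimize_actions(stages, x0, actions, target, lr=0.1, iters=200):
    """Plain gradient ascent on the actions of the stages that have one."""
    actions = [None if a is None else np.array(a, dtype=float) for a in actions]
    target = np.asarray(target, dtype=float)
    for _ in range(iters):
        xs = rollout(stages, x0, actions)
        grads = action_gradients(stages, xs, target)
        for k, grad in enumerate(grads):
            if grad is not None:
                actions[k] = actions[k] + lr * grad
    return actions, rollout(stages, x0, actions)[-1]


# Three stages: action, no action (passive decay), action. All values are assumed.
I2 = np.eye(2)
stages = [LinearStage(I2, I2), LinearStage(0.5 * I2), LinearStage(I2, I2)]
actions, x_final = optimize_actions(
    stages, x0=[0.0, 0.0], actions=[np.zeros(2), None, np.zeros(2)],
    target=[1.0, -1.0],
)
```

The middle stage has no action, yet the backward pass still propagates the gradient through it, which is what lets the first stage's action be credited for the final outcome.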

Original language: English
Title of host publication: Humanoids 2015
Subtitle of host publication: Humanoids in the New Media Age - IEEE RAS International Conference on Humanoid Robots
Publisher: IEEE Computer Society
Pages: 696-703
Number of pages: 8
ISBN (electronic): 9781479968855
DOI
Publication status: Published - 22 Dec 2015
Externally published: Yes
Event: 15th IEEE RAS International Conference on Humanoid Robots, Humanoids 2015 - Seoul, Korea, Republic of
Duration: 3 Nov 2015 to 5 Nov 2015

Publication series

Name: IEEE-RAS International Conference on Humanoid Robots
2015-December
ISSN (print): 2164-0572
ISSN (electronic): 2164-0580

Other

Other: 15th IEEE RAS International Conference on Humanoid Robots, Humanoids 2015
Country/Territory: Korea, Republic of
City: Seoul
Period: 3 Nov 2015 to 5 Nov 2015

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Human-Computer Interaction
  • Electrical and Electronic Engineering
