Phased learning with hierarchical reinforcement learning in nonholonomic motion control

Takaknuni Goto, Noriyasu Homma, Makoto Yoshizawa, Kenichi Abe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


In this paper, a hierarchical reinforcement learning algorithm for controlling nonholonomic systems is proposed. When reinforcement learning is applied to nonholonomic systems, acquiring adequate policies is difficult because the number of learning steps increases and learning tends to converge to locally optimal policies. Humans, by contrast, can learn to control such systems sufficiently well even if they initially have little knowledge of the system's dynamics or of how to control it. This capability is suggested to arise from the exploration strategies humans use to acquire adequate policies, and the proposed algorithm is inspired by this human learning behavior. The key element of the proposed algorithm is a shaping function defined on a novel position-direction space. The shaping function is constructed autonomously once the goal is reached, and it constrains the exploration area in order to optimize the policy. The efficiency of the proposed shaping function was demonstrated on a nonholonomic control problem: positioning a 2-link planar underactuated manipulator.

Original language: English
Title of host publication: 2006 SICE-ICASE International Joint Conference
Number of pages: 6
Publication status: Published - 2006
Event: 2006 SICE-ICASE International Joint Conference - Busan, Korea, Republic of
Duration: 2006 Oct 18 → 2006 Oct 21

Publication series

Name: 2006 SICE-ICASE International Joint Conference


Other: 2006 SICE-ICASE International Joint Conference
Country/Territory: Korea, Republic of


Keywords

  • Human learning behavior
  • Nonholonomic systems
  • Reinforcement learning
  • Shaping function

ASJC Scopus subject areas

  • Computer Science Applications
  • Control and Systems Engineering
  • Electrical and Electronic Engineering

