TY - GEN
T1 - Phased learning with hierarchical reinforcement learning in nonholonomic motion control
AU - Goto, Takaknuni
AU - Homma, Noriyasu
AU - Yoshizawa, Makoto
AU - Abe, Kenichi
PY - 2006
Y1 - 2006
N2 - In this paper, a hierarchical reinforcement learning algorithm for controlling nonholonomic systems is proposed. When reinforcement learning is applied to nonholonomic systems, acquiring adequate policies is difficult because the number of learning steps increases and learning tends to converge to locally optimal policies. The proposed algorithm is inspired by human learning behavior. Humans can learn to control such systems adequately even if they initially have little knowledge of the system's dynamics or of how to control it. This capability is thought to stem from the exploration strategies humans use to acquire adequate policies. The key element of the proposed algorithm is a shaping function defined on a novel position-direction space. The shaping function is constructed autonomously once the goal is reached, and it constrains the exploration area to optimize the policy. The efficiency of the proposed shaping function was demonstrated on a nonholonomic control problem: positioning a 2-link planar underactuated manipulator.
AB - In this paper, a hierarchical reinforcement learning algorithm for controlling nonholonomic systems is proposed. When reinforcement learning is applied to nonholonomic systems, acquiring adequate policies is difficult because the number of learning steps increases and learning tends to converge to locally optimal policies. The proposed algorithm is inspired by human learning behavior. Humans can learn to control such systems adequately even if they initially have little knowledge of the system's dynamics or of how to control it. This capability is thought to stem from the exploration strategies humans use to acquire adequate policies. The key element of the proposed algorithm is a shaping function defined on a novel position-direction space. The shaping function is constructed autonomously once the goal is reached, and it constrains the exploration area to optimize the policy. The efficiency of the proposed shaping function was demonstrated on a nonholonomic control problem: positioning a 2-link planar underactuated manipulator.
KW - Human learning behavior
KW - Nonholonomic systems
KW - Reinforcement learning
KW - Shaping function
UR - http://www.scopus.com/inward/record.url?scp=34250741388&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34250741388&partnerID=8YFLogxK
U2 - 10.1109/SICE.2006.315088
DO - 10.1109/SICE.2006.315088
M3 - Conference contribution
AN - SCOPUS:34250741388
SN - 8995003855
SN - 9788995003855
T3 - 2006 SICE-ICASE International Joint Conference
SP - 4557
EP - 4562
BT - 2006 SICE-ICASE International Joint Conference
T2 - 2006 SICE-ICASE International Joint Conference
Y2 - 18 October 2006 through 21 October 2006
ER -