This paper proposes a method to fuse learning strategies (LSs) in a reinforcement learning framework. In this method, multiple LSs are integrated to learn a single task on a single robot. The LSs consist of (1) LS-scratch: learning a policy from scratch, (2) LS-accelerating: learning a policy from a previously learned policy by increasing the motion-speed parameters, and (3) LS-freeing: learning a policy from a previously learned policy by increasing the degrees of freedom (DoF). The proposed LS fusion method enables (A) selecting a suitable DoF configuration from a predefined set of DoF configurations in the early stage of learning, and (B) applying the LSs to improve the policy after a behavior module that learns from scratch converges. As a result, a robot can learn a complex task by starting with a simplified configuration and then transferring the learned behaviors while the difficulty is increased. We employ WF-DCOB, proposed by Yamaguchi et al., to implement the LSs. We verify the proposed LS fusion method on a crawling task of a humanoid robot. The simulation experiments demonstrate the advantage of the proposed method over learning with a single learning module.
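
The two-phase structure described above can be illustrated with a minimal Python sketch. All names here (BehaviorModule, select_module, expand_after_convergence, the joint labels, and the convergence test) are hypothetical placeholders, not the paper's implementation; the sketch only mirrors the scheme stated in the abstract: greedy selection among behavior modules with different DoF configurations early on, then spawning LS-accelerating and LS-freeing modules from the converged scratch policy.

```python
import random

# Illustrative learning-strategy tags (names are assumptions, not the paper's).
LS_SCRATCH, LS_ACCELERATING, LS_FREEING = "scratch", "accelerating", "freeing"

class BehaviorModule:
    """One policy learner tied to a DoF configuration and a learning strategy."""
    def __init__(self, dof_config, strategy, init_policy=None):
        self.dof_config = dof_config      # which joints are free to move
        self.strategy = strategy          # how this module's policy was initialized
        self.policy = dict(init_policy or {})  # placeholder policy representation
        self.mean_return = 0.0            # running estimate of episode return
        self.episodes = 0

    def update(self, episode_return):
        # Incremental mean of observed returns (stand-in for an RL update).
        self.episodes += 1
        self.mean_return += (episode_return - self.mean_return) / self.episodes

    def converged(self, min_episodes=50):
        # Crude convergence test: enough episodes observed (an assumption).
        return self.episodes >= min_episodes

def select_module(modules, epsilon=0.1):
    """Phase (A): pick a module, and hence a DoF configuration,
    mostly greedily by estimated return."""
    if random.random() < epsilon:
        return random.choice(modules)
    return max(modules, key=lambda m: m.mean_return)

def expand_after_convergence(best):
    """Phase (B): spawn new modules that reuse the converged scratch policy,
    one with faster motion (LS-accelerating), one with more DoF (LS-freeing)."""
    faster = BehaviorModule(best.dof_config, LS_ACCELERATING, best.policy)
    freed = BehaviorModule(best.dof_config + ("extra_joint",), LS_FREEING,
                           best.policy)
    return [faster, freed]

if __name__ == "__main__":
    modules = [BehaviorModule(cfg, LS_SCRATCH)
               for cfg in (("hip",), ("hip", "knee"))]
    for step in range(200):
        m = select_module(modules)
        # Fake episode return: a stand-in for running the crawling task.
        m.update(random.gauss(len(m.dof_config), 1.0))
        best = max((x for x in modules if x.strategy == LS_SCRATCH),
                   key=lambda x: x.mean_return)
        if best.converged() and len(modules) == 2:
            modules += expand_after_convergence(best)
```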