TY - GEN
T1 - Deep Reinforcement Learning with Hidden Layers on Future States
AU - Kameko, Hirotaka
AU - Suzuki, Jun
AU - Mizukami, Naoki
AU - Tsuruoka, Yoshimasa
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Deep reinforcement learning algorithms such as Deep Q-Networks have successfully been used to construct a strong agent for Atari games by performing only direct evaluation of the current state and actions. This is in stark contrast to the algorithms for traditional board games such as Chess or Go, where a look-ahead search mechanism is indispensable for building a strong agent. In this paper, we present a novel deep reinforcement learning architecture that can both effectively and efficiently use information on future states in video games. First, we demonstrate that such information is indeed quite useful in deep reinforcement learning by using exact state transition information obtained from the emulator. We then propose a method that predicts future states using Long Short-Term Memory (LSTM), such that the agent can look ahead without the emulator. In this work, we apply our method to the asynchronous advantage actor-critic (A3C) architecture. The experimental results show that our proposed method with predicted future states substantially outperforms the vanilla A3C in several Atari games.
AB - Deep reinforcement learning algorithms such as Deep Q-Networks have successfully been used to construct a strong agent for Atari games by performing only direct evaluation of the current state and actions. This is in stark contrast to the algorithms for traditional board games such as Chess or Go, where a look-ahead search mechanism is indispensable for building a strong agent. In this paper, we present a novel deep reinforcement learning architecture that can both effectively and efficiently use information on future states in video games. First, we demonstrate that such information is indeed quite useful in deep reinforcement learning by using exact state transition information obtained from the emulator. We then propose a method that predicts future states using Long Short-Term Memory (LSTM), such that the agent can look ahead without the emulator. In this work, we apply our method to the asynchronous advantage actor-critic (A3C) architecture. The experimental results show that our proposed method with predicted future states substantially outperforms the vanilla A3C in several Atari games.
UR - http://www.scopus.com/inward/record.url?scp=85042547473&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85042547473&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-75931-9_4
DO - 10.1007/978-3-319-75931-9_4
M3 - Conference contribution
AN - SCOPUS:85042547473
SN - 9783319759302
T3 - Communications in Computer and Information Science
SP - 46
EP - 60
BT - Computer Games - 6th Workshop, CGW 2017, Held in Conjunction with the 26th International Conference on Artificial Intelligence, IJCAI 2017, Revised Selected Papers
A2 - Winands, Mark H. M.
A2 - Cazenave, Tristan
A2 - Saffidine, Abdallah
PB - Springer-Verlag
T2 - 6th Workshop on Computer Games, CGW 2017, Held in Conjunction with the 26th International Conference on Artificial Intelligence, IJCAI 2017
Y2 - 20 August 2017 through 20 August 2017
ER -