Exploration bonuses based on upper confidence bounds for sparse reward games

Naoki Mizukami, Jun Suzuki, Hirotaka Kameko, Yoshimasa Tsuruoka

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Recent deep reinforcement learning (RL) algorithms have achieved super-human-level performance in many Atari games. However, a closer look at their performance reveals that the algorithms fall short of humans in games where rewards are only obtained occasionally. One solution to this sparse reward problem is to incorporate an explicit and more sophisticated exploration strategy in the agent’s learning process. In this paper, we present an effective exploration strategy that explicitly considers the progress of training using exploration bonuses based on Upper Confidence Bounds (UCB). Our method also includes a mechanism to separate exploration bonuses from rewards, thereby avoiding the problem of interfering with the original learning objective. We evaluate our method on Atari 2600 games with sparse rewards, and achieve significant improvements over the vanilla asynchronous advantage actor-critic (A3C) algorithm.
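The abstract does not give the bonus formula, so the following is only a minimal sketch of a count-based bonus in the style of the classic UCB1 term, and is not the authors' implementation. The class name `UCBBonus`, the `scale` coefficient, the string state keys, and the use of `log(1 + total)` (so the very first visit still earns a non-zero bonus) are all assumptions made for illustration. The bonus is returned separately from any environment reward, which is one simple way to keep the two signals apart.

```python
import math
from collections import defaultdict


class UCBBonus:
    """Illustrative count-based exploration bonus (UCB1-style, hypothetical)."""

    def __init__(self, scale=1.0):
        self.scale = scale              # hypothetical weighting coefficient
        self.counts = defaultdict(int)  # visits per hashable state key
        self.total = 0                  # visits summed over all states

    def bonus(self, state_key):
        """Record one visit to state_key and return its exploration bonus."""
        self.counts[state_key] += 1
        self.total += 1
        n = self.counts[state_key]
        # Assumed form: grows slowly with total experience, shrinks with
        # repeated visits to the same state.
        return self.scale * math.sqrt(2.0 * math.log(1 + self.total) / n)


ucb = UCBBonus()
for _ in range(10):
    familiar = ucb.bonus("room_a")  # bonus after repeated visits
novel = ucb.bonus("room_b")         # bonus for a state seen for the first time

env_reward = 0.0                    # sparse game reward, kept as its own signal
signals = (env_reward, novel)       # (extrinsic, exploration bonus)
```

In this sketch a rarely visited state receives a larger bonus than a frequently visited one, and the bonus for any fixed state decays as its visit count grows.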

Original language: English
Title of host publication: Advances in Computer Games - 15th International Conferences, ACG 2017, Revised Selected Papers
Editors: H. Jaap van den Herik, Mark H. Winands, Walter A. Kosters
Publisher: Springer-Verlag
Pages: 165-175
Number of pages: 11
ISBN (Print): 9783319716480
Publication status: Published - 2017 Jan 1
Externally published: Yes
Event: 15th International Conference on Advances in Computer Games, ACG 2017 - Leiden, Netherlands
Duration: 2017 Jul 3 to 2017 Jul 5

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10664 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th International Conference on Advances in Computer Games, ACG 2017
Country/Territory: Netherlands
City: Leiden
Period: 17/7/3 to 17/7/5

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
