Visualization of convolutional neural networks for monocular depth estimation

Junjie Hu, Yan Zhang, Takayuki Okatani

Research output: Chapter in Book/Report/Conference proceedingConference contribution

4 Citations (Scopus)

Abstract

Recently, convolutional neural networks (CNNs) have shown great success on the task of monocular depth estimation. A fundamental yet unanswered question is: How CNNs can infer depth from a single image. Toward answering this question, we consider visualization of inference of a CNN by identifying relevant pixels of an input image to depth estimation. We formulate it as an optimization problem of identifying the smallest number of image pixels from which the CNN can estimate a depth map with the minimum difference from the estimate from the entire image. To cope with a difficulty with optimization through a deep CNN, we propose to use another network to predict those relevant image pixels in a forward computation. In our experiments, we first show the effectiveness of this approach, and then apply it to different depth estimation networks on indoor and outdoor scene datasets. The results provide several findings that help exploration of the above question.

Original languageEnglish
Title of host publicationProceedings - 2019 International Conference on Computer Vision, ICCV 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages3868-3877
Number of pages10
ISBN (Electronic)9781728148038
DOIs
Publication statusPublished - 2019 Oct
Event17th IEEE/CVF International Conference on Computer Vision, ICCV 2019 - Seoul, Korea, Republic of
Duration: 2019 Oct 272019 Nov 2

Publication series

NameProceedings of the IEEE International Conference on Computer Vision
Volume2019-October
ISSN (Print)1550-5499

Conference

Conference17th IEEE/CVF International Conference on Computer Vision, ICCV 2019
CountryKorea, Republic of
CitySeoul
Period19/10/2719/11/2

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition

Fingerprint Dive into the research topics of 'Visualization of convolutional neural networks for monocular depth estimation'. Together they form a unique fingerprint.

Cite this