Convolutional Neural Network-Based Visual Servoing for Eye-to-Hand Manipulator

Fuyuki Tokuda, Shogo Arai, Kazuhiro Kosuge

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


We propose a convolutional neural network (CNN)-based visual servoing scheme for precise positioning of an eye-to-hand manipulator, in which the control input of the robot is calculated directly from images by a neural network. Specifically, we propose the Difference of Encoded Features driven Interaction matrix Network (DEFINet), a new CNN for eye-to-hand visual servoing. DEFINet estimates the relative pose between the desired and current end-effector poses from the desired and current images captured by an eye-to-hand camera. Inspired by the Siamese network architecture, DEFINet comprises two branches of the same CNN that share weights and encode the desired (target) and current images. Regressing the relative pose from the difference between the encoded target and current image features yields high positioning accuracy in visual servoing. The training dataset is generated from sample data collected by operating a manipulator randomly in task space. The performance of the proposed visual servoing scheme is evaluated through numerical simulation and through experiments using a six-DOF industrial manipulator in a real environment. Both the simulation and the experimental results demonstrate the effectiveness of the proposed method.
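The structure described in the abstract (two weight-sharing branches, a difference of encoded features, and a regression to a relative pose) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the DEFINet architecture from the paper: the single convolution layer with global average pooling, the bias-free linear regression head, the random untrained weights, the sign convention (current minus desired), and the proportional control law with a hypothetical `gain` parameter are not taken from the source.

```python
import random

def conv2d_relu(image, kernel):
    """Valid 2D cross-correlation followed by ReLU (one toy CNN layer)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out

def encode(image, kernels):
    """Encoder used by BOTH branches: conv + ReLU, then global average pooling."""
    features = []
    for kernel in kernels:
        fmap = conv2d_relu(image, kernel)
        features.append(sum(sum(r) for r in fmap) / (len(fmap) * len(fmap[0])))
    return features

def estimate_relative_pose(desired_img, current_img, kernels, head):
    """Regress a 6-vector from the difference of the encoded features."""
    f_des = encode(desired_img, kernels)  # same kernels, i.e. shared weights
    f_cur = encode(current_img, kernels)
    diff = [c - d for c, d in zip(f_cur, f_des)]
    # Assumed bias-free linear head: a zero feature difference gives zero pose.
    return [sum(w * x for w, x in zip(row, diff)) for row in head]

def control_input(pose_error, gain=0.5):
    """Assumed proportional law; the abstract does not state the control law."""
    return [-gain * e for e in pose_error]

# Random, untrained weights and random "images", for demonstration only.
rng = random.Random(0)
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(4)]
head = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
desired = [[rng.random() for _ in range(8)] for _ in range(8)]
current = [[rng.random() for _ in range(8)] for _ in range(8)]

err = estimate_relative_pose(desired, current, kernels, head)
print(control_input(err))
```

Because both branches use the same weights and the assumed head has no bias, identical desired and current images give a zero pose estimate and therefore a zero control input, and swapping the two images flips the sign of the estimate. A trained network would replace the random weights, and the paper's actual layers and control law may differ from this sketch.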

Original language: English
Article number: 9464907
Pages (from-to): 91820-91835
Number of pages: 16
Journal: IEEE Access
Publication status: Published - 2021


Keywords

  • Visual servoing
  • manipulator
  • neural network

ASJC Scopus subject areas

  • Computer Science(all)
  • Materials Science(all)
  • Engineering(all)

