Fusion of Camera and Lidar Data for Large Scale Semantic Mapping

Thomas Westfechtel, Kazunori Ohno, Ranulfo Plutarco Bezerra Neto, Shotaro Kojima, Satoshi Tadokoro

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Citations (Scopus)


Current self-driving vehicles rely on detailed maps of the environment that contain exhaustive semantic information. This work presents a strategy that uses recent advances in semantic segmentation of images and fuses the information extracted from the camera stream with accurate depth measurements from a Lidar sensor to create large-scale, semantically labeled point clouds of the environment. We fuse the color and semantic data gathered from a round-view camera system with the depth data gathered from a Lidar sensor. In our framework, each Lidar scan point is projected onto the camera stream to extract color and semantic information, while at the same time a large-scale 3D map of the environment is generated by a Lidar-based SLAM algorithm. Although we employed a network that achieved state-of-the-art semantic segmentation results on the Cityscapes dataset [1] (IoU score of 82.1%), using the extracted semantic information alone achieved an IoU score of only 38.9% on 105 manually labeled 5x5 m tiles from 5 different trial runs within the city of Sendai, Japan (this decrease in accuracy is discussed in Section III-B). To increase performance, we reclassify the label of each point. Two approaches were investigated for this: a random forest and SparseConvNet [2] (a deep learning approach). For both methods, we investigated how including the semantic labels from the camera stream affected the classification of the 3D point cloud. We show that doing so yields a significant performance increase: 25.4 percentage points for the random forest (40.0% w/o labels to 65.4% with labels) and 16.6 for SparseConvNet (33.4% w/o labels to 50.8% with labels). Finally, we present practical examples of how semantically enriched maps can be employed for further tasks. In particular, we show how different classes (e.g. cars and vegetation) can be removed from the point cloud to increase the visibility of other classes (e.g. road and buildings), and how the data could be used to extract the trajectories of vehicles and pedestrians.
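The projection and class-removal steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an undistorted pinhole camera (the paper uses a round-view camera system, whose actual model is not given here), assumes the Lidar points have already been transformed into the camera frame, and uses hypothetical names throughout (`project_point`, `label_points`, `remove_classes`, and the intrinsics `fx`, `fy`, `cx`, `cy`).

```python
import math
from typing import List, Optional, Sequence, Set, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the camera frame, z pointing forward


def project_point(p: Point, fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Project one 3D point to the nearest pixel (u, v) with a pinhole model.

    Returns None for points at or behind the image plane (z <= 0).
    """
    x, y, z = p
    if z <= 0.0:
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    # Round to the nearest pixel centre (assumed integer-centred pixels).
    return math.floor(u + 0.5), math.floor(v + 0.5)


def label_points(points: Sequence[Point], label_image: Sequence[Sequence[int]],
                 fx: float, fy: float, cx: float, cy: float,
                 unlabeled: int = -1) -> List[int]:
    """Look up a semantic class ID for each point from a per-pixel label image.

    Points that are behind the camera or fall outside the image get `unlabeled`.
    """
    height, width = len(label_image), len(label_image[0])
    labels = []
    for p in points:
        pixel = project_point(p, fx, fy, cx, cy)
        if pixel is not None and 0 <= pixel[0] < width and 0 <= pixel[1] < height:
            u, v = pixel
            labels.append(label_image[v][u])  # row index is v, column index is u
        else:
            labels.append(unlabeled)
    return labels


def remove_classes(points: Sequence[Point], labels: Sequence[int],
                   classes_to_remove: Set[int]) -> List[Point]:
    """Drop every point whose label is in `classes_to_remove` (e.g. cars)."""
    return [p for p, l in zip(points, labels) if l not in classes_to_remove]
```

With a label image produced by a segmentation network, `label_points` attaches a class ID to each scan point, and `remove_classes` then filters out classes such as cars or vegetation. A real pipeline would additionally need the Lidar-to-camera extrinsics, lens distortion handling, and occlusion checks, none of which are covered by this sketch.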

Original language: English
Title of host publication: 2019 IEEE Intelligent Transportation Systems Conference, ITSC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 8
ISBN (Electronic): 9781538670248
Publication status: Published - 2019 Oct
Event: 2019 IEEE Intelligent Transportation Systems Conference, ITSC 2019 - Auckland, New Zealand
Duration: 2019 Oct 27 - 2019 Oct 30

Publication series

Name: 2019 IEEE Intelligent Transportation Systems Conference, ITSC 2019


Conference: 2019 IEEE Intelligent Transportation Systems Conference, ITSC 2019
Country/Territory: New Zealand

ASJC Scopus subject areas

  • Artificial Intelligence
  • Management Science and Operations Research
  • Instrumentation
  • Transportation

