Accurate detection and 3D localization of humans using a novel YOLO-based RGB-D fusion approach and synthetic training data

Timm Linder, Kilian Y. Pfeiffer, Narunas Vaskevicius, Robert Schirmer, and Kai O. Arras
Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2020

Abstract

While 2D object detection has made significant progress, robustly localizing objects in 3D space in the presence of occlusion is still an unresolved issue. Our focus in this work is on real-time detection of human 3D centroids in RGB-D data. We propose an image-based detection approach which extends the YOLOv3 architecture with a 3D centroid loss and mid-level feature fusion to exploit complementary information from both modalities. We employ a transfer learning scheme which can benefit from existing large-scale 2D object detection datasets, while at the same time learning end-to-end 3D localization from our highly randomized, diverse synthetic RGB-D dataset with precise 3D ground truth. We further propose a geometrically more accurate depth-aware crop augmentation for training on RGB-D data, which helps to improve 3D localization accuracy. In experiments on our challenging intralogistics dataset, we achieve state-of-the-art performance even when learning 3D localization just from synthetic data.
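
For readers unfamiliar with what a "3D centroid" means for an image-based detector, the following minimal sketch shows the standard pinhole-camera relation between a pixel location, a metric depth value, and a 3D point in the camera frame. It is not the paper's network, loss, or augmentation; the names `Intrinsics` and `backproject` and the intrinsic values are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical example type)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point, x
    cy: float  # principal point, y


def backproject(u: float, v: float, depth_m: float,
                k: Intrinsics) -> tuple[float, float, float]:
    """Map pixel (u, v) with metric depth to a 3D point in the camera frame."""
    if depth_m <= 0.0:
        raise ValueError("depth must be positive")
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


# Assumed intrinsics for a 640x480 sensor; not taken from the paper.
k = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# A detection centered 100 px right of the principal point, 2 m away.
centroid = backproject(u=420.0, v=240.0, depth_m=2.0, k=k)
print(centroid)
```

In this relation the 3D position depends on both the image location and the depth, which is one way to see why an image crop that shifts or rescales pixel coordinates also has to be handled consistently on the 3D side.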

@InProceedings{Linder2020,
 author={T. {Linder} and K. Y. {Pfeiffer} and N. {Vaskevicius} and R. {Schirmer} and K. O. {Arras}},
 booktitle={2020 IEEE International Conference on Robotics and Automation (ICRA)},
 title={Accurate detection and {3D} localization of humans using a novel {YOLO}-based {RGB-D} fusion approach and synthetic training data},
 year={2020},
}