Robust deep fusion models for self-driving cars




Kim, Taewan




Deep learning algorithms have been adopted in applications such as self-driving cars and healthcare for their strong performance. In such fields, trustworthy models are indispensable to practical systems because their decisions directly affect human lives. Utilizing multiple input sources is an effective and natural way to improve a deep model's accuracy and robustness, because both complementary and shared information can be extracted from different sensors. In this dissertation, we focus on developing deep fusion models for a self-driving car's perception system. First, we introduce a novel deep sensor-fusion convolutional neural network (CNN) architecture for detecting road users that is robust against natural perturbations. LIDAR (Light Detection and Ranging), a laser-based sensor, is selected as a second input source to supplement the shortcomings of an RGB camera. Additional object proposals help the detector achieve higher accuracy in finding and localizing road users such as cars, pedestrians, and cyclists. Our algorithm further exploits this advantage of LIDAR and shows improved robustness under different lighting conditions. Next, we develop a CNN-based pedestrian detection model that additionally predicts depth. The proposed algorithm learns a joint feature representation from both RGB and LIDAR data to overcome an inherent limitation of a single-sensor framework, i.e., the lack of depth information in an RGB image. Our simplified task and direct fusion strategy enable real-time prediction. We then introduce a newly collected pedestrian detection dataset with distinctive characteristics to test our architecture. Finally, we investigate learning fusion algorithms that are robust against noise added to a single source. We first demonstrate that robustness against corruption in a single source is not guaranteed in a linear fusion model.
Motivated by this observation, we propose two approaches to increase robustness: a carefully designed loss with corresponding training algorithms for deep fusion models, and a simple convolutional fusion layer whose structure offers an advantage in dealing with noise. Experimental results show that both the training algorithms and our fusion layer make a deep fusion-based 3D object detector robust against noise applied to a single source, while preserving the original performance on clean data.
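The fragility of linear fusion described above can be illustrated with a toy example (the weights, feature dimensions, and noise model below are invented for illustration and are not the dissertation's actual detector): for a linear fusion f(x, y) = w_rgb·x + w_lidar·y, corrupting only one source shifts the output by an amount that grows without bound in the noise level, no matter how clean the other source is.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned weights for a linear (late) fusion of two sensor
# streams; purely illustrative.
w_rgb = rng.normal(size=16)
w_lidar = rng.normal(size=16)

def linear_fusion(x_rgb, x_lidar):
    """Fused detection score: w_rgb . x_rgb + w_lidar . x_lidar."""
    return w_rgb @ x_rgb + w_lidar @ x_lidar

x_rgb = rng.normal(size=16)    # clean camera features
x_lidar = rng.normal(size=16)  # clean LIDAR features
clean = linear_fusion(x_rgb, x_lidar)

# Corrupt ONLY the camera stream along a fixed direction d. The output error
# is exactly sigma * |w_rgb . d|: linear in the noise level, and independent
# of the (still clean) LIDAR stream, which therefore cannot compensate.
d = np.ones(16)
errors = [abs(linear_fusion(x_rgb + sigma * d, x_lidar) - clean)
          for sigma in (0.1, 1.0, 10.0)]
print(errors)  # strictly increasing with sigma
```

This is one way to see why a linear fusion model offers no robustness guarantee against single-source corruption, motivating the robust training objectives and the convolutional fusion layer studied in the final part of the dissertation.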

