Bin Zhao

Semester Work
Supervisors: Dr. Stamatios Georgoulis, Anton Obukhov, and Prof. Dr. Luc van Gool

Leveraging Depth Cues for Semantic Segmentation

With the advent of deep learning, most computer vision problems, including semantic segmentation, are now tackled with deep convolutional neural networks. To obtain high-quality segmentation results, a conditional random field (CRF) is often applied as a post-processing step. However, the relation among pixels is usually modeled in terms of color consistency, which may be unreliable under certain conditions. In addition, recent advances in multitask learning indicate that several tasks can benefit from joint training. Inspired by these observations, we propose using depth cues, which can be regarded as extra supervision, to improve segmentation performance. The intuition is that objects such as cars, people, and buildings at a given distance have similar depth values, which may help semantic segmentation. We focus on integrating depth cues into CRF-based approaches. In particular, we propose a novel depth kernel as a new pairwise potential term that can be used together with the appearance kernel. We evaluate our methods on two public datasets, Cityscapes and Synscapes, and train two baseline models for comparison. We conduct experiments with CRF post-processing and with fine-tuning the model using a CRF-inspired regularized loss. To assess the influence of the depth constraint, we perform all experiments in three flavors: color constraint only, depth constraint only, or both. The results show significant performance improvements over the baseline models when depth cues are used. Another crucial finding is that the depth constraint is as effective as the color constraint, whether it is used in CRF post-processing or in the CRF-regularized loss. Combining the two may yield further improvements. Our implementation can be found at
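
Independently of the implementation referenced above, the idea of a depth kernel used alongside an appearance kernel can be illustrated with a minimal sketch. It assumes the depth kernel has the same Gaussian form as the appearance kernel of a standard dense CRF, with the color difference replaced by a depth difference; the abstract does not state the exact form, so this is an assumption. The function names, the bandwidths `theta_pos`, `theta_color`, `theta_depth`, and the weights are hypothetical and chosen only for illustration.

```python
import numpy as np


def pairwise_kernel(pos_i, pos_j, feat_i, feat_j, theta_pos, theta_feat, weight=1.0):
    """Gaussian pairwise kernel over pixel position and one feature.

    The feature is a color vector for an appearance kernel or a scalar
    depth value for a depth kernel (assumed form, not taken from the report).
    """
    pos_i = np.asarray(pos_i, dtype=float)
    pos_j = np.asarray(pos_j, dtype=float)
    feat_i = np.atleast_1d(np.asarray(feat_i, dtype=float))
    feat_j = np.atleast_1d(np.asarray(feat_j, dtype=float))

    # Squared Euclidean distances in position space and feature space.
    d_pos = np.sum((pos_i - pos_j) ** 2)
    d_feat = np.sum((feat_i - feat_j) ** 2)

    return weight * np.exp(-d_pos / (2.0 * theta_pos ** 2)
                           - d_feat / (2.0 * theta_feat ** 2))


def combined_kernel(pos_i, pos_j, color_i, color_j, depth_i, depth_j,
                    theta_pos=5.0, theta_color=13.0, theta_depth=10.0,
                    w_color=1.0, w_depth=1.0):
    """Sum of an appearance (color) kernel and a depth kernel.

    Setting w_color or w_depth to zero gives the 'depth only' or
    'color only' flavor; all default values are illustrative assumptions.
    """
    appearance = pairwise_kernel(pos_i, pos_j, color_i, color_j,
                                 theta_pos, theta_color, w_color)
    depth = pairwise_kernel(pos_i, pos_j, depth_i, depth_j,
                            theta_pos, theta_depth, w_depth)
    return appearance + depth
```

Under these assumptions, two nearby pixels with similar depth receive a large kernel value, so the CRF is encouraged to give them the same label, while a large depth gap shrinks the value even when the colors match.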