Supervisors: Yuhua Chen
Fully Convolutional Network (FCN) models are widely used in semantic segmentation and have substantially improved the state of the art in recent years. However, this performance typically comes with complicated network architectures, and our understanding of how these models work remains limited. In this work, we analyze what the network learns by visualizing the per-pixel segmentation loss during FCN training. We find that training is dominated by a few pixels with large loss, which we call hard pixels. In addition, the training loss is imbalanced across categories. Based on these observations, we propose to address both issues by integrating online hard example mining (OHEM) into FCN. Our proposed network, FCN-OHEM, converges faster and improves mean IoU by 1.5% on PASCAL VOC 2011.
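The core idea of combining OHEM with a per-pixel loss can be sketched as selecting only the hardest fraction of pixels (those with the largest loss) and averaging over them, so easy pixels contribute no gradient. A minimal NumPy sketch, assuming a dense per-pixel loss map as input; the function name and the `keep_ratio` hyperparameter are illustrative, not taken from the paper:

```python
import numpy as np

def ohem_pixel_loss(per_pixel_loss, keep_ratio=0.25):
    """Online hard example mining over a per-pixel loss map.

    Keeps only the hardest fraction of pixels (largest per-pixel loss)
    and averages their loss; the remaining pixels are masked out, so
    they would contribute no gradient during backpropagation.
    """
    flat = per_pixel_loss.ravel()
    k = max(1, int(flat.size * keep_ratio))  # number of hard pixels to keep
    hard = np.sort(flat)[-k:]                # the k largest per-pixel losses
    return hard.mean()

# Toy 4x4 loss map: most pixels are easy (low loss), two are hard.
loss_map = np.full((4, 4), 0.1)
loss_map[0, 0] = 2.0
loss_map[1, 1] = 3.0
print(ohem_pixel_loss(loss_map, keep_ratio=0.125))  # mean of the two hardest: 2.5
```

In an actual FCN training loop the same selection would be applied to an unreduced cross-entropy map before averaging, which both focuses updates on hard pixels and dampens the dominance of frequent categories.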