Supervisors: Dr. Wen Li, Yuhua Chen
In this work, we investigate the image colorization task with user interaction in the form of natural language. Given a grayscale image and a language description, our aim is to learn a model that automatically generates a realistic colorized version of the input image, such that the colorization aligns with the description. We propose a neural network consisting of three major components: a classification-based colorization model, a language-guided visual attention module, and a semantic segmentation multitasking module. The classification-based colorization model is the backbone of our colorization network. On top of it, we build a language-guided visual attention module to handle language interaction, and a semantic segmentation multitasking module to improve overall colorization quality. The language-guided visual attention module produces both channel-wise and spatial attention from the input language and uses them to modulate visual features from the colorization model. Meanwhile, the semantic segmentation module shares parameters with the colorization model and incorporates high-level semantics into it via multitasking. Our network is trained and evaluated on the COCO dataset. It generates language-guided colorization results with high perceptual quality and outperforms the state-of-the-art method. Both qualitative and quantitative results demonstrate the effectiveness of the proposed method on the language-guided image colorization task.
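The language-guided attention described above can be sketched as follows. This is a minimal illustrative NumPy sketch, not the thesis's actual implementation: the array shapes, the projection matrices `W_c` and `W_s`, and the specific gating/softmax choices are all assumptions made here to show how a channel-wise gate and a spatial attention map derived from a language embedding could modulate a visual feature map.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def language_guided_attention(feat, lang, W_c, W_s):
    """Modulate visual features with language-derived attention (illustrative).

    feat: (C, H, W) feature map from the colorization backbone
    lang: (D,)      sentence embedding of the color description
    W_c:  (C, D)    projection producing the channel-wise attention (assumed)
    W_s:  (C, D)    projection producing the spatial-attention query (assumed)
    """
    C, H, W = feat.shape
    # Channel-wise attention: one gate per feature channel in (0, 1)
    chan = sigmoid(W_c @ lang)                       # (C,)
    # Spatial attention: score each location against a language query vector
    query = W_s @ lang                               # (C,)
    scores = np.einsum('c,chw->hw', query, feat)     # (H, W)
    spatial = softmax(scores.ravel()).reshape(H, W)  # normalized over locations
    # Apply both attentions multiplicatively to the feature map
    return feat * chan[:, None, None] * spatial[None, :, :]
```

In a trained network, `W_c` and `W_s` would be learned jointly with the colorization backbone, and the modulated features would be fed to the subsequent colorization layers.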