27 October 2019, Seoul, Korea

FIRE 2019

From Image Restoration to Enhancement and Beyond


in conjunction with ICCV 2019


Recent years have shown great interest and brought tremendous advances in image and video restoration and enhancement. A large number of solutions have been proposed, ranging from handcrafted designs to fully learned and generative models. Gradually, the focus in image restoration has shifted from improved fidelity of the results to improved perceptual quality. At the same time, the studied corruptions have departed from standard synthetic/artificial corruptions in controlled environments to fully realistic, in-the-wild settings -- fertile soil for developing semi- and unsupervised solutions.

This tutorial covers the current state of the art in image and video restoration and enhancement, with applications to autonomous driving and smartphone cameras. Moreover, it will convey the importance of restoration and enhancement for subsequent higher-level computer vision tasks.


Radu Timofte

ETH Zurich

Title: From Paired to Unpaired Visual Domain Translation and Beyond

Abstract: Image and video restoration and enhancement tasks can be seen as image-to-image and video-to-video translations, respectively. We will review the literature, going through the fully supervised setting (when corresponding pairs of images/videos are available for training) and the semi-supervised and unsupervised settings (when only unpaired images/videos are available). We will review representative visual domain translators such as pix2pix, CycleGAN, ComboGAN, StarGAN, MUNIT, and SMIT, as well as very recent video translation methods including vid2vid, RecycleGAN, and UVIT, and a selection of other recent developments.
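The unpaired setting above hinges on cycle consistency: mapping an image to the other domain and back should reproduce the original. A minimal numpy sketch of that objective, with toy invertible "generators" standing in for the learned networks (all names here are illustrative, not taken from any of the cited papers):

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 cycle loss: x -> G(x) -> F(G(x)) should return to x, and
    y -> F(y) -> G(F(y)) should return to y."""
    loss_x = np.abs(F(G(x)) - x).mean()
    loss_y = np.abs(G(F(y)) - y).mean()
    return loss_x + loss_y

# Toy "generators": exact inverses, so the cycle loss is ~0.
G = lambda x: 2.0 * x + 1.0      # domain X -> Y
F = lambda y: (y - 1.0) / 2.0    # domain Y -> X
x = np.random.rand(4, 8, 8)      # unpaired samples from X
y = np.random.rand(4, 8, 8)      # unpaired samples from Y
print(cycle_consistency_loss(x, y, G, F))  # ~0 for exact inverses
```

In CycleGAN-style training, this cycle term is added to the adversarial losses of both generators, which is what removes the need for paired samples.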

Bio: Radu Timofte is a lecturer and research group leader in the Computer Vision Laboratory at ETH Zurich, Switzerland. He obtained a PhD degree in Electrical Engineering at KU Leuven, Belgium in 2013, the MSc degree at the Univ. of Eastern Finland in 2007, and the Dipl. Eng. degree at the Technical Univ. of Iasi, Romania in 2006. He serves as a reviewer for top journals (such as TPAMI, TIP, IJCV, TNNLS, TCSVT, CVIU, PR) and conferences (ICCV, CVPR, ECCV, NeurIPS, ICLR), as an area editor for Elsevier's CVIU journal (from 2017), and as an associate editor for SIAM's SIIMS (from 2020). He served as area chair for ACCV 2018 and ICCV 2019, and as SPC for IJCAI 2019 and 2020. He received a NIPS 2017 best reviewer award. His work received several awards, including a best scientific paper award at ICPR 2012, the honorable mention award at FG 2017, and the best student paper award at BMVC 2019, and his team won a number of challenges including traffic sign detection (IJCNN 2013) and apparent age estimation (ICCV 2015). He is a co-founder of Merantix and a co-organizer of the NTIRE, CLIC, PIRM and AIM events. His current research interests include deep learning, augmented perception, domain translation, and image/video compression, manipulation, restoration and enhancement.

Shuhang Gu

ETH Zurich

Title: Towards Practical Image Restoration and Enhancement

Abstract: In the last decades, tremendous advances in image restoration and enhancement tasks such as denoising and super-resolution have been achieved using neural networks. Such approaches generally employ very deep architectures to capture the mapping function between degraded and latent target images. Despite their good restoration performance on conventional benchmarks, the heavy computational burden and limited generalization capacity hinder the deployment of these networks in some real application scenarios. Moreover, substantial research in image quality assessment has shown that the conventional PSNR metric aligns poorly with human perceptual quality. Recent improvements in PSNR scores on standard benchmarks did not introduce equally significant visual quality improvements. In this tutorial, we will introduce some recent works which study the image restoration and enhancement problem in practical scenarios. Concretely, we will introduce three branches of work: efficient restoration models, new restoration scenarios, and the shift from fidelity to visual quality enhancement.
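As a concrete reference point for the fidelity-versus-perception discussion, PSNR is simply a log-scaled mean squared error. A minimal numpy implementation (the function name is ours):

```python
import numpy as np

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB: a fidelity metric, not a perceptual one."""
    mse = np.mean((np.asarray(reference, dtype=np.float64)
                   - np.asarray(distorted, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# A uniform offset of 16 gray levels on an 8-bit image gives about 24.05 dB.
ref = np.zeros((8, 8))
print(psnr(ref, ref + 16.0))
```

Because very different error patterns can yield the same MSE, two images with equal PSNR can look very different to a human observer, which is exactly the misalignment with perceptual quality mentioned above.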

Bio: Shuhang Gu received the B.E. degree from the School of Astronautics, Beijing University of Aeronautics and Astronautics, China, in 2010, the M.E. degree from the Institute of Pattern Recognition and Artificial Intelligence, Huazhong University of Science and Technology, China, in 2013, and the Ph.D. degree from the Department of Computing, The Hong Kong Polytechnic University, in 2017. He currently holds a post-doctoral position at ETH Zurich, Switzerland. His research interests include image restoration, enhancement and compression.

Martin Danelljan

ETH Zurich

Title: Learning to enhance in the real world

Abstract: Many image enhancement and restoration tasks, such as image denoising and super-resolution, suffer from the unavailability or scarcity of true ground-truth data. This severely complicates training and evaluation of methods in real settings. Instead of addressing these problems directly, the primary focus of research has been on artificially generated paired data. For example, in the case of image denoising, paired data is most commonly obtained by adding white Gaussian noise to clean images. Similarly, bicubic downsampling is applied in the context of super-resolution to obtain the corresponding low-resolution image. However, these image degradation techniques only serve as coarse approximations of their real counterparts. In reality, the degradation process is far more complex and often unknown. For example, bicubic downsampling significantly alters the image characteristics by reducing noise and other high-frequency content present in real images. Image enhancement and restoration methods trained in such artificial conditions therefore cannot be expected to generalize to the real setting. We will study the problems induced by the artificial setting when models are applied to real data. We will then review methods for semi- and unsupervised image enhancement and restoration that aim at reducing or eliminating the need for artificially created paired data, with a particular focus on the problem of real-world super-resolution.
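The artificial degradation pipelines mentioned above are easy to state explicitly, which is part of the problem: they are simple, known, and fixed, unlike real degradations. A sketch of the two standard recipes (AWGN for denoising pairs, downsampling for super-resolution pairs); the 2x block average below is a dependency-free stand-in for the bicubic kernel typically used in practice, and all function names are ours:

```python
import numpy as np

def add_awgn(img, sigma=25.0, rng=None):
    """Fabricate a denoising pair: add white Gaussian noise of std sigma."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = np.asarray(img, dtype=np.float64) + rng.normal(0.0, sigma, np.shape(img))
    return np.clip(noisy, 0.0, 255.0)

def downsample_x2(img):
    """Fabricate a super-resolution pair by 2x block averaging
    (bicubic filtering is the usual choice; this keeps the sketch simple)."""
    h, w = img.shape
    img = img[:h - h % 2, :w - w % 2]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

hr = np.arange(16, dtype=np.float64).reshape(4, 4)  # toy "clean" image
lr = downsample_x2(hr)                              # low-res training input
noisy = add_awgn(hr, sigma=25.0)                    # noisy training input
```

The downsampling step in particular smooths away the noise and fine texture present in real low-resolution captures, which is precisely the train/test mismatch the abstract describes.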

Bio: Martin Danelljan is a postdoctoral researcher at ETH Zurich, Switzerland. He received his Ph.D. degree from Linköping University, Sweden in 2018. His Ph.D. thesis was awarded the biennial Best Nordic Thesis Prize at SCIA 2019. His main research interests are online and meta-learning methods for visual tracking and video object segmentation, deep probabilistic models for image generation, and machine learning with no or limited supervision. His research in the field of visual tracking, in particular, has attracted much attention. In 2014, he won the Visual Object Tracking (VOT) Challenge and the OpenCV State-of-the-Art Vision Challenge. Furthermore, he achieved top ranks in the VOT2016 and VOT2017 challenges. He received the best paper award at ICPR 2016 and the best student paper award at BMVC 2019.

Dengxin Dai

ETH Zurich

Title: Semantic understanding under adverse conditions for autonomous driving

Abstract: Adverse weather or illumination conditions create visibility problems for both people and the sensors that power automated systems. While sensors and the downstream vision algorithms are constantly getting better, their performance is mainly benchmarked with respect to clear-weather images. However, in many outdoor applications, including autonomous driving, the ability to robustly cope with "bad" weather conditions is absolutely essential. One typical example of adverse weather conditions is fog, which degrades the visibility of a scene significantly. The denser the fog, the more severe this problem becomes. During the past years, the community has made tremendous progress on image dehazing (defogging) and image enhancement to increase the visibility of foggy and nighttime images. The last few years have also witnessed a leap in semantic object recognition, with a great deal of effort devoted specifically to semantic road scene understanding. However, the extension of these techniques to adverse weather/illumination conditions has not received due attention, despite its importance in outdoor applications. This tutorial will cover recent techniques for extending state-of-the-art semantic understanding algorithms from clear weather to adverse weather/illumination conditions, especially foggy and nighttime driving scenarios.

Bio: Dengxin Dai is a Lecturer and Group Leader working with the Computer Vision Lab at ETH Zurich. In 2016, he obtained his PhD in Computer Vision at ETH Zurich. Since then, he has been the Team Leader of TRACE-Zurich, working on Autonomous Driving within the R&D project "TRACE: Toyota Research on Automated Cars in Europe". His research interests lie in autonomous driving, robust perception in adverse weather and illumination conditions, automotive sensors and computer vision under limited supervision. He has been an organizer of the CVPR'19 workshop Vision for All Seasons: Bad Weather and Nighttime and the ICCV'19 workshop Autonomous Driving. He has been a program committee member of several major computer vision conferences and received multiple outstanding reviewer awards. He is a guest editor for the IJCV special issue Vision for All Seasons and an area chair for WACV 2020.

Zhiwu Huang & Danda Paudel

ETH Zurich

Title: Image and video quality mapping

Abstract: Despite the many advances in mobile camera technology today, our captured images often still come with limited dynamic range, undesirable color rendition, and unsatisfactory texture sharpness. Images acquired under different conditions, or different parts of a single image, may require separate enhancement operations. In this context, the customarily used context/content-agnostic enhancement methods often lead to poor performance in the overall visual assessment. Therefore, comprehensive methods that improve the perceptual quality of images are in high demand. In this talk, I will introduce the literature on deep image quality mapping methods, including supervised methods like HDRNet, UPE, and DPED, and weakly-supervised ones like WESPE and DPE. Fully supervised methods often lack flexibility towards new domain adaptation, thus requiring low-high paired acquisitions for every low-end camera. In contrast, weakly-supervised methods merely require a set of target good-quality images, without requiring image acquisitions of the same scenes. However, few of them work well for high-resolution image and video enhancement. To fill this gap, I will talk about our recent work on a divide-and-conquer inspired adversarial learning (DACAL) approach for high-resolution image and video enhancement. The key idea is to decompose the photo enhancement process into hierarchically multiple sub-problems, which can be better conquered from the bottom up. While considering all hierarchies, we develop multiscale and recurrent training approaches to optimize the image and video enhancement process in a weakly-supervised manner. Both quantitative and qualitative results clearly demonstrate that our proposed DACAL achieves state-of-the-art performance for high-resolution image and video enhancement.

Bio: Zhiwu Huang is currently a postdoctoral researcher in the Computer Vision Lab, ETH Zurich, Switzerland. He received the PhD degree from the Institute of Computing Technology, Chinese Academy of Sciences, in 2015. His main research interest is in human-focused video analysis with Riemannian manifold networks and Wasserstein generative models.
Danda Pani Paudel is a postdoctoral researcher in the Computer Vision Lab at ETH Zurich. His research interests include low-level vision, unsupervised learning, and dynamic scene reconstruction and understanding. Currently, he is working in the field of image and video enhancement using weak supervision methods. He received his PhD in 2015 and a Master's degree in computer vision in 2012, both from the University of Bourgogne, France.

Robby T. Tan

Yale-NUS College & NUS

Title: Deraining and droplet removal

Abstract: Rain produces undesirable visual artefacts that can significantly impair visibility, causing many computer vision systems, such as self-driving cars, surveillance systems, and autonomous drones, to break down. Rain introduces artefacts in the form of rain streaks, rain accumulation/veiling effect (visually similar to mist or fog), and raindrops adhered to the camera lens or a car's windscreen. In this tutorial, we intend to discuss how we can restore the background information degraded by these rain artefacts and their compound problems, and thus enhance the visibility of the scenes. To deal with rain streaks and rain accumulation, we will briefly discuss how conventional non-deep-learning methods evolved, and focus more on how recent deep learning methods work. Most current deep learning based methods are trained in a supervised manner, provided with ground truth data. However, obtaining real ground truth data is extremely difficult. Therefore, existing methods rely on rendered synthetic data. The problem with this approach is that synthetic data differs significantly from real data in terms of degradation complexity, background variations, lighting variations, etc. Hence, to resolve the problem of rain streaks and rain accumulation properly, we need to go beyond synthetic training. Aside from rain streaks and rain accumulation, raindrops adhered to a glass window or camera lens can severely hamper the visibility of a background scene and degrade an image considerably. Some non-deep-learning methods have been proposed to deal with adherent raindrops, but the results are inadequate. The problem is intractable: first, the regions occluded by raindrops are not given; second, the information about the background scene in the occluded regions is, for the most part, completely lost. To resolve the problem, a state-of-the-art method applies an attentive generative network using adversarial training.
The main idea is to inject visual attention into both the generative and discriminative networks. During the training, the visual attention learns about raindrop regions and their surroundings. Hence, by injecting this information, the generative network will pay more attention to the raindrop regions and the surrounding structures, and the discriminative network will be able to assess the local consistency of the restored regions.
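One simple way attention can guide restoration is by focusing the reconstruction objective itself on the flagged regions. A toy numpy sketch of an attention-weighted L1 loss in that spirit (the weighting scheme and the parameter `alpha` are our illustrative choices, not the paper's exact objective):

```python
import numpy as np

def attentive_l1_loss(restored, target, attention, alpha=2.0):
    """L1 loss that up-weights pixels the attention map (values in [0, 1])
    marks as likely raindrop regions, so errors there cost more."""
    weights = 1.0 + alpha * attention
    return float(np.mean(weights * np.abs(restored - target)))

restored = np.full((4, 4), 1.0)
target = np.zeros((4, 4))
no_attn = np.zeros((4, 4))    # no pixel flagged: plain L1 everywhere
drop_attn = np.ones((4, 4))   # every pixel flagged as raindrop
print(attentive_l1_loss(restored, target, no_attn))    # 1.0
print(attentive_l1_loss(restored, target, drop_attn))  # 3.0
```

In the actual attentive GAN, the attention map is produced by a recurrent module and injected into both the generative and discriminative networks, as described above.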

Bio: Robby T. Tan is an Associate Professor at Yale-NUS College and also at the Electrical and Computer Engineering Department, NUS. Before coming to Singapore, he was an Assistant Professor at Utrecht University in the Netherlands, a research associate at Imperial College London, and a research scientist at NICTA/Australian National University. He received his PhD degree in Computer Science from the University of Tokyo, Japan. He organized the Emerging Topics on Image Restoration and Enhancement (IREw) workshop in conjunction with ACCV 2014, and the Workshop on Vision for All Seasons: Bad Weather and Nighttime in conjunction with CVPR 2019. He was an area chair for ACCV 2010 and ACCV 2018. He also served as a publication chair for ECCV 2016 and regularly serves as a program committee member for CVPR/ICCV/ECCV. His work on dehazing at CVPR 2008 is regarded as pioneering work in the single image dehazing literature. His research focuses on the areas of bad weather/nighttime and physics-based vision.