Supervisors: Dr. Zhiwu Huang, Dr. Danda Pani Paudel, Prof. Luc Van Gool
With the advent of GANs, significant progress has been made in video synthesis tasks. However, synthesizing videos with high resolution and good quality remains challenging because video appearance and motion can be highly complex. In this work, we explore whether multiple conditions can be applied in video synthesis to obtain videos with higher resolution and better quality. Our method adapts the state-of-the-art video translation model, vid2vid, to another video synthesis task and uses multiple environment cues to enhance it. We achieve this with two key contributions: (a) utilizing vid2vid for future video prediction and (b) incorporating depth and ego-motion as multiple conditions to enhance the generator. We experimentally show that the proposed framework improves the quality of the synthesized video.