Supervisors: Vaishakh Patil, Simon Hecker, Prof. Konrad Schindler, Prof. Luc Van Gool
Driving in a city involves many moving objects. Knowledge of their motion, e.g. their motion vectors in 3D space, provides valuable information for deciding the next driving action. In this thesis, we infer object motion from a video sequence captured by a moving camera. We propose a pipeline based on view synthesis that solves this problem in a self-supervised manner. Two networks predict the pixel-wise depth and the 3D scene flow map of the input image. Evaluation on the KITTI dataset demonstrates the effectiveness of our approach for estimating 3D object motion.
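To make the view-synthesis idea concrete, the following is a minimal sketch of the geometric step such a pipeline typically relies on: each pixel is back-projected with its depth, displaced by its 3D scene flow, moved by the camera motion, and projected into the next frame. It is not taken from the thesis. The function name `reproject`, the pinhole intrinsics `K`, the relative camera pose `(R, t)`, and the convention that scene flow is expressed in the source camera frame are all assumptions made for illustration.

```python
import numpy as np

def reproject(depth, flow3d, K, R, t):
    """Map every pixel of the source frame to its predicted location in the next frame.

    Hypothetical helper, not the thesis implementation.
    depth:  (h, w) per-pixel depth
    flow3d: (h, w, 3) per-pixel 3D scene flow, assumed to be in the source camera frame
    K:      (3, 3) pinhole intrinsics (assumed known)
    R, t:   (3, 3) rotation and (3,) translation from source to next camera (assumed known)
    """
    h, w = depth.shape
    # Homogeneous pixel grid, shape (3, h*w), in row-major order.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)], axis=0).astype(np.float64)
    # Back-project to 3D points in the source camera frame using the depth map.
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    # Apply per-pixel object motion (scene flow), then the camera motion.
    pts = pts + flow3d.reshape(-1, 3).T
    pts = R @ pts + t.reshape(3, 1)
    # Project back onto the image plane of the next frame.
    proj = K @ pts
    uv = proj[:2] / proj[2:3]
    return uv.T.reshape(h, w, 2)
```

In a self-supervised setup of this kind, the returned coordinates would be used to sample the next frame and compare the synthesized image with the source frame, so that the photometric error can supervise both the depth and the scene flow predictions without ground-truth labels.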