Video Registration to SfM Models
T. Kroeger, L. Van Gool, ECCV 2014.
[Pdf] [Supplementary Material]


Abstract: Registering image data to Structure from Motion (SfM) point clouds is widely used to find precise camera location and orientation with respect to a world model. In the case of videos, one constraint has previously been unexploited: temporal smoothness. Without temporal smoothness, the magnitude of the pose error in each frame of a video will often dominate the magnitude of frame-to-frame pose change. This hinders the application of methods requiring stable pose estimates (e.g. tracking, augmented reality). We incorporate temporal constraints into the image-based registration setting and solve the problem by pose regularization with model fitting and smoothing methods. This leads to accurate, gap-free and smooth poses for all frames. We evaluate different methods on challenging synthetic and real street-view SfM data for varying scenarios of motion speed, outlier contamination, pose estimation failures and 2D-3D correspondence noise. For all test cases, a 2- to 60-fold reduction in root mean squared (RMS) positional error is observed, depending on pose estimation difficulty. For varying scenarios, different methods perform best. We give guidance on which methods should be preferred depending on circumstances and requirements.
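
The effect of temporal smoothing on per-frame pose noise can be illustrated with a small sketch. It is not one of the methods evaluated in the paper: it applies a plain centered moving average to camera positions only (orientation is ignored), and the trajectory, noise level and window size are made-up values chosen for illustration. The function names are hypothetical.

```python
import math
import random


def moving_average(track, half_window):
    """Smooth a list of (x, y, z) positions with a centered moving average.

    Near the ends of the track the window shrinks symmetrically, so a
    noise-free constant-velocity track is left unchanged.
    """
    n = len(track)
    smoothed = []
    for i in range(n):
        half = min(half_window, i, n - 1 - i)
        window = track[i - half:i + half + 1]
        smoothed.append(
            tuple(sum(p[d] for p in window) / len(window) for d in range(3))
        )
    return smoothed


def rms_position_error(estimated, ground_truth):
    """Root mean squared Euclidean distance between two position tracks."""
    squared = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


def simulate(num_frames=301, step=0.1, noise_sigma=0.5, seed=0):
    """Illustrative data: straight-line motion plus independent Gaussian noise.

    The per-frame noise (0.5) is deliberately larger than the per-frame
    motion (0.1), mimicking the situation described in the abstract.
    """
    rng = random.Random(seed)
    ground_truth = [(step * i, 0.0, 0.0) for i in range(num_frames)]
    noisy = [
        tuple(c + rng.gauss(0.0, noise_sigma) for c in p) for p in ground_truth
    ]
    return ground_truth, noisy


ground_truth, noisy = simulate()
smoothed = moving_average(noisy, half_window=2)
raw_rms = rms_position_error(noisy, ground_truth)
smooth_rms = rms_position_error(smoothed, ground_truth)
print("RMS error before smoothing:", raw_rms)
print("RMS error after smoothing: ", smooth_rms)
```

Under these assumptions the smoothed track has a lower RMS positional error than the raw per-frame estimates, because averaging neighbouring frames suppresses independent noise while leaving the slow underlying motion intact.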

Antwerp Street-View Dataset

The dataset consists of 4 sets of videos. Each set consists of 12 videos of 301 frames each. The 12 videos were captured by cameras rigidly mounted on a van. The rigid camera layout can be seen in the top right figure. Example images are shown in the bottom right figure.

Together with the video frames, an SfM (Structure from Motion) model is provided for each video set. The SfM model was created from a subset of the 12 cameras (blue cameras in figure). For registration we use the remaining cameras (red cameras in figure).

Precise 6-DoF ground truth is available for all 12 cameras. In total, we use 12 videos (12 × 301 = 3612 frames) for registration evaluation.

We include MATLAB example code to display the point cloud, cameras and original images.

The dataset is free for academic use, provided that you do not redistribute the dataset, and that you render unidentifiable all personal information (faces, license plates, etc.) in all publications. This is captured in the Confidentiality Agreement, which you should read, understand and sign. Once a (hand-)signed and scanned/photographed copy has been sent to us, we will provide you with a download link.


We provide a full pipeline to match a query video (given as input .jpg frames) to a given SfM model. The SfM model can be provided either as a .nvm file (the output of VisualSfM, together with the original images) or taken from the Antwerp Street-View Dataset.
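
For readers unfamiliar with the .nvm input, the following minimal sketch reads the cameras and 3D points of the first model in such a file. It assumes the NVM_V3 text layout documented for VisualSfM (per camera: file name, focal length, quaternion WXYZ, camera center, radial distortion, trailing 0; per point: XYZ, RGB, number of measurements, then image index, feature index and x, y for each measurement). The names parse_nvm, NvmCamera and NvmPoint are hypothetical and are not part of the released pipeline; the sample data is invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NvmCamera:
    filename: str
    focal_length: float
    quaternion_wxyz: Tuple[float, ...]
    center: Tuple[float, ...]
    radial_distortion: float


@dataclass
class NvmPoint:
    xyz: Tuple[float, ...]
    rgb: Tuple[int, ...]
    # Each measurement is (image index, feature index, x, y).
    measurements: List[Tuple[int, int, float, float]]


def parse_nvm(text):
    """Parse the first model of an NVM_V3 file given as a string."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("NVM_V3"):
        raise ValueError("expected an NVM_V3 header")

    pos = 1
    num_cameras = int(lines[pos])
    pos += 1
    cameras = []
    for ln in lines[pos:pos + num_cameras]:
        # Split from the right so file names containing spaces survive.
        parts = ln.rsplit(None, 10)
        if len(parts) != 11:
            raise ValueError("malformed camera line: " + ln)
        vals = [float(v) for v in parts[1:]]
        cameras.append(
            NvmCamera(parts[0], vals[0], tuple(vals[1:5]),
                      tuple(vals[5:8]), vals[8])
        )
    pos += num_cameras

    num_points = int(lines[pos])
    pos += 1
    points = []
    for ln in lines[pos:pos + num_points]:
        tok = ln.split()
        xyz = tuple(float(v) for v in tok[0:3])
        rgb = tuple(int(v) for v in tok[3:6])
        measurements = []
        for k in range(int(tok[6])):
            b = 7 + 4 * k
            measurements.append(
                (int(tok[b]), int(tok[b + 1]),
                 float(tok[b + 2]), float(tok[b + 3]))
            )
        points.append(NvmPoint(xyz, rgb, measurements))
    return cameras, points


# Invented two-camera, one-point example in the assumed layout.
SAMPLE_NVM = """NVM_V3

2
frame_000.jpg 1000.0 1 0 0 0 0.0 0.0 0.0 0.0 0
frame_001.jpg 1000.0 1 0 0 0 0.5 0.0 0.0 0.0 0

1
1.0 2.0 3.0 255 128 0 2 0 10 12.5 -3.0 1 11 10.0 -3.5
"""

cameras, points = parse_nvm(SAMPLE_NVM)
print(len(cameras), "cameras,", len(points), "points")
```

To read a real file, pass its contents as a string, for example `parse_nvm(open(path).read())`. Any further models in the file are ignored by this sketch.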

The code requires VLFeat, the ASPnP toolbox and our Bundle Adjustment for Matlab library (optional).

The pipeline is implemented in Matlab / C++ and tested on Debian 7.0 Linux. Re-running and evaluating the video registration of our paper on the Antwerp Street-View Dataset may yield results that differ slightly from those reported, because parts of the pipeline were re-implemented due to licensing issues.

The code is provided "as is" with no guarantee for functionality or support. Use at your own risk.