On this webpage, we provide result videos and datasets used in the following journal paper:
B. Leibe, K. Schindler, N. Cornelis, and L. Van Gool,
"Coupled Object Detection and Tracking from Static Cameras and
Moving Vehicles",
IEEE Trans. Pattern Analysis
and Machine Intelligence, Vol. 30, No. 10, pp. 1683-1698, 2008.
Sequence #1 (pedestrian crossing) - tracking results
Sequence #2 (pedestrian crossing) - tracking results
Sequence #3 (pedestrian crossing) - tracking results
Sequence #4 (moving vehicle, recorded @ 25fps) - baseline detections,
tracking results.
Sequence #5 (moving vehicle, recorded @ 3fps) - baseline detections,
tracking results.
Please note that the last video was produced with a low confidence threshold, so many false positives are visible. This was done to show where tracks can be generated from the recorded imagery. At the low recording frame rate of only 3 fps, however, it is very difficult to generate good tracks at all, especially for the fast-moving bicyclists.
In the following, we provide the three pedestrian crossing sequences and the first of the two moving-vehicle sequences shown in the paper. Each sequence comes with ground-truth bounding box annotations for the objects to be tracked, as well as a camera calibration. The annotation files for the pedestrian crossing sequences contain bounding box annotations for every fourth frame; the annotation file for the moving-vehicle sequence contains bounding box annotations for every frame of the left camera. If you have questions concerning the data, please contact me.
We deeply appreciate the help of Martin Vogt in annotating this large amount of data.
Sequence #1 (601 frames) - images (18.7 MB), annotations
Sequence #2 (301 frames) - images (10.3 MB), annotations
Sequence #3 (379 frames) - images (57.4 MB), annotations
Sequence #4 (1175 frame pairs) - left camera (213 MB), right camera (218 MB), annotations
Each image archive comes with a subdirectory "maps" containing either a single calibration file "camera.default" (in the case of a static camera), or a separate calibration file "camera.XXXXX" for every frame (in the case of a moving camera). In the latter case, those calibrations were automatically obtained using the Structure-from-Motion approach by Cornelis et al., CVPR'06.
Calibration files contain the calibration for one image at a time (K [3x3], rad [1x3], R [3x3], t [1x3], GP [1x4]), with K the internal calibration, rad the radial distortion coefficients, R and t the external calibration mapping world to camera coordinates (i.e. X_cam = R X_world + t), and GP the ground plane parameters (points x on the plane satisfy GP(1:3)·x - GP(4) = 0). For convenience, we provide the Matlab function read_camera.m, which demonstrates how to read in the camera parameters.
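As an illustration of how these parameters fit together, here is a minimal Python sketch that parses one calibration and projects a world point into the image. It rests on assumptions not stated above: that the file holds 28 whitespace-separated numbers in the listed order (K, rad, R, t, GP) and that the matrices are stored row by row. The function names parse_camera, world_to_pixel, and plane_residual are hypothetical, and the radial distortion is ignored because its model is not specified here. The provided read_camera.m remains the reference for the actual file layout.

```python
def parse_camera(text):
    """Parse one calibration, ASSUMING 28 whitespace-separated numbers
    in the order K (3x3), rad (3), R (3x3), t (3), GP (4), row-major."""
    vals = [float(v) for v in text.split()]
    if len(vals) != 28:
        raise ValueError("expected 28 numbers, got %d" % len(vals))

    def mat3(flat):
        # Assumption: matrices are written row by row.
        return [flat[0:3], flat[3:6], flat[6:9]]

    return {
        "K": mat3(vals[0:9]),     # internal calibration
        "rad": vals[9:12],        # radial distortion (unused in this sketch)
        "R": mat3(vals[12:21]),   # rotation, world -> camera
        "t": vals[21:24],         # translation, world -> camera
        "GP": vals[24:28],        # ground plane parameters
    }


def matvec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def world_to_pixel(cam, X_world):
    """Project a world point: X_cam = R X_world + t, then apply K.
    Radial distortion is deliberately ignored here."""
    X_cam = [a + b for a, b in zip(matvec(cam["R"], X_world), cam["t"])]
    u, v, w = matvec(cam["K"], X_cam)
    return (u / w, v / w)


def plane_residual(cam, X_world):
    """Evaluate GP(1:3).x - GP(4); zero means x lies on the ground plane."""
    normal = cam["GP"][:3]
    return sum(n * x for n, x in zip(normal, X_world)) - cam["GP"][3]
```

The resulting pixel coordinates refer to the image size the calibration was computed for.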
Please note that we rescaled all images to twice their original size for object detection. The calibration files still refer to the original size. Therefore, all image coordinates need to be divided by a factor of 2 prior to applying the calibration. For the static sequences 1-3, the world scale is already expressed in meters. In sequence 4, one world scale unit corresponds to 0.049603 m.
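The two conversions described above can be sketched in a few lines of Python. The helper names (box_to_calibration_pixels, to_meters) and the constant name are hypothetical; only the factor of 2 and the value 0.049603 come from the text.

```python
# Image coordinates were doubled for detection; calibration uses the original size.
IMAGE_RESCALE_FACTOR = 2.0

# Sequence 4 only: meters per world scale unit (sequences 1-3 are already in meters).
SEQ4_METERS_PER_UNIT = 0.049603


def box_to_calibration_pixels(box, factor=IMAGE_RESCALE_FACTOR):
    """Divide all four box coordinates by the rescale factor
    before using them together with the calibration."""
    x1, y1, x2, y2 = box
    return (x1 / factor, y1 / factor, x2 / factor, y2 / factor)


def to_meters(length_in_world_units, meters_per_unit=1.0):
    """Convert a world-space length to meters.
    Use the default of 1.0 for sequences 1-3 and
    SEQ4_METERS_PER_UNIT for sequence 4."""
    return length_in_world_units * meters_per_unit
```

For example, a length of 100 world units in sequence 4 corresponds to roughly 4.96 m.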
The IDL files store the annotations of a sequence. For each image,
the file lists a set of bounding boxes, separated by commas. Each
box is given by two opposite corners, nominally the upper-left and
lower-right corner, but the corners are not necessarily sorted in
that order. A semicolon ends the list of bounding boxes for a single
image; a period ends the file.
"filename": (x1, y1, x2, y2), (x1, y1, x2, y2), ...;
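A minimal Python sketch of a reader for this format is given below. It is not the tool used to create the files; the function name parse_idl is hypothetical, and the sketch assumes that filenames contain no quote characters, that coordinates are plain decimal numbers, and that an image without annotations appears as a quoted filename with no boxes. Because the corners are not guaranteed to be sorted, each box is normalized to (left, top, right, bottom).

```python
import re

# A quoted filename starts each entry.
NAME_RE = re.compile(r'"([^"]+)"')

# One box: four numbers in parentheses, separated by commas.
NUM = r'(-?\d+(?:\.\d+)?)'
BOX_RE = re.compile(
    r'\(\s*' + NUM + r'\s*,\s*' + NUM + r'\s*,\s*' + NUM + r'\s*,\s*' + NUM + r'\s*\)'
)


def parse_idl(text):
    """Return a dict mapping each filename to a list of boxes,
    each normalized to (x_min, y_min, x_max, y_max)."""
    names = list(NAME_RE.finditer(text))
    annotations = {}
    for i, match in enumerate(names):
        # An entry runs from the end of its filename to the next filename.
        end = names[i + 1].start() if i + 1 < len(names) else len(text)
        segment = text[match.end():end]
        boxes = []
        for coords in BOX_RE.findall(segment):
            x1, y1, x2, y2 = (float(c) for c in coords)
            # Corners may be given in any order, so sort them.
            boxes.append((min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)))
        annotations[match.group(1)] = boxes
    return annotations
```

Splitting on quoted filenames rather than on semicolons or periods avoids misreading the period inside a filename extension as the end of the file.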