In the following, I provide a collection of datasets used in my publications. All data is for research purposes only, unless stated otherwise. When using this data, please acknowledge the effort that went into data collection by referencing the corresponding paper. If you have questions concerning the data, please contact me.
Here are the three pedestrian crossing sequences used in our ICCV'07 paper. Each sequence comes with ground-truth bounding box annotations for the objects to be tracked, as well as a camera calibration (1 scale unit = 1 m). The annotation files for the pedestrian crossing sequences contain bounding box annotations for every fourth frame.
Sequence #1 (601 frames) - images (18.7 MB), annotations, tracking results
Sequence #2 (301 frames) - images (10.3 MB), annotations, tracking results
Sequence #3 (379 frames) - images (57.4 MB), annotations, tracking results
Here we provide the test sequences we used for evaluation in our ICCV'07 and CVPR'08 papers on "Depth and Appearance for Mobile Scene Analysis" and "A Mobile Vision System for Robust Multi-Person Tracking". For each sequence, we recorded two synchronized video streams from a pair of cameras mounted onto a child stroller. The images were recorded at a resolution of 640×480 (Bayered) and a frame rate of 15 fps. For each dataset, we provide both video streams, our automatically estimated camera calibration from SfM (1 scale unit = 0.001 m), as well as ground-truth bounding box annotations.
The ICCV'07 data can be downloaded from here. The CVPR'08 data is available here.
The sequence contains 1175 stereo image pairs acquired with a setup mounted on top of a moving vehicle. The stereo setup has a fixed baseline, and the cameras are calibrated internally and with respect to each other. We provide the original images, recorded at 25 fps and a resolution of 360×288 pixels, together with ground-truth bounding box annotations and our automatically estimated camera calibrations from SfM (1 scale unit = 0.049603 m).
The data can be downloaded from here.
The motorbike test set contains 115 images collected from the World Wide Web. Each image contains one or more motorbikes at different scales and in front of difficult backgrounds. Some images depict larger scenes in which the motorbike must be localized; others add to the difficulty by containing occluding elements, such as humans sitting on or posing in front of the motorbike. We provide the original images, together with ground-truth bounding box annotations. This test set has also been included in the PASCAL Object Recognition Database Collection.
The data can be downloaded from here (21.7MB).
Bounding box annotations are available here.
The cow test set consists of 14 different video sequences showing a total of 18 cows walking from right to left in front of different backgrounds and under varying lighting conditions. Some test sequences contain severe interlacing and MPEG-compression artifacts and significant noise. Altogether, the test suite consists of 2217 frames, in which 1682 instances of cows are at least 50% visible (and some more are visible to a smaller extent). The data was acquired by Derek Magee, Univ. of Leeds, and credit should therefore go to him. We have used this data set in our SLCV'04, ICCV'05, and IJCV'08 papers.
The data can be downloaded from here (280MB).
This data set contains the images we used for training our multi-view car detectors in the 3DPVT'06 and CVPR'07 papers. The data is split up into 7 subdirectories for car viewpoints approx. 30 degrees apart (the remaining viewpoints can be obtained by mirroring the provided images). There are between 100 and 150 training images for each viewpoint, except for the side view subdirectory ("az000deg"), which contains about 400 images. All images were extracted from the LabelMe database and normalized to a uniform size (as indicated in the directory name), and the LabelMe annotation polygon was converted into a binary segmentation mask.
The data can be downloaded from here (115MB).
This data set contains the images we used for training our motorbike side-view detector in the ICCV'05, BMVC'06, and IJCV'08 papers. The motorbike set contains 153 images from the CalTech-6 database. They are a subset of the 400 images Fergus et al. used for training in their CVPR'03 paper. The criterion for selecting those 153 images was that they had a roughly uniform background, which made them easy to segment using just the Flood Fill function of standard graphics software. For each image, a figure-ground segmentation map is provided in the subdirectory "maps/".
The data can be downloaded from here (7.4MB).
This data set contains the images we used for training our cow side-view detector in the SLCV'04, ICCV'05, and IJCV'08 papers. The cow set contains 113 images of cows walking in front of one of three static backgrounds, facing left. All images were extracted from video sequences originally taken by Derek Magee, Univ. of Leeds. The images are scaled such that the cows have approximately the same size (a width of about 200 pixels). For each image, we provide a hand-made figure-ground segmentation mask in the subdirectory "maps/".
The data can be downloaded from here (14.3MB).
This data set contains the images we used for training our car side-view detector in the SLCV'04, DAGM'04, and IJCV'08 papers. The car set contains 50 images, mirrored to represent both car directions. 9 of the images were taken from the Corel database; 3 from the Internet; the rest were taken with a digital camera in Zurich. All images are scaled such that the cars have approximately the same size (a width of about 200 pixels). For each image, a hand-drawn figure-ground segmentation map is provided in the subdirectory "maps/". As there is a natural uncertainty about how to handle transparencies and shadows, we set the following rules for the reference segmentation:
The data can be downloaded from here (9.6MB).
This data set contains the images we used for training our car rear-view detector in the IJCV'08 paper. The cars-rear set contains the 126 images from the cars_markus set of the CalTech database. They have been manually segmented and resized to a width of 200 pixels. For each image, the figure-ground segmentation map is provided in the subdirectory "maps/".
The data can be downloaded from here (23.2MB).
Each image archive comes with a subdirectory "maps" containing either a single calibration file "camera.default" (in the case of a static camera), or a separate calibration file "camera.XXXXX" for every frame (in the case of a moving camera). In the latter case, those calibrations were automatically obtained using the Structure-from-Motion approach by Cornelis et al., CVPR'06.
Calibration files contain the calibration for one image at a time (K [3x3], rad [1x3], R [3x3], t [1x3], GP [1x4]), with K the internal calibration, rad the radial distortion coefficients, R/t the external calibration mapping world -> camera (i.e. X_cam = R X_world + t), and GP the ground plane coordinates (in the form GP(1:3)x - GP(4) = 0). For convenience, we provide the Matlab function read_camera.m, which demonstrates how to read in the camera parameters.
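As a rough illustration of how these parameters fit together, here is a minimal Python sketch. It assumes the file holds 28 whitespace-separated numbers in the order K, rad, R, t, GP, with matrices stored row by row; that layout is an assumption, and read_camera.m remains the reference for the actual format. The names `Camera`, `parse_camera`, `project` and `ground_plane_residual` are hypothetical, and the projection ignores radial distortion.

```python
from dataclasses import dataclass
from typing import List, Tuple

Matrix = List[List[float]]


@dataclass
class Camera:
    K: Matrix          # 3x3 internal calibration
    rad: List[float]   # 3 radial distortion coefficients (unused below)
    R: Matrix          # 3x3 rotation, world -> camera
    t: List[float]     # translation, world -> camera
    GP: List[float]    # ground plane (n_x, n_y, n_z, d) with n.x - d = 0


def parse_camera(text: str) -> Camera:
    """Parse 28 numbers in the ASSUMED order K, rad, R, t, GP (row-major)."""
    v = [float(tok) for tok in text.split()]
    if len(v) != 28:
        raise ValueError(f"expected 28 values, got {len(v)}")

    def rows(flat: List[float]) -> Matrix:
        return [flat[i:i + 3] for i in range(0, 9, 3)]

    return Camera(K=rows(v[0:9]), rad=v[9:12], R=rows(v[12:21]),
                  t=v[21:24], GP=v[24:28])


def mat_vec(M: Matrix, x: List[float]) -> List[float]:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def world_to_camera(cam: Camera, X_world: List[float]) -> List[float]:
    """X_cam = R X_world + t, as stated in the file description."""
    return [a + b for a, b in zip(mat_vec(cam.R, X_world), cam.t)]


def project(cam: Camera, X_world: List[float]) -> Tuple[float, float]:
    """Pinhole projection to pixel coordinates, ignoring radial distortion."""
    x = mat_vec(cam.K, world_to_camera(cam, X_world))
    return (x[0] / x[2], x[1] / x[2])


def ground_plane_residual(cam: Camera, X_world: List[float]) -> float:
    """GP(1:3).x - GP(4); zero for points lying on the ground plane."""
    n, d = cam.GP[:3], cam.GP[3]
    return sum(a * b for a, b in zip(n, X_world)) - d
```

A world point can then be passed to `project` to obtain pixel coordinates in the image size the calibration refers to, and `ground_plane_residual` indicates how far (in scale units) a point lies from the ground plane, provided GP(1:3) is a unit normal.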
Please note that for many datasets, we rescaled all images to twice their original size for object detection. In those cases, the calibration files still refer to the original size. Therefore, all image coordinates need to be divided by a factor of 2 prior to applying the calibration. For the static sequences, the world scale is already expressed in meters.
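The two conversions mentioned on this page (halving image coordinates from rescaled images, and turning scale units into meters) can be sketched as follows. The helper names are hypothetical, and whether a particular dataset was rescaled has to be checked per dataset; the factor of 2 is only the default here.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def to_calibration_coords(box: Box, upscale: float = 2.0) -> Box:
    """Map a box from the upscaled image back to the original image size
    that the calibration files refer to."""
    x1, y1, x2, y2 = box
    return (x1 / upscale, y1 / upscale, x2 / upscale, y2 / upscale)


def to_meters(length_in_scale_units: float, meters_per_unit: float) -> float:
    """Convert a world-space length using the per-dataset scale given on
    this page (e.g. 0.001 or 0.049603 meters per scale unit)."""
    return length_in_scale_units * meters_per_unit
```

For example, `to_calibration_coords((100, 50, 300, 400))` yields a box in original-size pixels, and `to_meters(d, 0.049603)` converts a distance `d` from the vehicle sequence into meters.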
The IDL files are used for storing the annotations of the sequence. For each image, the file lists a set of bounding boxes, separated by commas. Each box is given by two opposite corners (upper-left and lower-right), but the two corners are not necessarily stored in that order. A semicolon ends the list of bounding boxes for a single image file; a period ends the annotation file.
"filename": (x1, y1, x2, y2), (x1, y1, x2, y2), ...;
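A minimal Python sketch of a reader for this format is given below. It is not the authors' tool: the function name `parse_idl` is hypothetical, and it assumes that an image without boxes appears as a quoted filename with no tuples after it. Because the corner order is not guaranteed, each box is normalized to (left, top, right, bottom).

```python
import re
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]

# A quoted filename followed by everything up to the next quoted filename.
_ENTRY = re.compile(r'"([^"]+)"([^"]*)')
# One "(x1, y1, x2, y2)" tuple; numbers may be negative or fractional.
_NUM = r'\s*(-?\d+(?:\.\d+)?)\s*'
_BOX = re.compile(r'\(' + _NUM + ',' + _NUM + ',' + _NUM + ',' + _NUM + r'\)')


def parse_idl(text: str) -> Dict[str, List[Box]]:
    """Return {filename: [(left, top, right, bottom), ...]}."""
    annotations: Dict[str, List[Box]] = {}
    for name, tail in _ENTRY.findall(text):
        boxes: List[Box] = []
        for m in _BOX.finditer(tail):
            xa, ya, xb, yb = (float(g) for g in m.groups())
            # Corners are not necessarily sorted, so normalize them here.
            boxes.append((min(xa, xb), min(ya, yb), max(xa, xb), max(ya, yb)))
        annotations[name] = boxes
    return annotations
```

The reader only looks for quoted filenames and parenthesized 4-tuples, so the semicolon and period terminators need no special handling.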