Below you can browse through a selection of demos. Enjoy!
Authors: Michael Van den Bergh, Esther Koller-Meier, and L. Van Gool
Based on a 3D hull reconstruction, the current pose of the user is detected from a database of predefined poses. This is done in real-time using 3D Haarlets. The system works for any orientation of the user.
References:
M. Van den Bergh, E. Koller-Meier, and L. Van Gool
"Realtime body pose recognition using 2d or 3d haarlets",
International Journal of Computer Vision, vol. 83, pp. 72-84, June 2009.
M. Van den Bergh, E. Koller-Meier, and L. Van Gool
"Realtime 3d body pose estimation",
Multi-Camera Networks: Concepts and Applications, pp. 335-360, 2009.
MPEG-4 movie (13 MB)
Created: October 2010
Authors: Andreas Ess, Bastian Leibe, Konrad Schindler and L. Van Gool
We address the problem of vision-based multi-person tracking in busy pedestrian zones using a pair of forward-looking cameras mounted on a mobile platform. Specifically, we are interested in the application of such a system for supporting path planning algorithms in the avoidance of dynamic obstacles. The complexity of the problem calls for an integrated solution, which extracts as much visual information as possible and combines it through cognitive feedback. We propose such an approach, which jointly estimates camera position, stereo depth, object detections, and trajectories based on visual information only. We represent the interplay between these components using a graphical model. For each frame, we first estimate the ground surface together with a set of object detections. Conditioned on these results, we then address object interactions and estimate trajectories. Finally, we employ the tracking results to predict future motion for dynamic objects and fuse this information with a static occupancy map estimated from stereo.
References:
A. Ess, B. Leibe, K. Schindler, and L. Van Gool
"Moving Obstacle Detection in Highly Dynamic Scenes",
IEEE International Conference on Robotics and Automation (ICRA'09), 2009, best vision paper award.
A. Ess, B. Leibe, K. Schindler, and L. Van Gool
"Robust Multi-Person Tracking from a Mobile Platform",
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31, No. 10, pp. 1831-1846, 2009
MPEG-4 movie (25 MB)
Created: October 2010
Authors: Michael Van den Bergh, Frédéric Bosché, Esther Koller-Meier, and L. Van Gool
A hand gesture interaction system set up at the Value Lab. A camera mounted on top of the screen detects hand gestures. Using these gestures, a user can manipulate a 3D model.
References:
M. Van den Bergh, F. Bosche, E. Koller-Meier, and L. Van Gool
"Haarlet-based hand gesture recognition for 3d interaction",
IEEE Workshop on Motion and Video Computing, December 2009.
M. Van den Bergh, J. Halatsch, A. Kunze, F. Bosche, L. Van Gool, and G. Schmitt
"Towards Collaborative Interaction with Large nD Models for Effective Project Management", 9th International Conference on Construction Applications of Virtual Reality (ConVR), November 2009.
MPEG-4 movie (3.9 MB)
Created: October 2010
MPEG-4 movie (5.2 MB)
Created: October 2010
MPEG-4 movie (4.0 MB)
Created: October 2010
Authors: Michael D. Breitenstein, Fabian Reichlin, Bastian Leibe, Esther Koller-Meier, and L. Van Gool
Completely automatic multi-person detection and tracking. No background modeling, so the method is robust to a moderate amount of camera motion. Based only on 2D information from a single, uncalibrated camera. No scene-specific information (such as a ground plane) is required. Causal/Markovian (no "looking into the future"), which makes it suitable for time-critical online applications.
Additional information and videos
References:
M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. Van Gool
"Online Multi-Person Tracking-by-Detection from a Single, Uncalibrated Camera",
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010
M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. Van Gool
"Robust Tracking-by-Detection using a Detector Confidence Particle Filter",
IEEE International Conference on Computer Vision, October 2009
M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. Van Gool
"Markovian Tracking-by-Detection from a Single, Uncalibrated Camera",
IEEE CVPR Workshop on Performance Evaluation of Tracking and Surveillance (PETS'09), June 2009
MPEG-4 movie (3.0 MB)
Created: October 2010
MPEG-4 movie (3.3 MB)
Created: October 2010
MPEG-4 movie (23 MB)
Created: October 2010
Authors: Andreas Ess, Tomas Mueller, Helmut Grabner and L. Van Gool
In this work, we propose a method to recognize the traffic scene in front of a moving vehicle with respect to the road topology and the existence of objects. To this end, we use a two-stage system, where the first stage abstracts from the underlying image by means of a rough super-pixel segmentation of the scene. In a second stage, this meta representation is then used to construct a feature set for a classifier that is able to distinguish between different road types as well as detect the existence of commonly encountered objects, such as cars or pedestrian crossings. We show that by relying on an intermediate stage, we can effectively abstract from any peculiarities of the underlying image data due to, e.g., color aberrations. The method is tested on two long, challenging urban data sets, covering both daylight and dusk conditions. Compared to a state-of-the-art descriptor, we show improved classification performance, especially for object classes.
References:
A. Ess, T. Mueller, H. Grabner, L. Van Gool,
"Segmentation-Based Urban Traffic Scene Understanding",
British Machine Vision Conference (BMVC '09), 2009.
MPEG-4 movie (9.6 MB)
Created: October 2010
Authors: Gabriele Fanelli, Juergen Gall and L. Van Gool
We present a novel method for mouth localization in the context of multimodal speech recognition where audio and visual cues are fused to improve the speech recognition accuracy. While facial feature points like mouth corners or lip contours are commonly used to estimate at least scale, position, and orientation of the mouth, we propose a Hough transform-based method. Instead of relying on a predefined sparse subset of mouth features, it casts probabilistic votes for the mouth center from several patches in the neighborhood and accumulates the votes in a Hough image. This makes the localization more robust as it does not rely on the detection of a single feature. In addition, we exploit the different shape properties of eyes and mouth in order to localize the mouth more efficiently. Using the rotation invariant representation of the iris, scale and orientation can be efficiently inferred from the localized eye positions. The accompanying video shows some example sequences of tracked faces, with the recognition of the uttered words, both using only audio cues (AO) and fusing audio and visual features (AV), with and without white noise added to the audio channel.
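The voting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the patch format, the vote offsets and weights, and the function names `accumulate_votes` and `find_peak` are assumptions made for illustration only. In the real method the votes would come from a trained model rather than being listed by hand.

```python
def accumulate_votes(patches, image_shape):
    """Accumulate weighted center votes in a Hough image.

    patches: iterable of (x, y, votes), where votes is a list of
             (dx, dy, weight) offsets from the patch position to a
             hypothesised mouth center (hypothetical format).
    image_shape: (height, width) of the Hough image.
    """
    height, width = image_shape
    hough = [[0.0] * width for _ in range(height)]
    for x, y, votes in patches:
        for dx, dy, weight in votes:
            cx, cy = x + dx, y + dy
            # Ignore votes that fall outside the image.
            if 0 <= cx < width and 0 <= cy < height:
                hough[cy][cx] += weight
    return hough


def find_peak(hough):
    """Return ((x, y), score) of the strongest accumulator cell."""
    score, position = max(
        (value, (x, y))
        for y, row in enumerate(hough)
        for x, value in enumerate(row)
    )
    return position, score


# Three patches whose strongest votes agree on the center (5, 3).
patches = [
    (2, 2, [(3, 1, 0.5), (0, 0, 0.125)]),
    (8, 3, [(-3, 0, 0.5)]),
    (5, 6, [(0, -3, 0.25), (1, 1, 0.125)]),
]
hough = accumulate_votes(patches, image_shape=(10, 10))
center, score = find_peak(hough)
```

Because the peak is the sum of many independent votes, a single missing or misleading patch shifts the result far less than a missed feature point would.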
References:
G. Fanelli, J. Gall and L. Van Gool,
"Hough Transform-based Mouth Localization for Audio-Visual Speech Recognition",
British Machine Vision Conference (BMVC '09), 2009.
J. Gall and V. Lempitsky,
"Class-Specific Hough Forests for Object Detection",
IEEE Conference on Computer Vision and Pattern Recognition, 2009.
MPEG-4 movie (22 MB)
Created: October 2010
CGA shape, a novel shape grammar for the procedural modeling of CG architecture, produces building shells with high visual quality and geometric detail. It produces extensive architectural models for computer games and movies, at low cost. Context sensitive shape rules allow the user to specify interactions between the entities of the hierarchical shape descriptions. Selected examples demonstrate solutions to previously unsolved modeling problems, especially to consistent mass modeling with volumetric shapes of arbitrary orientation. CGA shape is shown to efficiently generate massive urban models with unprecedented level of detail, with the virtual rebuilding of the archaeological site of Pompeii as a case in point.
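As a rough illustration of how hierarchical shape rules refine a coarse shape into detailed geometry, the following sketch derives a flat facade into floors and then tiles. It is not CGA shape syntax and omits context-sensitive rules entirely. The `Shape` class, the `split` helper, the rule table and all dimensions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    symbol: str
    x: float
    y: float
    width: float
    height: float


def split(shape, axis, count, symbol):
    """Split a shape into `count` equal parts along the given axis."""
    parts = []
    for i in range(count):
        if axis == "x":
            w = shape.width / count
            parts.append(Shape(symbol, shape.x + i * w, shape.y, w, shape.height))
        else:
            h = shape.height / count
            parts.append(Shape(symbol, shape.x, shape.y + i * h, shape.width, h))
    return parts


# Hypothetical rule set: a facade splits into floors, a floor into tiles.
RULES = {
    "facade": lambda s: split(s, "y", 3, "floor"),
    "floor": lambda s: split(s, "x", 4, "tile"),
}


def derive(shape):
    """Apply rules recursively until only terminal shapes remain."""
    rule = RULES.get(shape.symbol)
    if rule is None:
        return [shape]
    result = []
    for child in rule(shape):
        result.extend(derive(child))
    return result


tiles = derive(Shape("facade", 0.0, 0.0, 12.0, 9.0))
```

Here `tiles` holds twelve terminal shapes. A fuller grammar would continue with rules that turn each tile into windows, doors or ornaments.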
Created by Pascal Müller and Simon Haegler
More Info here
Authors: Nico Cornelis, Bastian Leibe, Kurt Cornelis, Luc Van Gool
CVPR'06 Video Proceedings Best Video Award
In this video [1] we show the combined results from two recent publications [2], [3]. In [2], we introduce a real-time 3D City Modeling algorithm which is able to build compact 3D representations of cities using the assumption that building facades and roads can be modeled by simple ruled surfaces. The main advantage of this algorithm is its exceptional speed. It can process the full Structure-from-Motion and dense reconstruction pipeline at 25-30 fps, so the reconstructed model can be created online while the survey vehicle is driving through the streets. However, due to the simple geometry assumptions, this original algorithm is unable to model cars, which are ever-present in cities and visibly degrade the resulting 3D city model.
In [3], we therefore propose to combine the 3D reconstruction with an object detection algorithm based on Implicit Shape Models. The two components are integrated in a cognitive feedback loop. The 3D reconstruction modules inform object detection about the scene geometry, which greatly helps to improve detection precision. Using the knowledge of camera parameters and scene geometry from [2], the 2D car detections are temporally integrated in a world coordinate frame, which makes it possible to obtain precise 3D location and orientation estimates. These can then be used to instantiate the virtual 3D car models which improve the visual realism of our final 3D city model.
Our final system is able to automatically create a 3D city model from the input video streams of a survey vehicle, identify the locations of cars in the recorded real-world scene, and replace them by virtual 3D models in the reconstruction. Besides improving the visual realism of the final 3D model, this has the additional benefit of solving privacy issues by removing personalized information from the final city model. Thus, object recognition can aid 3D reconstruction in achieving more realistic results. Conversely, the object recognition algorithm itself can benefit from the higher-level scene knowledge which is available through 3D reconstruction. It is exactly this bidirectional interaction between the reconstruction and recognition algorithms which earns it the name of cognitive loop.
References:
[1] N. Cornelis, B. Leibe, K. Cornelis, L. Van Gool,
"3D City Modeling Using Cognitive Loops",
3rd International Symposium on 3D Data Processing, Visualization, and
Transmission (3DPVT'06), Chapel Hill, USA, June 2006.
and
Video Proceedings for CVPR 2006 (VPCVPR'06), New York, June 2006.
[2] N. Cornelis, K. Cornelis, L. Van Gool,
"Fast Compact City Modeling for Navigation Pre-Visualization",
In IEEE International Conference on Computer Vision and Pattern
Recognition (CVPR'06), New York, 2006.
[3] B. Leibe, N. Cornelis, K. Cornelis, L. Van Gool,
"Integrating Recognition and Reconstruction for Cognitive Traffic
Scene Analysis from a Moving Vehicle",
In DAGM Annual Pattern Recognition Symposium, Berlin, Germany,
LNCS Vol. 4174, pp. 192-201, Springer, September 2006.
The prototype has been created using several modules developed within a number of Co-Me projects. These modules provide simulation of soft tissue deformation, collision detection and response, cutting, as well as a hysteroscopy tool as input device to the simulator. In addition, a CFD module has been integrated for blood flow simulation. Moreover, we replicated an OR in our lab and provide standard hysteroscopic tools for interaction. In this setting, the training starts as soon as the trainee enters the OR and ends when she leaves the room.
More info: http://www.hystsim.ethz.ch/
In our current research we examine the integration of haptic interfaces into augmented reality setups. The ultimate target of these endeavours is the application of the framework to training of manipulative skills in surgical environments. To this end, highly accurate calibration, system stability, and low latency are indispensable prerequisites. Therefore, we developed a new calibration method to exactly align the haptic and world coordinate systems. Moreover, a distributed framework was created, which ensures low latency and component synchronization. Finally, to demonstrate our results, we integrated all elements into an augmented reality haptics ping-pong game. (Video 1)
Publication: G. Bianchi, B. Knörlein, G. Székely and M. Harders, "High Precision Augmented Reality Haptics", Eurohaptics 2006, July 2006
The driving force of our research is the precise combination of real and - possibly indistinguishable - virtual interactive objects in an augmented reality environment. This requires an interactive, multimodal simulation, as well as stable and accurate overlay of the computer-generated objects. This paper describes several methods to improve accuracy and stability of our hybrid augmented reality system. In a comparison of two approaches to hybrid head pose refinement, we show the superior performance of Quasi-Newton optimization for image space error minimization. Moreover, a 3D landmark refinement step is proposed, which significantly improves robustness of the overlay process. The enhanced system is demonstrated in an interactive AR environment, which provides accurate haptic feedback from real and virtual deformable objects. Finally, the effect of landmark occlusion on tracking stability during user interaction is also analyzed.
Publication: G. Bianchi, C. Jung, B. Knörlein, M. Harders and G. Székely, "High-fidelity visuo-haptic interaction with virtual objects in multi-modal AR systems", ISMAR 2006, October 2006.
In contrast to CT, MRI provides excellent soft tissue contrast, and volunteers and patients are not exposed to ionising radiation.
Sequences of 3D volumes (4D data sets) were reconstructed from dynamic sagittal 2D images acquired during free breathing. Other gating methods assume regular respiratory motion and reduce the respiratory organ deformation to one parameter such as amplitude or phase. This neglects all residual variability and is too coarse an approximation in some cases, leading to artefacts in the reconstructed images.
The proposed approach derives a multi-dimensional gating measure from dedicated so-called navigator frames in order to determine the state of the liver retrospectively and find corresponding 2D slices that can be combined to 3D volumes. The method does not assume a constant breathing depth or even strict periodicity and does not depend on an external gating signal. The technique is applicable to any organ that undergoes respiratory motion such as lung, liver, pancreas or kidneys and can be implemented on a standard MR scanner without additional equipment.
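The retrospective stacking idea can be sketched as follows, under strong simplifying assumptions. Each acquired 2D slice is tagged with a multi-dimensional gating vector, and for a chosen reference state the closest-matching slice at every slice position is selected. The tuple layout, the Euclidean distance and the name `nearest_state_stack` are assumptions for illustration and not the published method, which derives its measure from navigator frames.

```python
import math


def nearest_state_stack(acquisitions, reference_state):
    """Pick, per slice position, the image whose gating vector is closest.

    acquisitions: list of (slice_position, gating_vector, image_id)
                  (hypothetical format).
    reference_state: gating vector describing the organ state to reconstruct.
    Returns image ids ordered by slice position, i.e. one 3D stack.
    """
    best = {}
    for position, vector, image_id in acquisitions:
        distance = math.dist(vector, reference_state)
        if position not in best or distance < best[position][0]:
            best[position] = (distance, image_id)
    return [best[position][1] for position in sorted(best)]


# Two candidate acquisitions per slice position, with 2D gating vectors.
acquisitions = [
    (0, (1.0, 0.2), "a0"),
    (0, (3.0, 0.9), "a1"),
    (1, (1.1, 0.25), "b0"),
    (1, (0.2, 0.0), "b1"),
    (2, (2.9, 0.8), "c0"),
    (2, (1.05, 0.2), "c1"),
]
stack = nearest_state_stack(acquisitions, reference_state=(1.0, 0.2))
```

Matching on a vector rather than a single amplitude or phase value is what lets the selection distinguish organ states that a one-parameter gating signal would treat as identical.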
Created by: Martin von Siebenthal.
More info: http://www.vision.ee.ethz.ch/4dmri/.
Blue-C is an interdisciplinary research project of the ETH. It combines the qualities of total immersion experienced in CAVE-like environments with simultaneous, real-time 3D video acquisition and rendering from multiple cameras.
An accurate archaeological high-resolution reconstruction of an ancient Roman fountain.
Created by Pascal Müller
Realistic Face Animation for Speech created by Gregor Kalberer and Pascal Müller
3D-tracking of human hands created by Matthieu Bray and Pascal Müller
Here are some movies demonstrating the capabilities of the affine region tracker and the augmented reality system developed by Vittorio Ferrari at the Computer Vision Lab of ETH Zuerich. Some of these results are discussed in the following papers:
Vittorio Ferrari, Tinne Tuytelaars and Luc Van Gool
"Real-time Affine Region Tracking and Coplanar Grouping",
in Proc. of the IEEE Computer Vision and Pattern Recognition (CVPR), Kauai, Hawaii, December 2001.
Vittorio Ferrari, Tinne Tuytelaars and Luc Van Gool
"Markerless Augmented Reality with a Real-time Affine Region Tracker",
in Proc. of the IEEE and ACM International Symposium on Augmented Reality (ISAR), New York, New York, October 2001, pp. 87-96.
3D Augmentation with a Buddha model. Created in cooperation with Lukas Hohl and Till Quack.
MPEG-1 movie (848kB)
Created: October 2001