Research in Image Communication and Understanding

Vision is the most important sense humans have. Computer vision tries to endow machines with similar capabilities to interpret visual input and to act upon it. In the ICU team, we work mainly on three aspects: object recognition, 3D reconstruction and modeling, and tracking and gesture analysis.


In terms of recognition, we try to tackle several challenges: how to recognize objects from arbitrary viewpoints, under varying illumination, possibly while being occluded by other objects, or when placed against very different backgrounds. We study how to recognize both a specific person (my mother) or object (my brother's car) and an entire class of objects (any person, any car, etc.).

Current projects in Object Recognition include:

Our 3D-related research aims at extracting 3D shapes of objects and scenes with cheap hardware. We work on the 3D modeling of scenes from simple images or video sequences (possibly taken with hand-held cameras), but also on methods that employ special, projected light patterns. Some of the latter use visible light, but some use near-infrared in order not to be visible to the naked eye. We also build systems consisting of multiple cameras and projectors, in order to capture 3D scenes together with their variation over time (like detailed 3D motion capture for faces, hands, or full bodies). Current work also focuses on real-time 3D extraction, with and without special illumination.

Current projects in 3D Modeling include:

In our tracking activities, we aim at several goals. On the one hand, we focus on tracking multiple people in cluttered scenes to determine their trajectories. On the other hand, we focus on detailed body pose analysis from monocular video streams. We have also worked on detailed 3D viseme tracking for faces and finger pose analysis for hands. In an ongoing project, we are working on the combination of visual hand tracking and the capture of the corresponding haptic data while a person is manipulating an object.

Current projects in Tracking and Gesture Analysis include:

Current projects in Cultural Heritage include:

Current projects in Augmented Reality include:



REPLICATE aims at the creation of powerful tools to produce 3D models, made available to many users. 
The project's consortium consists of a team of research institutions and SMEs. Smartphones are used to take photos, from which the 3D models are extracted. This 3D extraction combines the increasing computational power of phones with that of the cloud. REPLICATE builds an entire 3D creation suite, including a structure-from-motion 3D model generation part, model editing tools, and ways to combine models into exciting 3D worlds.

ETH is developing solutions for the acquisition of 3D shapes, turning smartphone photos into 3D models through a combination of local computation on the mobile devices and remote computation in a computer cloud. Successful modeling is supported by an interface that provides the user with a partial 3D reconstruction on the fly and advises the user where it would be best to take further images to improve the model. Special focus is given to the actual distribution of the computation between the mobile device and the cloud, which may depend on many factors, including the quality of the current Internet connection, the computational power of the mobile processor, and the current battery level.
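As a rough illustration of how such a distribution decision could be made, the following sketch compares an estimated local runtime with an estimated upload-plus-cloud runtime. It is not the project's actual scheduler: the names `DeviceState` and `choose_executor`, the cost model, and all default values (cloud throughput, battery threshold) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Hypothetical snapshot of the three factors named in the text."""
    bandwidth_mbps: float    # current uplink bandwidth in megabits per second
    phone_gflops: float      # rough compute throughput of the mobile processor
    battery_fraction: float  # 0.0 (empty) to 1.0 (full)


def choose_executor(state, upload_mb, work_gflop,
                    cloud_gflops=500.0, min_battery=0.2):
    """Return "phone" or "cloud" for one reconstruction step.

    upload_mb is the data that would have to be sent to the cloud,
    work_gflop the estimated amount of computation for the step.
    cloud_gflops and min_battery are assumed, illustrative defaults.
    """
    # Without a connection there is no choice to make.
    if state.bandwidth_mbps <= 0:
        return "phone"
    # A nearly empty battery favours offloading the heavy computation.
    if state.battery_fraction < min_battery:
        return "cloud"
    local_seconds = work_gflop / state.phone_gflops
    upload_seconds = upload_mb * 8 / state.bandwidth_mbps
    remote_seconds = upload_seconds + work_gflop / cloud_gflops
    return "phone" if local_seconds <= remote_seconds else "cloud"
```

With a fast connection the step is offloaded, while a slow connection keeps it on the phone unless the battery is low. A real system would also have to account for the energy cost of the upload itself, which this sketch ignores.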

The joint mobile-cloud solution will facilitate an advanced level of object understanding as it is integrated into the 3D acquisition process. This will yield better 3D models with enhanced usability. ETH is also investigating the semantic analysis of models, e.g., the segmentation of windows, doors, etc., in order to fill in unavoidable holes in the 3D point cloud models. To this end, methods for extracting geometric primitives such as straight lines or planes from point clouds will be developed as well.
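The text does not say which method is used to extract planes. One common choice for this kind of task is RANSAC, and the sketch below shows a minimal version of it in plain Python. The function name `fit_plane_ransac` and the parameter defaults are assumptions for illustration only.

```python
import random


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def fit_plane_ransac(points, threshold=0.02, iterations=200, seed=0):
    """Find the plane n.x + d = 0 supported by the most points.

    points is a list of (x, y, z) tuples. Returns (n, d, inlier_indices),
    where n is a unit normal. threshold is the maximum point-to-plane
    distance for a point to count as an inlier.
    """
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        # Three random points define a candidate plane.
        p0, p1, p2 = rng.sample(points, 3)
        normal = _cross(_sub(p1, p0), _sub(p2, p0))
        length = _dot(normal, normal) ** 0.5
        if length < 1e-12:  # collinear sample, no unique plane
            continue
        normal = tuple(c / length for c in normal)
        d = -_dot(normal, p0)
        inliers = [i for i, p in enumerate(points)
                   if abs(_dot(normal, p) + d) < threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = (normal, d), inliers
    if best_plane is None:
        raise ValueError("no non-degenerate sample found")
    return best_plane[0], best_plane[1], best_inliers
```

To extract several planes (walls, floor, facade), one would typically remove the inliers of the best plane and run the procedure again on the remaining points.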

Download a PDF Project Brochure


Participants: Dr. Michal Havlena, Alex Locher, Prof. Luc Van Gool


Fondazione Bruno Kessler
Fraunhofer HHI
Gameware Europe
Animal Vegetable Mineral
t2i - trasferimento tecnologico e innovazione



The next generation of semantic and dynamic city models.

Virtual city models are used in many game and movie designs, for example with tools from the industry-leading spin-off Procedural (now part of ESRI) for creating stunning 3D urban environments from 2D data.

Currently, the production of real 3D city models comes at a high cost. Given that the modeling effort needs to be repeated regularly for updating, making city model production more efficient is an absolute necessity. Our work creates inverse procedural models, which are built for existing cities. The modeling is done by analyzing images of real cities and constructing parametrized and semantic models, where we know the number of storeys, shadows cast by new buildings, the position of traffic signs, vegetation, etc.

Our research additionally creates dynamic, living 3D city models, which allow for deeper immersion than current city representations. We extract special events and traffic flows to generate a city-scale motion and activity model. One can virtually visit Times Square and see what was on the electronic newsreel recently, or check out traffic densities along a journey or the kids' way to school.


Participants: Massimo Mauro, Dr. Santiago Manen, Dr. Ralf Dragon, Till Kroeger, Dr. Andras Bodis-Szomoru, Dr. Dengxin Dai, Michael Gygli, Dr. Hayko Riemenschneider, Prof. Luc Van Gool, Dr. Julien Weissenberg


KU Leuven
ESRI Procedural 


RADHAR (Robotic ADaptation to Humans Adapting to Robots) will develop a driving assistance system involving environment perception, driver perception and modelling, and robot decision making. RADHAR proposes a framework to seamlessly fuse the inherently uncertain information from both environment perception and the driver's steering signals by estimating the trajectory the robot should execute, and to adopt this fused information for safe navigation with a level of autonomy adjusted to the user's capabilities and desires. This requires lifelong, unsupervised but safe learning by the robot. As a consequence, a continuous interaction between two learning systems (the robot and the user) will emerge, hence Robotic ADaptation to Humans Adapting to Robots (RADHAR). The framework will be demonstrated on a robotic wheelchair platform that navigates in an everyday environment with everyday objects. RADHAR's main targeted scientific outcomes are online 3D perception combining laser scanners and vision with traversability analysis of the terrain, and a novel paradigm for fusing environment and user perception and for safe robot navigation.


Participants: Dr. Jürgen Gall, Dr. Michael Van den Bergh, Dr. Andrea Fossati, Prof. Luc Van Gool, Dr. Gabriele Fanelli


Katholieke Universiteit Leuven, Belgium
Albert-Ludwigs-Universitaet Freiburg, Germany
PROFACTOR GMBH, Austria
HMC International, Belgium
Permobil AB, Sweden
Windekind VZW Centrum voor buitengewone zorg, Belgium
Nationaal Multiple Sclerose Centrum, Belgium
Integrated Microsystems Austria GmbH, Austria

Finished in: 2013


In general, interactive robots, such as mobile museum guides, provide people with information they need or want to have. People address the robot and know what they can obtain from it.

For IURO we invert the perspective: the robot addresses arbitrary passers-by in public (urban) areas in order to obtain 'vital' information from them: In which direction is square X? Where can I find shop Y?

These are everyday knowledge gaps experienced by human pedestrians. We assume that mobile service robots will experience the same gaps while navigating outdoors in public spaces.

Like humans, robots will have to rely on proactive communication when the available knowledge is incomplete. They will need to know how to address people, how to engage in a conversation, how to establish a feeling of trust and comfort, how to ask the right questions, and how to correctly interpret the hints and cues obtained.


Participants: Dr. Andrea Fossati, Dr. Jürgen Gall, Dr. Michael Van den Bergh, Dr. Gabriele Fanelli, Prof. Luc Van Gool


Institute of Automatic Control Engineering of the Technische Universität München
Human-Computer Interaction & Usability Unit of the ICT&S Center at Universität Salzburg
Department of Speech, Music and Hearing of the Swedish Royal Institute of Technology
ACCREA Engineering


Many everyday actions take place in a social and affective context and presuppose that the agents share this context. But current motion synthesis techniques, e.g. in computer graphics, mainly focus on physical factors. The role of other factors, and specifically psychological variables, is not yet well understood.

The goal of the TANGO project is to take these familiar ideas about affective communication one radical step further by developing a framework to represent and model the essential interactive nature of social communication based on non-verbal communication with facial and bodily expression.

TANGO will investigate interactions in real-life contexts showing agents in daily situations such as navigation and affective communication. A central goal of the project is the development of a mathematical theory of emotional communicative behaviour. Theoretical developments and investigations of the neurofunctional basis of affective interactions will be combined with advanced methods from computer vision and computer graphics. Emotional interactions can thus be studied quantitatively in detail and can be transferred to technical systems that simulate believable emotional interactive behaviour. Based on the obtained experimental results and mathematical analysis, a new generation of technical devices establishing emotional communication between humans and machines will be developed.

TANGO goes beyond the state of the art in theoretical scope, in methodological approaches and in innovative applications that are anticipated.


Participants: Prof. Luc Van Gool, Dr. Gabriele Fanelli, Dr. Stefano Pellegrini


Tilburg University, Netherlands
Eberhard Karls Universität Tübingen, Germany
Eidgenössische Technische Hochschule Zürich, Switzerland
Weizmann Institute of Science, Israel
Institut national de Recherche en Informatique et en Automatique, INRIA, France
Max Planck Institute for Biological Cybernetics, Germany
Università degli studi di Roma La Sapienza, Italy

Aerial Crowds

Most Augmented Reality applications deal with very restricted and constrained environments. The goal of the AERIAL CROWDS project is to take Augmented Reality out of the laboratory and into a real urban environment, where it can be used to virtually add or remove buildings and crowds. Virtual crowds will be customized by selecting from a variety of realistic behaviors and appearances.

Participants: Dr. Frederic Bosche, Simon Haegler, Prof. Luc Van Gool


Ecole Polytechnique Fédérale de Lausanne, Computer Vision Laboratory
Ecole Polytechnique Fédérale de Lausanne, Virtual Reality Laboratory
ETHZ Computer Vision Laboratory
MIRALab, University of Geneva


The 3D-COFORM Consortium has one overriding aim: to establish 3D documentation as an affordable, practical and effective mechanism for long-term documentation of tangible cultural heritage. In order to make this happen, the consortium is highly conscious that both the state of the art in 3D digitisation and the practical aspects of deployment in the sector must be addressed. Hence 3D-COFORM proposes an ambitious program of technical research, coupled with practical exercises and research in the business of 3D, to inform and accelerate the deployment of these technologies to good effect. The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 231809.


Participants: Prof. Luc Van Gool, Simon Haegler, Dr. Frederic Bosche, Henning Avenhaus, Dr. Jianke Zhu, Dr. Henning Hamer


Breukmann
Centre for Documentation of Cultural and Natural Heritage
Laboratoire du Centre de recherche et de restauration des musées de France
CMC Associated
Consiglio Nazionale delle Ricerche, Istituto di Scienza e Tecnologie dell'Informazione
The Cyprus Institute
Foundation for Research & Technology, Hellas
Spheron
PIN, Università di Firenze
Fraunhofer Gesellschaft zur Förderung der angewandten Forschung e.V.
Katholieke Universiteit Leuven
MICC, Università di Firenze
Technische Universität Graz
University of Bonn
University of Brighton
University of East Anglia
University of Glasgow
Victoria and Albert Museum, Photographic Department


The Toyota project is a joint project with the KU Leuven, aimed at developing the computer vision software for a mobile robot equipped with a stereo camera head. The division at ETH focuses on the 3D reconstruction of the environment of the robot. Special attention is given to the reconstruction of indoor environments. Untextured environments make reconstruction with traditional dense stereo approaches difficult. To still obtain a reconstruction, additional information, such as coplanarity, is introduced. Line features, abundant in architectural scenes, can be used as a basis for this. This project investigates and refines existing approaches for the various subparts needed for the reconstruction. This covers tracking of lines over a video sequence, pose estimation algorithms for the relative localisation of the robot, and bundle adjustment for improving the reconstruction, among others. So far, an improved version of a feature detector, SURF, has been published, and further papers on 3D reconstruction and pose estimation are in the pipeline.
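To make the coplanarity idea concrete: two 3D lines lie in a common plane exactly when the vector connecting them is orthogonal to the cross product of their directions (a vanishing scalar triple product). The sketch below tests this condition. It only illustrates the geometric constraint and is not the project's reconstruction code; the function name `lines_coplanar` and the relative tolerance are assumptions.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return _dot(a, a) ** 0.5


def lines_coplanar(p1, d1, p2, d2, tol=1e-9):
    """True if the 3D lines p1 + s*d1 and p2 + t*d2 share a plane.

    p1, p2 are points on the lines and d1, d2 their direction vectors.
    tol is a relative tolerance; noisy measured lines need a looser one.
    """
    w = _sub(p2, p1)
    triple = _dot(w, _cross(d1, d2))
    scale = _norm(w) * _norm(d1) * _norm(d2)
    # scale == 0 means the lines share the point p1 (or a direction is
    # zero), in which case they are trivially coplanar.
    return scale == 0.0 or abs(triple) <= tol * scale
```

Intersecting and parallel lines pass the test, while skew lines (for example the x-axis and a line parallel to the y-axis at height 1) fail it.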

Download of SURF feature detector

Participants: Dr. Herbert Bay, Dr. Andreas Ess, Prof. Luc Van Gool


Katholieke Universiteit Leuven, Heverlee


Objective: Today's computers can do many amazing things, but there are still many "trivial" but important tasks they cannot do well. In particular, current information extraction techniques perform well when event types are well represented in the training data but often fail when encountering information-rich, unexpected rare events. The DIRAC project addresses this crucial machine weakness and aims at designing and developing an environment-adaptive autonomous artificial cognitive system that will detect, identify and classify possibly threatening rare events from the information derived by multiple active information-seeking audio-visual sensors. Biological organisms rely for their survival on detecting and identifying new events. DIRAC therefore strives to combine its expertise in the physiology of the mammalian auditory and visual cortex and in audio/visual recognition engineering, with the aim of moving the art of audiovisual machine recognition from the classical signal processing/pattern classification paradigm to human-like information extraction. This means, among other things, moving from interpretation of all incoming data to reliable rejection of non-informative inputs, from passive acquisition of a single incoming stream to active search for the most relevant information in multiple streams, and from a system optimized for one static environment to autonomous adaptation to new, changing environments, thus forming the foundation for a new generation of efficient cognitive information processing technologies. DIRAC is an EU IP IST project of the 6th Framework Program. Its duration is 5 years, from January 2006 until December 2010.


Participants: Dr. Bastian Leibe, Dr. Tobias Jaeggli, Prof. Luc Van Gool, Dr. Andreas Ess, Dr. Fabian Nater, Dr. Konrad Schindler, Dr. Esther Koller-Meier


IDIAP Research Institute (CH)
The Hebrew University of Jerusalem (IL)
Czech Technical University (CS)
Carl von Ossietzky Universitaet Oldenburg (DE)
Leibniz Institute for Neurobiology (DE)
Katholieke Universiteit Leuven, Laboratorium voor Neuro- en Psychofysiologie and ESAT/PSI VISICS (B)
Oregon Health and Science University OGI School of Science and Engineering (USA)

Finished in: 2010

IM 2

The National Center of Competence in Research (NCCR) on Interactive Multimodal Information Management, in brief (IM)2, is aimed at the advancement of research, and the development of prototypes, in the field of man-machine interaction. The NCCR is particularly concerned with technologies coordinating natural input modes (such as speech, image, pen, touch, hand gestures, head and/or body movements, and even physiological sensors) with multimedia system outputs, such as speech, sounds, images, 3D graphics and animation.

In the first phase (2004-2006) the Computer Vision Laboratory at ETH Zurich was involved in the Workpackage on Scene Analysis of this project. We have been working on several issues:

In the current phase (starting 2006) IM2 addresses the following themes:


Participants: Dr. Beat Fasel, Dr. Philipp Zehnder, Dr. Esther Koller-Meier, Prof. Luc Van Gool, Dr. Tobias Jaeggli, Dr. Till Quack, Dr. Gabriele Fanelli

In-Hand 3D Scanning

Objective: Tracking of articulated objects is a task as attractive as it is challenging. It is attractive because it has many applications, such as motion capture, human-computer interaction, animation, and medical diagnosis. The task is also challenging, especially when the computation time has to be kept low: the more degrees of freedom the object has, the more difficult this is. Furthermore, in a high-dimensional space, many ambiguities may arise. In our research we focus on the hands, where we combine on-line 3D scans with monocular imagery (contour information). The background can be cluttered and the hands may hold an object. For every frame of a video, a detailed hand pose, including the angles of all finger digits, is extracted.

Participants: Dr. Matthieu Bray, Prof. Luc Van Gool, Dr. Henning Hamer


Objective: The Cultural Heritage Informatics Research Oriented Network (CHIRON) is a Marie-Curie EU-funded project providing research training fellowships for graduates wishing to start a research career in the field of IT applications to the research, conservation, and presentation of tangible Cultural Heritage. The project will consist of a joint training program and individual research carried out by fellows within a co-ordinated framework at participating partner institutions. CHIRON has a duration of four years with an overall budget of about 2 300 000 Euro.


Participants: Dr. Esther Koller-Meier, Dr. Konrad Schindler, Dr. Henning Hamer


PIN scrl Servizi didattici e scientifici per l'Università di Firenze (IT)
The University of the Aegean (GR)
The Ben-Gurion University of the Negev (IL)
The University of Brighton (UK)
The Ename Center For Public Archaeology And Heritage Presentation (BE)
Eidgenössische Technische Hochschule Zürich (CH)
The University of York (UK)



Within the ReMeDi project, a robot system is being designed that supports medical tele-examination of patients. Successful medical treatment depends on a timely and correct diagnosis, but the availability of doctors of various specializations is limited, especially in provincial hospitals or after regular working hours. Medical services performed remotely are emerging, yet current solutions are limited to mere teleconferencing and are insufficient. Use case scenarios targeted in ReMeDi feature a robot capable of performing a physical examination, specifically the two most widespread examination techniques: i) palpation, i.e. pressing the patient’s stomach with the doctor’s hand and observing the stiffness of the internal organs and the patient’s feedback (discomfort, pain), and ii) ultrasonographic examination. Besides quality teleconferencing, ReMeDi features a mobile robot (placed in a hospital) equipped with a lightweight and inherently safe manipulator with an advanced sensorized head and/or ultrasonic probe, and a remote interface (placed at the doctor’s location) equipped with sophisticated force-feedback, active vision and locomotion capabilities. The system is built incrementally following a user-centered design approach, and its usability with respect to the patient and the examining doctor is extensively studied in real-world scenarios of cardiac examination. ReMeDi will go beyond classical telepresence concepts: it will capture and process multi-sensory data (integrating visual, haptic and speech data as well as the patient’s emotions and physiological responses) into perception and reasoning capabilities, making ReMeDi a diagnostic assistant that offers context-dependent and proactive support for the doctor. Particular attention is devoted to safety aspects. The normative standards (both existing and in draft) and the results of ongoing research projects will be integrated in all system development phases.

In today's aging societies, the demand for specialized medical care is increasing. In the majority of countries worldwide, a lack of physicians is observed today, and forecasts warn that this lack will grow worse in the near future. It is already visible in the limited number of specialists, who are not always available to medical units due to geographical (e.g. provincial hospitals), time (after regular working hours) or other logistic constraints. This situation has led to the development of several types of medicine-related services performed remotely, ranging from Telenursing, Telepharmacy, Telerehabilitation, Telepsychiatry, Telepathology, Teledentistry, etc. to Telesurgery. All these medical tele-services are examples of the use of information and communication technologies (ICT) for health, called eHealth.

The ReMeDi project addresses telediagnostics in clinical environments. A successful medical treatment depends on a timely and correct diagnosis. In the ReMeDi project we develop a multifunctional robotic device that will allow a real remote physical and ultrasonographic (USG) examination to be performed. Working as a multidisciplinary consortium (physicians, human-robot interaction researchers such as psychologists and social scientists, and engineers), we want to enable remote examinations that come as close as possible to direct examinations and thereby follow the most natural and common medical techniques. Our goal is to make the ReMeDi robot user-friendly for physicians and acceptable to patients by enhancing (tele-)presence with intelligent autonomous features.

The envisioned system consists of a mobile robot (ReMeDi) operating in a hospital and a remote interface (DiagUI) placed at the doctor’s location. The role of the ReMeDi robot is twofold: first, it acts as a full embodiment of the doctor; second, it is an intelligent robot system equipped with advanced perception, reasoning, and learning abilities. It is important to underline that ReMeDi will go beyond traditional teleoperated diagnoses; it will be teleoperated only if needed to improve the quality of the diagnoses and guarantee the patient’s health and safety in critical situations. Therefore, ReMeDi can be considered more as a diagnostic associate and as a first step towards a future, fully autonomous diagnostician than as a sophisticated medical tool.


Participants: Dr. Andrea Fossati, Dr. Thomas Probst


University of the West of England, Bristol (UWE), Bristol Robotics Laboratory

ACCREA Engineering, Poland 

Medical University of Lublin, Poland

ICT&S Center Salzburg, Austria 

SSSA - Scuola Superiore Sant’Anna, Italy

Eidgenössische Technische Hochschule Zürich, Switzerland

Wroclaw University of Technology, Poland