Research in Image Communication and Understanding
Vision is the most important sense humans have. Computer vision tries to endow machines with similar capabilities to interpret visual input and to act upon it. The ICU team works mainly on three aspects: object recognition, 3D reconstruction and modeling, and tracking and gesture analysis. To learn more about the individual topics and to see a list of related projects, please select a topic below:
RADHAR

RADHAR (Robotic ADaptation to Humans Adapting to Robots) will develop a driving assistance system involving environment perception, driver perception and modelling, and robot decision making. RADHAR proposes a framework to seamlessly fuse the inherently uncertain information from both environment perception and the driver's steering signals by estimating the trajectory the robot should execute, and to use this fused information for safe navigation with a level of autonomy adjusted to the user's capabilities and desires. This requires lifelong, unsupervised but safe learning by the robot. As a consequence, a continuous interaction between two learning systems (the robot and the user) will emerge, hence the name Robotic ADaptation to Humans Adapting to Robots (RADHAR). The framework will be demonstrated on a robotic wheelchair platform that navigates in an everyday environment with everyday objects. RADHAR targets two main scientific outcomes: online 3D perception combining laser scanners and vision with traversability analysis of the terrain, and a novel paradigm for fusing environment and user perception and for safe robot navigation.
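To give a feel for what fusing two uncertain sources can mean, here is a minimal Python sketch. It is not RADHAR's actual method (the project estimates a full trajectory); it only assumes, for illustration, that the driver's steering signal and the environment-based planner each provide a one-dimensional Gaussian estimate of the desired steering angle, and it combines them by inverse-variance weighting. The names Estimate, fuse, driver and planner, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    mean: float      # steering angle in radians (illustrative quantity)
    variance: float  # uncertainty attached to that angle


def fuse(a: Estimate, b: Estimate) -> Estimate:
    """Inverse-variance weighted fusion of two independent Gaussian estimates."""
    if a.variance <= 0 or b.variance <= 0:
        raise ValueError("variances must be positive")
    weight_a = 1.0 / a.variance
    weight_b = 1.0 / b.variance
    fused_variance = 1.0 / (weight_a + weight_b)
    fused_mean = fused_variance * (weight_a * a.mean + weight_b * b.mean)
    return Estimate(fused_mean, fused_variance)


# Hypothetical inputs: a noisy joystick reading and a more confident planner.
driver = Estimate(mean=0.30, variance=0.04)
planner = Estimate(mean=0.10, variance=0.01)
fused = fuse(driver, planner)
print(fused)
```

The more certain source pulls the result towards itself, and the fused variance is smaller than either input variance. Adjusting the level of autonomy to the user could, under this simplified view, correspond to how much uncertainty is assigned to the driver's signal.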
Project-Website: http://www.radhar.eu/
Participants: Gabriele Fanelli Andrea Fossati Jürgen Gall Michael Van den Bergh Luc Van Gool
Partners: Katholieke Universiteit Leuven, Belgium
Albert-Ludwigs-Universitaet Freiburg, Germany
PROFACTOR GMBH, Austria
HMC International, Belgium
Permobil AB, Sweden
Windekind VZW Centrum voor buitengewone zorg, Belgium
Nationaal Multiple Sclerose Centrum, Belgium
Integrated Microsystems Austria GmbH, Austria
IURO

In general, interactive robots, such as mobile museum guides, provide people with information they need or want to have. People address the robot and know what they can obtain from it.
For IURO we invert the perspective: the robot addresses arbitrary passers-by in public (urban) areas in order to obtain 'vital' information from them: In which direction is square X? Where can I find shop Y?
These are everyday knowledge gaps experienced by human pedestrians. We assume that mobile service robots will experience the same gaps while navigating outdoors in public spaces.
Like humans, robots will have to rely on proactive communication when available knowledge is incomplete. They will need to know how to address people, how to engage in a conversation, how to establish a feeling of trust and comfort, how to ask the right questions, and how to correctly interpret the hints and cues obtained.
Project-Website: http://www.iuro-project.eu/
Participants: Gabriele Fanelli Andrea Fossati Jürgen Gall Michael Van den Bergh Luc Van Gool
Partners: Institute of Automatic Control Engineering, Technische Universität München
Human-Computer Interaction & Usability Unit of the ICT&S Center at Universität Salzburg
Department of Speech, Music and Hearing of the Swedish Royal Institute of Technology
ACCREA engineering
TANGO

Many everyday actions take place in a social and affective context and presuppose that the agents share this context. But current motion synthesis techniques, e.g. in computer graphics, mainly focus on physical factors. The role of other factors, and specifically psychological variables, is not yet well understood.
The goal of the TANGO project is to take these familiar ideas about affective communication one radical step further by developing a framework to represent and model the essential interactive nature of social communication based on non-verbal communication with facial and bodily expression.
TANGO will investigate interactions in real-life contexts showing agents in daily situations such as navigation and affective communication. A central goal of the project is the development of a mathematical theory of emotional communicative behaviour. Theoretical developments and investigations of the neurofunctional basis of affective interactions will be combined with advanced methods from computer vision and computer graphics. Emotional interactions can then be studied quantitatively in detail and transferred to technical systems that simulate believable emotional interactive behaviour. Based on the obtained experimental results and mathematical analysis, a new generation of technical devices establishing emotional communication between humans and machines will be developed.
TANGO goes beyond the state of the art in theoretical scope, in methodological approaches and in innovative applications that are anticipated.
Project-Website: http://www.tango-project.eu/
Participants: Gabriele Fanelli Stefano Pellegrini Luc Van Gool
Partners: Tilburg University, Netherlands
Eberhard Karls Universität Tübingen, Germany
Eidgenössische Technische Hochschule Zürich, Switzerland
Weizmann Institute of Science, Israel
Institut national de Recherche en Informatique et en Automatique, INRIA, France
Max Planck Institute for Biological Cybernetics, Germany
Università degli studi di Roma La Sapienza, Italy
Patient-Specific Model Generation for Surgical Training Simulation
Objective:
The target of this project is to extend and modify previously developed generic methods to patient-specific scenarios. Moreover, the various modules will be combined into a user-friendly, complete training scene generation tool. In this context, aspects of optimal human-computer interaction, workflow, and usability will be addressed.
A key element of training with virtual reality surgical simulators is the definition of the simulated patients. This step typically includes the generation of geometric models of healthy and pathological anatomy, organ textures, vessel structures, and the determination of tissue deformation parameters.
Participants: Thomas Wolf Michael Emmersberger Matthias Harders
Partners: VirtaMed AG, Switzerland
Eidgenössische Technische Hochschule Zürich, Switzerland
Aerial Crowds
Most Augmented Reality applications deal with very restricted and constrained environments. The goal of the AERIAL CROWDS project is to take Augmented Reality out of the laboratory and into a real urban environment, where it can be used to virtually add or remove buildings and crowds. Virtual crowds will be customized by selecting from a variety of realistic behaviors and appearances.
Participants: Simon Hägler Frédéric Bosché Luc Van Gool
Partners: Ecole Polytechnique Fédérale de Lausanne, Computer Vision Laboratory
Ecole Polytechnique Fédérale de Lausanne, Virtual Reality Laboratory
ETHZ Computer Vision Laboratory
MIRALab, University of Geneva
3D-COFORM
The 3D-COFORM Consortium has one over-riding aim: to establish 3D documentation as an affordable, practical and effective mechanism for long term documentation of tangible cultural heritage. In order to make this happen the consortium is highly conscious that both the state of the art in 3D digitisation and the practical aspects of deployment in the sector must be addressed. Hence 3D-COFORM proposes an ambitious program of technical research, coupled with practical exercises and research in the business of 3D to inform and accelerate the deployment of these technologies to good effect.
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 231809.
Project-Website: http://www.3d-coform.eu
Participants: Simon Hägler Henning Avenhaus Henning Hamer Frédéric Bosché Jianke Zhu Luc Van Gool
Partners: Breukmann
Centre for Documentation of Cultural and Natural Heritage
Laboratoire du Centre de recherche et de restauration des musées de France
CMC Associated
Consiglio Nazionale delle Ricerche, Istituto di Scienza e Tecnologie dell'Informazione
The Cyprus Institute
Foundation for Research & Technology, Hellas
Spheron
PIN, Università di Firenze
Fraunhofer Gesellschaft zur Förderung der angewandten Forschung e.V.
Katholieke Universiteit Leuven
MICC, Università di Firenze
Technische Universität Graz
University of Bonn
University of Brighton
University of East Anglia
University of Glasgow
Victoria and Albert Museum, Photographic Department
Toyota
The Toyota project is a joint project with the KU Leuven, aimed at developing the computer vision software for a mobile robot equipped with a stereo camera head. The division at ETH focuses on the 3D reconstruction of the robot's environment, with special attention to indoor environments. Untextured environments make reconstruction with traditional dense stereo approaches difficult. To still obtain a reconstruction, additional information, such as coplanarity, is introduced. Line features, abundant in architectural scenes, can be used as a basis for this. The project investigates and refines existing approaches for the various subparts needed for the reconstruction. These include tracking of lines over a video sequence, pose estimation algorithms for the relative localisation of the robot, and bundle adjustment for improving the reconstruction, among others. So far, an improved version of a feature detector, SURF, has been published, and further papers on 3D reconstruction and pose estimation are in the pipeline.
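As an illustration of the coplanarity cue mentioned above, the following Python sketch checks whether a reconstructed 3D point (for example an endpoint of a line feature) lies on the plane spanned by three other points. It is a generic geometric test under assumed inputs, not the project's reconstruction pipeline; the function names and the sample coordinates are hypothetical.

```python
import math


def sub(p, q):
    """Component-wise difference p - q of two 3D points."""
    return tuple(a - b for a, b in zip(p, q))


def dot(u, v):
    """Dot product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two 3D vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def point_plane_distance(p0, p1, p2, q):
    """Distance of q from the plane through p0, p1 and p2."""
    normal = cross(sub(p1, p0), sub(p2, p0))
    length = math.sqrt(dot(normal, normal))
    if length == 0.0:
        raise ValueError("p0, p1 and p2 are collinear and define no plane")
    return abs(dot(normal, sub(q, p0))) / length


# Hypothetical wall plane z = 0 and two candidate points (units: metres).
wall = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
on_wall = point_plane_distance(*wall, (0.5, 0.5, 0.0))
off_wall = point_plane_distance(*wall, (0.5, 0.5, 0.2))
print(on_wall, off_wall)
```

A point whose distance falls below a chosen tolerance could be treated as coplanar with the wall, which gives a reconstruction extra constraints where texture is missing.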
Download of SURF feature detector
Participants: Herbert Bay Andreas Ess Luc Van Gool
Partners: Katholieke Universiteit Leuven, Heverlee
DIRAC
Objective: Today's computers can do many amazing things, but there are still many "trivial" but important tasks they cannot do well. In particular, current information extraction techniques perform well when event types are well represented in the training data but often fail when encountering information-rich, unexpected rare events. The DIRAC project addresses this crucial machine weakness and aims at designing and developing an environment-adaptive autonomous artificial cognitive system that will detect, identify and classify possibly threatening rare events from the information derived by multiple active information-seeking audio-visual sensors.
Biological organisms rely for their survival on detecting and identifying new events. DIRAC therefore strives to combine its expertise in the physiology of the mammalian auditory and visual cortex and in audio/visual recognition engineering, with the aim of moving the art of audiovisual machine recognition from the classical signal processing/pattern classification paradigm to human-like information extraction. This means, among other things, moving from interpretation of all incoming data to reliable rejection of non-informative inputs, from passive acquisition of a single incoming stream to active search for the most relevant information in multiple streams, and from a system optimized for one static environment to autonomous adaptation to new, changing environments, thus forming the foundation for a new generation of efficient cognitive information processing technologies. DIRAC is an EU IP IST project of the 6th Framework Program. Its duration is 5 years, from January 2006 until December 2010.
Project-Website: http://www.diracproject.org/
Participants: Fabian Nater Bastian Leibe Tobias Jaeggli Andreas Ess Konrad Schindler Esther Koller-Meier Luc Van Gool
Partners: IDIAP Research Institute (CH)
The Hebrew University of Jerusalem (IL)
Czech Technical University (CZ)
Carl von Ossietzky Universitaet Oldenburg (DE)
Leibniz Institute for Neurobiology (DE)
Katholieke Universiteit Leuven, Laboratorium voor Neuro- en Psychofysiologie and ESAT/PSI VISICS (B)
Oregon Health and Science University OGI School of Science and Engineering (USA).
IM 2
The National Center of Competence in Research (NCCR) on Interactive Multimodal Information Management, in brief (IM)2, is aimed at the advancement of research, and the development of prototypes, in the field of man-machine interaction. The NCCR is particularly concerned with technologies coordinating natural input modes (such as speech, image, pen, touch, hand gestures, head and/or body movements, and even physiological sensors) with multimedia system outputs, such as speech, sounds, images, 3D graphics and animation.
In the first phase (2004-2006) the Computer Vision Laboratory at ETH Zurich was involved in the Workpackage on Scene Analysis of this project. We have been working on several issues:
- grouping based on geometric regularities
- a multi-feature based tracker
- a hand tracker
- person detection and tracking
- gesture analysis
- Multimodal input interface: including speech signal processing (natural speech recognition, speaker tracking, segmentation, and recognition) and visual input (e.g., shape tracking, face and gesture recognition, printed document processing and handwriting recognition).
- Integration of modalities and coordination among modalities, including (asynchronous) multi-channel processing (e.g., audio-visual tracking), integration of knowledge sources (expert fusion), and multimodal language modeling.
- Meeting dynamics and human-human interaction modeling, including the definition of meeting scenarios, analysing human interaction and multimodal dialogue modeling.
- Content abstraction, including multimodal information indexing, summarizing, and retrieval.
- Technology transfer through exploration and evaluation of advanced end-user applications, evaluating the advantages and drawbacks of the above functionalities in different prototype systems.
Project-Website: http://www.im2.ch/
Participants: Philipp Zehnder Gabriele Fanelli Beat Fasel Esther Koller-Meier Tobias Jaeggli Till Quack Luc Van Gool
In-Hand 3D Scanning
Objective: Tracking of articulated objects is a task as attractive as it is challenging. It is attractive because it can be used in many applications such as motion capture, Human-Computer Interaction, animation, and medical diagnosis.
The task is also challenging, especially when the computation time has to be kept low: the more degrees of freedom the object has, the more difficult this is. Furthermore, in a high-dimensional space, many ambiguities may arise. In our research we focus on the hands, where we combine on-line 3D scans with monocular imagery (contour information). The background can be cluttered and the hands may hold an object. For every frame of a video, a detailed hand pose including the angles of all finger digits is extracted.
Participants: Matthieu Bray Henning Hamer Luc Van Gool
CHIRON
Objective: Cultural Heritage Informatics Research Oriented Network
CHIRON is a Marie-Curie EU-funded project providing research training fellowships for graduates wishing to start a research career in the field of IT applications to the research, conservation, and presentation of tangible Cultural Heritage. The project will consist of a joint training program and individual research carried out by fellows within a co-ordinated framework at the participating partner institutions. CHIRON has a duration of four years with an overall budget of about 2 300 000 Euro.
Project-Website: http://www.chiron-training.org/index.html
Participants: Henning Hamer Esther Koller-Meier Konrad Schindler
Partners: PIN scrl Servizi didattici e scientifici per l'Università di Firenze (IT)
The University of the Aegean (GR)
The Ben-Gurion University of the Negev (IL)
The University of Brighton (UK)
The Ename Center For Public Archaeology And Heritage Presentation (BE)
Eidgenössische Technische Hochschule Zürich (CH)
The University of York (UK)