ETH - Swiss Federal Institute of Technology
ITET - Department of Information Technology and Electrical Engineering
BIWI - Computer Vision Laboratory

Research, Prof. Luc Van Gool


Object Recognition

The following projects are part of the object recognition research area:
CIMWOS - Combined IMage and WOrd Spotting
Participants: Vittorio Ferrari, Luc Van Gool

Partner: ILSP - Institute for Language and Speech Processing (Athens)
IDIAP, Martigny, CH
Canal+, Belgium
KULeuven - Katholieke Universiteit Leuven, Belgium

Extended Information: Project Homepage

Objective: Develop a tool that helps users annotate multimedia documents and later retrieve them by content.

The ETH part in CIMWOS is in the object recognition/localization field. We start from a shot-partitioned image sequence (movie) containing an object of interest, which the user selects in a key-frame. The object should then be localized automatically in every frame, by tracking it in the frames immediately following and preceding the key-frame and by re-localizing it in other shots. This basic goal could be extended to deal with several objects and with the recognition of object classes.

The underlying technology of the project is the matching of affine invariant regions. Currently, its use in object recognition is limited to counting the number of matching regions between the input image and the images of several model objects, and then selecting the object with the highest count. The main innovation of the project should be a model that takes into account the configuration of the regions on the object of interest. By learning their relative positions and motions automatically, the system should build an internal structured representation of the object that helps it deal with the complex situations of a real movie (strong occlusions, sharp changes in camera position, complex motions).
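The count-based baseline described above can be illustrated with a minimal sketch. It assumes each region is summarized by a descriptor vector and that two regions match when their descriptors are closer than a threshold; the function names, the threshold `max_distance`, and the toy descriptors are all hypothetical and not taken from the project.

```python
import math

def count_matches(query_descriptors, model_descriptors, max_distance=0.5):
    """Count query regions whose nearest model region lies within max_distance."""
    count = 0
    for q in query_descriptors:
        nearest = min((math.dist(q, m) for m in model_descriptors), default=math.inf)
        if nearest <= max_distance:
            count += 1
    return count

def recognize(query_descriptors, models, max_distance=0.5):
    """Return the model name with the highest match count, plus all counts."""
    counts = {name: count_matches(query_descriptors, descs, max_distance)
              for name, descs in models.items()}
    best = max(counts, key=counts.get)
    return best, counts

# Toy 2D descriptors (assumption: real region descriptors would be longer vectors).
models = {
    "mug": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    "book": [(5.0, 5.0), (6.0, 6.0)],
}
query = [(0.1, 0.0), (1.0, 1.2), (5.1, 5.0)]
best, counts = recognize(query, models)
```

Because only the number of matches is used, the sketch ignores where the regions lie on the object, which is exactly the limitation the planned configuration model is meant to address.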


Object/Scene Recognition for Wearable Computer
Participants: Hao Shao, Luc Van Gool

Partner: Electronics Laboratory ETH

Extended Information: Project Homepage

Objective: The vision part of the ETH poly project on wearable computers is to develop a system that recognizes an object or scene from a given image.

First, an image database containing all known objects or scenes has to be built. To build it, affine invariant regions are extracted from all database images, and colour moment invariants of the extracted regions are computed and stored. A query image showing the object or scene the user is interested in is then processed in the same way. A distance-based indexing technique, such as the Vantage Point Tree, is used to organize the region database as a binary tree, so that the regions extracted from the query image can be matched against it efficiently. The system returns the best matching database image and presents all knowledge related to that image to the user. Since the matched image contains the same or a similar object or scene, the user thereby learns what is in view.
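To make the indexing step more concrete, here is a minimal Vantage Point Tree sketch. It assumes region descriptors are fixed-length tuples compared with the Euclidean distance; the class and function names, the toy descriptors, and the image file names are hypothetical, and the project's actual descriptor and distance may differ.

```python
import math
import random

class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance <= radius
        self.outside = outside  # subtree of points with distance > radius

def build_vp_tree(points, rng=None):
    """Recursively split the points by their distance to a random vantage point."""
    rng = rng or random.Random(0)
    points = list(points)
    if not points:
        return None
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0.0, None, None)
    distances = [math.dist(vantage, p) for p in points]
    radius = sorted(distances)[len(distances) // 2]
    inside = [p for p, d in zip(points, distances) if d <= radius]
    outside = [p for p, d in zip(points, distances) if d > radius]
    return VPNode(vantage, radius,
                  build_vp_tree(inside, rng), build_vp_tree(outside, rng))

def nearest(node, query, best=None):
    """Return (distance, point) of the stored point closest to query."""
    if node is None:
        return best
    d = math.dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    if d <= node.radius:
        best = nearest(node.inside, query, best)
        # Visit the other side only if a closer point could still lie there.
        if d + best[0] > node.radius:
            best = nearest(node.outside, query, best)
    else:
        best = nearest(node.outside, query, best)
        if d - best[0] <= node.radius:
            best = nearest(node.inside, query, best)
    return best

# Toy database: descriptor -> image it was extracted from (made-up values).
database = {
    (0.1, 0.2, 0.3): "church.jpg",
    (0.8, 0.1, 0.5): "bridge.jpg",
    (0.4, 0.9, 0.7): "statue.jpg",
}
tree = build_vp_tree(list(database))
distance, descriptor = nearest(tree, (0.75, 0.15, 0.5))
best_image = database[descriptor]
```

The tree only needs a distance function, not coordinates, which is why this kind of index suits descriptors that are compared by a metric. The triangle inequality is what allows whole subtrees to be skipped during the search.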




3D modeling, motion capture, animation

The following projects are part of the 3D modeling research area:
OSCAR - an Opportunistic SCAnneR
Participants: Andreas Griesser, Luc Van Gool

Extended Information: Project Homepage

Objective: The central theme of this project is the construction of an opportunistic 3D scanning system consisting of multiple cameras and stripe projectors.

Active lighting is a popular technique for the acquisition of 3D shapes. Typically one light projector and one or two cameras are combined into a single acquisition module. For OSCAR I will develop a setup consisting of several projection devices and cameras (i.e. multiple modules) that are arranged around the object to be modeled in 3D.

Typically, the light that is projected is fixed. Even in cases where a series of patterns is projected in succession, these patterns normally do not depend on the scene content. A notable exception is work at the University of Tel Aviv, which describes how a series of projected patterns can be optimised for noise levels and required accuracy. This has led to improvements over the popular Gray code technique. In other work by the same group, a series of colour patterns is optimised for the colour on the surface of an object, on a worst-case basis.

Nevertheless, some assumptions had to be made about the reflectance properties of the surface and the constancy of ambient lighting, and two additional projections are needed for normalisation.
In our planned work, one-shot ranging techniques are envisaged and the optimisation targets different object-specific parameters.
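
For reference, the Gray code technique mentioned above gives every projector column a binary code word and projects one stripe pattern per bit, so that neighbouring columns differ in exactly one pattern. The following is a minimal sketch of that standard scheme, not of the OSCAR approach; the function names and the pattern layout (lists of 0/1 values per column) are illustrative assumptions.

```python
def to_gray(n):
    """Convert a column index to its binary-reflected Gray code."""
    return n ^ (n >> 1)

def from_gray(g):
    """Convert a Gray code word back to the column index."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def stripe_patterns(num_bits):
    """Return num_bits patterns; pattern k holds one 0/1 value per projector column."""
    width = 1 << num_bits
    return [[(to_gray(col) >> (num_bits - 1 - k)) & 1 for col in range(width)]
            for k in range(num_bits)]

def decode_column(bits):
    """Recover the projector column from the bits observed at one camera pixel."""
    g = 0
    for b in bits:
        g = (g << 1) | b
    return from_gray(g)

patterns = stripe_patterns(3)  # 3 patterns encode 8 projector columns
observed = [p[5] for p in patterns]  # what a pixel lit by column 5 would see
column = decode_column(observed)
```

With n patterns, 2^n columns can be distinguished. A decoding error at a stripe boundary shifts the result by only one column, which is one reason the scheme is widely used.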


In-Hand 3D Scanning
Participants: Matthieu Bray, Luc Van Gool

Objective: Tracking of articulated objects is a task as attractive as it is challenging. It is attractive because it has many applications, such as motion capture, human-computer interaction, animation, and medical diagnosis.

The task is also challenging, especially when the computation time has to be kept low: the more degrees of freedom the object has, the more difficult this is. Furthermore, in a high-dimensional space, many ambiguities may arise.

In our research we focus on the hands, where we combine on-line 3D scans with monocular imagery (contour information). The background can be cluttered and the hands may hold an object. For every frame of a video, a detailed hand pose, including the angles of all finger digits, is extracted.




Tracking and gesture analysis

The following projects are part of the tracking research area:
Blue-C
Participants: Esther Koller-Meier, Roland Kehl, Luc Van Gool

Partner: CAAD, ETH Zürich
Center of Product Development, ETH Zürich

Extended Information: Project Homepage

Objective: The blue-c project aims at fundamental research for, and development of, a new generation of virtual design and modeling environments centered on the interaction between humans and models. By integrating three-dimensional human representations into immersive virtual environments, many of today's collaboration and interaction techniques can be improved and new ones will be invented.

Today's technology enables information exchange and simple communication. Our team will build a system that enables a number of participants to interact and collaborate in a virtual world at an unprecedented level of immersion. The blue-c will support fully three-dimensionally rendered human inlays, with motion and speech in real time, as well as interaction metaphors between humans and simulated artifacts, be they functional, behavioral, or formal models or combinations of those.

The blue-c will take telepresence and virtual meetings to a new level of immersion. We will investigate the usability and performance of the prototype in selected applications including architecture, mechanical design, and medicine.

Our group is developing algorithms for multiple camera self-calibration, real-time segmentation, progressive silhouette extraction, and interpretation of natural human gestures.


CogViSys, a virtual commentator for video sequences
Participants: Philipp Zehnder, Luc Van Gool

Extended Information: Project Homepage

Objective: The goal of this project is to build a virtual commentator for video sequences. This means building a vision system that is able to translate visual information into a textual description, i.e. a system that can understand and tell what is happening in a specific video sequence. In particular we are working with content from situation comedies (sitcoms). This has the advantage of representing a quasi-closed world: usually there is a rather small number of characters and only a few different sets, which makes the recognition task simpler. Nevertheless, it is intended to keep the overall framework general, so that it can easily be transferred to other tasks.

The project involves different levels of complexity in the field of computer vision and artificial intelligence. They may be roughly stated as follows:
  • State-of-the-art cue integration, so that the more cognitive processes can start from a firm basis.
  • Recognition and tracking of objects, motions and environments. Here the main focus lies on categorization rather than identification of specific instantiations.
  • Understanding and interpretation of the information coming from the lower levels. This is the semantic layer of the system and includes the investigation of techniques to express knowledge and reasoning.
There are two main applications for the virtual commentator. The first one is indexing of video content. The system should annotate film sequences in the manner of a visual database. Based on this it should be possible to issue visual search operations (vgrep) like "Find all scenes where character John appears". The other application is to provide visually impaired viewers with a description of the visual content in order to augment the sound track. An example would be: "Elaine and Kramer have walked out of Seinfeld's apartment and are talking in the corridor."


VITOS - Virtual Touchscreen within the Miniaturized Wearable Computing Project
Participants: Esther Koller-Meier, Luc Van Gool

Partner: Electronics Laboratory, ETH Zürich
Computer Engineering and Networks Laboratory, ETH Zürich
Perceptual Computing and Computer Vision, ETH Zürich
History of Technology, ETH Zürich

Extended Information: Project Homepage

Objective: Hand gestures are receiving increasing interest for the interaction between a user and a wearable system. The user should be able to command the system through simple, intuitive gestures. The recognition tool will pick up hand and finger motions seen by a camera. The hand movement will mainly be used to activate different functions, while the finger motion is used to drive the mouse pointer shown on the display.

The proposed system has to find and then track the finger and the hand in an image sequence. Furthermore, each hand movement has to be assigned to one of a number of predefined gestures by classifying the tracked trajectories.


ViRoom
Participants: Petr Doubek, Luc Van Gool

Extended Information: Project Homepage

Objective: Track humans inside a room, recognize their actions, describe the actions, provide the best view.

ViRoom is a room with multiple cameras. Our goal is to create a system that detects and tracks humans in this room, recognizes and stores descriptions of their actions, selects the best viewpoint for these actions, and generates a new view from a virtual camera if necessary. We do not want to restrict the system to one particular room with a specific arrangement. We would like to be able to turn any room into a ViRoom just by setting up the cameras.

Some of the possible tasks for ViRoom are: making training videos, automated training, tele-teaching.




Texture analysis and synthesis

The following projects are part of the texture research area:
CogViSys, Cognitive Vision Systems
Participants: Alexey Zalesny, Luc Van Gool

Extended Information: Project Homepage

Objective: CogViSys aims at developing a virtual commentator, which is able to translate visual information into a textual description.

ETH aims at developing a texture understanding system that can recognize materials from images taken under different viewing and illumination directions.

The first step is the analysis. A sequence of images of the material under consideration, together with the corresponding viewpoint and illumination information, is the input to the analysis procedure. Its result is a so-called multiview texture model, which contains structural and statistical information about interdependencies of pixels for the variety of material appearances. The analysis must be carried out for every type of material that is of interest in the current application. Thus, the output of the analysis stage is a database of multiview models of different materials.

The second step is the classification. The texture model database is one input of the classifier. The other input is the textured image, or several images of the same material, to be classified, possibly together with specific appearance information for those images. The goal of the classifier is to select the model from the database that best explains the input images. Thus, the output is either the name of the material or a rejection if the material is not recognized. One criterion for the expressiveness of a model could be its ability to synthesize a texture that is visually similar to the analyzed one. ETH has investigated such synthesis models, based on a statistical description of the image that includes viewpoint dependency (see the figure with real and synthetic tangerines and a banana covered with tangerine skin). The model creation algorithm must now be adapted for classification purposes.
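
The classifier's input and output described above (a model database in, a material name or a rejection out) can be sketched as follows. This is only an illustration under strong assumptions: each model and each image is reduced to a short vector of statistics compared with an L1 distance, which stands in for the multiview texture model. The function names, the rejection threshold `reject_above`, and the numbers are hypothetical.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equally long statistic vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify(image_stats, model_database, reject_above=0.5):
    """Return the best matching material name, or None to signal rejection."""
    best_name, best_score = None, float("inf")
    for name, model_stats in model_database.items():
        score = l1_distance(image_stats, model_stats)
        if score < best_score:
            best_name, best_score = name, score
    return best_name if best_score <= reject_above else None

# Made-up statistics for two materials in the model database.
model_database = {
    "tangerine": [0.1, 0.6, 0.3],
    "banana": [0.5, 0.4, 0.1],
}
known = classify([0.15, 0.55, 0.3], model_database)    # close to a stored model
unknown = classify([0.9, 0.05, 0.05], model_database)  # far from every model
```

The rejection threshold is what lets the classifier answer "none of the known materials" instead of always returning the least bad model.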



© 1999-2012 by ETH Zurich | July 11, 2008