Research, Prof. Luc Van Gool
Object Recognition
The following projects are part of the
object recognition research area:
CIMWOS - Combined IMage and WOrd Spotting
Participants: Vittorio Ferrari, Luc Van Gool
Partners:
ILSP - Institute for Language and Speech Processing (Athens)
IDIAP, Martigny, CH
Canal+, Belgium
KULeuven - Katholieke Universiteit Leuven, Belgium
Extended Information: Project Homepage
Objective: Develop a tool that helps users annotate multimedia documents
and later retrieve them by content.
ETH's contribution to CIMWOS lies in object recognition and
localization. We start from a movie partitioned into shots that
contains an object of interest, which the user selects in a key-frame.
The object should then be localized automatically in every frame, by
tracking it through the frames immediately following and preceding the
key-frame and by re-localizing it in other shots. This basic goal could
be extended to handle several objects and to recognize object classes.
The underlying technology of the project is the matching of affinely
invariant regions. Currently, its use in object recognition is limited
to counting the number of regions that match between the input image
and images of several model objects, and then selecting the object with
the highest count. The main innovation of the project should be a model
that takes into account the configuration of the regions on the object
of interest. By learning their relative positions and motions
automatically, the system should build an internal structured
representation of the object that will help it cope with the complex
situations of a real movie (strong occlusions, sharp changes in camera
position, complex motions).
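The counting scheme described above can be sketched as follows. This is a minimal illustration, not the CIMWOS code: we assume each affinely invariant region has already been reduced to a fixed-length descriptor vector, and we add a nearest-neighbour ratio test (our assumption; the text only says that matching regions are counted) to decide when two regions match. The model names and random data are hypothetical.

```python
import numpy as np

def count_matches(query_desc, model_desc, ratio=0.8):
    """Count query regions whose nearest model region passes a ratio test.

    query_desc: (n, d) array, model_desc: (m, d) array of region descriptors.
    """
    if len(model_desc) < 2 or len(query_desc) == 0:
        return 0
    # Pairwise Euclidean distances, shape (n, m)
    dists = np.linalg.norm(query_desc[:, None, :] - model_desc[None, :, :], axis=2)
    two_nearest = np.sort(dists, axis=1)[:, :2]
    # Accept a match only if it is clearly better than the runner-up
    return int(np.sum(two_nearest[:, 0] < ratio * two_nearest[:, 1]))

def recognize(query_desc, models, ratio=0.8):
    """Return (best_model_name, counts): the model with the most matching regions."""
    counts = {name: count_matches(query_desc, desc, ratio)
              for name, desc in models.items()}
    best = max(counts, key=counts.get)
    return best, counts

# Usage with synthetic 16-D descriptors (hypothetical data)
rng = np.random.default_rng(0)
models = {"mug": rng.normal(size=(50, 16)), "book": rng.normal(size=(50, 16))}
# Query: 20 slightly perturbed "mug" regions plus 10 unrelated regions
query = np.vstack([models["mug"][:20] + 0.01 * rng.normal(size=(20, 16)),
                   rng.normal(size=(10, 16))])
best, counts = recognize(query, models)
print(best, counts)
```

Because every match counts equally, wherever the region lies on the object, this scheme ignores the spatial configuration of the regions, which is the limitation the project sets out to address.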
Object/Scene Recognition for Wearable Computer
Participants: Hao Shao, Luc Van Gool
Partner: Electronics Laboratory ETH
Extended Information: Project Homepage
Objective: The vision part of the ETH poly project on wearable computing is to develop a system that recognizes an object or scene in a given image.
First, an image database containing all known objects or scenes has to be built. To build it, affine-invariant regions are extracted from every image in the database; colour moment invariants are then computed for all extracted regions and stored in the database. A query image containing the object or scene the user is interested in is then processed with the same procedure. A distance-based indexing technique, such as the vantage point tree, is used to organize the regions into a binary-tree-structured region database, which is searched for the regions extracted from the query. The system returns the best-matching image in the database together with all the knowledge related to it. Since the matched image contains the same or a similar object or scene, the user thereby learns what the query image shows.
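A vantage point tree splits the database recursively: each node picks one descriptor as a vantage point and divides the remaining ones by whether they lie inside or outside the median distance from it, so a nearest-neighbour query can skip whole subtrees using the triangle inequality. The sketch below assumes each region is already summarised as a numeric tuple (standing in for its colour moment invariants) labelled with the database image it came from, and picks the best image by per-region voting. The voting step and all names are our assumptions, not the project's code.

```python
import math
import random
from collections import Counter

class VPNode:
    def __init__(self, point, label, radius, inside, outside):
        self.point = point      # vantage point: one region descriptor
        self.label = label      # id of the database image the region came from
        self.radius = radius    # median distance to the remaining descriptors
        self.inside = inside    # subtree with distance <= radius
        self.outside = outside  # subtree with distance > radius

def build_vp_tree(items, rng=None):
    """Build a vantage point tree from a list of (descriptor, label) pairs."""
    if not items:
        return None
    rng = rng or random.Random(0)
    items = list(items)
    point, label = items.pop(rng.randrange(len(items)))
    if not items:
        return VPNode(point, label, 0.0, None, None)
    dists = [math.dist(point, p) for p, _ in items]
    radius = sorted(dists)[len(dists) // 2]  # median split keeps the tree balanced
    inside = [it for it, d in zip(items, dists) if d <= radius]
    outside = [it for it, d in zip(items, dists) if d > radius]
    return VPNode(point, label, radius,
                  build_vp_tree(inside, rng), build_vp_tree(outside, rng))

def nearest(node, query, best=None):
    """Return (distance, label) of the stored descriptor closest to query."""
    if node is None:
        return best
    d = math.dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.label)
    if d <= node.radius:
        near, far = node.inside, node.outside
    else:
        near, far = node.outside, node.inside
    best = nearest(near, query, best)
    # Triangle inequality: the far side can only hold a closer descriptor
    # if the query ball of radius best[0] crosses the splitting sphere.
    if abs(d - node.radius) <= best[0]:
        best = nearest(far, query, best)
    return best

def best_image(tree, query_regions):
    """Let every query region vote for the image of its nearest database region."""
    votes = Counter(nearest(tree, q)[1] for q in query_regions)
    return votes.most_common(1)[0][0]

# Usage with hypothetical 3-D descriptors
db = [((0.0, 0.0, 0.0), "img_a"), ((0.1, 0.0, 0.0), "img_a"),
      ((5.0, 5.0, 5.0), "img_b"), ((5.1, 5.0, 4.9), "img_b"),
      ((9.0, 0.0, 1.0), "img_c")]
tree = build_vp_tree(db)
print(best_image(tree, [(5.0, 5.1, 5.0), (4.9, 5.0, 5.0), (0.0, 0.1, 0.0)]))  # img_b
```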
3D modeling, motion capture, animation
The following projects are part of the
3D modeling research area:
OSCAR - an Opportunistic SCAnneR
Participants: Andreas Griesser, Luc Van Gool
Extended Information: Project Homepage
Objective: The central theme of this project is the construction of an opportunistic 3D
scanning system consisting of multiple cameras and stripe projectors.
Active lighting is a popular technique for the acquisition of 3D
shapes. Typically one light projector and one or two cameras are combined into
a single acquisition module. For OSCAR I will develop a setup consisting of
several projection devices and cameras (i.e. multiple modules) arranged
around the object to be modeled in 3D.
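Within one acquisition module, depth follows from triangulation between camera and projector. A minimal sketch, assuming a calibrated pinhole camera without lens distortion and a projected stripe whose light plane n · X = offset is already known in camera coordinates (both are assumptions; a real module also needs projector calibration and stripe identification):

```python
import numpy as np

def pixel_ray(K, u, v):
    """Unit direction (camera frame) of the ray through pixel (u, v) for intrinsics K."""
    d = np.linalg.solve(K, np.array([u, v, 1.0]))
    return d / np.linalg.norm(d)

def triangulate_stripe(K, u, v, plane_normal, plane_offset):
    """Intersect the camera ray through (u, v) with the light plane n . X = offset.

    The camera centre is the origin of the camera frame, so ray points are
    X = t * ray; solving n . (t * ray) = offset gives t.
    """
    ray = pixel_ray(K, u, v)
    denom = float(np.dot(plane_normal, ray))
    if abs(denom) < 1e-12:
        raise ValueError("camera ray is parallel to the light plane")
    t = plane_offset / denom
    return t * ray

# Usage with hypothetical calibration values
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
print(triangulate_stripe(K, 420.0, 240.0, np.array([1.0, 0.0, 0.0]), 0.5))
```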
Typically, the projected light is fixed. Even where a series of patterns
is projected in succession, these patterns normally do not depend on the
scene content.
A notable exception is work at the University of Tel Aviv, which describes
how a series of projected patterns can be optimised for given noise
levels and required accuracy. This has led to improvements over the
popular Gray code technique. In another work, the same group optimised a
series of colour patterns for the colour of an object's surface, on a
worst-case basis. Nevertheless, some assumptions had to be made about the
reflectance properties of the surface and the constancy of ambient
lighting, and two additional projections were needed for normalisation.
In our planned work, one-shot ranging techniques are envisaged, and the
optimisation targets different object-specific parameters.
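For reference, the Gray code technique mentioned above projects one stripe pattern per bit, so that every projector column receives a unique code word and neighbouring columns differ in a single bit; a decoding error at a stripe boundary therefore shifts the column index by at most one. The sketch below generates the standard binary-reflected Gray code patterns and decodes observed bits back into column indices. It assumes the camera images have already been thresholded into 0/1 bits and is not the Tel Aviv group's optimised coding.

```python
import numpy as np

def gray_code_patterns(width, n_bits):
    """Return an (n_bits, width) array of 0/1 stripe patterns, most significant bit first.

    Projector column x is encoded by the binary-reflected Gray code x ^ (x >> 1).
    """
    if width > 2 ** n_bits:
        raise ValueError("n_bits too small to give every column a unique code")
    cols = np.arange(width)
    gray = cols ^ (cols >> 1)
    shifts = np.arange(n_bits - 1, -1, -1)
    return ((gray[None, :] >> shifts[:, None]) & 1).astype(np.uint8)

def decode_gray(bits):
    """Convert observed Gray-code bits of shape (n_bits, ...) back to column indices."""
    bits = np.asarray(bits, dtype=np.int64)
    binary = bits[0].copy()          # the most significant bit is unchanged
    value = binary.copy()
    for k in range(1, bits.shape[0]):
        binary = binary ^ bits[k]    # b_k = b_(k-1) XOR g_k
        value = (value << 1) | binary
    return value

patterns = gray_code_patterns(8, 3)
print(patterns)
print(decode_gray(patterns))  # [0 1 2 3 4 5 6 7]
```

The same `decode_gray` call works on a stack of thresholded camera images of shape `(n_bits, H, W)`, returning a per-pixel column index.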
Participants: Matthieu Bray, Luc Van Gool
Objective: Tracking of articulated objects is a task as attractive as it is
challenging. It is attractive because it has many applications, such as
motion capture, human-computer interaction, animation, and medical
diagnosis. It is challenging, especially when the computation time has to
be kept low: the more degrees of freedom the object has, the more
difficult this is. Furthermore, many ambiguities may arise in a
high-dimensional space.
In our research we focus on the hands, combining on-line 3D scans with
monocular imagery (contour information). The background can be cluttered
and the hands may hold an object. For every frame of a video, a detailed
hand pose is extracted, including the joint angles of all fingers.
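The growth in degrees of freedom can be made concrete with a kinematic chain. The sketch below is a deliberately simplified, planar model of a single finger (three segments, one flexion angle per joint, hypothetical segment lengths); a full hand model is three-dimensional and stacks one such chain per finger on top of the global hand pose. A model-based tracker would search over these joint angles so that the predicted hand agrees with the observed contours and 3D scans; this is a generic building block, not the project's hand model.

```python
import math

def finger_joint_positions(base, segment_lengths, joint_angles):
    """Planar forward kinematics of one finger.

    Each joint angle (radians) is relative to the previous segment, so the
    angles accumulate along the chain. Returns the base, every joint and the tip.
    """
    x, y = base
    heading = 0.0
    points = [(x, y)]
    for length, angle in zip(segment_lengths, joint_angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points

# Usage: hypothetical phalanx lengths (cm) and flexion angles (radians)
print(finger_joint_positions((0.0, 0.0), [4.0, 2.5, 2.0], [0.3, 0.4, 0.2]))
```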
tracking and gesture analysis
The following projects are part of the
tracking research area:
Participants: Esther Koller-Meier, Roland Kehl, Luc Van Gool
Partners:
CAAD, ETH Zürich
Center of Product Development, ETH Zürich
Extended Information: Project Homepage
Objective: The blue-c project aims at fundamental research toward, and the
development of, a new generation of virtual design and modeling
environments centering on the interaction between humans and models. By
integrating three-dimensional human representations into immersive
virtual environments, many of today's collaboration and interaction
techniques can be improved and new ones invented.
Today's technology enables information exchange and simple
communication. Our team will build a system that enables a number of
participants to interact and collaborate in a virtual world at an
unprecedented level of immersion. The blue-c will support fully
three-dimensional rendered human inlays with real-time motion and
speech, as well as interaction metaphors between humans and simulated
artifacts, whether these are functional, behavioral, or formal models,
or combinations of them.
The blue-c will take telepresence and virtual meetings to a new level
of immersion. We will investigate the usability and performance of the
prototype in selected applications, including architecture, mechanical
design, and medicine.
Our group is developing algorithms for multiple camera
self-calibration, real-time segmentation, progressive silhouette
extraction, and interpretation of natural human gestures.
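Real-time segmentation and silhouette extraction are often built on background subtraction. The following sketch uses a per-pixel running Gaussian background model on greyscale frames; it is a common generic technique shown for illustration, not necessarily the algorithm used in blue-c, and the parameter values (`alpha`, `k`, `min_var`) are hypothetical.

```python
import numpy as np

class RunningGaussianBackground:
    """Per-pixel running Gaussian background model for silhouette extraction."""

    def __init__(self, first_frame, alpha=0.05, k=3.0, min_var=25.0):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full_like(self.mean, min_var)
        self.alpha = alpha      # learning rate of the running average
        self.k = k              # foreground threshold in standard deviations
        self.min_var = min_var  # variance floor, avoids over-sensitive pixels

    def apply(self, frame):
        """Return a boolean silhouette mask and update the background model."""
        frame = frame.astype(np.float64)
        diff = frame - self.mean
        foreground = diff ** 2 > (self.k ** 2) * self.var
        # Update mean and variance only where the pixel still looks like background
        bg = ~foreground
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] = np.maximum(
            (1 - self.alpha) * self.var[bg] + self.alpha * diff[bg] ** 2,
            self.min_var)
        return foreground

# Usage with a synthetic 4x4 frame
background = np.full((4, 4), 100.0)
model = RunningGaussianBackground(background)
frame = background.copy()
frame[1:3, 1:3] = 200.0   # a bright "person" enters the view
print(model.apply(frame).astype(int))
```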
CogViSys, a virtual commentator for video sequences
Participants: Philipp Zehnder, Luc Van Gool
Extended Information: Project Homepage
Objective: The goal of this project is to build a virtual commentator for video sequences. This means building a vision system that is able to translate visual information into a textual description, i.e. a system that can understand and tell what is happening in a specific video sequence. In particular, we are working with content from situation comedies (sitcoms). This has the advantage of representing a quasi-closed world: usually there is a rather small number of characters and only a few different sets, which makes the recognition task simpler. Nevertheless, the intention is to keep the overall framework general, so that it can easily be transferred to other tasks.
The project involves different levels of complexity in the field of computer vision and artificial intelligence. They may be roughly stated as follows:
- State-of-the-art cue integration, so that the more cognitive processes can start from a firm basis.
- Recognition and tracking of objects, motions and environments. Here the main focus lies on categorization rather than identification of specific instantiations.
- Understanding and interpretation of the information coming from the lower levels. This is the semantic layer of the system and includes the investigation of techniques to express knowledge and reasoning.
There are two main applications for the virtual commentator. The first one is indexing of video content. The system should annotate film sequences in the manner of a visual database. Based on this it should be possible to issue visual search operations (vgrep) like "Find all scenes where character John appears". The other application is to provide visually impaired viewers with a description of the visual content in order to augment the sound track. An example would be: "Elaine and Kramer have walked out of Seinfeld's apartment and are talking in the corridor."
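The indexing application could rest on an inverted index from annotation labels to scenes, so that a vgrep query becomes a set intersection. This is a minimal sketch under the assumption that the vision system outputs a set of labels (character names, locations) per scene; the class, method names and labels are hypothetical.

```python
from collections import defaultdict

class SceneIndex:
    """Inverted index from annotation labels (e.g. character names) to scene ids."""

    def __init__(self):
        self._index = defaultdict(set)

    def add_scene(self, scene_id, labels):
        for label in labels:
            self._index[label].add(scene_id)

    def find(self, *labels):
        """vgrep-style query: scene ids whose annotations contain all given labels."""
        if not labels:
            return set()
        # .get() avoids inserting empty entries for unknown labels
        result = set(self._index.get(labels[0], set()))
        for label in labels[1:]:
            result &= self._index.get(label, set())
        return result

# Usage with hypothetical annotations
idx = SceneIndex()
idx.add_scene("s1", ["John", "Elaine"])
idx.add_scene("s2", ["Elaine", "Kramer", "corridor"])
idx.add_scene("s3", ["John"])
print(sorted(idx.find("John")))  # ['s1', 's3']
```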
VITOS - Virtual Touchscreen within the Miniaturized Wearable Computing Project
Participants: Esther Koller-Meier, Luc Van Gool
Partners:
Electronics Laboratory, ETH Zürich
Computer Engineering and Networks Laboratory, ETH Zürich
Perceptual Computing and Computer Vision, ETH Zürich
History of Technology, ETH Zürich
Extended Information: Project Homepage
Objective: Hand gestures are receiving increasing interest as a means of
interaction between a user and a wearable system. The user should be able
to command the system through simple, intuitive gestures. The
recognition tool will pick up hand and finger motions seen by
a camera. Hand movements will mainly be used to activate
different functions, while finger motion drives the mouse pointer
shown on the display.
The proposed system has to detect and then track the finger
and the hand in an image sequence. Furthermore, hand movements
have to be assigned to one of a number of predefined gestures
by classifying the tracked trajectories.
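Classifying tracked trajectories can be sketched with a simple template matcher: each trajectory is resampled to a fixed number of points along its arc length, normalised for position and size, and compared point by point with one template per gesture. This is a swapped-in, illustrative technique in the spirit of nearest-template recognisers, not the VITOS classifier; the gesture names and templates are hypothetical.

```python
import numpy as np

def resample(traj, n=32):
    """Resample a 2D trajectory to n points equally spaced along its arc length."""
    traj = np.asarray(traj, dtype=np.float64)
    # Drop repeated points so that the cumulative arc length strictly increases
    step = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    traj = traj[np.concatenate([[True], step > 0])]
    s = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(traj, axis=0), axis=1))])
    if s[-1] == 0:
        return np.repeat(traj[:1], n, axis=0)
    targets = np.linspace(0.0, s[-1], n)
    return np.stack([np.interp(targets, s, traj[:, 0]),
                     np.interp(targets, s, traj[:, 1])], axis=1)

def normalize(points):
    """Translate to the centroid and scale uniformly, so position and size don't matter."""
    centered = points - points.mean(axis=0)
    scale = np.abs(centered).max()
    return centered / scale if scale > 0 else centered

def classify(traj, templates, n=32):
    """Return the name of the template with the smallest mean point-to-point distance."""
    probe = normalize(resample(traj, n))
    scores = {name: np.linalg.norm(probe - normalize(resample(t, n)), axis=1).mean()
              for name, t in templates.items()}
    return min(scores, key=scores.get)

# Hypothetical gesture templates in image coordinates (y grows downward)
templates = {"swipe_right": [(0, 0), (10, 0)],
             "swipe_down": [(0, 0), (0, 10)],
             "L_shape": [(0, 0), (0, 10), (10, 10)]}
print(classify([(100, 50), (105, 51), (130, 50)], templates))  # swipe_right
```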
Participants: Petr Doubek, Luc Van Gool
Extended Information: Project Homepage
Objective: Track humans inside a room, recognize their actions,
describe the actions, provide the best view.
ViRoom is a room with multiple cameras. Our goal is to create
a system that detects and tracks humans in this room, recognizes
and stores descriptions of their actions, selects the best
viewpoint for these actions, and generates a new view from a virtual
camera if necessary. We do not want to restrict the system to one
particular room with a specific camera arrangement. We would like to be
able to turn any room into a ViRoom just by setting up the cameras.
Possible tasks for ViRoom include making training videos,
automated training, and tele-teaching.
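The text does not define the best viewpoint. As one hypothetical criterion, the sketch below picks, among the cameras whose viewing cone contains the person, the one the person faces most directly. Camera names, the field of view and the assumption that the person's facing direction is known are all illustrative.

```python
import numpy as np

def best_camera(person_pos, person_facing, cameras, half_fov_deg=30.0):
    """Pick the camera that sees the person most frontally, or None if none sees them.

    cameras: dict name -> (position, viewing_direction), 3D room coordinates.
    """
    person_pos = np.asarray(person_pos, dtype=float)
    facing = np.asarray(person_facing, dtype=float)
    facing = facing / np.linalg.norm(facing)
    cos_half_fov = np.cos(np.radians(half_fov_deg))
    best, best_score = None, -np.inf
    for name, (pos, view_dir) in cameras.items():
        to_person = person_pos - np.asarray(pos, dtype=float)
        to_person = to_person / np.linalg.norm(to_person)
        view_dir = np.asarray(view_dir, dtype=float)
        view_dir = view_dir / np.linalg.norm(view_dir)
        if np.dot(view_dir, to_person) < cos_half_fov:
            continue  # person lies outside this camera's viewing cone
        # Frontal view: the person faces back along the camera-to-person direction
        score = -np.dot(facing, to_person)
        if score > best_score:
            best, best_score = name, score
    return best

# Usage: person at the origin facing +x, hypothetical camera layout
cameras = {"cam_front": ((5, 0, 0), (-1, 0, 0)),
           "cam_back": ((-5, 0, 0), (1, 0, 0)),
           "cam_side": ((0, 5, 0), (0, -1, 0))}
print(best_camera((0, 0, 0), (1, 0, 0), cameras))  # cam_front
```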
texture analysis and synthesis
The following projects are part of the
texture research area:
CogViSys, Cognitive Vision Systems
Participants: Alexey Zalesny, Luc Van Gool
Extended Information: Project Homepage
Objective: CogViSys aims at developing a virtual commentator,
which is able to translate visual information into
a textual description.
ETH aims at developing a texture understanding system
that will be able to recognize materials from their
images under different viewing and illumination directions.
The first step is the analysis. The input to the analysis procedure is
a sequence of images of the material under consideration, together
with the corresponding viewpoint and illumination information. Its
result is a so-called multiview texture model, which contains
structural and statistical information about the interdependencies
of pixels across the variety of the material's appearances. The
analysis must be performed for every material of interest in the
current application. Thus, the output of the analysis stage is a
database of multiview models of different materials.
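As an illustration of what a multiview model might store, the sketch below assumes that pixel interdependencies are summarised by histograms of grey-level differences between pixel pairs at a few fixed offsets, computed separately for each viewpoint/illumination condition. This is a simplified stand-in for the actual model; the offsets, bin count and condition keys are hypothetical.

```python
import numpy as np

def pair_difference_histograms(image, offsets, bins=16):
    """Histograms of grey-level differences I(p + offset) - I(p), one row per offset.

    image: 2D array with values in [0, 1]; offsets: list of (dy, dx) pairs
    smaller than the image. Returns shape (len(offsets), bins); rows sum to 1.
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    rows = []
    for dy, dx in offsets:
        # a[i, j] and b[i, j] form pixel pairs separated by (dy, dx)
        a = img[max(-dy, 0):h - max(dy, 0), max(-dx, 0):w - max(dx, 0)]
        b = img[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)]
        hist, _ = np.histogram(b - a, bins=bins, range=(-1.0, 1.0))
        rows.append(hist / hist.sum())
    return np.array(rows)

def build_multiview_model(views, offsets):
    """views: dict mapping a (viewpoint, illumination) key to an image of one material."""
    return {condition: pair_difference_histograms(img, offsets)
            for condition, img in views.items()}

# Usage: vertical stripes vary along x (offset (0, 1)) but not along y (offset (1, 0))
stripes = np.tile([0.0, 1.0], (8, 4))
model = build_multiview_model({("front", "top_light"): stripes}, [(1, 0), (0, 1)])
print(model[("front", "top_light")].round(2))
```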
The second step is the classification. The texture model database is
one input of the classifier. The other input is the textured image or
images of a single material to be classified, possibly together with
information about the specific appearance conditions of those images.
The goal of the classifier is to select the model from the database
that best explains the input images. Thus, the output is the name of
the material, or a rejection if the material is not recognized. A
criterion for a model's expressiveness could be its ability to
synthesize texture that is visually similar to the analyzed one. ETH
has investigated such synthesis models, based on a statistical
description of the image that includes viewpoint dependency (see
Figure with real and synthetic tangerines and a banana covered with
tangerine skin). The model creation algorithm must now be adapted for
classification purposes.
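The select-or-reject logic of the classifier can be sketched as below. Instead of the synthesis-based criterion suggested above, this sketch uses a simpler substitute: a chi-square distance between the query's feature histogram and each stored view of each material model, with a threshold for rejection. The feature format, threshold and material names are assumptions.

```python
import numpy as np

def chi2(p, q, eps=1e-12):
    """Chi-square distance between two non-negative feature arrays."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * float(np.sum((p - q) ** 2 / (p + q + eps)))

def classify_material(query, database, threshold, condition=None):
    """Return the material whose model best explains query, or None to reject.

    database: dict material -> dict condition -> feature array.
    If the viewing/illumination condition is known, only that view of each
    model is compared; otherwise the best-matching view is used.
    """
    best_name, best_dist = None, np.inf
    for material, views in database.items():
        if condition is not None:
            candidates = [views[condition]] if condition in views else []
        else:
            candidates = list(views.values())
        if not candidates:
            continue
        d = min(chi2(query, f) for f in candidates)
        if d < best_dist:
            best_name, best_dist = material, d
    return best_name if best_dist <= threshold else None

# Usage with hypothetical 3-bin features
db = {"tangerine": {"front": np.array([0.7, 0.2, 0.1]), "side": np.array([0.5, 0.3, 0.2])},
      "banana": {"front": np.array([0.1, 0.2, 0.7])}}
print(classify_material(np.array([0.68, 0.22, 0.10]), db, threshold=0.05))  # tangerine
```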
© 1999-2012 by ETH Zurich
July 11, 2008