The National Center of Competence in Research (NCCR) on Interactive
Multimodal Information Management, abbreviated (IM)2, aims to advance
research and develop prototypes in the field of man-machine interaction.
The NCCR is particularly concerned with technologies that coordinate
natural input modes (such as speech, image, pen, touch, hand gestures,
head and body movements, and even physiological sensors) with multimedia
system outputs such as speech, sounds, images, 3D graphics, and animation.
During the first phase (2004-2006), the Computer Vision Laboratory at ETH Zurich was involved in the project's workpackage on Scene Analysis, where we worked on several topics:
- grouping based on geometric regularities
- a multi-feature-based tracker
- a hand tracker
- person detection and tracking
- gesture analysis
In the current phase (starting in 2006), IM2 addresses the following themes:
- Multimodal input interfaces, including speech signal processing (natural speech recognition, speaker tracking, segmentation, and recognition) and visual input (e.g., shape tracking, face and gesture recognition, printed-document processing, and handwriting recognition).
- Integration of and coordination among modalities, including (asynchronous) multi-channel processing (e.g., audio-visual tracking), integration of knowledge sources (expert fusion), and multimodal language modeling.
- Meeting dynamics and human-human interaction modeling, including the definition of meeting scenarios, the analysis of human interaction, and multimodal dialogue modeling.
- Content abstraction, including multimodal information indexing, summarization, and retrieval.
- Technology transfer through the exploration and evaluation of advanced end-user applications, assessing the advantages and drawbacks of the above functionalities in different prototype systems.
People involved:
- Dr. Beat Fasel
- Dr. Philipp Zehnder
- Dr. Esther Koller-Meier
- Prof. Luc Van Gool
- Dr. Tobias Jaeggli
- Dr. Till Quack
- Dr. Gabriele Fanelli