
3D Vision Technology for Capturing Multimodal Corpora: Chances and Challenges

G. Fanelli, J. Gall, H. Romsdorfer, T. Weise, L. Van Gool
LREC WS on Multimodal Corpora
Malta, May 2010


Data annotation is the most labor-intensive part of acquiring a multimodal corpus. 3D vision technology can ease the annotation process, especially when continuous surface deformations need to be extracted accurately and consistently over time. In this paper, we give an example of the use of such technology: the acquisition of an audio-visual corpus comprising detailed dynamic face geometry, transcription of the corpus text into a phonological representation, accurate phone segmentation, fundamental frequency extraction, and signal intensity estimation of the speech signals. Using this example, we discuss the advantages and challenges of integrating non-invasive 3D vision capture techniques into a setup for recording multimodal data.

  author = {G. Fanelli and J. Gall and H. Romsdorfer and T. Weise and L. Van Gool},
  title = {3D Vision Technology for Capturing Multimodal Corpora: Chances and Challenges},
  booktitle = {LREC WS on Multimodal Corpora},
  year = {2010},
  month = {May},
  keywords = {}