This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Hough Transform-based Mouth Localization for Audio-Visual Speech Recognition

G. Fanelli, J. Gall and L. Van Gool
British Machine Vision Conference
September 2009


We present a novel method for mouth localization in the context of multimodal speech recognition, where audio and visual cues are fused to improve speech recognition accuracy. While facial feature points like mouth corners or lip contours are commonly used to estimate at least the scale, position, and orientation of the mouth, we propose a Hough transform-based method. Instead of relying on a predefined sparse subset of mouth features, it casts probabilistic votes for the mouth center from several patches in the neighborhood and accumulates the votes in a Hough image. This makes the localization more robust, as it does not rely on the detection of a single feature. In addition, we exploit the different shape properties of eyes and mouth in order to localize the mouth more efficiently. Using the rotation-invariant representation of the iris, scale and orientation can be efficiently inferred from the localized eye positions. The superior accuracy of our method and quantitative improvements for audio-visual speech recognition over monomodal approaches are demonstrated on two datasets.
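
The voting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the patch format, the function names (`patch_votes`, `accumulate_votes`, `find_peak`, `scale_and_orientation`), and the `reference_distance` parameter are all assumptions made for illustration. Each patch casts weighted votes for a candidate mouth center, the votes are summed in a Hough image, and the strongest cell is taken as the estimate. Scale and in-plane orientation are derived from two eye positions, assuming a known reference inter-eye distance.

```python
import math

def patch_votes(patches):
    """Yield (x, y, weight) votes for the mouth center.

    Hypothetical patch format: (px, py, offsets), where offsets is a list of
    (dx, dy, probability) displacements from the patch to a candidate center.
    """
    for px, py, offsets in patches:
        for dx, dy, prob in offsets:
            yield (px + dx, py + dy, prob)

def accumulate_votes(votes, width, height):
    """Sum weighted votes into a height x width Hough image (list of rows)."""
    hough = [[0.0] * width for _ in range(height)]
    for x, y, weight in votes:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < width and 0 <= yi < height:  # ignore votes outside the image
            hough[yi][xi] += weight
    return hough

def find_peak(hough):
    """Return ((x, y), value) of the strongest cell in the Hough image."""
    best_xy, best_val = (0, 0), float("-inf")
    for y, row in enumerate(hough):
        for x, val in enumerate(row):
            if val > best_val:
                best_xy, best_val = (x, y), val
    return best_xy, best_val

def scale_and_orientation(left_eye, right_eye, reference_distance):
    """Estimate scale and in-plane rotation (radians) from two eye centers.

    reference_distance is an assumed inter-eye distance at scale 1.0.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.hypot(dx, dy) / reference_distance, math.atan2(dy, dx)

# Three hypothetical patches; most of their votes agree on the point (15, 20).
patches = [
    (10, 10, [(5, 10, 0.6), (0, 0, 0.4)]),
    (20, 10, [(-5, 10, 0.7)]),
    (15, 25, [(0, -5, 0.5), (3, 3, 0.5)]),
]
hough_image = accumulate_votes(patch_votes(patches), width=40, height=40)
center, strength = find_peak(hough_image)
scale, angle = scale_and_orientation((10, 10), (30, 10), reference_distance=40.0)
```

Because the estimate is the peak of many accumulated votes, a single misleading patch (such as the stray votes at (10, 10) and (18, 28) above) does not move the result, which is the robustness argument made in the abstract. A practical system would typically smooth the Hough image before peak detection; that step is omitted here for brevity.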

  author = {G. Fanelli and J. Gall and L. Van Gool},
  title = {Hough Transform-based Mouth Localization for Audio-Visual Speech Recognition},
  booktitle = {British Machine Vision Conference},
  year = {2009},
  month = {September},
  keywords = {Hough transform, speech recognition, face tracking, eye detection, mouth localization}