This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Automatic Pronunciation Generation by Utilizing a Semi-supervised Deep Neural Networks

Naoya Takahashi, Tofigh Naghibi, Beat Pfister
San Francisco, September 2016, in press


Phonemic or phonetic sub-word units are the most commonly used atomic elements for representing speech signals in modern ASR systems. However, they are not the optimal choice, for several reasons: handcrafting a pronunciation dictionary requires a large amount of effort, pronunciations vary, humans make mistakes, and many dialects and languages are under-resourced. Here, we propose a data-driven pronunciation estimation and acoustic modeling method that takes only the orthographic transcription and jointly estimates a set of sub-word units and a reliable dictionary. Experimental results show that the proposed method, which is based on semi-supervised training of a deep neural network, largely outperforms phoneme-based continuous speech recognition on the TIMIT dataset.

  author = {Naoya Takahashi and Tofigh Naghibi and Beat Pfister},
  title = {Automatic Pronunciation Generation by Utilizing a Semi-supervised Deep Neural Networks},
  booktitle = {INTERSPEECH},
  year = {2016},
  month = {September},
  keywords = {},
  note = {in press}