Human-like Singing and Talking Machines
Description
Human-like Singing and Talking Machines: Flexible Speech Synthesis in Karaoke, Anime, Smart Phones, Video Games, Digital Signage, TV and Radio Programs
This talk gives an overview of the statistical approach to flexible speech synthesis. To build human-like talking machines, a speech synthesis system must be able to generate speech in an arbitrary speaker's voice, in various speaking styles and in different languages, with varying emphasis and focus, and/or with emotional expression. The main advantage of the statistical approach is that such flexibility can easily be realized using mathematically well-defined algorithms. The talk outlines the system architecture and then presents recent results and demos.
Keiichi Tokuda is a Professor in the Department of Computer Science at Nagoya Institute of Technology and is currently visiting Google on sabbatical. He is also an Honorary Professor at the University of Edinburgh. He was an Invited Researcher at the National Institute of Information and Communications Technology (NICT), formerly known as the ATR Spoken Language Communication Research Laboratories, Kyoto, Japan, from 2000 to 2013, and was a Visiting Researcher at Carnegie Mellon University from 2001 to 2002. He has been working on statistical parametric speech synthesis since he proposed an algorithm for speech parameter generation from HMMs in 1995. He has received six paper awards and two achievement awards. He is an IEEE Fellow and an ISCA Fellow.