Project Reference: | ITP/052/19LP |
Project Title: | Scenario Conscious Text-to-Speech Synthesis |
Hosting Institution: | Logistics and Supply Chain MultiTech R&D Centre (LSCM) |
Abstract: | Text-to-speech (TTS) synthesis has a wide range of applications, such as chatbot applications that communicate with the elderly to provide care, alerting and sensor-information services for visually impaired persons, and automated public announcements. Given advancing TTS technology, it is common for organizations to generate speech with software and reduce the need to hire voice talent. However, even with these advances, the generated speech can be fluent yet toneless. Research has shown that much of how human communication is perceived comes from non-verbal expression, including voice tone and intensity [1][2]. For instance, it has been noted that what a robot says, and what tone of voice it uses, can affect whether the user finds the interaction with the robot encouraging or boring [3]. In this project, we will focus on learning the characteristics of speech under different scenarios. We expect our research to generate speech with attributes, such as locale, pitch, speed and style, that match each scenario. Our research and development undertakings involve: (1) constructing a machine learning model to learn scenario embeddings from audio; (2) building a scenario-conscious TTS engine that incorporates these scenario embeddings (a minimal illustrative sketch follows this listing). [1] Laplante, D., & Ambady, N. 2003. On How Things Are Said: Voice Tone, Voice Intensity, Verbal Content, and Perceptions of Politeness. Journal of Language and Social Psychology, 22(4), 434–441. [2] Yaffe, P. 2011. The 7% Rule: Fact, Fiction, or Misunderstanding. Ubiquity 2011, October, Article 1. [3] Matarić, M. How to Build Robots People Can Relate To. The Wall Street Journal. |
Project Coordinator: | Dr Chung-Dak Shum |
Approved Funding Amount: | HK$2.79M |
Project Period: | 01 Feb 2020 - 31 Jan 2021 |
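The abstract's two undertakings, learning scenario embeddings from audio and conditioning a TTS engine on them, resemble a pattern used in expressive-TTS work such as reference encoders and Global Style Tokens. Below is a minimal sketch of that pattern in PyTorch, assuming a recurrent reference encoder whose output is concatenated with text features before decoding. Every module name, dimension, and design choice here is an illustrative assumption; the abstract does not specify the project's actual architecture.

```python
# Hypothetical sketch: a scenario embedding conditioning a toy TTS decoder.
# All names and dimensions are illustrative assumptions, not the project's
# actual implementation.
import torch
import torch.nn as nn

class ScenarioEncoder(nn.Module):
    """Compresses a reference mel-spectrogram into a fixed-size scenario
    embedding intended to capture attributes such as pitch, speed and style."""
    def __init__(self, n_mels: int = 80, embed_dim: int = 128):
        super().__init__()
        self.rnn = nn.GRU(input_size=n_mels, hidden_size=embed_dim,
                          batch_first=True)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, frames, n_mels) -> scenario embedding (batch, embed_dim)
        _, hidden = self.rnn(mel)
        return hidden.squeeze(0)

class ScenarioConditionedTTS(nn.Module):
    """Toy text encoder whose features are concatenated with the scenario
    embedding at every step before being decoded to a mel-spectrogram."""
    def __init__(self, vocab_size: int = 256, text_dim: int = 256,
                 embed_dim: int = 128, n_mels: int = 80):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, text_dim)
        self.scenario_encoder = ScenarioEncoder(n_mels, embed_dim)
        self.decoder = nn.Linear(text_dim + embed_dim, n_mels)

    def forward(self, tokens: torch.Tensor,
                reference_mel: torch.Tensor) -> torch.Tensor:
        scenario = self.scenario_encoder(reference_mel)   # (B, E)
        text = self.text_embed(tokens)                    # (B, T, D)
        # Broadcast the scenario embedding across every text position.
        scenario = scenario.unsqueeze(1).expand(-1, text.size(1), -1)
        return self.decoder(torch.cat([text, scenario], dim=-1))

# Smoke test with random data.
model = ScenarioConditionedTTS()
tokens = torch.randint(0, 256, (2, 20))   # batch of 2 token sequences
ref = torch.randn(2, 100, 80)             # 100-frame reference spectrograms
mel_out = model(tokens, ref)
print(mel_out.shape)                      # torch.Size([2, 20, 80])
```

In this sketch the scenario embedding is learned end-to-end from reference audio rather than from hand-labelled scenario tags, which is one plausible reading of "learn scenario embeddings from audio"; a classifier over discrete scenario labels would be an equally valid alternative.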