Neural Text-to-Speech Synthesis

Tan, Xu

  • Publisher: Springer
  • Publication date: 2024-07-18
  • Price: $5,950
  • VIP price (5% off): $5,653
  • Language: English
  • Pages: 201
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 9819908299
  • ISBN-13: 9789819908295
  • Overseas special-order title (requires separate checkout)

Product Description

Text-to-speech (TTS) aims to synthesize intelligible and natural speech from given text. It is a hot topic in language, speech, and machine learning research and has broad applications in industry. This book introduces neural network-based TTS in the era of deep learning, aiming to provide a good understanding of neural TTS, current research and applications, and future research trends.

This book first introduces the history of TTS technologies, gives an overview of neural TTS, and provides preliminary knowledge on language and speech processing, neural networks and deep learning, and deep generative models. It then introduces neural TTS from the perspective of its key components (text analysis, acoustic models, vocoders, and end-to-end models) and advanced topics (expressive and controllable, robust, model-efficient, and data-efficient TTS). It also points out some future research directions and collects resources related to TTS.

This book is the first to introduce neural TTS in a comprehensive and easy-to-understand way, and it can serve both academic researchers and industry practitioners working on TTS.


About the Author

Xu Tan is a Principal Researcher and Research Manager at Microsoft Research Asia. His research interests cover deep learning and its applications in language, speech, and music processing and digital human creation. He has extensive research experience in text-to-speech synthesis. He has developed high-quality TTS systems such as FastSpeech 1/2 (widely used in the TTS community), DelightfulTTS (which won the Blizzard TTS Challenge), and NaturalSpeech (which achieved human-level quality on a TTS benchmark dataset), and he has transferred many of his research results into Microsoft Azure TTS services to improve their user experience. He has given a series of TTS tutorials at top conferences such as IJCAI, ICASSP, and INTERSPEECH, and has written a comprehensive survey paper on TTS.

Besides speech synthesis, he has designed several popular language models (e.g., MASS) and AI music systems (e.g., Muzic), and has developed machine translation systems that achieved human parity in Chinese-English translation and won several first places in WMT machine translation competitions. He has published over 100 papers at prestigious conferences and journals such as ICML, NeurIPS, ICLR, AAAI, IJCAI, ACL, EMNLP, NAACL, ICASSP, INTERSPEECH, KDD, and IEEE/ACM Transactions, and has served as an area chair or action editor for several AI conferences and journals (e.g., NeurIPS, AAAI, ICASSP, TMLR).

