Speech Processing for IP Networks: Media Resource Control Protocol (Hardcover)

David Burke

  • Publisher: Wiley
  • Publication date: 2007-04-01
  • List price: $3,980
  • Sale price: $3,781 (5% off)
  • Language: English
  • Pages: 368
  • Binding: Hardcover
  • ISBN: 0470028343
  • ISBN-13: 9780470028346
  • Related categories: HTTP, IPv6, XML
  • Ships immediately


Description

Media Resource Control Protocol (MRCP) is a new IETF protocol and a key enabling technology: it eases the integration of speech technologies into network equipment and accelerates their adoption, allowing exciting and compelling interactive services to be delivered over the telephone. MRCP leverages IP telephony and Web technologies such as SIP, HTTP, and XML (Extensible Markup Language) to deliver an open-standard, vendor-independent, and versatile interface to speech engines.

Speech Processing for IP Networks brings these technologies together into a single volume, giving the reader a solid technical understanding of the principles of MRCP, how it leverages other protocols and specifications for its operation, and how it is applied in modern IP-based telecommunication networks. Focusing on the MRCPv2 standard developed by the IETF SpeechSC Working Group, this book also provides an overview of its precursor, MRCPv1.

Speech Processing for IP Networks:

  • Gives a complete background on the technologies required by MRCP to function, including SIP (Session Initiation Protocol), RTP (Real-time Transport Protocol), and HTTP (Hypertext Transfer Protocol).
  • Covers relevant W3C data representation formats including Speech Synthesis Markup Language (SSML), Speech Recognition Grammar Specification (SRGS), Semantic Interpretation for Speech Recognition (SISR), and Pronunciation Lexicon Specification (PLS).
  • Describes VoiceXML - the leading approach for programming cutting-edge speech applications and a key driver of the development of many of MRCP’s features.
  • Explains advanced topics such as VoiceXML and MRCP interworking.

This text will be an invaluable resource for technical managers, product managers, software developers, and technical marketing professionals working for network equipment manufacturers, speech engine vendors, and network operators. Advanced students on computer science and engineering courses will also find this to be a useful guide.

 

Table of Contents

 
 

PART I. BACKGROUND.

1. Introduction.

1.1 Introduction to Speech Applications.

1.2 The MRCP Value Proposition.

1.3 History of MRCP Standardisation.

1.3.1 Internet Engineering Task Force.

1.3.2 World Wide Web Consortium.

1.3.3 MRCP: From Humble Beginnings Toward IETF Standard.

1.4 Summary.

2. Basic Principles of Speech Processing.

2.1 Human Speech Production.

2.1.1 Speech Sounds: Phonemics and Phonetics.

2.2 Speech Recognition.

2.2.1 Endpoint Detection.

2.2.2 Mel-Cepstrum.

2.2.3 Hidden Markov Models.

2.2.4 Language Modelling.

2.3 Speaker Verification and Identification.

2.3.1 Feature Extraction.

2.3.2 Statistical Modelling.

2.4 Speech Synthesis.

2.4.1 Front-end Processing.

2.4.2 Back-end Synthesis.

2.5 Summary.

3. Overview of MRCP.

3.1 Architecture.

3.2 Media Resource Types.

3.3 Network Scenarios.

3.3.1 VoiceXML IVR Service Node.

3.3.2 IP PBX with Voicemail.

3.3.3 Advanced Media Gateway.

3.4 Protocol Operation.

3.4.1 Establishing Communication Channels.

3.4.2 Controlling a Media Resource.

3.4.3 Walkthrough Examples.

3.5 Security.

3.6 Summary.

PART II. MEDIA AND CONTROL SESSIONS.

4. Session Initiation Protocol.

4.1 Introduction.

4.2 Walkthrough Example.

4.3 SIP URIs.

4.4 Transport.

4.5 Media Negotiation.

4.5.1 Session Description Protocol.

4.5.2 Offer/Answer Model.

4.6 SIP Servers.

4.6.1 Registrars.

4.6.2 Proxy Servers.

4.6.3 Redirect Servers.

4.7 SIP Extensions.

4.7.1 Capability Discovery.

4.8 Security.

4.8.1 Transport and Network Layer Security.

4.8.2 Authentication.

4.8.3 S/MIME.

4.9 Summary.

5. Session Initiation in MRCP.

5.1 Introduction.

5.2 Initiating the Media Session.

5.3 Initiating the Control Session.

5.4 Session Initiation Examples.

5.4.1 Single Media Resource.

5.4.2 Adding and Removing Media Resources.

5.4.3 Distributed Media Source/Sink.

5.5 Locating Media Resource Servers.

5.5.1 Requesting Server Capabilities.

5.5.2 Media Resource Brokers.

5.6 Security.

5.7 Summary.

6. The Media Session.

6.1 Media Encoding.

6.1.1 Pulse Code Modulation (PCM).

6.1.2 Linear Predictive Coding (LPC).

6.2 Media Transport.

6.2.1 Real-Time Protocol (RTP).

6.2.2 DTMF.

6.3 Security.

6.4 Summary.

7. The Control Session.

7.1 Message Structure.

7.1.1 Request Message.

7.1.2 Response Message.

7.1.3 Event Message.

7.1.4 Message Bodies.

7.2 Generic Methods.

7.3 Generic Headers.

7.4 Security.

7.5 Summary.

PART III. DATA REPRESENTATION FORMATS.

8. Speech Synthesis Markup Language (SSML).

8.1 Introduction.

8.2 Document Structure.

8.3 Recorded Audio.

8.4 Pronunciation.

8.4.1 Phonemic/Phonetic Content.

8.4.2 Substitution.

8.4.3 Interpreting Text.

8.5 Prosody.

8.5.1 Prosodic Boundaries.

8.5.2 Emphasis.

8.5.3 Speaking Voice.

8.5.4 Prosodic Control.

8.6 Markers.

8.7 Metadata.

8.8 Summary.

9. Speech Recognition Grammar Specification (SRGS).

9.1 Introduction.

9.2 Document Structure.

9.3 Rules, Tokens, and Sequences.

9.4 Alternatives.

9.5 Rule References.

9.5.1 Special Rules.

9.6 Repeats.

9.7 DTMF Grammars.

9.8 Semantic Interpretation.

9.8.1 Semantic Literals.

9.8.2 Semantic Scripts.

9.9 Summary.

10. Natural Language Semantics Markup Language (NLSML).

10.1 Introduction.

10.2 Document Structure.

10.3 Speech Recognition Results.

10.3.1 Serialising Semantic Interpretation Results.

10.4 Voice Enrollment Results.

10.5 Speaker Verification Results.

10.6 Summary.

11. Pronunciation Lexicon Specification (PLS).

11.1 Introduction.

11.2 Document Structure.

11.3 Lexical Entries.

11.4 Abbreviations and Acronyms.

11.5 Multiple Orthographies.

11.6 Multiple Pronunciations.

11.7 Summary.

PART IV. MEDIA RESOURCES.

12. Speech Synthesiser Resource.

12.1 Overview.

12.2 Methods.

12.2.1 SPEAK.

12.2.2 PAUSE.

12.2.3 RESUME.

12.2.4 STOP.

12.2.5 BARGE-IN-OCCURRED.

12.2.6 CONTROL.

12.2.7 DEFINE-LEXICON.

12.3 Events.

12.3.1 SPEECH-MARKER.

12.3.2 SPEAK-COMPLETE.

12.4 Headers.

12.5 Summary.

13. Speech Recogniser Resource.

13.1 Overview.

13.2 Recognition Methods.

13.2.1 RECOGNIZE.

13.2.2 DEFINE-GRAMMAR.

13.2.3 START-INPUT-TIMERS.

13.2.4 GET-RESULT.

13.2.5 STOP.

13.2.6 INTERPRET.

13.3 Enrollment Methods.

13.3.1 START-PHRASE-ENROLLMENT.

13.3.2 ENROLLMENT-ROLLBACK.

13.3.3 END-PHRASE-ENROLLMENT.

13.3.4 MODIFY-PHRASE.

13.3.5 DELETE-PHRASE.

13.4 Events.

13.4.1 START-OF-INPUT.

13.4.2 RECOGNITION-COMPLETE.

13.4.3 INTERPRETATION-COMPLETE.

13.5 Recognition Headers.

13.6 Enrollment Headers.

13.7 Summary.

14. Recorder Resource.

14.1 Overview.

14.2 Methods.

14.2.1 RECORD.

14.2.2 START-INPUT-TIMERS.

14.2.3 STOP.

14.3 Events.

14.3.1 START-OF-INPUT.

14.3.2 RECORD-COMPLETE.

14.4 Headers.

14.5 Summary.

15. Speaker Verification Resource.

15.1 Overview.

15.2 Methods.

15.2.1 START-SESSION.

15.2.2 END-SESSION.

15.2.3 VERIFY.

15.2.4 VERIFY-FROM-BUFFER.

15.2.5 VERIFY-ROLLBACK.

15.2.6 START-INPUT-TIMERS.

15.2.7 GET-INTERMEDIATE-RESULT.

15.2.8 STOP.

15.2.9 CLEAR-BUFFER.

15.2.10 QUERY-VOICEPRINT.

15.2.11 DELETE-VOICEPRINT.

15.3 Events.

15.3.1 START-OF-INPUT.

15.3.2 VERIFICATION-COMPLETE.

15.4 Headers.

15.5 Summary.

PART V. PROGRAMMING SPEECH APPLICATIONS.

16. Voice eXtensible Markup Language (VoiceXML).

16.1 Introduction.

16.2 Document Structure.

16.2.1 Applications and Dialogs.

16.3 Dialogs.

16.3.1 Forms.

16.3.2 Menus.

16.3.3 Mixed Initiative Dialogs.

16.4 Media Playback.

16.5 Media Recording.

16.6 Speech and DTMF Recognition.

16.6.1 Specifying Grammars.

16.6.2 Grammar Scope and Activation.

16.6.3 Configuring Recognition Settings.

16.6.4 Processing Recognition Results.

16.7 Flow Control.

16.7.1 Executable Content.

16.7.2 Variables, Scopes, and Expressions.

16.7.3 Document and Dialog Transitions.

16.7.4 Event Handling.

16.8 Resource Fetching.

16.9 Call Transfer.

16.10 Summary.

17. VoiceXML and MRCP Interworking.

17.1 Introduction.

17.2 Interworking Fundamentals.

17.2.1 Play Prompts.

17.2.2 Play and Recognise.

17.2.3 Record.

17.3 Application Example.

17.3.1 VoiceXML Scripts.

17.3.2 MRCP Flows.

17.4 Summary.

Appendix A. MRCP Version 1.

A.1 Overview.

A.2 Session Management and Message Transport.

A.3 General Protocol Details.

A.4 Speech Synthesiser Resource.

A.5 Speech Recogniser Resource.

Appendix B. XML Primer.

B.1 Background.

B.2 Basic Concepts.

B.3 Namespaces.

B.4 Document Schemas.

Appendix C. HTTP Primer.

C.1 Background.

C.2 Basic Concepts.

C.2.1 GET Method.

C.2.2 POST Method.

C.3 Caching.

C.4 Cookies.

C.5 Security.

References.

Index.

Acronyms.
