Discrete-Time Speech Signal Processing: Principles and Practice (IE-Hardcover)

Thomas F. Quatieri

Product Description

Essential principles, practical examples, current applications, and leading-edge research.

In this book, Thomas F. Quatieri presents the field's most intensive, up-to-date tutorial and reference on discrete-time speech signal processing. Building on his MIT graduate course, he introduces key principles, essential applications, and state-of-the-art research, and he identifies limitations that point the way to new research opportunities.

Quatieri provides an excellent balance of theory and application, beginning with a complete framework for understanding discrete-time speech signal processing. Along the way, he presents important advances never before covered in a speech signal processing textbook, including sinusoidal speech processing, advanced time-frequency analysis, and nonlinear aeroacoustic speech production modeling. Coverage includes:

  • Speech production and speech perception: a dual view
  • Crucial distinctions between stochastic and deterministic problems
  • Pole-zero speech models
  • Homomorphic signal processing
  • Short-time Fourier transform analysis/synthesis
  • Filter-bank and wavelet analysis/synthesis
  • Nonlinear measurement and modeling techniques

The book's in-depth applications coverage includes speech coding, enhancement, and modification; speaker recognition; noise reduction; signal restoration; dynamic range compression; and more. Discrete-Time Speech Signal Processing also contains an exceptionally complete series of examples and MATLAB exercises, all carefully integrated into the book's coverage of theory and applications.

Table of Contents

1. Introduction.

Discrete-Time Speech Signal Processing. The Speech Communication Pathway. Analysis/Synthesis Based on Speech Production and Perception. Applications. Outline of Book.


2. A Discrete-Time Signal Processing Framework.

Discrete-Time Signals. Discrete-Time Systems. Discrete-Time Fourier Transform. Uncertainty Principle. z-Transform. LTI Systems in the Frequency Domain. Properties of LTI Systems. Time-Varying Systems. Discrete Fourier Transform. Conversion of Continuous Signals and Systems to Discrete Time.


3. Production and Classification of Speech Sounds.

Anatomy and Physiology of Speech Production. Spectrographic Analysis of Speech. Categorization of Speech Sounds. Prosody: The Melody of Speech. Speech Perception.


4. Acoustics of Speech Production.

Physics of Sound. Uniform Tube Model. A Discrete-Time Model Based on Tube Concatenation. Vocal Fold/Vocal Tract Interaction.


5. Analysis and Synthesis of Pole-Zero Speech Models.

Time-Dependent Processing. All-Pole Modeling of Deterministic Signals. Linear Prediction Analysis of Stochastic Speech Sounds. Criterion of “Goodness”. Synthesis Based on All-Pole Modeling. Pole-Zero Estimation. Decomposition of the Glottal Flow Derivative.


 

Appendix 5.A: Properties of Stochastic Processes.

Random Processes. Ensemble Averages. Stationary Random Process. Time Averages. Power Density Spectrum.


 

Appendix 5.B: Derivation of the Lattice Filter in Linear Prediction Analysis.

6. Homomorphic Signal Processing.

Concept. Homomorphic Systems for Convolution. Complex Cepstrum of Speech-Like Sequences. Spectral Root Homomorphic Filtering. Short-Time Homomorphic Analysis of Periodic Sequences. Short-Time Speech Analysis. Analysis/Synthesis Structures. Contrasting Linear Prediction and Homomorphic Filtering.


7. Short-Time Fourier Transform Analysis and Synthesis.

Short-Time Analysis. Short-Time Synthesis. Short-Time Fourier Transform Magnitude. Signal Estimation from the Modified STFT or STFTM. Time-Scale Modification and Enhancement of Speech.


 

Appendix 7.A: FBS Method with Multiplicative Modification.

8. Filter-Bank Analysis/Synthesis.

Revisiting the FBS Method. Phase Vocoder. Phase Coherence in the Phase Vocoder. Constant-Q Analysis/Synthesis. Auditory Modeling.


9. Sinusoidal Analysis/Synthesis.

Sinusoidal Speech Model. Estimation of Sinewave Parameters. Synthesis. Source/Filter Phase Model. Additive Deterministic-Stochastic Model.


 

Appendix 9.A: Derivation of the Sinewave Model.

Appendix 9.B: Derivation of Optimal Cubic Phase Parameters.

10. Frequency-Domain Pitch Estimation.

A Correlation-Based Pitch Estimator. Pitch Estimation Based on a “Comb Filter”. Pitch Estimation Based on a Harmonic Sinewave Model. Glottal Pulse Onset Estimation. Multi-Band Pitch and Voicing Estimation.


11. Nonlinear Measurement and Modeling Techniques.

The STFT and Wavelet Transform Revisited. Bilinear Time-Frequency Distributions. Aeroacoustic Flow in the Vocal Tract. Instantaneous Teager Energy Operator.


12. Speech Coding.

Statistical Models of Speech. Scalar Quantization. Vector Quantization (VQ). Frequency-Domain Coding. Model-Based Coding. LPC Residual Coding.


13. Speech Enhancement.

Introduction. Preliminaries. Wiener Filtering. Model-Based Processing. Enhancement Based on Auditory Masking.


 

Appendix 13.A: Stochastic-Theoretic Parameter Estimation.

14. Speaker Recognition.

Introduction. Spectral Features for Speaker Recognition. Speaker Recognition Algorithms. Non-Spectral Features in Speaker Recognition. Signal Enhancement for the Mismatched Condition. Speaker Recognition from Coded Speech.


 

Appendix 14.A: Expectation-Maximization (EM) Estimation.

Glossary.
Speech Signal Processing.
Units.
Databases.
Index.
About the Author.
