Product Description
From the early pulse code modulation-based coders to recent multi-rate wideband speech coding standards, the field of speech coding has made significant strides toward its objective of attaining high speech quality at the lowest possible bit rate. This book presents recent advances in linear prediction (LP)-based speech analysis that employ perceptual models for narrowband and wideband speech coding. The LP analysis-synthesis framework has been successful for speech coding because it fits the source-system paradigm for speech synthesis well. The limitations of conventional LP have been studied extensively, and several extensions to LP-based analysis-synthesis have been proposed, e.g., discrete all-pole modeling, perceptual LP, warped LP, LP with modified filter structures, IIR-based pure LP, all-pole modeling using the weighted sum of LSP polynomials, LP for low-frequency emphasis, and cascade-form LP. These extensions can be classified as algorithms that either attempt to improve the LP spectral envelope fit or embed perceptual models in the LP. The first half of the book reviews some of these recent developments in predictive modeling of speech with the help of Matlab™ simulation examples.
The advantages of integrating perceptual models into low bit rate speech coding depend on how accurately these models mimic human performance and, more importantly, on the achievable "coding gains" and the "computational overhead" associated with these physiological models. Even today, the methods that speech coding standards use to exploit the masking properties of the human ear are largely based on concepts introduced by Schroeder and Atal in 1979. For example, a simple approach employed in speech coding standards is to use a perceptual weighting filter to shape the quantization noise according to the masking properties of the human ear.
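As a rough illustration of the perceptual weighting idea mentioned above, the following Python sketch derives LP coefficients with the Levinson-Durbin recursion and then applies a weighting filter of the form W(z) = A(z/γ1) / A(z/γ2), a form commonly used in CELP-type coders. It is not taken from the book, whose own examples are in Matlab™. The function names, the values γ1 = 0.9 and γ2 = 0.6, and the sample autocorrelation values are all assumptions chosen for illustration.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns the coefficients [1, a1, ..., ap] of A(z) = 1 + sum a_k z^-k
    and the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient is scaled by gamma**k."""
    return [c * gamma ** k for k, c in enumerate(a)]


def weighting_filter(x, a, gamma1=0.9, gamma2=0.6):
    """Filter x through W(z) = A(z/gamma1) / A(z/gamma2).

    gamma1 and gamma2 are illustrative values (assumptions), not from the book.
    """
    num = bandwidth_expand(a, gamma1)
    den = bandwidth_expand(a, gamma2)
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y


if __name__ == "__main__":
    # Hypothetical autocorrelation values, used only to exercise the functions.
    a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print(a, err)
    print(weighting_filter([1.0, 0.0, 0.0, 0.0], a))
```

In an analysis-by-synthesis coder, the error between the original and the synthesized speech would be passed through such a filter before it is minimized. This de-emphasizes the error near the formant peaks, where the speech signal itself is more likely to mask the quantization noise.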
The second half of the book reviews recent developments in perceptual modeling of speech (e.g., masking threshold, psychoacoustic models, auditory excitation pattern, and loudness) with the help of Matlab™ simulations. Supplementary material, including the Matlab™ programs and simulation examples presented in this book, can also be accessed at http://www.morganclaypool.com/page/atti.
Table of Contents: Introduction / Predictive Modeling of Speech / Perceptual Modeling of Speech