Syntax-Based Statistical Machine Translation
Provisional Chinese title: 基於語法的統計機器翻譯
Philip Williams, Rico Sennrich, Matt Post, Philipp Koehn
- Publisher: Morgan & Claypool
- Publication date: 2016-08-01
- List price: $2,570
- VIP price: 5% off, $2,442
- Language: English
- Pages: 210
- Binding: Paperback
- ISBN: 1627059008
- ISBN-13: 9781627059008
Overseas special-order title (must be checked out separately)
Product description
This unique book provides a comprehensive introduction to the most popular syntax-based statistical machine translation models, filling a gap in the current literature for researchers and developers in human language technologies. While phrase-based models have previously dominated the field, syntax-based approaches have proved a popular alternative, as they elegantly solve many of the shortcomings of phrase-based models. The heart of this book is a detailed introduction to decoding for syntax-based models.
The book begins with an overview of synchronous context-free grammar (SCFG) and synchronous tree-substitution grammar (STSG), along with their associated statistical models. It also describes how three popular instantiations (Hiero, SAMT, and GHKM) are learned from parallel corpora. It introduces and details hypergraphs and associated general algorithms, as well as algorithms for decoding with both tree and string input. Special attention is given to efficiency, including search approximations such as beam search and cube pruning, data structures, and parsing algorithms. The book consistently highlights the strengths (and limitations) of syntax-based approaches, including their ability to generalize phrase-based translation units, their modeling of specific linguistic phenomena, and their role in structuring the search space.