Bitext Alignment (Synthesis Lectures on Human Language Technologies)

Jörg Tiedemann

  • Publisher: Morgan & Claypool
  • Publication date: 2011-06-03
  • List price: $1,780
  • Member price (5% off): $1,691
  • Language: English
  • Pages: 166
  • Binding: Paperback
  • ISBN: 1608455106
  • ISBN-13: 9781608455102
  • Overseas special order (requires separate checkout)

Product Description

This book provides an overview of various techniques for aligning bitexts. It describes general concepts and strategies that can be applied to map corresponding parts of parallel documents at various levels of granularity. Bitexts are valuable linguistic resources for many research fields and practical applications. The most prominent application is machine translation, in particular statistical machine translation. However, the rich linguistic knowledge implicitly stored in parallel resources can support many other lines of work as well: bitexts have been explored in lexicography, word sense disambiguation, terminology extraction, computer-aided language learning and translation studies, to name just a few. The book covers the essential tasks involved in building parallel corpora, from collecting translated documents up to computing sub-sentential alignments. In particular, it describes various approaches to document alignment, sentence alignment, word alignment and tree structure alignment. It also includes a list of resources and a comprehensive review of the literature on alignment techniques.

Table of Contents: Introduction / Basic Concepts and Terminology / Building Parallel Corpora / Sentence Alignment / Word Alignment / Phrase and Tree Alignment / Concluding Remarks
