Building and Using Comparable Corpora for Multilingual Natural Language Processing

Price: 1786 TWD
Availability: OnlineOnly
Author: Sharoff, Serge; Rapp, Reinhard; Zweigenbaum, Pierre
ISBN: 3031313860

Publisher: Springer
Publication date: 2024-08-24
List price: $1,880
VIP price: 5% off, $1,786
Language: English
Pages: 133
Binding: Quality Paper (also called trade paper)
ISBN: 3031313860
ISBN-13: 9783031313868

Overseas special-order book (requires separate checkout)

Product Description

This book provides a comprehensive overview of methods for building comparable corpora and of their applications, including machine translation, cross-lingual transfer, and various kinds of multilingual natural language processing. The authors begin with a brief history of the topic, followed by a comparison with parallel resources and an explanation of why comparable corpora have become more widely used. In particular, comparable corpora provide the basis for the multilingual capabilities of pre-trained models such as BERT or GPT. The book then focuses on building comparable corpora, aligning their sentences to create a database of suitable translations, and using these sentence translations to produce dictionaries and term banks. Finally, it explains how comparable corpora can be used to build machine translation engines and to develop a wide variety of multilingual applications.

About the Authors

Serge Sharoff, Ph.D., is Professor of Language Technology and Digital Humanities at the Centre for Translation Studies, University of Leeds. His research focuses on Natural Language Processing, including automated methods for collecting very large corpora from the Web, their analysis in terms of domains, genres or text quality, as well as extraction of lexicons and terminology from corpora. The application domains for this kind of research in the Digital Humanities include text annotation, information retrieval, machine translation and computer-assisted language learning. His research stresses the inherent multilingualism of NLP, which implies that tools and resources can be ported across languages by paying attention to the respective linguistic properties.

Pierre Zweigenbaum, Ph.D., FACMI, FIAHSI, is a Senior Researcher at the Interdisciplinary Laboratory for Digital Sciences (LISN, Orsay, France), a laboratory of the French National Center for Scientific Research (CNRS) and Université Paris-Saclay, where he has led the ILES Natural Language Processing group. Before joining CNRS, he was a researcher at Paris Public Hospitals in an Inserm team. He was also a part-time professor at the National Institute for Oriental Languages and Civilizations. His research focus is Natural Language Processing, with medicine as a main application domain. He has also designed methods to acquire linguistic knowledge automatically from corpora and thesauri, to help extend monolingual and bilingual lexicons and terminologies, using parallel and comparable corpora.

Reinhard Rapp, Ph.D., is Professor of Applied Translation Studies at Magdeburg-Stendal University of Applied Sciences and is also affiliated with the University of Mainz. He has conducted EU-funded research projects at the University of Geneva, the University of Tarragona, the University of Leeds, Aix-Marseille University, the University of Mainz, and the Athena Research Center in Athens. His main research interests are in computational linguistics, translation studies, and cognitive science. His publications have dealt with unsupervised language learning from text corpora, word sense disambiguation, text mining, thesaurus construction, bilingual dictionary induction from parallel and comparable corpora, and statistical and neural machine translation.
