Pretrained Transformers for Text Ranking: BERT and Beyond
Tentative Chinese title: 預訓練變壓器於文本排名:BERT 與其他技術

Jimmy Lin, Rodrigo Nogueira, Andrew Yates

  • Publisher: Morgan & Claypool
  • Publication date: 2021-10-29
  • List price: $3,850
  • VIP price: $3,658 (5% off)
  • Language: English
  • Pages: 325
  • Binding: Hardcover (also called cloth, retail trade, or trade)
  • ISBN: 163639230X
  • ISBN-13: 9781636392301
  • Imported title, ordered from overseas (checked out separately)

Description

The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing (NLP) applications. This book provides an overview of text ranking with neural network architectures known as transformers, of which BERT (Bidirectional Encoder Representations from Transformers) is the best-known example. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in NLP, information retrieval (IR), and beyond.
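As a concrete, deliberately crude illustration of this formulation (not taken from the book), the sketch below scores every text in a toy corpus against a query and returns the texts in descending order of estimated relevance; the term-overlap scorer is a hypothetical stand-in for BM25 or the transformer-based scorers the book surveys.

```python
def rank(query: str, corpus: list[str]) -> list[tuple[float, str]]:
    """Return (score, text) pairs for the whole corpus, best first."""
    q_terms = set(query.lower().split())

    def score(text: str) -> float:
        # Crude relevance estimate: how many query terms appear in the text.
        return float(len(q_terms & set(text.lower().split())))

    return sorted(((score(t), t) for t in corpus),
                  key=lambda pair: pair[0], reverse=True)


corpus = [
    "BERT is a pretrained transformer encoder.",
    "Transformers have reshaped information retrieval.",
    "Penguins are flightless birds.",
]
print(rank("pretrained transformers for retrieval", corpus))
```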

This book provides a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. It covers a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures and dense retrieval techniques that perform ranking directly. Two themes pervade the book: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this book also attempts to prognosticate where the field is heading.
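To make the two high-level categories concrete, the following sketch (again not from the book) contrasts a cross-encoder reranker, which reads the query and each candidate jointly and is typically applied to a candidate list produced by a cheap first-stage retriever, with a bi-encoder dense retriever, which embeds queries and documents independently so that ranking reduces to a nearest-neighbor search. The sentence-transformers library and the two model names are illustrative assumptions, not artifacts of the book.

```python
# Minimal sketch: reranking vs. dense retrieval (illustrative models/library).
import numpy as np
from sentence_transformers import CrossEncoder, SentenceTransformer

query = "what causes rain"
candidates = [
    "Rain forms when water vapor condenses into droplets heavy enough to fall.",
    "The Amazon rainforest spans several South American countries.",
    "Clouds are classified by altitude and shape.",
]

# (1) Multi-stage reranking: a cross-encoder scores each (query, candidate)
# pair jointly; in practice the candidates come from BM25 or another
# first-stage retriever.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]

# (2) Dense retrieval: a bi-encoder embeds queries and documents separately,
# so document vectors can be indexed offline and ranking is a (possibly
# approximate) nearest-neighbor search; here, a brute-force dot product.
encoder = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-v4")
doc_vecs = encoder.encode(candidates)   # shape: (num_docs, dim)
query_vec = encoder.encode([query])[0]  # shape: (dim,)
dense_ranked = [candidates[i] for i in np.argsort(-(doc_vecs @ query_vec))]
```

The contrast also hints at the effectiveness/efficiency tradeoff the book returns to repeatedly: the cross-encoder pays for a full transformer pass per query-candidate pair at query time, while the bi-encoder shifts most of its cost offline to the document encoding and indexing step.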

About the Authors

Jimmy Lin holds the David R. Cheriton Chair in the David R. Cheriton School of Computer Science at the University of Waterloo. Prior to 2015, he was a faculty member at the University of Maryland, College Park. Lin received his Ph.D. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2009.
 

Rodrigo Nogueira is a post-doctoral researcher at the University of Waterloo, an adjunct professor at the University of Campinas (UNICAMP), and a senior research scientist at NeuralMind, a startup focused on applying deep learning to document and image analysis. Nogueira received his Ph.D. in Computer Science from New York University in 2019.
 

Andrew Yates is an assistant professor in the Informatics Institute at the University of Amsterdam. Prior to 2021, he was a post-doctoral researcher and then senior researcher at the Max Planck Institute for Informatics. Yates received his Ph.D. in Computer Science from Georgetown University in 2016.
