Information Retrieval Models: Foundations and Relationships (Paperback)

Thomas Roelleke

Product Description

Information Retrieval (IR) models are a core component of IR research and IR systems. The past decade brought a consolidation of the family of IR models, which by 2000 consisted of relatively isolated views on TF-IDF (Term-Frequency times Inverse-Document-Frequency) as the weighting scheme in the vector-space model (VSM), the probabilistic relevance framework (PRF), the binary independence retrieval (BIR) model, BM25 (Best-Match Version 25, the main instantiation of the PRF/BIR), and language modelling (LM). Also, the early 2000s saw the arrival of divergence from randomness (DFR).

Regarding intuition and simplicity, though LM is clear from a probabilistic point of view, several people stated: "It is easy to understand TF-IDF and BM25. For LM, however, we understand the math, but we do not fully understand why it works."

This book takes a horizontal approach, gathering the foundations of TF-IDF, PRF, BIR, Poisson, BM25, LM, probabilistic inference networks (PINs), and divergence-based models. The aim is to create a consolidated and balanced view of the main models.

A particular focus of this book is on the "relationships between models." This includes an overview of the main frameworks (PRF, logical IR, VSM, generalized VSM) and a pairing of TF-IDF with other models. It becomes evident that TF-IDF and LM measure the same thing, namely the dependence (overlap) between document and query. The Poisson probability helps to establish probabilistic, non-heuristic roots for TF-IDF, and the Poisson parameter, the average term frequency, is a binding link between several retrieval models and model parameters.

Table of Contents: List of Figures / Preface / Acknowledgments / Introduction / Foundations of IR Models / Relationships Between IR Models / Summary & Research Outlook / Bibliography / Author's Biography / Index
