This open access book covers all facets of entity-oriented search (where "search" can be interpreted in the broadest sense of information access) from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, with numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book.
The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents), a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches and suggesting directions for future research.
Researchers and graduate students are the primary target audience of this book. A general background in information retrieval, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms, is sufficient to follow the material.