Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection (Data-Centric Systems and Applications)

Peter Christen

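  • Publisher: Springer
  • Publication date: 2012-07-05
  • List price: $6,250
  • VIP price: $5,938 (5% off)
  • Language: English
  • Pages: 272
  • Binding: Hardcover
  • ISBN: 3642311636
  • ISBN-13: 9783642311635
  • Overseas special-order title (must be checked out separately)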

Product Description

Data matching (also known as record or data linkage, entity resolution, object identification, or field matching) is the task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database. Based on research in various domains including applied statistics, health informatics, data mining, machine learning, artificial intelligence, database management, and digital libraries, significant advances have been achieved over the last decade in all aspects of the data matching process, especially on how to improve the accuracy of data matching, and its scalability to large databases.

Peter Christen’s book is divided into three parts. Part I, “Overview”, introduces the subject by presenting several sample applications and their special challenges, as well as a general overview of a generic data matching process. Part II, “Steps of the Data Matching Process”, then details the main steps of that process: pre-processing, indexing, field and record comparison, classification, and quality evaluation. Part III, “Further Topics”, deals with specific aspects such as privacy, real-time matching, and matching unstructured data, and closes with a brief description of the main features of many research and open source systems available today.
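
The steps named in Part II can be pictured with a small sketch. The following Python example is only an illustration and is not taken from the book: the toy records, the blocking key (first letter of the name plus the city), the use of the standard library's `difflib.SequenceMatcher` as the similarity measure, and the 0.9 threshold are all assumptions chosen to keep the example short.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records (id, name, city); not data from the book.
RECORDS = [
    (1, "Peter  Christen", "Canberra"),
    (2, "peter christen", "Canberra"),
    (3, "Petra Christensen", "Sydney"),
    (4, "Paul Miller", "Canberra"),
]


def preprocess(record):
    """Pre-processing: lowercase each field and collapse repeated whitespace."""
    rec_id, name, city = record
    return rec_id, " ".join(name.lower().split()), " ".join(city.lower().split())


def block_key(record):
    """Indexing: an assumed blocking key, first letter of the name plus the city."""
    _, name, city = record
    return name[:1] + "|" + city


def candidate_pairs(records):
    """Only records that share a blocking key are compared with each other."""
    blocks = {}
    for rec in records:
        blocks.setdefault(block_key(rec), []).append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


def compare(a, b):
    """Field comparison: similarity of the name fields, between 0.0 and 1.0."""
    return SequenceMatcher(None, a[1], b[1]).ratio()


def classify(score, threshold=0.9):
    """Classification: a simple threshold rule (the threshold is an assumption)."""
    return score >= threshold


def match(records, threshold=0.9):
    """Run the steps in order and return the set of matched id pairs."""
    cleaned = [preprocess(r) for r in records]
    return {
        (a[0], b[0])
        for a, b in candidate_pairs(cleaned)
        if classify(compare(a, b), threshold)
    }


def evaluate(predicted, truth):
    """Quality evaluation: precision and recall against known true matches."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    pairs = match(RECORDS)
    print(pairs)
    print(evaluate(pairs, {(1, 2)}))
```

Blocking keeps record 3 out of every comparison because it falls into a different block, which is the point of the indexing step: it trades a risk of missed matches for far fewer comparisons. Real systems typically compare several fields and use more robust classifiers than a single threshold.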

By providing the reader with a broad range of data matching concepts and techniques and touching on all aspects of the data matching process, this book helps researchers as well as students specializing in data quality or data matching to familiarize themselves with recent research advances and to identify open research challenges in the area of data matching. To this end, each chapter of the book includes a final section that provides pointers to further background and research material. Practitioners will better understand the current state of the art in data matching as well as the internal workings and limitations of current systems. In particular, they will learn that it is often not feasible to simply implement an existing off-the-shelf data matching system without substantial adaptation and customization. Such practical considerations are discussed for each of the major steps in the data matching process.
