Building the Unstructured Data Warehouse (Paperback)
W.H. Inmon, Krish Krishnan
- Publisher: Technics Publications
- Publication date: 2011-01-15
- List price: $1,485
- Member price: 5% off, $1,411
- Language: English
- Pages: 216
- Binding: Paperback
- ISBN: 1935504045
- ISBN-13: 9781935504047
Related categories:
Big Data, Databases, Data Science
Product description
Answers for many valuable business questions hide in text. How well can your existing reporting environment extract the necessary text from email, spreadsheets, and documents, and put it in a useful format for analytics and reporting? Transforming the traditional data warehouse into an efficient unstructured data warehouse requires additional skills from the analyst, architect, designer, and developer. This book will prepare you to successfully implement an unstructured data warehouse and, through clear explanations, examples, and case studies, you will learn new techniques and tips to successfully obtain and analyze text.
Master these ten objectives:
- Build an unstructured data warehouse using the 11-step approach
- Integrate text and describe it in terms of homogeneity, relevance, medium, volume, and structure
- Overcome challenges including blather, the Tower of Babel, and lack of natural relationships
- Avoid the Data Junkyard and combat the Spider's Web
- Reuse techniques perfected in the traditional data warehouse and Data Warehouse 2.0, including iterative development
- Apply essential techniques for textual Extract, Transform, and Load (ETL) such as phrase recognition, stop word filtering, and synonym replacement
- Design the Document Inventory system and link unstructured text to structured data
- Leverage indexes for efficient text analysis and taxonomies for useful external categorization
- Manage large volumes of data using advanced techniques such as backward pointers
- Evaluate technology choices suitable for unstructured data processing, such as data warehouse appliances
Chapter summaries:
- Chapter 1 defines unstructured data and explains why text is the main focus of this book.
- Chapter 2 addresses the challenges one faces when managing unstructured data.
- Chapter 3 discusses the DW 2.0 architecture, which leads into the role of the unstructured data warehouse. The unstructured data warehouse is defined and benefits are given. There are several features of the conventional data warehouse that can be leveraged for the unstructured data warehouse, including ETL processing, textual integration, and iterative development.
- Chapter 4 focuses on the heart of the unstructured data warehouse: Textual Extract, Transform, and Load (ETL).
- Chapter 5 describes the 11 steps required to develop the unstructured data warehouse.
- Chapter 6 describes how to inventory documents for maximum analysis value, as well as link the unstructured text to structured data for even greater value.
- Chapter 7 goes through each of the different types of indexes necessary to make text analysis efficient. Indexes range from simple indexes, which are fast to create and work well when the analyst knows what needs to be analyzed before the indexing process begins, to complex combined indexes, which can be made up of any or all of the other kinds of indexes.
- Chapter 8 explains taxonomies and how they can be used within the unstructured data warehouse.
- Chapter 9 explains ways of coping with large amounts of unstructured data. Techniques such as keeping the unstructured data at its source and using backward pointers are discussed. The chapter explains why iterative development is so important.
- Chapter 10 focuses on challenges and some technology choices that are suitable for unstructured data processing. In addition, the data warehouse appliance is discussed.
- Chapters 11, 12, and 13 put all of the previously discussed techniques and approaches in context through three case studies.
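The textual ETL techniques named in the objectives (phrase recognition, stop word filtering, and synonym replacement) can be illustrated with a minimal Python sketch. This is not the book's implementation; the function name `textual_etl` and the small `STOP_WORDS`, `SYNONYMS`, and `PHRASES` tables are hypothetical, chosen only to show the idea.

```python
import re

# Hypothetical vocabularies, for illustration only (not taken from the book).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "was", "for"}
SYNONYMS = {"client": "customer", "buyer": "customer"}
PHRASES = {("data", "warehouse"): "data_warehouse"}


def textual_etl(text):
    """Return normalized terms: phrases merged, stop words dropped, synonyms unified."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    terms = []
    i = 0
    while i < len(tokens):
        # Phrase recognition: treat a known two-word phrase as a single term.
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            terms.append(PHRASES[pair])
            i += 2
            continue
        token = tokens[i]
        i += 1
        # Stop word filtering: skip words that carry little analytic value.
        if token in STOP_WORDS:
            continue
        # Synonym replacement: map variants onto one canonical term.
        terms.append(SYNONYMS.get(token, token))
    return terms


print(textual_etl("The client asked for a data warehouse."))
```

Real textual ETL would rely on much larger, domain-specific vocabularies; the sketch only shows how the three steps can be combined in a single pass over the tokens.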
Learn essential techniques from data warehouse legend Bill Inmon on how to build the reporting environment your business needs now!