Data Mining with Microsoft SQL Server 2000 Technical Reference
Provisional Chinese title: 使用 Microsoft SQL Server 2000 的資料探勘技術參考
Claude Seidman
- Publisher: Microsoft
- Publication date: 2001-06-09
- List price: $2,050
- Member price: 5% off, $1,948
- Language: English
- Pages: 400
- Binding: Hardcover
- ISBN: 0735612714
- ISBN-13: 9780735612716
- Related categories: MSSQL, SQL, Data-mining
- Status: outdated edition
Description:
Learn how to turn mountains of raw data into useful information with this guide.
The amount of information stored in corporate databases is exploding exponentially. Data mining—finding meaningful patterns in all that data—can give any organization a competitive advantage. This book is the in-depth reference from Microsoft® for anyone who wants to take full advantage of the powerful data-mining features in SQL Server™ 2000. It examines the SQL Server 2000 Analysis Services architecture and shows how data mining fits into its complete suite of information-extraction technologies. Then it demonstrates how to structure and mine large databases with the algorithms included with SQL Server 2000 to find nuggets of useful information. It even shows how to create a practice data-mining model using data downloaded from a database. Coverage includes:
• INTRODUCTION TO DATA MINING: What data mining is and isn’t, plus important principles and definitions behind data-mining methodologies, including the role of data-mining models, statistics, and algorithms
• SQL SERVER 2000 ARCHITECTURE: How data mining fits into the SQL Server 2000 Analysis Services architecture and how it builds on the SQL Server 2000 relational database and its embedded online analytical processing (OLAP) engine
• DATA-MINING METHODS: How to choose the best data-mining method for the job—decision trees or clustering
• EASE OF USE FEATURES: How to use the Mining Model Wizard and the OLAP Mining Model Editor to simplify creating, training, and processing a model
• PROGRAMMING THE DATA-MINING SERVICES: How to use data-mining models and Data Transformation Services, PivotTable® Service, Decision Support Objects (DSO), Perl, Visual Basic® Scripting Edition, XML, and other tools and languages to work with the data-mining engine
Table of Contents:
Acknowledgments
Introduction
PART I INTRODUCING DATA MINING
1 Understanding Data Mining
What Is Data Mining?
Why Use Data Mining?
How Data Mining Is Currently Used
Defining the Terms
Data Mining Methodology
Analyzing the Problem
Extracting and Cleansing the Data
Validating the Data
Creating and Training the Model
Querying the Data Mining Model Data
Maintaining the Validity of the Data-Mining Model
Overview of Microsoft Data Mining
Data Mining vs. OLAP
Data-Mining Models
Data-Mining Algorithms
Using SQL Server Syntax to Data Mine
Summary
2 Microsoft SQL Server Analysis Services Architecture
Introduction to OLAP
MOLAP
ROLAP
HOLAP
Server Architecture
Data Mining Services Within Analysis Services
Client Architecture
PivotTable Service
OLE DB
Decision Support Objects (DSO)
Multidimensional Expressions (MDX)
Prediction Joins
Summary
3 Data Storage Models
Why Data Mining Needs a Data Warehouse
Maintaining Data Integrity
Reporting Against OLTP Data Can Be Hazardous to Your Performance
Data Warehousing Architecture for Data Mining
Creating the Warehouse from OLTP Data
Optimizing Data for Mining
Physical Data Mining Structure
Three-Tier Architecture
Relational Data Warehouse
Advantages of Relational Data Storage
Building Supporting Tables for Data Mining
OLAP Cubes
How Data Mining Uses OLAP Structures
Advantages of OLAP Storage
When OLAP Is Not Appropriate for Data Mining
Summary
4 Approaches to Data Mining
Directed Data Mining
Undirected Data Mining
Data Mining vs. Statistics
Learning from Historical Data
Predicting the Future
Training Data-Mining Models
Evaluating the Models and Avoiding Errors
Summary
PART II DATA-MINING METHODS
5 Microsoft Decision Trees
Creating the Model
Analysis Manager
Visualizing the Model
Dependency Network Browser
Inside the Decision Tree Algorithm
How Predictions Are Derived
Navigating the Tree
Navigation vs. Rules
When to Use Decision Trees
Summary
6 Creating Decision Trees with OLAP
Creating the Model
Select Source Type
Select Source Cube and Data-Mining Technique
Select Case
Select Predicted Entity
Select Training Data
Select Dimension and Virtual Cube
Completing the Data-Mining Model
OLAP Mining Model Editor
Content Detail Pane
Structure Panel
Prediction Tree List
Analyzing Data with the OLAP Data-Mining Model
Using the Generated Virtual Cube
Using the Generated Dimension
Summary
7 Microsoft Clustering
The Search for Order
Looking for Ways to Understand Data
Clustering as an Undirected Data-Mining Technique
How Clustering Works
Overview of the Algorithm
The K-Means Method Clustering Algorithm
What Is Being Measured Exactly?
Clustering Factors
Measuring "Closeness"
When to Use Clustering
Visualize Relationships
Highlight Anomalies
Create Samples for Other Data-Mining Efforts
Weaknesses of Clustering
Creating a Data-Mining Model Using Clustering
Select Source Type
Select the Table or Tables for Your Mining Model
Select the Data-Mining Technique
Edit Joins
Select the Case Key Column for Your Mining Model
Select the Input and Predictable Columns
Viewing the Model
Organization of the Cluster Nodes
Order of the Cluster Nodes
Analyzing the Data
Summary
PART III CREATING DATA-MINING APPLICATIONS WITH CODE
8 Using Microsoft Data Transformation Services (DTS)
What Is DTS?
DTS Tasks
Transform
Bulk Insert
Data Driven Query
Execute Package
Connections
Sources
Configuring a Connection
DTS Package Workflow
DTS Package Steps
Precedence Constraints
DTS Designer
Opening the DTS Designer
Saving a DTS Package
dtsrun Utility
Using DTS to Create a Data-Mining Model
Preparing the SQL Server Environment
Creating the Package
Summary
9 Using Decision Support Objects (DSO)
Scripting vs. Visual Basic
The Server Object
The Database Object
Creating the Relational Data-Mining Model Using DSO
Creating the OLAP Data-Mining Model Using DSO
The DataSource Object
Data-Mining Model (Decision Support Objects)
Adding a New Data Source
Analysis Server Roles
Data-Mining Model Roles
Summary
10 Understanding Data-Mining Structures
The Structure of the Data-Mining Model Case
Data-Mining Models Look Like Tables
Using Code to Browse Data-Mining Models
Using the Schema Rowsets
MINING_MODELS Schema Rowset
MINING_COLUMNS Schema Rowset
MINING_MODEL_CONTENT Schema Rowset
MINING_SERVICES Schema Rowset
SERVICE_PARAMETERS Schema Rowset
MODEL_CONTENT_PMML Schema Rowset
Summary
11 Data Mining Using PivotTable Service
Redistributing Components
Installing and Registering Components
File Locations
Installation Registry Settings
Redistribution Setup Programs
Connecting to the PivotTable Service
Connect to Analysis Services Using PivotTable Service
Connect to Analysis Services Using HTTP
Building a Local Data-Mining Model
Storage of Local Mining Models
SELECT INTO Statement
INSERT INTO Statement
OPENROWSET Syntax
Nested Tables and the SHAPE Statement
Using XML in Data Mining
The PMML Standard
Summary
12 Data-Mining Queries
Components of a Prediction Query
The Basic Prediction Query
Specifying the Test Case Source
Specifying Columns
The PREDICTION JOIN Clause
Using Functions as Columns
Using Tabular Values as Columns
The WHERE Clause
Prediction Functions
Predict
PredictProbability
PredictSupport
PredictVariance
PredictStdev
PredictProbabilityVariance
PredictProbabilityStdev
PredictHistogram
TopCount
TopSum
TopPercent
RangeMin
RangeMid
RangeMax
PredictScore
PredictNodeId
Prediction Queries with Clustering Models
Cluster
ClusterProbability
ClusterDistance
Using DTS to Run Prediction Queries
Summary
APPENDIX
GLOSSARY
INDEX