Data Preparation for Data Mining Using SAS (Paperback)
Mamdouh Refaat
- Publisher: Morgan Kaufmann
- Publication date: 2006-10-13
- List price: $1,800
- Member price: 5% off, $1,710
- Language: English
- Pages: 424
- Binding: Paperback
- ISBN: 0123735777
- ISBN-13: 9780123735775
Related categories:
Data-mining
Customers who bought this item also bought:
- C How to Program, 4/e: $1,007
Product Description
Are you a data mining analyst who spends up to 80% of your time assuring data quality and then preparing that data for developing and deploying predictive models? Do you find plenty of literature on data mining theory and concepts, but little practical "how-to" advice when it comes to developing good mining views? And are you, like most analysts, preparing your data in SAS? This book is intended to fill that gap as your source of practical recipes. It introduces a framework for the data preparation process in data mining and presents a detailed SAS implementation of each step. In addition, business applications of data mining modeling require you to deal with a large number of variables, typically hundreds if not thousands. Therefore, the book devotes several chapters to methods of data transformation and variable selection.
Table of Contents
1. Introduction
1.1 The Data Mining Process
1.2 Methodologies of Data Mining
1.3 The Mining View
1.4 Scoring View
1.5 Notes on Data Mining Software
2. Tasks and Data Flow
2.1 Data Mining Tasks
2.2 Data Mining Competencies
2.3 The Data Flow
2.4 Types of Variables
2.5 The Mining View and the Scoring View
2.6 Steps of Data Preparation
3. Review of Data Mining Modeling Techniques
3.1 Introduction
3.2 Regression Models
3.3 Decision Trees
3.4 Neural Networks
3.5 Cluster Analysis
3.6 Association Rules
3.7 Time Series Analysis
3.8 Support Vector Machines
4. SAS Macros: A Quick Start
4.1 Introduction: Why Macros
4.2 The Basics: The Macro and Its Variables
4.3 Doing Calculations
4.4 Programming Logic
4.5 Working with Strings
4.6 Macros that Call Other Macros
4.7 Common Macro Patterns and Caveats
4.8 Where to Go From Here
5. Data Acquisition and Integration
5.1 Introduction
5.2 Sources of Data
5.3 Variable Types
5.4 Data Roll Up
5.5 Roll Up With Sums, Averages and Counts
5.6 Calculation of the Mode
5.7 Data Integration
6. Integrity Checks
6.1 Introduction
6.2 Comparing Datasets
6.3 Dataset Schema Checks
6.3.2 Variable Types
6.4 Nominal Variables
6.5 Continuous Variables
7. Exploratory Data Analysis
7.1 Introduction
7.2 Common EDA Procedures
7.3 Univariate Statistics
7.4 Variable Distribution
7.5 Detection of Outliers
7.5.4 Notes on Outliers
7.6 Testing Normality
7.7 Cross-tabulation
7.8 Investigating Data Structures
8. Sampling and Partitioning
8.1 Introduction
8.2 Contents of Samples
8.3 Random Sampling
8.4 Balanced Sampling
8.5 Minimum Sample Size
9. Data Transformations
9.1 Raw and Analytical Variables
9.2 Scope of Data Transformations
9.3 Creation of New Variables
9.4 Mapping of Nominal Variables
9.5 Normalization of Continuous Variables
9.6 Changing the Variable Distribution
10. Binning and Reduction of Cardinality
10.1 Introduction
10.2 Cardinality Reduction
10.2.1 The Main Questions
10.2.2 Structured Grouping Methods
10.2.3 Splitting a Dataset
10.2.4 The Main Algorithm
10.2.5 Reduction of Cardinality Using Gini Measure
10.2.6 Limitations and Modifications
10.3 Binning of Continuous Variables
11. Treatment of Missing Values
11.1 Introduction
11.2 Simple Replacement
11.3 Imputing Missing Values
11.3.1 Basic Issues in Multiple Imputation
11.3.2 Patterns of Missingness
11.4 Imputation Methods and Strategy
11.5 SAS Macros for Multiple Imputation Nominal Variables
11.6 Predicting Missing Values
12. Predictive Power and Variable Reduction I
12.1 Introduction
12.2 Metrics of Predictive Power
12.3 Methods of Variable Reduction
12.4 Variable Reduction: Before or During Modeling
13. Analysis of Nominal and Ordinal Variables
13.1 Introduction
13.2 Contingency Tables
13.3 Notation and Definitions
13.4 Contingency Tables for Binary Variables
13.5 Contingency Tables for Multi-Category Variables
13.6 Analysis of Ordinal Variables
13.7 Implementation Scenarios
14. Analysis of Continuous Variables
14.1 Introduction
14.2 When is Binning Necessary?
14.3 Measures of Association
14.4 Correlation Coefficients
15. Principal Component Analysis (PCA)
15.1 Introduction
15.2 Mathematical Formulations
15.3 Implementing and Using PCA
15.4 Comments on Using PCA
15.4.1 Number of Principal Components
15.4.2 Success of PCA
15.4.3 Nominal Variables
15.4.4 Dataset Size and Performance
16. Factor Analysis
16.1 Introduction to Factor Analysis
16.2 Relationship between PCA and FA
16.3 Implementation of Factor Analysis
17. Predictive Power and Variable Reduction II
17.1 Introduction
17.2 Data with Binary Dependent Variables
17.3 Nominal IVs
17.3.2 Ordinal IVs
17.4 Variable Reduction Strategies
18. Putting It All Together
18.1 Introduction
18.2 The Process of Data Preparation
18.3 Case Study: The Bookstore
Appendix A. Listing of SAS Macros
A.1 Copyright and Software License
A.2 Dependencies between Macros
A.3 Data Acquisition and Integration
A.4 Integrity Checks
A.5 Exploratory Data Analysis
A.6 Sampling and Partitioning
A.7 Data Transformations
A.8 Binning and Reduction of Cardinality
A.9 Treatment of Missing Values
A.10 Analysis of Nominal and Ordinal Variables
A.11 Analysis of Continuous Variables
A.12 Principal Component Analysis