Product Description
SAS Enterprise Miner streamlines the data mining process to create highly accurate predictive and descriptive models based on analysis of vast amounts of data from across an enterprise. Data mining is applicable in a variety of industries and provides methodologies for such diverse business problems as fraud detection, householding, customer retention and attrition, database marketing, market segmentation, risk analysis, affinity analysis, customer satisfaction, bankruptcy prediction, and portfolio analysis. In SAS Enterprise Miner, the data mining process has the following (SEMMA) steps:
• Sample the data by creating one or more data sets. The sample should be large enough to contain significant information, yet small enough to process. This step includes the use of data preparation tools for data import, merge, append, and filter, as well as statistical sampling techniques.
• Explore the data by searching for relationships, trends, and anomalies in order to gain understanding and ideas. This step includes the use of tools for statistical reporting and graphical exploration, variable selection methods, and variable clustering.
• Modify the data by creating, selecting, and transforming the variables to focus the model selection process. This step includes the use of tools for defining transformations, missing value handling, value recoding, and interactive binning.
• Model the data by using the analytical tools to train a statistical or machine learning model to reliably predict a desired outcome. This step includes the use of techniques such as linear and logistic regression, decision trees, neural networks, partial least squares, LARS and LASSO, nearest neighbor, and importing models defined by other users or even outside SAS Enterprise Miner.
• Assess the data by evaluating the usefulness and reliability of the findings from the data mining process. This step includes the use of tools for comparing models and computing new fit statistics, cutoff analysis, decision support, report generation, and score code management.
You might or might not include all of the SEMMA steps in an analysis, and it might be necessary to repeat one or more of the steps several times before you are satisfied with the results. After you have completed the SEMMA steps, you can apply a scoring formula from one or more champion models to new data that might or might not contain the target variable. Scoring new data that is not available at the time of model training is the goal of most data mining problems. Furthermore, advanced visualization tools enable you to quickly and easily examine large amounts of data in multidimensional histograms and to graphically compare modeling results.
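To make the SEMMA sequence and the final scoring step concrete, the flow described above can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not SAS Enterprise Miner code or its API: the data is synthetic, every function name (make_data, sample, explore, modify, model, assess, score) is hypothetical, and a one-split decision stump stands in for the modeling techniques listed above.

```python
import random
import statistics

def make_data(n=200, seed=0):
    """Synthetic rows (an assumption): numeric input x, sometimes missing, and binary target y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(2.0 if y else 0.0, 1.0)
        if rng.random() < 0.1:
            x = None  # simulate a missing value
        rows.append({"x": x, "y": y})
    return rows

def sample(rows, train_fraction=0.7, seed=1):
    """Sample: shuffle and partition into training and validation sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def explore(rows):
    """Explore: basic summary statistics for the input variable."""
    xs = [r["x"] for r in rows if r["x"] is not None]
    return {"n": len(rows), "missing": len(rows) - len(xs),
            "mean": statistics.mean(xs), "stdev": statistics.stdev(xs)}

def modify(rows, fill_value):
    """Modify: impute missing x with a value computed on the training data."""
    return [{**r, "x": fill_value if r["x"] is None else r["x"]} for r in rows]

def model(train):
    """Model: a decision stump; pick the cutoff with the best training accuracy."""
    best_cut, best_acc = None, -1.0
    for cut in sorted({r["x"] for r in train}):
        acc = sum((r["x"] >= cut) == bool(r["y"]) for r in train) / len(train)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut

def score(rows, cutoff):
    """Score: apply the fitted rule; the target variable is not required."""
    return [int(r["x"] >= cutoff) for r in rows]

def assess(rows, cutoff):
    """Assess: accuracy of the scored predictions on held-out data."""
    preds = score(rows, cutoff)
    return sum(p == r["y"] for p, r in zip(preds, rows)) / len(rows)

data = make_data()
train, valid = sample(data)                 # Sample
summary = explore(train)                    # Explore
train = modify(train, summary["mean"])      # Modify
valid = modify(valid, summary["mean"])
cutoff = model(train)                       # Model
validation_accuracy = assess(valid, cutoff) # Assess

# Scoring new data that has no target variable
new_rows = modify([{"x": 3.1}, {"x": None}, {"x": -1.2}], summary["mean"])
new_scores = score(new_rows, cutoff)
print(summary, cutoff, validation_accuracy, new_scores)
```

The imputation value is computed on the training partition only and then reused for the validation and new rows, so scoring applies the same transformation that the model was trained with.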