Data Mining Algorithms in C++: Data Patterns and Algorithms for Modern Applications

Timothy Masters


Product Description

Discover hidden relationships among the variables in your data, and learn how to exploit these relationships.  This book presents a collection of data-mining algorithms that are effective in a wide variety of prediction and classification applications.  All algorithms include an intuitive explanation of operation, essential equations, references to more rigorous theory, and commented C++ source code.
 
Many of these techniques are recent developments, still not in widespread use.  Others are standard algorithms given a fresh look.  In every case, the focus is on practical applicability, with all code written so that it can easily be included in any program.  The Windows-based DATAMINE program lets you experiment with the techniques before incorporating them into your own work.
 
What you'll learn
  • Monte-Carlo permutation tests provide statistically sound assessment of relationships present in your data.
  • Combinatorially symmetric cross validation reveals whether your model has true power or has just learned noise by overfitting the data.
  • Feature weighting as regularized energy-based learning ranks variables according to their predictive power when there is too little data for traditional methods.
  • The eigenstructure of a dataset enables clustering of variables into groups that exist only within meaningful subspaces of the data.
  • Plotting regions of the variable space where there is disagreement between marginal and actual densities, or where contribution to mutual information is high, provides visual insight into anomalous relationships.
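The first technique in the list above can be made concrete in a few lines. What follows is a minimal C++ sketch of a Monte-Carlo permutation test, not the book's or the DATAMINE program's code. It assumes Pearson correlation as the test statistic, and the function names `pearson` and `permutation_p_value` are hypothetical. The idea: shuffling one variable destroys any real pairing with the other, so the fraction of shuffled datasets whose statistic is at least as extreme as the observed one estimates the p-value under the null hypothesis of no relationship.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Pearson correlation between two equal-length samples (n >= 2).
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const std::size_t n = x.size();
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - mx, dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0) return 0.0;  // a constant variable has no correlation
    return sxy / std::sqrt(sxx * syy);
}

// Hypothetical helper (an assumption, not the book's API): Monte-Carlo
// permutation test of |correlation| between x and y. Returns an estimated
// p-value for the null hypothesis that x and y are unrelated.
double permutation_p_value(const std::vector<double>& x,
                           std::vector<double> y,  // taken by value; shuffled in place
                           int n_reps, unsigned seed) {
    const double observed = std::fabs(pearson(x, y));
    std::mt19937 rng(seed);
    int count = 1;  // the unpermuted data counts as one of the outcomes
    for (int rep = 0; rep < n_reps; ++rep) {
        std::shuffle(y.begin(), y.end(), rng);  // break any x-y pairing
        if (std::fabs(pearson(x, y)) >= observed) ++count;
    }
    return static_cast<double>(count) / (n_reps + 1);
}
```

For example, `permutation_p_value(x, y, 999, 1)` can never return less than 1/1000; a result near that floor suggests the observed relationship is unlikely to be a chance artifact. Counting the original data as one of the permutations is a common convention that keeps the estimate from ever being exactly zero.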
 
Who this book is for
 
The techniques presented in this book and in the DATAMINE program will be useful to anyone interested in discovering and exploiting relationships among variables.  Although all code examples are written in C++, the algorithms are described in sufficient detail that they can easily be programmed in any language.
