Individual and Collective Graph Mining: Principles, Algorithms, and Applications (Synthesis Lectures on Data Mining and Knowledge Discovery)
Danai Koutra, Christos Faloutsos
- Publisher: Morgan & Claypool
- Publication date: 2017-10-26
- Price: $2,830
- VIP price: 5% off, $2,689
- Language: English
- Pages: 208
- Binding: Paperback
- ISBN: 1681730391
- ISBN-13: 9781681730394
Related categories:
Algorithms-data-structures, Data-mining
Overseas special-order book (requires separate checkout)
Product description
Graphs naturally represent information ranging from links between web pages, to communication in email networks, to connections between neurons in our brains. These graphs often span billions of nodes and interactions between them. Within this deluge of interconnected data, how can we find the most important structures and summarize them? How can we efficiently visualize them? How can we detect anomalies that indicate critical events, such as an attack on a computer system, disease formation in the human brain, or the fall of a company?
This book presents scalable, principled discovery algorithms that combine globality with locality to make sense of one or more graphs. In addition to fast algorithmic methodologies, we also contribute graph-theoretical ideas and models, and real-world applications in two main areas:
• Individual Graph Mining: We show how to interpretably summarize a single graph by identifying its important graph structures. We complement summarization with inference, which leverages information about a few entities (obtained via summarization or other methods) and the network structure to efficiently and effectively learn information about the unknown entities.
• Collective Graph Mining: We extend the idea of individual-graph summarization to time-evolving graphs, and show how to scalably discover temporal patterns. Apart from summarization, we claim that graph similarity is often the underlying problem in a host of applications where multiple graphs occur (e.g., temporal anomaly detection, discovery of behavioral patterns), and we present principled, scalable algorithms for aligning networks and measuring their similarity.
The methods that we present in this book leverage techniques from diverse areas, such as matrix algebra, graph theory, optimization, information theory, machine learning, finance, and social science, to solve real-world problems. We present applications of our exploration algorithms to massive datasets, including a Web graph of 6.6 billion edges, a Twitter graph of 1.8 billion edges, brain graphs with up to 90 million edges, collaboration and peer-to-peer networks, and browser logs, all spanning millions of users and interactions.