Modern Algorithms of Cluster Analysis (Studies in Big Data)

Name: Modern Algorithms of Cluster Analysis (Studies in Big Data)
Price: 7429 TWD
Availability: OnlineOnly
Author: Slawomir Wierzchoń, Mieczyslaw Klopotek
ISBN: 3319693077

Publisher: Springer
Publication date: 2018-01-29
List price: $7,820
VIP price: 5% off, $7,429
Language: English
Pages: 421
Binding: Hardcover
ISBN: 3319693077
ISBN-13: 9783319693071
Related categories: Big-data, Algorithms-data-structures

Imported title ordered from overseas (must be checked out separately)

Product description

This book provides the reader with a basic understanding of formal concepts such as cluster, clustering, partition, and cluster analysis.

The book explains feature-based, graph-based and spectral clustering methods and discusses their formal similarities and differences. Understanding these formal concepts is particularly vital in the era of Big Data: given the volume and characteristics of the data, it is no longer feasible to approach a clustering problem mainly by visually inspecting the data.

Clustering usually involves identifying similar objects and grouping them together. To help the reader choose similarity measures for complex and big data, the book describes various measures of object similarity based on quantitative features (such as numerical measurement results), qualitative features (such as text), and combinations of the two. It also describes graph-based similarity measures for (hyper)linked objects and measures for multilayered graphs. Numerous variants are presented that show how such similarity measures can be used when defining clustering cost functions.
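
As an illustration of combining quantitative and qualitative features in one similarity measure, here is a minimal sketch of a Gower-style similarity. It is not taken from the book; the function name `gower_similarity`, the `numeric_ranges` parameter, and the sample records are hypothetical and chosen only for this example.

```python
def gower_similarity(a, b, numeric_ranges):
    """Average per-feature similarity for records mixing numbers and categories.

    numeric_ranges maps the index of each numeric feature to its value range
    (max - min over the data set); every other feature is treated as categorical.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of features")
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            span = numeric_ranges[i]
            # Numeric feature: 1 minus the range-normalised absolute difference.
            total += (1.0 - abs(x - y) / span) if span > 0 else 1.0
        else:
            # Categorical feature: simple match / mismatch.
            total += 1.0 if x == y else 0.0
    return total / len(a)


# Hypothetical records: (height in cm, weight in kg, favourite colour).
ranges = {0: 50.0, 1: 40.0}
alice = (170.0, 60.0, "red")
bob = (180.0, 80.0, "red")
print(gower_similarity(alice, bob, ranges))
```

Each feature contributes a value between 0 and 1, so the result can be fed into any clustering cost function that expects a bounded similarity; weighting the features differently would be a straightforward variation.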

In addition, the book provides an overview of approaches to handling large collections of objects in a reasonable time. In particular, it addresses grid-based methods, sampling methods, parallelization via MapReduce, the use of tree structures, random projections, and various heuristic approaches, especially those used for community detection.
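
To give a feel for one of the techniques listed above, the following is a minimal sketch of a Gaussian random projection, which maps high-dimensional points to fewer dimensions while roughly preserving pairwise distances. It is an illustration under stated assumptions, not the book's own formulation; the helper names `make_projection`, `project`, and `euclidean`, the dimensions, and the random test data are all hypothetical.

```python
import math
import random


def make_projection(in_dim, out_dim, seed=0):
    """Return an out_dim x in_dim Gaussian matrix scaled by 1/sqrt(out_dim)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, vector):
    """Map a vector to the lower-dimensional space (matrix-vector product)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical data: two random 1000-dimensional points reduced to 100 dimensions.
rng = random.Random(1)
x = [rng.random() for _ in range(1000)]
y = [rng.random() for _ in range(1000)]
P = make_projection(in_dim=1000, out_dim=100)

# Print the original distance next to the distance after projection.
print(euclidean(x, y), euclidean(project(P, x), project(P, y)))
```

The two printed distances should be of similar magnitude, which is why a clustering algorithm can often be run on the cheaper projected data instead of the original vectors.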
