Apache Spark Graph Processing
Tentative Chinese title: Apache Spark 圖形處理

Rindra Ramamonjison

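  • Publisher: Packt Publishing
  • Publication date: 2015-09-10
  • List price: $1,670
  • VIP price: $1,587 (5% off)
  • Language: English
  • Pages: 148
  • Binding: Paperback
  • ISBN: 1784391808
  • ISBN-13: 9781784391805
  • Related category: Spark
  • Overseas special-order title (requires separate checkout)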

Product Description

Build, process and analyze large-scale graph data effectively with Spark

About This Book

  • Find solutions for every stage of data processing from loading and transforming graph data to
  • Improve the scalability of your graph processing through a variety of real-world applications, with complete Scala code.
  • A concise guide to processing large-scale networks with Apache Spark.

Who This Book Is For

This book is for data scientists and big data developers who want to learn how to process and analyze graph datasets at scale. Basic programming experience with Scala and basic knowledge of Spark are assumed.

What You Will Learn

  • Write, build and deploy Spark applications with the Scala Build Tool.
  • Build and analyze large-scale network datasets.
  • Analyze and transform graphs using RDD and graph-specific operations.
  • Implement new custom graph operations tailored to specific needs.
  • Develop iterative and efficient graph algorithms using message aggregation and the Pregel abstraction.
  • Extract subgraphs and use them to discover common clusters.
  • Analyze graph data and solve various data science problems using real-world datasets.
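
Several of the operations listed above (building a graph, message aggregation, subgraph extraction and clustering) can be sketched with the GraphX API. The snippet below is only an illustration and is not taken from the book: the object name `GraphXSketch`, the helper names and the five-person toy graph are made up, and it assumes `spark-core` and `spark-graphx` are on the classpath and that a local master is acceptable.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Hypothetical example object; all names and data here are illustrative assumptions.
object GraphXSketch {

  // Build a tiny property graph: vertex attribute = name, edge attribute = weight.
  def buildGraph(sc: SparkContext): Graph[String, Int] = {
    val vertices = sc.parallelize(Seq[(VertexId, String)](
      (1L, "alice"), (2L, "bob"), (3L, "carol"), (4L, "dave"), (5L, "erin")))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1), Edge(3L, 2L, 1), Edge(1L, 3L, 5), Edge(4L, 5L, 1)))
    Graph(vertices, edges)
  }

  // Message aggregation: every edge sends 1 to its destination vertex and the
  // messages are summed per vertex. Vertices that receive nothing are absent.
  def inDegrees(graph: Graph[String, Int]): Map[VertexId, Int] =
    graph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _).collect().toMap

  // Subgraph extraction followed by clustering: keep only weight-1 edges, then
  // label each vertex with the smallest vertex id in its connected component.
  def clusters(graph: Graph[String, Int]): Map[VertexId, VertexId] =
    graph.subgraph(epred = t => t.attr == 1)
      .connectedComponents()
      .vertices.collect().toMap

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("graphx-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val graph = buildGraph(sc)
      println(inDegrees(graph))
      println(clusters(graph))
    } finally sc.stop()
  }
}
```

Returning plain `Map`s keeps the sketch easy to inspect on toy data; on a graph of realistic size the results would normally stay distributed as RDDs rather than being collected to the driver.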

In Detail

Apache Spark is emerging as the next standard open-source cluster-computing engine for processing big data. Many practical computing problems concern large graphs, such as the Web graph and various social networks. The scale of these graphs (in some cases billions of vertices and trillions of edges) poses challenges to their efficient processing. The Apache Spark GraphX API combines the advantages of both data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark data-parallel framework.

This book teaches you graph programming in Apache Spark and explains the entire process of graph data analysis. You will journey through the creation of graphs, their uses, and their exploration and analysis, and finally the conversion of graph elements into graph structures.

This book begins with an introduction to the Spark system, its libraries and the Scala Build Tool. Using a hands-on approach, it quickly teaches you how to install Spark and use it interactively on the command line and in a standalone Scala program. It then presents all the methods for building Spark graphs using illustrative network datasets. Next, it walks you through the process of exploring, visualizing and analyzing different network characteristics. You will also learn how to transform raw datasets into a usable form, along with powerful operations for transforming graph elements and graph structures. The book further shows how to create custom graph operations that are tailored to specific needs with efficiency in mind. The later chapters cover more advanced topics such as clustering graphs, implementing graph-parallel iterative algorithms and learning methods from graph data.
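
As a rough idea of what a graph-parallel iterative algorithm looks like in GraphX, here is a minimal sketch of single-source shortest paths written with the `pregel` operator. It is not code from the book: the object name `PregelSketch`, the helper names and the four-vertex weighted graph are assumptions made for illustration, and it again assumes `spark-core` and `spark-graphx` are on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Hypothetical example object; names and data are illustrative assumptions.
object PregelSketch {

  // Toy directed graph with edge weights; vertex attributes default to 0.0.
  def buildGraph(sc: SparkContext): Graph[Double, Double] = {
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 4.0), Edge(1L, 3L, 1.0), Edge(3L, 2L, 2.0), Edge(2L, 4L, 1.0)))
    Graph.fromEdges(edges, 0.0)
  }

  // Single-source shortest paths expressed as a Pregel computation.
  def shortestPaths(graph: Graph[Double, Double], source: VertexId): Map[VertexId, Double] = {
    // The source starts at distance 0, every other vertex at infinity.
    val initial = graph.mapVertices((id, _) =>
      if (id == source) 0.0 else Double.PositiveInfinity)

    val result = initial.pregel(Double.PositiveInfinity)(
      // Vertex program: keep the smaller of the current and received distance.
      (id, dist, newDist) => math.min(dist, newDist),
      // Send a message along an edge only if it offers a shorter path.
      triplet =>
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        else Iterator.empty,
      // Merge messages bound for the same vertex by taking the minimum.
      (a, b) => math.min(a, b))

    result.vertices.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("pregel-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try println(shortestPaths(buildGraph(sc), 1L))
    finally sc.stop()
  }
}
```

The computation stops on its own once no edge can offer a shorter path, because vertices that receive no message become inactive and `sendMsg` emits nothing further.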

Style and approach

A step-by-step guide that will walk you through the key ideas and techniques for processing big graph data at scale, with practical examples that will ensure an overall understanding of the concepts of Spark.

