Large-Scale Graph Processing Using Apache Giraph
Sakr, Sherif, Orakzai, Faisal Moeen, Abdelaziz, Ibrahim
- Publisher: Springer
- Publication date: 2018-07-07
- List price: $2,660
- VIP price: 5% off, $2,527
- Language: English
- Pages: 197
- Binding: Quality Paper - also called trade paper
- ISBN: 3319837354
- ISBN-13: 9783319837352
Overseas special-order title (requires separate checkout)
Product Description
This book takes its reader on a journey through Apache Giraph, a popular distributed graph processing platform designed to bring the power of big data processing to graph data. Designed as a step-by-step self-study guide for everyone interested in large-scale graph processing, it describes the fundamental abstractions of the system, its programming models and various techniques for using the system to process graph data at scale, including the implementation of several popular and advanced graph analytics algorithms.
The book is organized as follows: Chapter 1 provides general background on the big data phenomenon and introduces the Apache Giraph system, its abstractions, programming model and design architecture. Chapter 2 focuses on Giraph as a platform and how to use it. Working from a sample job, it also covers more advanced topics such as the Giraph application lifecycle and different methods for monitoring Giraph jobs. Chapter 3 then provides an introduction to Giraph programming, introduces the basic Giraph graph model and explains how to write Giraph programs. Chapter 4 discusses in detail the implementation of some popular graph algorithms, including PageRank, connected components, shortest paths and triangle closing. Chapter 5 focuses on advanced Giraph programming, discussing common Giraph algorithmic optimizations, tunable Giraph configurations that determine the system's utilization of the underlying resources, and how to write custom graph input and output formats. Lastly, Chapter 6 highlights two other systems that have been introduced to tackle the challenge of large-scale graph processing, GraphX and GraphLab, and explains the main commonalities and differences between these systems and Apache Giraph.
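To give a rough feel for the vertex-centric, superstep-based style of computation that Giraph is known for, here is a minimal sketch of single-source shortest paths (one of the algorithms listed for Chapter 4). It is an illustration only: it is a plain Python simulation, not the book's code and not the Giraph API, and the function name `sssp_supersteps`, the `edges` layout and the sample graph are all hypothetical. It also assumes non-negative edge weights.

```python
from collections import defaultdict
import math


def sssp_supersteps(edges, source):
    """Simulate vertex-centric single-source shortest paths.

    edges: dict mapping vertex -> list of (neighbor, weight) pairs.
    Returns a dict mapping each vertex to its distance from source.
    Hypothetical helper for illustration; assumes non-negative weights.
    """
    # Collect every vertex, including those that only appear as targets.
    vertices = set(edges)
    vertices.add(source)
    for outgoing in edges.values():
        vertices.update(target for target, _ in outgoing)

    distance = {v: math.inf for v in vertices}
    # Superstep 0: only the source receives a message (distance 0).
    inbox = {source: [0.0]}

    while inbox:  # the job halts when no messages are in flight
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < distance[vertex]:
                distance[vertex] = best
                # Tell each neighbor the cost of reaching it via this vertex.
                for target, weight in edges.get(vertex, []):
                    outbox[target].append(best + weight)
        # Barrier: messages sent now are only seen in the next superstep.
        inbox = outbox
    return distance


# Hypothetical sample graph: a -> b -> c -> d, plus two longer shortcuts.
graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 2.0), ("d", 6.0)],
    "c": [("d", 1.0)],
}
result = sssp_supersteps(graph, "a")
```

Each vertex only reads its own incoming messages and writes to its neighbors, which is what lets a real system distribute the work across machines; the single loop here stands in for that distributed execution.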
This book serves as an essential reference guide for students, researchers and practitioners in the domain of large-scale graph processing. It offers step-by-step guidance, with several code examples and the complete source code available in the related GitHub repository. Students will find a comprehensive introduction to and hands-on practice with tackling large-scale graph processing problems using the Apache Giraph system, while researchers will discover thorough coverage of the emerging and ongoing advancements in big graph processing systems.
About the Authors
Sherif Sakr is currently a professor of computer and information science in the Health Informatics department at King Saud bin Abdulaziz University for Health Sciences. He is also affiliated with the University of New South Wales and DATA61/CSIRO (formerly NICTA). He has held visiting appointments in several academic and research institutes, including Microsoft Research (2011), Alcatel-Lucent Bell Labs (2012), Humboldt University of Berlin (2015), University of Zurich (2016) and TU Dresden (2016). In 2013, Sherif was awarded the Stanford Innovation and Entrepreneurship Certificate.
Faisal Moeen Orakzai is a joint PhD candidate at Université Libre de Bruxelles (ULB), Belgium, and Aalborg University (AAU), Denmark. In addition to doing research, he works as a consultant and helps companies set up their distributed data processing architectures and pipelines. He is a Big Data management and analytics enthusiast and is currently working on a Giraph-based framework for spatio-temporal pattern mining.
Ibrahim Abdelaziz is a Computer Science PhD candidate at King Abdullah University of Science and Technology (KAUST). Prior to joining KAUST, he worked on pattern recognition and information retrieval in several research organizations in Egypt. His current research interests are data mining over large-scale graphs, distributed systems and machine learning.
Zuhair Khayyat is a PhD candidate in the InfoCloud group at King Abdullah University of Science and Technology (KAUST) focusing on Big Data, Analytics and Graphs.