Guide to High Performance Distributed Computing: Case Studies with Hadoop, Scalding and Spark (Computer Communications and Networks)

K.G. Srinivasa, Anil Kumar Muppalla

Product Description

This timely text/reference describes the development and implementation of large-scale distributed processing systems using open-source tools and technologies. Comprehensive in scope, the book presents state-of-the-art material on building high-performance distributed computing systems, providing practical guidance and best practices as well as describing theoretical software frameworks. Features: describes the fundamentals of building scalable software systems for large-scale data processing in the new paradigm of high-performance distributed computing; presents an overview of the Hadoop ecosystem, followed by step-by-step instruction on its installation, programming and execution; reviews the basics of Spark, including resilient distributed datasets, and examines Hadoop streaming and working with Scalding; provides detailed case studies on approaches to clustering, data classification and regression analysis; explains the process of creating a working recommender system using Scalding and Spark.
