Hadoop Application Architectures (Paperback)
Provisional Chinese title: Hadoop 應用架構 (平裝本)

Mark Grover, Ted Malaska, Jonathan Seidman, Gwen Shapira

  • Publisher: O'Reilly
  • Publication date: 2015-08-18
  • List price: $2,100
  • VIP price: $1,995 (5% off)
  • Language: English
  • Pages: 400
  • Binding: Paperback
  • ISBN: 1491900083
  • ISBN-13: 9781491900086
  • Related category: Hadoop
  • Overseas special-order title (must be checked out separately)

Product description

Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many sources explain how to use various components in the Hadoop ecosystem, this practical book takes you through architectural considerations necessary to tie those components together into a complete tailored application, based on your particular use case.

To reinforce those lessons, the book’s second section provides detailed examples of architectures used in some of the most commonly found Hadoop applications. Whether you’re designing a new Hadoop application, or planning to integrate Hadoop into your existing data infrastructure, Hadoop Application Architectures will skillfully guide you through the process.

This book covers:

  • Factors to consider when using Hadoop to store and model data
  • Best practices for moving data in and out of the system
  • Data processing frameworks, including MapReduce, Spark, and Hive
  • Common Hadoop processing patterns, such as removing duplicate records and using windowing analytics
  • Giraph, GraphX, and other tools for large graph processing on Hadoop
  • Using workflow orchestration and scheduling tools such as Apache Oozie
  • Near-real-time stream processing with Apache Storm, Apache Spark Streaming, and Apache Flume
  • Architecture examples for clickstream analysis, fraud detection, and data warehousing
