Big Data for Chimps: A Guide to Massive-Scale Data Processing in Practice (Paperback)

Philip Kromer, Russell Jurney

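  • Publisher: O'Reilly
  • Publication date: 2015-11-17
  • List price: $1,320
  • Sale price: $1,056 (80% of list price)
  • Language: English
  • Pages: 220
  • Binding: Paperback
  • ISBN: 1491923946
  • ISBN-13: 9781491923948
  • Related categories: Big Data
  • Ships immediately (fewer than 4 in stock)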

Product description

Finding patterns in massive event streams can be difficult, but learning how to find them doesn’t have to be. This unique hands-on guide shows you how to solve this and many other problems in large-scale data processing with simple, fun, and elegant tools that leverage Apache Hadoop. You’ll gain a practical, actionable view of big data by working with real data and real problems.

Perfect for beginners, this book’s approach will also appeal to experienced practitioners who want to brush up on their skills. Part I explains how Hadoop and MapReduce work, while Part II covers many analytic patterns you can use to process any data. As you work through several exercises, you’ll also learn how to use Apache Pig to process data.

  • Learn the necessary mechanics of working with Hadoop, including how data and computation move around the cluster
  • Dive into map/reduce mechanics and build your first map/reduce job in Python
  • Understand how to run chains of map/reduce jobs in the form of Pig scripts
  • Use a real-world dataset—baseball performance statistics—throughout the book
  • Work with examples of several analytic patterns, and learn when and where you might use them
