Learning Hadoop 2
Provisional Chinese title: 學習 Hadoop 2
Garry Turkington, Gabriele Modena
- Publisher: Packt Publishing
- Publication date: 2015-01-31
- Price: $2,200
- VIP price: 5% off, $2,090
- Language: English
- Pages: 316
- Binding: Paperback
- ISBN: 1783285516
- ISBN-13: 9781783285518
Related categories:
Hadoop
Overseas special-order books (separate checkout required)
Product description
Design and implement data processing, lifecycle management, and analytic workflows with the cutting-edge toolbox of Hadoop 2
About This Book
- Construct state-of-the-art applications using higher-level interfaces and tools beyond the traditional MapReduce approach
- Use the unique features of Hadoop 2 to model and analyze Twitter's global stream of user-generated data
- Develop a prototype on a local cluster and deploy to the cloud (Amazon Web Services)
Who This Book Is For
If you are a system or application developer interested in learning how to solve practical problems using the Hadoop framework, then this book is ideal for you. You are expected to be familiar with the Unix/Linux command-line interface and have some experience with the Java programming language. Familiarity with Hadoop would be a plus.
What You Will Learn
- Write distributed applications using the MapReduce framework
- Go beyond MapReduce and process data in real time with Samza and iteratively with Spark
- Familiarize yourself with data mining approaches that work with very large datasets
- Prototype applications on a VM and deploy them to a local cluster or to a cloud infrastructure (Amazon Web Services)
- Conduct batch and real-time data analysis using SQL-like tools
- Build data processing flows using Apache Pig and see how it enables the easy incorporation of custom functionality
- Define and orchestrate complex workflows and pipelines with Apache Oozie
- Manage your data lifecycle and changes over time
In Detail
This book introduces you to the world of building data-processing applications with the wide variety of tools supported by Hadoop 2. Starting with the core components of the framework, HDFS and YARN, this book will guide you through building applications using a variety of approaches.
You will learn how YARN completely changes the relationship between MapReduce and Hadoop and allows the latter to support more varied processing approaches and a broader array of applications. These include real-time processing with Apache Samza and iterative computation with Apache Spark. Next up, we discuss Apache Pig and the dataflow data model it provides. You will discover how to use Pig to analyze a Twitter dataset.
With this book, you will be able to make your life easier by using tools such as Apache Hive, Apache Oozie, Hadoop Streaming, Apache Crunch, and Kite SDK. The last part of this book discusses the likely future direction of major Hadoop components and how to get involved with the Hadoop community.