Learning Big Data with Amazon Elastic MapReduce
Amarkant Singh, Vijay Rayapati
- Publisher: Packt Publishing
- Publication date: 2014-10-10
- List price: $1,880
- VIP price: 5% off, $1,786
- Language: English
- Pages: 242
- Binding: Paperback
- ISBN: 1782173439
- ISBN-13: 9781782173434
Related categories:
Distributed Architecture, Big Data
Overseas special-order book (requires separate checkout)
Product Description
About This Book
- Learn how to solve big data problems using Apache Hadoop
- Use Amazon Elastic MapReduce to create and maintain cluster infrastructure for big data analytics
- A step-by-step guide exploring the vast set of services provided by Amazon on the cloud
Who This Book Is For
This book is aimed at developers and system administrators who want to learn about Big Data analysis using Amazon Elastic MapReduce. Basic Java programming knowledge is required. You should be comfortable with using command-line tools. Prior knowledge of AWS, API, and CLI tools is not assumed. Also, no exposure to Hadoop and MapReduce is expected.
What You Will Learn
- Create and access your account on AWS and learn about its various services
- Launch a machine on the cloud infrastructure of AWS, get login credentials, and communicate with that machine
- Learn about the logical dataflow of MapReduce and how it uses distributed computing effectively
- Understand the benefits of EMR over a local Hadoop cluster
- Discover the best practices that should be kept in mind while planning and executing a cluster/job on EMR
- Launch a cluster on Amazon EMR, submit the Hello World wordcount job for processing, and download and view the results
- Execute jobs on EMR using the two primary methods provided by EMR
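The logical dataflow of MapReduce and the Hello World word count job mentioned in the list above can be illustrated with a minimal local sketch. This is not taken from the book, which works in Java on Hadoop and EMR; it is a single-process Python simulation, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical names chosen for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence inside one process."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hello world", "hello EMR"]))
```

On a real cluster the map and reduce steps run in parallel on different nodes and the framework performs the shuffle; the sketch only shows the order in which data moves between the three stages.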
In Detail
Amazon Elastic MapReduce is a web service used to process and store vast amounts of data, and it is one of the largest Hadoop operators in the world. With the increase in the amount of data generated and collected by many businesses and the arrival of cost-effective cloud-based solutions for distributed computing, the feasibility of crunching large amounts of data to get deep insights within a short span of time has increased greatly.
This book will get you started with AWS so that you can quickly create your own account and explore the services provided, many of which you might be delighted to use. This book covers the architectural details of the MapReduce framework, Apache Hadoop, various job models on EMR, how to manage clusters on EMR, and the command-line tools available with EMR. Each chapter builds on the knowledge of the previous one, leading to the final chapter where you will learn about solving a real-world use case using Apache Hadoop and EMR. This book will, therefore, get you up and running with major Big Data technologies quickly and efficiently.
Easily learn, build, and run real-world Big Data solutions using Hadoop and AWS EMR