Optimizing Hadoop for MapReduce

Khaled Tannir

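  • Publisher: Packt Publishing
  • Publication date: 2014-02-21
  • List price: $1,660
  • VIP price: $1,577 (5% off)
  • Language: English
  • Pages: 120
  • Binding: Paperback
  • ISBN: 1783285656
  • ISBN-13: 9781783285655
  • Related categories: Hadoop, distributed architecture
  • Overseas special-order title (must be checked out separately)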

Product description

This book is the perfect introduction to sophisticated concepts in MapReduce and will ensure you have the knowledge to optimize job performance. This is not an academic treatise; it's an example-driven tutorial for the real world.

Overview

  • Optimize your MapReduce job performance
  • Identify your Hadoop cluster's weaknesses
  • Tune your MapReduce configuration

In Detail

MapReduce is the programming model that the Hadoop MapReduce engine uses to distribute work across a cluster by processing smaller data sets in parallel. It is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, and statistical machine translation.
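To make the idea of splitting work over smaller data sets concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases in plain Python. It is not taken from the book and does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions chosen for illustration only.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(mapped_outputs):
    """Group all emitted values by key across every mapper's output."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits):
    # On a real cluster each split would be handled by a separate map task,
    # possibly on a different node; here the splits are processed in a loop.
    mapped = [map_phase(split) for split in splits]
    return reduce_phase(shuffle(mapped))


# Hypothetical input: two small splits standing in for blocks of a large file.
splits = [
    ["hadoop runs mapreduce", "mapreduce scales"],
    ["hadoop scales"],
]
counts = word_count(splits)
```

Because each call to `map_phase` sees only its own split, the map work can in principle run on many nodes at once; only the shuffle step needs to bring values for the same key together.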

This book introduces you to advanced MapReduce concepts and teaches you everything from identifying the factors that affect MapReduce job performance to tuning the MapReduce configuration. Based on real-world experience, this book will help you to fully utilize your cluster's node resources to run MapReduce jobs optimally.

This book details the Hadoop MapReduce job performance optimization process. Through a number of clear and practical steps, it will help you to fully utilize your cluster's node resources.

Starting with how MapReduce works and the factors that affect MapReduce performance, you will be given an overview of Hadoop metrics and several performance monitoring tools. Further on, you will explore performance counters that help you identify resource bottlenecks, check cluster health, and size your Hadoop cluster. You will also learn about optimizing map and reduce tasks by using Combiners and compression.
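As a rough illustration of why a Combiner can help, the sketch below simulates mapper output in plain Python and compares how many records would be sent to the reducers with and without local pre-aggregation. It is an assumption-laden toy, not the book's code and not the Hadoop API: `map_words`, `combine`, `reduce_counts`, and the sample splits are all hypothetical names and data.

```python
from collections import Counter


def map_words(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for line in split for word in line.split()]


def combine(pairs):
    """Pre-aggregate one mapper's output locally, as a Combiner would."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def reduce_counts(all_pairs):
    """Sum the values for each key across everything the reducers receive."""
    totals = Counter()
    for key, value in all_pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical input: two splits, each processed by its own map task.
splits = [["a b a a", "b a"], ["a c a"]]
mapped = [map_words(split) for split in splits]

# Records crossing the network without and with the combiner step.
without_combiner = [pair for pairs in mapped for pair in pairs]
with_combiner = [pair for pairs in mapped for pair in combine(pairs)]

print(len(without_combiner), len(with_combiner))
```

The final counts are identical either way because summation is associative and commutative; a Combiner is only safe for reduce operations with that property.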

The book ends with best practices and recommendations on how to use your Hadoop cluster optimally.

What you will learn from this book

  • Learn about the factors that affect MapReduce performance
  • Utilize the Hadoop MapReduce performance counters to identify resource bottlenecks
  • Size your Hadoop cluster's nodes
  • Set the number of mappers and reducers correctly
  • Optimize mapper and reducer task throughput and code size using compression and Combiners
  • Understand the various tuning properties and best practices to optimize clusters

Approach

This book is an example-based tutorial that deals with optimizing MapReduce job performance.

Who this book is written for

If you are a Hadoop administrator, developer, MapReduce user, or beginner, this book is the best choice available if you wish to optimize your clusters and applications. Prior knowledge of creating MapReduce applications is not necessary, but it will help you better understand the concepts and the snippets of MapReduce class template code.
