Apache Flume: Distributed Log Collection for Hadoop, 2/e (Paperback)
Steve Hoffman
- Publisher: Packt Publishing
- Publication date: 2015-02-28
- Price: $1,740
- VIP price: $1,653 (5% off)
- Language: English
- Pages: 175
- Binding: Paperback
- ISBN: 1784392170
- ISBN-13: 9781784392178
Related categories:
Hadoop
Overseas import (requires separate checkout)
Product Description
Design and implement a series of Flume agents to send streamed data into Hadoop
About This Book
- Construct a series of Flume agents using the Apache Flume service to efficiently collect, aggregate, and move large amounts of event data
- Configure failover paths and load balancing to remove single points of failure
- Use this step-by-step guide to stream logs from application servers to Hadoop's HDFS; a minimal agent configuration is sketched after this list
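The bullets above all come down to wiring sources, channels, and sinks together in an agent's properties file. The following is a minimal sketch, assuming an exec source tailing a local log file; the agent name, component names, paths, and NameNode address are illustrative, not taken from the book:

```properties
# Minimal single-agent sketch: tail an application log and land events in HDFS.
# All names (agent, r1, c1, k1), paths, and hosts are illustrative assumptions.
agent.sources  = r1
agent.channels = c1
agent.sinks    = k1

# Exec source: tail the application log (one of several source types Flume ships).
agent.sources.r1.type = exec
agent.sources.r1.command = tail -F /var/log/app/app.log
agent.sources.r1.channels = c1

# Memory channel: buffers events between the source and the sink.
agent.channels.c1.type = memory
agent.channels.c1.capacity = 10000

# HDFS sink: writes plain-text events into Hadoop, bucketed by date.
agent.sinks.k1.type = hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y/%m/%d
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.hdfs.useLocalTimeStamp = true
```

A file like this is launched with Flume's own CLI, for example `flume-ng agent --name agent --conf conf --conf-file agent.properties`.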
Who This Book Is For
If you are a Hadoop programmer who wants to learn about Flume to be able to move datasets into Hadoop in a timely and replicable manner, then this book is ideal for you. No prior knowledge about Apache Flume is necessary, but a basic knowledge of Hadoop and the Hadoop Distributed File System (HDFS) is assumed.
What You Will Learn
- Understand the Flume architecture and learn how to download and install open source Flume from Apache
- Follow a detailed example of transporting weblogs in Near Real Time (NRT) to Kibana/Elasticsearch and archiving them in HDFS
- Learn tips and tricks for transporting logs and data in your production environment
- Understand and configure the Hadoop Distributed File System (HDFS) Sink
- Use a morphline-backed Sink to feed data into Solr
- Create redundant data flows using sink groups (see the failover sketch after this list)
- Configure and use various sources to ingest data
- Inspect data records and move them between multiple destinations based on payload content (see the routing sketch after this list)
- Transform data en-route to Hadoop and monitor your data flows
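For the sink-group bullet above, a minimal sketch of a failover configuration follows, assuming two already-defined sinks k1 and k2; the group name and priority values are illustrative. The processor prefers the higher-priority sink and fails over to the other when it errors:

```properties
# Failover sink group sketch: k1 is preferred; k2 takes over if k1 fails.
# Sink names and priority values are illustrative assumptions.
agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = k1 k2
agent.sinkgroups.g1.processor.type = failover
agent.sinkgroups.g1.processor.priority.k1 = 10
agent.sinkgroups.g1.processor.priority.k2 = 5
# Maximum back-off (in milliseconds) before a failed sink is retried.
agent.sinkgroups.g1.processor.maxpenalty = 10000
```

Setting processor.type to load_balance instead turns the same group into the load-balancing arrangement mentioned under About This Book.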
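The payload-based routing bullet is commonly realized with an interceptor that tags each event and a multiplexing channel selector that routes on the tag. Below is a minimal sketch under those assumptions; the header name, regex, and channel names are invented for illustration:

```properties
# Tag each event with a 'level' header extracted from the start of its body.
agent.sources.r1.interceptors = i1
agent.sources.r1.interceptors.i1.type = regex_extractor
agent.sources.r1.interceptors.i1.regex = ^(ERROR|WARN|INFO)
agent.sources.r1.interceptors.i1.serializers = s1
agent.sources.r1.interceptors.i1.serializers.s1.name = level

# Route on that header: errors to channel c1, everything else to c2.
agent.sources.r1.channels = c1 c2
agent.sources.r1.selector.type = multiplexing
agent.sources.r1.selector.header = level
agent.sources.r1.selector.mapping.ERROR = c1
agent.sources.r1.selector.default = c2
```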
In Detail
Apache Flume is a distributed, reliable, and available service used to efficiently collect, aggregate, and move large amounts of log data. It is used to stream logs from application servers to HDFS for ad hoc analysis.
This book starts with an architectural overview of Flume and its logical components. It explores channels, sinks, and sink processors, and then moves on to sources. By the end of this book, you will be fully equipped to construct a series of Flume agents that dynamically transport your streamed data and logs from your systems into Hadoop.
This step-by-step guide walks you through Flume's architecture and components, covering different approaches that are then pulled together into a real-world, end-to-end use case, progressing from the simplest features to the most advanced.