Apache Flume: Distributed Log Collection for Hadoop (What You Need to Know)
Steve Hoffman
- Publisher: Packt Publishing
- Publication date: 2013-07-04
- List price: $1,740
- Member price: 5% off, $1,653
- Language: English
- Pages: 108
- Binding: Paperback
- ISBN: 1782167919
- ISBN-13: 9781782167914

Related categories:
Hadoop
Imported title (ordered and checked out separately)
Product Description
If your role includes moving datasets into Hadoop, this book will help you do it more efficiently using Apache Flume. From installation to customization, it's a complete step-by-step guide on making the service work for you.
Overview
- Integrate Flume with your data sources
- Transcode your data en route in Flume
- Route and separate your data using regular expression matching
- Configure failover paths and load-balancing to remove single points of failure
- Utilize Gzip compression for files written to HDFS
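The failover and compression points above can be sketched in a single Flume properties file. The agent, channel, directory, and `hdfs://` names below are illustrative assumptions, not configuration taken from the book:

```properties
# Illustrative topology (all names assumed): one exec source feeding a
# durable file channel, drained by a failover sink group of two HDFS sinks.
collector.sources = tail1
collector.channels = ch1
collector.sinks = hdfs1 hdfs2
collector.sinkgroups = sg1

# Source: tail a log file (assumed path)
collector.sources.tail1.type = exec
collector.sources.tail1.command = tail -F /var/log/app.log
collector.sources.tail1.channels = ch1

# File-backed channel survives agent restarts, unlike a memory channel
collector.channels.ch1.type = file
collector.channels.ch1.checkpointDir = /var/flume/checkpoint
collector.channels.ch1.dataDirs = /var/flume/data

# Primary HDFS sink, writing gzip-compressed streams
collector.sinks.hdfs1.channel = ch1
collector.sinks.hdfs1.type = hdfs
collector.sinks.hdfs1.hdfs.path = hdfs://nn1/flume/events/%Y/%m/%d
collector.sinks.hdfs1.hdfs.fileType = CompressedStream
collector.sinks.hdfs1.hdfs.codeC = gzip

# Standby sink pointed at a second path/cluster
collector.sinks.hdfs2.channel = ch1
collector.sinks.hdfs2.type = hdfs
collector.sinks.hdfs2.hdfs.path = hdfs://nn2/flume/events/%Y/%m/%d

# Failover processor: the higher-priority sink is used until it fails
collector.sinkgroups.sg1.sinks = hdfs1 hdfs2
collector.sinkgroups.sg1.processor.type = failover
collector.sinkgroups.sg1.processor.priority.hdfs1 = 10
collector.sinkgroups.sg1.processor.priority.hdfs2 = 5
```

Swapping `processor.type = failover` for `load_balance` would spread events across both sinks instead of keeping one as a standby, which is the load-balancing variant the bullet above refers to.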
In Detail
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with many failover and recovery mechanisms.
Apache Flume: Distributed Log Collection for Hadoop covers problems with HDFS and streaming data/logs, and how Flume can resolve these problems. This book explains Flume's generalized architecture, including moving data to and from databases and NoSQL-like data stores, as well as optimizing performance. It also includes real-world scenarios of Flume implementations.
Apache Flume: Distributed Log Collection for Hadoop starts with an architectural overview of Flume and then discusses each component in detail. It guides you through the complete installation process and compilation of Flume.
It will give you a heads-up on how to use channels and channel selectors. For each architectural component (sources, channels, sinks, channel processors, sink groups, and so on), the various implementations are covered in detail, along with their configuration options, so you can tailor Flume to your specific needs. Pointers on writing custom implementations are given as well. By the end, you should be able to construct a series of Flume agents to transport your streaming data and logs from your systems into Hadoop in near real time.
What you will learn from this book
- Understand the Flume architecture
- Download and install open source Flume from Apache
- Discover when to use a memory or file-backed channel
- Understand and configure the Hadoop File System (HDFS) sink
- Learn how to use sink groups to create redundant data flows
- Configure and use various sources for ingesting data
- Inspect data records and route to different or multiple destinations based on payload content
- Transform data en route to Hadoop
- Monitor your data flows
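The payload-based routing bullet above can be sketched with a regex_extractor interceptor and a multiplexing channel selector. The agent, channel, and header names here are assumptions for illustration, not taken from the book, and sinks are omitted for brevity:

```properties
# Illustrative routing sketch (names assumed): the interceptor copies
# the leading severity token of each event body into a header, and the
# multiplexing selector routes on that header.
agent.sources = src1
agent.channels = mainCh errCh

agent.channels.mainCh.type = memory
agent.channels.errCh.type = memory

agent.sources.src1.type = exec
agent.sources.src1.command = tail -F /var/log/app.log
agent.sources.src1.channels = mainCh errCh

# Extract the first word of the body (e.g. "ERROR") into header "severity"
agent.sources.src1.interceptors = i1
agent.sources.src1.interceptors.i1.type = regex_extractor
agent.sources.src1.interceptors.i1.regex = ^(\\w+)
agent.sources.src1.interceptors.i1.serializers = s1
agent.sources.src1.interceptors.i1.serializers.s1.name = severity

# Route ERROR events to errCh; everything else falls through to mainCh
agent.sources.src1.selector.type = multiplexing
agent.sources.src1.selector.header = severity
agent.sources.src1.selector.mapping.ERROR = errCh
agent.sources.src1.selector.default = mainCh
```

A `replicating` selector (the default) would instead copy every event to both channels, which is the "multiple destinations" case the same bullet mentions.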
Approach
A starter guide that covers Apache Flume in detail.
Who this book is written for
Apache Flume: Distributed Log Collection for Hadoop is intended for people who are responsible for moving datasets into Hadoop in a timely and reliable manner, such as software engineers, database administrators, and data warehouse administrators.