Instant Apache Hive Essentials How-to

Darren Lee

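  • Publisher: Packt Publishing
  • Publication date: 2013-05-27
  • List price: $990
  • VIP price: 95% of list, $941
  • Language: English
  • Pages: 76
  • Binding: Paperback
  • ISBN: 1782169474
  • ISBN-13: 9781782169475
  • Category: Hadoop
  • Ordered from the supplier after purchase (approx. 3-4 weeks)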

Product Description

Leverage your knowledge of SQL to easily write distributed data processing applications on Hadoop using Apache Hive

Overview

  • Learn something new in an Instant! A short, fast, focused guide delivering immediate results
  • Learn to use SQL to write Hadoop jobs
  • Add support for data to Hive in your own file formats
  • Understand how the Hive query processor works to optimize common queries

In Detail

Hadoop provides a robust framework for building distributed applications, but working directly with Hadoop requires writing a lot of code. Adding structure to data and using a higher-level language such as SQL makes working with Hadoop both easier and faster.

"Instant Apache Hive Essentials How-to" contains a series of practical recipes that introduce the power and flexibility of Hive. Starting with your first query, this book will provide step-by-step instructions and behind-the-scenes explanations for how to effectively write MapReduce jobs with SQL.

This book looks at how Hive transforms SQL statements into MapReduce jobs and demonstrates how you can extend Hive to support your own use cases. Its recipes will teach you how to leverage the scale of Hadoop while retaining the benefits of using a structured query language. You will learn how Hive translates a query into MapReduce jobs and explore how to structure your queries for better performance. You will extend Hive to understand your own file formats, simplifying the loading of data into the warehouse. Finally, you will add your own custom functions to Hive to support whatever use cases you may have.
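To give a rough feel for what "translating a query into MapReduce jobs" means, here is a simplified, in-memory Python sketch of how a query such as SELECT word, COUNT(*) FROM words GROUP BY word decomposes into map, shuffle, and reduce phases. The table and column names are hypothetical, and Hive's real execution plan includes details this illustration omits (for example, map-side partial aggregation and distribution across many reducers).

```python
from itertools import groupby
from operator import itemgetter


def map_phase(rows):
    """Mapper: emit a (key, 1) pair for each row's grouping value."""
    for row in rows:
        yield (row, 1)


def shuffle_phase(pairs):
    """Shuffle/sort: order pairs by key so all values for one key are adjacent."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """Reducer: sum the 1s for each key, mirroring COUNT(*) per group."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (key, sum(value for _, value in group))


def group_by_count(rows):
    """Chain the three phases, like a single-stage GROUP BY ... COUNT(*) job."""
    return list(reduce_phase(shuffle_phase(map_phase(rows))))


if __name__ == "__main__":
    print(group_by_count(["hive", "hadoop", "hive", "sql"]))
```

Because the sort happens before the reduce step, each key's values arrive together, which is why the reducer can aggregate with a single pass.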

"Instant Apache Hive Essentials How-to" is a quick introduction for adding Hive to your data toolkit. It is packed with high-level instructions for making Hive work, and it draws connections to the underlying Hadoop framework to explain how things happen.

What you will learn from this book

  • Start with the basics of loading data and writing your first query
  • Use denormalized data efficiently by manipulating complex data types
  • Structure your data and queries to take advantage of Hive’s optimizations
  • Bring your own data files to Hive and teach Hive how to understand them
  • Access the specialized functions built into Hive to manipulate your data
  • Use Hive streaming to integrate code written in any language into your queries
  • Extend Hive with user-defined functions
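Hive streaming works by piping rows through an external program: with Hive's default settings, each input row reaches the script's standard input as one tab-separated line, and each tab-separated line the script writes to standard output becomes an output row. The Python sketch below assumes a hypothetical two-column input (user_id, name) and upper-cases the name; the script name, table, and columns are illustrative only.

```python
import sys


def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row.

    Assumes (hypothetically) exactly two input columns: user_id and name.
    Hive's default NULL marker, \\N, passes through unchanged because
    upper-casing it leaves it as-is.
    """
    fields = line.rstrip("\n").split("\t")
    user_id, name = fields[0], fields[1]
    return "\t".join([user_id, name.upper()])


def main():
    # Hive feeds rows on stdin; every printed line becomes one result row.
    for line in sys.stdin:
        print(transform_line(line))


if __name__ == "__main__":
    main()
```

Saved as a hypothetical upper_name.py, it could be registered with ADD FILE upper_name.py; and invoked with a query along the lines of SELECT TRANSFORM(user_id, name) USING 'python upper_name.py' AS (user_id, name_upper) FROM users; (the table and column names are again placeholders).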

Approach

Filled with practical, step-by-step instructions and clear explanations for the most important and useful tasks. This book provides quick recipes for using Hive to read data in various formats, efficiently querying this data, and extending Hive with any custom functions you may need to insert your own logic into the data pipeline.

Who this book is written for

This book is written for data analysts and developers who want to use their current knowledge of SQL to be more productive with Hadoop. It assumes that readers are comfortable writing SQL queries and are familiar with Hadoop at the level of the classic WordCount example.