Learning Cloudera Impala

Avkash Chauhan

  • Publisher: Packt Publishing
  • Publication date: 2013-12-27
  • Price: $1,540
  • VIP price: $1,463 (5% off)
  • Language: English
  • Pages: 150
  • Binding: Paperback
  • ISBN: 1783281278
  • ISBN-13: 9781783281275
  • Overseas special-order title (requires separate checkout)

Product Description

Everything you need to know about Cloudera Impala is here – from installation onwards. Your raw data processing in Hadoop takes on new dimensions of speed and volume with this hands-on tutorial.

Overview

  • Step-by-step guidance to get you started with Impala on your Hadoop cluster
  • Manipulate your data rapidly by writing proper SQL statements
  • Explore the concepts of Impala security, administration, and troubleshooting in detail to maintain your Impala cluster

In Detail

If you have always wanted to crunch billions of rows of raw data on Hadoop in a couple of seconds, then Cloudera Impala is the number one choice for you. Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Hue Beeswax) as Apache Hive. This provides a familiar and unified platform for batch-oriented or real-time queries.

In this practical, example-oriented book, you will learn what you need to know about Cloudera Impala to get started on your own project. The book covers Cloudera Impala from installation, administration, and query processing through to connectivity with third-party applications. With this book in hand, you will be equipped to work with your data in Hadoop.

As a reader of this book, you will learn about the origin of Impala and the technology behind it that allows it to run on thousands of machines. You will learn how to install, run, manage, and troubleshoot Impala in your own Hadoop cluster using the step-by-step guidance provided in the book. The book covers tenets of data processing such as loading data stored in Hadoop into Impala tables and querying data using Impala SQL statements, all with various code illustrations and a real-world example.

The book is written to get you started with Impala by providing rich information so you can understand what Impala is, what it can do for you, and finally how you can use it to achieve your objective.

What you will learn from this book

  • Understand the various ways of installing Impala in your Hadoop cluster
  • Use the Impala shell API to interact with Impala components
  • Utilize Impala Query Language and built-in functions to play with data
  • Administer and fine-tune Impala for high availability
  • Identify and troubleshoot problems in a variety of ways
  • Get acquainted with various input data formats in Hadoop and how to use them with Impala
  • Comprehend how third-party applications can connect with Impala to provide data visualization and various other enhancements

Approach

This book is an easy-to-follow, step-by-step tutorial where each chapter takes your knowledge to the next level. The book covers practical knowledge with tips to implement this knowledge in real-world scenarios. A chapter with a real-life example is included to help you understand the concepts in full.

Who this book is written for

This book is for those who want to take full advantage of their Hadoop cluster by processing extremely large amounts of raw data in Hadoop at real-time speed. Prior knowledge of Hadoop and some exposure to Hive and MapReduce are expected.
