Instant Apache Sqoop

Ankit Jain

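  • Publisher: Packt Publishing
  • Publication date: 2013-09-08
  • Price: $1,190
  • Member price: $1,131 (5% off)
  • Language: English
  • Pages: 58
  • Binding: Paperback
  • ISBN: 1782165762
  • ISBN-13: 9781782165767
  • Overseas special-order title (must be checked out separately)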

Product Description

Transfer data efficiently between RDBMS and the Hadoop ecosystem using the robust Apache Sqoop

Overview

  • Learn something new in an Instant! A short, fast, focused guide delivering immediate results
  • Learn how to transfer data between RDBMS and Hadoop using Sqoop
  • Add a third-party connector into Sqoop
  • Export data from Hadoop and Hive to RDBMS
  • Describe third-party Sqoop connectors

In Detail

Data volumes are growing rapidly, and people want to run analytics that combine different sources of data (RDBMS, text files, and so on). Using Hadoop for analytics requires you to load data from an RDBMS into Hadoop, analyze it there, and then load the processed data back into the RDBMS to generate business reports.
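
As a rough illustration of the round trip described above, the following sketch builds the two Sqoop command lines involved: an import from an RDBMS table into HDFS, and an export of processed results back to the RDBMS. It is not taken from the book. The JDBC URL, user name, table names, and HDFS paths are hypothetical placeholders, and the script only constructs and prints the commands; it does not run Sqoop.

```python
import shlex
from typing import List


def sqoop_import_cmd(jdbc_url: str, user: str, table: str,
                     target_dir: str, mappers: int = 4) -> List[str]:
    """Build the argument list for `sqoop import` (RDBMS table -> HDFS)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "-P",                      # prompt for the password at run time
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


def sqoop_export_cmd(jdbc_url: str, user: str, table: str,
                     export_dir: str) -> List[str]:
    """Build the argument list for `sqoop export` (HDFS directory -> RDBMS table)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", user,
        "-P",
        "--table", table,
        "--export-dir", export_dir,
    ]


def render(cmd: List[str]) -> str:
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    # All values below are made-up examples.
    url = "jdbc:mysql://db.example.com/sales"
    print(render(sqoop_import_cmd(url, "etl", "orders", "/data/orders")))
    print(render(sqoop_export_cmd(url, "etl", "order_summary", "/data/order_summary")))
```

Building the command as a list rather than a single string keeps each argument separate, so the same list could be handed to `subprocess.run` on a machine where Sqoop is installed.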

Instant Apache Sqoop is a practical, hands-on guide that provides you with a number of clear, step-by-step exercises that will help you to take advantage of the real power of Apache Sqoop and give you a good grounding in the knowledge required to transfer data between RDBMS and the Hadoop ecosystem.

Instant Apache Sqoop looks at the import/export process required in data transfer and discusses examples of each process. It will also give you an overview of HBase and Hive table structures and how you can populate HBase and Hive tables. The book will finish by taking you through a number of third-party Sqoop connectors.

You will also learn about the various import and export arguments and how to use them to move data between RDBMS and the Hadoop ecosystem. The book explains the architecture of the import and export processes, looks at some Sqoop connectors, and discusses examples of each. If you want to move data between RDBMS and the Hadoop ecosystem, then this is the book for you.

You will learn everything that you need to know to transfer data between RDBMS and the Hadoop ecosystem as well as how you can add new connectors into Sqoop.

What you will learn from this book

  • Understand the Sqoop import arguments and the provided examples to master moving data from RDBMS to Hadoop
  • Get to know the Sqoop incremental import feature
  • Understand the HBase table structure, HBase basic commands, and learn how to move data from RDBMS to HBase
  • Learn about the Hive table structure, Hive basic commands, and understand the provided examples to discover how to move data from RDBMS to Hive
  • Explore the Sqoop export arguments and learn how to move processed data from Hadoop to RDBMS
  • Learn how to move data from Hive to RDBMS
  • Discover Sqoop third-party connectors
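
To give a flavor of the import variants listed above, here is a minimal sketch (not from the book) that composes the extra arguments for an incremental import, a Hive import, and an HBase import on top of a basic `sqoop import` command. The flags shown are the ones documented for Sqoop 1.x and should be checked against your installed version; the connection string, table names, column names, and column family are hypothetical.

```python
from typing import List


def base_import(jdbc_url: str, table: str) -> List[str]:
    """Arguments common to every import variant below."""
    return ["sqoop", "import", "--connect", jdbc_url, "--table", table]


def incremental_args(check_column: str, last_value: str,
                     mode: str = "append") -> List[str]:
    """Import only rows whose check column is beyond the last imported value."""
    if mode not in ("append", "lastmodified"):
        raise ValueError("mode must be 'append' or 'lastmodified'")
    return ["--incremental", mode,
            "--check-column", check_column,
            "--last-value", last_value]


def hive_args(hive_table: str) -> List[str]:
    """Load the imported rows into a Hive table."""
    return ["--hive-import", "--hive-table", hive_table]


def hbase_args(hbase_table: str, column_family: str, row_key: str) -> List[str]:
    """Write the imported rows into an HBase table instead of HDFS files."""
    return ["--hbase-table", hbase_table,
            "--column-family", column_family,
            "--hbase-row-key", row_key]


if __name__ == "__main__":
    # Hypothetical connection string and names, for illustration only.
    url = "jdbc:mysql://db.example.com/sales"
    print(base_import(url, "orders") + incremental_args("order_id", "1000"))
    print(base_import(url, "orders") + hive_args("orders_hive"))
    print(base_import(url, "orders") + hbase_args("orders_hbase", "cf", "order_id"))
```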

Approach

Filled with practical, step-by-step instructions and clear explanations for the most important and useful tasks, Instant Apache Sqoop also includes practical examples and challenges to test and improve your knowledge.

Who this book is written for

This book is great for developers who are looking to get a good grounding in how to effectively and efficiently move data between RDBMS and the Hadoop ecosystem. It’s assumed that you will have some experience in Hadoop already as well as some familiarity with HBase and Hive.
