HDInsight Essentials
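Tentative translated title: HDInsight 基礎知識 (HDInsight Basics)
Rajesh Nadipalli
- Publisher: Packt Publishing
- Publication date: 2013-09-10
- Price: $1,520
- Member price: 5% off, $1,444
- Language: English
- Pages: 122
- Binding: Paperback
- ISBN: 1849695369
- ISBN-13: 9781849695367
Overseas special-order book (requires separate checkout)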
Product description
Tap your unstructured Big Data and empower your business using Microsoft's Hadoop distribution for Windows
Overview
- Architect a Hadoop solution with a modular design for data collection, distributed processing, analysis, and reporting
- Build a multi-node Hadoop cluster on Windows servers
- Establish a Big Data solution using HDInsight with open source software, and provide useful Excel reports
- Run Pig scripts and build simple charts using Interactive JS (Azure)
In Detail
We live in an era in which every action generates data, and much of it is unstructured: Twitter feeds, Facebook updates, photos, and digital sensor inputs. Current relational databases cannot handle the volume, velocity, and variety of this data. HDInsight gives you the ability to gain the full value of Big Data with a modern, cloud-based data platform that manages data of any size and type, whether structured or unstructured.
This hands-on guide shows you how to seamlessly store and process Big Data of all types through Microsoft's modern data platform, which provides simplicity, ease of management, and an open, enterprise-ready Hadoop service running in the cloud. You will then learn how to analyze your Hadoop data with PowerPivot, Power View, Excel, and other Microsoft BI tools, thanks to their integration with the Microsoft data platform. This gives you a solid foundation to build your own HDInsight solution, both on premises and in the cloud.
First, we provide an overview of Hadoop and Microsoft's Big Data strategy, in which HDInsight plays a key role. We then show you how to set up your HDInsight cluster and take you through the four stages of collecting, processing, analyzing, and reporting. For each of these stages, you will see a practical example with working code.
You will then learn core Hadoop concepts such as HDFS and MapReduce, and get a closer look at how Microsoft's HDInsight leverages the Hortonworks Data Platform, which is built on Apache Hadoop. Next, you will be guided through Hadoop commands and programming with HDInsight using open source software such as Hive and Pig. Finally, you will learn to analyze and report on your data using PowerPivot, Power View, Excel, and other Microsoft BI tools.
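The map, shuffle, and reduce steps behind MapReduce are usually introduced with the classic word-count example. The following is a minimal, single-process Python sketch, assuming an in-memory list of text lines stands in for files stored in HDFS. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical illustrations, not the Hadoop, HDInsight, or book's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for each whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts emitted for a single word."""
    return key, sum(values)


def word_count(lines):
    # In this sketch every phase runs sequentially in one process;
    # the lines list is an assumed stand-in for HDFS input splits.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    # Sort keys so the result order is deterministic.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["big data on hadoop", "hadoop on windows"]
    print(word_count(sample))
    # → {'big': 1, 'data': 1, 'hadoop': 2, 'on': 2, 'windows': 1}
```

On a real cluster, Hadoop runs many map and reduce tasks in parallel across nodes and moves the intermediate pairs between them during the shuffle; the sketch only shows the data flow between the three phases.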
This guide provides step-by-step instructions on how to build a Big Data solution using HDInsight with open source software, provide useful Excel reports, and open up the full value of HDInsight.
What you will learn from this book
- Explore the characteristics of a Big Data problem
- Analyze and report on your data using PowerPivot, Power View, Excel, and other Microsoft BI tools
- Explore the architectural considerations for scalability, maintainability, and security
- Understand data ingestion into your HDInsight cluster, including community tools and scripts
- Administer and monitor your HDInsight cluster, including capacity and process management
- Get to know the tools and software in the Hadoop ecosystem, organized by their roles
- Learn what differentiates HDInsight and how it is built on top of Apache Hadoop
- Transform your data using open source software such as MapReduce, Hive, Pig, and JavaScript
Approach
This book is a fast-paced guide full of step-by-step instructions on how to build a multi-node Hadoop cluster on Windows servers.