Pentaho Data Integration Beginner's Guide, 2/e (Paperback)

María Carina Roldán

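  • Publisher: Packt Publishing
  • Publication date: 2013-08-31
  • List price: $2,220
  • VIP price: $2,109 (5% off)
  • Language: English
  • Pages: 502
  • Binding: Paperback
  • ISBN: 1782165045
  • ISBN-13: 9781782165040
  • Overseas special-order title (requires separate checkout)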

Product Description

Extract, Transform, and Load (ETL) is the essence of data integration, and this book shows you how to achieve it quickly and efficiently using Pentaho Data Integration. It is a hands-on guide that you’ll find an indispensable time-saver.

Overview

  • Manipulate your data by exploring, transforming, validating, and integrating it
  • Learn to migrate data between applications
  • Explore several features of Pentaho Data Integration 5.0
  • Connect to any database engine, explore the databases, and perform all kinds of operations on them

In Detail

Capturing, manipulating, cleansing, transferring, and loading data effectively are prime requirements in every IT organization. Achieving these tasks requires either people devoted to developing extensive software programs or an investment in ETL or data integration tools that can simplify the work.

Pentaho Data Integration is a full-featured open source ETL solution that allows you to meet these requirements. Pentaho Data Integration has an intuitive, graphical, drag-and-drop design environment and its ETL capabilities are powerful. However, getting started with Pentaho Data Integration can be difficult or confusing.

"Pentaho Data Integration Beginner's Guide, Second Edition" provides the guidance needed to overcome that difficulty, covering the key features of Pentaho Data Integration.

"Pentaho Data Integration Beginner's Guide, Second Edition" starts with the installation of the Pentaho Data Integration software and then moves on to cover all the key Pentaho Data Integration concepts. Each chapter introduces new features, allowing you to become familiar with the tool gradually. First, you will learn to do all kinds of data manipulation and work with plain files. Then, the book gives you a primer on databases and teaches you how to work with databases inside Pentaho Data Integration. Next, you will be introduced to data warehouse concepts and learn how to load data into a data warehouse. After that, you will learn to implement simple and complex processes. Finally, you will have the opportunity to apply and reinforce all the concepts you have learned by implementing a simple datamart.

With "Pentaho Data Integration Beginner's Guide, Second Edition", you will learn everything you need to know in order to meet your data manipulation requirements.

What you will learn from this book

  • Install and get started with Pentaho Data Integration
  • Get started with MySQL
  • Learn the ins and outs of Spoon, the graphical designer tool
  • Transform data in several ways such as performing simple and complex calculations, cleaning, counting, de-duplicating, filtering, and ordering
  • Learn to get data from all kinds of data sources, such as plain files, Excel spreadsheets, databases, XML files, and more, then preview it and send it back to the same or different destinations
  • Discover how to read and parse unstructured files
  • Embed Java and JavaScript code in your Pentaho Data Integration transformations to enrich the treatment of data
  • Use Pentaho Data Integration to perform CRUD (create, read, update, and delete) operations on databases
  • Learn the basic concepts of data warehousing
  • Populate a data warehouse with Pentaho Data Integration, including loading slowly changing dimensions, junk dimensions, time dimensions, and more
  • Implement business processes by scheduling tasks, checking conditions, organizing files and folders, running daily processes, treating errors, and so on in a way that meets your requirements
