Data Engineering with Python (Paperback)
Provisional Chinese title: 使用 Python 進行資料工程 (平裝本)
Crickard, Paul
- Publisher: Packt Publishing
- Publication date: 2020-10-23
- List price: $1,700
- VIP price: 5% off, $1,615
- Language: English
- Pages: 356
- Binding: Quality Paper - also called trade paper
- ISBN: 183921418X
- ISBN-13: 9781839214189
- Related categories: Python, programming languages
Ships immediately (stock = 1)
Product description
Build, monitor, and manage real-time data pipelines to create data engineering infrastructure efficiently using open-source Apache projects
Key features:
- Become well-versed in data architectures, data preparation, and data optimization skills with the help of practical examples
- Design data models and learn how to extract, transform, and load (ETL) data using Python
- Schedule, automate, and monitor complex data pipelines in production
Book Description
Data engineering provides the foundation for data science and analytics, and forms an important part of all businesses. This book will help you to explore various tools and methods that are used for understanding the data engineering process using Python.
The book will show you how to tackle challenges commonly faced in different aspects of data engineering. You'll start with an introduction to the basics of data engineering, along with the technologies and frameworks required to build data pipelines to work with large datasets. You'll learn how to transform and clean data and perform analytics to get the most out of your data. As you advance, you'll discover how to work with big data of varying complexity and production databases, and build data pipelines. Using real-world examples, you'll build architectures on which you'll learn how to deploy data pipelines.
By the end of this Python book, you'll have gained a clear understanding of data modeling techniques, and will be able to confidently build data engineering pipelines for tracking data, running quality checks, and making necessary changes in production.
What you will learn
- Understand how data engineering supports data science workflows
- Discover how to extract data from files and databases and then clean, transform, and enrich it
- Configure processors for handling different file formats as well as both relational and NoSQL databases
- Find out how to implement a data pipeline and dashboard to visualize results
- Use staging and validation to check data before landing in the warehouse
- Build real-time pipelines with staging areas that perform validation and handle failures
- Get to grips with deploying pipelines in the production environment
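The extract, clean, validate, and load steps listed above can be sketched in a few lines of Python. This is a minimal illustration and is not taken from the book: the sample data, the `people` table, and the function names (`extract`, `transform`, `validate`, `load`, `run_pipeline`) are all assumptions made for this example, and it uses only the standard library rather than the tools the book covers.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read from files or databases.
RAW_CSV = """id,name,age
1, Alice ,34
2,Bob,not_a_number
3,Carol,29
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim surrounding whitespace from every field."""
    return [{key: value.strip() for key, value in row.items()} for row in rows]


def validate(rows):
    """Staging check: separate rows that pass type checks from rejects."""
    valid, rejected = [], []
    for row in rows:
        if row["id"].isdigit() and row["age"].isdigit():
            valid.append((int(row["id"]), row["name"], int(row["age"])))
        else:
            rejected.append(row)
    return valid, rejected


def load(conn, records):
    """Load: write validated records into the (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS people "
        "(id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    """Run extract -> transform -> validate -> load; return both row sets."""
    valid, rejected = validate(transform(extract(text)))
    load(conn, valid)
    return valid, rejected


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    loaded, failed = run_pipeline(RAW_CSV, connection)
    print("loaded:", loaded)
    print("rejected:", failed)
```

Keeping rejected rows separate, rather than silently dropping them, is one simple way to make failures visible before data lands in the warehouse.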
Who this book is for
This book is for data analysts, ETL developers, and anyone who wants to get started in data engineering, transition into the field, or refresh their knowledge of data engineering using Python. It will also be useful for students planning a career in data engineering and for IT professionals preparing for a transition. No previous knowledge of data engineering is required.