Unlocking dbt: Design and Deploy Transformations in Your Cloud Data Warehouse (Paperback)
Cameron Cyr, Dustin Dorsey
Product Description
This book shows how dbt is used to build data transformation pipelines that enable dependency management and allow for version control and automated testing. It explains how dbt is revolutionizing data transformation and the advantages that a command-line tool like dbt provides over and above the use of database stored procedures and other ETL and ELT tools that handle data transformations. You'll see how to create custom-written transformations through simple SQL SELECT statements, eliminating the need for boilerplate code and making it easy to incorporate dbt as the transformation layer in your data warehouse pipelines. Additionally, you will learn how dbt enables data teams to incorporate software engineering best practices such as code reusability, version control, and automated testing into the data transformation process.
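To make the idea of SELECT-only transformations with dependency management concrete, here is a minimal sketch in plain Python. It is not dbt and not code from the book: the model names, the `raw_orders` table, and the helper functions are hypothetical, and SQLite stands in for a cloud data warehouse. Each "model" is only a SELECT statement, a `{{ ref('...') }}` marker declares a dependency in the style dbt uses, and the runner works out the build order and supplies the boilerplate `CREATE TABLE ... AS` wrapper.

```python
import re
import sqlite3
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: each one is just a SELECT statement.
# {{ ref('name') }} marks a dependency on another model.
MODELS = {
    "stg_orders": (
        "SELECT id, customer_id, amount FROM raw_orders WHERE amount > 0"
    ),
    "customer_totals": (
        "SELECT customer_id, SUM(amount) AS total "
        "FROM {{ ref('stg_orders') }} GROUP BY customer_id"
    ),
}

# Matches {{ ref('model_name') }} and captures the model name.
REF = re.compile(r"\{\{\s*ref\('(\w+)'\)\s*\}\}")


def build_order(models):
    """Return model names ordered so that dependencies come first."""
    graph = {name: set(REF.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


def run(conn, models):
    """Materialize every model as a table, in dependency order."""
    for name in build_order(models):
        # Replace each ref marker with the referenced table name.
        select_sql = REF.sub(lambda m: m.group(1), models[name])
        # The runner adds the DDL; the model author wrote only the SELECT.
        conn.execute(f"CREATE TABLE {name} AS {select_sql}")


# Assumed source data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_orders (id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 10, 5.0), (2, 10, 7.5), (3, 20, -1.0), (4, 20, 3.0)],
)

run(conn, MODELS)
totals = conn.execute(
    "SELECT customer_id, total FROM customer_totals ORDER BY customer_id"
).fetchall()
print(totals)
```

In this sketch `customer_totals` is built only after `stg_orders`, because the `ref` marker puts an edge in the dependency graph. dbt itself does considerably more (adapters, materialization strategies, tests, documentation), so treat this only as an illustration of the concept.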
Unlocking dbt walks you through using dbt to establish a project, build and modularize SQL models, and execute jobs in a way that is easy to maintain and scale as your data ecosystem matures. You'll begin by establishing and configuring a project, a process covered using both dbt Cloud and dbt Core, so that you can confidently stand up a project using either platform. From there, you'll move into building transformations with peace of mind that your project will scale appropriately as you continue to develop it.
After learning the basics needed to get started, you'll continue to build on that foundation by looking at the unique ways in which dbt combines SQL with Jinja to take your code beyond what is possible in plain SQL. You will learn about advanced materializations, building lineage in your data flows, the unlimited potential of macros, and so much more. This book also explores supported file types and the building of Python models. Rounding things out, you will learn features of dbt that will assist you in making your transformation layer production ready. These include how to implement automated testing, using dbt to generate documentation, and running CI/CD pipelines.
What You Will Learn
- Understand what dbt is and how it is used in the modern data stack
- Set up a project using both dbt Cloud and dbt Core
- Connect a dbt project to a cloud data warehouse
- Build SQL and Python models that are scalable and maintainable
- Configure development, testing, and production environments
- Capture reusable logic in the form of Jinja macros
- Incorporate version control with your data transformation code
Who This Book Is For
Current and aspiring data professionals, including architects, developers, analysts, engineers, data scientists, and consultants who are beginning the journey of using dbt as part of their data pipeline's transformation layer. Readers should have a foundational knowledge of writing basic SQL statements, development best practices, and working with data in an analytical context such as a data warehouse.
About the Authors
Cameron Cyr is a data fanatic who has spent his career developing data systems enabling valuable use cases such as analytics and machine learning. During this time, he has placed a focus on building reliable and scalable data systems with an emphasis on data quality. He is active in the data community and is one of the co-organizers and founders of Nashville's Data Engineering Group. Cameron currently serves as a data engineer for a healthcare tech startup.
Dustin Dorsey is a data leader and architect who has been building and managing data solutions for nearly 15 years. He is currently leading the build-out of data infrastructure and analytics environments for a fast-growing healthcare tech startup. Dustin is a well-respected leader in the data community as an international speaker and mentor. He has previously organized several data community events and user groups and is currently one of the founders and organizers of the Nashville Data Engineering group. Dustin is one of the authors of the popular Apress book, Pro Database Migration to Azure.