The Data Wrangling Workshop, Second Edition: Create your own actionable insights using data from multiple raw sources

Lipp, Brian; Roychowdhury, Shubhadeep; Sarkar, Tirthajyoti

  • Publisher: Packt Publishing
  • Publication date: 2020-07-28
  • Price: $1,670
  • VIP price: $1,587 (5% off)
  • Language: English
  • Pages: 576
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 1839215003
  • ISBN-13: 9781839215001
  • Overseas order (requires separate checkout)

Product Description

A beginner's guide to simplifying Extract, Transform, Load (ETL) processes with the help of hands-on tips, tricks, and best practices, in a fun and interactive way

Key Features

  • Explore data wrangling with the help of real-world examples and business use cases
  • Study various ways to extract the most value from your data in minimal time
  • Boost your knowledge with bonus topics, such as random data generation and data integrity checks

Book Description

While a huge amount of data is readily available to us, it is not useful in its raw form. For data to be meaningful, it must be curated and refined.

If you're a beginner, The Data Wrangling Workshop will help break down the process for you. You'll start with the basics and build your knowledge, progressing from the core aspects of data wrangling to using the most popular tools and techniques.

This book starts by showing you how to work with data structures using Python. Through examples and activities, you'll understand why you should stay away from the traditional data-cleaning methods used in other languages and instead take advantage of Python's specialized pre-built routines. Later, you'll learn how to use the same Python backend to extract and transform data from an array of sources, including the internet, large databases, and Excel financial tables. To help you prepare for more challenging scenarios, the book teaches you how to handle missing or incorrect data and how to reformat it to meet the requirements of your downstream analytics tool.

By the end of this book, you will have developed a solid understanding of how to perform data wrangling with Python, and learned several techniques and best practices to extract, clean, transform, and format your data efficiently, from a diverse array of sources.

What you will learn

  • Get to grips with the fundamentals of data wrangling
  • Understand how to model data with random data generation and data integrity checks
  • Discover how to examine data with descriptive statistics and plotting techniques
  • Explore how to search and retrieve information with regular expressions
  • Delve into commonly used Python data science libraries
  • Become well-versed in handling and compensating for missing data
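The sketch below shows a few of the skills in the list above: random data generation, data integrity checks, regular expressions, and handling missing data. It is not taken from the book. It uses only the Python standard library (`random`, `re`, `statistics`) rather than the pandas and NumPy routines the book covers. The record layout, the `ORD-` id format, and the function names `make_records`, `check_integrity`, and `impute_median` are all hypothetical and exist only for illustration.

```python
import random
import re
import statistics

# Hypothetical id format used only for this illustration.
ID_PATTERN = re.compile(r"ORD-\d{4}")


def make_records(n, seed=0):
    """Generate n synthetic records; about 20% of amounts are None to mimic missing data."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    records = []
    for i in range(n):
        amount = round(rng.uniform(10, 100), 2)
        if rng.random() < 0.2:
            amount = None  # simulate a missing value
        records.append({"id": f"ORD-{i:04d}", "amount": amount})
    return records


def check_integrity(records):
    """Return a list of problems: ids that fail the regex, or ids that repeat."""
    seen = set()
    problems = []
    for rec in records:
        if not ID_PATTERN.fullmatch(rec["id"]):
            problems.append(f"bad id: {rec['id']}")
        if rec["id"] in seen:
            problems.append(f"duplicate id: {rec['id']}")
        seen.add(rec["id"])
    return problems


def impute_median(records):
    """Return new records with missing amounts replaced by the median of observed amounts."""
    observed = [r["amount"] for r in records if r["amount"] is not None]
    median = statistics.median(observed)  # raises StatisticsError if nothing was observed
    return [
        {**r, "amount": median if r["amount"] is None else r["amount"]}
        for r in records
    ]


if __name__ == "__main__":
    data = make_records(50)
    print("integrity problems:", check_integrity(data))
    print("missing before:", sum(r["amount"] is None for r in data))
    cleaned = impute_median(data)
    print("mean after imputation:", round(statistics.mean(r["amount"] for r in cleaned), 2))
```

Median imputation is only one option. Depending on the dataset, dropping incomplete rows or using a domain-specific default may be a better fit.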

Who this book is for

The Data Wrangling Workshop is designed for developers, data analysts, and business analysts who are looking to pursue a career as a full-fledged data scientist or analytics expert. Although this book is for beginners who want to start data wrangling, prior working knowledge of the Python programming language is necessary to easily grasp the concepts covered here. It will also help to have a rudimentary knowledge of relational databases and SQL.

About the Authors

Brian Lipp is a technology polyglot who is always in search of interesting and innovative technology. His current languages of choice are Python, Go, and Scala.

Shubhadeep Roychowdhury holds a master's degree in computer science from West Bengal University of Technology and certifications in machine learning from Stanford. He works as a senior software engineer at a Paris-based cybersecurity startup, where he is applying state-of-the-art computer vision and data engineering algorithms and tools to develop cutting-edge products. He often writes about algorithm implementation in Python and similar topics.

Dr. Tirthajyoti Sarkar works as a senior principal engineer in the semiconductor technology domain, where he applies cutting-edge data science/machine learning techniques for design automation and predictive analytics. He writes regularly about Python programming and data science topics. He holds a Ph.D. from the University of Illinois and certifications in artificial intelligence and machine learning from Stanford and MIT.

Table of Contents

  1. Introduction to Data Wrangling with Python
  2. Advanced Operations on Built-In Data Structures
  3. Introduction to NumPy, Pandas, and Matplotlib
  4. A Deep Dive into Data Wrangling with Python
  5. Get Comfortable with Different Kinds of Data Sources
  6. Learning the Hidden Secrets of Data Wrangling
  7. Advanced Web Scraping and Data Gathering
  8. RDBMS and SQL
  9. Applications in Business Use Cases and Conclusion of the Course
