Python Data Cleaning Cookbook: Modern techniques and Python tools to detect and remove dirty data and extract key insights
Tentative Chinese title: Python 數據清理食譜:現代技術與 Python 工具檢測和移除髒數據並提取關鍵洞察
Walker, Michael
- Publisher: Packt Publishing
- Publication date: 2020-12-11
- List price: $1,650
- Sale price: $990 (40% off)
- Language: English
- Pages: 436
- Binding: Quality Paper - also called trade paper
- ISBN: 1800565666
- ISBN-13: 9781800565661
- Related categories: Python, Programming Languages
- Related translation: Python 數據清洗 (Simplified Chinese edition)
- Other editions: Python Data Cleaning Cookbook - Second Edition: Prepare your data for analysis with pandas, NumPy, Matplotlib, scikit-learn, and OpenAI
Product description
Discover how to describe your data in detail, identify data issues, and solve them using commonly used techniques, tips, and tricks
Key features
- Get well versed in various data cleaning techniques to reveal key insights
- Manipulate data of varying complexity to shape it into the form your business needs require
- Clean, monitor, and validate large data volumes to diagnose problems before moving on to data analysis
Book Description
Getting clean data to reveal insights is essential, as directly jumping into data analysis without proper data cleaning may lead to incorrect results. This book shows you tools and techniques that you can apply to clean and handle data with Python. You'll begin by getting familiar with the shape of data by using practices that can be deployed routinely with most data sources. Then, the book teaches you how to manipulate data to get it into a useful form. You'll also learn how to filter and summarize data to gain insights and better understand what makes sense and what does not, along with discovering how to operate on data to address the issues you've identified. Moving on, you'll perform key tasks, such as handling missing values, validating errors, removing duplicate data, monitoring high volumes of data, and handling outliers and invalid dates. Next, you'll cover recipes on using supervised learning and Naive Bayes analysis to identify unexpected values and classification errors, and generate visualizations for exploratory data analysis (EDA) to visualize unexpected values. Finally, you'll build functions and classes that you can reuse without modification when you have new data.
By the end of this Python book, you'll be equipped with all the key skills that you need to clean data and diagnose problems within it.
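To give a feel for the kind of tasks the description lists (removing duplicate data, handling missing values, and flagging outliers), here is a minimal sketch using only the Python standard library. It is not taken from the book: the sample records, the function names, and the 1.5 × IQR outlier rule are all illustrative assumptions, and the book itself may approach these tasks differently.

```python
from statistics import quantiles

# Hypothetical records of (id, reading); None marks a missing reading.
rows = [
    (1, 10.0),
    (2, 12.0),
    (2, 12.0),   # exact duplicate of the previous record
    (3, None),   # missing reading
    (4, 11.0),
    (5, 13.0),
    (6, 95.0),   # suspiciously large reading
    (7, 9.0),
    (8, 12.5),
]


def drop_duplicates(records):
    """Return records with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


def split_missing(records):
    """Separate records that have a reading from records that lack one."""
    complete = [r for r in records if r[1] is not None]
    missing = [r for r in records if r[1] is None]
    return complete, missing


def iqr_outliers(records, k=1.5):
    """Flag records whose reading lies more than k * IQR outside the quartiles.

    The IQR rule is one common convention, assumed here for illustration.
    """
    values = [r[1] for r in records]
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [r for r in records if not (low <= r[1] <= high)]


unique_rows = drop_duplicates(rows)
complete_rows, missing_rows = split_missing(unique_rows)
outlier_rows = iqr_outliers(complete_rows)

print("missing:", missing_rows)
print("outliers:", outlier_rows)
```

Flagging outliers rather than deleting them keeps the decision about what to do with each suspicious value in the analyst's hands.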
What you will learn
- Find out how to read and analyze data from a variety of sources
- Produce summaries of the attributes of data frames, columns, and rows
- Filter data and select columns of interest that satisfy given criteria
- Address messy data issues, including working with dates and missing values
- Improve your productivity in Python pandas by using method chaining
- Use visualizations to gain additional insights and identify potential data issues
- Enhance your ability to learn what is going on in your data
- Build user-defined functions and classes to automate data cleaning
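As a rough illustration of two items in the list above (working with dates and missing values, and method chaining in pandas), the following sketch chains several cleaning steps into one expression. The data frame, its column names, and the chosen sequence of steps are hypothetical assumptions made for this example, not material from the book.

```python
import pandas as pd

# Hypothetical survey data; the column names are illustrative assumptions.
raw = pd.DataFrame(
    {
        "respondent": ["a", "b", "b", "c", "d", "e"],
        "visit_date": [
            "2020-01-15",
            "2020-02-30",  # not a real calendar date
            "2020-02-30",
            "2020-03-01",
            None,
            "2020-04-10",
        ],
        "score": [7.0, 5.0, 5.0, None, 9.0, 8.0],
    }
)

# Each step returns a new DataFrame, so the steps read top to bottom
# without intermediate variables.
cleaned = (
    raw
    .drop_duplicates()  # remove the repeated row for respondent "b"
    .assign(
        # Invalid or missing dates become NaT instead of raising an error.
        visit_date=lambda df: pd.to_datetime(df["visit_date"], errors="coerce")
    )
    .dropna(subset=["visit_date", "score"])  # keep only complete rows
    .reset_index(drop=True)
)

print(cleaned)
```

Writing the pipeline as a single chain makes the order of operations explicit, which can help when the same steps need to be rerun on new data.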
Who this book is for
This book is for anyone looking for ways to handle messy, duplicate, and poor data using different Python tools and techniques. The book takes a recipe-based approach to help you to learn how to clean and manage data. Working knowledge of Python programming is all you need to get the most out of the book.