Python Data Cleaning Cookbook - Second Edition: Prepare your data for analysis with pandas, NumPy, Matplotlib, scikit-learn, and OpenAI
Walker, Michael
- Publisher: Packt Publishing
- Publication date: 2024-05-31
- List price: $2,050
- VIP price: 5% off, $1,948
- Language: English
- Pages: 486
- Binding: Quality paperback (also called trade paperback)
- ISBN: 1803239875
- ISBN-13: 9781803239873
Related categories:
Python, Programming Languages
Overseas special-order book (requires separate checkout)
Product Description
Learn the intricacies of data description, issue identification, and practical problem-solving, armed with essential techniques and expert tips.
Key Features:
- Get to grips with new techniques for data preprocessing and cleaning for machine learning and NLP models
- Use new and updated AI tools and techniques for data cleaning tasks
- Clean, monitor, and validate large data volumes and diagnose problems using cutting-edge methodologies, including machine learning and AI
Book Description:
Jumping into data analysis without proper data cleaning will certainly lead to incorrect results. The Python Data Cleaning Cookbook will show you tools and techniques for cleaning and handling data with Python for better outcomes.
Fully updated to the latest version of Python and all relevant tools, this book will teach you how to manipulate and clean data to get it into a useful form. This edition emphasizes advanced approaches to data cleaning, such as machine learning and AI-specific techniques and tools, alongside conventional ones. The book also covers tips and techniques for processing and cleaning data for ML, AI, and NLP models. You will learn how to filter and summarize data to gain insights and better understand what makes sense and what does not, and how to operate on data to address the issues you've identified. Next, you'll cover recipes for using supervised learning and Naive Bayes analysis to identify unexpected values and classification errors, and for generating exploratory data analysis (EDA) visualizations that reveal unexpected values. Finally, you'll build functions and classes that you can reuse without modification when you have new data.
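The description does not show how Naive Bayes is used to catch classification errors. As a generic sketch of one common approach (our assumption, not the book's recipe), you can predict each row's label with a model that never saw that row and flag rows where the prediction disagrees with the recorded label. The data is invented, row 5 is deliberately mislabeled, and the choice of scikit-learn's `GaussianNB` with `cross_val_predict` is ours.

```python
import pandas as pd
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

# Hypothetical data: two well-separated groups. Row 5 sits with group "b"
# but is recorded as "a", simulating a classification (data-entry) error.
df = pd.DataFrame({
    "x1": [1.0, 1.2, 0.9, 1.1, 1.3, 5.0, 5.2, 4.9, 5.1, 4.8],
    "x2": [2.0, 2.1, 1.9, 2.2, 2.0, 8.0, 8.1, 7.9, 8.2, 7.8],
    "label": ["a", "a", "a", "a", "a", "a", "b", "b", "b", "b"],
})

# Out-of-fold predictions: each row is predicted by a model trained without it,
# so a mislabeled row cannot vouch for its own label.
predicted = cross_val_predict(GaussianNB(), df[["x1", "x2"]], df["label"], cv=4)

checked = df.assign(predicted=predicted)

# Rows whose recorded label disagrees with the model are candidates for review.
suspect = checked[checked["label"] != checked["predicted"]]
print(suspect)
```

Flagged rows are only candidates for review: with real data, a disagreement can also come from a genuinely ambiguous case rather than an entry error.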
By the end of this book, you'll know how to clean data and diagnose problems within it.
What You Will Learn:
- Use OpenAI tools for various data cleaning tasks
- Produce summaries of the attributes of datasets, columns, and rows
- Anticipate data cleaning issues when importing tabular data into pandas
- Apply validation techniques to imported tabular data
- Improve your productivity in pandas by using method chaining
- Recognize and resolve common issues with dates and IDs
- Set up indexes to streamline the identification of data issues
- Use data cleaning to prepare your data for ML and AI models
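The list stays at a high level. As a rough, hypothetical illustration (not code from the book), several of these ideas fit in one short pandas sketch: a method chain that strips whitespace from IDs, parses dates, and sets an index, followed by a column summary and a duplicate-ID check. The DataFrame, its columns (`person_id`, `visit_date`, `score`), and the `clean_visits` helper are all invented for this example.

```python
import pandas as pd

# Hypothetical raw data: IDs with stray whitespace, dates stored as text
# (one of them impossible), and one missing score.
raw = pd.DataFrame({
    "person_id": [" 101", "102", "102", "103 "],
    "visit_date": ["2024-01-05", "2024-02-10", "2024-03-10", "2024-02-30"],
    "score": [88.0, None, 75.5, 91.0],
})


def clean_visits(df: pd.DataFrame) -> pd.DataFrame:
    """Clean the hypothetical visits table in a single method chain."""
    return (
        df.assign(
            # Strip whitespace so " 101" and "101" are treated as the same ID.
            person_id=lambda d: d["person_id"].str.strip(),
            # Impossible dates such as 2024-02-30 become NaT instead of raising.
            visit_date=lambda d: pd.to_datetime(d["visit_date"], errors="coerce"),
        )
        .dropna(subset=["visit_date"])  # drop rows whose date could not be parsed
        .set_index("person_id")         # index by ID to simplify lookups and checks
        .sort_index()
    )


cleaned = clean_visits(raw)

# Column-level summary: how many values are present or missing per column.
summary = pd.DataFrame({
    "non_null": cleaned.notna().sum(),
    "missing": cleaned.isna().sum(),
})

# Validation check: list IDs that occur more than once.
duplicated_ids = cleaned.index[cleaned.index.duplicated(keep=False)].unique()

print(cleaned)
print(summary)
print(list(duplicated_ids))
```

With `errors="coerce"`, the impossible date turns into `NaT` and that row is dropped, while the duplicate-ID check only reports `102` for review instead of silently removing either row.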
Who this book is for:
This book is for anyone looking for ways to handle messy, duplicate, and poor-quality data using different Python tools and techniques. It takes a recipe-based approach, with practical examples, to help you learn how to clean and manage data.
A working knowledge of Python programming is all you need to get the most out of the book.