Cleaning Data for Effective Data Science: Doing the other 80% of the work with Python, R, and command-line tools
Mertz, David
- Publisher: Packt Publishing
- Publication date: 2021-03-31
- Price: $1,840
- VIP price: 5% off, $1,748
- Language: English
- Pages: 498
- Binding: Quality Paper - also called trade paper
- ISBN: 1801071292
- ISBN-13: 9781801071291
- Related categories: Python, Programming Languages, Data Science
Product Description
A comprehensive guide for data scientists to master effective data cleaning tools and techniques
Key Features:
- Master data cleaning techniques in a language-agnostic manner
- Learn from intriguing hands-on examples from numerous domains, such as biology, weather data, demographics, physics, time series, and image processing
- Work with detailed, commented, well-tested code samples in Python and R
Book Description:
It is something of a truism in data science, data analysis, and machine learning that most of the effort needed to achieve your actual purpose lies in cleaning your data. Written in David's signature friendly and humorous style, this book discusses in detail the essential steps performed in every production data science or data analysis pipeline and prepares you for data visualization and modeling.
The book dives into the practical application of tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It also offers long-form exercises at the end of each chapter to practice the skills acquired.
You will begin by ingesting data from formats such as JSON, CSV, SQL RDBMSes, HDF5, NoSQL databases, image files, and binary serialized data structures. The book also provides numerous example data sets and data files, which are available for download and independent exploration.
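To give a flavor of what ingestion involves, here is a minimal sketch using only the Python standard library. It is not taken from the book: the sample data, the field names `station` and `temp_c`, and the helper functions are all hypothetical. It shows one common concern, namely that CSV and JSON represent a missing value differently (an empty string versus `null`), so both need to be normalized to the same record shape.

```python
import csv
import io
import json

# Hypothetical sample data, invented for illustration (not from the book).
CSV_TEXT = "station,temp_c\nA,21.5\nB,\nC,19.0\n"
JSON_TEXT = '[{"station": "A", "temp_c": 21.5}, {"station": "B", "temp_c": null}]'


def to_float(value):
    """Convert a raw field to float, mapping empty or missing values to None."""
    if value is None or value == "":
        return None
    return float(value)


def read_csv_records(text):
    """Parse CSV text into a list of dicts with a typed temperature field."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"station": row["station"], "temp_c": to_float(row["temp_c"])}
        for row in reader
    ]


def read_json_records(text):
    """Parse a JSON array of objects into the same record shape as the CSV reader."""
    return [
        {"station": obj["station"], "temp_c": to_float(obj.get("temp_c"))}
        for obj in json.loads(text)
    ]


csv_records = read_csv_records(CSV_TEXT)
json_records = read_json_records(JSON_TEXT)
```

After this step, a missing temperature is `None` regardless of which format it came from, which keeps later cleaning steps format-independent.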
Moving on from formats, you will impute missing values, detect unreliable data and statistical anomalies, and generate synthetic features that are necessary for successful data analysis and visualization goals.
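The three activities named above can be sketched in a few lines of standard-library Python. This is an illustrative assumption, not the book's code: the readings are invented, median imputation and a modified z-score based on the median absolute deviation are just one reasonable choice among many, and the `was_imputed` indicator stands in for a very simple synthetic feature.

```python
import statistics

# Hypothetical sensor readings; None marks a missing value.
readings = [20.1, 19.8, None, 20.4, 95.0, 20.0, None, 19.9]


def impute_median(values):
    """Replace each None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def flag_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median, so nothing can be ranked as anomalous.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Synthetic feature: record which positions were originally missing.
was_imputed = [v is None for v in readings]

imputed = impute_median(readings)
flags = flag_outliers(imputed)
```

The median is used in both steps because, unlike the mean, it is barely moved by the single extreme reading, so the imputed values and the anomaly threshold are not distorted by the very outlier being looked for.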
By the end of this book, you will have acquired a firm understanding of the data cleaning process necessary to perform real-world data science and machine learning tasks.
What You Will Learn:
- Identify problem data pertaining to individual data points
- Detect problem data in the systematic "shape" of the data
- Remediate data integrity and hygiene problems
- Prepare data for analytic and machine learning tasks
- Impute values into missing or unreliable data
- Generate synthetic features that are more amenable to data science, data analysis, or visualization goals
Who this book is for:
This book is designed to benefit software developers, data scientists, aspiring data scientists, and students who are interested in data analysis or scientific computing.
Basic familiarity with statistics, general concepts in machine learning, knowledge of a programming language (Python or R), and some exposure to data science are helpful. A glossary, references, and friendly asides should help bring all readers up to speed.
The text will also be helpful to intermediate and advanced data scientists who want to improve their rigor in data hygiene and wish for a refresher on data preparation issues.