Cleaning Data for Effective Data Science: Doing the other 80% of the work with Python, R, and command-line tools

Price: 1748 TWD
Availability: OnlineOnly
Author: Mertz, David
ISBN: 1801071292

Publisher: Packt Publishing
Publication date: 2021-03-31
List price: $1,840
Member price: 5% off, $1,748
Language: English
Pages: 498
Binding: Quality paper (also called trade paper)
ISBN: 1801071292
ISBN-13: 9781801071291
Categories: Python, Programming Languages, Data Science

Overseas special-order title (requires separate checkout)

Product Description

A comprehensive guide for data scientists to master effective data cleaning tools and techniques

Key Features:

Master data cleaning techniques in a language-agnostic manner
Learn from intriguing hands-on examples from numerous domains, such as biology, weather data, demographics, physics, time series, and image processing
Work with detailed, commented, well-tested code samples in Python and R

Book Description:

It is something of a truism in data science, data analysis, or machine learning that most of the effort needed to achieve your actual purpose lies in cleaning your data. Written in David's signature friendly and humorous style, this book discusses in detail the essential steps performed in every production data science or data analysis pipeline and prepares you for data visualization and modeling results.

The book dives into the practical application of tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It also offers long-form exercises at the end of each chapter to practice the skills acquired.

You will begin by looking at how to ingest data from formats such as JSON, CSV, SQL RDBMSes, HDF5, NoSQL databases, image files, and binary serialized data structures. The book also provides numerous example data sets and data files, which are available for download and independent exploration.

Moving on from formats, you will impute missing values, detect unreliable data and statistical anomalies, and generate synthetic features that are necessary for successful data analysis and visualization goals.
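As a rough illustration of what imputation and anomaly detection can look like in practice, here is a minimal Python sketch. It is not taken from the book: the sample readings, the function names, and the cutoff of 5 median absolute deviations are all assumptions chosen for this example.

```python
from statistics import median

def flag_anomalies(values, cutoff=5.0):
    """Flag values far from the median, measured in median absolute deviations.

    The cutoff of 5.0 is an arbitrary choice for this sketch.
    Missing entries (None) are never flagged.
    """
    observed = [v for v in values if v is not None]
    center = median(observed)
    mad = median(abs(v - center) for v in observed)
    if mad == 0:
        # No spread among observed values, so nothing can be called anomalous.
        return [False for _ in values]
    return [v is not None and abs(v - center) / mad > cutoff for v in values]

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

# Hypothetical temperature readings with two gaps and one implausible spike.
readings = [21.5, 22.0, None, 21.8, 95.0, 22.3, None, 21.9]

flags = flag_anomalies(readings)
# Treat flagged readings as missing, then fill every gap with the median.
masked = [None if bad else v for v, bad in zip(readings, flags)]
cleaned = impute_median(masked)
print(cleaned)
```

Masking anomalies before imputing keeps the implausible value from influencing the fill value. Median imputation is only one of many strategies, and which one is appropriate depends on the data.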

By the end of this book, you will have acquired a firm understanding of the data cleaning process necessary to perform real-world data science and machine learning tasks.

What You Will Learn:

Identify problem data pertaining to individual data points
Detect problem data in the systematic "shape" of the data
Remediate data integrity and hygiene problems
Prepare data for analytic and machine learning tasks
Impute values into missing or unreliable data
Generate synthetic features that are more amenable to data science, data analysis, or visualization goals

Who this book is for:

This book is designed to benefit software developers, data scientists, aspiring data scientists, and students who are interested in data analysis or scientific computing.

Basic familiarity with statistics, general concepts in machine learning, knowledge of a programming language (Python or R), and some exposure to data science are helpful. A glossary, references, and friendly asides should help bring all readers up to speed.

The text will also be helpful to intermediate and advanced data scientists who want to improve their rigor in data hygiene and wish for a refresher on data preparation issues.