Statistical Data Cleaning with Applications in R
Tentative Chinese title: R語言的統計數據清理及其應用 (Statistical Data Cleaning in R and Its Applications)

Mark van der Loo, Edwin de Jonge

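  • Publisher: Wiley
  • Publication date: 2018-04-16
  • List price: $2,600
  • Sale price: $2,080 (20% off)
  • Language: English
  • Pages: 320
  • Binding: Hardcover
  • ISBN: 1118897153
  • ISBN-13: 9781118897157
  • Related categories: R Language, Data Science
  • Related translation: R統計數據清洗及應用 (Simplified Chinese edition)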

Product Description

A comprehensive guide to automated statistical data cleaning 

The production of clean data is a complex and time-consuming process that requires both technical know-how and statistical expertise. Statistical Data Cleaning brings together a wide range of techniques for cleaning textual, numeric or categorical data. This book examines technical data cleaning methods relating to data representation and data structure. A prominent role is given to statistical data validation, data cleaning based on predefined restrictions, and data cleaning strategy.

Key features:

  • Focuses on the automation of data cleaning methods, covering both the theory and its applications, with code written in R.
  • Enables the reader to design data cleaning processes either for one-off analyses or for production systems that clean data on a regular basis.
  • Explores statistical techniques for resolving issues such as incompleteness, contradictions and outliers, as well as the integration of data cleaning components and quality monitoring.
  • Supported by an accompanying website featuring data and R code.

This book enables data scientists and statistical analysts to deepen their understanding of data cleaning and to upgrade their practical data cleaning skills. It can also serve as course material for a course on data cleaning and analysis.
