Practical Data Wrangling: Expert techniques for transforming your raw data into a valuable source for analytics
Allan Visochek
- Publisher: Packt Publishing
- Publication date: 2017-11-21
- Language: English
- Pages: 288
- Binding: Paperback
- ISBN: 1787286134
- ISBN-13: 9781787286139
Product Description
Key Features
- This easy-to-follow guide takes you through every step of the data wrangling process
- Work with different types of datasets, and reshape the layout of your data to make it easier to analyze
- Get simple examples and real-life data wrangling solutions for data preprocessing
Book Description
Around 80% of the time in data analysis is spent cleaning and preparing data. This is, however, an important task and a prerequisite for the rest of the data analysis workflow, including visualization, analysis, and reporting. Python and R are popular tools for data analysis, and both have packages well suited to manipulating different kinds of data, depending on your requirements. This book shows you different data wrangling techniques and how to use Python and R packages to implement them.
You’ll start by understanding the data wrangling process and building a solid foundation for working with different types of data. You’ll work with different data structures and acquire and parse data from various sources. You’ll also see how to reshape the layout of data and how to manipulate, summarize, and join datasets. Finally, the book concludes with a quick primer on accessing and processing data from databases, conducting data exploration, and storing and retrieving data quickly using databases.
The book includes practical examples for each of these points, using both simple and real-world datasets to make them easier to understand. By the end of the book, you’ll have a thorough understanding of the data wrangling concepts it covers and how to implement them.
What you will learn
- Read a CSV file into Python and R, and print out some statistics on the data
- Gain knowledge of the data formats and programming structures involved in retrieving API data
- Make effective use of regular expressions in the data wrangling process
- Explore the tools and packages available to prepare numerical data for analysis
- Gain finer control over manipulating the structure of your data
- Develop the dexterity to programmatically read, audit, correct, and shape data
- Write complete programs that take in, format, and output datasets
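As a rough illustration of the first item above (reading a CSV file in Python and printing statistics on the data), here is a minimal sketch that uses only the Python standard library (`csv`, `io`, `statistics`). It is not taken from the book; the function name `summarize_column` and the sample data are hypothetical, and skipping empty or non-numeric values stands in for the auditing and correcting steps the book describes.

```python
import csv
import io
import statistics


def summarize_column(csv_text, column):
    """Return basic statistics for one numeric column of CSV text.

    Hypothetical helper: empty or non-numeric values are skipped
    instead of raising, as a simple form of data auditing.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    values = []
    for row in reader:
        # Short rows yield None for missing fields, so fall back to "".
        raw = (row.get(column) or "").strip()
        try:
            values.append(float(raw))
        except ValueError:
            # Skip missing or malformed entries such as "" or "n/a".
            continue
    if not values:
        raise ValueError(f"no numeric values found in column {column!r}")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    sample = "city,population\nA,100\nB,\nC,300\nD,n/a\nE,200\n"
    print(summarize_column(sample, "population"))
```

With a file on disk, the same function could be called on the result of `open(path, newline="").read()`; reading text keeps the sketch self-contained.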
About the Author
Allan Visochek is a freelance web developer and data analyst in New Haven, Connecticut. Outside of work, Allan has a deep interest in machine learning and artificial intelligence.
Allan thoroughly enjoys teaching and sharing knowledge. After graduating from the Udacity Data Analyst Nanodegree program, he was contracted to Udacity for several months as a forum mentor and project reviewer, offering guidance to students working on data analysis projects. He has also written technical content for LearnToProgram.
Table of Contents
1. Programming with Data
2. An Introduction to Programming in Python
3. Reading, Writing and Modifying Data in Python I
4. Reading, Writing and Modifying Data in Python II
5. Text Data and Regular Expressions
6. Cleaning Numerical Data: An Introduction to R and RStudio
7. Data Munging in R Using dplyr
8. Getting Data from the Web
9. Working with Really Large Datasets