Data Cleaning and Exploration with Machine Learning: Get to grips with machine learning techniques to achieve sparkling-clean data quickly
Walker, Michael
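- Publisher: Packt Publishing
- Publication date: 2022-08-26
- Price: $1,770
- VIP price: 5% off, $1,682
- Language: English
- Pages: 542
- Binding: Quality paper (also called trade paper)
- ISBN: 1803241675
- ISBN-13: 9781803241678
- Related categories: Spark, Machine Learning
Overseas purchase-on-behalf title (requires separate checkout)
Product Description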
Explore supercharged machine learning techniques to take care of your data laundry loads
Key Features
- Learn how to prepare data for machine learning processes
- Understand which algorithms to choose based on prediction objectives and the properties of the data
- Explore how to interpret and evaluate the results from machine learning
Book Description
Many individuals who know how to run machine learning algorithms do not have a good sense of the statistical assumptions those algorithms make, or of how to match the properties of the data to the algorithm for the best results.
As you start with this book, models are carefully chosen to help you grasp the underlying data, including feature importance and correlation, and the distribution of features and targets. The first two parts of the book introduce techniques for preparing data for ML algorithms, and do not shy away from using some ML techniques for data cleaning, including anomaly detection and feature selection. The book then helps you apply that knowledge to a wide variety of ML tasks. You'll gain an understanding of popular supervised and unsupervised algorithms, how to prepare data for them, and how to evaluate them. Next, you'll build models and understand the relationships in your data, as well as perform cleaning and exploration tasks with that data. You'll make quick progress in studying the distribution of variables, identifying anomalies, and examining bivariate relationships, while focusing more on the accuracy of predictions.
By the end of this book, you'll be able to deal with complex data problems using unsupervised ML algorithms like principal component analysis and k-means clustering.
What you will learn
- Explore essential data cleaning and exploration techniques to be used before running the most popular machine learning algorithms
- Understand how to perform preprocessing and feature selection, and how to set up the data for testing and validation
- Model continuous targets with supervised learning algorithms
- Model binary and multiclass targets with supervised learning algorithms
- Execute clustering and dimension reduction with unsupervised learning algorithms
- Understand how to use regression trees to model a continuous target
Who this book is for
This book is for professional data scientists, particularly those in the first few years of their career, or more experienced analysts who are relatively new to machine learning. Readers should have prior knowledge of concepts in statistics typically taught in an undergraduate introductory course as well as beginner-level experience in manipulating data programmatically.
About the Author
Michael Walker has worked as a data analyst for over 30 years at a variety of educational institutions. He has also taught data science, research methods, statistics, and computer programming to undergraduates since 2006. He is currently the Chief Information Officer at College Unbound in Providence, Rhode Island.
Table of Contents
1. Examining the Distribution of Features and Targets
2. Examining Bivariate and Multivariate Relationships between Features and Targets
3. Identifying and Fixing Missing Values
4. Encoding, Transforming, and Scaling Features
5. Feature Selection
6. Preparing for Model Evaluation
7. Linear Regression Models
8. Support Vector Regression
9. K-Nearest Neighbor, Decision Tree, Random Forest and Gradient Boosted Regression
10. Logistic Regression
11. Decision Trees and Random Forest Classification
12. K-Nearest Neighbors for Classification
13. Support Vector Machine Classification
14. Naive Bayes Classification
15. Principal Component Analysis
16. K-Means and DBSCAN Clustering