Machine Learning with PySpark: With Natural Language Processing and Recommender Systems

Pramod Singh


Product Description

Build machine learning models, natural language processing applications, and recommender systems with PySpark to solve various business challenges. This book starts with the fundamentals of Spark and its evolution and then covers the entire spectrum of traditional machine learning algorithms along with natural language processing and recommender systems using PySpark. 
 
Machine Learning with PySpark shows you how to build supervised machine learning models such as linear regression, logistic regression, decision trees, and random forests. You’ll also see unsupervised machine learning models such as K-means and hierarchical clustering. A major portion of the book focuses on feature engineering with PySpark to create useful features for training machine learning models. The natural language processing section covers text processing, text mining, and embeddings for classification.
 
After reading this book, you will understand how to use PySpark’s machine learning library to build and train various machine learning models. Additionally, you’ll become comfortable with related PySpark components, such as data ingestion, data processing, and data analysis, that you can use to develop data-driven intelligent applications.

What You Will Learn
  • Build a spectrum of supervised and unsupervised machine learning algorithms
  • Implement machine learning algorithms with Spark MLlib libraries
  • Develop a recommender system with Spark MLlib libraries
  • Handle issues related to feature engineering, class balance, bias and variance, and cross-validation to build a well-fitted model
 
Who This Book Is For 
 
Data science and machine learning professionals. 
 
 
