Python Feature Engineering Cookbook

Soledad Galli

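  • Publisher: Packt Publishing
  • Publication date: 2020-01-22
  • Price: $1,380
  • Member price: $1,311 (95% of list price)
  • Language: English
  • Pages: 372
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 1789806313
  • ISBN-13: 9781789806311
  • Related category: Python programming language
  • Ships immediately (1 in stock)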

Product Description

Key Features

  • Discover solutions for feature generation, feature extraction, and feature selection
  • Uncover the end-to-end feature engineering process across continuous, discrete, and unstructured datasets
  • Implement modern feature extraction techniques using Python's pandas, scikit-learn, SciPy and NumPy libraries

Book Description

Feature engineering is invaluable for developing and enriching your machine learning models. In this cookbook, you will work with tools that streamline your feature engineering pipelines and techniques while simplifying your code and improving its quality.

Using Python libraries such as pandas, scikit-learn, Featuretools, and Feature-engine, you'll learn how to work with both continuous and discrete datasets and how to transform features from unstructured datasets. You will develop the skills necessary to select the best features as well as the most suitable extraction techniques. This book covers Python recipes that help you automate feature engineering to simplify complex processes. You'll also get to grips with different feature engineering strategies, such as the Box-Cox transform, power transform, and log transform, across the machine learning, reinforcement learning, and natural language processing (NLP) domains.
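To make the three transforms named above concrete, here is a minimal, library-free sketch. The function names `log_transform`, `power_transform`, and `box_cox` are hypothetical and are not taken from the book, which works with pandas, scikit-learn, and related libraries. The fixed `lam` and `exponent` values are assumptions chosen for illustration; in practice the Box-Cox parameter is usually estimated from the data.

```python
import math


def log_transform(values):
    """Natural log of each value; inputs must be strictly positive."""
    return [math.log(v) for v in values]


def power_transform(values, exponent=0.5):
    """Raise each value to a fixed exponent (0.5 is a square root)."""
    return [v ** exponent for v in values]


def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam if lam != 0, else log(x).

    Inputs must be strictly positive.
    """
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Hypothetical right-skewed variable used only for illustration.
skewed = [1.0, 2.0, 4.0, 8.0, 16.0]
print(log_transform(skewed))
print(power_transform(skewed))
print(box_cox(skewed, lam=0.5))
```

All three transforms compress large values more than small ones, which is why they are commonly applied to right-skewed variables. Box-Cox with `lam=0` reduces to the log transform.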

By the end of this book, you'll have discovered tips and practical solutions to all of your feature engineering problems.

What you will learn

  • Simplify your feature engineering pipelines with powerful Python packages
  • Get to grips with imputing missing values
  • Encode categorical variables with a wide set of techniques
  • Extract insights from text quickly and effortlessly
  • Develop features from transactional data and time series data
  • Derive new features by combining existing variables
  • Understand how to transform, discretize, and scale your variables
  • Create informative variables from date and time
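Three of the topics in this list (imputing missing values, encoding categorical variables, and deriving features from dates) can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the book's code: the helper names `impute_median`, `one_hot`, and `date_features` are hypothetical, and the book itself relies on pandas, scikit-learn, and Feature-engine for these tasks.

```python
from datetime import date
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Return one 0/1 indicator column per distinct category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def date_features(d):
    """Split a date into simple numeric features."""
    return {
        "year": d.year,
        "month": d.month,
        "day_of_week": d.weekday(),  # Monday is 0, Sunday is 6
        "is_weekend": int(d.weekday() >= 5),
    }


# Hypothetical toy inputs used only for illustration.
print(impute_median([1, None, 3]))
print(one_hot(["red", "blue", "red"]))
print(date_features(date(2020, 1, 22)))
```

Median imputation is only one of several strategies; which one fits depends on the variable's distribution and on why the values are missing.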

Who this book is for

This book is for machine learning professionals, AI engineers, data scientists, and NLP and reinforcement learning engineers who want to optimize and enrich their machine learning models with the best features. Knowledge of machine learning and Python coding will assist you with understanding the concepts covered in this book.

About the Author

Soledad Galli is a lead data scientist with more than 10 years of experience in world-class academic institutions and renowned businesses. She has researched, developed, and put into production machine learning models for insurance claims, credit risk assessment, and fraud prevention. Soledad received a Data Science Leaders' award in 2018 and was named one of LinkedIn's voices in data science and analytics in 2019. She is passionate about enabling people to step into and excel in data science, which is why she mentors data scientists and speaks at data science meetings regularly. She also teaches online courses on machine learning on a prestigious Massive Open Online Course platform; these courses have reached more than 10,000 students worldwide.

Table of Contents

  1. Foreseeing Variable Problems When Building ML Models
  2. Imputing Missing Data
  3. Encoding Categorical Variables
  4. Transforming Numerical Variables
  5. Performing Variable Discretisation
  6. Working with Outliers
  7. Deriving Features from Dates and Time Variables
  8. Performing Feature Scaling
  9. Applying Mathematical Computations to Features
  10. Creating Features with Transactional and Time Series Data
  11. Extracting Features from Text Variables
