Python Feature Engineering Cookbook - Third Edition: A complete guide to crafting powerful features for your machine learning models

Galli, Soledad, Molnar, Christoph

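  • Publisher: Packt Publishing
  • Publication date: 2024-08-30
  • List price: $1,820
  • VIP price: $1,729 (95% of list price)
  • Language: English
  • Pages: 396
  • Binding: Quality Paper - also called trade paper
  • ISBN: 1835883583
  • ISBN-13: 9781835883587
  • Related categories: Python programming language, Machine Learning
  • Overseas special-order title (requires separate checkout)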

Product description

Leverage the power of Python to build real-world feature engineering and machine learning pipelines ready to be deployed to production

Key Features:

- Craft powerful features from tabular, transactional, and time-series data

- Develop efficient and reproducible real-world feature engineering pipelines

- Optimize data transformation and save valuable time

- Purchase of the print or Kindle book includes a free PDF eBook

Book Description:

Streamline data preprocessing and feature engineering in your machine learning projects, and make your data preparation more efficient, with this third edition of the Python Feature Engineering Cookbook.

This guide addresses common challenges, such as imputing missing values and encoding categorical variables, with practical solutions built on open source Python libraries.
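As a rough illustration of the two challenges named above, the sketch below imputes missing values with the mean and one-hot encodes a categorical column. It is not taken from the book: it uses only the Python standard library, and the function names (`impute_mean`, `one_hot`) and the sample columns are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(categories):
    """Encode each category as a dict of 0/1 indicator columns."""
    levels = sorted(set(categories))
    return [{f"is_{lvl}": int(c == lvl) for lvl in levels} for c in categories]


# Hypothetical sample data, for illustration only.
ages = [20, None, 40]
colors = ["red", "blue", "red"]

imputed_ages = impute_mean(ages)
encoded_colors = one_hot(colors)
```

In a real project the fill value and the category levels would be learned on the training set only and then reused on new data, which is the kind of reproducibility the description emphasizes.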

You'll learn advanced techniques for transforming numerical variables, discretizing variables, and dealing with outliers. Each chapter offers step-by-step instructions and real-world examples, helping you understand when and how to apply various transformations for well-prepared data.
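To make discretization and outlier handling concrete, here is a minimal standard-library sketch of equal-width binning and of capping values at the interquartile-range (IQR) fences. These are common techniques chosen for illustration, not necessarily the ones the book uses; the function names, the 1.5 factor default, and the sample data are assumptions.

```python
from statistics import quantiles


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls in a single bin.
        return [0 for _ in values]
    # The maximum value is placed in the last bin rather than a new one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def cap_outliers_iqr(values, factor=1.5):
    """Clip values to [Q1 - factor * IQR, Q3 + factor * IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [min(max(v, lower), upper) for v in values]


# Hypothetical sample data with one obvious outlier (100).
readings = [10, 12, 11, 13, 12, 11, 10, 13, 12, 100]

bins = equal_width_bins([0, 5, 10], n_bins=2)
capped = cap_outliers_iqr(readings)
```

Capping keeps every row while limiting the influence of extreme values; removing the rows instead is another option, depending on the data.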

The book explores feature extraction from complex data types such as dates, times, and text. You'll see how to create new features through mathematical operations and decision trees and use advanced tools like Featuretools and tsfresh to extract features from relational data and time series.
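As a small example of extracting features from dates and times, the sketch below derives a few calendar features from a timestamp using only the standard `datetime` module. The feature names and the `datetime_features` helper are hypothetical; tools such as Featuretools and tsfresh automate this kind of extraction at a much larger scale.

```python
from datetime import datetime


def datetime_features(ts):
    """Derive simple calendar features from a datetime object."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day_of_week": ts.weekday(),  # Monday is 0, Sunday is 6
        "hour": ts.hour,
        "is_weekend": int(ts.weekday() >= 5),
    }


# Hypothetical timestamp, for illustration only.
features = datetime_features(datetime(2024, 8, 30, 14, 5))
```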

By the end, you'll be ready to build reproducible feature engineering pipelines that can be easily deployed into production, optimizing data preprocessing workflows and enhancing machine learning model performance.

What You Will Learn:

- Discover multiple methods to impute missing data effectively

- Encode categorical variables while tackling high cardinality

- Find out how to properly transform, discretize, and scale your variables

- Automate feature extraction from date and time data

- Combine variables strategically to create new and powerful features

- Extract features from transactional data and time series

- Learn methods to extract meaningful features from text data

Who this book is for:

If you're a machine learning or data science enthusiast who wants to learn more about feature engineering, data preprocessing, and how to optimize these tasks, this book is for you. If you already know the basics of feature engineering and are looking to learn more advanced methods to craft powerful features, this book will help you. You should have basic knowledge of Python programming and machine learning to get started.

Table of Contents

- Imputing Missing Data

- Encoding Categorical Variables

- Transforming Numerical Variables

- Performing Variable Discretization

- Working with Outliers

- Extracting Features from Date and Time Variables

- Performing Feature Scaling

- Creating New Features

- Extracting Features from Relational Data with Featuretools

- Creating Features from a Time Series with tsfresh

- Extracting Features from Text Variables
