Finding Ghosts in Your Data: Anomaly Detection Techniques with Examples in Python (Paperback)

Feasel, Kevin

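  • Publisher: Apress
  • Publication date: 2022-11-10
  • List price: $1,750
  • VIP price: $1,663 (5% off)
  • Language: English
  • Pages: 363
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 1484288696
  • ISBN-13: 9781484288696
  • Related categories: Python programming language
  • Ships immediately (stock = 1)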


Product Description

Discover key information buried in the noise of data by learning a variety of anomaly detection techniques and using the Python programming language to build a robust service for anomaly detection against a variety of data types. The book starts with an overview of what anomalies and outliers are and uses the Gestalt school of psychology to explain why humans are naturally great at detecting anomalies. From there, you will move into technical definitions of anomalies, moving beyond "I know it when I see it" to defining things in a way that computers can understand.
The core of the book involves building a robust, deployable anomaly detection service in Python. You will start with a simple anomaly detection service, which will expand over the course of the book to include a variety of valuable anomaly detection techniques, covering descriptive statistics, clustering, and time series scenarios. Finally, you will compare your anomaly detection service head-to-head with a publicly available cloud offering and see how they perform.
The anomaly detection techniques and examples in this book combine psychology, statistics, mathematics, and Python programming in a way that is easily accessible to software developers. They give you an understanding of what anomalies are and why you are naturally a gifted anomaly detector. Then, they help you translate your human techniques into algorithms that computers can run to automate the process. You'll develop your own anomaly detection service, extend it with techniques such as clustering for multivariate analysis and time series methods for observing data over time, and compare your service head-on against a commercial service.

What You Will Learn

  • Understand the intuition behind anomalies
  • Convert your intuition into technical descriptions of anomalous data
  • Detect anomalies using statistical tools, such as distributions, variance and standard deviation, robust statistics, and interquartile range
  • Apply state-of-the-art anomaly detection techniques in the realms of clustering and time series analysis
  • Work with common Python packages for outlier detection and time series analysis, such as scikit-learn, PyOD, and tslearn
  • Develop a project from the ground up which finds anomalies in data, starting with simple arrays of numeric data and expanding to include multivariate inputs and even time series data
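
To give a feel for the statistical tools listed above, here is a minimal sketch of interquartile-range outlier detection on a simple array of numbers. It is not taken from the book: the function name `iqr_outliers`, the fence multiplier `k=1.5` (a common convention), and the sample readings are all assumptions made for illustration, and only the Python standard library is used.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the IQR fences.

    Hypothetical helper for illustration; k=1.5 is a conventional
    default, not a value prescribed by the book.
    """
    # Quartiles of the data; "inclusive" treats the data as the full population.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Anything beyond either fence is flagged as a candidate anomaly.
    return [v for v in values if v < lower_fence or v > upper_fence]


# Made-up sample readings with one obviously unusual value.
readings = [10, 12, 11, 13, 12, 11, 10, 95]
print(iqr_outliers(readings))
```

Because quartiles barely move when a single extreme value is added, this kind of check is less easily distorted by the outlier itself than a rule based on the mean and standard deviation.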


Who This Book Is For

This book is for software developers who have at least some familiarity with the Python programming language and who would like to understand the science and some of the statistics behind anomaly detection techniques. Readers are not required to have any formal knowledge of statistics, as the book introduces relevant concepts along the way.


About the Author

Kevin Feasel is a Microsoft Data Platform MVP and CTO at Faregame Inc, where he specializes in data analytics with T-SQL and R, forcing Spark clusters to do his bidding, fighting with Kafka, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL, president of the Triangle Area SQL Server Users Group, and author of PolyBase Revealed. A resident of Durham, North Carolina, he can be found cycling the trails around the Triangle whenever the weather is nice enough.
