Learning from Imbalanced Data Sets

Alberto Fernández, Salvador García, Mikel Galar, Ronaldo C. Prati, Bartosz Krawczyk, Francisco Herrera

  • Publisher: Springer
  • Publication date: 2018-11-01
  • List price: $6,780
  • Member price: $6,441 (5% off)
  • Language: English
  • Pages: 377
  • Binding: Hardcover
  • ISBN: 3319980734
  • ISBN-13: 9783319980737

Product Description

This book provides a general and comprehensible overview of imbalanced learning. It contains a formal description of the problem and focuses on its main features and the most relevant proposed solutions. Additionally, it considers the different scenarios in Data Science in which imbalanced classification can pose a real challenge.

This book highlights the gap between imbalanced and standard classification tasks by reviewing the case studies and ad-hoc performance metrics applied in this area. It also covers the approaches traditionally used to address skewed class distributions in binary problems. Specifically, it reviews cost-sensitive learning, data-level preprocessing methods, and algorithm-level solutions, also taking into account ensemble-learning solutions that embed any of these alternatives. Furthermore, it extends the discussion to multi-class problems, where these classical methods can no longer be applied in a straightforward way.
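The ad-hoc metrics and data-level preprocessing mentioned above can be illustrated with a minimal Python sketch (standard library only). It assumes a binary problem with label 1 as the minority class, uses the geometric mean (G-mean) of the true positive and true negative rates as one example of an imbalance-aware metric, and uses plain random oversampling as one simple preprocessing method. The helper names (`confusion_counts`, `g_mean`, `random_oversample`) are hypothetical and are not taken from the book or from any particular library.

```python
import math
import random


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, TN, FP, treating `positive` as the minority class."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def accuracy(y_true, y_pred):
    """Fraction of correct predictions (misleading under heavy imbalance)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive rate and the true negative rate."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly chosen minority examples until both classes
    have the same count (binary case only)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    if not minority_idx:
        return list(X), list(y)
    majority_count = len(y) - len(minority_idx)
    extra = [rng.choice(minority_idx)
             for _ in range(majority_count - len(minority_idx))]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]


# Toy data set: 95 negatives and 5 positives (imbalance ratio 19:1).
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.95
print(g_mean(y_true, y_pred))    # 0.0

X = [[float(i)] for i in range(100)]
X_bal, y_bal = random_oversample(X, y_true)
print(y_bal.count(0), y_bal.count(1))  # 95 95
```

The majority-class predictor reaches 95% accuracy yet scores a G-mean of 0, because it never detects a single minority example. Random oversampling only duplicates existing minority examples; it is the simplest member of the data-level family, not a recommendation over the other approaches the book reviews.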

This book also examines the intrinsic data characteristics that, combined with the uneven class distribution, are the main factors that truly hinder the performance of classification algorithms in this scenario. It then provides some notes on data reduction to help readers understand the advantages of this type of approach.

Finally, this book introduces some novel areas of study that are drawing increasing attention to the imbalanced data issue. Specifically, it considers the classification of data streams, non-classical classification problems, and scalability in the context of Big Data. It also provides examples of software libraries and modules for addressing imbalanced classification.

This book is highly suitable for technical professionals and for senior undergraduate and graduate students in data science, computer science, and engineering. It will also help scientists and researchers gain insight into current developments in this area of study, as well as future research directions.
