Nonlinear Dimensionality Reduction Techniques: A Data Structure Preservation Approach

Lespinats, Sylvain, Colange, Benoit, Dutykh, Denys

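  • Publisher: Springer
  • Publication date: 2021-12-03
  • List price: $5,590
  • VIP price: $5,311 (5% off)
  • Language: English
  • Pages: 260
  • Binding: Hardcover - also called cloth, retail trade, or trade
  • ISBN: 3030810259
  • ISBN-13: 9783030810252
  • Overseas special-order title (must be checked out separately)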

Product description

1 Data science context

1.1 Data in a metric space

1.1.1 Measuring dissimilarities and similarities

1.1.2 Neighbourhood ranks

1.1.3 Embedding space notations

1.1.4 Multidimensional data

1.1.5 Sequence data

1.1.6 Network data

1.1.7 A few multidimensional datasets

1.2 Automated tasks

1.2.1 Underlying distribution

1.2.2 Category identification

1.2.3 Data manifold analysis

1.2.4 Model learning

1.2.5 Regression

1.3 Visual exploration

1.3.1 Human in the loop using graphic variables

1.3.2 Spatialization and Gestalt principles

1.3.3 Scatter plots

1.3.4 Parallel coordinates

1.3.5 Colour coding

1.3.6 Multiple coordinated views and visual interaction

1.3.7 Graph drawing

2 Intrinsic dimensionality

2.1 Curse of dimensionality

2.1.1 Data sparsity

2.1.2 Norm concentration

2.2 ID estimation

2.2.1 Covariance-based approaches

2.2.2 Fractal approaches

2.2.3 Towards local estimation

2.3 TIDLE

2.3.1 Gaussian mixture modelling

2.3.2 Test of TIDLE on a two clusters case

3 Map evaluation

3.1 Objective and practical indicators

3.1.1 Subjectivity of indicators

3.1.2 User studies on specific tasks

3.2 Unsupervised global evaluation

3.2.1 Types of distortions

3.2.2 Link between distortions and mapping continuity

3.2.3 Reasons of distortions ubiquity

3.2.4 Scalar indicators

3.2.5 Aggregation

3.2.6 Diagrams

3.3 Class-aware indicators

3.3.1 Class separation and aggregation

3.3.2 Comparing scores between the two spaces

3.3.3 Class cohesion and distinction

3.3.4 The case of one cluster per class

4 Map interpretation

4.1 Axes recovery

4.1.1 Linear case: biplots

4.1.2 Non-linear case

4.2 Local evaluation

4.2.1 Point-wise aggregation

4.2.2 One to many relations with focus point

4.2.3 Many to many relations

4.3 MING

4.3.1 Uniform formulation of rank-based indicator

4.3.2 MING graphs

4.3.3 MING analysis for a toy dataset

4.3.4 Impact of MING parameters

4.3.5 Visual clutter

4.3.6 Oil flow

4.3.7 COIL-20 dataset

4.3.8 MING perspectives

5 Unsupervised DR

5.1 Spectral projections

5.1.1 Principal Component Analysis

5.1.2 Classical MultiDimensional Scaling

5.1.3 Kernel methods: Isomap, KPCA, LE

5.2 Non-linear MDS

5.2.1 Metric MultiDimensional Scaling

5.2.2 Non-metric MultiDimensional Scaling

5.3 Neighbourhood Embedding

5.3.1 General principle: SNE

5.3.2 Scale setting

5.3.3 Divergence choice: NeRV and JSE

5.3.4 Symmetrization

5.3.5 Solving the crowding problem: tSNE

5.3.6 Kernel choice

5.3.7 Adaptive Student Kernel Imbedding

5.4 Graph layout

5.4.1 Force directed graph layout: Elastic Embedding

5.4.2 Probabilistic graph layout: LargeVis

5.4.3 Topological method UMAP

5.5 Artificial neural networks

5.5.1 Auto-encoders

5.5.2 IVIS

6 Supervised DR

6.1 Types of supervision

6.1.1 Full supervision

6.1.2 Weak supervision

6.1.3 Semi-supervision

6.2 Parametric with class pur

About the authors

After earning his PhD in biomathematics from Pierre and Marie Curie University, Sylvain Lespinats held postdoctoral positions at several institutions, including INSERM (the French National Institute of Medical Research), INRETS (the French National Institute of Transport and Security Research), and a number of universities and research institutes. He is currently a permanent researcher at CEA-INES (the French National Institute of Solar Energy) near Chambery. He is the author or co-author of about 50 papers and more than ten patents. His work is dedicated to providing ad hoc approaches for data mining and knowledge discovery to his colleagues in various fields, including genomics, virology, quantitative sociology, transport security, solar energy forecasting, solar plant security, and battery diagnosis. Dr. Lespinats's scientific interests include revealing spatial structures in high-dimensional data. Within that framework, he has developed several non-linear mapping methods and worked on the local evaluation of mappings. Recently, he has focused mainly on renewable energy data to contribute to the energy transition.
Benoit Colange graduated from the Ecole Centrale de Lyon and the Universite Claude Bernard Lyon 1 in France. During his PhD training, a collaboration between CEA-INES and LAMA (Laboratory of Mathematics UMR 5127), he worked toward the conception of new methods for the analysis of metric data, including multidimensional data. The main purpose of this PhD was to provide innovative tools for the diagnosis of energy systems, such as photovoltaic power plants, electrochemical storage systems, and smart buildings. His research interests focus mainly on dimensionality reduction and the visual exploration of data.
Denys Dutykh completed his PhD at Ecole Normale Superieure de Cachan in 2007 on the mathematical modelling of tsunami waves. He then joined CNRS (the French National Centre of Scientific Research) as a full-time researcher. In 2010 he defended his Habilitation thesis on mathematical modelling in the environment, several years before the topic became mainstream. In 2012 and 2013 he lent his expertise to the ERC AdG "Multiwave" project at University College Dublin. Upon his return to CNRS in 2014, he began to diversify his research topics to include dimensionality reduction, building physics, electrochemistry, number theory, and geometric approaches to partial differential equations. Dr. Dutykh is the author of Numerical Methods for Diffusion Phenomena in Building Physics (Springer, 2019) and Dispersive Shallow Water Waves (Birkhauser, 2020), as well as many contributed book chapters, conference proceedings, and over 100 journal articles.