Deep Learning - Hardware Design
Tentative Chinese title: 深度學習 - 硬體設計

Albert Chun Chen Liu, Oscar Ming Kin Law

  • Publication date: 2020-03-26
  • List price: $680
  • VIP price: $646 (5% off)
  • Language: English
  • Pages: 107
  • Binding: Paperback
  • ISBN: 9869890202
  • ISBN-13: 9789869890205
  • Related categories: DeepLearning
  • Related translation: 深度學習 -- 硬體設計 (Traditional Chinese edition)
  • Sales rankings: 🥉 No. 3, English-book sales, 2020 (annual)
    🥉 No. 3, English-book sales, 2020/10
    🥇 No. 1, English-book sales, 2020/8
    🥇 No. 1, English-book sales, 2020/7
    🥇 No. 1, English-book sales, 2020/6
    🥇 No. 1, English-book sales, 2020/5


Product Description

Preface

In 2012, convolutional neural network (CNN) technology achieved a major breakthrough. Since then, deep learning has become widely integrated into daily life through automotive, retail, healthcare, and finance products. In 2016, the triumph of AlphaGo, enabled by reinforcement learning (RL), further showed that the AI revolution is set to transform society, much as the personal computer (1977), the internet (1994), and the smartphone (2007) did. Nonetheless, the revolution's innovative effort has so far been focused on software development. Major hardware challenges, such as the following, remain largely unaddressed (one of them, data sparsity, is illustrated in the sketch after this list):

•    Big input data
•    Deep neural network
•    Massive parallel processing
•    Reconfigurable network
•    Memory bottleneck
•    Intensive computation
•    Network pruning
•    Data sparsity
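
To make the data-sparsity challenge concrete, here is a minimal NumPy sketch (an illustration written for this preface, not code from the book) that measures how sparse activations become after a ReLU layer; zero-skipping hardware exploits exactly this property:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy fully connected layer: 256 inputs -> 256 outputs.
    x = rng.standard_normal(256)
    w = rng.standard_normal((256, 256))

    # ReLU clamps negatives to zero, so roughly half the activations
    # of a randomly initialized layer end up being exactly zero.
    a = np.maximum(w @ x, 0.0)
    print(f"activation sparsity: {np.mean(a == 0.0):.1%}")

    # A dense 256-wide next layer needs 256 * 256 multiply-accumulates;
    # an accelerator that skips zero activations performs only:
    print(f"MACs after zero-skipping: {np.count_nonzero(a) * 256} of {256 * 256}")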

This book reviews various hardware designs, including the CPU, GPU, and NPU, and surveys the special features they use to resolve the above challenges. New hardware may be derived from the following designs to improve performance and power (the convolution these designs all accelerate is sketched after the list):

•    Parallel architecture
•    Convolution optimization
•    In-memory computation
•    Near-memory architecture
•    Network optimization
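
The common thread in these designs is making the convolution itself cheap. The sketch below (an illustrative reformulation, not taken from the book) shows the standard im2col lowering, which turns a convolution into one large matrix multiply, the operation that parallel architectures and systolic arrays are built to execute efficiently:

    import numpy as np

    def conv2d_im2col(x, w):
        """Valid 2-D convolution (in the deep learning sense, i.e.
        cross-correlation) of an image x (H, W) with a filter w (K, K),
        lowered to a single matrix multiply."""
        H, W = x.shape
        K = w.shape[0]
        oh, ow = H - K + 1, W - K + 1
        # im2col: copy every K x K patch into one row of a matrix.
        cols = np.empty((oh * ow, K * K))
        for i in range(oh):
            for j in range(ow):
                cols[i * ow + j] = x[i:i + K, j:j + K].ravel()
        # The convolution is now one GEMM, which GPUs, TPUs,
        # and most NPUs parallelize across thousands of MAC units.
        return (cols @ w.ravel()).reshape(oh, ow)

    x = np.arange(25, dtype=float).reshape(5, 5)
    w = np.ones((3, 3)) / 9.0          # 3 x 3 mean filter
    print(conv2d_im2col(x, w))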

The book is organized as follows:

•    Chapter 1: The neural network and its history
•    Chapter 2: The convolutional neural network model, its layer functions, and examples
•    Chapter 3: Parallel architectures, covering the Intel CPU, Nvidia GPU, Google TPU, and Microsoft NPU
•    Chapter 4: Convolution optimization, covering the UCLA DCNN accelerator and the MIT Eyeriss DNN accelerator
•    Chapter 5: In-memory computation with the Hybrid Memory Cube (HMC), covering the GT Neurocube architecture and the Stanford Tetris DNN processor
•    Chapter 6: Near-memory architecture, covering the ICT DaDianNao supercomputer and the UofT Cnvlutin DNN accelerator
•    Chapter 7: Energy-efficient inference engines for network pruning (a minimal pruning sketch follows this list)
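
As a taste of Chapter 7's subject, the sketch below shows magnitude pruning (an illustration only; EIE itself also stores the surviving weights in a compressed sparse format): weights close to zero are dropped, shrinking both the model and the number of multiply-accumulates an inference engine must perform.

    import numpy as np

    rng = np.random.default_rng(1)
    w = rng.standard_normal((128, 128))        # dense layer weights

    # Magnitude pruning: zero out the 90% of weights nearest zero.
    threshold = np.quantile(np.abs(w), 0.90)
    pruned = np.where(np.abs(w) >= threshold, w, 0.0)

    kept = np.count_nonzero(pruned)
    print(f"weights kept: {kept} / {w.size} ({kept / w.size:.0%})")

    # An engine such as EIE stores only the nonzero weights (e.g., in
    # compressed sparse column form) and skips the zeroed MACs entirely.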


Future revisions will incorporate new approaches to enhancing deep learning hardware design, along with other topics, including:

•    Distributive graph theory
•    High speed arithmetic
•    3D neural processing


About the Authors

劉峻誠 Albert Chun Chen Liu

Founder and CEO

Albert Chun Chen Liu is the founder and CEO of Kneron, which he established in San Diego in 2015. After graduating from National Cheng Kung University in Taiwan, he received a Raytheon fellowship and a University of California scholarship to pursue graduate study in the United States, enrolling in a joint MS/PhD program spanning UC Berkeley, UCLA, and UC San Diego, and subsequently earned his PhD in electrical engineering from UCLA. He has held R&D and management positions at Qualcomm, the Samsung Electronics R&D Center, MStar Semiconductor, and Wireless Info. While at Qualcomm, the R&D team he led obtained nine core-technology patents and won the company's ImpaQt R&D award.

He has been invited to teach courses on computer vision and artificial intelligence at the University of California and serves as a technical reviewer for many leading international journals. He has also participated in advanced collaborative technology development with IARPA, Bell Labs, and NASA. He holds more than 30 international patents in artificial intelligence, computer vision, and image processing, and has published more than 70 papers in major international journals.

羅明健 Oscar Ming Kin Law


Table of Contents

1 Introduction .......................................................................................................................................... 5
1.1 History ........................................................................................................................................... 5
1.2 Neural Network ............................................................................................................................. 6
2 Deep Learning ....................................................................................................................................... 7
2.1 Network Model ............................................................................................................................. 7
2.1.1 Convolutional Layer .............................................................................................................. 7
2.1.2 Activation Layer .................................................................................................................... 7
2.1.3 Pooling .................................................................................................................................. 7
2.1.4 Normalization ........................................................................................................................ 7
2.2 Deep Learning Challenges ............................................................................................................. 8
3 Parallel Architecture ............................................................................................................................. 9
3.1 Intel Central Processing Unit (CPU)............................................................................................... 9
3.1.1 Skylake Mesh Architecture ................................................................................................. 10
3.1.2 Intel Ultra Path Interconnect (UPI) ..................................................................................... 12
3.1.3 Sub-NUMA Clustering (SNC) ............................................................................................... 13
3.1.4 Cache Hierarchy Changes .................................................................................................... 14
3.1.5 Advanced Vector Software Extension ................................................................................. 15
3.1.6 Math Kernel Library for Deep Neural Network (MKL-DNN) ............................................... 15
3.2 Nvidia Graphics Processing Unit (GPU) ....................................................................................... 16
3.2.1 Tensor Core Architecture .................................................................................................... 18
3.2.2 Simultaneous Multi-Threading (SMT) ................................................................................. 21
3.2.3 High Bandwidth Memory (HBM2)....................................................................................... 21
3.2.4 NVLink2 Configuration ........................................................................................................ 22
3.3 Google Tensor Processing Unit (TPU) ......................................................................................... 24
3.3.1 System Architecture ............................................................................................................ 25
3.3.2 Multiply-Accumulate (MAC) Systolic Array ......................................................................... 27
3.3.3 New Brain Floating Point Format ........................................................................................ 28
3.3.4 Cloud TPU Configuration ..................................................................................................... 29
3.3.5 Cloud Software Architecture ............................................................................................... 31
3.4 Microsoft Catapult Fabric NPU Processor ................................................................................... 32
3.4.1 System Configuration .......................................................................................................... 32
3.4.2 Neural Processor Architecture ............................................................................................ 32
3.4.3 Matrix-Vector Multiplier ..................................................................................................... 33
3.4.4 Sparse Matrix-Vector Multiplication ................................................................................... 33
4 Convolution Optimization ................................................................................................................... 35
4.1 UCLA DCNN Accelerator .............................................................................................................. 35
4.1.1 System Architecture ............................................................................................................ 35
4.1.2 Filter Decomposition ........................................................................................................... 35
4.1.3 Streaming Architecture ....................................................................................................... 35
4.1.4 Convolution Unit (CU) Engine ............................................................................................. 36
4.1.5 Accumulation (ACCU) Buffer ............................................................................................... 36
4.1.6 Max Pooling......................................................................................................................... 36
4.2 MIT Eyeriss DNN Accelerator ...................................................................................................... 36
4.2.1 Convolution Mapping .......................................................................................................... 37
4.2.2 Row Stationary (RS) Dataflow ............................................................................................. 37
4.2.3 Run-Length Compression (RLC) ........................................................................................... 38
4.2.4 Network-on-Chip (NoC) ...................................................................................................... 38
4.2.5 Row Stationary Plus (RS+) Dataflow ................................................................................... 39
5 In-Memory Hierarchy .......................................................................................................................... 40
5.1 GT Neurocube Architecture ........................................................................................................ 40
5.1.1 Hybrid Memory Cube (HMC) ............................................................................... 40
5.1.2 Memory Centric Neural Computing (MCNC) ...................................................................... 42
5.1.3 Programmable Neurosequence Generator (PNG) .............................................................. 43
5.2 Stanford Tetris DNN Processor ................................................................................................... 44
5.2.1 Memory Hierarchy .............................................................................................................. 45
5.2.2 In-Memory Accumulation ................................................................................................... 46
5.2.3 Data Scheduling .................................................................................................................. 46
5.2.4 NN Partitioning across Vaults ............................................................................................. 47
6 Near-Memory Architecture ................................................................................................................ 49
6.1 ICT DaDianNao Supercomputer .................................................................................................. 49
6.1.1 Memory Configuration ........................................................................................................ 49
6.1.2 Neural Functional Unit (NFU) .............................................................................................. 49
6.2 UofT Cnvlutin DNN Accelerator .................................................................................................. 49
6.2.1 System Architecture ............................................................................................................ 49
6.2.2 Zero-Free Neuron Array Format (ZFNAf) ............................................................................ 50
6.2.3 Network Pruning ................................................................................................................. 50
6.2.4 Raw or Encoded Format (RoE) ............................................................................................ 51
6.2.5 Vector Ineffectual Activation Identifier Format (VIAI) ........................................................ 51
6.2.6 Zero Memory Overhead Ineffectual Activation Skipping ................................................... 51
7 Network Pruning ................................................................................................................................. 52
7.1 Energy Efficient Inference Engine (EIE) ....................................................................................... 52
7.1.1 Compressed DNN Model ..................................................................................................... 52
7.1.2 Central Control Unit (CCU) .................................................................................................. 52
