Deep Learning - Hardware Design

Albert Chun Chen Liu, Oscar Ming Kin Law

  • Publication date: 2020-03-26
  • List price: $680
  • VIP price: $646 (95% of list)
  • Language: English
  • Pages: 107
  • Binding: Paperback
  • ISBN: 9869890202
  • ISBN-13: 9789869890205
  • Category: DeepLearning
  • Related translation: 深度學習 -- 硬體設計 (Traditional Chinese edition)
  • Sales rankings: 🥉 No. 3, 2020 annual English-book sales
    🥉 No. 3, English-book sales, 2020/10
    🥇 No. 1, English-book sales, 2020/8
    🥇 No. 1, English-book sales, 2020/7
    🥇 No. 1, English-book sales, 2020/6
    🥇 No. 1, English-book sales, 2020/5


Product Description

Preface

In 2012, convolutional neural network (CNN) technology achieved a major breakthrough. Since then, deep learning has become widely integrated into daily life through automotive, retail, healthcare, and finance products. In 2016, the triumph of AlphaGo, enabled by reinforcement learning (RL), further proved that the AI revolution is set to transform society just as the personal computer (1977), the internet (1994), and the smartphone (2007) did. Nonetheless, the revolution's innovative efforts have thus far focused on software development. Major hardware challenges, such as the following, remain largely unaddressed:

•    Big input data
•    Deep neural network
•    Massive parallel processing
•    Reconfigurable network
•    Memory bottleneck
•    Intensive computation
•    Network pruning
•    Data sparsity
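
To put the compute and memory challenges above in concrete terms, the following sketch (illustrative Python, not taken from the book; the layer shape is a hypothetical VGG-style example) estimates the multiply-accumulate (MAC) and weight counts of a single convolutional layer:

```python
# Rough cost model for one convolutional layer, illustrating why
# "intensive computation" and the "memory bottleneck" dominate
# deep learning hardware design. Figures are illustrative only.

def conv_layer_cost(h, w, c_in, c_out, k):
    """Return (MAC count, weight count) for a k x k convolution over an
    h x w x c_in input producing an h x w x c_out output
    (stride 1, 'same' padding)."""
    macs = h * w * c_out * k * k * c_in   # one MAC per kernel tap per output
    weights = c_out * k * k * c_in        # filter parameters to fetch
    return macs, weights

macs, weights = conv_layer_cost(h=224, w=224, c_in=64, c_out=64, k=3)
print(f"{macs / 1e9:.2f} GMACs, {weights / 1e3:.0f}K weights")
```

Even a single mid-network layer of this hypothetical shape runs to roughly 1.85 billion MACs per input image, which is why the designs surveyed in this book emphasize parallelism and keeping weights close to the arithmetic units.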

This book reviews various hardware designs, including the CPU, GPU, TPU, and NPU. It also surveys special features aimed at resolving the above challenges. New hardware may derive performance and power improvements from the following designs:

•    Parallel architecture
•    Convolution optimization
•    In-memory computation
•    Near-memory architecture
•    Network optimization
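
As a reference point for what these designs accelerate, here is the direct (naive) convolution loop in plain Python, written for this listing as an illustration rather than taken from the book. Parallel architectures split these loops across compute lanes, convolution optimizers reorder or decompose them, and in-memory and near-memory designs move the inner accumulation next to the weight storage:

```python
# Minimal sketch (illustrative only): the six-loop direct convolution
# that the designs above parallelize, tile, or move into memory.

def direct_conv(inp, filt):
    """inp: [C_in][H][W] nested lists; filt: [C_out][C_in][K][K].
    Returns [C_out][H-K+1][W-K+1] (valid padding, stride 1)."""
    c_in, h, w = len(inp), len(inp[0]), len(inp[0][0])
    c_out, k = len(filt), len(filt[0][0])
    out = [[[0.0] * (w - k + 1) for _ in range(h - k + 1)]
           for _ in range(c_out)]
    for co in range(c_out):              # output channel
        for y in range(h - k + 1):       # output row
            for x in range(w - k + 1):   # output column
                acc = 0.0
                for ci in range(c_in):   # input channel
                    for dy in range(k):  # kernel row
                        for dx in range(k):  # kernel column
                            acc += inp[ci][y + dy][x + dx] * filt[co][ci][dy][dx]
                out[co][y][x] = acc
    return out
```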

The book is organized as follows:

•    Chapter 1: The neural network and its history
•    Chapter 2: The convolutional neural network model, its layer functions, and examples
•    Chapter 3: Parallel architectures––the Intel CPU, Nvidia GPU, Google TPU, and Microsoft NPU
•    Chapter 4: Optimizing convolution––the UCLA DCNN accelerator and MIT Eyeriss DNN
•    Chapter 5: In-memory computation using the Hybrid Memory Cube (HMC)––the GT Neurocube architecture and Stanford Tetris DNN processor
•    Chapter 6: Near-memory architecture––the ICT DaDianNao supercomputer and UofT Cnvlutin DNN accelerator
•    Chapter 7: Energy-efficient inference engines for network pruning


Future revisions will incorporate new approaches for enhancing deep learning hardware designs alongside other topics, including:

•    Distributed graph theory
•    High speed arithmetic
•    3D neural processing


About the Authors

Albert Chun Chen Liu (劉峻誠)

Founder and CEO

Albert Chun Chen Liu is the founder and CEO of Kneron, which he established in San Diego in 2015. After graduating from National Cheng Kung University in Taiwan, he received a Raytheon fellowship and a University of California scholarship for graduate study in the joint MS/PhD research program of UC Berkeley, UCLA, and UC San Diego, and went on to earn his PhD in electrical engineering from UCLA. He has held various R&D and management positions at Qualcomm, the Samsung Electronics R&D Center, MStar Semiconductor, and Wireless Info. While at Qualcomm, he led an R&D team that earned nine core-technology patents and won the company's ImpaQt R&D award.

He has been invited to lecture on computer vision and artificial intelligence at the University of California and serves as a technical reviewer for many leading international journals. He has also taken part in advanced joint technology development with IARPA, Bell Labs, and NASA, holds more than 30 international patents in artificial intelligence, computer vision, and image processing, and has published more than 70 papers in major international journals.

 

Oscar Ming Kin Law (羅明健)


Table of Contents

1 Introduction .......................................................................................................................................... 5
1.1 History ........................................................................................................................................... 5
1.2 Neural Network ............................................................................................................................. 6
2 Deep Learning ....................................................................................................................................... 7
2.1 Network Model ............................................................................................................................. 7
2.1.1 Convolutional Layer .............................................................................................................. 7
2.1.2 Activation Layer .................................................................................................................... 7
2.1.3 Pooling .................................................................................................................................. 7
2.1.4 Normalization ........................................................................................................................ 7
2.2 Deep Learning Challenges ............................................................................................................. 8
3 Parallel Architecture ............................................................................................................................. 9
3.1 Intel Central Processing Unit (CPU)............................................................................................... 9
3.1.1 Skylake Mesh Architecture ................................................................................................. 10
3.1.2 Intel Ultra Path Interconnect (UPI) ..................................................................................... 12
3.1.3 Sub-NUMA Clustering (SNC) ............................................................................................... 13
3.1.4 Cache Hierarchy Changes .................................................................................................... 14
3.1.5 Advanced Vector Software Extension ................................................................................. 15
3.1.6 Math Kernel Library for Deep Neural Network (MKL-DNN) ............................................... 15
3.2 Nvidia Graphics Processing Unit (GPU) ....................................................................................... 16
3.2.1 Tensor Core Architecture .................................................................................................... 18
3.2.2 Simultaneous Multi-Threading (SMT) ................................................................................. 21
3.2.3 High Bandwidth Memory (HBM2)....................................................................................... 21
3.2.4 NVLink2 Configuration ........................................................................................................ 22
3.3 Google Tensor Processing Unit (TPU) ......................................................................................... 24
3.3.1 System Architecture ............................................................................................................ 25
3.3.2 Multiply-Accumulate (MAC) Systolic Array ......................................................................... 27
3.3.3 New Brain Floating Point Format ........................................................................................ 28
3.3.4 Cloud TPU Configuration ..................................................................................................... 29
3.3.5 Cloud Software Architecture ............................................................................................... 31
3.4 Microsoft Catapult Fabric NPU Processor ................................................................................... 32
3.4.1 System Configuration .......................................................................................................... 32
3.4.2 Neural Processor Architecture ............................................................................................ 32
3.4.3 Matrix-Vector Multiplier ..................................................................................................... 33
3.4.4 Sparse Matrix-Vector Multiplication ................................................................................... 33
4 Convolution Optimization ................................................................................................................... 35
4.1 UCLA DCNN Accelerator .............................................................................................................. 35
4.1.1 System Architecture ............................................................................................................ 35
4.1.2 Filter Decomposition ........................................................................................................... 35
4.1.3 Streaming Architecture ....................................................................................................... 35
4.1.4 Convolution Unit (CU) Engine ............................................................................................. 36
4.1.5 Accumulation (ACCU) Buffer ............................................................................................... 36
4.1.6 Max Pooling......................................................................................................................... 36
4.2 MIT Eyeriss DNN Accelerator ...................................................................................................... 36
4.2.1 Convolution Mapping .......................................................................................................... 37
4.2.2 Row Stationary (RS) Dataflow ............................................................................................. 37
4.2.3 Run-Length Compression (RLC) ........................................................................................... 38
4.2.4 Network-on-Chip (NoC) ...................................................................................................... 38
4.2.5 Row Stationary Plus (RS+) Dataflow ................................................................................... 39
5 In-Memory Hierarchy .......................................................................................................................... 40
5.1 GT Neurocube Architecture ........................................................................................................ 40
5.1.1 Hybrid Memory Cube (HMC) ............................................................................... 40
5.1.2 Memory Centric Neural Computing (MCNC) ...................................................................... 42
5.1.3 Programmable Neurosequence Generator (PNG) .............................................................. 43
5.2 Stanford Tetris DNN Processor ................................................................................................... 44
5.2.1 Memory Hierarchy .............................................................................................................. 45
5.2.2 In-Memory Accumulation ................................................................................................... 46
5.2.3 Data Scheduling .................................................................................................................. 46
5.2.4 NN Partitioning across Vaults ............................................................................................. 47
6 Near-Memory Architecture ................................................................................................................ 49
6.1 ICT DaDianNao Supercomputer .................................................................................................. 49
6.1.1 Memory Configuration ........................................................................................................ 49
6.1.2 Neural Functional Unit (NFU) .............................................................................................. 49
6.2 UofT Cnvlutin DNN Accelerator .................................................................................................. 49
6.2.1 System Architecture ............................................................................................................ 49
6.2.2 Zero-Free Neuron Array Format (ZFNAf) ............................................................................ 50
6.2.3 Network Pruning ................................................................................................................. 50
6.2.4 Raw or Encoded Format (RoE) ............................................................................................ 51
6.2.5 Vector Ineffectual Activation Identifier Format (VIAI) ........................................................ 51
6.2.6 Zero Memory Overhead Ineffectual Activation Skipping ................................................... 51
7 Network Pruning ................................................................................................................................. 52
7.1 Energy Efficient Inference Engine (EIE) ....................................................................................... 52
7.1.1 Compressed DNN Model ..................................................................................................... 52
7.1.2 Central Control Unit (CCU) .................................................................................................. 52
