Deep Learning - Hardware Design
Tentative Chinese title: 深度學習 - 硬體設計

Albert Chun Chen Liu, Oscar Ming Kin Law

  • Publication date: 2020-03-26
  • List price: $680
  • VIP price: $646 (5% off)
  • Language: English
  • Pages: 107
  • Binding: Paperback
  • ISBN: 9869890202
  • ISBN-13: 9789869890205
  • Related categories: DeepLearning
  • Related translation: 深度學習 -- 硬體設計 (Traditional Chinese edition)
  • Sales rankings: 🥉 No. 3, English-book sales, 2020 (annual)
    🥉 No. 3, English-book sales, 2020/10
    🥇 No. 1, English-book sales, 2020/8
    🥇 No. 1, English-book sales, 2020/7
    🥇 No. 1, English-book sales, 2020/6
    🥇 No. 1, English-book sales, 2020/5


Product Description

Preface

In 2012, convolutional neural network (CNN) technology achieved a major breakthrough. Since then, deep learning has become widely integrated into daily life through automotive, retail, healthcare, and finance products. In 2016, the triumph of AlphaGo, enabled by reinforcement learning (RL), further showed that the AI revolution is set to transform society, much as the personal computer (1977), the internet (1994), and the smartphone (2007) did. Nonetheless, the revolution's innovative effort has so far been focused on software development. Major hardware challenges, such as the following, remain largely unaddressed (one of them, data sparsity, is illustrated in the sketch after this list):

•    Big input data
•    Deep neural network
•    Massive parallel processing
•    Reconfigurable network
•    Memory bottleneck
•    Intensive computation
•    Network pruning
•    Data sparsity
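
To make the data-sparsity challenge concrete, here is a minimal NumPy sketch (an illustration written for this preface, not code from the book) that measures how sparse activations become after a ReLU layer; zero-skipping hardware exploits exactly this property:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy fully connected layer: 256 inputs -> 256 outputs.
    x = rng.standard_normal(256)
    w = rng.standard_normal((256, 256))

    # ReLU clamps negatives to zero, so roughly half the activations
    # of a randomly initialized layer end up being exactly zero.
    a = np.maximum(w @ x, 0.0)
    print(f"activation sparsity: {np.mean(a == 0.0):.1%}")

    # A dense 256-wide next layer needs 256 * 256 multiply-accumulates;
    # an accelerator that skips zero activations performs only:
    print(f"MACs after zero-skipping: {np.count_nonzero(a) * 256} of {256 * 256}")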

This book reviews various hardware designs, including the CPU, GPU, and NPU, and surveys the special features they use to resolve the above challenges. New hardware may be derived from the following designs to improve performance and power (the convolution these designs all accelerate is sketched after the list):

•    Parallel architecture
•    Convolution optimization
•    In-memory computation
•    Near-memory architecture
•    Network optimization
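
The common thread in these designs is making the convolution itself cheap. The sketch below (an illustrative reformulation, not taken from the book) shows the standard im2col lowering, which turns a convolution into one large matrix multiply, the operation that parallel architectures and systolic arrays are built to execute efficiently:

    import numpy as np

    def conv2d_im2col(x, w):
        """Valid 2-D convolution (in the deep learning sense, i.e.
        cross-correlation) of an image x (H, W) with a filter w (K, K),
        lowered to a single matrix multiply."""
        H, W = x.shape
        K = w.shape[0]
        oh, ow = H - K + 1, W - K + 1
        # im2col: copy every K x K patch into one row of a matrix.
        cols = np.empty((oh * ow, K * K))
        for i in range(oh):
            for j in range(ow):
                cols[i * ow + j] = x[i:i + K, j:j + K].ravel()
        # The convolution is now one GEMM, which GPUs, TPUs,
        # and most NPUs parallelize across thousands of MAC units.
        return (cols @ w.ravel()).reshape(oh, ow)

    x = np.arange(25, dtype=float).reshape(5, 5)
    w = np.ones((3, 3)) / 9.0          # 3 x 3 mean filter
    print(conv2d_im2col(x, w))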

The book is organized as follows:

•    Chapter 1: The neural network and its history
•    Chapter 2: The convolutional neural network model, its layer functions, and examples
•    Chapter 3: Parallel architectures, covering the Intel CPU, Nvidia GPU, Google TPU, and Microsoft NPU
•    Chapter 4: Convolution optimization, covering the UCLA DCNN accelerator and the MIT Eyeriss DNN accelerator
•    Chapter 5: In-memory computation with the Hybrid Memory Cube (HMC), covering the GT Neurocube architecture and the Stanford Tetris DNN processor
•    Chapter 6: Near-memory architecture, covering the ICT DaDianNao supercomputer and the UofT Cnvlutin DNN accelerator
•    Chapter 7: Energy-efficient inference engines for network pruning (a minimal pruning sketch follows this list)
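
As a taste of Chapter 7's subject, the sketch below shows magnitude pruning (an illustration only; EIE itself also stores the surviving weights in a compressed sparse format): weights close to zero are dropped, shrinking both the model and the number of multiply-accumulates an inference engine must perform.

    import numpy as np

    rng = np.random.default_rng(1)
    w = rng.standard_normal((128, 128))        # dense layer weights

    # Magnitude pruning: zero out the 90% of weights nearest zero.
    threshold = np.quantile(np.abs(w), 0.90)
    pruned = np.where(np.abs(w) >= threshold, w, 0.0)

    kept = np.count_nonzero(pruned)
    print(f"weights kept: {kept} / {w.size} ({kept / w.size:.0%})")

    # An engine such as EIE stores only the nonzero weights (e.g., in
    # compressed sparse column form) and skips the zeroed MACs entirely.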


Future revisions will incorporate new approaches to enhancing deep learning hardware design, along with other topics, including:

•    Distributive graph theory
•    High speed arithmetic
•    3D neural processing


About the Authors

劉峻誠 Albert Chun Chen Liu

Founder and CEO

Albert Chun Chen Liu is the founder and CEO of Kneron, which he established in San Diego in 2015. After graduating from National Cheng Kung University in Taiwan, he received a Raytheon fellowship and a University of California scholarship to pursue graduate study in the United States, enrolling in a joint MS/PhD program spanning UC Berkeley, UCLA, and UC San Diego, and subsequently earned his PhD in electrical engineering from UCLA. He has held R&D and management positions at Qualcomm, the Samsung Electronics R&D Center, MStar Semiconductor, and Wireless Info. While at Qualcomm, the R&D team he led obtained nine core-technology patents and won the company's ImpaQt R&D award.

He has been invited to teach courses on computer vision and artificial intelligence at the University of California and serves as a technical reviewer for many leading international journals. He has also participated in advanced collaborative technology development with IARPA, Bell Labs, and NASA. He holds more than 30 international patents in artificial intelligence, computer vision, and image processing, and has published more than 70 papers in major international journals.

羅明健 Oscar Ming Kin Law


Table of Contents

1 Introduction .......................................................................................................................................... 5
1.1 History ........................................................................................................................................... 5
1.2 Neural Network ............................................................................................................................. 6
2 Deep Learning ....................................................................................................................................... 7
2.1 Network Model ............................................................................................................................. 7
2.1.1 Convolutional Layer .............................................................................................................. 7
2.1.2 Activation Layer .................................................................................................................... 7
2.1.3 Pooling .................................................................................................................................. 7
2.1.4 Normalization ........................................................................................................................ 7
2.2 Deep Learning Challenges ............................................................................................................. 8
3 Parallel Architecture ............................................................................................................................. 9
3.1 Intel Central Processing Unit (CPU)............................................................................................... 9
3.1.1 Skylake Mesh Architecture ................................................................................................. 10
3.1.2 Intel Ultra Path Interconnect (UPI) ..................................................................................... 12
3.1.3 Sub-NUMA Clustering (SNC) ............................................................................................... 13
3.1.4 Cache Hierarchy Changes .................................................................................................... 14
3.1.5 Advanced Vector Software Extension ................................................................................. 15
3.1.6 Math Kernel Library for Deep Neural Network (MKL-DNN) ............................................... 15
3.2 Nvidia Graphics Processing Unit (GPU) ....................................................................................... 16
3.2.1 Tensor Core Architecture .................................................................................................... 18
3.2.2 Simultaneous Multi-Threading (SMT) ................................................................................. 21
3.2.3 High Bandwidth Memory (HBM2)....................................................................................... 21
3.2.4 NVLink2 Configuration ........................................................................................................ 22
3.3 Google Tensor Processing Unit (TPU) ......................................................................................... 24
3.3.1 System Architecture ............................................................................................................ 25
3.3.2 Multiply-Accumulate (MAC) Systolic Array ......................................................................... 27
3.3.3 New Brain Floating Point Format ........................................................................................ 28
3.3.4 Cloud TPU Configuration ..................................................................................................... 29
3.3.5 Cloud Software Architecture ............................................................................................... 31
3.4 Microsoft Catapult Fabric NPU Processor ................................................................................... 32
3.4.1 System Configuration .......................................................................................................... 32
3.4.2 Neural Processor Architecture ............................................................................................ 32
3.4.3 Matrix-Vector Multiplier ..................................................................................................... 33
3.4.4 Sparse Matrix-Vector Multiplication ................................................................................... 33
4 Convolution Optimization ................................................................................................................... 35
4.1 UCLA DCNN Accelerator .............................................................................................................. 35
4.1.1 System Architecture ............................................................................................................ 35
4.1.2 Filter Decomposition ........................................................................................................... 35
4.1.3 Streaming Architecture ....................................................................................................... 35
4.1.4 Convolution Unit (CU) Engine ............................................................................................. 36
4.1.5 Accumulation (ACCU) Buffer ............................................................................................... 36
4.1.6 Max Pooling......................................................................................................................... 36
4.2 MIT Eyeriss DNN Accelerator ...................................................................................................... 36
4.2.1 Convolution Mapping .......................................................................................................... 37
4.2.2 Row Stationary (RS) Dataflow ............................................................................................. 37
4.2.3 Run-Length Compression (RLC) ........................................................................................... 38
4.2.4 Network-on-Chip (NoC) ...................................................................................................... 38
4.2.5 Row Stationary Plus (RS+) Dataflow ................................................................................... 39
5 In-Memory Hierarchy .......................................................................................................................... 40
5.1 GT Neurocube Architecture ........................................................................................................ 40
5.1.1 Hybrid Memory Cube (HMC) ............................................................................... 40
5.1.2 Memory Centric Neural Computing (MCNC) ...................................................................... 42
5.1.3 Programmable Neurosequence Generator (PNG) .............................................................. 43
5.2 Stanford Tetris DNN Processor ................................................................................................... 44
5.2.1 Memory Hierarchy .............................................................................................................. 45
5.2.2 In-Memory Accumulation ................................................................................................... 46
5.2.3 Data Scheduling .................................................................................................................. 46
5.2.4 NN Partitioning across Vaults ............................................................................................. 47
6 Near-Memory Architecture ................................................................................................................ 49
6.1 ICT DaDianNao Supercomputer .................................................................................................. 49
6.1.1 Memory Configuration ........................................................................................................ 49
6.1.2 Neural Functional Unit (NFU) .............................................................................................. 49
6.2 UofT Cnvlutin DNN Accelerator .................................................................................................. 49
6.2.1 System Architecture ............................................................................................................ 49
6.2.2 Zero-Free Neuron Array Format (ZFNAf) ............................................................................ 50
6.2.3 Network Pruning ................................................................................................................. 50
6.2.4 Raw or Encoded Format (RoE) ............................................................................................ 51
6.2.5 Vector Ineffectual Activation Identifier Format (VIAI) ........................................................ 51
6.2.6 Zero Memory Overhead Ineffectual Activation Skipping ................................................... 51
7 Network Pruning ................................................................................................................................. 52
7.1 Energy Efficient Inference Engine (EIE) ....................................................................................... 52
7.1.1 Compressed DNN Model ..................................................................................................... 52
7.1.2 Central Control Unit (CCU) .................................................................................................. 52
