Performance Analysis and Tuning for General Purpose Graphics Processing Units (Synthesis Lectures on Computer Architecture)
Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, Wen-mei Hwu
- Publisher: Morgan & Claypool
- Publication date: 2012-11-01
- Price: $1,460
- VIP price: 5% off, $1,387
- Language: English
- Pages: 96
- Binding: Paperback
- ISBN: 1608459543
- ISBN-13: 9781608459544
Overseas special-order book (requires separate checkout)
Product description
General-purpose graphics processing units (GPGPUs) have emerged as an important class of shared-memory parallel processing architectures, with widespread deployment in every computer class from high-end supercomputers to embedded mobile platforms. Relative to today's more traditional multicore systems, GPGPUs have distinctly higher degrees of hardware multithreading (hundreds of hardware thread contexts vs. tens), a return to wide vector units (several tens vs. 1-10), memory architectures that deliver higher peak memory bandwidth (hundreds of gigabytes per second vs. tens), and smaller caches/scratchpad memories (less than 1 megabyte vs. 1-10 megabytes). In this book, we provide a high-level overview of current GPGPU architectures and programming models. We review the principles used in earlier shared-memory parallel platforms, focusing on recent results in both the theory and practice of parallel algorithms, and suggest how they connect to GPGPU platforms. We aim to give architects hints for understanding the algorithmic aspects of GPGPU computing. We also provide detailed performance analysis and guide optimizations, from high-level algorithms down to low-level instruction-level optimizations. As a case study, we use the fast multipole method (FMM) for n-body particle simulations. We also briefly survey the state of the art in GPU performance analysis tools and techniques.
Table of Contents: GPU Design, Programming, and Trends / Performance Principles / From Principles to Practice: Analysis and Tuning / Using Detailed Performance Analysis to Guide Optimization