Mastering Parallel Programming with R (Paperback)
Provisional Chinese title: 精通 R 語言的平行程式設計 (平裝本)

Simon R. Chapple, Eilidh Troup, Thorsten Forster, Terence Sloan

Product Description

Master the robust features of R parallel programming to accelerate your data science computations

About This Book

  • Create R programs that exploit the computational capability of your cloud platforms and computers to the fullest
  • Become an expert in writing efficient, high-performance parallel algorithms in R
  • Get to grips with the concept of parallelism to accelerate your existing R programs

Who This Book Is For

This book is for R programmers who want to step beyond R's inherent single-threaded execution and memory limitations and learn how to implement the highly accelerated, scalable algorithms needed to process Big Data efficiently. No previous knowledge of parallelism is required. The book also caters to more advanced programmers who want to go beyond high-level parallel frameworks.

What You Will Learn

  • Create and structure efficient load-balanced parallel computation in R, using R's built-in parallel package
  • Deploy and utilize cloud-based parallel infrastructure from R, including launching a distributed computation on Hadoop running on Amazon Web Services (AWS)
  • Understand parallel efficiency, and apply simple techniques to benchmark your own code, measure its speed, and target improvements
  • Develop complex parallel processing algorithms with the standard Message Passing Interface (MPI) using the Rmpi, pbdMPI, and SPRINT packages
  • Build and extend a parallel R package (SPRINT) with your own MPI-based routines
  • Implement accelerated numerical functions in R utilizing the vector processing capability of your Graphics Processing Unit (GPU) with OpenCL
  • Understand parallel programming pitfalls, such as deadlock and numerical instability, and the approaches to handle and avoid them
  • Build task farm (master-worker), spatial grid, and hybrid parallel R programs

In Detail

R is one of the most popular programming languages used in data science. Applying R to big data and complex analytic tasks requires harnessing scalable compute resources.

Mastering Parallel Programming with R presents a comprehensive and practical treatise on how to build highly scalable and efficient algorithms in R. It will teach you a variety of parallelization techniques, from simple use of the parallel versions of lapply() in R's built-in parallel package, to high-level Hadoop and Apache Spark frameworks running in the AWS cloud. It will also teach you low-level scalable parallel programming using Rmpi and pbdMPI for message passing, applicable to clusters and supercomputers, and how to exploit GPUs with thousands of simple processors through ROpenCL. By the end of the book, you will understand the factors that influence parallel efficiency, including assessing code performance and implementing load balancing; pitfalls to avoid, including deadlock and numerical instability; how to structure your code and data for the type of parallelism best suited to your problem domain; and how to extract the maximum performance from your R code running on a variety of computer systems.
