Introducing .NET for Apache Spark: Distributed Processing for Massive Datasets

Elliott, Ed

  • Publisher: Apress
  • Publication date: 2021-04-14
  • List price: $2,100
  • VIP price: $1,995 (5% off)
  • Language: English
  • Pages: 262
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 1484269918
  • ISBN-13: 9781484269916
  • Related categories: .NET, Spark
  • In stock, ships immediately (stock: 1)


Product Description

Get started using Apache Spark via C# or F# and the .NET for Apache Spark bindings. This book is an introduction to both Apache Spark and the .NET bindings. Readers new to Apache Spark will get up to speed quickly using Spark for data processing tasks performed against large and very large datasets. You will learn how to combine your knowledge of .NET with Apache Spark to bring massive computing power to bear by distributed processing of extremely large datasets across multiple servers.
This book covers how to get a local instance of Apache Spark running on your developer machine and shows you how to create your first .NET program that uses the Microsoft .NET bindings for Apache Spark. Techniques shown in the book allow you to use Apache Spark to distribute your data processing tasks over multiple compute nodes. You will learn to process data using both batch mode and streaming mode so you can make the right choice depending on whether you are processing an existing dataset or are working against new records in micro-batches as they arrive. The goal of the book is to leave you comfortable in bringing the power of Apache Spark to your favorite .NET language.

What You Will Learn

  • Install and configure Spark .NET on Windows, Linux, and macOS
  • Write Apache Spark programs in C# and F# using the .NET bindings
  • Access and invoke the Apache Spark APIs from .NET with the same high performance as Python, Scala, and R
  • Encapsulate functionality in user-defined functions
  • Transform and aggregate large datasets
  • Execute SQL queries against files through Apache Hive
  • Distribute processing of large datasets across multiple servers
  • Create your own batch, streaming, and machine learning programs

Who This Book Is For
.NET developers who want to perform big data processing without having to migrate to Python, Scala, or R; and Apache Spark developers who want to run natively on .NET and take advantage of the C# and F# ecosystems.


About the Author

Ed Elliott is a data engineer who has been working in IT for 20 years and has focused on data for the last 15 years. He uses Apache Spark at work and has been contributing to the Microsoft .NET for Apache Spark open source project since it was released in 2019. Ed has been blogging and writing since 2014 at his own blog as well as for SQL Server Central and Redgate. He has spoken at a number of events such as SQLBits, SQL Saturday, and the GroupBy conference.
