Spark for Python Developers (Paperback)
Provisional Chinese title: Python 開發者的 Spark 實戰指南 (平裝本)

Amit Nandi

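  • Publisher: Packt Publishing
  • Publication date: 2015-12-24
  • List price: $1,330
  • VIP price: $1,264 (5% off)
  • Language: English
  • Pages: 206
  • Binding: Paperback
  • ISBN: 1784399698
  • ISBN-13: 9781784399696
  • Related categories: Python, Programming Languages, Spark
  • Ships immediately (stock = 1)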

Product Description

Key Features

  • Set up real-time streaming and batch data-intensive infrastructure using Spark and Python
  • Deliver insightful visualizations in a web app using Spark (PySpark)
  • Inject live data using Spark Streaming with real-time events

Book Description

Looking for a cluster computing system that provides high-level APIs? Apache Spark is your answer: an open source, fast, general-purpose cluster computing system. Spark's multi-stage in-memory primitives can deliver performance up to 100 times faster than Hadoop MapReduce for some workloads, and Spark is also well suited to machine learning algorithms.

Are you a Python developer who wants to work with the Spark engine? If so, this book will be your companion as you create a data-intensive app using Spark as the processing engine, Python visualization libraries, and web frameworks such as Flask.

To begin with, you will learn the most effective way to install a Python development environment powered by Spark, Blaze, and Bokeh. You will then find out how to connect to data stores such as MySQL, MongoDB, Cassandra, and Hadoop.

You'll expand your skills throughout, becoming familiar with various data sources (GitHub, Twitter, Meetup, and blogs), their data structures, and ways to tackle their complexities effectively. You'll explore datasets using IPython Notebook and discover how to optimize the data models and pipeline. Finally, you'll learn how to create training datasets and train machine learning models.

By the end of the book, you will have created a real-time, data-intensive trend tracker app with Spark that surfaces insights from live data.
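
To make the "trend tracker" idea concrete, here is a minimal sketch in plain Python, not Spark and not code from the book. It treats incoming posts as small batches and reports the most frequent hashtags over a sliding window of recent batches, which is roughly the shape of a micro-batch streaming job. The function name `top_trends`, its parameters, and the sample posts are all hypothetical.

```python
from collections import Counter, deque


def top_trends(batches, window=3, k=2):
    """Yield the top-k hashtags over a sliding window of micro-batches.

    Plain-Python illustration only; `batches` is any iterable of lists of
    post texts (an assumption made for this sketch).
    """
    recent = deque(maxlen=window)  # keeps only the last `window` batch counts
    for batch in batches:
        tags = [
            word.lower()
            for text in batch
            for word in text.split()
            if word.startswith("#")
        ]
        recent.append(Counter(tags))
        total = Counter()
        for counts in recent:
            total.update(counts)  # sum the counts across the window
        yield total.most_common(k)


# Hypothetical sample data: three micro-batches of short posts.
batches = [
    ["Learning #Spark today", "#Python and #Spark"],
    ["#Python is fun", "more #Python"],
    ["#Bokeh charts", "#Python plots"],
]
results = list(top_trends(batches, window=2, k=1))
for step, trend in enumerate(results, start=1):
    print(step, trend)
```

In a Spark Streaming application, the per-batch counting and the windowing would be handled by the engine across a cluster; the sketch only shows the logic on a single machine.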

What you will learn

  • Create a Python development environment powered by Spark (PySpark), Blaze, and Bokeh
  • Build a real-time, data-intensive trend tracker app
  • Visualize the trends and insights gained from data using Bokeh
  • Generate insights from data using machine learning through Spark MLlib
  • Juggle data using Blaze
  • Create training datasets and train machine learning models
  • Test the machine learning models on test datasets
  • Deploy the machine learning algorithms and models and scale them for real-time events
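
The train-then-test workflow named in the list above can be sketched in a few lines of plain Python. This is an illustration of the general idea only: it does not use Spark MLlib and is not taken from the book. The nearest-centroid classifier, the helper names (`split`, `train_centroids`, `predict`, `accuracy`), and the toy dataset are all assumptions made for this example.

```python
import random


def split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into (train, test)."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)  # seeded for a repeatable split
    n_test = round(len(rows) * test_fraction)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


def train_centroids(train):
    """'Train' by computing the mean feature value of each label."""
    sums, counts = {}, {}
    for x, label in train:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Pick the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def accuracy(centroids, test):
    """Fraction of held-out rows that are classified correctly."""
    hits = sum(predict(centroids, x) == label for x, label in test)
    return hits / len(test)


# Toy, clearly separable dataset: (feature, label) pairs.
data = [(float(v), "low") for v in range(10)]
data += [(float(v), "high") for v in range(100, 110)]

train, test = split(data)
model = train_centroids(train)
score = accuracy(model, test)
print(len(train), len(test), score)
```

Evaluating on rows the model never saw during training is the point of the split; with real data the accuracy would normally be below 1.0.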

About the Author

Amit Nandi studied physics at the Free University of Brussels in Belgium, where he did his research on computer-generated holograms. Computer-generated holograms are the key components of an optical computer, which is powered by photons running at the speed of light. He then worked with the university's Cray supercomputer, submitting batch jobs of programs written in Fortran. This gave him a taste for computing, which kept growing. He has worked extensively on large business reengineering initiatives, using SAP as the main enabler. For the last 15 years, he has focused on start-ups in the data space, pioneering new areas of the information technology landscape. He currently focuses on large-scale, data-intensive applications as an enterprise architect, data engineer, and software developer. He understands and speaks seven human languages. Although Python is his computer language of choice, he aims to write fluently in seven computer languages too.

Table of Contents

  1. Setting Up a Spark Virtual Environment
  2. Building Batch and Streaming Apps with Spark
  3. Juggling Data with Spark
  4. Learning from Data Using Spark
  5. Streaming Live Data with Spark
  6. Visualizing Insights and Trends
