Frank Kane's Taming Big Data with Apache Spark and Python
Frank Kane
- Publisher: Packt Publishing
- Publication date: 2017-06-30
- List price: $1,770
- Member price: 5% off, $1,682
- Language: English
- Pages: 296
- Binding: Paperback
- ISBN: 1787287947
- ISBN-13: 9781787287945
Related categories:
Python, Programming Languages, Spark, Big Data
People who bought this item also bought:
- Hadoop: The Definitive Guide, 4/e (Paperback), $2,002
- Simulation with Arena, 6/e (IE-Paperback), $1,176
- Python 資料分析與挖掘實戰 (Python Data Analysis and Mining in Practice), $352
- Digital Signal Processing First, 2/e (DSP First) (IE-Paperback), $1,323
- Deep Learning Illustrated: A Visual, Interactive Guide to Artificial Intelligence (Paperback), $2,185
Product Description
Key Features
- Understand how Spark can be distributed across computing clusters
- Develop and run Spark jobs efficiently using Python
- A hands-on tutorial by Frank Kane with over 15 real-world examples teaching you Big Data processing with Spark
Book Description
Frank Kane's Taming Big Data with Apache Spark and Python is your companion to learning Apache Spark hands-on. Frank starts you off by teaching you how to set up Spark on a single system or on a cluster, and you'll soon move on to analyzing large data sets with Spark RDDs and to developing and running effective Spark jobs quickly using Python.
Apache Spark has emerged as the next big thing in the Big Data domain, rising from an up-and-coming technology to an established superstar in just a few years. Spark lets you quickly extract actionable insights from large amounts of data in real time, making it an essential tool in many modern businesses.
Frank has packed this book with over 15 interactive, fun-filled, real-world examples that will help you understand the Spark ecosystem and implement production-grade, real-time Spark projects with ease.
What you will learn
- Find out how you can identify Big Data problems as Spark problems
- Install and run Apache Spark on your computer or on a cluster
- Analyze large data sets across many CPUs using Spark's Resilient Distributed Datasets
- Implement machine learning on Spark using the MLlib library
- Process continuous streams of data in real time using the Spark Streaming module
- Perform complex network analysis using Spark's GraphX library
- Use Amazon's Elastic MapReduce service to run your Spark jobs on a cluster
About the Author
My name is Frank Kane. I spent nine years at Amazon and IMDb, wrangling millions of customer ratings and customer transactions to produce things such as personalized recommendations for movies and products and "people who bought this also bought." I tell you, I wish we'd had Apache Spark back then, when I spent years trying to solve these problems. I hold 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, I left to start my own successful company, Sundog Software, which focuses on virtual reality environment technology and on teaching others about big data analysis.
Table of Contents
1. Getting Started with Spark
2. Spark Basics and Simple Examples
3. Advanced Examples of Spark Programs
4. Running Spark on a Cluster
5. SparkSQL, Dataframes and Datasets
6. Other Spark Technologies and Libraries
7. Where to Go From Here? - Learning More About Spark and Data Science