Spark for Data Science
Provisional Chinese title: 數據科學中的 Spark

Srinivas Duvvuri, Bikramaditya Singhal

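  • Publisher: Packt Publishing
  • Publication date: 2016-09-30
  • List price: $2,220
  • VIP price: $2,109 (5% off)
  • Language: English
  • Pages: 344
  • Binding: Paperback
  • ISBN: 1785885650
  • ISBN-13: 9781785885655
  • Related categories: Spark, Data Science
  • Overseas purchase-on-behalf title (requires separate checkout)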

Product description

Analyze your data and delve deep into the world of machine learning with the latest Spark version, 2.0

About This Book

  • Use Apache Spark to perform data analysis and build predictive models on huge datasets
  • Learn to integrate data science algorithms and techniques with the fast and scalable computing features of Spark to address big data challenges
  • Work through practical examples on real-world problems with sample code snippets

Who This Book Is For

This book is for anyone who wants to leverage Apache Spark for data science and machine learning. If you are a technologist who wants to expand your knowledge to perform data science operations in Spark, or a data scientist who wants to understand how algorithms are implemented in Spark, or a newbie with minimal development experience who wants to learn about Big Data Analytics, this book is for you!

What You Will Learn

  • Consolidate, clean, and transform your data acquired from various data sources
  • Perform statistical analysis of data to find hidden insights
  • Explore graphical techniques to see what your data looks like
  • Use machine learning techniques to build predictive models
  • Build scalable data products and solutions
  • Start programming using the RDD, DataFrame and Dataset APIs
  • Become an expert by improving your data analytical skills

In Detail

This is the era of Big Data. The term Big Data implies big innovation and gives businesses a competitive advantage. Apache Spark was designed to perform Big Data analytics at scale, so it comes equipped with the necessary algorithms and supports multiple programming languages.

Whether you are a technologist, a data scientist, or a beginner to Big Data analytics, this book will provide you with all the skills necessary to perform statistical data analysis, data visualization, and predictive modeling, and to build scalable data products or solutions using Python, Scala, and R.

With ample case studies and real-world examples, Spark for Data Science will help you ensure the successful execution of your data science projects.

Style and approach

This book takes a step-by-step approach to statistical analysis and machine learning and is written in a conversational, easy-to-follow style. Each topic is explained in sequence, with a focus on both the fundamentals and the advanced concepts of algorithms and techniques. Real-world examples with sample code snippets are also included.
