Scala Data Analysis Cookbook

Arun Manivannan

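  • Publisher: Packt Publishing
  • Publication date: 2015-10-30
  • Price: $2,010
  • Member price: $1,910 (5% off)
  • Language: English
  • Pages: 254
  • Binding: Paperback
  • ISBN: 1784396745
  • ISBN-13: 9781784396749
  • Related categories: JVM languages, Data Science
  • Imported title ordered from overseas (must be checked out separately)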

Product Description

Navigate the world of data analysis, visualization, and machine learning with over 100 hands-on Scala recipes

About This Book

  • Implement Scala in your data analysis using features from Spark, Breeze, and Zeppelin
  • Scale up your data analytics infrastructure with practical recipes for Scala machine learning
  • Recipes for every stage of the data analysis process, from reading and collecting data to distributed analytics

Who This Book Is For

This book shows data scientists and analysts how to leverage their existing knowledge of Scala for high-quality, scalable data analysis.

What You Will Learn

  • Familiarize yourself with and set up the Breeze and Spark libraries and use data structures
  • Import data from a host of possible sources and create dataframes from CSV
  • Clean, validate and transform data using Scala to pre-process numerical and string data
  • Integrate quintessential machine learning algorithms using the Scala stack
  • Bundle and scale up Spark jobs by deploying them into a variety of cluster managers
  • Run streaming and graph analytics in Spark to visualize data, enabling exploratory analysis

In Detail

This book will introduce you to the most popular Scala tools, libraries, and frameworks through practical recipes around loading, manipulating, and preparing your data. It will also help you explore and make sense of your data using stunning and insightful visualizations and machine learning toolkits.

Starting with introductory recipes on utilizing the Breeze and Spark libraries, get to grips with how to import data from a host of possible sources and how to pre-process numerical, string, and date data. Next, you'll get an understanding of concepts that will help you visualize data using the Apache Zeppelin and Bokeh bindings in Scala, enabling exploratory data analysis. Discover how to program quintessential machine learning algorithms using the Spark ML library. Work through steps to scale your machine learning models and deploy them into a standalone cluster, EC2, YARN, and Mesos. Finally, dip into the powerful options presented by Spark Streaming and machine learning for streaming data, as well as utilizing Spark GraphX.

Style and approach

This book contains a rich set of recipes that covers the full spectrum of interesting data analysis tasks and will help you revolutionize your data analysis skills using Scala and Spark.
