Scala Programming for Big Data Analytics: Get Started with Big Data Analytics Using Apache Spark

Elahi, Irfan


Product Description

Gain the key language concepts and programming techniques of Scala in the context of big data analytics and Apache Spark. The book begins by introducing you to Scala and establishes a firm contextual understanding of why you should learn this language, how it compares to Java, and how Scala relates to Apache Spark for big data analytics. Next, you'll set up the Scala environment and examine your first Scala programs. This is followed by sections on Scala fundamentals, including mutable and immutable variables, the type hierarchy system, control flow expressions and code blocks.
The author discusses functions at length and highlights a number of associated concepts such as functional programming and anonymous functions. The book then delves deeper into Scala's powerful collections system because many of Apache Spark's APIs bear a strong resemblance to Scala collections.
Along the way, you'll see the development life cycle of a Scala program. This involves compiling and building programs using the industry-standard Scala Build Tool (SBT). You'll cover guidelines for dependency management with SBT, which is critical for building large Apache Spark applications. Scala Programming for Big Data Analytics concludes by demonstrating how you can apply these concepts to write programs that run on the Apache Spark framework. These programs will provide distributed and parallel computing, which is critical for big data analytics.
What You Will Learn

  • See the fundamentals of Scala as a general-purpose programming language
  • Understand functional programming and object-oriented programming constructs in Scala
  • Use Scala collections and functions
  • Develop, package and run Apache Spark applications for big data analytics

Who This Book Is For
Data scientists, data analysts and data engineers who intend to use Apache Spark for large-scale analytics.

About the Author

Irfan Elahi is a senior consultant at Deloitte Australia specializing in big data and machine learning. His primary focus is using big data and machine learning to support business growth, and he has strong, multifaceted ties to the telecommunications, energy, retail and media industries. He has worked on a number of projects in Australia to design, prototype, develop, and deploy production-grade big data solutions on Amazon Web Services (AWS) and Azure. These solutions support use cases ranging from enterprise data warehousing, ETL offloading and analytics to batch processing and stream processing, and employ leading commercial Hadoop solutions such as Cloudera and Hortonworks. He has worked closely with clients' systems and software engineering teams, using DevOps to enhance continuous integration and continuous deployment (CI/CD) processes and to manage a Hadoop cluster's operations and security.
In addition to his technology competencies, Irfan has recently presented at the DataWorks Summit in Sydney on the subject of in-memory big data technologies and at a number of meetups around the world. He also remains involved in delivering knowledge-transfer sessions, training and workshops on big data and machine learning, both within his firm and at clients. He has also launched Udemy courses on Apache Spark for big data analytics and R programming for data science, with more than 18,000 students from 145 countries enrolled.

