PySpark SQL Recipes: With HiveQL, Dataframe and Graphframes

Raju Kumar Mishra, Sundar Rajan Raman

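  • Publisher: Apress
  • Publication date: 2019-03-19
  • List price: $1,740
  • VIP price: 5% off, $1,653
  • Language: English
  • Pages: 323
  • Binding: Paperback
  • ISBN: 148424334X
  • ISBN-13: 9781484243343
  • Related categories: SparkSQL
  • Overseas special-order book (requires separate checkout)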

Product Description

Carry out data analysis with PySpark SQL, graphframes, and graph data processing using a problem-solution approach. This book provides solutions to problems related to dataframes, data manipulation, summarization, and exploratory analysis. You will improve your skills in graph data analysis using graphframes and see how to optimize your PySpark SQL code.

PySpark SQL Recipes starts with recipes on creating dataframes from different types of data sources, data aggregation and summarization, and exploratory data analysis using PySpark SQL. You’ll also discover how to solve problems in graph analysis using graphframes.

On completing this book, you’ll have ready-made code for all your PySpark SQL tasks, including creating dataframes using data from different file formats as well as from SQL or NoSQL databases.

What You Will Learn

  • Understand PySpark SQL and its advanced features
  • Use SQL and HiveQL with PySpark SQL
  • Work with structured streaming
  • Optimize PySpark SQL 
  • Master graphframes and graph processing

About the Authors

Raju Kumar Mishra has strong interests in data science and in systems capable of handling large amounts of data and running complex mathematical models through computational programming. He was inspired to pursue an M.Tech in computational sciences from the Indian Institute of Science in Bangalore, India. Raju primarily works in the areas of data science and its different applications. Working as a corporate trainer, he has developed unique insights that help him teach and explain complex ideas with ease. Raju is also a data science consultant solving complex industrial problems. He works with programming tools such as R, Python, scikit-learn, Statsmodels, Hadoop, Hive, Pig, Spark, and many others. His venture Walsoul Private Ltd provides training in data science, programming, and big data.

Sundar Rajan Raman is an artificial intelligence practitioner currently working at Bank of America. He holds a Bachelor of Technology degree from the National Institute of Technology, India. A seasoned Java and J2EE programmer, he has worked on critical applications for companies such as AT&T, Singtel, and Deutsche Bank. He is also a seasoned big data architect. His current focus is on the artificial intelligence space, including machine learning and deep learning.
