Applied Data Science Using PySpark: Learn the End-to-End Predictive Model-Building Cycle
Ramcharan Kakarla, Sundar Krishnan, Sridhar Alla
- Publisher: Apress
- Publication date: 2020-12-18
- List price: $2,080
- VIP price: 5% off, $1,976
- Language: English
- Pages: 377
- Binding: Quality Paper (also called trade paper)
- ISBN: 1484264991
- ISBN-13: 9781484264997
- Related categories: Spark, Data Science, Machine Learning
Overseas special-order title (requires separate checkout)
Product Description
Discover the capabilities of PySpark and its application in the realm of data science. This comprehensive guide with hand-picked examples of daily use cases will walk you through the end-to-end predictive model-building cycle with the latest techniques and tricks of the trade.
Applied Data Science Using PySpark is divided into six sections. In section 1, you start with the basics of PySpark, focusing on data manipulation. We make you comfortable with the language and then build upon it to introduce you to the mathematical functions available off the shelf. In section 2, you will dive into the art of variable selection, where we demonstrate various selection techniques available in PySpark. In section 3, we take you on a journey through machine learning algorithms, implementations, and fine-tuning techniques. We will also talk about different validation metrics and how to use them for picking the best models. Sections 4 and 5 go through machine learning pipelines and the various methods available to operationalize a model and serve it through Docker or an API. In the final section, you will cover reusable objects for easy experimentation and learn some tricks that can help you optimize your programs and machine learning pipelines.
By the end of this book, you will have seen the flexibility and advantages of PySpark in data science applications. This book is recommended for those who want to unleash the power of parallel computing when working with big datasets.
What You Will Learn
- Build an end-to-end predictive model
- Implement multiple variable selection techniques
- Operationalize models
- Master multiple algorithms and implementations
Who This Book is For
Data scientists and machine learning and deep learning engineers who want to learn and use PySpark for real-time analysis of streaming data.
About the Authors
Ramcharan Kakarla is currently a lead data scientist at Comcast and resides in Philadelphia. He is a passionate data science and artificial intelligence advocate with more than five years of experience. He holds a master's degree from Oklahoma State University with a specialization in data mining. Prior to OSU, he received his bachelor's degree in electrical and electronics engineering from Sastra University in India. He was born and raised in the coastal town of Kakinada, India. He started his career as a performance engineer working with several Fortune 500 clients, including State Farm and British Airways. In his current role, he focuses on building data science solutions and frameworks that leverage big data. He has published several papers and posters in the field of predictive analytics. He served as SAS Global Ambassador for the year 2015.
Sundar Krishnan is passionate about artificial intelligence and data science, with more than five years of industry experience. He has extensive experience in building and deploying customer analytics models and in designing machine learning workflow automation. He is currently a lead data scientist at Comcast. Sundar was born and raised in Tamil Nadu, India, and has a bachelor's degree from Government College of Technology, Coimbatore. He completed his master's degree at Oklahoma State University, Stillwater. In his spare time, he blogs about his data science work on Medium.