Practical Data Science with Hadoop and Spark: Designing and Building Effective Analytics at Scale (paper)

Ofer Mendelevitch, Casey Stella, Douglas Eadline

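  • Publisher: Addison Wesley
  • Publication date: 2016-12-12
  • List price: $1,820
  • VIP price: $1,729 (5% off)
  • Language: English
  • Pages: 256
  • Binding: Paperback
  • ISBN: 0134024141
  • ISBN-13: 9780134024141
  • Related categories: Hadoop, Spark, Data Science
  • Overseas special-order title (requires separate checkout)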

Product description

The Complete Guide to Data Science with Hadoop: For Technical Professionals, Businesspeople, and Students

Demand is soaring for professionals who can solve real data science problems with Hadoop and Spark. Practical Data Science with Hadoop® and Spark is your complete guide to doing just that. Drawing on immense experience with Hadoop and big data, three leading experts bring together everything you need: high-level concepts, deep-dive techniques, real-world use cases, practical applications, and hands-on tutorials.

The authors introduce the essentials of data science and the modern Hadoop ecosystem, explaining how Hadoop and Spark have evolved into an effective platform for solving data science problems at scale. Alongside comprehensive application coverage, they provide guidance on the important steps of data ingestion, data munging, and visualization.

Once the groundwork is in place, the authors focus on specific applications, including machine learning, predictive modeling for sentiment analysis, clustering for document analysis, anomaly detection, and natural language processing (NLP).

This guide provides a strong technical foundation for those who want to do practical data science, and presents business-driven guidance on how to apply Hadoop and Spark to optimize the ROI of data science initiatives.

Learn

  • What data science is, how it has evolved, and how to plan a data science career
  • How data volume, variety, and velocity shape data science use cases
  • Hadoop and its ecosystem, including HDFS, MapReduce, YARN, and Spark
  • Data importation with Hive and Spark
  • Data quality, preprocessing, preparation, and modeling
  • Visualization: surfacing insights from huge data sets
  • Machine learning: classification, regression, clustering, and anomaly detection
  • Algorithms and Hadoop tools for predictive modeling
  • Cluster analysis and similarity functions
  • Large-scale anomaly detection
  • NLP: applying data science to human language
