Practical Data Science with Hadoop and Spark: Designing and Building Effective Analytics at Scale (paper)

Ofer Mendelevitch, Casey Stella, Douglas Eadline

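  • Publisher: Addison Wesley
  • Publication date: 2016-12-12
  • List price: $1,860
  • VIP price: $1,767 (5% off)
  • Language: English
  • Pages: 256
  • Binding: Paperback
  • ISBN: 0134024141
  • ISBN-13: 9780134024141
  • Related categories: Hadoop, Spark, Data Science
  • Overseas special-order title (requires separate checkout)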

Product description

The Complete Guide to Data Science with Hadoop—For Technical Professionals, Businesspeople, and Students

 

Demand is soaring for professionals who can solve real data science problems with Hadoop and Spark. Practical Data Science with Hadoop® and Spark is your complete guide to doing just that. Drawing on immense experience with Hadoop and big data, three leading experts bring together everything you need: high-level concepts, deep-dive techniques, real-world use cases, practical applications, and hands-on tutorials.

 

The authors introduce the essentials of data science and the modern Hadoop ecosystem, explaining how Hadoop and Spark have evolved into an effective platform for solving data science problems at scale. In addition to comprehensive application coverage, the authors also provide useful guidance on the important steps of data ingestion, data munging, and visualization.

 

Once the groundwork is in place, the authors focus on specific applications, including machine learning, predictive modeling for sentiment analysis, clustering for document analysis, anomaly detection, and natural language processing (NLP).

 

This guide provides a strong technical foundation for those who want to do practical data science, and also presents business-driven guidance on how to apply Hadoop and Spark to optimize ROI of data science initiatives.

 

Learn

  • What data science is, how it has evolved, and how to plan a data science career
  • How data volume, variety, and velocity shape data science use cases
  • Hadoop and its ecosystem, including HDFS, MapReduce, YARN, and Spark
  • Data importation with Hive and Spark
  • Data quality, preprocessing, preparation, and modeling
  • Visualization: surfacing insights from huge data sets
  • Machine learning: classification, regression, clustering, and anomaly detection
  • Algorithms and Hadoop tools for predictive modeling
  • Cluster analysis and similarity functions
  • Large-scale anomaly detection
  • NLP: applying data science to human language
