Big Data Science & Analytics: A Hands-On Approach (Express Import)
Tentative Chinese title: 大數據科學與分析:實作方法
Arshdeep Bahga, Vijay Madisetti
- Publisher: VPT
- Publication date: 2016-04-15
- List price: $2,859
- VIP price: 5% off, $2,716
- Language: English
- Pages: 544
- Binding: Hardcover
- ISBN: 0996025545
- ISBN-13: 9780996025546
Related categories:
Big Data, Data Science
Ships immediately (in stock: 1)
Product Description
Data and information are the fuel of this new age, and powerful analytics algorithms burn this fuel to generate decisions that are expected to create a smarter and more efficient world for all of us to live in. This new area of technology has been defined as Big Data Science and Analytics, and the industrial and academic communities are recognizing it as a competitive technology that can generate significant new wealth and opportunity.
Big data is defined as collections of datasets whose volume, velocity or variety is so large that it is difficult to store, manage, process and analyze the data using traditional databases and data processing tools. Big data science and analytics deals with the collection, storage, processing and analysis of massive-scale data. Industry surveys by Gartner and e-Skills, for instance, predict that there will be over 2 million job openings for engineers and scientists trained in data science and analytics alone, and that the job market in this area is growing at 150 percent year over year.
We have written this textbook, as part of our expanding "A Hands-On Approach"(TM) series, to meet this need at colleges and universities, and also for big data service providers who may want to offer a broader perspective on this emerging field alongside their customer and developer training programs. The typical reader is expected to have completed a couple of college-level programming courses in traditional high-level languages, and is either a senior or a beginning graduate student in one of the science, technology, engineering or mathematics (STEM) fields. An accompanying website for this book contains additional support for instruction and learning (www.big-data-analytics-book.com).
The book is organized into three main parts, comprising a total of twelve chapters. Part I provides an introduction to big data, applications of big data, and big data science and analytics patterns and architectures. It proposes a novel design methodology for data science and analytics application systems and describes how to realize it using open-source big data frameworks. This methodology describes big data analytics applications as realizations of the proposed Alpha, Beta, Gamma and Delta models, which comprise tools and frameworks for collecting and ingesting data from various sources into the big data analytics infrastructure, distributed filesystems and non-relational (NoSQL) databases for data storage, and processing frameworks for batch and real-time analytics. This new methodology forms the pedagogical foundation of this book.
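The general flow described above (collect and ingest data, store it, then analyze it either in batch or in real time) can be illustrated with a minimal in-memory sketch in Python. Everything in it is a hypothetical stand-in chosen for illustration: a `deque` plays the role of an ingestion tool such as Kafka or Flume, a plain list plays the role of a distributed filesystem or NoSQL store, and the function names are invented. It is not the book's implementation of the Alpha, Beta, Gamma or Delta models.

```python
from collections import deque
from statistics import mean

# Hypothetical stand-ins for the layers named in the text.
ingest_queue = deque()   # collection/ingestion layer (stands in for e.g. Kafka)
storage = []             # storage layer (stands in for e.g. HDFS or a NoSQL store)

def collect(record):
    """Push a raw record from some data source into the ingestion queue."""
    ingest_queue.append(record)

def ingest_and_stream(running):
    """Drain the queue: persist each record and update a real-time view."""
    while ingest_queue:
        record = ingest_queue.popleft()
        storage.append(record)               # persist for later batch jobs
        running["count"] += 1                # real-time path: update incrementally
        running["total"] += record["value"]
    return running

def batch_average(records):
    """Batch path: recompute an aggregate over all stored records."""
    return mean(r["value"] for r in records)

# Simulate a sensor emitting four readings.
for v in [10, 20, 30, 40]:
    collect({"sensor": "s1", "value": v})

live = ingest_and_stream({"count": 0, "total": 0})
print(live["total"] / live["count"])  # real-time running average
print(batch_average(storage))         # batch average over stored records
```

The two paths agree here because both see the same four records; the difference is that the real-time view is updated one record at a time, while the batch job rereads everything in storage.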
Part II introduces the reader to various tools and frameworks for big data analytics, and the architectural and programming aspects of these frameworks, with examples in Python. We describe Publish-Subscribe messaging frameworks (Kafka & Kinesis), Source-Sink connectors (Flume), Database Connectors (Sqoop), Messaging Queues (RabbitMQ, ZeroMQ, RestMQ, Amazon SQS) and custom REST, WebSocket and MQTT-based connectors. The reader is introduced to data storage, batch and real-time analysis, and interactive querying frameworks including HDFS, Hadoop, MapReduce, YARN, Pig, Oozie, Spark, Solr, HBase, Storm, Spark Streaming, Spark SQL, Hive, Amazon Redshift and Google BigQuery. Also described are serving databases (MySQL, Amazon DynamoDB, Cassandra, MongoDB) and the Django Python web framework.
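MapReduce, one of the batch frameworks listed above, splits a job into a map phase, a shuffle that groups intermediate key/value pairs by key, and a reduce phase. The sketch below simulates those three phases in a single Python process for the classic word-count example. It does not use Hadoop's API, and the function names are illustrative only; on a real cluster the mappers and reducers would run in parallel across nodes (for instance, as Python scripts under Hadoop Streaming).

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle/sort: group all values emitted for the same key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reducer: sum the counts emitted for one word."""
    return key, sum(values)

lines = ["big data science", "big data analytics", "data"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)
```

Because each mapper looks at only one line and each reducer at only one key, both phases can be distributed independently; only the shuffle needs to move data between machines.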
Part III introduces the reader to various machine learning algorithms with examples using the Spark MLlib and H2O frameworks, and visualizations using frameworks such as Lightning, Pygal and Seaborn.
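The description does not say which algorithms Part III covers. As one illustrative example of the kind of algorithm that both Spark MLlib and H2O provide, here is k-means clustering (Lloyd's algorithm) written in plain Python for 2-D points. The function, the fixed initial centroids and the toy data are assumptions made for this sketch; it does not use either framework's API.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given initial centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, groups = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

Distributed implementations follow the same two alternating steps, but they compute the per-cluster sums in parallel over partitions of the data and combine them before updating the centroids.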