Algorithms for Data Science
Provisional Chinese title: 數據科學的演算法

Brian Steele, John Chandler, Swarna Reddy

  • Publisher: Springer
  • Publication date: 2016-12-27
  • List price: $3,500
  • Sale price: $2,800 (80% of list price)
  • Language: English
  • Pages: 430
  • Binding: Hardcover
  • ISBN: 3319457950
  • ISBN-13: 9783319457956
  • Related categories: Algorithms-data-structures, Data Science
  • Ships immediately (stock: 1)

Product description

This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations. Problems and data are enormously variable, and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable, so the reader is immersed in Python, R, and real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analyses.
 
This book has three parts:
(a) Data Reduction: Begins with the concepts of data reduction, data maps, and information extraction. The second chapter introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. Practical aspects of distributed computing are the subject of the Hadoop and MapReduce chapter.
(b) Extracting Information from Data: Linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of Healthcare Analytics for an extended example of practical data analytics. The algorithms and analytics will be of much interest to practitioners interested in utilizing the large and unwieldy data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System.
(c) Predictive Analytics: Two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data and uses publicly accessible data streams originating from the Twitter API and the NASDAQ stock market in the tutorials.
This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low, and students with one or two courses in probability or statistics, an exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to all with these prerequisites. The chapters often expand at the close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying levels of difficulty. The text is eminently suitable for self-study and an exceptional resource for practitioners.
 
 
