Statistics for Data Science and Analytics

Bruce, Peter C., Gedeck, Peter, Dobbins, Janet

  • Publisher: Wiley
  • Publication date: 2024-09-04
  • Price: $4,060
  • VIP price: $3,857 (5% off)
  • Language: English
  • Pages: 384
  • Binding: Hardcover (also called cloth, retail trade, or trade)
  • ISBN: 139425380X
  • ISBN-13: 9781394253807
  • Related categories: Probability and Statistics, Data Science
  • Overseas special-order title (requires separate checkout)

Product description

Introductory statistics textbook with a focus on data science topics such as prediction, correlation, and data exploration

Statistics for Data Science and Analytics is a comprehensive guide to statistical analysis using Python, presenting important topics useful for data science such as prediction, correlation, and data exploration. The authors provide an introduction to statistical science and big data, as well as an overview of Python data structures and operations.

A range of statistical techniques are presented with their implementation in Python, including hypothesis testing, probability, exploratory data analysis, categorical variables, surveys and sampling, A/B testing, and correlation. The text introduces binary classification, a foundational element of machine learning, validation of statistical models by applying them to holdout data, and probability and inference via the easy-to-understand method of resampling and the bootstrap instead of using a myriad of "kitchen sink" formulas. Regression is taught both as a tool for explanation and for prediction.
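As a rough illustration of the resampling idea mentioned above, here is a minimal percentile-bootstrap sketch. It is not taken from the book: the function name `bootstrap_ci`, the sample values, and the parameter defaults are all hypothetical, and only the Python standard library is used.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` (hypothetical helper, not from the book)."""
    rng = random.Random(seed)  # seeded generator so the sketch is reproducible
    n = len(data)
    # Resample with replacement, recompute the statistic each time, then sort.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Made-up measurements, used only to exercise the function.
sample = [12.1, 9.8, 11.4, 10.3, 13.7, 8.9, 10.9, 12.5, 11.0, 9.5]
low, high = bootstrap_ci(sample)
print(f"approximate 95% interval for the mean: ({low:.2f}, {high:.2f})")
```

The interval comes directly from the spread of the resampled means, so no closed-form standard-error formula is needed.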

This book is informed by the authors' experience designing and teaching both introductory statistics and machine learning at Statistics.com. Each chapter includes practical examples, explanations of the underlying concepts, and Python code snippets to help readers apply the techniques themselves.

Statistics for Data Science and Analytics includes information on sample topics such as:

  • Int, float, and string data types, numerical operations, manipulating strings, converting data types, and advanced data structures like lists, dictionaries, and sets
  • Experiment design via randomizing, blinding, and before-after pairing, as well as proportions and percents when handling binary data
  • Specialized Python packages like numpy, scipy, pandas, scikit-learn and statsmodels--the workhorses of data science--and how to get the most value from them
  • Statistical versus practical significance, random number generators, functions for code reuse, and binomial and normal probability distributions
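To make a few of the listed topics concrete (random number generators, functions for code reuse, and the binomial distribution), the following sketch compares an exact binomial probability with a simulated estimate. It is an illustration under assumed names and parameters (`binomial_pmf`, `simulate_proportion`, 10 coin flips), not code from the book.

```python
import math
import random

def binomial_pmf(k, n, p):
    """Exact probability of k successes in n independent trials with success rate p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def simulate_proportion(k, n, p, trials=20000, seed=1):
    """Estimate the same probability by simulation (hypothetical helper)."""
    rng = random.Random(seed)  # seeded random number generator
    hits = 0
    for _ in range(trials):
        # Count successes in one simulated experiment of n trials.
        successes = sum(rng.random() < p for _ in range(n))
        if successes == k:
            hits += 1
    return hits / trials

exact = binomial_pmf(5, 10, 0.5)
approx = simulate_proportion(5, 10, 0.5)
print(f"exact: {exact:.4f}, simulated: {approx:.4f}")
```

Wrapping each calculation in a function keeps the exact and simulated versions easy to reuse with other values of `k`, `n`, and `p`.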

Written by and for data science instructors, Statistics for Data Science and Analytics is an excellent learning resource for programs that require an introductory statistics course, as well as for students and professionals seeking to transition to the data science field.


About the authors

Peter C. Bruce is Founder of the Institute for Statistics Education at Statistics.com, now part of Elder Research, Inc. He is the developer of Resampling Stats software, and the author or co-author of a number of peer-reviewed articles and several books.

Peter Gedeck, PhD, is a scientist in the research informatics team at Collaborative Drug Discovery, specializing in the development of machine learning algorithms to predict biological and physicochemical properties of drug candidates.

Janet Dobbins is the Chair of the Board of Directors for Data Community DC, a non-profit 501(c)(3) corporation committed to promoting data science by fostering education, opportunity, and professional development through high-quality community-driven events. She previously served as the Vice President of Business Development and Strategic Partnership at The Institute for Statistics Education at Statistics.com.

Bruce and Gedeck are part of the author teams for the best-selling books Machine Learning for Business Analytics (Wiley) and Practical Statistics for Data Scientists (O'Reilly).
