Statistics for Data Science and Analytics
Provisional Chinese title: 數據科學與分析的統計學

Bruce, Peter C., Gedeck, Peter, Dobbins, Janet

  • Publisher: Wiley
  • Publication date: 2024-09-04
  • Price: $4,130
  • VIP price: $3,924 (5% off)
  • Language: English
  • Pages: 384
  • Binding: Hardcover - also called cloth, retail trade, or trade
  • ISBN: 139425380X
  • ISBN-13: 9781394253807
  • Related categories: Probability and Statistics, Data Science
  • Overseas special-order title (requires separate checkout)

Product description

Introductory statistics textbook with a focus on data science topics such as prediction, correlation, and data exploration

Statistics for Data Science and Analytics is a comprehensive guide to statistical analysis using Python, presenting important topics useful for data science such as prediction, correlation, and data exploration. The authors provide an introduction to statistical science and big data, as well as an overview of Python data structures and operations.

A range of statistical techniques are presented with their implementation in Python, including hypothesis testing, probability, exploratory data analysis, categorical variables, surveys and sampling, A/B testing, and correlation. The text introduces binary classification, a foundational element of machine learning, validation of statistical models by applying them to holdout data, and probability and inference via the easy-to-understand method of resampling and the bootstrap instead of using a myriad of "kitchen sink" formulas. Regression is taught both as a tool for explanation and for prediction.
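
As a rough illustration of the resampling idea mentioned above, the following minimal sketch computes a percentile bootstrap interval for a mean using only the Python standard library. It is not taken from the book: the function name `bootstrap_mean_ci`, its parameters, and the sample values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, confidence=0.90, seed=1):
    """Percentile bootstrap interval for the mean (illustrative sketch)."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Draw n values from the data with replacement and record their mean.
        resample = rng.choices(data, k=n)
        means.append(statistics.fmean(resample))
    means.sort()
    # Cut off an equal share of resampled means in each tail.
    tail = (1 - confidence) / 2
    lower = means[int(tail * n_resamples)]
    upper = means[int((1 - tail) * n_resamples) - 1]
    return lower, upper


# Made-up measurements, for illustration only.
sample = [12.1, 9.8, 11.4, 10.7, 13.2, 8.9, 10.1, 12.6, 11.0, 9.5]
low, high = bootstrap_mean_ci(sample)
print(f"sample mean: {statistics.fmean(sample):.2f}")
print(f"90% bootstrap interval: ({low:.2f}, {high:.2f})")
```

The interval comes directly from the spread of the resampled means, so no distribution-specific formula is needed, which is the appeal of the approach described in the paragraph above.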

This book is informed by the authors' experience designing and teaching both introductory statistics and machine learning at Statistics.com. Each chapter includes practical examples, explanations of the underlying concepts, and Python code snippets to help readers apply the techniques themselves.

Statistics for Data Science and Analytics includes information on sample topics such as:

  • Int, float, and string data types, numerical operations, manipulating strings, converting data types, and advanced data structures like lists, dictionaries, and sets
  • Experiment design via randomizing, blinding, and before-after pairing, as well as proportions and percents when handling binary data
  • Specialized Python packages like numpy, scipy, pandas, scikit-learn and statsmodels--the workhorses of data science--and how to get the most value from them
  • Statistical versus practical significance, random number generators, functions for code reuse, and binomial and normal probability distributions
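
Several of the listed topics (random number generators, functions for code reuse, and the binomial distribution) can be combined in one short sketch. The example below is an assumption-laden illustration rather than material from the book: the helper names `binomial_pmf` and `simulate_binomial`, the seed, and the chosen numbers are all hypothetical.

```python
import math
import random


def binomial_pmf(k, n, p):
    """Exact probability of k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_binomial(k, n, p, trials=20000, seed=7):
    """Estimate the same probability by simulating the n trials many times."""
    rng = random.Random(seed)  # seeded random number generator
    hits = 0
    for _ in range(trials):
        # Count successes in one simulated run of n trials.
        successes = sum(rng.random() < p for _ in range(n))
        if successes == k:
            hits += 1
    return hits / trials


# Probability of exactly 3 heads in 10 fair coin flips.
exact = binomial_pmf(3, 10, 0.5)
estimate = simulate_binomial(3, 10, 0.5)
print(f"exact: {exact:.4f}, simulated: {estimate:.4f}")
```

Wrapping each calculation in a function lets the same code be reused for other values of `k`, `n`, and `p`, and the simulated estimate should land close to the exact value as the number of trials grows.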

Written by and for data science instructors, Statistics for Data Science and Analytics is an excellent learning resource for data science instructors prescribing a required intro stats course for their programs, as well as other students and professionals seeking to transition to the data science field.

About the authors

Peter C. Bruce is Founder of the Institute for Statistics Education at Statistics.com, now part of Elder Research, Inc. He is the developer of Resampling Stats software, and the author or co-author of a number of peer-reviewed articles and several books.

Peter Gedeck, PhD, is a scientist on the research informatics team at Collaborative Drug Discovery, specializing in the development of machine learning algorithms to predict biological and physicochemical properties of drug candidates.

Janet Dobbins is the Chair of the Board of Directors for Data Community DC, a non-profit 501(c)(3) corporation committed to promoting data science by fostering education, opportunity, and professional development through high-quality community-driven events. She previously served as the Vice President of Business Development and Strategic Partnership at The Institute for Statistics Education at Statistics.com.

Bruce and Gedeck are part of the author teams for the best-selling books Machine Learning for Business Analytics (Wiley) and Practical Statistics for Data Scientists (O'Reilly).
