SQL for Data Science: Data Cleaning, Wrangling and Analytics with Relational Databases
Badia, Antonio
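- Publisher: Springer
- Publication date: 2020-11-10
- List price: $2,590
- VIP price: 5% off, $2,461
- Language: English
- Pages: 285
- Binding: Quality Paper - also called trade paper
- ISBN: 3030575918
- ISBN-13: 9783030575915
- Related categories: SQL, Databases, Data Science
Overseas special-order book (requires separate checkout)
Product Description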
This textbook explains SQL within the context of data science and introduces the different parts of SQL as they are needed for the tasks usually carried out during data analysis. Using the framework of the data life cycle, it focuses on the steps that traditional textbooks often give short shrift, such as data loading, cleaning and pre-processing.
The book is organized as follows. Chapter 1 describes the data life cycle, i.e., the sequence of stages, from acquisition to archiving, that data goes through as it is prepared and then analyzed, together with the activities that take place at each stage. Chapter 2 turns to databases proper, explaining how relational databases organize data. Non-traditional data, such as XML and text, are also covered. Chapter 3 introduces SQL queries, but unlike traditional textbooks, it describes queries and their parts around typical data analysis tasks such as data exploration, cleaning and transformation. Chapter 4 introduces some basic techniques for data analysis and shows how SQL can be used for simple analyses without much complication. Chapter 5 introduces additional SQL constructs that are important in a variety of situations and thus completes the coverage of SQL queries. Finally, Chapter 6 briefly explains how to use SQL from within R and Python programs. It focuses on how these languages can interact with a database, and on how what has been learned about SQL can be leveraged to make life easier when using R or Python. All chapters contain numerous examples and exercises along the way. Readers are encouraged to install the two open-source database systems used throughout the book (MySQL and Postgres) and to work through the exercises, because simply reading the book is much less useful than actually practicing with it.
This book is for anyone interested in data science and/or databases. It requires only a bit of computer fluency and no specific background in databases or data analysis. All concepts are introduced intuitively and with a minimum of specialized jargon. After working through this book, readers should be able to profitably learn more about data mining, machine learning, and database management from more advanced textbooks and courses.
About the Author
Antonio Badia is an Associate Professor in the Department of Computer Science and Engineering at the University of Louisville, KY, USA. He has taught both introductory and advanced college database courses for more than 20 years, and created and taught a course on data management and analysis for non-computer science students. His research on database systems has been funded by NSF and others, and has produced more than 50 publications in conferences and technical journals.