Data Science at the Command Line: Obtain, Scrub, Explore, and Model Data with Unix Power Tools (Paperback)

Janssens, Jeroen

  • Publisher: O'Reilly
  • Publication date: 2021-09-21
  • List price: $2,250
  • Sale price: $1,980 (12% off)
  • Language: English
  • Pages: 250
  • Binding: Quality paper (also called trade paper)
  • ISBN: 1492087912
  • ISBN-13: 9781492087915
  • Related categories: Command Line, Data Science
  • Ships immediately (stock: 1)

Product Description

This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools--useful whether you work with Windows, macOS, or Linux.

You'll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you're comfortable processing data with Python or R, you'll learn how to greatly improve your data science workflow by leveraging the command line's power. This book is ideal for data scientists, analysts, engineers, system administrators, and researchers.

  • Obtain data from websites, APIs, databases, and spreadsheets
  • Perform scrub operations on text, CSV, HTML, XML, and JSON files
  • Explore data, compute descriptive statistics, and create visualizations
  • Manage your data science workflow
  • Create your own tools from one-liners and existing Python or R code
  • Parallelize and distribute data-intensive pipelines
  • Model data with dimensionality reduction, regression, and classification algorithms
  • Leverage the command line from Python, Jupyter, R, RStudio, and Apache Spark

About the Author

Jeroen Janssens teaches data science: often through training and coaching, occasionally through speaking, and infrequently through writing. His interests include visualizing data, building machine learning models, and automating things using Python, R, or Bash. He is the author of Data Science at the Command Line, published by O'Reilly Media. Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University. Previously, he was an assistant professor at Jheronimus Academy of Data Science and a data scientist at Elsevier in Amsterdam and at various startups in New York City. Currently, Jeroen is the CEO of Data Science Workshops, which organises open enrollment workshops, in-company courses, inspiration sessions, hackathons, and meetups, all related to data science, of course. He lives with his wife and two kids in Rotterdam, the Netherlands.
