Thinking in Pandas: How to Use the Python Data Analysis Library the Right Way

Stepanek, Hannah

Product Description

Understand and implement big data analysis solutions in pandas with an emphasis on performance. This book strengthens your intuition for working with pandas, the Python data analysis library, by exploring its underlying implementation and data structures.

Thinking in Pandas introduces the topic of big data and demonstrates concepts by looking at exciting and impactful projects in which pandas played a part. From there, you will learn to assess your own projects by size and type to see whether pandas is the appropriate library for your needs. Author Hannah Stepanek explains how to load and normalize data in pandas efficiently, and reviews some of the most commonly used loaders and several of their most powerful options. You will then learn how to access and transform data efficiently, which methods to avoid, and when to employ more advanced performance techniques. You will also go over basic data access and munging in pandas and its intuitive dictionary syntax. The book also covers choosing the right DataFrame format, working with multi-level DataFrames, and how pandas might be improved in the future.
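As a rough illustration of what "loader options" can mean in practice, here is a minimal sketch using `pandas.read_csv`. The description does not name specific loaders or options, so the choice of `read_csv`, `usecols`, and `dtype` is an assumption made for this example, and the sample data and the `load_readings` helper are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample data, invented purely for illustration.
CSV_TEXT = """id,city,temp_c,notes
1,Corvallis,11.5,clear
2,Portland,12.0,rain
3,Corvallis,9.5,fog
"""


def load_readings(source):
    """Load only the needed columns and request compact dtypes up front."""
    return pd.read_csv(
        source,
        usecols=["id", "city", "temp_c"],  # skip the unused 'notes' column
        dtype={"id": "int32", "city": "category", "temp_c": "float32"},
    )


readings = load_readings(io.StringIO(CSV_TEXT))
print(readings.dtypes)
```

Declaring column types at load time avoids a second conversion pass over the data, and skipping unused columns keeps them out of memory altogether.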

By the end of the book, you will have a solid understanding of how the pandas library works under the hood. Get ready to make confident decisions in your own projects by utilizing pandas--the right way.


What You Will Learn

  • Understand the underlying data structure of pandas and why it performs the way it does under certain circumstances
  • Discover how to use pandas to extract, transform, and load data correctly with an emphasis on performance
  • Choose the right DataFrame format so that data analysis is simple and efficient
  • Improve performance of pandas operations with other Python libraries
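The points above about data structures and other libraries can be sketched as follows. The description does not say which libraries the book uses, so NumPy is an assumption here, and the data and variable names are hypothetical. The sketch contrasts an element-by-element `apply` with a single vectorized operation on the underlying array.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration.
frame = pd.DataFrame({"temp_c": [11.5, 12.0, 9.5]})

# Dictionary-style access returns the column as a Series.
temps = frame["temp_c"]

# Element-by-element conversion: calls a Python function once per value.
per_element = temps.apply(lambda c: c * 9 / 5 + 32)

# Vectorized conversion: one arithmetic expression over the NumPy array
# that backs the column.
vectorized = temps.to_numpy() * 9 / 5 + 32

# Both approaches produce the same values.
print(np.allclose(per_element.to_numpy(), vectorized))
```

On large columns the vectorized form is typically much faster, because the loop runs in compiled code rather than in the Python interpreter.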


Who This Book Is For

Software engineers with basic programming skills in Python who are keen on using pandas for a big data analysis project, and Python software developers interested in big data.

About the Author

Hannah Stepanek is a software developer with a passion for performance and is an open source advocate. She has over seven years of industry experience programming in Python and spent about two of those years implementing a data analysis project using pandas.

Hannah was born and raised in Corvallis, OR, and graduated from Oregon State University with a major in Electrical and Computer Engineering. She enjoys engaging with the software community, often giving talks at local meetups as well as larger conferences. In early 2019, she spoke at PyCon US about the pandas library and at OpenCon Cascadia about the benefits of open source software. In her spare time, she enjoys riding her horse, Sophie, and playing board games.
