Hands-On Web Scraping with Python
Provisional Chinese title: 使用 Python 進行實作網頁爬蟲

Chapagain, Anish

Product Description

Web scraping is an essential technique used in many organizations to gather valuable data from web pages. This book will enable you to delve into web scraping techniques and methodologies.

The book introduces the fundamental concepts of web scraping and shows how they apply to multiple sets of web pages. You'll use powerful libraries from the Python ecosystem, such as Scrapy, lxml, pyquery, and bs4 (Beautiful Soup), to carry out scraping operations. You will then get up to speed with simple to intermediate scraping operations, such as identifying information on web pages and using patterns or attributes to retrieve it. The book takes a practical approach to web scraping concepts and tools, guiding you through a series of use cases and showing you how to use the best tools and techniques to scrape web pages efficiently. You'll also cover other popular web scraping tools and techniques, such as Selenium, regular expressions (regex), and web-based APIs.
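As a rough illustration of what "using patterns or attributes to retrieve information" can look like, here is a minimal sketch. It is not taken from the book. To stay runnable without installing anything, it uses Python's standard-library `html.parser` and `re` modules rather than the libraries named above, and the HTML fragment, class names, and function names are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical page fragment, invented for this sketch.
SAMPLE_HTML = """
<ul>
  <li class="book" data-price="39.99"><a href="/books/1">Book One</a></li>
  <li class="book" data-price="24.50"><a href="/books/2">Book Two</a></li>
  <li class="ad"><a href="/promo">Promo</a></li>
</ul>
"""


class BookPriceParser(HTMLParser):
    """Collect the data-price attribute of every <li class="book"> element."""

    def __init__(self):
        super().__init__()
        self.prices = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Attribute-based selection: keep only list items marked as books.
        if tag == "li" and attributes.get("class") == "book":
            self.prices.append(float(attributes["data-price"]))


def prices_by_attribute(html):
    """Return book prices found by inspecting element attributes."""
    parser = BookPriceParser()
    parser.feed(html)
    return parser.prices


def links_by_pattern(html):
    """Return href values that match the pattern /books/<digits>."""
    return re.findall(r'href="(/books/\d+)"', html)


if __name__ == "__main__":
    print(prices_by_attribute(SAMPLE_HTML))
    print(links_by_pattern(SAMPLE_HTML))
```

On real pages, a dedicated parser such as lxml or bs4 is usually more robust than a regular expression, which breaks easily when the markup changes.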

By the end of this book, you will have learned how to efficiently scrape the web using different techniques with Python and other popular tools.
