Website Scraping with Python: Using BeautifulSoup and Scrapy

Gábor László Hajba

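  • Publisher: Apress
  • Publication date: 2018-09-15
  • List price: $2,370
  • VIP price: $2,252 (5% off)
  • Language: English
  • Pages: 244
  • Binding: Paperback
  • ISBN: 1484239245
  • ISBN-13: 9781484239247
  • Categories: Python, Web crawler
  • Overseas special-order title (requires separate checkout)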


Product Description

Closely examine website scraping and data processing: the technique of extracting data from websites in a format suitable for further analysis. You'll review which tools to use, and compare their features and efficiency. Focusing on BeautifulSoup4 and Scrapy, this concise, focused book highlights common problems and suggests solutions that readers can implement on their own.
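To make "extracting data from websites in a format suitable for further analysis" concrete, here is a minimal sketch. It is not taken from the book: it uses only Python's standard-library `html.parser` as a dependency-free stand-in for BeautifulSoup4 or Scrapy, and the sample markup, the `BookParser` class, and the `extract_books` helper are all hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML first.
SAMPLE_HTML = """
<ul>
  <li class="book"><span class="title">Book A</span><span class="price">10</span></li>
  <li class="book"><span class="title">Book B</span><span class="price">25</span></li>
</ul>
"""


class BookParser(HTMLParser):
    """Collects title/price pairs from <span> elements in the sample markup."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per <li class="book">
        self._field = None  # name of the span currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "book":
            self.rows.append({})  # start a new record
        elif tag == "span" and attrs.get("class") in ("title", "price"):
            self._field = attrs["class"]

    def handle_data(self, data):
        # Only keep text that appears inside a title or price span.
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def extract_books(html):
    """Return a list of {"title": str, "price": int} records from the HTML."""
    parser = BookParser()
    parser.feed(html)
    return [{"title": r["title"], "price": int(r["price"])} for r in parser.rows]


books = extract_books(SAMPLE_HTML)
```

The result is a plain list of dictionaries, which can be passed on to whatever analysis step follows. Dedicated libraries such as BeautifulSoup4 offer far more convenient ways to select elements than this hand-written parser.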

 
Website Scraping with Python starts by introducing and installing the scraping tools and explaining the features of the full application that readers will build throughout the book. You'll see how to use BeautifulSoup4 and Scrapy individually or together to achieve the desired results. Because many sites use JavaScript, you'll also employ Selenium with a browser emulator to render these sites and make them ready for scraping.
 
By the end of this book, you'll have a complete scraping application to use and rewrite to suit your needs. As a bonus, the author shows you options for deploying your spiders to the cloud to free your computer from long-running scraping tasks.

 

 
What You'll Learn
  • Install and implement scraping tools individually and together
  • Run spiders from the cloud to crawl websites for data
  • Work with emulators and drivers to extract data from scripted sites
 
Who This Book Is For
 
Readers with some previous Python and software development experience, and an interest in website scraping.

