Website Scraping with Python: Using BeautifulSoup and Scrapy

Gábor László Hajba

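  • Publisher: Apress
  • Publication date: 2018-09-15
  • List price: $2,330
  • VIP price: $2,214 (5% off)
  • Language: English
  • Pages: 244
  • Binding: Paperback
  • ISBN: 1484239245
  • ISBN-13: 9781484239247
  • Related categories: Python, Web crawler
  • Overseas special-order title (requires separate checkout)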

Product description

Closely examine website scraping and data processing: the technique of extracting data from websites in a format suitable for further analysis. You'll review which tools to use, and compare their features and efficiency. Focusing on BeautifulSoup4 and Scrapy, this concise, focused book highlights common problems and suggests solutions that readers can implement on their own.

 
Website Scraping with Python starts by introducing and installing the scraping tools and explaining the features of the full application that readers will build throughout the book. You'll see how to use BeautifulSoup4 and Scrapy individually or together to achieve the desired results. Because many sites use JavaScript, you'll also employ Selenium with a browser emulator to render these sites and make them ready for scraping.
 
By the end of this book, you'll have a complete scraping application to use and adapt to your needs. As a bonus, the author shows you options for deploying your spiders to the cloud, freeing your own computer from long-running scraping tasks.

What You'll Learn
  • Install and implement scraping tools individually and together
  • Run spiders from the cloud to crawl websites for data
  • Work with emulators and drivers to extract data from scripted sites
 
Who This Book Is For
 
Readers with some previous Python and software development experience, and an interest in website scraping.
