R Web Scraping Quick Start Guide: Techniques and tools to crawl and scrape data from websites
Olgun Aydin
- Publisher: Packt Publishing
- Publication date: 2018-10-31
- Price: $1,470
- Language: English
- Pages: 114
- Binding: Paperback
- ISBN: 1789138736
- ISBN-13: 9781789138733
Related categories:
R language, web crawlers
Product Description
Web scraping techniques are becoming more popular, since data is as valuable as oil in the 21st century. This book gives you key knowledge about using XPath, regular expressions (regex), and R web scraping libraries such as rvest and RSelenium.
Key Features
- Techniques, tools and frameworks for web scraping with R
- Scrape data effortlessly from a variety of websites
- Learn how to select exactly the data you want to scrape and build your own dataset
Book Description
Web scraping is a technique for extracting data from websites. It simulates the behavior of a website user, turning the website itself into a web service from which data can be retrieved or submitted. This book gives you everything you need to get started scraping web pages with R.
You will learn the rules of regex and XPath, the key components for scraping website data. We will show you web scraping techniques, methodologies, and frameworks. With this book's guidance, you will become comfortable with the tools for writing and testing regex and XPath rules.
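As a rough illustration of how an XPath rule and a regex rule divide the work (XPath locates elements in the page tree, regex pulls a pattern out of their text), here is a minimal sketch. The book itself works in R with rvest; this stand-in uses only Python's standard library (`xml.etree.ElementTree`, which supports a limited subset of XPath, and `re`). The page markup, class names, and the `scrape_books` helper are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML fragment standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <div class="book">
      <h2 class="title">Book A</h2>
      <span class="price">Price: $12.50</span>
    </div>
    <div class="book">
      <h2 class="title">Book B</h2>
      <span class="price">Price: $8.00</span>
    </div>
  </body>
</html>
"""

# Regex rule: capture the number after "$" in strings such as "Price: $12.50".
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?)")


def scrape_books(markup):
    """Return (title, price) pairs using XPath-style paths plus a regex."""
    root = ET.fromstring(markup.strip())
    books = []
    # ElementTree's XPath subset includes descendant search (.//) and
    # attribute-equality predicates ([@class='...']), which is all we need here.
    for node in root.findall(".//div[@class='book']"):
        title = node.findtext("h2[@class='title']")
        price_text = node.findtext("span[@class='price']")
        match = PRICE_RE.search(price_text or "")
        price = float(match.group(1)) if match else None
        books.append((title, price))
    return books


if __name__ == "__main__":
    for title, price in scrape_books(PAGE):
        print(title, price)
```

Real-world HTML is often not well-formed XML, which is one reason dedicated HTML parsers such as rvest (built on xml2) are used in practice.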
We will focus on examples of scraping data from dynamic websites and on how to implement the techniques learned. You will learn how to collect URLs and then create XPath rules for your first web scraping script using the rvest library. From the data you collect, you will be able to calculate statistics and create R plots to visualize them.
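Collecting URLs usually means reading each link's `href`, resolving relative paths against the page's own address, and dropping duplicates before scraping each page. The sketch below does this with Python's standard `html.parser` and `urllib.parse.urljoin` rather than rvest, then computes simple descriptive statistics with the `statistics` module. The `LinkCollector`, `collect_urls`, and `summarize` names and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser
from statistics import mean, median
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                absolute = urljoin(self.base_url, value)
                # Keep the first occurrence of each URL, preserving order.
                if absolute not in self.urls:
                    self.urls.append(absolute)


def collect_urls(html, base_url):
    """Return the de-duplicated absolute links found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


def summarize(values):
    """Basic descriptive statistics for a list of scraped numbers."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    page = ('<a href="item/1">One</a> <a href="/item/2">Two</a>'
            ' <a href="item/1">One again</a>')
    print(collect_urls(page, "https://example.com/catalog/"))
    print(summarize([12.5, 8.0, 9.5]))
```

Note that `urljoin` treats `item/1` as relative to the `catalog/` directory but `/item/2` as relative to the site root, which is exactly the distinction a browser makes.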
Finally, you will discover how to use Selenium drivers with R for more sophisticated scraping. You will create AWS instances and use R to connect to a PostgreSQL database hosted on AWS. By the end of the book, you will be confident enough to build end-to-end web scraping systems with R.
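The book stores its results in PostgreSQL on AWS. To show the general storage pattern without a database server, this sketch swaps in Python's built-in `sqlite3` module with an in-memory database; the `books` table and `store_books` helper are hypothetical. Parameterized inserts avoid quoting problems with scraped text, and `INSERT OR REPLACE` (SQLite-specific syntax; PostgreSQL uses `INSERT ... ON CONFLICT` instead) keeps repeated runs, such as those from a scheduled job, from duplicating rows.

```python
import sqlite3


def store_books(conn, rows):
    """Create the table if needed and upsert (title, price) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        " title TEXT PRIMARY KEY,"
        " price REAL)"
    )
    # "?" placeholders let the driver handle quoting of scraped strings.
    # A row whose title already exists is replaced rather than duplicated.
    conn.executemany(
        "INSERT OR REPLACE INTO books (title, price) VALUES (?, ?)", rows
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_books(conn, [("Book A", 12.5), ("Book B", 8.0)])
    store_books(conn, [("Book A", 11.0)])  # a later scrape updates the price
    print(conn.execute("SELECT title, price FROM books ORDER BY title").fetchall())
    conn.close()
```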
What you will learn
- Write regex rules
- Write XPath rules to query your data
- Learn how web scraping methods work
- Use rvest to crawl web pages
- Store data retrieved from the web
- Learn the key uses of RSelenium for scraping data
Who this book is for
This book is for R programmers who want to get started quickly with web scraping, as well as data analysts who want to learn scraping using R. Basic knowledge of R is all you need to get started with this book.
Table of Contents
- Introduction to Web Scraping
- XML Path Language and Regular Expression Language
- Web Scraping with rvest
- Web Scraping with Rselenium
- Storing Data and Creating Cronjob