Using OpenRefine

Ruben Verborgh, Max De Wilde

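  • Publisher: Packt Publishing
  • Publication date: 2013-09-10
  • List price: $1,660
  • VIP price: $1,577 (5% off)
  • Language: English
  • Pages: 114
  • Binding: Paperback
  • ISBN: 1783289082
  • ISBN-13: 9781783289080
  • Overseas special-order title (must be checked out separately)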

Product description

With this book on OpenRefine, managing and cleaning your large datasets suddenly gets a lot easier! With a cookbook approach and free datasets included, you'll quickly and painlessly improve your data management skills.

Overview

  • Create links between your dataset and others in an instant
  • Effectively transform data with regular expressions and the General Refine Expression Language
  • Spot issues in your dataset and take effective action with just a few clicks

In Detail

Data is supposed to be the new gold, but how can you unlock the value in your data? Managing large datasets used to be a task for specialists, but you don't have to worry about inconsistencies or errors anymore. OpenRefine lets you clean, link, and publish your dataset with ease.

Using OpenRefine takes you on a practical tour of all the handy features of this well-known data transformation tool. It is a hands-on recipe book that teaches you data techniques by example. Starting from the basics, it gradually transforms you into an OpenRefine expert.

This book will teach you all the necessary skills to handle any large dataset and to turn it into high-quality data for the Web. After you learn how to analyze data and spot issues, we'll see how to solve them to obtain a clean dataset. Messy and inconsistent data is recovered through advanced techniques such as automated clustering. We'll then show how to extract links from keyword and full-text fields using reconciliation and named-entity extraction.

Using OpenRefine is more than a manual: it's a guide stuffed with tips and tricks to get the best out of your data.

What you will learn from this book

  • Import data in various formats
  • Explore datasets in a matter of seconds
  • Apply basic and advanced cell transformations
  • Deal with cells that contain multiple values
  • Create instantaneous links between datasets
  • Filter and partition your data easily with regular expressions
  • Use named-entity extraction on full-text fields to automatically identify topics
  • Perform advanced data operations with the General Refine Expression Language

Approach

The book is styled as a cookbook: its recipes, combined with free datasets, aim to turn readers into proficient OpenRefine users as quickly as possible.

Who this book is written for

This book is targeted at anyone who works with large amounts of data. No prior knowledge of OpenRefine is required, as we start from the very beginning and gradually reveal more advanced features. You don't even need your own dataset, as we provide example data to try out the book's recipes.
