Data and Text Processing for Health and Life Sciences
Provisional Chinese title: 健康與生命科學的數據與文本處理

Couto, Francisco M.

  • Publisher: Springer
  • Publication date: 2019-06-25
  • Price: $6,330
  • VIP price: $6,014 (5% off)
  • Language: English
  • Pages: 98
  • Binding: Hardcover (also called cloth, retail trade, or trade)
  • ISBN: 3030138445
  • ISBN-13: 9783030138448
  • Overseas special-order title (requires separate checkout)

Product description

This open access book is a step-by-step introduction to how shell scripting can help solve many of the data processing tasks that Health and Life specialists face every day, with minimal software dependencies. The examples presented in the book show how simple command line tools can be used and combined to retrieve data and text from web resources, to filter and mine literature, and to explore the semantics encoded in biomedical ontologies. To store data, this book relies on open standard text file formats, such as TSV, CSV, XML, and OWL, that can be opened by any text editor or spreadsheet application.

The first two chapters, Introduction and Resources, provide a brief introduction to shell scripting and describe popular data resources in Health and Life Sciences. The third chapter, Data Retrieval, starts by introducing a common data processing task that involves multiple data resources. It then explains how to automate each step of that task by introducing the required command line tools one by one. The fourth chapter, Text Processing, shows how to filter and analyze text by using simple string matching techniques and regular expressions. The last chapter, Semantic Processing, shows how XPath queries and shell scripting can process complex data, such as the graphs used to specify ontologies.
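
To give a flavor of the kind of text filtering described for the fourth chapter, here is a minimal sketch that is not taken from the book. It assumes a tab-separated input with an ID in the first column and a title in the second; the sample rows and the function names `make_sample` and `filter_ids` are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical sample data (not from the book): ID and title, tab-separated.
make_sample() {
    printf '%s\t%s\n' \
        '1001' 'Caffeine intake and malignant hyperthermia' \
        '1002' 'Gene expression in yeast' \
        '1003' 'Hyperthermia treatment outcomes' \
        '1004' 'Caffeine metabolism in adults'
}

# Read TSV rows from standard input, keep the rows that match an extended
# regular expression (case-insensitive), and print only the ID column.
# Note: grep matches anywhere in the row, not just in the title column.
filter_ids() {
    grep -i -E "$1" | cut -f 1
}

# Combine the two steps with a pipe.
# Prints 1001, 1003 and 1004, one per line.
make_sample | filter_ids 'caffeine|hyperthermia'
```

Each step is a small standard tool (`printf`, `grep`, `cut`) joined by a pipe, so any step can be run and inspected on its own before the steps are combined.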

Shell scripting has remained almost unchanged for more than four decades and is available on most personal computers. It is also relatively easy for Health and Life specialists to learn as a sequence of independent commands. Comprehending them is like conducting a new laboratory protocol: testing and understanding its procedural steps and variables, and combining their intermediate results. Thus, this book is particularly relevant to Health and Life specialists or students who want to easily learn how to process data and text, which in turn may facilitate and inspire them to acquire deeper bioinformatics skills in the future.


About the author

Francisco M. Couto is currently an associate professor with habilitation and vice-president of the Department of Informatics of FCUL, a member of the coordination board of the master's in Bioinformatics and Computational Biology, and a member of LASIGE, where he coordinates the XLDB research group and the Biomedical Informatics research line. He was an invited researcher at EBI, AFMB-CNRS, and BioAlma during his doctoral studies. He received the Young Engineer Innovation Prize 2004 from the Portuguese Engineers Guild, and an honorable mention in the Scientific Prizes of Universidade de Lisboa in 2017.

