Genomics in the Cloud: GATK, Spark, and Docker

Brian D. O'Connor, Geraldine Van der Auwera

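  • Publisher: O'Reilly
  • Publication date: 2020-05-12
  • List price: $2,800
  • Sale price: $2,464 (12% off; limited-time offer through 2025-02-02)
  • Language: English
  • Pages: 475
  • Binding: Paperback
  • ISBN: 1491975199
  • ISBN-13: 9781491975190
  • Related categories: Docker, Spark
  • Ships immediately (stock: 1)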

Product description

Data in the genomics field is booming. In just a few years, organizations such as the National Institutes of Health (NIH) will host 50+ petabytes—or 52.4 million gigabytes—of genomic data, and they’re turning to cloud infrastructure to make that data available to the research community. How do you adapt analysis tools and protocols to access and analyze that data in the cloud?

With this practical book, researchers will learn how to work with genomics algorithms using open source tools including the Genome Analysis Toolkit (GATK), Docker, WDL, and Terra. Brian O’Connor of the UC Santa Cruz Genomics Institute and Geraldine Van der Auwera, longtime custodian of the GATK user community, guide you through the process. You’ll learn by working with real data and genomics algorithms from the field.

This book takes you through:

  • Essential genomics and computing technology background
  • Basic cloud computing operations
  • Getting started with GATK
  • Three major GATK best practices for variant discovery pipelines
  • Automating analysis with scripted workflows using WDL and Cromwell
  • Scaling up workflow execution in the cloud, including parallelization and cost optimization
  • Interactive analysis in the cloud using Jupyter notebooks
  • Secure collaboration and computational reproducibility using Terra

About the authors

Dr. Geraldine A. Van der Auwera is the Director of Outreach and Communication for the Data Sciences Platform (DSP) at the Broad Institute of MIT and Harvard. As part of her outreach role, she serves as an educator and advocate for researchers who use DSP software and services including GATK, the Broad's industry-leading toolkit for variant discovery analysis; the Cromwell/WDL workflow management system; and Terra.bio, a cloud-based analysis platform that integrates computational resources, methods repository and data management in a user-friendly environment. Van der Auwera was originally trained as a microbiologist, earning her Ph.D. in Biological Engineering from the Université catholique de Louvain (UCL) in Belgium in 2007, then surviving a 4-year postdoctoral stint at Harvard Medical School. She joined the Broad Institute in 2012 to become Benevolent Dictator For Life of the GATK user community, leaving behind the bench and pipette work forever.

Dr. Brian O’Connor is the Technical Director of the UCSC Genomics Institute Analysis Core. There he focuses on the development and deployment of large-scale, cloud-based systems for analyzing genomics data. This includes the Toil workflow execution platform, which is designed to run genomic pipelines on a wide range of cloud environments including AWS, Azure, Google and OpenStack, and ADAM, a distributed genomics platform developed in collaboration with UC Berkeley. He is also the co-chair of the Containers and Workflows task team of the Global Alliance for Genomics and Health (GA4GH) where he works on tool and workflow container standards. Brian recently joined UCSC from the Ontario Institute for Cancer Research (OICR) where his previous projects included leading the technical implementation of cloud-based analysis systems for the PanCancer Analysis of Whole Genomes (PCAWG) effort, the creation of the Dockstore project (http://dockstore.org), and the development of the International Cancer Genome Consortium’s Data Portal (http://dcc.icgc.org).
