Bioinformatics: Managing Scientific Data (Hardcover)

Zoe Lacroix, Terence Critchlow

  • Publisher: Morgan Kaufmann
  • Publication date: 2003-07-18
  • List price: $1,980
  • Sale price: $990 (50% off)
  • Language: English
  • Pages: 441
  • Binding: Hardcover
  • ISBN: 885440649X
  • ISBN-13: 9781558608290
  • Related category: Bioinformatics
  • Ships immediately

Product description

Life science data integration and interoperability is one of the most challenging problems facing bioinformatics today. In the current age of the life sciences, investigators have to interpret many types of information from a variety of sources: lab instruments, public databases, gene expression profiles, raw sequence traces, single nucleotide polymorphisms, chemical screening data, proteomic data, putative metabolic pathway models, and many others. Unfortunately, scientists are not currently able to easily identify and access this information because of the variety of semantics, interfaces, and data formats used by the underlying data sources.
Bioinformatics: Managing Scientific Data tackles this challenge head-on by discussing the current approaches and the variety of systems available to help bioinformaticians with this increasingly complex issue. The heart of the book lies in the collaborative efforts of eight distinct bioinformatics teams, each describing its own approach to data integration and interoperability. Each system receives its own chapter, in which the lead contributors provide valuable insight into the specific problems the system addresses, why the particular architecture was chosen, and the system’s strengths and weaknesses. In closing, the editors provide criteria for evaluating these systems that bioinformatics professionals will find valuable.

Contents

1 Introduction
Zoe Lacroix and Terence Critchlow
1.1 Overview
1.2 Problem and Scope
1.3 Biological Data Integration
1.4 Developing a Biological Data Integration System
1.4.1 Specifications
1.4.2 Translating Specifications into a Technical Approach
1.4.3 Development Process
1.4.4 Evaluation of the System
References

2 Challenges Faced in the Integration of Biological Information
Su Yun Chung and John C. Wooley
2.1 The Life Science Discovery Process
2.2 An Information Integration Environment for Life Science Discovery
2.3 The Nature of Biological Data
2.3.1 Diversity
2.3.2 Variability
2.4 Data Sources in Life Science
2.4.1 Biological Databases Are Autonomous
2.4.2 Biological Databases Are Heterogeneous in Data Formats
2.4.3 Biological Data Sources Are Dynamic
2.4.4 Computational Analysis Tools Require Specific Input/Output Formats and Broad Domain Knowledge
2.5 Challenges in Information Integration
2.5.1 Data Integration
2.5.2 Meta-Data Specification
2.5.3 Data Provenance and Data Accuracy
2.5.4 Ontology
2.5.5 Web Presentations
Conclusion
References

3 A Practitioner’s Guide to Data Management and Data Integration in Bioinformatics
Barbara A. Eckman
3.1 Introduction
3.2 Data Management in Bioinformatics
3.2.1 Data Management Basics
3.2.2 Two Popular Data Management Strategies and Their Limitations
3.2.3 Traditional Database Management
3.3 Dimensions Describing the Space of Integration Solutions
3.3.1 A Motivating Use Case for Integration
3.3.2 Browsing vs. Querying
3.3.3 Syntactic vs. Semantic Integration
3.3.4 Warehouse vs. Federation
3.3.5 Declarative vs. Procedural Access
3.3.6 Generic vs. Hard-Coded
3.3.7 Relational vs. Non-Relational Data Model
3.4 Use Cases of Integration Solutions
3.4.1 Browsing-Driven Solutions
3.4.2 Data Warehousing Solutions
3.4.3 Federated Database Systems Approach
3.4.4 Semantic Data Integration
3.5 Strengths and Weaknesses of the Various Approaches to Integration
3.5.1 Browsing and Querying: Strengths and Weaknesses
3.5.2 Warehousing and Federation: Strengths and Weaknesses
3.5.3 Procedural Code and Declarative Query Language: Strengths and Weaknesses
3.5.4 Generic and Hard-Coded Approaches: Strengths and Weaknesses
3.5.5 Relational and Non-Relational Data Models: Strengths and Weaknesses
3.5.6 Conclusion: A Hybrid Approach to Integration Is Ideal
3.6 Tough Problems in Bioinformatics Integration
3.6.1 Semantic Query Planning Over Web Data Sources
3.6.2 Schema Management
3.7 Summary
Acknowledgments
References

4 Issues to Address While Designing a Biological Information System
Zoe Lacroix
4.1 Legacy
4.1.1 Biological Data
4.1.2 Biological Tools and Workflows
4.2 A Domain in Constant Evolution
4.2.1 Traditional Database Management and Changes
4.2.2 Data Fusion
4.2.3 Fully Structured vs. Semi-Structured
4.2.4 Scientific Object Identity
4.2.5 Concepts and Ontologies
4.3 Biological Queries
4.3.1 Searching and Mining
4.3.2 Browsing
4.3.3 Semantics of Queries
4.3.4 Tool-Driven vs. Data-Driven Integration
4.4 Query Processing
4.4.1 Biological Resources
4.4.2 Query Planning
4.4.3 Query Optimization
4.5 Visualization
4.5.1 Multimedia Data
4.5.2 Browsing Scientific Objects
4.6 Conclusion
Acknowledgments
References

5 SRS: An Integration Platform for Databanks and Analysis Tools in Bioinformatics
Thure Etzold, Howard Harris, and Simon Beaulah
5.1 Integrating Flat File Databanks
5.1.1 The SRS Token Server
5.1.2 Subentry Libraries
5.2 Integration of XML Databases
5.2.1 What Makes XML Unique?
5.2.2 How Are XML Databanks Integrated into SRS?
5.2.3 Overview of XML Support Features
5.2.4 How Does SRS Meet the Challenges of XML?
5.3 Integrating Relational Databases
5.3.1 Whole Schema Integration
5.3.2 Capturing the Relational Schema
5.3.3 Selecting a Hub Table
5.3.4 Generation of SQL
5.3.5 Restricting Access to Parts of the Schema
5.3.6 Query Performance to Relational Databases
5.3.7 Viewing Entries from a Relational Databank
5.3.8 Summary
5.4 The SRS Query Language
5.4.1 SRS Fields
5.5 Linking Databanks
5.5.1 Constructing Links
5.5.2 The Link Operators
5.6 The Object Loader
5.6.1 Creating Complex and Nested Objects
5.6.2 Support for Loading from XML Databanks
5.6.3 Using Links to Create Composite Structures
5.6.4 Exporting Objects to XML
5.7 Scientific Analysis Tools
5.7.1 Processing of Input and Output
5.7.2 Batch Queues
5.8 Interfaces to SRS
5.8.1 The Web Interface
5.8.2 SRS Objects
5.8.3 SOAP and Web Services
5.9 Automated Server Maintenance with SRS Prisma
5.10 Conclusion
References

6 The Kleisli Query System as a Backbone for Bioinformatics Data Integration and Analysis
Jing Chen, Su Yun Chung, and Limsoon Wong
6.1 Motivating Example
6.2 Approach
6.3 Data Model and Representation
6.4 Query Capability
6.5 Warehousing Capability
6.6 Data Sources
6.7 Optimizations
6.7.1 Monadic Optimizations
6.7.2 Context-Sensitive Optimizations
6.7.3 Relational Optimizations
6.8 User Interfaces
6.8.1 Programming Language Interface
6.8.2 Graphical Interface
6.9 Other Data Integration Technologies
6.9.1 SRS
6.9.2 DiscoveryLink
6.9.3 Object-Protocol Model (OPM)
6.10 Conclusions
References


7 Complex Query Formulation Over Diverse Information Sources in TAMBIS
Robert Stevens, Carole Goble, Norman W. Paton, Sean Bechhofer, Gary Ng, Patricia Baker, and Andy Brass
7.1 The Ontology
7.2 The User Interface
7.2.1 Exploring the Ontology
7.2.2 Constructing Queries
7.2.3 The Role of Reasoning in Query Formulation
7.3 The Query Processor
7.3.1 The Sources and Services Model
7.3.2 The Query Planner
7.3.3 The Wrappers
7.4 Related Work
7.4.1 Information Integration in Bioinformatics
7.4.2 Knowledge Based Information Integration
7.4.3 Biological Ontologies
7.5 Current and Future Developments in TAMBIS
7.5.1 Summary
Acknowledgments
References


8 The Information Integration System K2
Val Tannen, Susan B. Davidson, and Scott Harker
8.1 Approach
8.2 Data Model and Languages
8.3 An Example
8.4 Internal Language
8.5 Data Sources
8.6 Query Optimization
8.7 User Interfaces
8.8 Scalability
8.9 Impact
8.10 Summary
Acknowledgments
References


9 P/FDM Mediator for a Bioinformatics Database Federation
Graham J. L. Kemp and Peter M. D. Gray
9.1 Approach
9.1.1 Alternative Architectures for Integrating Databases
9.1.2 The Functional Data Model
9.1.3 Schemas in the Federation
9.1.4 Mediator Architecture
9.1.5 Example
9.1.6 Query Capabilities
9.1.7 Data Sources
9.2 Analysis
9.2.1 Optimization
9.2.2 User Interfaces
9.2.3 Scalability
9.3 Conclusions
Acknowledgment
References


10 Integration Challenges in Gene Expression Data Management
Victor M. Markowitz, John Campbell, I-Min A. Chen, Anthony Kosky, Krishna Palaniappan, and Thodoros Topaloglou
10.1 Gene Expression Data Management: Background
10.1.1 Gene Expression Data Spaces
10.1.2 Standards: Benefits and Limitations
10.2 The GeneExpress System
10.2.1 GeneExpress System Components
10.2.2 GeneExpress Deployment and Update Issues
10.3 Managing Gene Expression Data: Integration Challenges
10.3.1 Gene Expression Data: Array Versions
10.3.2 Gene Expression Data: Algorithms and Normalization
10.3.3 Gene Expression Data: Variability
10.3.4 Sample Data
10.3.5 Gene Annotations
10.4 Integrating Third-Party Gene Expression Data in GeneExpress
10.4.1 Data Exchange Formats
10.4.2 Structural Data Transformation Issues
10.4.3 Semantic Data Mapping Issues
10.4.4 Data Loading Issues
10.4.5 Update Issues
10.5 Summary
Acknowledgments
Trademarks
References


11 DiscoveryLink
Laura M. Haas, Barbara A. Eckman, Prasad Kodali, Eileen T. Lin, Julia E. Rice, and Peter M. Schwarz
11.1 Approach
11.1.1 Architecture
11.1.2 Registration
11.2 Query Processing Overview
11.2.1 Query Optimization
11.2.2 An Example
11.2.3 Determining Costs
11.3 Ease of Use, Scalability, and Performance
11.4 Conclusions
References


12 A Model-Based Mediator System for Scientific Data Management
Bertram Ludascher, Amarnath Gupta, and Maryann E. Martone
12.1 Background
12.2 Scientific Data Integration Across Multiple Worlds: Examples and Challenges from the Neurosciences
12.2.1 From Terminology and Static Knowledge to Process Context
12.3 Model-Based Mediation
12.3.1 Model-Based Mediation: The Protagonists
12.3.2 Conceptual Models and Registration of Sources at the Mediator
12.3.3 Interplay Between Mediator and Sources
12.4 Knowledge Representation for Model-Based Mediation
12.4.1 Domain Maps
12.4.2 Process Maps
12.5 Model-Based Mediator System and Tools
12.5.1 The KIND Mediator Prototype
12.5.2 The Cell-Centered Database and SMART Atlas: Retrieval and Navigation Through Multi-Scale Data
12.6 Related Work and Conclusion
12.6.1 Related Work
12.6.2 Summary: Model-Based Mediation and Reason-Able Meta-Data
Acknowledgments
References

13 Compared Evaluation of Scientific Data Management Systems
Zoe Lacroix and Terence Critchlow
13.1 Performance Model
13.1.1 Evaluation Matrix
13.1.2 Cost Model
13.1.3 Benchmarks
13.1.4 User Survey
13.2 Evaluation Criteria
13.2.1 The Implementation Perspective
13.2.2 The User Perspective
13.3 Tradeoffs
13.3.1 Materialized vs. Non-Materialized
13.3.2 Data Distribution and Heterogeneity
13.3.3 Semi-Structured Data vs. Fully Structured Data
13.3.4 Text Retrieval
13.3.5 Integrating Applications
13.4 Summary
References
Concluding Remarks
Summary
Looking Toward the Future
Appendix: Biological Resources
Glossary
System Information
SRS
Kleisli
TAMBIS
K2
P/FDM Mediator
GeneExpress
DiscoveryLink
KIND
Index
