Database Systems: The Complete Book, 2/e (IE-Paperback)
暫譯: 資料庫系統:完整書籍(第二版,IE-平裝本)

Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer Widom

  • 出版商: Prentice Hall
  • 出版日期: 2008-06-14
  • 定價: $1,180
  • 售價: 9.8$1,156
  • 語言: 英文
  • 頁數: 1203
  • ISBN: 0131354280
  • ISBN-13: 9780131354289
  • 相關分類: 資料庫
  • 相關翻譯: 數據庫系統實現, 2/e (簡中版)
  • 立即出貨 (庫存=1)

買這商品的人也買了...

商品描述

Description

For Database Systems and Database Design and Application courses offered at the junior, senior and graduate levels in Computer Science departments.

Written by well-known computer scientists, this introduction to database systems offers a comprehensive approach, focusing on database design, database use, and implementation of database applications and database management systems.

The first half of the book provides in-depth coverage of databases from the point of view of the database designer, user, and application programmer. It covers the latest database standards SQL:1999, SQL/PSM, SQL/CLI, JDBC, ODL, and XML, with broader coverage of SQL than most other texts. The second half of the book provides in-depth coverage of databases from the point of view of the DBMS implementor. It focuses on storage structures, query processing, and transaction management. The book covers the main techniques in these areas with broader coverage of query optimization than most other texts, along with advanced topics including multidimensional and bitmap indexes, distributed transactions, and information integration techniques.

 

Resources:

  • Open access Author Website ¿http://infolab.stanford.edu/~ullman/dscb.html¿includes Power Point slides, teaching notes, assignments, projects, Oracle Programming Guidelines, and solutions to selected exercises.
  • Instructor only Pearson Resources: Complete Solutions Manual (click on the Resources tab above to view downloadable files)

Features

  • Many real-world examples.
    • Offers a readable and engaging presentation.

  • Extensive treatment of database modeling–Includes detailed and separate explanations of how to use E/R and ODL to design databases.
    • Teaches about this important first step of the planning process.

  • Excellent, up-to-date and detailed coverage of SQL–Includes coverage of object-relational systems and many aspects of the new SQL:1999 standard.
    • Provides a more extensive treatment of query processing than other books on the market.

  • Discussion of the technologies used to connect database programming with C or Java code–Includes discussions of SQL/PSM, SQL/CLI, and JDBC.
    • Gives students practical advice on integrating state-of-the-art technologies with databases.

  • Coverage of advanced issues important to database designers and users.
    • Includes discussions of views, integrity constraints, assertions, triggers, transactions, authorization, and recursion in SQL:1999.

  • Discussions of how to successfully plan a database application before building it.
    • Reflects how these plans are developed in the real world.

  • Coverage of topics such as designing storage structures and implementing a variety of indexing schemes.
    • Shows students how to build efficient database management systems.

  • Extensive coverage of query processing and optimization.
    • Shows students how to fine tune database systems to improve performance.

  • Comprehensive coverage of transaction processing mechanisms for concurrency control and recovery, including distributed and long-duration transactions.
    • Shows how to design complex database systems that can handle real-world business applications.

  • Coverage of information integration, including data warehousing, mediation, OLAP, data-cube systems, and data mining.
    • Exposes readers to cutting edge technology used in business applications.

  • Extensive exercises–In almost every section.
    • Provides students with the opportunity to practice and apply the concepts they've learned in each chapter.

  • Please note that GOAL/Gradiance is no longer available with this book.

New to this Edition

  • Chapters have been extensively reorganized and augmented.
  • Relational modeling is covered in chapters 2-4.
  • Chapter 4 is devoted to high-level modeling, and includes the E/R model as well as UML (Unified Modeling Language).
  • The viewpoint of Chapter 3 - which focuses on functional and multivalued dependencies - has been modified, so that a functional dependency is now assumed to have a set of attributes on the right. Explicitly certain algorithms, including the “chase,” allow us to manipulate dependencies. The discussion of third normal form has been augmented to include the 3NF synthesis algorithm and to delineate the tradeoff between 3NF and BCNF.
  • Chapter 5 contains the coverage of relational algebra from the previous edition, and is joined by (part of) the treatment of Datalog from the old Chapter 10.
  • The discussion of recursion in Datalog is either moved to the book's Web site or combined with the treatment of recursive SQL in Chapter 10 of this edition.
  • Chapters 6-10 are devoted to aspects of SQL programming, and they represent a reorganization and augmentation of the earlier book's Chapters 6, 7, 8, and parts of 10. The material on indexes and views has been moved to its own chapter, number 8, and this material has been augmented with a discussion of important new topics, including materialized views, and automatic selection of indexes.
  • The New Chapter 9 is based on the old Chapter 8 (embedded SQL). It is introduced by a new section on 3-tiered architecture. It also includes an expanded discussion of JDBC and new coverage of PHP.
  • Chapter 10 collects a number of advanced SQL topics, including the nested-relation model, object-relational features of SQL, and data cubes.
  • Chapters 11 and 12, covering XML and XML-based systems, contain new and expanded material on modeling and programming, including XML schema, DTD's, XPath, XQuery, and XSLT.
  • Chapter 20, covering parallel and distributed databases, includes new sections on distributed query execution, the map-reduce framework for parallel computation, peer-to-peer databases and their implementation of distributed hash tables.
  • New sections on local-as-view mediators and entity resolution have been added to Chapter 21, which covers information integration.
  • The expanded data mining chapter includes material on association rules and frequent itemset mining, including both the famous A-Priori Algorithm and certain efficiency improvements. Key techniques of shingling, minhashing, locality-sensitive hashing, and clustering have been added.
  • An entirely new Chapter 23 addresses ways in which the Internet has impacted database technology through search engines and data-stream management systems.

商品描述(中文翻譯)

### 內容描述

**適用於計算機科學系的初級、高級及研究生層級的資料庫系統及資料庫設計與應用課程。**

本書由知名計算機科學家撰寫,提供了資料庫系統的綜合介紹,重點在於資料庫設計、資料庫使用及資料庫應用程式和資料庫管理系統的實作。

本書的前半部分從資料庫設計者、使用者和應用程式程式設計師的角度深入探討資料庫。涵蓋了最新的資料庫標準 SQL:1999、SQL/PSM、SQL/CLI、JDBC、ODL 和 XML,對 SQL 的涵蓋範圍比大多數其他書籍更廣泛。後半部分則從 DBMS 實作者的角度深入探討資料庫,重點在於儲存結構、查詢處理和交易管理。本書涵蓋了這些領域的主要技術,對查詢優化的探討比大多數其他書籍更為深入,並包括多維和位圖索引、分散式交易及資訊整合技術等進階主題。

資源:

- 開放存取作者網站?http://infolab.stanford.edu/~ullman/dscb.html?包括 Power Point 簡報、教學筆記、作業、專案、Oracle 程式設計指導方針及選定練習的解答。
- 僅限教師的 Pearson 資源:完整解答手冊(請點擊上方的資源標籤以查看可下載的檔案)

---

### 特點

- **許多真實世界的範例。**
- 提供可讀性高且引人入勝的呈現方式。

- **廣泛的資料庫建模處理** – 包括如何使用 E/R 和 ODL 設計資料庫的詳細和獨立解釋。
- 教導這一重要的規劃過程的第一步。

- **優秀、最新且詳細的 SQL 涵蓋** – 包括對物件關聯系統及新 SQL:1999 標準的許多方面的探討。
- 提供比市場上其他書籍更廣泛的查詢處理探討。

- **討論將資料庫程式設計與 C 或 Java 代碼連接的技術** – 包括 SQL/PSM、SQL/CLI 和 JDBC 的討論。
- 給予學生有關將尖端技術與資料庫整合的實用建議。

- **涵蓋對資料庫設計者和使用者重要的進階議題。**
- 包括對 SQL:1999 中的視圖、完整性約束、斷言、觸發器、交易、授權和遞迴的討論。

- **討論如何在建構資料庫應用程式之前成功規劃。**
- 反映這些計劃在現實世界中的發展方式。

- **涵蓋設計儲存結構和實作各種索引方案的主題。**
- 向學生展示如何建立高效的資料庫管理系統。

- **廣泛的查詢處理和優化涵蓋。**
- 向學生展示如何微調資料庫系統以改善性能。

- **全面涵蓋交易處理機制以進行併發控制和恢復,包括分散式和長時間交易。**
- 展示如何設計能夠處理現實世界商業應用的複雜資料庫系統。

- **涵蓋資訊整合,包括資料倉儲、中介、OLAP、資料立方體系統和資料挖掘。**
- 讓讀者接觸到商業應用中使用的尖端技術。

- **廣泛的練習** – 幾乎在每一個部分。
- 提供學生練習和應用他們在每一章中學到的概念的機會。

- 請注意,**GOAL/Gradiance** 現在不再與本書一起提供。

---

### 本版新內容

- 各章節已被廣泛重組和增補。
- 關聯建模在第 2-4 章中涵蓋。
- 第 4 章專注於高層次建模,包括 E/R 模型及 UML(統一建模語言)。
- 第 3 章的觀點 - 專注於功能和多值依賴 - 已被修改,現在假設功能依賴的右側有一組屬性。明確某些算法,包括“追逐”,允許我們操作依賴。對第三范式的討論已增補,包括 3NF 合成算法並界定 3NF 和 BCNF 之間的權衡。
- 第 5 章包含前一版的關聯代數內容,並與舊第 10 章的 Datalog 部分結合。
- Datalog 中的遞迴討論已移至本書的網站或與本版第 10 章的遞迴 SQL 討論結合。
- 第 6-10 章專注於 SQL 程式設計的各個方面,並重新組織和增補了早期書籍的第 6、7、8 章及第 10 章的部分內容。索引和視圖的材料已移至第 8 章,並增補了重要新主題的討論,包括物化視圖和自動索引選擇。
- 新的第 9 章基於舊的第 8 章(嵌入式 SQL)。它由有關三層架構的新部分引入,還包括擴展的 JDBC 討論和新的 PHP 涵蓋。
- 第 10 章收集了多個進階 SQL 主題,包括嵌套關係模型、SQL 的物件關聯特性和資料立方體。
- 第 11 和 12 章涵蓋 XML 和基於 XML 的系統,包含有關建模和程式設計的新擴展材料,包括 XML 架構、DTD、XPath、XQuery 和 XSLT。
- 第 20 章涵蓋平行和分散式資料庫,包括有關分散式查詢執行、平行計算的 map-reduce 框架、對等資料庫及其分散式雜湊表實作的新部分。
- 第 21 章涵蓋資訊整合,新增有關 local-as-view 中介和實體解析的新部分。
- 擴展的資料挖掘章節包括有關關聯規則和頻繁項集挖掘的材料,包括著名的 A-Priori 算法及某些效率改進。新增了 shingling、minhashing、局部敏感雜湊和聚類的關鍵技術。
- 全新的第 23 章探討了互聯網如何通過搜尋引擎和資料流管理系統影響資料庫技術。

作者簡介

Hector Garcia-Molina is the L. Bosack and S. Lerner Professor of Computer Science and Electrical Engineering at Stanford University. His research interests include digital libraries, information integration, and database applications on the Internet. He was a recipient of the SIGMOD Innovations Award and a member of PITAC (President's Information-Technology Advisory Council). He currently serves on the Board of Directors of Oracle Corp.


Jeffrey D. Ullman is the Stanford W. Ascherman Professor of Computer Science (emeritus) at Stanford University. He is the author or co-author of 16 books, including Elements of ML Programming (Prentice Hall 1998). His research interests include data mining, information integration, and electronic education. He is a member of the National Academy of Engineering, and recipient of a Guggenheim Fellowship, the Karl V. Karlstom Outstanding Educator Award, the SIGMOD Contributions and Edgar F. Codd Innovations Awards, and the Knuth Prize.
 

Jennifer Widom is Professor of Computer Science and Electrical Engineering at Stanford University. Her research interests span many aspects of nontraditional data management. She is an ACM Fellow and a member of the National Academy of Engineering, she received the ACM SIGMOD Edgar F. Codd Innovations award in 2007 and was a Guggenheim Fellow in 2000, and she has served on a variety of program committees, advisory boards, and editorial boards.

作者簡介(中文翻譯)

赫克托·加西亞-莫利納是史丹佛大學計算機科學與電機工程的L. Bosack和S. Lerner教授。他的研究興趣包括數位圖書館、資訊整合以及互聯網上的資料庫應用。他曾獲得SIGMOD創新獎,並且是PITAC(總統資訊科技諮詢委員會)的成員。他目前擔任Oracle Corp.的董事會成員。



傑佛瑞·D·烏爾曼是史丹佛大學的史丹佛·W·阿什曼計算機科學榮譽教授。他是16本書的作者或合著者,包括《ML程式設計要素》(Prentice Hall 1998)。他的研究興趣包括資料探勘、資訊整合和電子教育。他是美國國家工程院的成員,並獲得古根海姆獎學金、卡爾·V·卡爾斯頓傑出教育者獎、SIGMOD貢獻獎和埃德加·F·科德創新獎,以及克努斯獎。



珍妮佛·維多姆是史丹佛大學計算機科學與電機工程的教授。她的研究興趣涵蓋非傳統資料管理的多個方面。她是ACM會士及美國國家工程院的成員,於2007年獲得ACM SIGMOD埃德加·F·科德創新獎,並於2000年成為古根海姆獎學金得主,此外,她還曾擔任多個程序委員會、諮詢委員會和編輯委員會的成員。

目錄大綱

1 The Worlds of Database Systems
   1.1 The Evolution of Database Systems
   1.1.1 Early Database Management Systems
   1.1.2 Relational Database Systems
   1.1.3 Smaller and Smaller Systems
   1.1.4 Bigger and Bigger Systems
   1.1.5 Information Integration
   1.2 Overview of a Database Management System
   1.2.1 Data-Definition Language Commands
   1.2.2 Overview of Query Processing
   1.2.3 Storage and Buffer Management
   1.2.4 Transaction Processing
   1.2.5 The Query Processor
   1.3 Outline of Database-System Studies
   1.4 References for Chapter 1

PART I: Relational Database Modeling
2 The Relational Model of Data
   2.1 An Overview of Data Models
   2.1.1 What is a Data Model?
   2.1.2 Important Data Models
   2.1.3 The Relational Model in Brief
   2.1.4 The Semistructured Model in Brief
   2.1.5 Other Data Model
   2.1.6 Comparison of Modeling Approaches
   2.2 Basics of the Relational Model
   2.2.1 Attributes
   2.2.2 Schemas
   2.2.3 Tuples
   2.2.4 Domains
   2.2.5 Equivalent Representations of a Relation
   2.2.6 Relation Instances
   2.2.7 Keys of Relations
   2.2.8 An Example Database Schema
   2.2.9 Exercises for Section 2.2
   2.3 Defining a Relation Schema in SQL
   2.3.1 Relations in SQL
   2.3.2 Data Types
   2.3.3 Simple Table Declarations
   2.3.4 Modifying Relation Schemas
   2.3.5 Default Values
   2.3.6 Declaring Keys
   2.3.7 Exercises for Section 2.3
   2.4 An Algebraic Query Language
   2.4.1 Why Do We Need a Special Query Language?
   2.4.2 What is an Algebra?
   2.4.3 Overview of Relational Algebra
   2.4.4 Set Operations on Relations
   2.4.5 Projection
   2.4.6 Selection
   2.4.7 Cartesian Product
   2.4.8 Natural Joins
   2.4.9 Theta-Joins
   2.4.10 Combining Operations to Form Queries
   2.4.11 Naming and Renaming
   2.4.12 Relationships Among Operations
   2.4.13 A Linear Notation for Algebraic Expressions
   2.4.14 Exercises for Section 2.4
   2.5 Constraints on Relations
   2.5.1 Relational Algebra as a Constraint Language
   2.5.2 Referential Integrity Constraints
   2.5.3 Key Constraints
   2.5.4 Additional Constraint Examples
   2.5.5 Exercises for Section 2.5
   2.6 Summary of Chapter 2
   2.7 References for Chapter 2

3 Design Theory for Relational Databases
   3.1 Functional Dependencies
   3.1.1 Definition of Functional Dependency
   3.1.2 Keys of Relations
   3.1.3 Superkeys
   3.1.4 Exercises for Section 3.1
   3.2 Rules About Functional Dependencies
   3.2.1 Reasoning About Functional Dependencies
   3.2.2 The Splitting/Combining Rule
   3.2.3 Trivial Functional Dependencies
   3.2.4 Computing the Closure of Attributes
   3.2.5 Why the Closure Algorithm Works
   3.2.6 The Transitive Rule
   3.2.7 Closing Sets of Functional Dependencies
   3.2.8 Projecting Functional Dependencies
   3.2.9 Exercises for Section 3.2
   3.3 Design of Relational Database Schemas
   3.3.1 Anomalies
   3.3.2 Decomposing Relations
   3.3.3 Boyce-Codd Normal Form
   3.3.4 Decomposition into BCN
   3.3.5 Exercises for Section 3.
   3.4 Decomposition: The Good, Bad, and Ugly
   3.4.1 Recovering Information from a Decomposition
   3.4.2 The Chase Test for Lossless Join
   3.4.3 Why the Chase Works
   3.4.4 Dependency Preservation
   3.4.5 Exercises for Section 3.4
   3.5 Third Normal Form
   3.5.1 Definition of Third Normal Form
   3.5.2 The Synthesis Algorithm for 3NF Schemas
   3.5.3 Why the 3NF Synthesis Algorithm Work
   3.5.4 Exercises for Section 3.5
   3.6 Multivalued Dependencies
   3.6.1 Attribute Independence and Its Consequent Redundancy
   3.6.2 Definition of Multivalued Dependencies
   3.6.3 Reasoning About Multivalued Dependencies
   3.6.4 Fourth Normal Form
   3.6.5 Decomposition into Fourth Normal Form
   3.6.6 Relationships Among Normal Forms
   3.6.7 Exercises for Section 3.6
   3.7 An Algorithm for Discovering MVD's
   3.7.1 The Closure and the Chase
   3.7.2 Extending the Chase to MVD's
   3.7.3 Why the Chase Works for MVD's
   3.7.4 Projecting MVD's
   3.7.5 Exercises for Section 3.7
   3.8 Summary of Chapter 3
   3.9 References for Chapter 3

4 High-Level Database Models
   4.1 The Entity/Relationship Model
   4.1.1 Entity Sets
   4.1.2 Attributes
   4.1.3 Relationships
   4.1.4 Entity-Relationship Diagrams
   4.1.5 Instances of an E/R Diagram
   4.1.6 Multiplicity of Binary E/R Relationships
   4.1.7 Multiway Relationships
   4.1.8 Roles in Relationships
   4.1.9 Attributes on Relationships
   4.1.10 Converting Multiway Relationships to Binary
   4.1.11 Subclasses in the E/R Model
   4.1.12 Exercises for Section 4.1
   4.2 Design Principles
   4.2.1 Faithfulness
   4.2.2 Avoiding Redundancy
   4.2.3 Simplicity Counts
   4.2.4 Choosing the Right Relationships
   4.2.5 Picking the Right Kind of Element
   4.2.6 Exercises for Section 4.2
   4.3 Constraints in the E/R Model
   4.3.1 Keys in the E/R Model
   4.3.2 Representing Keys in the E/R Model
   4.3.3 Referential Integrity
   4.3.4 Degree Constraints
   4.3.5 Exercises for Section 4.3
   4.4 Weak Entity Sets
   4.4.1 Causes of Weak Entity Sets
   4.4.2 Requirements for Weak Entity Sets
   4.4.3 Weak Entity Set Notation
   4.4.4 Exercises for Section 4.4
   4.5 From E/R Diagrams to Relational Designs
   4.5.1 From Entity Sets to Relations
   4.5.2 From E/R Relationships to Relations
   4.5.3 Combining Relations
   4.5.4 Handling Weak Entity Sets
   4.5.5 Exercises for Section 4.5
   4.6 Converting Subclass Structures to Relations
   4.6.1 E/R-Style Conversion
   4.6.2 An Object-Oriented Approach
   4.6.3 Using Null Values to Combine Relations
   4.6.4 Comparison of Approaches
   4.6.5 Exercises for Section 4.6
   4.7 Unified Modeling Language
   4.7.1 UML Classes
   4.7.2 Keys for UML classes
   4.7.3 Associations
   4.7.4 Self-Associations
   4.7.5 Association Classes
   4.7.6 Subclasses in UML
   4.7.7 Aggregations and Compositions
   4.7.8 Exercises for Section 4.7
   4.8 From UML Diagrams to Relations
   4.8.1 UML-to-Relations Basics
   4.8.2 From UML Subclasses to Relations
   4.8.3 From Aggregations and Compositions to Relations
   4.8.4 The UML Analog of Weak Entity Sets
   4.8.5 Exercises for Section 4.8
   4.9 Object Definition Language
   4.9.1 Class Declarations
   4.9.2 Attributes in ODL
   4.9.3 Relationships in ODL
   4.9.4 Inverse Relationships
   4.9.5 Multiplicity of Relationships
   4.9.6 Types in ODL
   4.9.7 Subclasses in ODL
   4.9.8 Declaring Keys in ODL
   4.9.9 Exercises for Section 4.9
   4.10 From ODL Designs to Relational Designs
   4.10.1 From ODL Classes to Relations
   4.10.2 Complex Attributes in Classes
   4.10.3 Representing Set-Valued Attributes
   4.10.4 Representing Other Type Constructors
   4.10.5 Representing ODL Relationships
   4.10.6 Exercises for Section 4.10
   4.11 Summary of Chapter 4
   4.12 References for Chapter 4

PART II: Relational Database Programming
5 Algebraic and Logical Query Languages
   5.1 Relational Operations on Bags
   5.1.1 Why Bags?
   5.1.2 Union, Intersection, and Difference of Bags
   5.1.3 Projection of Bags
   5.1.4 Selection on Bags
   5.1.5 Product of Bags
   5.1.6 Joins of Bags
   5.1.7 Exercises for Section 5.1
   5.2 Extended Operators of Relational Algebra
   5.2.1 Duplicate Elimination
   5.2.2 Aggregation Operators
   5.2.3 Grouping
   5.2.4 The Grouping Operator
   5.2.5 Extending the Projection Operator
   5.2.6 The Sorting Operator
   5.2.7 Outerjoins
   5.2.8 Exercises for Section 5.2
   5.3 A Logic for Relations
   5.3.1 Predicates and Atoms
   5.3.2 Arithmetic Atoms
   5.3.3 Datalog Rules and Queries
   5.3.4 Meaning of Datalog Rules
   5.3.5 Extensional and Intensional Predicates
   5.3.6 Datalog Rules Applied to Bags
   5.3.7 Exercises for Section 5.3
   5.4 Relational Algebra and Datalog
   5.4.1 Boolean Operations
   5.4.2 Projection
   5.4.3 Selection
   5.4.4 Product
   5.4.5 Joins
   5.4.6 Simulating Multiple Operations with Datalog
   5.4.7 Comparison Between Datalog and Relational Algebra
   5.4.8 Exercises for Section 5.4
   5.5 Summary of Chapter 5
   5.6 References for Chapter 5

6 The Database Language SQL
   6.1 Simple Queries in SQL
   6.1.1 Projection in SQL
   6.1.2 Selection in SQL
   6.1.3 Comparison of Strings
   6.1.4 Pattern Matching in SQL
   6.1.5 Dates and Times
   6.1.6 Null Values and Comparisons Involving {\tt NULL}
   6.1.7 The Truth-Value {\tt UNKNOWN}
   6.1.8 Ordering the Output
   6.1.9 Exercises for Section 6.1
   6.2 Queries Involving More Than One Relation
   6.2.1 Products and Joins in SQL
   6.2.2 Disambiguating Attributes
   6.2.3 Tuple Variables
   6.2.4 Interpreting Multirelation Queries
   6.2.5 Union, Intersection, and Difference of Queries
   6.2.6 Exercises for Section 6.2
   6.3 Subqueries
   6.3.1 Subqueries that Produce Scalar Values
   6.3.2 Conditions Involving Relations
   6.3.3 Conditions Involving Tuples
   6.3.4 Correlated Subqueries
   6.3.5 Subqueries in {\tt FROM}\ Clauses
   6.3.6 SQL Join Expressions
   6.3.7 Natural Joins
   6.3.8 Outerjoins
   6.3.9 Exercises for Section 6.3
   6.4 Full-Relation Operations
   6.4.1 Eliminating Duplicates
   6.4.2 Duplicates in Unions, Intersections, and Differences
   6.4.3 Grouping and Aggregation in SQL
   6.4.4 Aggregation Operators
   6.4.5 Grouping
   6.4.6 Grouping, Aggregation, and Nulls
   6.4.7 {\tt HAVING} Clauses
   6.4.8 Exercises for Section 6.4
   6.5 Database Modifications
   6.5.1 Insertion
   6.5.2 Deletion
   6.5.3 Updates
   6.5.4 Exercises for Section 6.5
   6.6 Transactions in SQL
   6.6.1 Serializability
   6.6.2 Atomicity
   6.6.3 Transactions
   6.6.4 Read-Only Transactions
   6.6.5 Dirty Reads
   6.6.6 Other Isolation Levels
   6.6.7 Exercises for Section 6.6
   6.7 Summary of Chapter 6
   6.8 References for Chapter 6

7 Constraints and Triggers
   7.1 Keys and Foreign Keys
   7.1.1 Declaring Foreign-Key Constraints
   7.1.2 Maintaining Referential Integrity
   7.1.3 Deferred Checking of Constraints
   7.1.4 Exercises for Section 7.1
   7.2 Constraints on Attributes and Tuples
   7.2.1 Not-Null Constraints
   7.2.2 Attribute-Based {\tt CHECK} Constraints
   7.2.3 Tuple-Based {\tt CHECK} Constraints
   7.2.4 Comparison of Tuple- and Attribute-Based Constraints
   7.2.5 Exercises for Section 7.2
   7.3 Modification of Constraints
   7.3.1 Giving Names to Constraints
   7.3.2 Altering Constraints on Tables
   7.3.3 Exercises for Section 7.3
   7.4 Assertions
   7.4.1 Creating Assertions
   7.4.2 Using Assertions
   7.4.3 Exercises for Section 7.4
   7.5 Triggers
   7.5.1 Triggers in SQL
   7.5.2 The Options for Trigger Design
   7.5.3 Exercises for Section 7.5
   7.6 Summary of Chapter 7
   7.7 References for Chapter 7

8 Views and Indexes
   8.1 Virtual Views
   8.1.1 Declaring Views
   8.1.2 Querying Views
   8.1.3 Renaming Attributes
   8.1.4 Exercises for Section 8.1
   8.2 Modifying Views
   8.2.1 View Removal
   8.2.2 Updatable Views
   8.2.3 Instead-Of Triggers on Views
   8.2.4 Exercises for Section 8.2
   8.3 Indexes in SQL
   8.3.1 Motivation for Indexes
   8.3.2 Declaring Indexes
   8.3.3 Exercises for Section 8.3
   8.4 Selection of Indexes
   8.4.1 A Simple Cost Model
   8.4.2 Some Useful Indexes
   8.4.3 Calculating the Best Indexes to Create
   8.4.4 Automatic Selection of Indexes to Create
   8.4.5 Exercises for Section 8.4
   8.5 Materialized Views
   8.5.1 Maintaining a Materialized View
   8.5.2 Periodic Maintenance of Materialized Views
   8.5.3 Rewriting Queries to Use Materialized Views
   8.5.4 Automatic Creation of Materialized Views
   8.5.5 Exercises for Section 8.5
   8.6 Summary of Chapter 8
   8.7 References for Chapter 8

9 SQL in a Server Environment
   9.1 The Three-Tier Architecture
   9.1.1 The Web-Server Tier
   9.1.2 The Application Tier
   9.1.3 The Database Tier
   9.2 The SQL Environment
   9.2.1 Environments
   9.2.2 Schemas
   9.2.3 Catalogs
   9.2.4 Clients and Servers in the SQL Environment
   9.2.5 Connections
   9.2.6 Sessions
   9.2.7 Modules
   9.3 The SQL/Host-Language Interface
   9.3.1 The Impedance Mismatch Problem
   9.3.2 Connecting SQL to the Host Language
   9.3.3 The {\tt DECLARE} Section
   9.3.4 Using Shared Variables
   9.3.5 Single-Row Select Statements
   9.3.6 Cursors
   9.3.7 Modifications by Cursor
   9.3.8 Protecting Against Concurrent Updates
   9.3.9 Dynamic SQL
   9.3.10 Exercises for Section 9.3
   9.4 Stored Procedures
   9.4.1 Creating PSM Functions and Procedures
   9.4.2 Some Simple Statement Forms in PSM
   9.4.3 Branching Statements
   9.4.4 Queries in PSM
   9.4.5 Loops in PSM
   9.4.6 For-Loops
   9.4.7 Exceptions in PSM
   9.4.8 Using PSM Functions and Procedures
   9.4.9 Exercises for Section 9.4
   9.5 Using a Call-Level Interface
   9.5.1 Introduction to SQL/CLI
   9.5.2 Processing Statements
   9.5.3 Fetching Data From a Query Result
   9.5.4 Passing Parameters to Queries
   9.5.5 Exercises for Section 9.5
   9.6 JDBC
   9.6.1 Introduction to JDBC
   9.6.2 Creating Statements in JDBC
   9.6.3 Cursor Operations in JDBC
   9.6.4 Parameter Passing
   9.6.5 Exercises for Section 9.6
   9.7 PHP
   9.7.1 PHP Basics
   9.7.2 Arrays
   9.7.3 The PEAR DB Library
   9.7.4 Creating a Database Connection Using DB
   9.7.5 Executing SQL Statements
   9.7.6 Cursor Operations in PHP
   9.7.7 Dynamic SQL in PHP
   9.7.8 Exercises for Section 9.7
   9.8 Summary of Chapter 9
   9.9 References for Chapter 9

10 Advanced Topics in Relational Databases
   10.1 Security and User Authorization in SQL
   10.1.1 Privileges
   10.1.2 Creating Privileges
   10.1.3 The Privilege-Checking Process
   10.1.4 Granting Privileges
   10.1.5 Grant Diagrams
   10.1.6 Revoking Privileges
   10.1.7 Exercises for Section 10.1
   10.2 Recursion in SQL
   10.2.1 Defining Recursive Relations in SQL
   10.2.2 Problematic Expressions in Recursive SQL
   10.2.3 Exercises for Section 10.2
   10.3 The Object-Relational Model
   10.3.1 From Relations to Object-Relations
   10.3.2 Nested Relations
   10.3.3 References
   10.3.4 Object-Oriented Versus Object-Relational
   10.3.5 Exercises for Section 10.3
   10.4 User-Defined Types in SQL
   10.4.1 Defining Types in SQL
   10.4.2 Method Declarations in UDT's
   10.4.3 Method Definitions
   10.4.4 Declaring Relations with a UDT
   10.4.5 References
   10.4.6 Creating Object ID's for Tables
   10.4.7 Exercises for Section 10.4
   10.5 Operations on Object-Relational Data
   10.5.1 Following References
   10.5.2 Accessing Components of Tuples with a UDT
   10.5.3 Generator and Mutator Functions
   10.5.4 Ordering Relationships on UDT's
   10.5.5 Exercises for Section 10.5
   10.6 On-Line Analytic Processing
   10.6.1 OLAP and Data Warehouses
   10.6.2 OLAP Applications
   10.6.3 A Multidimensional View of OLAP Data
   10.6.4 Star Schemas
   10.6.5 Slicing and Dicing
   10.6.6 Exercises for Section 10.6
   10.7 Data Cubes
   10.7.1 The Cube Operator
   10.7.2 The Cube Operator in SQL
   10.7.3 Exercises for Section 10.7
   10.8 Summary of Chapter 10
   10.9 References for Chapter 10

PART III: Modeling and Programming for Semistructured Data
11 The Semistructured-Data Model
   11.1 Semistructured Data
   11.1.1 Motivation for the Semistructured-Data Model
   11.1.2 Semistructured Data Representation
   11.1.3 Information Integration Via Semistructured Data
   11.1.4 Exercises for Section 11.1
   11.2 XML
   11.2.1 Semantic Tags
   11.2.2 XML With and Without a Schema
   11.2.3 Well-Formed XML
   11.2.4 Attributes
   11.2.5 Attributes That Connect Elements
   11.2.6 Namespaces
   11.2.7 XML and Databases
   11.2.8 Exercises for Section 11.2
   11.3 Document Type Definitions
   11.3.1 The Form of a DTD
   11.3.2 Using a DTD
   11.3.3 Attribute Lists
   11.3.4 Identifiers and References
   11.3.5 Exercises for Section 11.3
   11.4 XML Schema
   11.4.1 The Form of an XML Schema
   11.4.2 Elements
   11.4.3 Complex Types
   11.4.4 Attributes
   11.4.5 Restricted Simple Types
   11.4.6 Keys in XML Schema
   11.4.7 Foreign Keys in XML Schema
   11.4.8 Exercises for Section 11.4
   11.5 Summary of Chapter 11
   11.6 References for Chapter 11

12 Programming Languages for XML
   12.1 XPath
   12.1.1 The XPath Data Model
   12.1.2 Document Nodes
   12.1.3 Path Expressions
   12.1.4 Relative Path Expressions
   12.1.5 Attributes in Path Expressions
   12.1.6 Axes
   12.1.7 Context of Expressions
   12.1.8 Wildcards
   12.1.9 Conditions in Path Expressions
   12.1.10 Exercises for Section 12.1
   12.2 XQuery
   12.2.1 XQuery Basics
   12.2.2 FLWR Expressions
   12.2.3 Replacement of Variables by Their Values
   12.2.4 Joins in XQuery
   12.2.5 XQuery Comparison Operators
   12.2.6 Elimination of Duplicates
   12.2.7 Quantification in XQuery
   12.2.8 Aggregations
   12.2.9 Branching in XQuery Expressions
   12.2.10 Ordering the Result of a Query
   12.2.11 Exercises for Section 12.2
   12.3 Extensible Stylesheet Language
   12.3.1 XSLT Basics
   12.3.2 Templates
   12.3.3 Obtaining Values From XML Data
   12.3.4 Recursive Use of Templates
   12.3.5 Iteration in XSLT
   12.3.6 Conditionals in XSLT
   12.3.7 Exercises for Section 12.3
   12.4 Summary of Chapter 12
   12.5 References for Chapter 12

PART IV: Database System Implementation
13 Secondary Storage Management
   13.1 The Memory Hierarchy
   13.1.1 The Memory Hierarchy
   13.1.2 Transfer of Data Between Levels
   13.1.3 Volatile and Nonvolatile Storage
   13.1.4 Virtual Memory
   13.1.5 Exercises for Section 13.1
   13.2 Disks
   13.2.1 Mechanics of Disks
   13.2.2 The Disk Controller
   13.2.3 Disk Access Characteristics
   13.2.4 Exercises for Section 13.2
   13.3 Accelerating Access to Secondary Storage
   13.3.1 The I/O Model of Computation
   13.3.2 Organizing Data by Cylinders
   13.3.3 Using Multiple Disks
   13.3.4 Mirroring Disks
   13.3.5 Disk Scheduling and the Elevator Algorithm
   13.3.6 Prefetching and Large-Scale Buffering
   13.3.7 Exercises for Section 13.3
   13.4 Disk Failures
   13.4.1 Intermittent Failures
   13.4.2 Checksums
   13.4.3 Stable Storage
   13.4.4 Error-Handling Capabilities of Stable Storage
   13.4.5 Recovery from Disk Crashes
   13.4.6 Mirroring as a Redundancy Technique
   13.4.7 Parity Blocks
   13.4.8 An Improvement: RAID 5
   13.4.9 Coping With Multiple Disk Crashes
   13.4.10 Exercises for Section 13.4
   13.5 Arranging Data on Disk
   13.5.1 Fixed-Length Records
   13.5.2 Packing Fixed-Length Records into Blocks
   13.5.3 Exercises for Section 13.5
   13.6 Representing Block and Record Addresses
   13.6.1 Addresses in Client-Server Systems
   13.6.2 Logical and Structured Addresses
   13.6.3 Pointer Swizzling
   13.6.4 Returning Blocks to Disk
   13.6.5 Pinned Records and Blocks
   13.6.6 Exercises for Section 13.6
   13.7 Variable-Length Data and Records
   13.7.1 Records With Variable-Length Fields
   13.7.2 Records With Repeating Fields
   13.7.3 Variable-Format Records
   13.7.4 Records That Do Not Fit in a Block
   13.7.5 BLOBs
   13.7.6 Column Stores
   13.7.7 Exercises for Section 13.7
   13.8 Record Modifications
   13.8.1 Insertion
   13.8.2 Deletion
   13.8.3 Update
   13.8.4 Exercises for Section 13.8
   13.9 Summary of Chapter 13
   13.10 References for Chapter 13

14 Index Structures
   14.1 Index-Structure Basics
   14.1.1 Sequential Files
   14.1.2 Dense Indexes
   14.1.3 Sparse Indexes
   14.1.4 Multiple Levels of Index
   14.1.5 Secondary Indexes
   14.1.6 Applications of Secondary Indexes
   14.1.7 Indirection in Secondary Indexes
   14.1.8 Document Retrieval and Inverted Indexes
   14.1.9 Exercises for Section 14.1
   14.2 B-Trees
   14.2.1 The Structure of B-trees
   14.2.2 Applications of B-trees
   14.2.3 Lookup in B-Trees
   14.2.4 Range Queries
   14.2.5 Insertion Into B-Trees
   14.2.6 Deletion From B-Trees
   14.2.7 Efficiency of B-Trees
   14.2.8 Exercises for Section 14.2
   14.3 Hash Tables
   14.3.1 Secondary-Storage Hash Tables
   14.3.2 Insertion Into a Hash Table
   14.3.3 Hash-Table Deletion
   14.3.4 Efficiency of Hash Table Indexes
   14.3.5 Extensible Hash Tables
   14.3.6 Insertion Into Extensible Hash Tables
   14.3.7 Linear Hash Tables
   14.3.8 Insertion Into Linear Hash Tables
   14.3.9 Exercises for Section 14.3
   14.4 Multidimensional Indexes
   14.4.1 Applications of Multidimensional Indexes
   14.4.2 Executing Range Queries Using Conventional Indexes
   14.4.3 Executing Nearest-Neighbor Queries Using Conventional Indexes
   14.4.4 Overview of Multidimensional Index Structures
   14.5 Hash Structures for Multidimensional Data
   14.5.1 Grid Files
   14.5.2 Lookup in a Grid File
   14.5.3 Insertion Into Grid Files
   14.5.4 Performance of Grid Files
   14.5.5 Partitioned Hash Functions
   14.5.6 Comparison of Grid Files and Partitioned Hashing
   14.5.7 Exercises for Section 14.5
   14.6 Tree Structures for Multidimensional Data
   14.6.1 Multiple-Key Indexes
   14.6.2 Performance of Multiple-Key Indexes
   14.6.3 $kd$-Trees
   14.6.4 Operations on $kd$-Trees
   14.6.5 Adapting $kd$-Trees to Secondary Storage
   14.6.6 Quad Trees
   14.6.7 R-Trees
   14.6.8 Operations on R-Trees
   14.6.9 Exercises for Section 14.6
   14.7 Bitmap Indexes
   14.7.1 Motivation for Bitmap Indexes
   14.7.2 Compressed Bitmaps
   14.7.3 Operating on Run-Length-Encoded Bit-Vectors
   14.7.4 Managing Bitmap Indexes
   14.7.5 Exercises for Section 14.7
   14.8 Summary of Chapter 14
   14.9 References for Chapter 14

15 Query Execution
   15.1 Introduction to Physical-Query-Plan Operators
   15.1.1 Scanning Tables
   15.1.2 Sorting While Scanning Tables
   15.1.3 The Computation Model for Physical Operators
   15.1.4 Parameters for Measuring Costs
   15.1.5 I/O Cost for Scan Operators
   15.1.6 Iterators for Implementation of Physical Operators
   15.2 One-Pass Algorithms
   15.2.1 One-Pass Algorithms for Tuple-at-a-Time Operations
   15.2.2 One-Pass Algorithms for Unary, Full-Relation Operations
   15.2.3 One-Pass Algorithms for Binary Operations
   15.2.4 Exercises for Section 15.2
   15.3 Nested-Loop Joins
   15.3.1 Tuple-Based Nested-Loop Join
   15.3.2 An Iterator for Tuple-Based Nested-Loop Join
   15.3.3 Block-Based Nested-Loop Join Algorithm
   15.3.4 Analysis of Nested-Loop Join
   15.3.5 Summary of Algorithms so Far
   15.3.6 Exercises for Section 15.3
   15.4 Two-Pass Algorithms Based on Sorting
   15.4.1 Two-Phase, Multiway Merge-Sort
   15.4.2 Duplicate Elimination Using Sorting
   15.4.3 Grouping and Aggregation Using Sorting
   15.4.4 A Sort-Based Union Algorithm
   15.4.5 Sort-Based Intersection and Difference
   15.4.6 A Simple Sort-Based Join Algorithm
   15.4.7 Analysis of Simple Sort-Join
   15.4.8 A More Efficient Sort-Based Join
   15.4.9 Summary of Sort-Based Algorithms
   15.4.10 Exercises for Section 15.4
   15.5 Two-Pass Algorithms Based on Hashing
   15.5.1 Partitioning Relations by Hashing
   15.5.2 A Hash-Based Algorithm for Duplicate Elimination
   15.5.3 Hash-Based Grouping and Aggregation
   15.5.4 Hash-Based Union, Intersection, and Difference
   15.5.5 The Hash-Join Algorithm
   15.5.6 Saving Some Disk I/O's
   15.5.7 Summary of Hash-Based Algorithms
   15.5.8 Exercises for Section 15.5
   15.6 Index-Based Algorithms
   15.6.1 Clustering and Nonclustering Indexes
   15.6.2 Index-Based Selection
   15.6.3 Joining by Using an Index
   15.6.4 Joins Using a Sorted Index
   15.6.5 Exercises for Section 15.6
   15.7 Buffer Management
   15.7.1 Buffer Management Architecture
   15.7.2 Buffer Management Strategies
   15.7.3 The Relationship Between Physical Operator Selection and Buffer Management
   15.7.4 Exercises for Section 15.7
   15.8 Algorithms Using More Than Two Passes
   15.8.1 Multipass Sort-Based Algorithms
   15.8.2 Performance of Multipass, Sort-Based Algorithms
   15.8.3 Multipass Hash-Based Algorithms
   15.8.4 Performance of Multipass Hash-Based Algorithms
   15.8.5 Exercises for Section 15.8
   15.9 Summary of Chapter 15
   15.10 References for Chapter 15

16 The Query Compiler
   16.1 Parsing and Preprocessing
   16.1.1 Syntax Analysis and Parse Trees
   16.1.2 A Grammar for a Simple Subset of SQL
   16.1.3 The Preprocessor
   16.1.4 Preprocessing Queries Involving Views
   16.1.5 Exercises for Section 16.1
   16.2 Algebraic Laws for Improving Query Plans
   16.2.1 Commutative and Associative Laws
   16.2.2 Laws Involving Selection
   16.2.3 Pushing Selections
   16.2.4 Laws Involving Projection
   16.2.5 Laws About Joins and Products
   16.2.6 Laws Involving Duplicate Elimination
   16.2.7 Laws Involving Grouping and Aggregation
   16.2.8 Exercises for Section 16.2
   16.3 From Parse Trees to Logical Query Plans
   16.3.1 Conversion to Relational Algebra
   16.3.2 Removing Subqueries From Conditions
   16.3.3 Improving the Logical Query Plan
   16.3.4 Grouping Associative/Commutative Operators
   16.3.5 Exercises for Section 16.3
   16.4 Estimating the Cost of Operations
   16.4.1 Estimating Sizes of Intermediate Relations
   16.4.2 Estimating the Size of a Projection
   16.4.3 Estimating the Size of a Selection
   16.4.4 Estimating the Size of a Join
   16.4.5 Natural Joins With Multiple Join Attributes
   16.4.6 Joins of Many Relations
   16.4.7 Estimating Sizes for Other Operations
   16.4.8 Exercises for Section 16.4
   16.5 Introduction to Cost-Based Plan Selection
   16.5.1 Obtaining Estimates for Size Parameters
   16.5.2 Computation of Statistics
   16.5.3 Heuristics for Reducing the Cost of Logical Query Plans
   16.5.4 Approaches to Enumerating Physical Plans
   16.5.5 Exercises for Section 16.5
   16.6 Choosing an Order for Joins
   16.6.1 Significance of Left and Right Join Arguments
   16.6.2 Join Trees
   16.6.3 Left-Deep Join Trees
   16.6.4 Dynamic Programming to Select a Join Order and Grouping
   16.6.5 Dynamic Programming With More Detailed Cost Functions
   16.6.6 A Greedy Algorithm for Selecting a Join Order
   16.6.7 Exercises for Section 16.6
   16.7 Completing the Physical-Query-Plan
   16.7.1 Choosing a Selection Method
   16.7.2 Choosing a Join Method
   16.7.3 Pipelining Versus Materialization
   16.7.4 Pipelining Unary Operations
   16.7.5 Pipelining Binary Operations
   16.7.6 Notation for Physical Query Plans
   16.7.7 Ordering of Physical Operations
   16.7.8 Exercises for Section 16.7
   16.8 Summary of Chapter 16
   16.9 References for Chapter 16

17 Coping With System Failures
   17.1 Issues and Models for Resilient Operation
   17.1.1 Failure Modes
   17.1.2 More About Transactions
   17.1.3 Correct Execution of Transactions
   17.1.4 The Primitive Operations of Transactions
   17.1.5 Exercises for Section 17.1
   17.2 Undo Logging
   17.2.1 Log Records
   17.2.2 The Undo-Logging Rules
   17.2.3 Recovery Using Undo Logging
   17.2.4 Checkpointing
   17.2.5 Nonquiescent Checkpointing
   17.2.6 Exercises for Section 17.2
   17.3 Redo Logging
   17.3.1 The Redo-Logging Rule
   17.3.2 Recovery With Redo Logging
   17.3.3 Checkpointing a Redo Log
   17.3.4 Recovery With a Checkpointed Redo Log
   17.3.5 Exercises for Section 17.3
   17.4 Undo/Redo Logging
   17.4.1 The Undo/Redo Rules
   17.4.2 Recovery With Undo/Redo Logging
   17.4.3 Checkpointing an Undo/Redo Log
   17.4.4 Exercises for Section 17.4
   17.5 Protecting Against Media Failures
   17.5.1 The Archive
   17.5.2 Nonquiescent Archiving
   17.5.3 Recovery Using an Archive and Log
   17.5.4 Exercises for Section 17.5
   17.6 Summary of Chapter 17
   17.7 References for Chapter 17

18 Concurrency Control
   18.1 Serial and Serializable Schedules
   18.1.1 Schedules
   18.1.2 Serial Schedules
   18.1.3 Serializable Schedules
   18.1.4 The Effect of Transaction Semantics
   18.1.5 A Notation for Transactions and Schedules
   18.1.6 Exercises for Section 18.1
   18.2 Conflict-Serializability
   18.2.1 Conflicts
   18.2.2 Precedence Graphs and a Test for Conflict-Serializability
   18.2.3 Why the Precedence-Graph Test Works
   18.2.4 Exercises for Section 18.2
   18.3 Enforcing Serializability by Locks
   18.3.1 Locks
   18.3.2 The Locking Scheduler
   18.3.3 Two-Phase Locking
   18.3.4 Why Two-Phase Locking Works
   18.3.5 Exercises for Section 18.3
   18.4 Locking Systems With Several Lock Modes
   18.4.1 Shared and Exclusive Locks
   18.4.2 Compatibility Matrices
   18.4.3 Upgrading Locks
   18.4.4 Update Locks
   18.4.5 Increment Locks
   18.4.6 Exercises for Section 18.4
   18.5 An Architecture for a Locking Scheduler
   18.5.1 A Scheduler That Inserts Lock Actions
   18.5.2 The Lock Table
   18.5.3 Exercises for Section 18.5
   18.6 Hierarchies of Database Elements
   18.6.1 Locks With Multiple Granularity
   18.6.2 Warning Locks
   18.6.3 Phantoms and Handling Insertions Correctly
   18.6.4 Exercises for Section 18.6
   18.7 The Tree Protocol
   18.7.1 Motivation for Tree-Based Locking
   18.7.2 Rules for Access to Tree-Structured Data
   18.7.3 Why the Tree Protocol Works
   18.7.4 Exercises for Section 18.7
   18.8 Concurrency Control by Timestamps
   18.8.1 Timestamps
   18.8.2 Physically Unrealizable Behaviors
   18.8.3 Problems With Dirty Data
   18.8.4 The Rules for Timestamp-Based Scheduling
   18.8.5 Multiversion Timestamps
   18.8.6 Timestamps Versus Locking
   18.8.7 Exercises for Section 18.8
   18.9 Concurrency Control by Validation
   18.9.1 Architecture of a Validation-Based Scheduler
   18.9.2 The Validation Rules
   18.9.3 Comparison of Three Concurrency-Control Mechanisms
   18.9.4 Exercises for Section 18.9
   18.10 Summary of Chapter 18
   18.11 References for Chapter 18

19 More About Transaction Management
   19.1 Serializability and Recoverability
   19.1.1 The Dirty-Data Problem
   19.1.2 Cascading Rollback
   19.1.3 Recoverable Schedules
   19.1.4 Schedules That Avoid Cascading Rollback
   19.1.5 Managing Rollbacks Using Locking
   19.1.6 Group Commit
   19.1.7 Logical Logging
   19.1.8 Recovery From Logical Logs
   19.1.9 Exercises for Section 19.1
   19.2 Deadlocks
   19.2.1 Deadlock Detection by Timeout
   19.2.2 The Waits-For Graph
   19.2.3 Deadlock Prevention by Ordering Elements
   19.2.4 Detecting Deadlocks by Timestamps
   19.2.5 Comparison of Deadlock-Management Methods
   19.2.6 Exercises for Section 19.2
   19.3 Long-Duration Transactions
   19.3.1 Problems of Long Transactions
   19.3.2 Sagas
   19.3.3 Compensating Transactions
   19.3.4 Why Compensating Transactions Work
   19.3.5 Exercises for Section 19.3
   19.4 Summary of Chapter 19
   19.5 References for Chapter 19

20 Parallel and Distributed Databases
   20.1 Parallel Algorithms on Relations
   20.1.1 Models of Parallelism
   20.1.2 Tuple-at-a-Time Operations in Parallel
   20.1.3 Parallel Algorithms for Full-Relation Operations
   20.1.4 Performance of Parallel Algorithms
   20.1.5 Exercises for Section 20.1
   20.2 The Map-Reduce Parallelism Framework
   20.2.1 The Storage Model
   20.2.2 The Map Function
   20.2.3 The Reduce Function
   20.2.4 Exercises for Section 20.2
   20.3 Distributed Databases
   20.3.1 Distribution of Data
   20.3.2 Distributed Transactions
   20.3.3 Data Replication
   20.3.4 Exercises for Section 20.3
   20.4 Distributed Query Processing
   20.4.1 The Distributed Join Problem
   20.4.2 Semijoin Reductions
   20.4.3 Joins of Many Relations
   20.4.4 Acyclic Hypergraphs
   20.4.5 Full Reducers for Acyclic Hypergraphs
   20.4.6 Why the Full-Reducer Algorithm Works
   20.4.7 Exercises for Section 20.4
   20.5 Distributed Commit
   20.5.1 Supporting Distributed Atomicity
   20.5.2 Two-Phase Commit
   20.5.3 Recovery of Distributed Transactions
   20.5.4 Exercises for Section 20.5
   20.6 Distributed Locking
   20.6.1 Centralized Lock Systems
   20.6.2 A Cost Model for Distributed Locking Algorithms
   20.6.3 Locking Replicated Elements
   20.6.4 Primary-Copy Locking
   20.6.5 Global Locks From Local Locks
   20.6.6 Exercises for Section 20.6
   20.7 Peer-to-Peer Distributed Search
   20.7.1 Peer-to-Peer Networks
   20.7.2 The Distributed-Hashing Problem
   20.7.3 Centralized Solutions for Distributed Hashing
   20.7.4 Chord Circles
   20.7.5 Links in Chord Circles
   20.7.6 Search Using Finger Tables
   20.7.7 Adding New Nodes
   20.7.8 When a Peer Leaves the Network
   20.7.9 When a Peer Fails
   20.7.10 Exercises for Section 20.7
   20.8 Summary of Chapter 20
   20.9 References for Chapter 20

PART V: Other Issues in Management of Massive Data
21 Information Integration
   21.1 Introduction to Information Integration
   21.1.1 Why Information Integration?
   21.1.2 The Heterogeneity Problem
   21.2 Modes of Information Integration
   21.2.1 Federated Database Systems
   21.2.2 Data Warehouses
   21.2.3 Mediators
   21.2.4 Exercises for Section 21.2
   21.3 Wrappers in Mediator-Based Systems
   21.3.1 Templates for Query Patterns
   21.3.2 Wrapper Generators
   21.3.3 Filters
   21.3.4 Other Operations at the Wrapper
   21.3.5 Exercises for Section 21.3
   21.4 Capability-Based Optimization
   21.4.1 The Problem of Limited Source Capabilities
   21.4.2 A Notation for Describing Source Capabilities
   21.4.3 Capability-Based Query-Plan Selection
   21.4.4 Adding Cost-Based Optimization
   21.4.5 Exercises for Section 21.4
   21.5 Optimizing Mediator Queries
   21.5.1 Simplified Adornment Notation
   21.5.2 Obtaining Answers for Subgoals
   21.5.3 The Chain Algorithm
   21.5.4 Incorporating Union Views at the Mediator
   21.5.5 Exercises for Section 21.5
   21.6 Local-as-View Mediators
   21.6.1 Motivation for LAV Mediators
   21.6.2 Terminology for LAV Mediation
   21.6.3 Expanding Solutions
   21.6.4 Containment of Conjunctive Queries
   21.6.5 Why the Containment-Mapping Test Works
   21.6.6 Finding Solutions to a Mediator Query
   21.6.7 Why the LMSS Theorem Holds
   21.6.8 Exercises for Section 21.6
   21.7 Entity Resolution
   21.7.1 Deciding Whether Records Represent a Common Entity
   21.7.2 Merging Similar Records
   21.7.3 Useful Properties of Similarity and Merge Functions
   21.7.4 The R-Swoosh Algorithm for ICAR Records
   21.7.5 Why R-Swoosh Works
   21.7.6 Other Approaches to Entity Resolution
   21.7.7 Exercises for Section 21.7
   21.8 Summary of Chapter 21
   21.9 References for Chapter 21

22 Data Mining
   22.1 Frequent-Itemset Mining
   22.1.1 The Market-Basket Model
   22.1.2 Basic Definitions
   22.1.3 Association Rules
   22.1.4 The Computation Model for Frequent Itemsets
   22.1.5 Exercises for Section 22.1
   22.2 Algorithms for Finding Frequent Itemsets
   22.2.1 The Distribution of Frequent Itemsets
   22.2.2 The Naive Algorithm for Finding Frequent Itemsets
   22.2.3 The A-Priori Algorithm
   22.2.4 Implementation of the A-Priori Algorithm
   22.2.5 Making Better Use of Main Memory
   22.2.6 When to Use the PCY Algorithm
   22.2.7 The Multistage Algorithm
   22.2.8 Exercises for Section 22.2
   22.3 Finding Similar Items
   22.3.1 The Jaccard Measure of Similarity
   22.3.2 Applications of Jaccard Similarity
   22.3.3 Minhashing
   22.3.4 Minhashing and Jaccard Distance
   22.3.5 Why Minhashing Works
   22.3.6 Implementing Minhashing
   22.3.7 Exercises for Section 22.3
   22.4 Locality-Sensitive Hashing
   22.4.1 Entity Resolution as an Example of LSH
   22.4.2 Locality-Sensitive Hashing of Signatures
   22.4.3 Combining Minhashing and Locality-Sensitive Hashing
   22.4.4 Exercises for Section 22.4
   22.5 Clustering of Large-Scale Data
   22.5.1 Applications of Clustering
   22.5.2 Distance Measures
   22.5.3 Agglomerative Clustering
   22.5.4 $k$-Means Algorithms
   22.5.5 $k$-Means for Large-Scale Data
   22.5.6 Processing a Memory Load of Points
   22.5.7 Exercises for Section 22.5
   22.6 Summary of Chapter 22
   22.7 References for Chapter 22

23 Database Systems and the Internet
   23.1 The Architecture of a Search Engine
   23.1.1 Components of a Search Engine
   23.1.2 Web Crawlers
   23.1.3 Query Processing in Search Engines
   23.1.4 Ranking Pages
   23.2 PageRank for Identifying Important Pages
   23.2.1 The Intuition Behind PageRank
   23.2.2 Recursive Formulation of PageRank\nobreakspace {}--- First Try
   23.2.3 Spider Traps and Dead Ends
   23.2.4 PageRank Accounting for Spider Traps and Dead Ends
   23.2.5 Exercises for Section 23.2
   23.3 Topic-Specific PageRank
   23.3.1 Teleport Sets
   23.3.2 Calculating A Topic-Specific PageRank
   23.3.3 Link Spam
   23.3.4 Topic-Specific PageRank and Link Spam
   23.3.5 Exercises for Section 23.3
   23.4 Data Streams
   23.4.1 Data-Stream-Management Systems
   23.4.2 Stream Applications
   23.4.3 A Data-Stream Data Model
   23.4.4 Converting Streams Into Relations
   23.4.5 Converting Relations Into Streams
   23.4.6 Exercises for Section 23.4
   23.5 Data Mining of Streams
   23.5.1 Motivation
   23.5.2 Counting Bits
   23.5.3 Counting the Number of Distinct Elements
   23.5.4 Exercises for Section 23.5
   23.6 Summary of Chapter 23
   23.7 References for Chapter 23

目錄大綱(中文翻譯)

1 The Worlds of Database Systems

   1.1 The Evolution of Database Systems

   1.1.1 Early Database Management Systems

   1.1.2 Relational Database Systems

   1.1.3 Smaller and Smaller Systems

   1.1.4 Bigger and Bigger Systems

   1.1.5 Information Integration

   1.2 Overview of a Database Management System

   1.2.1 Data-Definition Language Commands

   1.2.2 Overview of Query Processing

   1.2.3 Storage and Buffer Management

   1.2.4 Transaction Processing

   1.2.5 The Query Processor

   1.3 Outline of Database-System Studies

   1.4 References for Chapter 1

PART I: Relational Database Modeling

2 The Relational Model of Data

   2.1 An Overview of Data Models

   2.1.1 What is a Data Model?

   2.1.2 Important Data Models

   2.1.3 The Relational Model in Brief

   2.1.4 The Semistructured Model in Brief

   2.1.5 Other Data Model

   2.1.6 Comparison of Modeling Approaches

   2.2 Basics of the Relational Model

   2.2.1 Attributes

   2.2.2 Schemas

   2.2.3 Tuples

   2.2.4 Domains

   2.2.5 Equivalent Representations of a Relation

   2.2.6 Relation Instances

   2.2.7 Keys of Relations

   2.2.8 An Example Database Schema

   2.2.9 Exercises for Section 2.2

   2.3 Defining a Relation Schema in SQL

   2.3.1 Relations in SQL

   2.3.2 Data Types

   2.3.3 Simple Table Declarations

   2.3.4 Modifying Relation Schemas

   2.3.5 Default Values

   2.3.6 Declaring Keys

   2.3.7 Exercises for Section 2.3

   2.4 An Algebraic Query Language

   2.4.1 Why Do We Need a Special Query Language?

   2.4.2 What is an Algebra?

   2.4.3 Overview of Relational Algebra

   2.4.4 Set Operations on Relations

   2.4.5 Projection

   2.4.6 Selection

   2.4.7 Cartesian Product

   2.4.8 Natural Joins

   2.4.9 Theta-Joins

   2.4.10 Combining Operations to Form Queries

   2.4.11 Naming and Renaming

   2.4.12 Relationships Among Operations

   2.4.13 A Linear Notation for Algebraic Expressions

   2.4.14 Exercises for Section 2.4

   2.5 Constraints on Relations

   2.5.1 Relational Algebra as a Constraint Language

   2.5.2 Referential Integrity Constraints

   2.5.3 Key Constraints

   2.5.4 Additional Constraint Examples

   2.5.5 Exercises for Section 2.5

   2.6 Summary of Chapter 2

   2.7 References for Chapter 2

3 Design Theory for Relational Databases

   3.1 Functional Dependencies

   3.1.1 Definition of Functional Dependency

   3.1.2 Keys of Relations

   3.1.3 Superkeys

   3.1.4 Exercises for Section 3.1

   3.2 Rules About Functional Dependencies

   3.2.1 Reasoning About Functional Dependencies

   3.2.2 The Splitting/Combining Rule

   3.2.3 Trivial Functional Dependencies

   3.2.4 Computing the Closure of Attributes

   3.2.5 Why the Closure Algorithm Works

   3.2.6 The Transitive Rule

   3.2.7 Closing Sets of Functional Dependencies

   3.2.8 Projecting Functional Dependencies

   3.2.9 Exercises for Section 3.2

   3.3 Design of Relational Database Schemas

   3.3.1 Anomalies

   3.3.2 Decomposing Relations

   3.3.3 Boyce-Codd Normal Form

   3.3.4 Decomposition into BCN

   3.3.5 Exercises for Section 3.

   3.4 Decomposition: The Good, Bad, and Ugly

   3.4.1 Recovering Information from a Decomposition

   3.4.2 The Chase Test for Lossless Join

   3.4.3 Why the Chase Works

   3.4.4 Dependency Preservation

   3.4.5 Exercises for Section 3.4

   3.5 Third Normal Form

   3.5.1 Definition of Third Normal Form

   3.5.2 The Synthesis Algorithm for 3NF Schemas

   3.5.3 Why the 3NF Synthesis Algorithm Work

   3.5.4 Exercises for Section 3.5

   3.6 Multivalued Dependencies

   3.6.1 Attribute Independence and Its Consequent Redundancy

   3.6.2 Definition of Multivalued Dependencies

   3.6.3 Reasoning About Multivalued Dependencies

   3.6.4 Fourth Normal Form

   3.6.5 Decomposition into Fourth Normal Form

   3.6.6 Relationships Among Normal Forms

   3.6.7 Exercises for Section 3.6

   3.7 An Algorithm for Discovering MVD's

   3.7.1 The Closure and the Chase

   3.7.2 Extending the Chase to MVD's

   3.7.3 Why the Chase Works for MVD's

   3.7.4 Projecting MVD's

   3.7.5 Exercises for Section 3.7

   3.8 Summary of Chapter 3

   3.9 References for Chapter 3

4 High-Level Database Models

   4.1 The Entity/Relationship Model

   4.1.1 Entity Sets

   4.1.2 Attributes

   4.1.3 Relationships

   4.1.4 Entity-Relationship Diagrams

   4.1.5 Instances of an E/R Diagram

   4.1.6 Multiplicity of Binary E/R Relationships

   4.1.7 Multiway Relationships

   4.1.8 Roles in Relationships

   4.1.9 Attributes on Relationships

   4.1.10 Converting Multiway Relationships to Binary

   4.1.11 Subclasses in the E/R Model

   4.1.12 Exercises for Section 4.1

   4.2 Design Principles

   4.2.1 Faithfulness

   4.2.2 Avoiding Redundancy

   4.2.3 Simplicity Counts

   4.2.4 Choosing the Right Relationships

   4.2.5 Picking the Right Kind of Element

   4.2.6 Exercises for Section 4.2

   4.3 Constraints in the E/R Model

   4.3.1 Keys in the E/R Model

   4.3.2 Representing Keys in the E/R Model

   4.3.3 Referential Integrity

   4.3.4 Degree Constraints

   4.3.5 Exercises for Section 4.3

   4.4 Weak Entity Sets

   4.4.1 Causes of Weak Entity Sets

   4.4.2 Requirements for Weak Entity Sets

   4.4.3 Weak Entity Set Notation

   4.4.4 Exercises for Section 4.4

   4.5 From E/R Diagrams to Relational Designs

   4.5.1 From Entity Sets to Relations

   4.5.2 From E/R Relationships to Relations

   4.5.3 Combining Relations

   4.5.4 Handling Weak Entity Sets

   4.5.5 Exercises for Section 4.5

   4.6 Converting Subclass Structures to Relations

   4.6.1 E/R-Style Conversion

   4.6.2 An Object-Oriented Approach

   4.6.3 Using Null Values to Combine Relations

   4.6.4 Comparison of Approaches

   4.6.5 Exercises for Section 4.6

   4.7 Unified Modeling Language

   4.7.1 UML Classes

   4.7.2 Keys for UML classes

   4.7.3 Associations

   4.7.4 Self-Associations

   4.7.5 Association Classes

   4.7.6 Subclasses in UML

   4.7.7 Aggregations and Compositions

   4.7.8 Exercises for Section 4.7

   4.8 From UML Diagrams to Relations

   4.8.1 UML-to-Relations Basics

   4.8.2 From UML Subclasses to Relations

   4.8.3 From Aggregations and Compositions to Relations

   4.8.4 The UML Analog of Weak Entity Sets

   4.8.5 Exercises for Section 4.8

   4.9 Object Definition Language

   4.9.1 Class Declarations

   4.9.2 Attributes in ODL

   4.9.3 Relationships in ODL

   4.9.4 Inverse Relationships

   4.9.5 Multiplicity of Relationships

   4.9.6 Types in ODL

   4.9.7 Subclasses in ODL

   4.9.8 Declaring Keys in ODL

   4.9.9 Exercises for Section 4.9

   4.10 From ODL Designs to Relational Designs

   4.10.1 From ODL Classes to Relations

   4.10.2 Complex Attributes in Classes

   4.10.3 Representing Set-Valued Attributes

   4.10.4 Representing Other Type Constructors

   4.10.5 Representing ODL Relationships

   4.10.6 Exercises for Section 4.10

   4.11 Summary of Chapter 4

   4.12 References for Chapter 4

PART II: Relational Database Programming

5 Algebraic and Logical Query Languages

   5.1 Relational Operations on Bags

   5.1.1 Why Bags?

   5.1.2 Union, Intersection, and Difference of Bags

   5.1.3 Projection of Bags

   5.1.4 Selection on Bags

   5.1.5 Product of Bags

   5.1.6 Joins of Bags

   5.1.7 Exercises for Section 5.1

   5.2 Extended Operators of Relational Algebra

   5.2.1 Duplicate Elimination

   5.2.2 Aggregation Operators

   5.2.3 Grouping

   5.2.4 The Grouping Operator

   5.2.5 Extending the Projection Operator

   5.2.6 The Sorting Operator

   5.2.7 Outerjoins

   5.2.8 Exercises for Section 5.2

   5.3 A Logic for Relations

   5.3.1 Predicates and Atoms

   5.3.2 Arithmetic Atoms

   5.3.3 Datalog Rules and Queries

   5.3.4 Meaning of Datalog Rules

   5.3.5 Extensional and Intensional Predicates

   5.3.6 Datalog Rules Applied to Bags

   5.3.7 Exercises for Section 5.3

   5.4 Relational Algebra and Datalog

   5.4.1 Boolean Operations

   5.4.2 Projection

   5.4.3 Selection

   5.4.4 Product

   5.4.5 Joins

   5.4.6 Simulating Multiple Operations with Datalog

   5.4.7 Comparison Between Datalog and Relational Algebra

   5.4.8 Exercises for Section 5.4

   5.5 Summary of Chapter 5

   5.6 References for Chapter 5

6 The Database Language SQL

   6.1 Simple Queries in SQL

   6.1.1 Projection in SQL

   6.1.2 Selection in SQL

   6.1.3 Comparison of Strings

   6.1.4 Pattern Matching in SQL

   6.1.5 Dates and Times

   6.1.6 Null Values and Comparisons Involving {\tt NULL}

   6.1.7 The Truth-Value {\tt UNKNOWN}

   6.1.8 Ordering the Output

   6.1.9 Exercises for Section 6.1

   6.2 Queries Involving More Than One Relation

   6.2.1 Products and Joins in SQL

   6.2.2 Disambiguating Attributes

   6.2.3 Tuple Variables

   6.2.4 Interpreting Multirelation Queries

   6.2.5 Union, Intersection, and Difference of Queries

   6.2.6 Exercises for Section 6.2

   6.3 Subqueries

   6.3.1 Subqueries that Produce Scalar Values

   6.3.2 Conditions Involving Relations

   6.3.3 Conditions Involving Tuples

   6.3.4 Correlated Subqueries

   6.3.5 Subqueries in {\tt FROM}\ Clauses

   6.3.6 SQL Join Expressions

   6.3.7 Natural Joins

   6.3.8 Outerjoins

   6.3.9 Exercises for Section 6.3

   6.4 Full-Relation Operations

   6.4.1 Eliminating Duplicates

   6.4.2 Duplicates in Unions, Intersections, and Differences

   6.4.3 Grouping and Aggregation in SQL

   6.4.4 Aggregation Operators

   6.4.5 Grouping

   6.4.6 Grouping, Aggregation, and Nulls

   6.4.7 {\tt HAVING} Clauses

   6.4.8 Exercises for Section 6.4

   6.5 Database Modifications

   6.5.1 Insertion

   6.5.2 Deletion

   6.5.3 Updates

   6.5.4 Exercises for Section 6.5

   6.6 Transactions in SQL

   6.6.1 Serializability

   6.6.2 Atomicity

   6.6.3 Transactions

   6.6.4 Read-Only Transactions

   6.6.5 Dirty Reads

   6.6.6 Other Isolation Levels

   6.6.7 Exercises for Section 6.6

   6.7 Summary of Chapter 6

   6.8 References for Chapter 6

7 Constraints and Triggers

   7.1 Keys and Foreign Keys

   7.1.1 Declaring Foreign-Key Constraints

   7.1.2 Maintaining Referential Integrity

   7.1.3 Deferred Checking of Constraints

   7.1.4 Exercises for Section 7.1

   7.2 Constraints on Attributes and Tuples

   7.2.1 Not-Null Constraints

   7.2.2 Attribute-Based {\tt CHECK} Constraints

   7.2.3 Tuple-Based {\tt CHECK} Constraints

   7.2.4 Comparison of Tuple- and Attribute-Based Constraints

   7.2.5 Exercises for Section 7.2

   7.3 Modification of Constraints

   7.3.1 Giving Names to Constraints

   7.3.2 Altering Constraints on Tables

   7.3.3 Exercises for Section 7.3

   7.4 Assertions

   7.4.1 Creating Assertions

   7.4.2 Using Assertions

   7.4.3 Exercises for Section 7.4

   7.5 Triggers

   7.5.1 Triggers in SQL

   7.5.2 The Options for Trigger Design

   7.5.3 Exercises for Section 7.5

   7.6 Summary of Chapter 7

   7.7 References for Chapter 7

8 Views and Indexes

   8.1 Virtual Views

   8.1.1 Declaring Views

   8.1.2 Querying Views

   8.1.3 Renaming Attributes

   8.1.4 Exercises for Section 8.1

   8.2 Modifying Views

   8.2.1 View Removal

   8.2.2 Updatable Views

   8.2.3 Instead-Of Triggers on Views

   8.2.4 Exercises for Section 8.2

   8.3 Indexes in SQL

   8.3.1 Motivation for Indexes

   8.3.2 Declaring Indexes

   8.3.3 Exercises for Section 8.3

   8.4 Selection of Indexes

   8.4.1 A Simple Cost Model

   8.4.2 Some Useful Indexes

   8.4.3 Calculating the Best Indexes to Create

   8.4.4 Automatic Selection of Indexes to Create

   8.4.5 Exercises for Section 8.4

   8.5 Materialized Views

   8.5.1 Maintaining a Materialized View

   8.5.2 Periodic Maintenance of Materialized Views

   8.5.3 Rewriting Queries to Use Materialized Views

   8.5.4 Automatic Creation of Materialized Views

   8.5.5 Exercises for Section 8.5

   8.6 Summary of Chapter 8

   8.7 References for Chapter 8

9 SQL in a Server Environment

   9.1 The Three-Tier Architecture

   9.1.1 The Web-Server Tier

   9.1.2 The Application Tier

   9.1.3 The Database Tier

   9.2 The SQL Environment

   9.2.1 Environments

   9.2.2 Schemas

   9.2.3 Catalogs

   9.2.4 Clients and Servers in the SQL Environment

   9.2.5 Connections

   9.2.6 Sessions

   9.2.7 Modules

   9.3 The SQL/Host-Language Interface

   9.3.1 The Impedance Mismatch Problem

   9.3.2 Connecting SQL to the Host Language

   9.3.3 The {\tt DECLARE} Section

   9.3.4 Using Shared Variables

   9.3.5 Single-Row Select Statements

   9.3.6 Cursors

   9.3.7 Modifications by Cursor

   9.3.8 Protecting Against Concurrent Updates

   9.3.9 Dynamic SQL

   9.3.10 Exercises for Section 9.3

   9.4 Stored Procedures

   9.4.1 Creating PSM Functions and Procedures

   9.4.2 Some Simple Statement Forms in PSM

   9.4.3 Branching Statements

   9.4.4 Queries in PSM

   9.4.5 Loops in PSM

   9.4.6 For-Loops

   9.4.7 Exceptions in PSM

   9.4.8 Using PSM Functions and Procedures

   9.4.9 Exercises for Section 9.4

   9.5 Using a Call-Level Interface

   9.5.1 Introduction to SQL/CLI

   9.5.2 Processing Statements

   9.5.3 Fetching Data From a Query Result

   9.5.4 Passing Parameters to Queries

   9.5.5 Exercises for Section 9.5

   9.6 JDBC

   9.6.1 Introduction to JDBC

   9.6.2 Creating Statements in JDBC

   9.6.3 Cursor Operations in JDBC

   9.6.4 Parameter Passing

   9.6.5 Exercises for Section 9.6

   9.7 PHP

   9.7.1 PHP Basics

   9.7.2 Arrays

   9.7.3 The PEAR DB Library

   9.7.4 Creating a Database Connection Using DB

   9.7.5 Executing SQL Statements

   9.7.6 Cursor Operations in PHP

   9.7.7 Dynamic SQL in PHP

   9.7.8 Exercises for Section 9.7

   9.8 Summary of Chapter 9

   9.9 References for Chapter 9

10 Advanced Topics in Relational Databases

   10.1 Security and User Authorization in SQL

   10.1.1 Privileges

   10.1.2 Creating Privileges

   10.1.3 The Privilege-Checking Process

   10.1.4 Granting Privileges

   10.1.5 Grant Diagrams

   10.1.6 Revoking Privileges

   10.1.7 Exercises for Section 10.1

   10.2 Recursion in SQL

   10.2.1 Defining Recursive Relations in SQL

   10.2.2 Problematic Expressions in Recursive SQL

   10.2.3 Exercises for Section 10.2

   10.3 The Object-Relational Model

   10.3.1 From Relations to Object-Relations

   10.3.2 Nested Relations

   10.3.3 References

   10.3.4 Object-Oriented Versus Object-Relational

   10.3.5 Exercises for Section 10.3

   10.4 User-Defined Types in SQL

   10.4.1 Defining Types in SQL

   10.4.2 Method Declarations in UDT's

   10.4.3 Method Definitions

   10.4.4 Declaring Relations with a UDT

   10.4.5 References

   10.4.6 Creating Object ID's for Tables

   10.4.7 Exercises for Section 10.4

   10.5 Operations on Object-Relational Data

   10.5.1 Following References

   10.5.2 Accessing Components of Tuples with a UDT

   10.5.3 Generator and Mutator Functions

   10.5.4 Ordering Relationships on UDT's

   10.5.5 Exercises for Section 10.5

   10.6 On-Line Analytic Processing

   10.6.1 OLAP and Data Warehouses

   10.6.2 OLAP Applications

   10.6.3 A Multidimensional View of OLAP Data

   10.6.4 Star Schemas

   10.6.5 Slicing and Dicing

   10.6.6 Exercises for Section 10.6

   10.7 Data Cubes

   10.7.1 The Cube Operator

   10.7.2 The Cube Operator in SQL

   10.7.3 Exercises for Section 10.7

   10.8 Summary of Chapter 10

   10.9 References for Chapter 10

PART III: Modeling and Programming for Semistructured Data

11 The Semistructured-Data Model

   11.1 Semistructured Data

   11.1.1 Motivation for the Semistructured-Data Model

   11.1.2 Semistructured Data Representation

   11.1.3 Information Integration Via Semistructured Data

   11.1.4 Exercises for Section 11.1

   11.2 XML

   11.2.1 Semantic Tags

   11.2.2 XML With and Without a Schema

   11.2.3 Well-Formed XML

   11.2.4 Attributes

   11.2.5 Attributes That Connect Elements

   11.2.6 Namespaces

   11.2.7 XML and Databases

   11.2.8 Exercises for Section 11.2

   11.3 Document Type Definitions

   11.3.1 The Form of a DTD

   11.3.2 Using a DTD

   11.3.3 Attribute Lists

   11.3.4 Identifiers and References

   11.3.5 Exercises for Section 11.3

   11.4 XML Schema

   11.4.1 The Form of an XML Schema

   11.4.2 Elements

   11.4.3 Complex Types

   11.4.4 Attributes

   11.4.5 Restricted Simple Types

   11.4.6 Keys in XML Schema

   11.4.7 Foreign Keys in XML Schema

   11.4.8 Exercises for Section 11.4

   11.5 Summary of Chapter 11

   11.6 References for Chapter 11

12 Programming Languages for XML

   12.1 XPath

   12.1.1 The XPath Data Model

   12.1.2 Document Nodes

   12.1.3 Path Expressions

   12.1.4 Relative Path Expressions

   12.1.5 Attributes in Path Expressions

   12.1.6 Axes

   12.1.7 Context of Expressions

   12.1.8 Wildcards

   12.1.9 Conditions in Path Expressions

   12.1.10 Exercises for Section 12.1

   12.2 XQuery

   12.2.1 XQuery Basics

   12.2.2 FLWR Expressions

   12.2.3 Replacement of Variables by Their Values

   12.2.4 Joins in XQuery

   12.2.5 XQuery Comparison Operators

   12.2.6 Elimination of Duplicates

   12.2.7 Quantification in XQuery

   12.2.8 Aggregations

   12.2.9 Branching in XQuery Expressions

   12.2.10 Ordering the Result of a Query

   12.2.11 Exercises for Section 12.2

   12.3 Extensible Stylesheet Language

   12.3.1 XSLT Basics

   12.3.2 Templates

   12.3.3 Obtaining Values From XML Data

   12.3.4 Recursive Use of Templates

   12.3.5 Iteration in XSLT

   12.3.6 Conditionals in XSLT

   12.3.7 Exercises for Section 12.3

   12.4 Summary of Chapter 12

   12.5 References for Chapter 12

PART IV: Database System Implementation

13 Secondary Storage Management

   13.1 The Memory Hierarchy

   13.1.1 The Memory Hierarchy

   13.1.2 Transfer of Data Between Levels

   13.1.3 Volatile and Nonvolatile Storage

   13.1.4 Virtual Memory

   13.1.5 Exercises for Section 13.1

   13.2 Disks

   13.2.1 Mechanics of Disks

   13.2.2 The Disk Controller

   13.2.3 Disk Access Characteristics

   13.2.4 Exercises for Section 13.2

   13.3 Accelerating Access to Secondary Storage

   13.3.1 The I/O Model of Computation

   13.3.2 Organizing Data by Cylinders

   13.3.3 Using Multiple Disks

   13.3.4 Mirroring Disks

   13.3.5 Disk Scheduling and the Elevator Algorithm

   13.3.6 Prefetching and Large-Scale Buffering

   13.3.7 Exercises for Section 13.3

   13.4 Disk Failures

   13.4.1 Intermittent Failures

   13.4.2 Checksums

   13.4.3 Stable Storage

   13.4.4 Error-Handling Capabilities of Stable Storage

   13.4.5 Recovery from Disk Crashes

   13.4.6 Mirroring as a Redundancy Technique

   13.4.7 Parity Blocks

   13.4.8 An Improvement: RAID 5

   13.4.9 Coping With Multiple Disk Crashes

   13.4.10 Exercises for Section 13.4

   13.5 Arranging Data on Disk

   13.5.1 Fixed-Length Records

   13.5.2 Packing Fixed-Length Records into Blocks

   13.5.3 Exercises for Section 13.5

   13.6 Representing Block and Record Addresses

   13.6.1 Addresses in Client-Server Systems

   13.6.2 Logical and Structured Addresses

   13.6.3 Pointer Swizzling

   13.6.4 Returning Blocks to Disk

   13.6.5 Pinned Records and Blocks

   13.6.6 Exercises for Section 13.6

   13.7 Variable-Length Data and Records

   13.7.1 Records With Variable-Length Fields

   13.7.2 Records With Repeating Fields

   13.7.3 Variable-Format Records

   13.7.4 Records That Do Not Fit in a Block

   13.7.5 BLOBs

   13.7.6 Column Stores

   13.7.7 Exercises for Section 13.7

   13.8 Record Modifications

   13.8.1 Insertion

   13.8.2 Deletion

   13.8.3 Update

   13.8.4 Exercises for Section 13.8

   13.9 Summary of Chapter 13

   13.10 References for Chapter 13

14 Index Structures

   14.1 Index-Structure Basics

   14.1.1 Sequential Files

   14.1.2 Dense Indexes

   14.1.3 Sparse Indexes

   14.1.4 Multiple Levels of Index

   14.1.5 Secondary Indexes

   14.1.6 Applications of Secondary Indexes

   14.1.7 Indirection in Secondary Indexes

   14.1.8 Document Retrieval and Inverted Indexes

   14.1.9 Exercises for Section 14.1

   14.2 B-Trees

   14.2.1 The Structure of B-trees

   14.2.2 Applications of B-trees

   14.2.3 Lookup in B-Trees

   14.2.4 Range Queries

   14.2.5 Insertion Into B-Trees

   14.2.6 Deletion From B-Trees

   14.2.7 Efficiency of B-Trees

   14.2.8 Exercises for Section 14.2

   14.3 Hash Tables

   14.3.1 Secondary-Storage Hash Tables

   14.3.2 Insertion Into a Hash Table

   14.3.3 Hash-Table Deletion

   14.3.4 Efficiency of Hash Table Indexes

   14.3.5 Extensible Hash Tables

   14.3.6 Insertion Into Extensible Hash Tables

   14.3.7 Linear Hash Tables

   14.3.8 Insertion Into Linear Hash Tables

   14.3.9 Exercises for Section 14.3

   14.4 Multidimensional Indexes

   14.4.1 Applications of Multidimensional Indexes

   14.4.2 Executing Range Queries Using Conventional Indexes

   14.4.3 Executing Nearest-Neighbor Queries Using Conventional Indexes

   14.4.4 Overview of Multidimensional Index Structures

   14.5 Hash Structures for Multidimensional Data

   14.5.1 Grid Files

   14.5.2 Lookup in a Grid File

   14.5.3 Insertion Into Grid Files

   14.5.4 Performance of Grid Files

   14.5.5 Partitioned Hash Functions

   14.5.6 Comparison of Grid Files and Partitioned Hashing

   14.5.7 Exercises for Section 14.5

   14.6 Tree Structures for Multidimensional Data

   14.6.1 Multiple-Key Indexes

   14.6.2 Performance of Multiple-Key Indexes

   14.6.3 $kd$-Trees

   14.6.4 Operations on $kd$-Trees

   14.6.5 Adapting $kd$-Trees to Secondary Storage

   14.6.6 Quad Trees

   14.6.7 R-Trees

   14.6.8 Operations on R-Trees

   14.6.9 Exercises for Section 14.6

   14.7 Bitmap Indexes

   14.7.1 Motivation for Bitmap Indexes

   14.7.2 Compressed Bitmaps

   14.7.3 Operating on Run-Length-Encoded Bit-Vectors

   14.7.4 Managing Bitmap Indexes

   14.7.5 Exercises for Section 14.7

   14.8 Summary of Chapter 14

   14.9 References for Chapter 14

15 Query Execution

   15.1 Introduction to Physical-Query-Plan Operators

   15.1.1 Scanning Tables

   15.1.2 Sorting While Scanning Tables

   15.1.3 The Computation Model for Physical Operators

   15.1.4 Parameters for Measuring Costs

   15.1.5 I/O Cost for Scan Operators

   15.1.6 Iterators for Implementation of Physical Operators

   15.2 One-Pass Algorithms

   15.2.1 One-Pass Algorithms for Tuple-at-a-Time Operations

   15.2.2 One-Pass Algorithms for Unary, Full-Relation Operations

   15.2.3 One-Pass Algorithms for Binary Operations

   15.2.4 Exercises for Section 15.2

   15.3 Nested-Loop Joins

   15.3.1 Tuple-Based Nested-Loop Join

   15.3.2 An Iterator for Tuple-Based Nested-Loop Join

   15.3.3 Block-Based Nested-Loop Join Algorithm

   15.3.4 Analysis of Nested-Loop Join

   15.3.5 Summary of Algorithms so Far

   15.3.6 Exercises for Section 15.3

   15.4 Two-Pass Algorithms Based on Sorting

   15.4.1 Two-Phase, Multiway Merge-Sort

   15.4.2 Duplicate Elimination Using Sorting

   15.4.3 Grouping and Aggregation Using Sorting

   15.4.4 A Sort-Based Union Algorithm

   15.4.5 Sort-Based Intersection and Difference

   15.4.6 A Simple Sort-Based Join Algorithm

   15.4.7 Analysis of Simple Sort-Join

   15.4.8 A More Efficient Sort-Based Join

   15.4.9 Summary of Sort-Based Algorithms

   15.4.10 Exercises for Section 15.4

   15.5 Two-Pass Algorithms Based on Hashing

   15.5.1 Partitioning Relations by Hashing

   15.5.2 A Hash-Based Algorithm for Duplicate Elimination

   15.5.3 Hash-Based Grouping and Aggregation

   15.5.4 Hash-Based Union, Intersection, and Difference

   15.5.5 The Hash-Join Algorithm

   15.5.6 Saving Some Disk I/O's

   15.5.7 Summary of Hash-Based Algorithms

   15.5.8 Exercises for Section 15.5

   15.6 Index-Based Algorithms

   15.6.1 Clustering and Nonclustering Indexes

   15.6.2 Index-Based Selection

   15.6.3 Joining by Using an Index

   15.6.4 Joins Using a Sorted Index

   15.6.5 Exercises for Section 15.6

   15.7 Buffer Management

   15.7.1 Buffer Management Architecture

   15.7.2 Buffer Management Strategies

   15.7.3 The Relationship Between Physical Operator Selection and Buffer Management

   15.7.4 Exercises for Section 15.7

   15.8 Algorithms Using More Than Two Passes

   15.8.1 Multipass Sort-Based Algorithms

   15.8.2 Performance of Multipass, Sort-Based Algorithms

   15.8.3 Multipass Hash-Based Algorithms

   15.8.4 Performance of Multipass Hash-Based Algorithms

   15.8.5 Exercises for Section 15.8

   15.9 Summary of Chapter 15

   15.10 References for Chapter 15

16 The Query Compiler

   16.1 Parsing and Preprocessing

   16.1.1 Syntax Analysis and Parse Trees

   16.1.2 A Grammar for a Simple Subset of SQL

   16.1.3 The Preprocessor

   16.1.4 Preprocessing Queries Involving Views

   16.1.5 Exercises for Section 16.1

   16.2 Algebraic Laws for Improving Query Plans

   16.2.1 Commutative and Associative Laws

   16.2.2 Laws Involving Selection

   16.2.3 Pushing Selections

   16.2.4 Laws Involving Projection

   16.2.5 Laws About Joins and Products

   16.2.6 Laws Involving Duplicate Elimination

   16.2.7 Laws Involving Grouping and Aggregation

   16.2.8 Exercises for Section 16.2

   16.3 From Parse Trees to Logical Query Plans

   16.3.1 Conversion to Relational Algebra

   16.3.2 Removing Subqueries From Conditions

   16.3.3 Improving the Logical Query Plan

   16.3.4 Grouping Associative/Commutative Operators

   16.3.5 Exercises for Section 16.3

   16.4 Estimating the Cost of Operations

   16.4.1 Estimating Sizes of Intermediate Relations

   16.4.2 Estimating the Size of a Projection

   16.4.3 Estimating the Size of a Selection

   16.4.4 Estimating the Size of a Join

   16.4.5 Natural Joins With Multiple Join Attributes

   16.4.6 Joins of Many Relations

   16.4.7 Estimating Sizes for Other Operations

   16.4.8 Exercises for Section 16.4

   16.5 Introduction to Cost-Based Plan Selection

   16.5.1 Obtaining Estimates for Size Parameters

   16.5.2 Computation of Statistics

   16.5.3 Heuristics for Reducing the Cost of Logical Query Plans

   16.5.4 Approaches to Enumerating Physical Plans

   16.5.5 Exercises for Section 16.5

   16.6 Choosing an Order for Joins

   16.6.1 Significance of Left and Right Join Arguments

   16.6.2 Join Trees

   16.6.3 Left-Deep Join Trees

   16.6.4 Dynamic Programming to Select a Join Order and Grouping

   16.6.5 Dynamic Programming With More Detailed Cost Functions

   16.6.6 A Greedy Algorithm for Selecting a Join Order

   16.6.7 Exercises for Section 16.6

   16.7 Completing the Physical-Query-Plan

   16.7.1 Choosing a Selection Method

   16.7.2 Choosing a Join Method

   16.7.3 Pipelining Versus Materialization

   16.7.4 Pipelining Unary Operations

   16.7.5 Pipelining Binary Operations

   16.7.6 Notation for Physical Query Plans

   16.7.7 Ordering of Physical Operations

   16.7.8 Exercises for Section 16.7

   16.8 Summary of Chapter 16

   16.9 References for Chapter 16

17 Coping With System Failures

   17.1 Issues and Models for Resilient Operation

   17.1.1 Failure Modes

   17.1.2 More About Transactions

   17.1.3 Correct Execution of Transactions

   17.1.4 The Primitive Operations of Transactions

   17.1.5 Exercises for Section 17.1

   17.2 Undo Logging

   17.2.1 Log Records

   17.2.2 The Undo-Logging Rules

   17.2.3 Recovery Using Undo Logging

   17.2.4 Checkpointing

   17.2.5 Nonquiescent Checkpointing

   17.2.6 Exercises for Section 17.2

   17.3 Redo Logging

   17.3.1 The Redo-Logging Rule

   17.3.2 Recovery With Redo Logging

   17.3.3 Checkpointing a Redo Log

   17.3.4 Recovery With a Checkpointed Redo Log

   17.3.5 Exercises for Section 17.3

   17.4 Undo/Redo Logging

   17.4.1 The Undo/Redo Rules

   17.4.2 Recovery With Undo/Redo Logging

   17.4.3 Checkpointing an Undo/Redo Log

   17.4.4 Exercises for Section 17.4

   17.5 Protecting Against Media Failures

   17.5.1 The Archive

   17.5.2 Nonquiescent Archiving

   17.5.3 Recovery Using an Archive and Log

   17.5.4 Exercises for Section 17.5

   17.6 Summary of Chapter 17

   17.7 References for Chapter 17

18 Concurrency Control

   18.1 Serial and Serializable Schedules

   18.1.1 Schedules

   18.1.2 Serial Schedules

   18.1.3 Serializable Schedules

   18.1.4 The Effect of Transaction Semantics

   18.1.5 A Notation for Transactions and Schedules

   18.1.6 Exercises for Section 18.1

   18.2 Conflict-Serializability

   18.2.1 Conflicts

   18.2.2 Precedence Graphs and a Test for Conflict-Serializability

   18.2.3 Why the Precedence-Graph Test Works

   18.2.4 Exercises for Section 18.2

   18.3 Enforcing Serializability by Locks

   18.3.1 Locks

   18.3.2 The Locking Scheduler

   18.3.3 Two-Phase Locking

   18.3.4 Why Two-Phase Locking Works

   18.3.5 Exercises for Section 18.3

   18.4 Locking Systems With Several Lock Modes

   18.4.1 Shared and Exclusive Locks

   18.4.2 Compatibility Matrices

   18.4.3 Upgrading Locks

   18.4.4 Update Locks

   18.4.5 Increment Locks

   18.4.6 Exercises for Section 18.4

   18.5 An Architecture for a Locking Scheduler

   18.5.1 A Scheduler That Inserts Lock Actions

   18.5.2 The Lock Table

   18.5.3 Exercises for Section 18.5

   18.6 Hierarchies of Database Elements

   18.6.1 Locks With Multiple Granularity

   18.6.2 Warning Locks

   18.6.3 Phantoms and Handling Insertions Correctly

   18.6.4 Exercises for Section 18.6

   18.7 The Tree Protocol

   18.7.1 Motivation for Tree-Based Locking

   18.7.2 Rules for Access to Tree-Structured Data

   18.7.3 Why the Tree Protocol Works

   18.7.4 Exercises for Section 18.7

   18.8 Concurrency Control by Timestamps

   18.8.1 Timestamps

   18.8.2 Physically Unrealizable Behaviors

   18.8.3 Problems With Dirty Data

   18.8.4 The Rules for Timestamp-Based Scheduling

   18.8.5 Multiversion Timestamps

   18.8.6 Timestamps Versus Locking

   18.8.7 Exercises for Section 18.8

   18.9 Concurrency Control by Validation

   18.9.1 Architecture of a Validation-Based Scheduler

   18.9.2 The Validation Rules

   18.9.3 Comparison of Three Concurrency-Control Mechanisms

   18.9.4 Exercises for Section 18.9

   18.10 Summary of Chapter 18

   18.11 References for Chapter 18

19 More About Transaction Management

   19.1 Serializability and Recoverability

   19.1.1 The Dirty-Data Problem

   19.1.2 Cascading Rollback

   19.1.3 Recoverable Schedules

   19.1.4 Schedules That Avoid Cascading Rollback

   19.1.5 Managing Rollbacks Using Locking

   19.1.6 Group Commit

   19.1.7 Logical Logging

   19.1.8 Recovery From Logical Logs

   19.1.9 Exercises for Section 19.1

   19.2 Deadlocks

   19.2.1 Deadlock Detection by Timeout

   19.2.2 The Waits-For Graph

   19.2.3 Deadlock Prevention by Ordering Elements

   19.2.4 Detecting Deadlocks by Timestamps

   19.2.5 Comparison of Deadlock-Management Methods

   19.2.6 Exercises for Section 19.2

   19.3 Long-Duration Transactions

   19.3.1 Problems of Long Transactions

   19.3.2 Sagas

   19.3.3 Compensating Transactions

   19.3.4 Why Compensating Transactions Work

   19.3.5 Exercises for Section 19.3

   19.4 Summary of Chapter 19

   19.5 References for Chapter 19

20 Parallel and Distributed Databases

   20.1 Parallel Algorithms on Relations

   20.1.1 Models of Parallelism

   20.1.2 Tuple-at-a-Time Operations in Parallel

   20.1.3 Parallel Algorithms for Full-Relation Operations

   20.1.4 Performance of Parallel Algorithms

   20.1.5 Exercises for Section 20.1

   20.2 The Map-Reduce Parallelism Framework

   20.2.1 The Storage Model

   20.2.2 The Map Function

   20.2.3 The Reduce Function

   20.2.4 Exercises for Section 20.2

   20.3 Distributed Databases

   20.3.1 Distribution of Data

   20.3.2 Distributed Transactions

   20.3.3 Data Replication

   20.3.4 Exercises for Section 20.3

   20.4 Distributed Query Processing

   20.4.1 The Distributed Join Problem

   20.4.2 Semijoin Reductions

   20.4.3 Joins of Many Relations

   20.4.4 Acyclic Hypergraphs

   20.4.5 Full Reducers for Acyclic Hypergraphs

   20.4.6 Why the Full-Reducer Algorithm Works

   20.4.7 Exercises for Section 20.4

   20.5 Distributed Commit

   20.5.1 Supporting Distributed Atomicity

   20.5.2 Two-Phase Commit

   20.5.3 Recovery of Distributed Transactions

   20.5.4 Exercises for Section 20.5

   20.6 Distributed Locking

   20.6.1 Centralized Lock Systems

   20.6.2 A Cost Model for Distributed Locking Algorithms

   20.6.3 Locking Replicated Elements

   20.6.4 Primary-Copy Locking

   20.6.5 Global Locks From Local Locks

   20.6.6 Exercises for Section 20.6

   20.7 Peer-to-Peer Distributed Search

   20.7.1 Peer-to-Peer Networks

   20.7.2 The Distributed-Hashing Problem

   20.7.3 Centralized Solutions for Distributed Hashing

   20.7.4 Chord Circles

   20.7.5 Links in Chord Circles

   20.7.6 Search Using Finger Tables

   20.7.7 Adding New Nodes

   20.7.8 When a Peer Leaves the Network

   20.7.9 When a Peer Fails

   20.7.10 Exercises for Section 20.7

   20.8 Summary of Chapter 20

   20.9 References for Chapter 20

PART V: Other Issues in Management of Massive Data

21 Information Integration

   21.1 Introduction to Information Integration

   21.1.1 Why Information Integration?

   21.1.2 The Heterogeneity Problem

   21.2 Modes of Information Integration

   21.2.1 Federated Database Systems

   21.2.2 Data Warehouses

   21.2.3 Mediators

   21.2.4 Exercises for Section 21.2

   21.3 Wrappers in Mediator-Based Systems

   21.3.1 Templates for Query Patterns

   21.3.2 Wrapper Generators

   21.3.3 Filters

   21.3.4 Other Operations at the Wrapper

   21.3.5 Exercises for Section 21.3

   21.4 Capability-Based Optimization

   21.4.1 The Problem of Limited Source Capabilities

   21.4.2 A Notation for Describing Source Capabilities

   21.4.3 Capability-Based Query-Plan Selection

   21.4.4 Adding Cost-Based Optimization

   21.4.5 Exercises for Section 21.4

   21.5 Optimizing Mediator Queries

   21.5.1 Simplified Adornment Notation

   21.5.2 Obtaining Answers for Subgoals

   21.5.3 The Chain Algorithm

   21.5.4 Incorporating Union Views at the Mediator

   21.5.5 Exercises for Section 21.5

   21.6 Local-as-View Mediators

   21.6.1 Motivation for LAV Mediators

   21.6.2 Terminology for LAV Mediation

   21.6.3 Expanding Solutions

   21.6.4 Containment of Conjunctive Queries

   21.6.5 Why the Containment-Mapping Test Works

   21.6.6 Finding Solutions to a Mediator Query

   21.6.7 Why the LMSS Theorem Holds

   21.6.8 Exercises for Section 21.6

   21.7 Entity Resolution

   21.7.1 Deciding Whether Records Represent a Common Entity

   21.7.2 Merging Similar Records

   21.7.3 Useful Properties of Similarity and Merge Functions

   21.7.4 The R-Swoosh Algorithm for ICAR Records

   21.7.5 Why R-Swoosh Works

   21.7.6 Other Approaches to Entity Resolution

   21.7.7 Exercises for Section 21.7

   21.8 Summary of Chapter 21

   21.9 References for Chapter 21

22 Data Mining

   22.1 Frequent-Itemset Mining

   22.1.1 The Market-Basket Model

   22.1.2 Basic Definitions

   22.1.3 Association Rules

   22.1.4 The Computation Model for Frequent Itemsets

   22.1.5 Exercises for Section 22.1

   22.2 Algorithms for Finding Frequent Itemsets

   22.2.1 The Distribution of Frequent Itemsets

   22.2.2 The Naive Algorithm for Finding Frequent Itemsets

   22.2.3 The A-Priori Algorithm

   22.2.4 Implementation of the A-Priori Algorithm

   22.2.5 Making Better Use of Main Memory

   22.2.6 When to Use the PCY Algorithm

   22.2.7 The Multistage Algorithm

   22.2.8 Exercises for Section 22.2

   22.3 Finding Similar Items

   22.3.1 The Jaccard Measure of Similarity

   22.3.2 Applications of Jaccard Similarity

   22.3.3 Minhashing

   22.3.4 Minhashing and Jaccard Distance

   22.3.5 Why Minhashing Works

   22.3.6 Implementing Minhashing

   22.3.7 Exercises for Section 22.3

   22.4 Locality-Sensitive Hashing

   22.4.1 Entity Resolution as an Example of LSH

   22.4.2 Locality-Sensitive Hashing of Signatures

   22.4.3 Combining Minhashing and Locality-Sensitive Hashing

   22.4.4 Exercises for Section 22.4

   22.5 Clustering of Large-Scale Data

   22.5.1 Applications of Clustering

   22.5.2 Distance Measures

   22.5.3 Agglomerative Clustering

   22.5.4 $k$-Means Algorithms

   22.5.5 $k$-Means for Large-Scale Data

   22.5.6 Processing a Memory Load of Points

   22.5.7 Exercises for Section 22.5

   22.6 Summary of Chapter 22

   22.7 References for Chapter 22

23 Database Systems and the Internet

   23.1 The Architecture of a Search Engine

   23.1.1 Components of a Search Engine

   23.1.2 Web Crawlers

   23.1.3 Query Processing in Search Engines

   23.1.4 Ranking Pages

   23.2 PageRank for Identifying Important Pages

   23.2.1 The Intuition Behind PageRank

   23.2.2 Recursive Formulation of PageRank\nobreakspace {}--- First Try

   23.2.3 Spider Traps and Dead Ends

   23.2.4 PageRank Accounting for Spider Traps and Dead Ends

   23.2.5 Exercises for Section 23.2

   23.3 Topic-Specific PageRank

   23.3.1 Teleport Sets

   23.3.2 Calculating A Topic-Specific PageRank

   23.3.3 Link Spam

   23.3.4 Topic-Specific PageRank and Link Spam

   23.3.5 Exercises for Section 23.3

   23.4 Data Streams

   23.4.1 Data-Stream-Management Systems

   23.4.2 Stream Applications

   23.4.3 A Data-Stream Data Model

   23.4.4 Converting Streams Into Relations

   23.4.5 Converting Relations Into Streams

   23.4.6 Exercises for Section 23.4

   23.5 Data Mining of Streams

   23.5.1 Motivation

   23.5.2 Counting Bits

   23.5.3 Counting the Number of Distinct Elements

   23.5.4 Exercises for Section 23.5

   23.6 Summary of Chapter 23

   23.7 References for Chapter 23