Content area

Abstract

Stringent performance requirements in DB applications have led to the use of parallelism for database processing. To allow the database system to take advantage of the performance of parallel shared-nothing systems, the physical DB design must be appropriate for the DB structure and the workload.

We develop decision algorithms that will select a good physical DB design both when the DB is first loaded into the system (static decision) and while the DB is being used by the workload (dynamic decision). Our decision algorithms take the database structure, workload, and system characteristics as inputs. The static (or initial) physical DB design decision algorithm involves: selecting a partitioning attribute for each relation that determines how the relation is fragmented across the nodes (allowing for high I/O bandwidth); (1) selecting indexes on the relation attributes to allow faster accesses compared to sequential file scans; (2) selecting the attributes by which to cluster a relation in order to take advantage of the prefetching and caching involved in 1/0 access; (3) grouping of relations to allow DB operations (joins) on relation pairs to be executed locally at each node, thus reducing communication costs; (4) selecting the number of partitions per relation group (and thus per relation); and (5) assigning each partition of each relation to a specific system node. Our studies show that, among the algorithms we studied, an algorithm based on a branch-and bound strategy finds designs with the lowest estimated workload average response time and requires an acceptable amount of computation.

The physical DB design reorganization problem, which is the dynamic DB design decision, involves determining how to change the current physical DB design based on new DB structure, workload, and system information. If a new physical DB design is chosen, a strategy to move from the old to the new design must also be identified by the algorithm. A formula is developed to calculate both the benefit resulting after a reorganization is complete and the cost of performing the reorganization. The value from this formula for a specific reorganization is called the (net) gain metric, and this metric is used by a decision algorithm to compare reorganizations and to select the reorganization for which the benefit most exceeds its cost. We also develop a method to estimate the costs of executing a reorganization with a workload, and we provide some decision algorithms. The selection of the priority level at which to run the reorganization processes concurrently with the workload is investigated. Our studies indicate that a low priority for the reorganization process compared to the priorities for the workload processes is often but not always best.

Details

1010268
Classification
Title
Physical database design decision algorithms and concurrent reorganization for parallel database systems
Number of pages
277
Degree date
1998
School code
0779
Source
DAI-B 60/01, Dissertation Abstracts International
ISBN
978-0-612-35386-2
University/institution
University of Toronto (Canada)
University location
Canada -- Ontario, CA
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
NQ35386
ProQuest document ID
304488427
Document URL
https://www.proquest.com/dissertations-theses/physical-database-design-decision-algorithms/docview/304488427/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic