Abstract

Larger databases and cheaper hardware have generated great interest in running database applications on parallel architectures. A database system built on multiple processors that share nothing (i.e., neither main memory nor disks) is one way to provide the functionality of a conventional DBMS. To exploit parallelism, a shared-nothing parallel system horizontally partitions a relation across all the processors. Proponents of this loosely coupled approach claim that such an architecture can achieve high scalability and good cost-performance.

However, the effectiveness of parallel execution on a shared-nothing system depends on our ability to divide the load evenly among the nodes while minimizing coordination overhead. In this dissertation, we investigate skew effects, which frequently cause load imbalance and impair system performance in parallel database systems if improperly handled. We discuss the nature of skew effects and why they cause performance problems. To take full advantage of parallel execution, we study three major performance-oriented topics: query optimization, index mechanisms, and parallel join operations. For each topic, we illustrate the flaws in existing methods, which are often straightforward generalizations of conventional database techniques applied to parallel database systems.
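As a rough illustration of why skew matters under horizontal partitioning, the sketch below assigns tuples to nodes by join-key value and compares the per-node load for a uniform and a skewed key distribution. It is not taken from the dissertation: the modulo partitioning rule, the function names `partition_counts` and `imbalance`, and the sample data are all assumptions chosen for illustration.

```python
from collections import Counter


def partition_counts(keys, num_nodes):
    """Count how many tuples each node receives when a tuple with
    key k is sent to node k % num_nodes (an assumed partitioning rule)."""
    counts = Counter(key % num_nodes for key in keys)
    return [counts.get(node, 0) for node in range(num_nodes)]


def imbalance(counts):
    """Ratio of the heaviest node's load to the mean load.
    A value of 1.0 means the load is perfectly balanced."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


NUM_NODES = 4

# Hypothetical data: every key value appears exactly once.
uniform_keys = list(range(1000))

# Hypothetical data: one "hot" key value accounts for 70% of the tuples.
skewed_keys = [7] * 700 + list(range(300))

uniform_load = partition_counts(uniform_keys, NUM_NODES)
skewed_load = partition_counts(skewed_keys, NUM_NODES)

print("uniform:", uniform_load, "imbalance:", imbalance(uniform_load))
print("skewed: ", skewed_load, "imbalance:", imbalance(skewed_load))
```

With the uniform keys each node receives the same number of tuples, while with the skewed keys every tuple carrying the hot value lands on a single node. Because a parallel step completes only when its slowest node finishes, the heaviest node's load, not the average, bounds the response time.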

For query optimization, we propose a two-level query optimization approach in which query optimization functions are split between the system level and the node level. We suggest migrating to the node level all decisions that depend on an individual node's data distribution. We show that this approach is especially beneficial to large parallel systems, which are vulnerable to various skew effects. For indexing, we present a unified index mechanism that incorporates both local and distributive mechanisms in a single index. We validate its effectiveness through simulation experiments and devise a two-threshold mechanism to maintain it efficiently. For parallel join operations, we introduce two modified parallel hash join algorithms that use tuple duplication and partial duplication schemes, respectively. We identify the domains in which our algorithms perform well. By extending our knowledge of effective parallel execution, this research contributes an essential step toward a high-performance database system.

Details

Title: Investigating skew effects in shared-nothing parallel database systems
Number of pages: 168
Degree date: 1993
School code: 0031
Source: DAI-B 54/03, Dissertation Abstracts International
ISBN: 979-8-208-31641-2
University/institution: University of California, Los Angeles
University location: United States -- California
Degree: Ph.D.
Source type: Dissertation or Thesis
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: 9318714
ProQuest document ID: 304049385
Document URL: https://www.proquest.com/dissertations-theses/investigating-skew-effects-shared-nothing/docview/304049385/se-2?accountid=208611
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database: ProQuest One Academic