Content area

Abstract

The performance of a database system heavily depends on its storage engine. The design of on-disk index data structure is critical to the storage engine performance. Many database systems have chosen B+-trees as the on-disk index data structure. A B+-tree organizes every internal node in a block of the I/O unit so that the reads would incur minimum I/Os. However, writes in B+-trees are slow because a single write often causes multiple blocks to be rewritten due to the need to maintain its structure. To enable the support for more efficient writes, many database systems start to adopt LSM-trees in their storage engines. While an LSM-tree achieves efficient writes by batching small updates to eliminate in-place rewrites, reads in LSM-trees often require multiple I/Os to read the data scattered across a number of files on the disk which makes its reads much slower than B+-trees’.

For a long time, reads and writes have been perceived to be a trade-off in designing on-disk index data structures. It still remains a challenge to achieve efficient reads and writes at the same time. In this thesis, we explore the idea of using compact indexes on on-disk data structures that can achieve efficient reads and writes. In this thesis, we introduce three works that progressively achieve more efficient index data structure designs. The first work REMIX compactly records and replay the scanning paths to achieve more efficient scanning in LSM-trees. The second work Disco, built on top of REMIX, achieves more efficient use of I/O for point and range queries. Disco guarantees that a point query issues at most one I/O while a short range query needs at most two I/Os, achieving an I/O efficiency close to a B+-tree.

The third work extends the design of Disco to achieve an optimal use of I/O for on-disk data structures. 

Details

1010268
Title
Designing Compact On-Disk Index Data Structures
Number of pages
150
Publication year
2025
Degree date
2025
School code
0799
Source
DAI-B 87/5(E), Dissertation Abstracts International
ISBN
9798263306809
Committee member
Wu, Xingbo; Kanich, Chris; Asudeh, Abolfazl; Sintos, Stavros
University/institution
University of Illinois at Chicago
Department
Computer Science
University location
United States -- Illinois
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32409625
ProQuest document ID
3271801714
Document URL
https://www.proquest.com/dissertations-theses/designing-compact-on-disk-index-data-structures/docview/3271801714/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic