Abstract

Query engines enable users to execute queries quickly and gather results, supporting data retrieval across multiple data sources without needing custom code. The exponential growth of data volumes places increasing demands on modern databases, requiring higher performance, scalability, and efficient real-time query processing. These demands motivated the creation of alternative Database Management System (DBMS) architectures. Unlike traditional systems optimized for quick read-and-write operations on small datasets for transactional workloads, other architectures prioritize statistical insights.

Columnar query engines have become a prominent architecture for analytical processing, as they efficiently store and handle large datasets and optimize analytics extraction. These engines leverage columnar storage formats to improve query performance, particularly for data scans and aggregations.

SIMD instructions allow CPUs to simultaneously execute the same operation across multiple data elements organized in vectors, significantly reducing execution time. This technique is particularly beneficial for column-oriented databases due to their inherent memory locality.
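
The lane-wise idea behind SIMD can be illustrated in software. The sketch below is an assumption-laden illustration, not code from the dissertation: it packs eight 8-bit values into one 64-bit integer and adds all eight lanes with a single arithmetic expression (a technique often called SWAR, "SIMD within a register"). Real SIMD instructions do the same thing in hardware on wider registers. The names `pack_u8`, `unpack_u8`, and `swar_add_u8` are hypothetical.

```python
# Illustrative only: emulates lane-wise addition in plain Python integers.
# This is not hardware SIMD and is not taken from the dissertation.

LANES = 8                      # eight 8-bit lanes in one 64-bit word
HIGH = 0x8080808080808080      # top bit of every lane
LOW = 0x7F7F7F7F7F7F7F7F       # low seven bits of every lane


def pack_u8(values):
    """Pack eight integers in [0, 255] into one 64-bit word (lane 0 lowest)."""
    assert len(values) == LANES and all(0 <= v <= 255 for v in values)
    word = 0
    for i, v in enumerate(values):
        word |= v << (8 * i)
    return word


def unpack_u8(word):
    """Split a 64-bit word back into its eight 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(LANES)]


def swar_add_u8(a, b):
    """Add all eight lanes at once, wrapping modulo 256 within each lane.

    The low seven bits are summed directly (no carry can leave a lane),
    then the top bit of each lane is fixed up with XOR.
    """
    return ((a & LOW) + (b & LOW)) ^ ((a ^ b) & HIGH)


a = pack_u8([1, 2, 3, 4, 5, 6, 7, 200])
b = pack_u8([10, 20, 30, 40, 50, 60, 70, 100])
result = unpack_u8(swar_add_u8(a, b))
print(result)
```

The last lane shows the wrap-around behaviour (200 + 100 modulo 256), and the other lanes are unaffected by it, which is the property that lets one operation stand in for eight.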

Indexes provide an additional method for enhancing database performance. Traditional indexing techniques like B-trees are optimized for relational DBMS to accelerate row-level retrievals. In contrast, columnar systems focus on large-scale scans and aggregations, where conventional indexes are less effective. Recent research, however, has refined indexing techniques to be more compatible with OLAP queries and analytical workloads.

This dissertation investigates how combining indexing techniques with columnar databases and vectorization improves performance in real-time analytics and query systems. It addresses limitations in existing approaches by integrating index structures, such as bitmap and tree-based indexes, with optimizations tailored for real-time analytics performance.
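
As a rough illustration of why bitmap indexes suit columnar scans, the sketch below builds one bitmap per distinct column value and answers a two-predicate filter with a single bitwise AND. It is a minimal example under stated assumptions, not the dissertation's implementation: the column data, the helper names `build_bitmap_index` and `rows_from_bitmap`, and the use of Python integers as bitmaps are all hypothetical.

```python
# Illustrative only: a toy bitmap index over two in-memory columns.
# Column contents and helper names are assumptions made for this example.


def build_bitmap_index(column):
    """Map each distinct value to an integer whose bit i is set
    when row i of the column holds that value."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_from_bitmap(bitmap):
    """Return the row numbers whose bits are set, in ascending order."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


region = ["EU", "US", "EU", "ASIA", "US", "EU"]
status = ["open", "open", "closed", "open", "closed", "open"]

region_index = build_bitmap_index(region)
status_index = build_bitmap_index(status)

# WHERE region = 'EU' AND status = 'open' becomes one bitwise AND.
matches = region_index["EU"] & status_index["open"]
matching_rows = rows_from_bitmap(matches)
print(matching_rows)
```

Because the predicate is evaluated on whole machine words rather than row by row, this kind of structure maps naturally onto wide bitwise operations, which is one reason bitmap indexes are a common candidate for vectorized execution.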

A systematic evaluation methodology is employed to validate the proposed solution using industry-standard benchmarks, including TPC-H and TPC-DS. These benchmarks measure query latency, I/O operations, and resource utilization. Experiments cover multiple configurations, including tests with unindexed data, to isolate and demonstrate the contributions of the proposed techniques. Performance metrics such as CPU and memory usage are analyzed to identify bottlenecks and opportunities for further optimization.

The results confirm that integrating vectorized indexing techniques can improve query performance by reducing latency, depending on the use case. However, the research also examines inherent trade-offs, including increased data structure size, additional write overhead, and hardware usage. These findings validate the proposed approach and underscore its potential to address the challenges of modern analytical workloads.

These findings suggest that SIMD-optimized indexes can improve performance in OLAP workloads and that their integration into columnar query engines warrants further research.

Details

1010268
Classification
Title
SIMD-Optimized Indexing for Columnar Databases: Benchmarking Performance in Real-Time Analytical Workloads
Number of pages
141
Publication year
2025
Degree date
2025
School code
5896
Source
MAI 87/5(E), Masters Abstracts International
ISBN
9798265425881
University/institution
Universidade do Porto (Portugal)
University location
Portugal
Degree
M.Eng.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32306701
ProQuest document ID
3275478593
Document URL
https://www.proquest.com/dissertations-theses/simd-optimized-indexing-columnar-databases/docview/3275478593/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic