Abstract

The emergence of exascale systems marks a transformative era in high-performance computing (HPC), powered by extensive use of GPUs. The popularity of general-purpose GPU (GPGPU) computing in HPC, driven by performance gains and power efficiency, demands that traditional algorithms be redesigned to exploit GPU parallelism. Declarative languages such as Datalog, however, can leverage these advancements directly: they express complex problems through simple rules and queries, which can be efficiently compiled into relational algebra operations for execution on GPGPUs. Integrating Datalog's declarative syntax with the computational power of GPGPUs enables scalable declarative analytics across big data, graph mining, and program analysis on HPC systems.

While recent advancements have focused on multi-threaded and multi-core implementations of Datalog, the evolution of exascale systems presents a compelling opportunity to extend Datalog’s capabilities to multi-node, multi-GPU environments. This thesis addresses this gap by developing the first multi-GPU, multi-node Datalog engine. First, we investigate the parallelization on GPUs of iterated relational algebra primitives, which are fundamental to Datalog evaluation. Then, we address challenges specific to heterogeneous architectures, including optimized communication strategies, recursive aggregation techniques, and efficient join operations, all tailored for a heterogeneous Datalog backend. We focus on optimizing specialized Datalog implementations for graph algorithms, including path-finding and topology-based feature extraction. To test and benchmark the algorithms, we use publicly available datasets from the Stanford Large Network Dataset Collection and the SuiteSparse Matrix Collection. Our research extends beyond traditional graph mining and program analysis, exploring Datalog's potential in emerging domains such as topological data analysis, machine learning, and visual analytics for high-dimensional data. Evaluating power consumption alongside performance is increasingly vital in HPC systems, as energy efficiency significantly affects operational sustainability and cost-effectiveness. Thus, we conduct power analysis across GPU-based Datalog engines, which differ primarily in their recursive join strategies and underlying data structures. We evaluate how variations in implementation techniques for the same application, executed on identical hardware and datasets, influence power consumption. By advancing Datalog's applicability in exascale environments, we aim to demonstrate its scalability and its suitability for high-performance, energy-efficient analysis of complex data on next-generation computing platforms.
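
As a rough illustration of what "iterated relational algebra primitives" means for a path-finding query, the sketch below evaluates a two-rule transitive-closure program on the CPU with plain Python sets. It is not the thesis's engine and says nothing about its GPU data structures or join strategy; the function name `transitive_closure`, the relation names `edge` and `path`, and the choice of semi-naive evaluation are assumptions made only for this example.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Semi-naive evaluation of the (assumed) Datalog program:

        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).

    `edges` is an iterable of (src, dst) pairs; returns the set of path tuples.
    """
    edges = set(edges)

    # Index edge on its first column so the join on y is a hash lookup.
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)

    path = set(edges)   # all tuples derived so far
    delta = set(edges)  # tuples that were new in the previous iteration

    # Iterate until a fixpoint: no iteration produces a new tuple.
    while delta:
        # Join delta(x, y) with edge(y, z), then project onto (x, z).
        joined = {(x, z) for (x, y) in delta for z in successors.get(y, ())}
        # Deduplicate against what is already known (set difference).
        delta = joined - path
        # Union the genuinely new tuples into the full relation.
        path |= delta

    return path
```

Calling `transitive_closure({(1, 2), (2, 3), (3, 4)})` derives the three input edges plus `(1, 3)`, `(2, 4)`, and `(1, 4)`. Each loop iteration is one join, one projection, one set difference, and one union, which is the kind of repeated relational step the abstract describes parallelizing.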

Details

1010268
Title
Declarative Analytics on Heterogeneous HPC Systems
Number of pages
238
Publication year
2025
Degree date
2025
School code
0799
Source
DAI-A 87/5(E), Dissertation Abstracts International
ISBN
9798263306762
Committee member
Papka, Michael E.; Lan, Zhiling; Sintos, Stavros; Deshpande, Gopikrishna; Gilray, Thomas
University/institution
University of Illinois at Chicago
Department
Computer Science
University location
United States -- Illinois
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32409593
ProQuest document ID
3271766031
Document URL
https://www.proquest.com/dissertations-theses/declarative-analytics-on-heterogeneous-hpc/docview/3271766031/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic