
Abstract

As heterogeneous parallel architectures grow increasingly complex, achieving high performance and teaching parallel programming effectively have both become more challenging. Benchmark suites are powerful tools for illustrating and evaluating optimization techniques in practical, performance-critical scenarios. However, widely used parallel benchmark suites such as SPEC OMP and Rodinia are designed primarily for performance assessment and lack both structured support for optimization exploration and ease of use. To address these limitations, this dissertation presents two novel benchmark suites, NeoRodinia and CUDAMicroBench, that extend beyond traditional benchmarking by enabling more accessible and structured performance experimentation.

NeoRodinia features a structured three-level parallelization model (P1, P2, P3) spanning CPU worksharing, GPU offloading, SIMD, and tasking, and it provides standardized execution workflows, automated performance-evaluation scripts, and visualization tools. NeoRodinia also integrates AI-assisted analysis, allowing large language models (LLMs) to offer optimization recommendations and debugging insights. CUDAMicroBench is a modular microbenchmark suite targeting key GPU optimization challenges such as memory-hierarchy usage, warp divergence, and concurrent kernel execution. The dissertation also presents a focused performance study of shuffle-based reduction kernels, analyzing instruction variants across GPU generations and demonstrating the suite's capability for low-level architectural evaluation.
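
To make the shuffle-based reduction study concrete, the sketch below shows the general technique involved: a warp-level sum built on __shfl_down_sync, with per-warp results combined into a block total. It is a minimal illustration written for this summary, not code taken from CUDAMicroBench, and the kernel and function names (warpReduceSum, blockReduceSum) are hypothetical.

#include <cstdio>
#include <cuda_runtime.h>

__inline__ __device__ float warpReduceSum(float val) {
    // Each iteration folds in the value held by the lane 'offset' positions
    // above the current lane, halving the number of live partial sums.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffffu, val, offset);
    return val;
}

__global__ void blockReduceSum(const float *in, float *out, int n) {
    __shared__ float partial[32];                 // one slot per warp
    int gid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % warpSize;
    int wid  = threadIdx.x / warpSize;

    float val = (gid < n) ? in[gid] : 0.0f;
    val = warpReduceSum(val);                     // register-only warp reduction
    if (lane == 0) partial[wid] = val;            // lane 0 publishes the warp sum
    __syncthreads();

    if (wid == 0) {                               // first warp combines the warp sums
        int nwarps = (blockDim.x + warpSize - 1) / warpSize;
        val = (lane < nwarps) ? partial[lane] : 0.0f;
        val = warpReduceSum(val);
        if (lane == 0) atomicAdd(out, val);       // accumulate the block result
    }
}

int main() {
    const int n = 1 << 20;
    float *d_in, *d_out, result = 0.0f;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemset(d_out, 0, sizeof(float));

    // Fill the input with ones on the host so the expected sum is simply n.
    float *h_in = new float[n];
    for (int i = 0; i < n; i++) h_in[i] = 1.0f;
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    blockReduceSum<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %.0f (expected %d)\n", result, n);

    cudaFree(d_in); cudaFree(d_out); delete[] h_in;
    return 0;
}

Instruction-variant studies of the kind described would compare alternatives to this pattern (for example, shared-memory-only reductions or different shuffle widths) across GPU generations.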

In addition to these benchmark contributions, this dissertation advances parallel programming education by introducing the Interactive OpenMP Programming book. Through deliberate prompt-engineering strategies, it leverages large language models (ChatGPT-4, Gemini Pro 1.5, and Claude 3) to enhance the quality, relevance, and pedagogical value of the generated content. Delivered in a Jupyter-based environment, the book enables real-time experimentation with OpenMP constructs, promoting hands-on learning and deeper understanding. Surveys of both OpenMP educators and learners confirm that the book aligns well with instructional principles in parallel programming and functions effectively as learning material.
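
As a rough illustration of the hands-on workflow the book encourages (not an excerpt from it), a notebook cell might compile and run a small OpenMP program such as the one below; it is plain C/C++ host code, and the problem size and output format are illustrative.

#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1 << 24;
    double sum = 0.0;

    double t0 = omp_get_wtime();
    // Worksharing loop with a reduction clause: each thread accumulates a
    // private partial sum that OpenMP combines when the loop finishes.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (double)(i + 1);
    double t1 = omp_get_wtime();

    printf("threads=%d  sum=%.6f  time=%.3fs\n",
           omp_get_max_threads(), sum, t1 - t0);
    return 0;
}

Re-running such a cell with different values of OMP_NUM_THREADS, or with the reduction clause removed, is the kind of immediate experiment a Jupyter-based delivery is intended to support.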

Together, these contributions form a cohesive infrastructure for performance analysis, optimization strategy exploration, and interactive learning in modern parallel computing. By integrating benchmark design, LLM-driven analysis, and accessible tooling, this work provides a scalable resource for researchers, instructors, and practitioners navigating today’s increasingly heterogeneous high-performance computing landscape.


Details

Title: Advancing Parallel Computing Benchmarking: Multi-Level and Progressive Performance Analysis, Optimization and Learning Support for Parallel Programming
Author:
Number of pages: 208
Publication year: 2025
Degree date: 2025
School code: 0694
Source: DAI-B 87/2(E), Dissertation Abstracts International
ISBN: 9798290933870
Committee members: Allen, Tyler; Saule, Erik; Dai, Dong; Wei, Jinpeng
University/institution: The University of North Carolina at Charlotte
Department: Computer Science
University location: United States -- North Carolina
Degree: Ph.D.
Source type: Dissertation or Thesis
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: 32171352
ProQuest document ID: 3237553165
Document URL: https://www.proquest.com/dissertations-theses/advancing-parallel-computing-benchmarking-multi/docview/3237553165/se-2?accountid=208611
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database: ProQuest One Academic