
Abstract

As heterogeneous parallel architectures grow increasingly complex, achieving high performance and teaching parallel programming effectively have both become more challenging. Benchmark suites are powerful tools for illustrating and evaluating optimization techniques in practical, performance-critical scenarios. However, widely used parallel benchmark suites such as SPEC OMP and Rodinia are designed primarily for performance assessment and lack both structured support for optimization exploration and ease of use. To address these limitations, this dissertation presents two novel benchmark suites, NeoRodinia and CUDAMicroBench, that extend beyond traditional benchmarking by enabling more accessible and structured performance experimentation.

NeoRodinia features a structured three-level parallelization model (P1, P2, P3) spanning CPU worksharing, GPU offloading, SIMD, and tasking, and it provides standardized execution workflows, automated performance-evaluation scripts, and visualization tools. NeoRodinia also integrates AI-assisted analysis, allowing large language models (LLMs) to offer optimization recommendations and debugging insights. CUDAMicroBench is a modular microbenchmark suite targeting key GPU optimization challenges such as memory-hierarchy usage, warp divergence, and concurrent kernel execution. The dissertation also presents a focused performance study of shuffle-based reduction kernels, analyzing instruction variants across GPU generations and demonstrating the suite's capability for low-level architectural evaluation.
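
To make the shuffle-based reduction study concrete, the sketch below shows the general technique involved: a warp-level sum built on __shfl_down_sync, with per-warp results combined into a block total. It is a minimal illustration written for this summary, not code taken from CUDAMicroBench, and the kernel and function names (warpReduceSum, blockReduceSum) are hypothetical.

#include <cstdio>
#include <cuda_runtime.h>

__inline__ __device__ float warpReduceSum(float val) {
    // Each iteration folds in the value held by the lane 'offset' positions
    // above the current lane, halving the number of live partial sums.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffffu, val, offset);
    return val;
}

__global__ void blockReduceSum(const float *in, float *out, int n) {
    __shared__ float partial[32];                 // one slot per warp
    int gid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % warpSize;
    int wid  = threadIdx.x / warpSize;

    float val = (gid < n) ? in[gid] : 0.0f;
    val = warpReduceSum(val);                     // register-only warp reduction
    if (lane == 0) partial[wid] = val;            // lane 0 publishes the warp sum
    __syncthreads();

    if (wid == 0) {                               // first warp combines the warp sums
        int nwarps = (blockDim.x + warpSize - 1) / warpSize;
        val = (lane < nwarps) ? partial[lane] : 0.0f;
        val = warpReduceSum(val);
        if (lane == 0) atomicAdd(out, val);       // accumulate the block result
    }
}

int main() {
    const int n = 1 << 20;
    float *d_in, *d_out, result = 0.0f;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemset(d_out, 0, sizeof(float));

    // Fill the input with ones on the host so the expected sum is simply n.
    float *h_in = new float[n];
    for (int i = 0; i < n; i++) h_in[i] = 1.0f;
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    blockReduceSum<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %.0f (expected %d)\n", result, n);

    cudaFree(d_in); cudaFree(d_out); delete[] h_in;
    return 0;
}

Instruction-variant studies of the kind described would compare alternatives to this pattern (for example, shared-memory-only reductions or different shuffle widths) across GPU generations.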

In addition to these benchmark contributions, this dissertation advances parallel programming education by introducing the Interactive OpenMP Programming book. Through deliberate prompt-engineering strategies, it leverages large language models (ChatGPT-4, Gemini Pro 1.5, and Claude 3) to enhance the quality, relevance, and pedagogical value of the generated content. Delivered in a Jupyter-based environment, the book enables real-time experimentation with OpenMP constructs, promoting hands-on learning and deeper understanding. Surveys of both OpenMP educators and learners confirm that the book aligns well with instructional principles in parallel programming and functions effectively as learning material.
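
As a rough illustration of the hands-on workflow the book encourages (not an excerpt from it), a notebook cell might compile and run a small OpenMP program such as the one below; it is plain C/C++ host code, and the problem size and output format are illustrative.

#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1 << 24;
    double sum = 0.0;

    double t0 = omp_get_wtime();
    // Worksharing loop with a reduction clause: each thread accumulates a
    // private partial sum that OpenMP combines when the loop finishes.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (double)(i + 1);
    double t1 = omp_get_wtime();

    printf("threads=%d  sum=%.6f  time=%.3fs\n",
           omp_get_max_threads(), sum, t1 - t0);
    return 0;
}

Re-running such a cell with different values of OMP_NUM_THREADS, or with the reduction clause removed, is the kind of immediate experiment a Jupyter-based delivery is intended to support.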

Together, these contributions form a cohesive infrastructure for performance analysis, optimization strategy exploration, and interactive learning in modern parallel computing. By integrating benchmark design, LLM-driven analysis, and accessible tooling, this work provides a scalable resource for researchers, instructors, and practitioners navigating today’s increasingly heterogeneous high-performance computing landscape.


Details

Title: Advancing Parallel Computing Benchmarking: Multi-Level and Progressive Performance Analysis, Optimization and Learning Support for Parallel Programming
Author:
Number of pages: 208
Publication year: 2025
Degree date: 2025
School code: 0694
Source: DAI-B 87/2(E), Dissertation Abstracts International
ISBN: 9798290933870
Committee members: Allen, Tyler; Saule, Erik; Dai, Dong; Wei, Jinpeng
University/institution: The University of North Carolina at Charlotte
Department: Computer Science
University location: United States -- North Carolina
Degree: Ph.D.
Source type: Dissertation or Thesis
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: 32171352
ProQuest document ID: 3237553165
Document URL: https://www.proquest.com/dissertations-theses/advancing-parallel-computing-benchmarking-multi/docview/3237553165/se-2?accountid=208611
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database: ProQuest One Academic