
Abstract

The rapid growth of data- and compute-intensive applications, such as scientific simulations, real-time analytics, and neural networks, has driven a major shift toward parallelism in performance-oriented programming. Meanwhile, gains in single-core microprocessor performance have slowed as Moore's law approaches its limits. As a result, multi-core and many-core architectures have become dominant in high-performance computing (HPC). Fully exploiting the computational potential of these architectures, however, remains difficult.

Recent advances in machine learning (ML) have opened new opportunities in software engineering and HPC code development. Applying ML to the HPC domain, however, presents unique challenges: to capture the complex characteristics of HPC code, effective models must account for its syntax, structure, and semantics.

This dissertation presents four studies aimed at enabling machine learning techniques for parallel code generation. Each study introduces a novel approach to improve the applicability and effectiveness of ML in this domain.

The first two studies focus on graph neural networks (GNNs) for OpenMP pragma generation. The first proposes a multi-view approach that combines structural pattern analysis with node-level features to predict OpenMP pragmas. The second addresses limitations of the first study by introducing a new dataset and a new representation of HPC code that better supports GNN modeling. Both works demonstrate that GNNs can learn from code structure to guide parallelization.

The third study explores the development of a domain-specific language model tailored for OpenMP pragma generation, emphasizing the potential of specialized language models in HPC contexts. The fourth study employs large language model (LLM) agents to construct a benchmark dataset for evaluating LLMs in parallel code generation, including both OpenMP and MPI.

Collectively, these studies address key challenges in applying ML to the HPC domain, focusing on accurate modeling of parallel programs and enabling automatic parallel code generation. The techniques developed aim to contribute to the broader effort of making ML a practical tool for high-performance scientific computing.

The author hopes this work inspires further research and contributes to overcoming the longstanding challenges of parallel programming in the HPC community.

Details

Title
Parallel Code Generation Using Graph Neural Networks and Language Models
Author
Chen, Le
Number of pages
137
Publication year
2025
Degree date
2025
School code
0097
Source
DAI-A 86/12(E), Dissertation Abstracts International
ISBN
9798286429424
Committee member
Aduri, Pavan; Ciardo, Gianfranco; Li, Qi; Miner, Andrew
University/institution
Iowa State University
Department
Computer Science
University location
United States -- Iowa
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
30993873
ProQuest document ID
3224180580
Document URL
https://www.proquest.com/dissertations-theses/parallel-code-generation-using-graph-neural/docview/3224180580/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic