
Abstract

In the past decade, Graphics Processing Units (GPUs) have rapidly evolved into one of the most popular computing platforms, providing significant acceleration in machine learning, graph processing, scientific computing, and VR/AR. Ever-growing application complexity and input dataset sizes have driven the popularity of multi-GPU systems as computing platforms. This trend is also evident in modern computing infrastructures and data centers; for example, nine of the top ten supercomputers are equipped with multiple GPUs per node. While employing multiple GPUs intuitively offers aggregated memory capacity and combined computational parallelism, these increased resources rarely translate into tangible application benefits (e.g., performance and quality of service). This discrepancy arises from several factors, such as inefficient address translation, non-uniform memory accesses, inter-GPU communication overheads, and load imbalance among the GPUs. Consequently, two critical questions remain unaddressed: how should multi-GPU computing architectures be designed, and how can multi-GPU advantages be harnessed in emerging applications?

This thesis is motivated by these two critical questions and aims to advance the deployment of multi-GPU systems in modern computing. It pioneers several distinctive directions of architectural and system-level design toward fully exploiting multi-GPU capabilities. First, the thesis redesigns the translation lookaside buffer (TLB) hierarchy and proposes i) a “least-inclusive” TLB hierarchy and ii) hardware-supported sharing of address translations with peer GPUs. Second, the thesis uncovers the bottlenecks and explores opportunities in page table walking (PTW) in multi-GPU systems. Third, the thesis investigates the effects of frequent page-migration invalidations in multi-GPU systems and proposes a software-hardware co-design to mitigate the page table invalidation overhead and improve overall application performance. Finally, in multi-tenant environments, TLB sub-entries are often underutilized due to interference among tenants. The thesis proposes a shared-aware sub-entry technique to improve their utilization.

Details

1010268
Business indexing term
Title
Unleashing Multi-GPU Computing to the Next-Level
Author
Number of pages
140
Publication year
2025
Degree date
2025
School code
0178
Source
DAI-B 87/3(E), Dissertation Abstracts International
ISBN
9798293818518
Advisor
Committee member
Childers, Bruce R.; Zhang, Youtao; Yang, Jun
University/institution
University of Pittsburgh
University location
United States -- Pennsylvania
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32040909
ProQuest document ID
3248171077
Document URL
https://www.proquest.com/dissertations-theses/unleashing-multi-gpu-computing-next-level/docview/3248171077/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic