
Abstract

The rapid growth of cloud-based applications continues to drive extraordinary demand for high performance and scalability. As data volumes increase and applications become more sophisticated, modern systems integrate increasingly diverse hardware and software components. While network bandwidth improves rapidly, CPU performance has plateaued, and GPUs have emerged as the dominant compute engines for AI workloads. This growing heterogeneity introduces significant system complexity, often resulting in core compute resources—CPUs and GPUs—being partially consumed by infrastructure overhead such as network processing, data movement, and resource orchestration.

This thesis demonstrates that in many high-performance systems, these inefficiencies go unnoticed, and core computation cycles are silently lost to non-application tasks. By identifying and addressing these hidden overheads, we show how compute resources can be harvested and returned to user workloads, thereby improving end-to-end application performance. We explore this principle across two major system trends: one where CPUs remain central, but emerging hardware such as SmartNICs offers new opportunities to alleviate CPU workloads—opportunities that this thesis actively exploits; and another where GPUs drive large-scale AI computation and face increasing pressure from dynamic workloads and GPU memory constraints.

Through system-level designs that reduce overhead and strategically reallocate compute to user-level execution, this thesis presents a unified approach to maximizing the effective use of core computation amid growing hardware and software complexity. 

This dissertation presents a line of systems work that targets both trends. It first focuses on the CPU-centric trend and introduces Cowbird, a system that fully offloads the network overhead of memory disaggregation from CPUs, allowing applications to benefit from expanded memory capacity without compromising CPU performance. It then discusses an extension of Cowbird to GPUs and analyzes the reasons behind its limitations. Finally, it shifts focus to the GPU-centric trend and presents SwiftServe, which harvests unused GPU resources during in-place upgrades by overlapping engine initialization with ongoing inference, achieving minimal service disruption and maintaining SLA compliance under dynamic workloads.
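To make the overlap idea behind SwiftServe concrete, the sketch below is a minimal, hypothetical Python illustration of serving requests on an existing engine while a replacement engine initializes in the background, switching over only once it is ready. The names (InferenceEngine, in_place_upgrade) and the threading-based structure are assumptions for illustration only and do not reflect the dissertation's actual implementation.

```python
import threading
import time

class InferenceEngine:
    """Placeholder engine; initialization stands in for loading weights and warming up."""

    def __init__(self, version):
        self.version = version

    def initialize(self):
        time.sleep(2)  # stand-in for expensive engine setup

    def serve(self, request):
        return f"engine-v{self.version} handled {request}"

def in_place_upgrade(old_engine, new_engine, requests):
    """Keep serving on the old engine while the new one initializes, then switch."""
    ready = threading.Event()

    def init_new():
        new_engine.initialize()  # runs off the serving path, using spare capacity
        ready.set()

    threading.Thread(target=init_new, daemon=True).start()

    results = []
    for req in requests:
        # Route to the old engine until the new one signals readiness,
        # so there is no service gap during the upgrade.
        engine = new_engine if ready.is_set() else old_engine
        results.append(engine.serve(req))
    return results

if __name__ == "__main__":
    old, new = InferenceEngine(1), InferenceEngine(2)
    for line in in_place_upgrade(old, new, [f"req-{i}" for i in range(10)]):
        print(line)
        time.sleep(0.5)  # simulate request arrival spacing
```

In this toy version the switchover point is simply "whenever initialization finishes"; a real serving system would additionally drain or migrate in-flight requests and account for GPU memory shared between the two engines.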

Details

Title
Harvesting Core Compute Resources to the Extreme
Number of pages
118
Publication year
2025
Degree date
2025
School code
0175
Source
DAI-B 86/12(E), Dissertation Abstracts International
ISBN
9798280758124
Advisor
Committee member
Loo, Boon Thau; Devietti, Joseph; Marcus, Ryan; Zhang, Qizhan
University/institution
University of Pennsylvania
Department
Computer and Information Science
University location
United States -- Pennsylvania
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
31997125
ProQuest document ID
3217975592
Document URL
https://www.proquest.com/dissertations-theses/harvesting-core-compute-resources-extreme/docview/3217975592/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic