Abstract

High performance computing (HPC) and cloud datacenters are seeing unprecedented growth in both capacity and user demand for computing resources. A modern datacenter can host up to 100,000 hardware compute nodes, may cost up to $1 billion to build, and can consume over 300 gigawatt hours (GWh) of energy annually, resulting in a carbon footprint of up to 150,000 metric tons of carbon dioxide (comparable to the emissions of more than 20,000 U.S. households). However, programmers' ability to use complex datacenter hardware efficiently to achieve high application performance has not scaled proportionally -- in fact, some pessimistically suspect it may have become worse with increasing hardware complexity.

Therefore, this dissertation poses a fundamental question (and goal): "Can we build system software tools to make large-scale computing systems more productive for programmers, cost-effective for service providers, and environmentally more sustainable?" The dissertation designs and implements innovative real-system experimental proofs of concept to demonstrate that a set of elegant, and sometimes non-intuitive, strategies enables us to achieve this challenging goal.

We demonstrate that intelligently leveraging serverless cloud computing (function-as-a-service) can make the execution of complex scientific workflows more resource-efficient and faster -- in contrast to the conventional practice of executing them on on-premises, stateful HPC clusters rather than as stateless functions on cloud platforms. However, leveraging the serverless cloud computing model and cloud computing resources is cost-prohibitive, difficult to optimize for performance, and poses a severe "hidden" carbon footprint burden. This dissertation demonstrates that the unorthodox use of server heterogeneity (low-end and high-end hardware) and opportunistic scheduling can make serverless computing significantly more cost-effective. To lower the productivity burden on programmers for cost-effective performance optimization, this dissertation demonstrates that an ensemble of lightweight, approximately accurate performance models and tuning methods can be more effective than building accurate but highly complex performance models and tuning strategies. Finally, this dissertation proposes the first carbon footprint accounting methodology and server-heterogeneity-inspired mitigation strategy for the serverless computing model -- revealing and reducing the high hidden embodied carbon footprint of keeping function code alive in server memory in anticipation of future invocations. We hope that the real-system open-source artifacts will accelerate innovation in this area and encourage broader community engagement toward more productive, cost-effective, and sustainable HPC systems.

Details

Title
Toward Improving Productivity, Cost Effectiveness, and Sustainability of Large Scale Computing Systems
Number of pages
222
Publication year
2025
Degree date
2025
School code
0160
Source
DAI-B 86/11(E), Dissertation Abstracts International
ISBN
9798314844472
Committee member
Desnoyers, Peter; Kaeli, David; Panda, Dhabaleswar K. (DK)
University/institution
Northeastern University
Department
Electrical and Computer Engineering
University location
United States -- Massachusetts
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
31931241
ProQuest document ID
3198966621
Document URL
https://www.proquest.com/dissertations-theses/toward-improving-productivity-cost-effectiveness/docview/3198966621/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic