Content area

Abstract

Current supercomputers often have a heterogeneous architecture using both CPUs and GPUs. At the same time, numerical simulation tasks frequently involve multiphysics scenarios whose components run on different hardware due to multiple reasons, e.g., architectural requirements, pragmatism, etc. This leads naturally to a software design where different simulation modules are mapped to different subsystems of the heterogeneous architecture. We present a detailed performance analysis for such a hybrid four-way coupled simulation of a fully resolved particle-laden flow. The Eulerian representation of the flow utilizes GPUs, while the Lagrangian model for the particles runs on CPUs. First, a roofline model is employed to predict the node level performance and to show that the lattice-Boltzmann-based fluid simulation reaches very good performance on a single GPU. Furthermore, the GPU-GPU communication for a large-scale flow simulation results in only moderate slowdowns due to the efficiency of the CUDA-aware MPI communication, combined with communication hiding techniques. On 1024 A100 GPUs, a parallel efficiency of up to 71% is achieved. While the flow simulation has good performance characteristics, the integration of the stiff Lagrangian particle system requires frequent CPU-CPU communications that can become a bottleneck. Additionally, special attention is paid to the CPU-GPU communication overhead since this is essential for coupling the particles to the flow simulation. However, thanks to our problem-aware co-partitioning, the CPU-GPU communication overhead is found to be negligible. As a lesson learned from this development, four criteria are postulated that a hybrid implementation must meet for the efficient use of heterogeneous supercomputers. Additionally, an a priori estimate of the speedup for hybrid implementations is suggested.

Details

1009240
Title
Efficiency and scalability of fully-resolved fluid-particle simulations on heterogeneous CPU-GPU architectures
Publication title
arXiv.org; Ithaca
Publication year
2024
Publication date
Dec 9, 2024
Section
Computer Science
Publisher
Cornell University Library, arXiv.org
Source
arXiv.org
Place of publication
Ithaca
Country of publication
United States
University/institution
Cornell University Library arXiv.org
e-ISSN
2331-8422
Source type
Working Paper
Language of publication
English
Document type
Working Paper
Publication history
 
 
Online publication date
2024-12-10
Milestone dates
2023-03-21 (Submission v1); 2024-12-09 (Submission v2)
Publication history
 
 
   First posting date
10 Dec 2024
ProQuest document ID
2789557366
Document URL
https://www.proquest.com/working-papers/efficiency-scalability-fully-resolved-fluid/docview/2789557366/se-2?accountid=208611
Full text outside of ProQuest
Copyright
© 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2024-12-11
Database
ProQuest One Academic