Abstract

Large applications that run across multiple host computers need access to both local and remote distributed memory. They also use compute accelerators to achieve high levels of computation. The capacity of local memory is increasing gradually, but this trend is slowing down because of physical constraints that limit the number and size of memory devices that can be accessed locally. These limitations have led to the development of various new memory technologies that aim to increase local memory capacity. They have also led to the use of various Distributed Shared Memory (DSM) systems that provide pools of remote memory. In both cases, however, the application uses different memory abstractions, while its performance is affected by characteristics of each memory interface.

Moreover, compute accelerators may have their own local memories and are tied to local computation without being part of any broader distributed memory architecture. In such configurations, all communication from any processing element to the system memory resources must be processed by host software, leading to significant performance inefficiencies. Current approaches that give accelerators access to a pool of distributed memory, in turn, require significant capabilities and resources from the accelerator. Therefore, providing efficient access to pools of distributed memory across a distributed heterogeneous system is a key challenge for scaling applications.

Access to local and remote memory currently operates at two different levels of abstraction: the level where the host software abstracts the application's access to remote memory, and the level where the host hardware and its memory transactions provide direct access to its local memory. At the software level, any access to remote resources requires higher-level software abstractions that add latency. By switching to hardware-level granularity for remote memory, any processing element can directly access any memory resource through its instruction set architecture and native read/write transactions, which removes the software invocation overhead and reduces latency compared to current software-based approaches. Therefore, this thesis contributes the description of the Generalised Memory System (GMS). More precisely, this thesis:

1) Proposes the system architecture for GMS that logically unifies local, remote, and accelerator memory resources to create a novel, directly addressable distributed shared memory pool.

2) Extends the GMS so that any processing element within a heterogeneous distributed system can also directly access memory in the GMS without software overhead.

3) Contributes, through engineering examples, solutions that allow both FPGA-accelerated code and existing PCIe-host-based code to participate in the GMS, including the implementation of the associated firmware.

4) Demonstrates the advantages of GMS by elevating accelerators to peers of the hosts within a distributed application.
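
As a purely illustrative sketch of what a logically unified, directly addressable memory pool implies, the Python model below resolves a global address to an owning endpoint and a local offset. The region layout, the endpoint names, and the `Region` and `GlobalAddressMap` classes are all assumptions made for illustration. The abstract does not describe how GMS performs address translation, and in a hardware-level design this step would not run in software at all.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int    # first global address covered by this region
    size: int    # length of the region in bytes
    owner: str   # hypothetical endpoint name (host or accelerator)


class GlobalAddressMap:
    """Toy model: one flat address space split into per-endpoint regions."""

    def __init__(self, regions):
        self._regions = sorted(regions, key=lambda r: r.base)
        # Reject layouts in which two regions claim the same address.
        for prev, nxt in zip(self._regions, self._regions[1:]):
            if prev.base + prev.size > nxt.base:
                raise ValueError("overlapping regions")
        self._bases = [r.base for r in self._regions]

    def resolve(self, addr):
        """Return (owner, local_offset) for a global address."""
        # Find the last region whose base is <= addr.
        i = bisect_right(self._bases, addr) - 1
        if i < 0:
            raise ValueError(f"address {addr:#x} is unmapped")
        region = self._regions[i]
        if addr >= region.base + region.size:
            raise ValueError(f"address {addr:#x} is unmapped")
        return region.owner, addr - region.base


# Hypothetical layout: two host memories and one accelerator memory.
example_map = GlobalAddressMap([
    Region(base=0x0000, size=0x1000, owner="host0-dram"),
    Region(base=0x1000, size=0x1000, owner="host1-dram"),
    Region(base=0x2000, size=0x0800, owner="fpga0-mem"),
])
```

With this hypothetical layout, `example_map.resolve(0x1004)` yields the owner `host1-dram` and local offset `4`. The point of the sketch is only that a single address identifies both the endpoint and the location within it, so the issuing processing element does not need a separate software interface per memory type.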

To evaluate the benefits of GMS, this work includes a GMS implementation known as F-GMS, a baseline implementation used for evaluation across multiple endpoint types. It was deployed on a 4-node distributed heterogeneous computing cluster that combines Peripheral Component Interconnect Express (PCIe)-attached Field Programmable Gate Array (FPGA) accelerators and x86-64 AMD computing nodes connected by a commodity 100 Gbit network. The results show that the proposed GMS halves the execution time of distributed applications compared to MPI by reducing the access time of load/store operations on remote shared memory, and that it provides accelerators in heterogeneous applications with direct access to remote memory.

Details

1010268
Classification
Title
Enabling Direct-Access Global Shared Memory for Distributed Heterogeneous Computing
Number of pages
232
Publication year
2023
Degree date
2023
School code
1543
Source
DAI-B 85/12(E), Dissertation Abstracts International
ISBN
9798383032879
University/institution
The University of Manchester (United Kingdom)
University location
England
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
31179195
ProQuest document ID
3073246681
Document URL
https://www.proquest.com/dissertations-theses/enabling-direct-access-global-shared-memory/docview/3073246681/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic