
Abstract

In the world of parallel programming, there are two major classes of programming models: shared memory and distributed memory. Shared memory models share all memory by default and are most effective on multi-processor systems. Distributed memory models separate memory into distinct regions for each execution context and are most effective on a network of processors. Modern and future High Performance Computing (HPC) systems will contain multi- and many-core processors connected by a network, resulting in a hybrid shared and distributed memory environment. Neither programming model is ideal in both settings, so optimizing parallel performance for the two memory models simultaneously is a major challenge, now and in the future. MPI (Message Passing Interface) is the de facto standard for distributed memory programming, but it delivers less than ideal performance in a shared memory environment: message passing incurs overhead in the form of unnecessary data copying as well as specific queuing, ordering, and matching rules. In this thesis, we present a series of techniques that optimize MPI performance in a shared memory environment, thus helping to solve the challenge of optimizing parallel performance for both distributed and shared memory. We introduce the concept of a shared memory heap, in which dynamically allocated memory is shared by default among all MPI processes within a node. We then use this heap to transparently optimize message passing with two new data transfer protocols. Next, we propose an MPI extension for ownership passing, which eliminates data copying overheads completely: instead of copying data, we transfer control (ownership) of communication buffers. Finally, we explore how shared memory techniques can be applied in the context of MPI and the shared memory heap. Loop fusion is a new technique that combines the packing and unpacking code of two different MPI ranks to eliminate explicit communication. All of these techniques are implemented in a freely available software library named Hybrid MPI (HMPI). We evaluate our work experimentally using a variety of micro-benchmarks and mini-applications. In the mini-applications, communication performance improves by up to 46% with our data transfer protocols, 54% with ownership passing, and 63% with loop fusion.
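
The abstract does not give the HMPI API, so the sketch below illustrates the zero-copy idea behind the shared memory heap using standard MPI-3 shared-memory windows instead. A "receiver" rank maps the "sender's" segment and reads it in place, avoiding the copy a normal MPI_Send/MPI_Recv pair would incur. This is an analogue under stated assumptions (all ranks run on one node; MPI-3 is available), not the thesis's implementation.

/* Zero-copy node-local data sharing via MPI-3 shared-memory windows.
 * This approximates the thesis's shared memory heap idea; it is not
 * the HMPI API, whose function names the abstract does not specify. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Group the ranks that share a node. */
    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);
    int rank;
    MPI_Comm_rank(node, &rank);

    /* Each rank contributes 1024 doubles to a node-wide shared window. */
    const MPI_Aint count = 1024;
    double *mine;
    MPI_Win win;
    MPI_Win_allocate_shared(count * sizeof(double), sizeof(double),
                            MPI_INFO_NULL, node, &mine, &win);

    /* The "sender" fills its own segment; no message is ever sent. */
    for (MPI_Aint i = 0; i < count; i++)
        mine[i] = rank + 0.5;
    MPI_Win_fence(0, win);   /* make the writes visible node-wide */

    if (rank == 1) {
        /* The "receiver" maps rank 0's segment and reads it in place,
         * the zero-copy effect the thesis's protocols aim for. */
        MPI_Aint sz; int du; double *theirs;
        MPI_Win_shared_query(win, 0, &sz, &du, &theirs);
        printf("rank 1 read %f directly from rank 0's buffer\n", theirs[0]);
    }

    MPI_Win_fence(0, win);
    MPI_Win_free(&win);
    MPI_Comm_free(&node);
    MPI_Finalize();
    return 0;
}

Ownership passing, as the abstract describes it, goes one step further: rather than both ranks retaining access to the buffer, the sender relinquishes it entirely, so no copy and no concurrent-access synchronization on the payload is needed.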

Details

Classification
1010268
Title
Shared memory optimizations for distributed memory programming models
Number of pages
142
Degree date
2013
School code
0093
Source
DAI-B 75/04(E), Dissertation Abstracts International
ISBN
978-1-303-65920-1
Committee member
Bronevetsky, Greg; Chauhan, Arun; Siek, Jeremy; Swany, Martin
University/institution
Indiana University
Department
Computer Sciences
University location
United States -- Indiana
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
3608075
ProQuest document ID
1494127403
Document URL
https://www.proquest.com/dissertations-theses/shared-memory-optimizations-distributed/docview/1494127403/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic