Abstract

Area-efficiency arguments motivate heterogeneity in the design of future multiprocessors. This thesis proposes a novel Heterogeneous Distributed Shared-Memory (HDSM) architecture that is organized as a processor-and-memory hierarchy. The top level of the hierarchy has a few instruction-level parallel (ILP) processors with large on-chip caches for fast execution of sequential codes. Lower levels employ a larger number of simpler processors, organized as chip multiprocessors (CMPs) with smaller per-processor caches, for efficient execution of parallel codes.

This thesis analyzes the proposed organization quantitatively to (1) determine its performance relative to conventional machines, (2) provide HDSM design guidelines based on next-generation ILP and CMP technologies, and (3) investigate its performance under conventional and speculative programming models.

Extensive simulation analyses consider 3-level, 4-node instances of the hierarchy. A comparison to a conventional DSM with equal silicon area shows that the hierarchical design outperforms the homogeneous counterpart for explicitly parallel applications (by 37% on average for 10 benchmarks from the SPLASH-2 suite) and for automatically parallelized applications (speedups ranging from 10% to 110% for 4 benchmarks from the SPEC95 and NAS suites). A sensitivity analysis shows that support for hardware multithreading in top-level processors improves the performance of parallel workloads (by 15% on average). Another analysis uses a factorial design experiment to determine the relative impact of heterogeneity on performance, and concludes that the organization has low sensitivity (15%) to the speed of memories in the bottom level.

The performance analyses consider the execution of unmodified applications programmed in a model typically supported by conventional shared-memory multiprocessors: single-program, multiple-data (SPMD). This thesis proposes three static policies for assigning SPMD tasks to heterogeneous processors and analyzes their performance for applications that exhibit (1) explicit parallelism in the form of homogeneous threads, and (2) implicit loop-level parallelism automatically detected by a compiler.
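The abstract does not spell out the three static assignment policies, so the following is an illustrative sketch only: three common static schemes (block, cyclic, and speed-weighted) for mapping SPMD tasks onto processors whose relative speeds differ, as they would across the levels of an HDSM hierarchy. All function names and the speed-weighting scheme are assumptions for illustration, not the thesis's actual policies.

```python
# Illustrative sketch: static assignment of SPMD tasks to heterogeneous
# processors. These are NOT the thesis's policies, just three classic schemes.

def block_assign(num_tasks, num_procs):
    """Contiguous blocks of tasks per processor, ignoring heterogeneity."""
    base, extra = divmod(num_tasks, num_procs)
    assignment, start = [], 0
    for p in range(num_procs):
        count = base + (1 if p < extra else 0)
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment

def cyclic_assign(num_tasks, num_procs):
    """Round-robin: task i goes to processor i mod P."""
    assignment = [[] for _ in range(num_procs)]
    for t in range(num_tasks):
        assignment[t % num_procs].append(t)
    return assignment

def weighted_assign(num_tasks, speeds):
    """Tasks proportional to relative processor speed (heterogeneity-aware):
    a fast top-level ILP processor gets more tasks than a slow CMP core."""
    total = sum(speeds)
    counts = [round(num_tasks * s / total) for s in speeds]
    drift = num_tasks - sum(counts)  # fix rounding drift on fastest procs
    order = sorted(range(len(speeds)), key=lambda p: -speeds[p])
    for i in range(abs(drift)):
        counts[order[i % len(order)]] += 1 if drift > 0 else -1
    assignment, start = [], 0
    for c in counts:
        assignment.append(list(range(start, start + c)))
        start += c
    return assignment

# Example: 8 loop iterations on one fast ILP processor (speed 2) and
# two slower CMP cores (speed 1 each).
print(weighted_assign(8, [2, 1, 1]))  # [[0, 1, 2, 3], [4, 5], [6, 7]]
```

A static policy like any of these can be applied at compile time, which matches the thesis's setting of unmodified SPMD applications running without dynamic load balancing.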

In addition, this thesis proposes a novel hardware-based data-dependence speculation technique for DSMs that allows a compiler to relax the data-independence constraint when issuing SPMD tasks in parallel. It is the first coarse-grain DSM data-dependence speculation technique to extend conventional directory-based coherence protocols to support application-transparent speculative versions of shared-memory blocks. A simulation-based evaluation shows that the proposed mechanism allows for automatic extraction of thread-level parallelism from sequential programs with irregular data structures and inherent coarse-grain parallelism (windows of millions of instructions, and working sets of hundreds of KBytes) that cannot be detected statically. This analysis shows that speculatively parallelized programs from the Olden suite execute with higher performance (parallel speedups of up to 6.8) in the thread-parallel levels of the HDSM hierarchy than in aggressive instruction-parallel uniprocessors.
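The core idea behind such speculative versioning can be sketched in miniature. The toy class below (an assumption-laden illustration, not the thesis's protocol) models one directory entry that keeps a committed copy of a memory block plus per-task speculative copies, and detects a read-after-write dependence violation when a logically later task has consumed a value that an earlier task's committed write makes stale.

```python
# Toy model of speculative versioning for one shared-memory block.
# Illustrative only; the thesis's directory-protocol extension is not
# specified in the abstract. Tasks are identified by their program order.

class SpeculativeBlock:
    def __init__(self, value):
        self.committed = value
        self.spec = {}        # task id -> buffered speculative write
        self.readers = set()  # tasks that read the committed version

    def read(self, task):
        if task in self.spec:      # a task sees its own speculative write
            return self.spec[task]
        self.readers.add(task)     # record exposure to the committed value
        return self.committed

    def write(self, task, value):
        self.spec[task] = value    # buffer; never update committed directly

    def commit(self, task):
        """Commit a task in program order. Returns the set of later tasks
        that read a now-stale value: a detected dependence violation that
        would force those tasks to be squashed and re-executed."""
        violated = {r for r in self.readers if r > task and task in self.spec}
        if task in self.spec:
            self.committed = self.spec.pop(task)
        self.readers.discard(task)
        return violated

# Example: task 0 writes speculatively; task 1 reads the stale committed
# value; committing task 0 exposes the violation.
blk = SpeculativeBlock(0)
blk.write(0, 42)
print(blk.read(1))    # 0  (stale committed value)
print(blk.commit(0))  # {1} (task 1 must be squashed)
print(blk.committed)  # 42
```

In a real DSM, this bookkeeping would live in the directory entry alongside the sharer list, which is what makes the scheme application-transparent: the compiler may issue possibly-dependent SPMD tasks in parallel, and the hardware squashes only the tasks that actually violate a dependence.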

Details

Title
Speculative distributed shared-memory multiprocessors organized as processor-and-memory hierarchies
Number of pages
169
Degree date
2001
School code
0183
Source
DAI-B 63/02, Dissertation Abstracts International
ISBN
978-0-493-57591-9
University/institution
Purdue University
University location
United States -- Indiana
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
3043724
ProQuest document ID
304724569
Document URL
https://www.proquest.com/dissertations-theses/speculative-distributed-shared-memory/docview/304724569/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic