The Cell Broadband Engine(TM) processor employs multiple accelerators, called synergistic processor elements (SPEs), for high performance. Each SPE has a high-speed local store connected to main memory through direct memory access (DMA), but a drawback of this design is that the local store is not large enough to hold the entire code or data of an application. The application must therefore be decomposed into pieces small enough to fit into the local store, and these pieces must be swapped in and out through DMA without losing the performance gain of multiple SPEs. We propose a new programming model, MPI microtask, based on the standard Message Passing Interface (MPI) programming model for distributed-memory parallel machines. In our model, programmers do not need to manage the local store as long as they partition their application into a collection of small microtasks, each of which fits into the local store. Furthermore, the preprocessor and runtime in our microtask system optimize the execution of microtasks by exploiting the explicit communications in the MPI model. We have created a prototype that includes a novel static scheduler for such optimizations. Our initial experiments have shown some encouraging results.
INTRODUCTION
The Cell Broadband Engine (BE) processor [1] is an asymmetric multicore processor that combines a general-purpose IBM PowerPC processor element (PPE) and eight synergistic processor elements (SPEs) [2]. From an architectural standpoint, this processor has a high peak performance because the SPE is simpler and more efficient than general-purpose processors in terms of both microarchitecture and memory architecture [3]. One distinctive architectural aspect is the small high-speed local store at each SPE. Because the size of the local store is limited to the range of L2-cache sizes (256 KB for the first-generation Cell BE processor), many real-world applications do not fit in the local store. While conventional microprocessors use a hardware-managed cache for fast memory of this size, the Cell BE processor must rely on a software mechanism to manage its local store. This requirement for software management could pose significant challenges for programmers, but at the same time it offers significant opportunities for the software to take advantage of the raw performance of the Cell BE processor.
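To make the burden of software-managed local storage concrete, the following is a minimal sketch in portable C of the chunking that a programmer would otherwise write by hand. It is not the microtask runtime described in this paper. The names `scale_in_chunks`, `fake_dma_get`, `fake_dma_put`, and `CHUNK_BYTES` are hypothetical, and `memcpy` stands in for the DMA transfers of a real SPE program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local-store size of the first-generation Cell BE processor. */
#define LOCAL_STORE_BYTES (256u * 1024u)
/* Hypothetical share of the local store set aside for one data buffer;
   the rest would hold code, stack, and other buffers. */
#define CHUNK_BYTES (16u * 1024u)
#define CHUNK_ELEMS (CHUNK_BYTES / sizeof(float))

/* Stand-in for a data buffer that lives in the SPE local store. */
static float local_buf[CHUNK_ELEMS];

/* Stand-ins for DMA transfers (assumption: a real SPE program would
   issue asynchronous DMA commands here instead of memcpy). */
static void fake_dma_get(float *local, const float *main_mem, size_t n)
{
    memcpy(local, main_mem, n * sizeof(float));
}

static void fake_dma_put(float *main_mem, const float *local, size_t n)
{
    memcpy(main_mem, local, n * sizeof(float));
}

/* Scale n floats held in "main memory" by factor, one chunk at a time,
   so that only CHUNK_BYTES of data is resident locally at any moment.
   Returns the number of chunks processed. */
size_t scale_in_chunks(float *data, size_t n, float factor)
{
    size_t chunks = 0;

    /* The data buffer must fit within the local-store budget. */
    assert(CHUNK_BYTES <= LOCAL_STORE_BYTES);

    for (size_t off = 0; off < n; off += CHUNK_ELEMS) {
        size_t remaining = n - off;
        size_t len = remaining < CHUNK_ELEMS ? remaining : CHUNK_ELEMS;

        fake_dma_get(local_buf, data + off, len);   /* main -> local */
        for (size_t i = 0; i < len; i++) {
            local_buf[i] *= factor;                 /* compute locally */
        }
        fake_dma_put(data + off, local_buf, len);   /* local -> main */
        chunks++;
    }
    return chunks;
}
```

In this sketch each transfer completes before computation starts. A tuned version would typically use two buffers so that the transfer of one chunk overlaps with computation on another, which is the kind of bookkeeping that makes manual local-store management demanding.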
The microtask we propose here provides a programming model that frees programmers from local-store management and enables the preprocessor and runtime system to optimize the scheduling of computations and...





