Computing (2014) 96:279–292
DOI 10.1007/s00607-013-0330-4
Received: 16 December 2012 / Accepted: 27 April 2013 / Published online: 9 May 2013 © Springer-Verlag Wien 2013
Abstract Moving data between processes has often been discussed as one of the major bottlenecks in parallel computing; there is a large body of research striving to improve communication latency and bandwidth on different networks, measured with ping-pong benchmarks of different message sizes. In practice, the data to be communicated generally originates from application data structures and needs to be serialized before communicating it over serial network channels. This serialization is often done by explicitly copying the data to communication buffers. The message passing interface (MPI) standard defines derived datatypes to allow zero-copy formulations of non-contiguous data access patterns. However, many applications still choose to implement manual pack/unpack loops, partly because they are more efficient than some MPI implementations. MPI implementers, on the other hand, do not have good benchmarks that represent important application access patterns. We demonstrate that the data serialization can consume up to 80 % of the total communication overhead for important applications. This indicates that most of the current research on optimizing serial network transfer times may be targeted at the smaller fraction of the communication overhead. To support the scientific community, we extracted the send/recv-buffer access patterns of a representative set of scientific applications to build a benchmark that includes serialization and communication of application data and thus reflects all communication overheads. This can be used like traditional ping-pong benchmarks to determine the holistic communication latency and bandwidth
T. Schneider (B) · T. Hoefler
ETH Zurich, Department of Computer Science, Universitätstr. 6, Zurich 8092, Switzerland e-mail: [email protected]
T. Hoefler e-mail: [email protected]
R. Gerstenberger
University of Illinois at Urbana-Champaign, Urbana, IL, USA e-mail: [email protected]
Application-oriented ping-pong benchmarking: how to assess the real communication overheads
Timo Schneider · Robert Gerstenberger · Torsten Hoefler
as observed by an application. It supports serialization loops in C and Fortran as well as MPI datatypes for representative application access patterns. Our benchmark, consisting of seven micro-applications, unveils significant performance discrepancies between the MPI datatype implementations of state-of-the-art MPI implementations. Our micro-applications aim to provide a standard benchmark for MPI datatype implementations to guide optimizations similarly to the established benchmarks...
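
To make the serialization step mentioned in the abstract concrete, the following is a minimal sketch of a manual pack/unpack loop for one non-contiguous access pattern: a single column of a row-major matrix. It is an illustration only and is not taken from the benchmark itself. The function names `pack_column` and `unpack_column`, the matrix dimensions, and the use of plain Python instead of C, Fortran, or MPI are all assumptions made so that the example runs without an MPI installation. In an MPI program, the same strided pattern could instead be described once with a derived datatype such as `MPI_Type_vector` (count = rows, blocklength = 1, stride = cols), leaving the copy strategy to the MPI library.

```python
from array import array


def pack_column(matrix, rows, cols, col):
    """Copy one column of a row-major matrix into a contiguous send buffer."""
    buf = array("d", [0.0] * rows)
    for i in range(rows):
        # Strided read from the application data, contiguous write to the buffer.
        buf[i] = matrix[i * cols + col]
    return buf


def unpack_column(buf, matrix, rows, cols, col):
    """Scatter a contiguous receive buffer back into one column of a matrix."""
    for i in range(rows):
        # Contiguous read from the buffer, strided write to the application data.
        matrix[i * cols + col] = buf[i]


# Hypothetical 4 x 5 matrices standing in for application data structures.
rows, cols = 4, 5
send_matrix = array("d", [float(i) for i in range(rows * cols)])
recv_matrix = array("d", [0.0] * (rows * cols))

# Sender side: serialization copy into a contiguous buffer.
wire = pack_column(send_matrix, rows, cols, col=2)

# A real application would hand `wire` to the network at this point.

# Receiver side: deserialization copy into the target data structure.
unpack_column(wire, recv_matrix, rows, cols, col=2)

print(list(wire))
```

Both copies in this sketch are pure overhead from the application's point of view, which is why a ping-pong measurement that times only the transfer of an already contiguous buffer leaves out part of the cost.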