Abstract - In LANL's PaScalBB network, I/O nodes carry data traffic between backend compute nodes and global scratch-based file systems. An I/O node is normally equipped with one Infiniband NIC for backend traffic and one or more 10-Gigabit Ethernet NICs for parallel file system data traffic. With the growing deployment of multiple multi-core processors in server and storage systems, overall platform efficiency and CPU and memory utilization depend increasingly on interconnect bandwidth and latency. PCI-Express (PCIe) generation 2.0 has recently become available and has doubled the available transfer rates. This additional I/O bandwidth balances the system and makes higher data rates feasible for external interconnects such as Infiniband. As a result, Infiniband Quad-Data Rate (QDR) mode has become available on the Infiniband Host Channel Adapter (HCA) with a 40 Gb/sec signaling rate. Combining HCA QDR data rates with multiple 10-Gigabit Ethernet links in an I/O node has the potential to relieve some of the I/O traffic bottlenecks that currently exist. We have set up a small-scale PaScalBB testbed and conducted a sequence of I/O node performance tests. The goal of this testing is to identify an enhanced network configuration that we can apply to LANL's Cielo machine and to future LANL HPC machines using the PaScalBB architecture.
Keywords- Server I/O networking, High Performance Networking, Infiniband, 10 Gigabit Ethernet, Link aggregation, Load balancing
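The bandwidth-balance argument in the abstract can be checked with a rough link budget. The sketch below is illustrative only and is not taken from the paper's measurements. It assumes an x8 PCIe slot for the HCA (the slot width is not stated in the text), uses the standard 8b/10b line coding of PCIe 1.x/2.0 and Infiniband QDR, and ignores all protocol overhead above the link layer. The function names are hypothetical.

```python
import math

# 8b/10b line coding: 8 payload bits are carried in every 10 signaled bits.
PAYLOAD_BITS, SIGNALED_BITS = 8, 10


def pcie_data_rate_gbps(gt_per_lane: float, lanes: int) -> float:
    """Per-direction PCIe data rate in Gb/s after 8b/10b coding."""
    return gt_per_lane * lanes * PAYLOAD_BITS / SIGNALED_BITS


def ib_data_rate_gbps(signaling_gbps: float) -> float:
    """Infiniband data rate in Gb/s after 8b/10b coding."""
    return signaling_gbps * PAYLOAD_BITS / SIGNALED_BITS


def ten_gbe_links_needed(backend_gbps: float) -> int:
    """10GbE links needed to match a backend rate (raw line rate, no overhead)."""
    return math.ceil(backend_gbps / 10.0)


# Assumption: the HCA sits in an x8 slot.
pcie1_x8 = pcie_data_rate_gbps(2.5, 8)   # PCIe 1.x: 2.5 GT/s per lane
pcie2_x8 = pcie_data_rate_gbps(5.0, 8)   # PCIe 2.0: 5.0 GT/s per lane
qdr = ib_data_rate_gbps(40.0)            # QDR: 40 Gb/s signaling rate

print(f"PCIe 1.x x8: {pcie1_x8:.0f} Gb/s")       # 16 Gb/s
print(f"PCIe 2.0 x8: {pcie2_x8:.0f} Gb/s")       # 32 Gb/s
print(f"Infiniband QDR: {qdr:.0f} Gb/s")         # 32 Gb/s
print(f"10GbE links to match QDR: {ten_gbe_links_needed(qdr)}")  # 4
```

Under these assumptions, a PCIe 1.x x8 slot (16 Gb/s) could not feed a QDR HCA (32 Gb/s of data), whereas a PCIe 2.0 x8 slot matches it. Real throughput will be lower once packet headers and host overheads are included, which is what testbed measurements are needed to establish.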
1. INTRODUCTION
Commercial off-the-shelf based cluster computing systems have delivered reasonable performance to technical and commercial areas for years. High-speed computing, global storage, and networking (IPC and I/O) are the three most critical elements in building a large-scale HPC cluster system. Unless these three elements are well balanced, an HPC cluster cannot be fully utilized. High-bandwidth I/O networking provides a data super-highway to meet the needs of constantly increasing computation power and storage capacity.
LANL's PaScalBB server I/O architecture is designed to support data-intensive scientific applications running on very large-scale clusters. Its main goal is to provide high-performance, efficient, reliable, parallel, and scalable I/O capabilities for these applications. Data-intensive, simulation-based scientific analysis normally requires efficient transfer of huge volumes of complex data among simulation, visualization, and data manipulation...




