The PowerPC® 440 floating-point unit (FPU) with complex-arithmetic extensions is an embedded application-specific integrated circuit (ASIC) core designed for use with the IBM PowerPC 440 processor core on the Blue Gene®/L compute chip. The FPU core implements the floating-point instruction set of the PowerPC Architecture™ together with floating-point instruction extensions created to aid matrix and complex-arithmetic operations. These extensions define double-precision operations that are primarily single-instruction multiple-data (SIMD) and require two (primary and secondary) arithmetic pipelines and two floating-point register files. However, to aid complex-arithmetic routines, some of the extensions perform different (yet closely related) operations in the two arithmetic pipelines. The FPU core implements an operand crossbar between the primary and secondary arithmetic datapaths so that each pipeline can read its operands from either the primary or the secondary register file. The PowerPC 440 processor core provides 128-bit storage buses and can issue an arithmetic instruction simultaneously with a storage instruction, allowing the FPU core to fully utilize the parallel arithmetic pipelines.
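To make the role of the operand crossbar concrete, the following is a minimal behavioral sketch in Python, not the actual instruction set. It assumes each paired register can be modeled as a (primary, secondary) tuple. The function names `parallel_fma`, `cross_copy_mul`, and `cross_neg_pos_fma` are hypothetical and chosen only for illustration. The sketch shows how one SIMD-style operation and one "different yet closely related" operation might combine into a complex multiply in two steps.

```python
from typing import Tuple

# Hypothetical model: one paired register = (primary half, secondary half).
Pair = Tuple[float, float]


def parallel_fma(a: Pair, c: Pair, b: Pair) -> Pair:
    """SIMD-style operation: each pipeline computes a*c + b on its own half."""
    return (a[0] * c[0] + b[0], a[1] * c[1] + b[1])


def cross_copy_mul(a: Pair, c: Pair) -> Pair:
    """Both pipelines read the PRIMARY half of a (the secondary pipeline
    would reach it through the crossbar) and multiply by their own half of c."""
    return (a[0] * c[0], a[0] * c[1])


def cross_neg_pos_fma(a: Pair, c: Pair, b: Pair) -> Pair:
    """The pipelines perform different but related operations:
    the primary subtracts its product, the secondary adds it.
    Both read the SECONDARY half of a, and the halves of c are swapped."""
    return (b[0] - a[1] * c[1], b[1] + a[1] * c[0])


def complex_mul(x: Pair, y: Pair) -> Pair:
    """Complex product, with (real, imag) stored as (primary, secondary)."""
    t = cross_copy_mul(x, y)           # (xr*yr, xr*yi)
    return cross_neg_pos_fma(x, y, t)  # (xr*yr - xi*yi, xr*yi + xi*yr)


if __name__ == "__main__":
    print(complex_mul((1.0, 2.0), (3.0, 4.0)))  # (-5.0, 10.0)
```

Note that a hardware fused multiply-add rounds once per operation, whereas this Python model rounds after the multiply and again after the add, so the results can differ in the last bit for inputs that are not exactly representable.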
Introduction
The IBM PowerPC 440 (PPC440) floating-point unit (FPU) with complex-arithmetic extensions (PPC440 FP2) is the design point that resulted when we started with the original PPC440 FPU [1] and applied the Blue Gene/L requirements of doubling FPU performance and improving cycle time, all on an aggressive schedule. The original PPC440 FPU [1] implemented a double-precision floating-point fused multiply-add pipeline and an independent load and store pipeline in IBM 0.180-µm 7SF technology. It attached to the PPC440A4 central processing unit (CPU) core through an auxiliary processor unit (APU) interface [2] and relied on the dual-issue ability of the CPU to keep the FPU arithmetic and storage pipelines utilized. The PPC440 FPU is PowerPC Book E [3] compliant and supports IEEE Standard 754 [4]. It was designed with an ASIC methodology (without its major functions, such as the multiplier or register file, being custom-implemented) to meet a clock frequency of 525 MHz (at 1.8 V) in nominal silicon.
With this FPU as a starting point, we wanted to double the FPU throughput of Blue Gene/L and aid software workloads that make heavy use of double-precision operations in complex arithmetic and matrix multiplication. In order to achieve the requirements and meet the aggressive schedule, the PPC440 FP2 had to reuse as much as possible...





