Abstract

Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with N_c = 3 colors, N_v = 12 right-hand sides, N_thr = 256 threads, on lattices of size 32^3 × 64, using exclusively OpenMP pragmas. Interestingly, the same routine performs quite well on Intel Core i7 architectures, too. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added.

Details

Title
Optimization of the Brillouin operator on the KNL architecture
Author
Dürr, Stephan
Section
2 Algorithms and Machines
Publication year
2018
Publication date
2018
Publisher
EDP Sciences
ISSN
2101-6275
e-ISSN
2100-014X
Source type
Conference Paper
Language of publication
English
ProQuest document ID
2050825613
Copyright
© 2018. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and conditions, you may use this content in accordance with the terms of the License.