Optimization of the Brillouin operator on the KNL architecture

Abstract

Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. These figures are obtained with N_c = 3 colors, N_v = 12 right-hand sides, and N_thr = 256 threads on lattices of size 32³ × 64, using exclusively OMP pragmas. Interestingly, the same routine also performs quite well on Intel Core i7 architectures. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added.
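
To give a feel for the problem size behind these figures, the following is a minimal sketch that estimates the memory footprint of a single multi-right-hand-side vector on the quoted lattice. It assumes four spinor components per site and complex entries stored as two real numbers; neither the constant N_SPIN nor the helper vector_bytes appears in the abstract, and the actual data layout used in the paper may differ.

```python
# Problem size quoted in the abstract.
L_SPACE, L_TIME = 32, 64      # lattice of size 32^3 x 64
N_C, N_V = 3, 12              # colors and right-hand sides
N_SPIN = 4                    # assumption: 4 spinor components per site


def vector_bytes(bytes_per_real):
    """Bytes for one vector holding all N_V right-hand sides (hypothetical helper)."""
    sites = L_SPACE**3 * L_TIME
    complex_per_site = N_SPIN * N_C * N_V
    # Each complex number is stored as two reals (assumption).
    return sites * complex_per_site * 2 * bytes_per_real


single = vector_bytes(4)      # single precision: 4 bytes per real
double = vector_bytes(8)      # double precision: 8 bytes per real

print(f"sites: {L_SPACE**3 * L_TIME}")
print(f"single precision: {single / 2**30:.2f} GiB")
print(f"double precision: {double / 2**30:.2f} GiB")
```

Under these assumptions, one such vector occupies 2.25 GiB in single precision and 4.5 GiB in double precision.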

Details

Title

Optimization of the Brillouin operator on the KNL architecture

Author

Dürr, Stephan

Section

2 Algorithms and Machines

Publication year

2018

Publication date

2018

Publisher

EDP Sciences

ISSN

21016275

e-ISSN

2100014X

Source type

Conference Paper

Language of publication

English

DOI

https://doi.org/10.1051/epjconf/201817502001

ProQuest document ID

2050825613

© 2018. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and conditions, you may use this content in accordance with the terms of the License.
