Introduction
The finite element method, finite difference method and finite volume method are the main methods of computational fluid dynamics (CFD). Over the past three decades, Jiang1 and Bochev and Gunzburger2 have developed the least squares finite element method (LSFEM) by combining the least squares method with the finite element method. LSFEM was initially applied to incompressible flow by Ding and Tsang3 and Tang and Sun,4 to thermodynamics by Zhao et al.5 and Luo et al.6 and to fluid–structure interaction by Kayser-Herold and Matthies.7
LSFEM has the advantages of good convergence, universality, robustness and high accuracy. However, it is computationally expensive, which leads to long computational times and restricts its application to turbulent flow problems, which involve heavy computational loads and complicated flow structures. To shorten the computational time and solve more complex turbulence problems, Ding et al.8 developed a large-scale parallel implementation of LSFEM using the message passing interface (MPI) on the central processing unit (CPU) platform, which achieved a speedup of 7.7.
The graphics processing unit (GPU) can be thought of as a massively parallel computer with several hundred cores, on which several hundred to thousands of threads execute instructions in parallel. Owing to its processing power and high memory bandwidth, the GPU has a significant advantage in computational cost over the CPU. Vanka9 reviewed the literature on GPU-based linear solvers and CFD algorithms and pointed out that several researchers have developed or ported CFD software to GPUs and found significant speedups (10–50 times, depending on algorithm, approach and implementation) over a single-core CPU.10–12 Although the compute unified device architecture (CUDA) reduces the difficulty of general-purpose GPU computing, porting existing CPU codes to the GPU still requires the user to write kernels that execute across many cores, which hinders its adoption by researchers. To achieve semi-automatic or fully automatic porting from CPU to GPU, Corrigan and Lohner13 and Chandar et al.14 developed a semi-automatic technique and CU++, respectively. The semi-automatic technique simultaneously achieves the fine-grained parallelism required to fully exploit the capabilities of multi-core GPUs, completely avoids the crippling bottleneck of GPU–CPU data transfer and uses a...
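To make concrete what "writing a kernel" entails when porting a CPU loop to the GPU, the following minimal CUDA sketch maps a simple per-entry array update (an axpy operation, chosen purely for illustration and not taken from the works cited above) onto one thread per entry. All names, array sizes and launch parameters are illustrative assumptions, not part of any method described in this paper.

#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: the GPU counterpart of the CPU loop
// "for (i = 0; i < n; ++i) y[i] = a*x[i] + y[i];".
__global__ void axpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index: one thread per entry
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));       // unified memory avoids explicit host/device copies
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;      // enough blocks to cover all n entries
    axpy<<<blocks, threads>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();                         // wait for the kernel to finish

    printf("y[0] = %f\n", y[0]);                     // expect 5.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}

Even for a loop this simple, the programmer must choose a thread/block decomposition and manage device memory, which is the manual effort that the semi-automatic approaches cited above aim to reduce.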