Copyright INFOREC Association 2010

Abstract

This paper analyzes several aspects of improving software performance for applications written in the Compute Unified Device Architecture (CUDA). We address an issue of great importance when programming a CUDA application: managing the Graphics Processing Unit's (GPU's) memory, studied through matrix transpose kernels. We also benchmark and evaluate the performance of a matrix transpose application in CUDA as it is progressively optimized. Of particular interest was how well the optimization techniques applied to software written in CUDA scale to the latest generation of general-purpose graphics processing units (GPGPUs), such as the Fermi architecture implemented in the GTX480, compared with the previous architecture implemented in the GTX280. This type of optimization analysis has recently attracted considerable interest in the literature, but, to the best of our knowledge, none of the works so far has tried to validate whether these optimizations apply to a GPU based on the latest Fermi architecture, or how well the Fermi architecture scales with these performance-improving techniques.
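The record does not reproduce the paper's kernels. The sketch below is a minimal, hypothetical illustration of two stages that a progressively optimized CUDA matrix transpose commonly passes through: a naive kernel whose global-memory writes are strided, and a tiled kernel that stages each tile in shared memory so that both reads and writes are coalesced, with one padding column to avoid shared-memory bank conflicts. The names `TILE_DIM`, `BLOCK_ROWS`, `transposeNaive`, `transposeTiled` and `transposeOnDevice`, and the tile shape of 32x32 elements with 8 thread rows per block, are assumptions chosen for illustration, not the author's code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical tuning constants: each block covers a TILE_DIM x TILE_DIM tile
// using TILE_DIM x BLOCK_ROWS threads, so each thread handles several rows.
constexpr int TILE_DIM = 32;
constexpr int BLOCK_ROWS = 8;

// Naive transpose of a row-major (height x width) matrix into a (width x height) one.
// Reads are coalesced (consecutive threads read consecutive addresses), but writes
// are strided by `height`, so each warp scatters its stores across global memory.
__global__ void transposeNaive(float* out, const float* in, int width, int height) {
    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS) {
        if (x < width && y + j < height)
            out[x * height + (y + j)] = in[(y + j) * width + x];
    }
}

// Tiled transpose: the block first loads a tile into shared memory with coalesced
// reads, then writes it back transposed with coalesced writes. The extra column
// (TILE_DIM + 1) makes threads that read a tile column hit different banks.
__global__ void transposeTiled(float* out, const float* in, int width, int height) {
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS) {
        if (x < width && y + j < height)
            tile[threadIdx.y + j][threadIdx.x] = in[(y + j) * width + x];
    }
    __syncthreads();  // the whole tile must be loaded before any thread reads it

    // Swap the block indices: this block now writes the transposed tile.
    x = blockIdx.y * TILE_DIM + threadIdx.x;
    y = blockIdx.x * TILE_DIM + threadIdx.y;
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS) {
        if (x < height && y + j < width)
            out[(y + j) * height + x] = tile[threadIdx.x][threadIdx.y + j];
    }
}

// Copies a (height x width) row-major matrix to the device, transposes it with the
// selected kernel and returns the (width x height) result. Error checks omitted.
std::vector<float> transposeOnDevice(const std::vector<float>& h_in,
                                     int width, int height, bool tiled) {
    size_t bytes = static_cast<size_t>(width) * height * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE_DIM, BLOCK_ROWS);
    dim3 grid((width + TILE_DIM - 1) / TILE_DIM, (height + TILE_DIM - 1) / TILE_DIM);
    if (tiled)
        transposeTiled<<<grid, block>>>(d_out, d_in, width, height);
    else
        transposeNaive<<<grid, block>>>(d_out, d_in, width, height);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<float> h_out(static_cast<size_t>(width) * height);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Both kernels produce the same result. They differ only in the memory access pattern, which is the aspect that a GTX280-versus-GTX480 comparison would measure. One way to reproduce such a comparison is to time each launch with `cudaEventRecord` and `cudaEventElapsedTime`. This sketch implies no particular speedup on either GPU.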

Details

Title
Improving Software Performance in the Compute Unified Device Architecture
Author
Pirjan, Alexandru
Pages
30-47
Publication year
2010
Publication date
2010
Publisher
INFOREC Association
ISSN
1453-1305
e-ISSN
1842-8088
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
854847087