Abstract: Graphics Processing Units (GPUs) are widely regarded as powerful computational resources, and general-purpose computing on GPUs (GPGPU) has become the de facto infrastructure for many of the computationally intensive problems that researchers around the globe deal with today. High Performance Computing (HPC) facilities use state-of-the-art GPUs, and domains such as deep learning, machine learning, and computational finance rely on GPUs to reduce execution time. GPUs are also widely used in data centers for high-performance computing, where virtualization techniques aim to optimize resource utilization (e.g., GPU cloud computing). The GPU programming model requires all data to be stored in global memory before it is used, which limits the size of the problem a GPU can handle. A system using a cluster of GPUs would not only offer a higher degree of parallelism but would also remove the memory limitation imposed by a single GPU. These are just a few of the issues a programmer must handle. However, the proportion of programmers able to program such processors efficiently is very small. One important reason for this situation is the steepness of the GPU programming learning curve, caused by the processor's complex parallel architecture. Therefore, the tool presented in this article aims to provide visual support for a better understanding of execution on the GPU. With it, programmers can easily observe the trace of the parallel execution of their own algorithms and, from that, determine the unused GPU capacity that could be better exploited.
Keywords: GPU architecture; GPU programming; GPGPU.
I. INTRODUCTION
Much of the architectural complexity of a computational system is kept hidden from programmers (the term transparent is often used). Integrated development environments (IDEs), with their complex compilers and simulators, try to speed up the process of developing applications; time to market is the main driver for this approach. On the other hand, the same principle also applies to the IDEs themselves, so we may assume that they are not fully optimized. Others may argue that such optimizations are not required, weighing the benefits they provide against the cost of development. One exception could be the case of battery-powered devices, which require an...
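The global-memory constraint noted in the abstract can be illustrated with a minimal host-side sketch. The kernel and buffer names below are hypothetical, not taken from the paper; the point is that input data must be explicitly allocated in, and copied into, the GPU's global memory before any kernel can operate on it, so the device's global-memory capacity bounds the problem size.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical kernel: scales every element of an array held in global memory.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main(void) {
    const int n = 1 << 20;              // problem size is bounded by the
    size_t bytes = n * sizeof(float);   // GPU's available global memory

    float *h = (float *)malloc(bytes);  // host-side buffer
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);                               // allocate in global memory
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);     // copy host -> device

    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);         // parallel execution on GPU

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);     // copy results back
    printf("h[0] = %f\n", h[0]);

    cudaFree(d);
    free(h);
    return 0;
}
```

Note that both transfers and the kernel launch appear explicitly in the host code; a visual execution trace such as the one the article proposes would expose how much of the device sits idle during these phases.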




