GPU-accelerated computing, implemented with NVIDIA’s Compute Unified Device Architecture (CUDA) in C/C++ as well as Python libraries accelerated by a Graphics Processing Unit (GPU), was tested against traditional Central Processing Unit (CPU) computing to determine whether GPU computing can feasibly replace it. Three experiments were designed to weigh the GPU’s computational speedup against the overhead of the memory transfers required to and from the GPU. The experiments cover purely computational tasks, machine learning model training, large-scale data processing, and cloud service creation. The goal was to use these experiments to optimize GPU kernels for the GPU architecture, minimizing execution time and yielding a fair comparison against the CPU counterparts. In cloud services specifically, autoscaling is a key feature for handling varying workloads without wasting resources. The novel contribution of this thesis is an intelligent autoscaling feature that schedules multiple kernels on a single GPU, maximizing use of the resources already available before autoscaling to additional GPU resources.
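The thesis does not spell out the scheduling policy here, but the core idea of packing multiple kernels onto one GPU before scaling out can be sketched with a simple first-fit heuristic. This is a minimal illustration, not the thesis's actual implementation: the `GPU` class, the `schedule` function, and the per-kernel memory figures are all hypothetical, and real GPU sharing would also involve mechanisms such as CUDA streams or MPS rather than memory accounting alone.

```python
from dataclasses import dataclass, field

@dataclass
class GPU:
    """A GPU with a fixed memory capacity (MB) and the kernels packed onto it."""
    capacity_mb: int
    kernels: list = field(default_factory=list)

    def free_mb(self):
        # Remaining capacity after the memory claimed by already-placed kernels.
        return self.capacity_mb - sum(mem for _, mem in self.kernels)

def schedule(jobs, capacity_mb=16000):
    """First-fit packing: place each kernel on the first GPU with enough free
    memory, provisioning ('autoscaling to') a new GPU only when none fits."""
    gpus = [GPU(capacity_mb)]
    for name, mem in jobs:
        target = next((g for g in gpus if g.free_mb() >= mem), None)
        if target is None:            # no existing GPU can host this kernel
            target = GPU(capacity_mb)  # scale out: allocate one more GPU
            gpus.append(target)
        target.kernels.append((name, mem))
    return gpus

# Example: three kernels fit on one 16 GB GPU; the fourth forces a scale-out.
jobs = [("matmul", 6000), ("train", 7000), ("etl", 2000), ("infer", 5000)]
gpus = schedule(jobs)
print(len(gpus))  # 2
```

The point of the heuristic is the ordering of decisions: existing capacity is exhausted first, so a new GPU is requested only when the current fleet genuinely cannot host the next kernel.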