Content area

Abstract

Nowadays, computing becomes a service on cloud computing resources. Users reserve virtual machines to execute their applications with minimum number of processing cores to save money. Optimizing user applications on the level of single core of a physical machine is highly desirable to users to reduce cost, as well as to cloud providers to reduce power consumption. In this paper, we showed how to exploit all the processing resources available in a single CPU physical core to optimize the performance of the 2D spatial filtering operation, a basic kernel in important image and multimedia applications such as image enhancement, edge detection, image segmentation, and image analysis. We proposed a novel computational procedure to restructure the conventional image filtering operation. Then, we demonstrated the merits of combining hand-optimized source-code restructuring, auto-optimized compiler techniques including vectorization, and hand-optimized threading to squeeze the performance of a single CPU core. Our intensive performance evaluations, using Sobel filters, on a variety of image sizes using the Linux Perf tool on a single core of the quad-core Intel Core i7 processor showed that our source-code restructurings with compiler auto-vectorization, using Intel AVX vector instructions, is 1.3X better than the non-restructured auto-vectorized version of the CImg library for computing the image gradient. Moreover, using OpenMP library directives we studied different image partitioning strategies to better exploit the two hardware threads inside a CPU core which boosted performance to 2.6X. Compared with the conventional CImg implementation, we obtained an average enhancement of 5.0X for image sizes ranging from 0.5 MPixel to 8 MPixel. However, comparing our best-optimized code to the conventional non-optimized serial code, without threading, resulted in a significant enhancement of 23X. The overall results showed how significant performance in important image processing applications can be obtained by applying source-code restructurings before employing any automatic compiler optimizations to exploit ILP, DLP and TLP parallelism degrees inside a single core of a multi-core CPU.

Details

Title
Optimizing image spatial filtering on single CPU core
Author
Ahmed Sherif Zekri 1 

 Department of Mathematics and Computer Science, Faculty of Science, Beirut Arab University, Beirut, Lebanon; Department of Mathematics and Computer Science, Faculty of Science, Alexandria University, Alexandria, Egypt 
Pages
251-281
Publication year
2018
Publication date
Jan 2018
Publisher
Springer Nature B.V.
ISSN
13807501
e-ISSN
15737721
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1992797265
Copyright
Multimedia Tools and Applications is a copyright of Springer, (2016). All Rights Reserved.