
Abstract

Artificial Intelligence (AI) encompasses a wide range of techniques and algorithms that enable computers to mimic human intelligence. Among the most prominent AI models used in image recognition and processing are Convolutional Neural Networks (CNNs). CNNs are specialized neural networks that excel at identifying patterns and features within visual data, making them widely applicable to tasks such as image classification, object detection, and facial recognition. In recent years, CNNs have garnered significant attention from both industry and academia.

However, the high computational cost of convolution operations remains a major challenge, limiting the widespread adoption of CNNs in mobile and embedded systems. Today, devices like smartphones are among the most commonly used computing platforms; being battery-powered, they impose resource constraints that make deploying CNNs difficult. To address this issue, two primary approaches have been explored: designing specialized hardware, such as GPUs, and optimizing algorithms to better align with existing hardware capabilities, especially through the development of more efficient convolution algorithms.

In our research, we initially examined linear convolution methods based on the natively real-valued Discrete Fourier Transform (RV-DFT), the Fast Fourier Transform (FFT), and the Discrete Hirschman Transform (DHT). Our analysis of computational complexity shows that the complexity of these convolution algorithms closely follows that of the underlying FFTs, and it identifies the crossover points among the methods. RV-DFT-based convolution emerges as the most computationally efficient choice when the underlying RV-FFTs are of minimal complexity. Notably, when the actual convolution length slightly exceeds a power of two, RV-DFT-based convolution significantly reduces the number of operations, thereby improving efficiency. When the convolution length is equal to or just below a power of two, FFT-based algorithms are preferable, as they require little or no zero-padding and therefore incur lower computational cost. For input lengths that are not near powers of two, except in specific cases where the RV-FFT is at a disadvantage, DHT-based convolution performs best, especially when optimized DHT implementations are employed to further reduce the computational load.
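The power-of-two sensitivity described above can be illustrated with a rough operation-count model. The following pure-Python sketch uses the standard first-order estimate of roughly L log2 L multiplies per radix-2 transform; the `next_pow2` helper, the cost formula, and the example lengths are illustrative assumptions, not the dissertation's exact operation counts:

```python
# Sketch: why convolution length relative to a power of two matters
# for FFT-based linear convolution. Linear convolution of an N-point
# signal with an M-tap kernel needs output length L = N + M - 1; a
# radix-2 FFT must then zero-pad L up to the next power of two.
import math

def next_pow2(L):
    """Smallest power of two >= L."""
    return 1 << (L - 1).bit_length()

def fft_conv_cost(N, M):
    """Rough multiply count: two forward FFTs + one inverse FFT of the
    zero-padded length P, plus P pointwise products (first-order model)."""
    L = N + M - 1
    P = next_pow2(L)                    # zero-padded transform length
    return 3 * P * math.log2(P) + P

# A length just above a power of two nearly doubles the padded size:
print(next_pow2(129))  # 256 -> heavy zero-padding penalty
print(next_pow2(128))  # 128 -> no padding needed
print(fft_conv_cost(126, 3) < fft_conv_cost(127, 3))  # True: one extra sample, much higher cost
```

This is the regime where, per the analysis above, a transform that avoids padding to the next power of two (such as the RV-DFT or DHT variants) can win.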

The development of fast convolution algorithms holds great promise for reducing the computational complexity of CNNs, thereby enhancing their viability in resource-constrained environments. To this end, we developed a highly efficient convolution algorithm based on the Discrete Hirschman Transform, which we call DHTConv. This algorithm reduces both computational complexity and processing time. We validated its practical performance through hardware implementation, demonstrating its effectiveness in real-world scenarios. Our results highlight the advantages of DHTConv over traditional spatial-domain methods and existing Fourier-domain approaches, particularly its lower computational complexity and higher speed. Notably, DHTConv avoids zero-padding the input to the next power of two, a requirement of FFT-based convolution, and eliminates the need to compute an inverse Fourier transform for each block, as is necessary in some existing methods such as OaAConv (overlap-and-add convolution). These features significantly improve its efficiency.
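To make the per-block structure that DHTConv sidesteps concrete, here is a minimal pure-Python sketch of overlap-and-add linear convolution. In real OaAConv, each block is convolved in the transform domain and requires its own inverse transform; the `direct_conv` stand-in and the specific block size are only there to keep the sketch self-contained:

```python
def direct_conv(x, h):
    """Direct (spatial-domain) linear convolution, length len(x)+len(h)-1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def overlap_add_conv(x, h, block=4):
    """Overlap-and-add: convolve fixed-size blocks of x with h, then sum
    the overlapping tails of consecutive block outputs. In OaAConv, each
    per-block convolution would be done in the Fourier domain, costing
    one inverse transform per block."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = direct_conv(x[start:start + block], h)  # one block's output
        for k, v in enumerate(seg):
            y[start + k] += v                         # add overlapping tail
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
h = [1.0, -1.0, 0.5]
assert overlap_add_conv(x, h) == direct_conv(x, h)
```

Each block contributes one inverse transform to the total cost; avoiding that per-block inverse is one of the savings claimed for DHTConv above.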

The FPGA implementation of DHTConv exhibited low latency and efficient utilization of hardware resources. Furthermore, we achieved reductions of 22.22% in real additions and 23.06% in real multiplications. We also developed a software implementation, building a Convolutional Neural Network named DHTCNN on top of DHTConv using Python 3.8 in Anaconda 3. The architecture of this network consists of three convolutional layers, three max-pooling layers, one flattening layer, and two dense layers. We applied this straightforward network to tumor detection, and the results showed that DHTCNN outperformed SpatialCNN in speed. This success has motivated us to implement DHTConv in more complex networks, such as the YOLO (You Only Look Once) series.
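The layer stack described above can be traced shape-by-shape with a small sketch. The 64x64 input, 3x3 valid-mode kernels, and 2x2 pooling below are illustrative assumptions (the abstract states only the layer counts), but the arithmetic shows how three conv/pool stages shrink the feature map before the flatten and dense layers:

```python
# Sketch: feature-map side length through three conv + max-pool stages,
# assuming 3x3 kernels with 'valid' padding and 2x2 max pooling.
def trace_shapes(size=64, stages=3, kernel=3, pool=2):
    shapes = [size]
    for _ in range(stages):
        size = size - kernel + 1   # valid-mode convolution shrinks by kernel-1
        size = size // pool        # 2x2 max pooling halves (floor division)
        shapes.append(size)
    return shapes

print(trace_shapes())  # [64, 31, 14, 6] -> flattened, then two dense layers
```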

Details

Title
Fast Convolution Algorithm for Artificial Intelligence (AI) Acceleration
Author
Number of pages
104
Publication year
2025
Degree date
2025
School code
0071
Source
DAI-B 87/3(E), Dissertation Abstracts International
ISBN
9798291599297
Committee member
DeBrunner, Linda S.; Gallivan, Kyle; Anubi, Olugbenga; Roberts, Rodney
University/institution
The Florida State University
Department
Electrical and Computer Engineering
University location
United States -- Florida
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32046647
ProQuest document ID
3245860154
Document URL
https://www.proquest.com/dissertations-theses/fast-convolution-algorithm-artificial/docview/3245860154/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic