Content area
Artificial Intelligence (AI) encompasses a wide range of techniques and algorithms designed to enable computers to mimic human intelligence. Among the most prominent AI models for image recognition and processing are Convolutional Neural Networks (CNNs). CNNs are specialized neural networks that excel at identifying patterns and features in visual data, making them widely applicable to tasks such as image classification, object detection, and facial recognition. In recent years, CNNs have garnered significant attention from both industry and academia.
However, the high computational cost associated with convolution operations remains a major challenge, limiting the widespread adoption of CNNs in mobile and embedded systems. Today, devices like smartphones are among the most commonly used computing platforms; being battery-powered, they impose resource constraints that pose substantial hurdles for implementing CNNs. To address this issue, two primary approaches have been explored: designing specialized hardware, such as GPUs, and optimizing algorithms to better align with existing hardware capabilities, especially through the development of more efficient convolution algorithms.
In our research, we initially examined linear convolution methods based on the real-valued Discrete Fourier Transform (RV-DFT), the Fast Fourier Transform (FFT), and the Discrete Hirschman Transform (DHT). Our computational-complexity analysis shows that the complexity of these convolution algorithms closely follows that of the underlying FFTs, and it identifies the crossover points among the three methods. RV-DFT-based convolution is the most computationally efficient choice when the underlying RV-FFTs are of minimal complexity; in particular, when the actual convolution length slightly exceeds a power of two, it significantly reduces the number of operations. When the convolution length is equal to or just below a power of two, FFT-based algorithms are preferable, as they require little or no zero-padding and therefore incur lower computational cost. Conversely, for input lengths that are not near powers of two (apart from specific cases where the RV-FFT is at a disadvantage), DHT-based convolution performs best, especially when optimized DHT implementations are used to further reduce the computational load.
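To make the crossover concrete, the following sketch contrasts spatial-domain convolution with FFT-based convolution that pads to the next power of two. It is illustrative only and relies on NumPy's stock FFT routines, not the optimized RV-FFT or DHT kernels analyzed in our work; the signal lengths are arbitrary choices for this example.

# Illustrative only: FFT-based linear convolution with zero-padding to the
# next power of two, using NumPy's stock routines rather than the optimized
# RV-FFT or DHT kernels discussed above.
import numpy as np

def direct_conv(x, h):
    # Spatial-domain linear convolution: roughly len(x) * len(h) multiplications.
    return np.convolve(x, h)

def fft_conv(x, h):
    # FFT-based linear convolution; the transform length is rounded up to a
    # power of two, which is where the padding overhead comes from.
    full_len = len(x) + len(h) - 1                # true linear-convolution length
    n_fft = 1 << (full_len - 1).bit_length()      # next power of two >= full_len
    X = np.fft.rfft(x, n_fft)                     # real-input FFTs for real data
    H = np.fft.rfft(h, n_fft)
    return np.fft.irfft(X * H, n_fft)[:full_len]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, h = rng.standard_normal(1000), rng.standard_normal(64)
    assert np.allclose(direct_conv(x, h), fft_conv(x, h))
    # Here full_len is 1063, just above 1024, so the FFT length jumps to 2048:
    # the situation in which RV-DFT- or DHT-based convolution can be cheaper.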
The development of fast convolution algorithms holds great promise for reducing the computational complexity of CNNs, thereby enhancing their viability in resource-constrained environments. To this end, we developed a highly efficient convolution algorithm based on the Discrete Hirschman Transform, which we call DHTConv. This algorithm reduces both computational complexity and processing time, and we validated its practical performance through a hardware implementation, demonstrating its effectiveness in real-world scenarios. Our results highlight the advantages of DHTConv over traditional spatial-domain methods and existing Fourier-domain approaches, particularly its lower computational complexity and higher speed. Notably, DHTConv avoids zero-padding the input to the next power of two, as required in FFT-based convolution, and eliminates the per-block inverse Fourier transform needed by some existing methods such as OaAConv (overlap-and-add convolution). These features significantly improve its efficiency.
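For reference, the sketch below is a textbook overlap-and-add convolution written with NumPy; it is not the OaAConv or DHTConv code itself, and the block length is an arbitrary choice. It makes explicit the per-block forward transform, pointwise product, and inverse transform whose elimination contributes to DHTConv's efficiency.

# Illustrative only: a textbook overlap-and-add linear convolution in NumPy.
# Each block requires a forward transform, a pointwise product, and an inverse
# transform; avoiding the per-block inverse transform is one of the savings
# attributed to DHTConv above.
import numpy as np

def overlap_add_conv(x, h, block_len=256):
    m = len(h)
    n_fft = 1 << (block_len + m - 2).bit_length()  # smallest power of two >= block_len + m - 1
    H = np.fft.rfft(h, n_fft)                      # filter transform, computed once
    y = np.zeros(len(x) + m - 1)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]
        # per-block forward transform, pointwise multiply, inverse transform
        seg = np.fft.irfft(np.fft.rfft(block, n_fft) * H, n_fft)
        end = min(start + len(block) + m - 1, len(y))
        y[start:end] += seg[:end - start]          # overlap-and-add of block outputs
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x, h = rng.standard_normal(5000), rng.standard_normal(63)
    assert np.allclose(overlap_add_conv(x, h), np.convolve(x, h))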
The FPGA implementation of DHTConv exhibited low latency and efficient utilization of hardware resources, and it reduced the number of real additions and multiplications by 22.22% and 23.06%, respectively. We also developed a software implementation, building a Convolutional Neural Network named DHTCNN on top of DHTConv using Python 3.8 in Anaconda 3. The network consists of three convolutional layers, three max-pooling layers, one flattening layer, and two dense layers. We employed this straightforward network for tumor detection, and the results showed that DHTCNN outperformed SpatialCNN in speed. This success has motivated us to implement DHTConv in more complex networks, such as the YOLO (You Only Look Once) series.
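For illustration, the DHTCNN layer layout described above can be sketched in Keras as follows; the filter counts, kernel sizes, input resolution, class count, and choice of framework are assumptions made for this sketch, and the standard Conv2D layer stands in for the DHTConv operation.

# Illustrative only: a Keras sketch of the DHTCNN layer layout (three
# convolutional layers, three max-pooling layers, one flattening layer, two
# dense layers). Filter counts, kernel sizes, the 128x128 grayscale input and
# the two-class output are assumptions, and the standard Conv2D layer stands
# in for the DHTConv operation.
from tensorflow.keras import layers, models

def build_dhtcnn_like(input_shape=(128, 128, 1), num_classes=2):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model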
