
Abstract

Energy-efficient high-performance computing has been a pivotal driver of the microprocessor industry, and the immense computational demands of rapidly advancing AI must now be met at low energy cost. This work targets the significant dynamic power consumed by two critical microprocessor subsystems: the clock architecture and the cache architecture. First, we propose a low-power, wideband energy-recycling clock architecture that uses resonant flip-flops (FFs) with series LC resonance and an inductor tuning technique; the tuning reduces clock skew and increases the robustness of the clock network. Compared with industry-standard primary-secondary FF-based networks across a 1–5 GHz range, our design saves over 43% power and reduces skew by 90% in clock tree networks, and saves 44% power with a 90% skew reduction in mesh networks. To improve edge artificial intelligence (AI) computational efficiency, we introduce two Compute-in-Memory (CiM) architectures that minimize costly data transfers between memory and the CPU. The first, an energy-recycling resonant 10T Compute-in-Memory SRAM (rCiM) macro, performs Boolean logic computations within the memory, reducing core-cache data movement. This work also proposes an automation tool that, given a combinational logic circuit and input memory and latency constraints, generates an energy- and latency-optimized rCiM implementation strategy. An 8 KB rCiM evaluated on the EPFL combinational benchmark suite showed 55.42% lower average energy consumption than standard von Neumann architectures, achieving 88.2–106.6 GOPS throughput and 8.64–10.45 TOPS/W energy efficiency.
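The mapping automation described above can be illustrated with a minimal sketch: schedule a combinational netlist's gates, in topological order, onto in-memory Boolean operation cycles while estimating critical-path latency and energy. The function name, data layout, and per-operation energy figure below are illustrative assumptions, not the dissertation's actual tool.

```python
# Hypothetical sketch of a CiM mapping heuristic. Gates arrive as
# (gate_id, op, operand_ids) tuples in topological order; operand ids
# never produced by a gate are primary inputs preloaded at cycle 0.
# Cost model (one op per cycle, fixed energy per op) is illustrative.

def map_to_cim(gates, num_rows, energy_per_op_pj=1.0):
    """Return (critical-path cycles, total energy in pJ) for mapping
    the netlist into a macro with num_rows storage rows."""
    produced = {g[0] for g in gates}
    ready = {}  # value id -> cycle at which it is available in memory
    for _gid, _op, operands in gates:
        for o in operands:
            if o not in produced:
                ready.setdefault(o, 0)  # primary input, preloaded
    if len(produced) + len(ready) > num_rows:
        raise ValueError("netlist does not fit in the macro's rows")
    cycles, energy = 0, 0.0
    for g_id, _op, operands in gates:  # topological order assumed
        start = max(ready[o] for o in operands)
        ready[g_id] = start + 1        # result written one cycle later
        cycles = max(cycles, start + 1)
        energy += energy_per_op_pj
    return cycles, energy
```

A two-gate netlist such as `c = AND(a, b); d = OR(c, a)` maps to two dependent cycles under this model; a real tool would additionally choose among macro topologies and operation orderings to meet the stated memory and latency constraints.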
The proposed combinational-logic operation-mapping methodology shows that a three-topology macro strategy cuts energy by a further 40.52% compared with single-macro designs. The second architecture is a resonant time-domain CiM (rTD-CiM) for Convolutional Neural Networks (CNNs) that replaces analog-to-digital converters (ADCs) with a low-overhead time-to-digital converter (TDC) to digitize Multiply-Accumulate (MAC) operations, avoiding the area, power, and non-linearity issues of traditional ADCs. In addition, a weight-stationary data mapping strategy combined with an automated SRAM macro selection algorithm optimizes memory usage for quantized CNNs. Demonstrated across six CNNs and nine SRAM configurations, the algorithm achieves an 87.5% reduction in latency for ResNet-18 when mapped to a 256 KB SRAM macro and improves energy efficiency by 8× over a 32 KB SRAM. The rTD-CiM achieves 320 GOPS throughput and 38.46 TOPS/W on an 8 KB macro.
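The macro-selection step above can be sketched as a simple heuristic: pick the smallest candidate SRAM macro that holds every layer's quantized weights (weight-stationary), and when no candidate fits, count the weight-reload passes a layer would need. The function and its cost model are illustrative assumptions, not the dissertation's actual algorithm.

```python
# Hypothetical sketch of SRAM macro selection for a quantized CNN.
# layer_weight_bytes: per-layer weight footprints; macro_sizes_bytes:
# candidate macro capacities. All sizes are illustrative.

def select_macro(layer_weight_bytes, macro_sizes_bytes):
    """Return (chosen macro size, per-layer reload-pass counts).
    Prefers the smallest macro fitting every layer; otherwise the
    largest macro, with layers streamed in multiple passes."""
    macro = next((m for m in sorted(macro_sizes_bytes)
                  if all(w <= m for w in layer_weight_bytes)),
                 max(macro_sizes_bytes))
    # ceil-divide each layer's weights by the macro capacity
    passes = [max(1, -(-w // macro)) for w in layer_weight_bytes]
    return macro, passes
```

Under this model a larger macro trades area for fewer reload passes, which is the latency/energy trade-off the abstract's 256 KB-versus-32 KB comparison reflects; the real algorithm would also weigh per-access energy across the nine SRAM configurations.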

Details

Business indexing term
1010268
Title
Enabling Edge-Optimized AI Acceleration Through Energy-Recycling Clocks and Compute-in-Memory Architectures
Number of pages
180
Publication year
2025
Degree date
2025
School code
0434
Source
DAI-B 87/3(E), Dissertation Abstracts International
ISBN
9798293820412
Committee member
Robucci, Ryan; Younis, Mohamed; Vinjamuri, Ramana Kumar; Bezzam, Ignatius
University/institution
University of Maryland, Baltimore County
Department
Engineering, Computer
University location
United States -- Maryland
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32170931
ProQuest document ID
3248398676
Document URL
https://www.proquest.com/dissertations-theses/enabling-edge-optimized-ai-acceleration-through/docview/3248398676/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic