
Abstract

Modeling and simulation is an important part of designing embedded systems. It can be used to evaluate the effectiveness of new hardware architectures and to analyze performance trade-offs. In this dissertation, we model a new CPU architecture called the Grid of Processing Cells (GPC). The GPC architecture is cache-less and based on a grid/array computing platform, with a primary focus on reducing main memory contention.

Among the many constraints and limitations of today’s computing devices, one of the biggest open problems is the memory bottleneck. It was already observed in the first single-processor architectures: the single lane through which data and instruction streams travel to and from main memory impedes traffic flow, often causing heavy congestion and severely limiting processing speed. The problem is only amplified in today’s shared-memory multi- and many-core architectures and, despite sophisticated multi-level cache hierarchies, remains a grand challenge to efficient parallel processing. While Moore’s law is slowing and physical limits prevent clock frequencies from rising much further, the demand for parallel computing will only grow with the advance of big data and deep learning applications. Delivering the required performance with efficient processor architectures therefore calls for novel parallel platforms that do not suffer from a central memory bottleneck (and, ideally, need no costly cache hierarchies).

To compensate for the congestion created by the shared memory bottleneck, a cache hierarchy is usually deployed, with multiple levels of cache memory between the processing cores and main memory. Often, each core has a private L1 cache, shares an L2 cache with a neighbor, and shares an L3 cache with all other cores on the same chip. The levels are ordered by access speed: the closer a cache is to the core, the faster it must deliver data on a cache hit.
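To make this ordering concrete, the standard textbook average memory access time (AMAT) recursion for a multi-level hierarchy can be computed as below. This is a minimal sketch, not a model taken from the dissertation; the function name `amat` and all latencies and miss rates are hypothetical placeholder values, and each miss rate is the local miss rate of its level.

```python
def amat(levels, memory_latency):
    """Average memory access time of a multi-level cache hierarchy.

    levels: list of (hit_latency, local_miss_rate) tuples, ordered from L1 outward.
    memory_latency: main memory (DRAM) latency, paid only when every level misses.
    """
    # Work from the outermost level inward: the miss penalty of level i
    # is the average access time of level i + 1 (or of main memory).
    time = memory_latency
    for hit_latency, miss_rate in reversed(levels):
        time = hit_latency + miss_rate * time
    return time


# Hypothetical latencies in core clock cycles: private L1, shared L2, chip-wide L3.
hierarchy = [(1, 0.05), (10, 0.20), (40, 0.50)]
print(amat(hierarchy, memory_latency=200))  # roughly 2.9 cycles
```

With these placeholder numbers, an access that would cost 200 cycles without caches costs about 2.9 cycles on average, which is why the level nearest the core must have the smallest hit latency.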

The GPC architecture is similar to a distributed memory system, with individual cores and local memories that can be accessed only by neighboring cells. The main differences are that the on-chip memories are small and the cores themselves perform less computation, since the GPC is intended for embedded applications. These small local memories, referred to as on-chip memories, are expected to be as fast as the static random-access memory (SRAM) L1 cache or scratchpad of a multi-core computer. The off-chip memories are larger but slower, with access speeds similar to those of dynamic random-access memory (DRAM). The GPC can be scaled to any height and width because it is built from repeating computational units called cells. Each cell contains a CPU core, a local memory, and the other peripherals the core needs to execute instructions and perform I/O. In other words, a single cell contains everything required to run the GPC, so a single-core simulation of the GPC is possible. Since we model the GPC in SystemC, each component we refer to is implemented as a SystemC module.
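The neighbor-only access rule and the scaling by repeated cells can be illustrated with a small sketch. It assumes, since the abstract does not define "neighbor", that a cell may access its own local memory and those of its four orthogonally adjacent cells. The classes `Cell` and `CellGrid` and their methods are hypothetical and do not correspond to the dissertation's SystemC modules.

```python
class Cell:
    """One hypothetical GPC-style cell: a core plus a small local memory."""

    def __init__(self, x, y, mem_words):
        self.x, self.y = x, y
        self.memory = [0] * mem_words  # small on-chip local memory


class CellGrid:
    """A width x height grid of identical cells with neighbor-only memory access."""

    def __init__(self, width, height, mem_words=256):
        self.width, self.height = width, height
        self.cells = {(x, y): Cell(x, y, mem_words)
                      for x in range(width) for y in range(height)}

    def can_access(self, src, dst):
        # Assumed rule: a cell may access itself or a 4-connected neighbor.
        if src not in self.cells or dst not in self.cells:
            return False
        (sx, sy), (dx, dy) = src, dst
        return abs(sx - dx) + abs(sy - dy) <= 1

    def read(self, src, dst, addr):
        if not self.can_access(src, dst):
            raise PermissionError(f"cell {src} cannot access memory of cell {dst}")
        return self.cells[dst].memory[addr]

    def write(self, src, dst, addr, value):
        if not self.can_access(src, dst):
            raise PermissionError(f"cell {src} cannot access memory of cell {dst}")
        self.cells[dst].memory[addr] = value


# Scaling only changes width and height; a 1 x 1 grid is the single-core case.
grid = CellGrid(4, 3)
grid.write((0, 0), (1, 0), addr=5, value=42)  # adjacent cell: allowed
print(grid.read((1, 1), (1, 0), addr=5))      # another neighbor of (1, 0) reads it back: 42
```

Because every cell is identical, the same code covers the single-cell configuration and larger grids; data that must travel between non-adjacent cells has to be relayed through intermediate cells.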

In this work, we create a system-level model of the GPC architecture using RISC-V-based instruction set simulators. Because we focus on minimizing memory contention, we create a contention model for the interconnect components in loosely timed, contention-aware SystemC. We also model a single-core processor and a shared-memory multiprocessor (SMP) with multi-level cache hierarchies. We simulate and validate these models to ensure that they are suitable as a basis for comparison. We also develop the MemQ API to support inter-cell communication on the GPC. Various applications are then mapped to the GPC and SMP architectures and compared in terms of performance. We examine how performance varies as we change the clock speed, the number of cores, the inter-core communication method, and other parameters. Our results show that the GPC architecture has advantages in execution speed and memory access time.
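The abstract does not describe how the loosely timed, contention-aware interconnect model works internally. Purely as a generic illustration of what a contention model captures, the sketch below models a shared resource (for example, a bus or main memory port) that serves one transaction at a time: a request that arrives while the resource is busy waits until it is free, and that wait is reported as contention delay. The `SharedResource` class, its fields, and the latencies are hypothetical; this is not the SystemC/TLM API and not the dissertation's model.

```python
class SharedResource:
    """A hypothetical shared bus or memory port that serves one transaction at a time."""

    def __init__(self, service_time):
        self.service_time = service_time  # cycles needed to serve one transaction
        self.busy_until = 0               # time at which the resource becomes free

    def access(self, request_time):
        """Return (completion_time, contention_delay) for a request issued at request_time.

        Assumes requests are presented in non-decreasing request_time order.
        """
        start = max(request_time, self.busy_until)  # wait while another transaction holds it
        contention_delay = start - request_time
        self.busy_until = start + self.service_time
        return self.busy_until, contention_delay


# Four cores issue a main-memory request at the same time (t = 0) through one shared port.
port = SharedResource(service_time=10)
for core in range(4):
    done, waited = port.access(0)
    print(f"core {core}: waited {waited} cycles, done at t={done}")
```

In this toy setting each additional core sharing the port adds another full service time of waiting, which is the kind of main memory contention the GPC architecture aims to reduce.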

Details

Title: Modeling and Evaluation of a Cache-Less Grid of Processing Cells
Number of pages: 125
Publication year: 2025
Degree date: 2025
School code: 0030
Source: DAI-A 86/11(E), Dissertation Abstracts International
ISBN: 9798314859018
Committee members: Al Faruque, Mohammad; Harris, Ian
University/institution: University of California, Irvine
Department: Electrical and Computer Engineering
University location: United States -- California
Degree: Ph.D.
Source type: Dissertation or Thesis
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: 31845445
ProQuest document ID: 3201333849
Document URL: https://www.proquest.com/dissertations-theses/modeling-evaluation-cache-less-grid-processing/docview/3201333849/se-2?accountid=208611
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database: ProQuest One Academic