Introduction
With the advent of deep learning,[1] computer software can now achieve human-level performance in several applications, such as object classification, speech recognition, and pattern recognition.[2–4] However, conventional artificial neural networks (ANNs) implemented on central processing unit (CPU)/graphics processing unit (GPU)-based von Neumann architectures suffer from high energy consumption and latency, limiting their real-time performance. To overcome these problems, spiking neural networks (SNNs), which process data with event-driven spikes and mimic the behavior of biological neural networks such as the human brain, have been proposed. These bio-plausible networks are anticipated to offer a promising path toward energy-efficient platforms compared with conventional ANNs. Following this principle, TrueNorth[5] and Loihi,[6] silicon chips that process SNNs with digital architectures, have been fabricated. Digital implementations of SNNs are often favored for their compatibility with existing digital systems, and their performance can be reliably estimated within familiar digital frameworks.[7] While digital SNN systems are considerably more energy-efficient than conventional von Neumann architectures and offer predictability compared with analog SNN systems, their reliance on synchronous processing, inherent to digital architectures, sets them apart from the human brain and can lead to different speed and power consumption. To date, numerous studies have employed fully analog systems with analog synapses and integrate-and-fire (I&F) neurons to implement SNNs.[8–10] However, it remains unclear how such fully analog SNNs would perform in terms of area, accuracy, latency, and power consumption when implemented at large scale. Because they can significantly reduce computational resources, analog SNNs are a promising substitute processor for edge computing and on-device computing. Despite extensive research on SNN systems at the software level,[11–13] a significant gap remains in understanding the hardware implementation of analog SNNs. Some SNN algorithms are challenging to implement in hardware for various reasons, such as their reliance on bias terms for network operation.[14] In contrast, research on single synaptic devices or neuron circuits typically focuses solely on accuracy without adequately addressing the impact of incorporating these components into larger systems in terms of area, latency, and power consumption. It is therefore necessary to thoroughly investigate the performance of these systems while reflecting their hardware characteristics.
In this article, we investigate the performance of large-scale analog SNNs using a hardware-level simulator for SNNs with analog synaptic devices and I&F neuron circuits. To accurately investigate large-scale analog SNNs at the hardware level prior to silicon implementation, the SPICE simulator is widely used, but it suffers from long simulation times. Therefore, a new simulator is needed to explore the performance of analog SNNs with both high accuracy and high speed, similar to NeuroSim,[15] a popular simulator for compute-in-memory (CIM) chips processing ANNs. The contributions of this work can be summarized as follows: 1) We design analog SNNs using nonvolatile flash memory devices and I&F neurons that can be extended to multi-layer networks, in contrast to reported works focusing on single-layer SNNs.[8–10] 2) We develop SNNSim, which enables accurate and fast simulations of customized analog SNNs reflecting the specifications of synaptic devices and neuron circuits. 3) We analyze and optimize the performance of analog SNNs through SNNSim, including area overhead and latency, and verify the ultra-high energy efficiency as well as high accuracy of bio-inspired analog SNNs.
This article is organized as follows: Section 2 presents the background on SNNs, including their operational schemes and training methods. Section 3 provides an overview of the hardware architecture of analog SNNs, which consists of analog synaptic devices, current mirrors, and analog I&F neuron circuits. Section 4 proposes the performance estimation model used by SNNSim, while Section 5 validates the accuracy of SNNSim by comparing its estimated system performance against that of SPICE simulation. Section 6 presents the results of SNNSim simulations on the hardware performance of multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs), together with optimization guidelines. Finally, we conclude this article in Section 7.
Background on Spiking Neural Networks
Figure 1 shows the operational scheme of a biological synapse-neuron model and an analog SNN implemented in hardware. The analog SNN mimics the biological synapse-neuron model, in which data are processed via event-driven spikes in the network. These spikes propagate to adjacent synapses to increase or decrease the membrane voltage (Vmem) of neurons in the subsequent layer. If the Vmem surpasses the threshold voltage, the neuron generates a spike and the Vmem is reset. In this work, we use I&F neurons; thus, leaky behavior is not considered. This process in I&F neurons can be expressed as follows:
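The displayed equations referenced in the text as Equations (1)–(3) are not preserved in this version. A standard I&F formulation consistent with the surrounding description (conductance-weighted current sums charging a membrane capacitor, firing at a threshold, and resetting afterwards) is sketched below; the symbols $G_{ij}$, $V_{in,j}$, and $V_{th}$ are our assumed notation rather than the original one:

$$C_{mem}\,\frac{dV_{mem}}{dt} = \sum_{j} G_{ij}\,V_{in,j}(t) \qquad (1)$$

$$\text{if } V_{mem} \geq V_{th}: \text{ the neuron fires an output spike, at a rate } f \propto \frac{I_{sum}}{C_{mem}\,V_{th}} \qquad (2)$$

$$V_{mem} \rightarrow 0 \text{ immediately after firing} \qquad (3)$$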
[IMAGE OMITTED. SEE PDF]
To train large-scale SNNs following the above rules, spike-timing-dependent plasticity (STDP),[16] a conventional learning rule for SNNs, has limitations when extended to multi-layer networks. Instead, we note that analog SNNs can achieve accuracy comparable to software-based ANNs using an ANN-to-SNN conversion method.[17] The spiking behavior of I&F neurons following Equation (1)–(3) closely resembles the rectified linear unit (ReLU) activation function commonly used in ANNs. In this regard, the weights first trained in ANNs with the ReLU activation function can be transferred directly to SNNs, and the SNNs can then perform inference similar to ANNs that exhibit state-of-the-art accuracy. In addition, the ANN-to-SNN conversion method does not require additional peripheral circuitry for on-chip training, which would not be utilized during inference. From this perspective, we adopt the ANN-to-SNN conversion method for the analog SNN hardware.
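As an illustration of the conversion idea (not the exact mapping used in this work), the sketch below transfers ReLU-trained ANN weights onto a differential pair of device conductances; the function name, the G+/G- differential scheme, and the conductance limits g_min/g_max are assumptions for illustration:

```python
import numpy as np

def ann_weights_to_conductances(W, g_min=1e-9, g_max=1e-6):
    """Map ReLU-trained ANN weights onto positive/negative device conductances.

    W            : trained weight matrix from the ANN layer
    g_min, g_max : placeholder programmable conductance window [S]
    """
    scale = (g_max - g_min) / np.abs(W).max()     # normalize to the memory window
    g_pos = g_min + np.clip(W, 0, None) * scale   # positive weights -> G+ devices
    g_neg = g_min + np.clip(-W, 0, None) * scale  # negative weights -> G- devices
    return g_pos, g_neg                           # effective weight ~ (G+ - G-)
```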
Architecture of Analog Spiking Neural Networks Hardware
We propose a hardware architecture for analog SNNs using flash memory arrays and I&F neurons, as shown in Figure 2a, which illustrates the main elements for the hardware implementation. The first element is the synapse array that performs multiply-accumulate (MAC) operations using Kirchhoff's current law, as shown in Figure 2b. The synaptic weight in Equation (1) is represented by the conductance of each synaptic device. The conductance of each synaptic device can be fine-tuned using the methodology depicted in Figure S1, Supporting Information. Since the variation of flash memory devices in the array is much smaller than the memory window, multi-level conductance in flash memory devices is reliably implemented. The input voltage spikes in Equation (1) generate currents proportional to the conductance of the synaptic devices. Thus, the weighted-sum term in Equation (1) is calculated by the current sums along the bit-lines (BLs) of the synapse array. The second element is the current mirrors that transfer the current sums to the membrane capacitors of the neuron circuits, where each current sum is converted into the Vmem of Equation (1). Finally, the third element is the I&F neuron circuits, which generate spikes transmitted to the next layer when the Vmem exceeds the threshold voltage, as shown in Equation (2). By combining these elements in series, multi-layer SNNs can be implemented.
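A minimal sketch of the MAC operation described above, assuming a conductance matrix G (one row per bit-line) and binary input spikes scaled by a read voltage; the names and the unit read voltage are illustrative:

```python
import numpy as np

def bitline_current_sums(G, spikes, v_read=1.0):
    """Kirchhoff's-law MAC: each bit-line collects the sum of its device currents,
    I_BL[i] = sum_j G[i, j] * (spikes[j] * v_read).

    G      : (n_bitlines, n_wordlines) synaptic conductances [S]
    spikes : (n_wordlines,) 0/1 input spikes at this timestep
    v_read : spike amplitude applied to the word-lines [V]
    """
    return G @ (spikes * v_read)  # bit-line currents [A]
```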
[IMAGE OMITTED. SEE PDF]
Figure 2c depicts the circuit diagram of the current mirror and I&F neuron circuit along with an example of the changes in Vmem and output spikes. Here, we adopt a complementary metal-oxide-semiconductor (CMOS) I&F neuron circuit previously reported by our group.[8] The mechanism of the I&F neuron circuit is as follows. The current sum copied by the current mirror flows into the membrane capacitor of the neuron circuit, changing the gate voltage of the M1 transistor. Until the threshold voltage is reached, NODE 1 is held at a high voltage by M10. Upon reaching the threshold voltage of the neuron circuit, NODE 1 is pulled down by M1, and the inverter consisting of M5 and M6 switches the output of the neuron circuit to the high-voltage state. M3 and M4 assist M1 in pulling NODE 1 down to a lower voltage so that M5 can pull the output node fully up to the high voltage when the neuron fires. With the output node pulled up to the high voltage, M2 and the inverter comprising M7 and M8 are triggered, causing NODE 2 to enter a low-voltage state while M2 simultaneously discharges the Vmem. M9 then drives NODE 1 back to the high-voltage state, triggering the inverter consisting of M5 and M6 and returning the output of the neuron circuit to the low-voltage state. Note that this cyclic process does not require synchronized clock or control signals, indicating that event-driven analog SNNs can be implemented with these I&F neuron circuits.[18]
SNNSim
We introduce SNNSim, developed in Python, which estimates the area, accuracy, latency, and power consumption of analog SNNs for inference tasks. Unlike previous SNN simulators that estimate performance only in the digital domain without considering the hardware, SNNSim estimates performance in the analog domain, covering synapse arrays, current mirrors, and I&F neuron circuits. Figure 3 illustrates the performance estimation flowchart of SNNSim. In SNNSim, the estimation is performed by calculating the currents of all synaptic devices and the resulting updates in Vmem to determine spike generation. This process is conducted layer-wise from the input to the output layer in each timestep until a neuron in the output layer has fired a predetermined number of times. The remainder of this section discusses how SNNSim calculates area, accuracy, latency, and power consumption.
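The overall flow can be sketched as below; this is a minimal illustration of the flowchart, not the SNNSim implementation itself, and the parameter names and default values (timestep, decision spike count) are assumptions:

```python
import numpy as np

def infer(layers, input_spikes, c_mem, v_th, mirror_gain=1.0,
          t_step=1e-6, spikes_to_decide=8, max_steps=10_000):
    """Per timestep, propagate layer by layer, integrate mirrored bit-line
    currents onto Cmem, fire and reset on threshold crossings, and stop once an
    output neuron has emitted a predetermined number of spikes.

    layers       : list of (n_out, n_in) conductance matrices [S]
    input_spikes : callable t -> (n_in,) spike amplitudes at timestep t [V]
    """
    v_mem = [np.zeros(G.shape[0]) for G in layers]
    out_counts = np.zeros(layers[-1].shape[0])
    for t in range(max_steps):
        spikes = input_spikes(t)
        for l, G in enumerate(layers):
            i_sum = mirror_gain * (G @ spikes)      # copied bit-line currents [A]
            v_mem[l] += i_sum * t_step / c_mem      # Eq. (1): charge integration
            fired = v_mem[l] >= v_th                # Eq. (2): threshold crossing
            v_mem[l][fired] = 0.0                   # Eq. (3): reset after firing
            spikes = fired.astype(float)            # unit-amplitude spikes to next layer
        out_counts += spikes
        if out_counts.max() >= spikes_to_decide:    # inference finished
            return int(np.argmax(out_counts)), (t + 1) * t_step
    return int(np.argmax(out_counts)), max_steps * t_step
```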
[IMAGE OMITTED. SEE PDF]
Area
In SNNSim, we estimate the area of the analog SNN by calculating the area of each component: synapse arrays, current mirrors, and analog I&F neuron circuits. The area of the synapse array can be calculated by multiplying the number of devices required for each synapse array by the area per synaptic device cell. In this article, we adopt flash memory cells for the synapse array, as mentioned in the previous section. Flash memory cells in the AND-type synapse array occupy 6F² per cell.[19]
For the current mirrors and neuron circuits, the total area can be estimated by determining the layout of the transistors and capacitors. The layout follows the CMOS process design rules, the general rules that determine the size and spacing of transistors. For example, we have drawn the mask layout of the neuron circuits according to the CMOS process design rules, as shown in Figure S2, Supporting Information. The area of the capacitors can be calculated from the parallel-plate capacitance equation: dividing the permittivity of the capacitor dielectric, such as silicon dioxide or a high-k material, by the distance between the capacitor plates gives the capacitance per unit area, from which the area required for a given capacitance follows directly. By summing the areas of all elements, SNNSim obtains the total area of the system.
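The two area contributions described above can be sketched as follows; the 6F² cell factor comes from the text, while the SiO2 permittivity, the assumed dielectric thickness, and the parallel-plate approximation are our assumptions:

```python
def synapse_array_area(n_cells, feature_size_m):
    """AND-type flash synapse array: 6 F^2 per cell."""
    return 6 * n_cells * feature_size_m ** 2             # [m^2]

def capacitor_area(c_farad, t_dielectric_m, eps_r=3.9):
    """Parallel-plate estimate A = C * d / (eps_r * eps_0); eps_r = 3.9 assumes SiO2."""
    eps_0 = 8.854e-12                                     # vacuum permittivity [F/m]
    return c_farad * t_dielectric_m / (eps_r * eps_0)     # [m^2]

# e.g., a 0.1 pF SiO2 capacitor with an assumed ~10 nm dielectric needs roughly
# 30 um^2, consistent with the estimate quoted later in this article.
```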
Accuracy
SNNSim follows the flowchart in Figure 3 until any single neuron in the output layer generates a predetermined number of spikes, after which it returns the output based on the index of that neuron. Two problematic situations can arise: either no neuron produces the required number of spikes, or multiple neurons generate the predetermined number of spikes simultaneously. When no neuron generates enough spikes, the analog SNN waits indefinitely for the specified number of spikes to be generated. Conversely, when multiple neurons produce their last spike together, the analog SNN cannot provide a unique answer unless an additional criterion is used to determine the correct response. To address these problems, SNNSim refers to the Vmem of the neurons in the output layer: when multiple candidate answers arise, it selects the index of the output neuron with the highest Vmem. This solution is reasonable because the spike frequency is proportional to the rate of increase of the Vmem.[20]
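A minimal sketch of this decision rule, assuming the output-layer spike counts and Vmem values are available; treating the no-finisher case (e.g., after a timeout) with the same Vmem fallback is our assumption:

```python
import numpy as np

def decide_output(spike_counts, v_mem, target_spikes):
    """Return the classified index: the unique neuron reaching the target spike
    count, otherwise the highest-Vmem neuron among the tied (or all) neurons."""
    finished = np.flatnonzero(spike_counts >= target_spikes)
    if len(finished) == 1:
        return int(finished[0])                                # unambiguous winner
    candidates = finished if len(finished) > 1 else np.arange(len(v_mem))
    return int(candidates[np.argmax(v_mem[candidates])])       # fall back to Vmem
```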
Latency
The latency of the system is the time required to perform an inference, i.e., the time required for an output neuron to generate the predetermined number of spikes. The latency of the analog SNN is calculated as the average number of timesteps multiplied by the predefined timestep unit. Although the analog SNNs are event-driven, the minimum period of the input spike trains serves as the timestep. For instance, if the simulation takes eight timesteps for the inference and each timestep is defined as 1 μs, the latency is 8 μs.
Power Consumption
SNNSim estimates the power consumption of analog SNNs by summing the contributions of the synapse arrays, current mirrors, and I&F neuron circuits. Power consumption is obtained by dividing energy consumption by latency. Energy is consumed by the current flowing into the circuits from the voltage source; by integrating this current and multiplying the result by the supply voltage, we can calculate the energy consumption of the electronic circuits.
Synapse arrays generate currents when voltage pulses are applied to their word-lines (WLs). These currents are provided by the voltage sources connected to the current mirrors, which generate the copied currents. The integration of the currents is performed by multiplying the currents, determined by the conductance of the synaptic devices, by the applied voltage pulse width. By additionally accounting for the scale of the copied currents generated by the current mirrors, SNNSim obtains the energy consumption of the synapse arrays and current mirrors.
I&F neuron circuits consume energy through dynamic switching (dynamic energy) and leakage (static energy). Dynamic energy dominates in circuits that operate with fast voltage switching; in such circuits, the total charge drawn from the supply can be calculated by multiplying the node capacitance by the voltage swing.
In I&F neuron circuits, however, the relatively slow switching of the transistors results in current flowing directly from the voltage source to ground, and thus in static energy consumption.[21–25] In the circuit diagram illustrated in Figure 2c, M1 and M10 generate a large leakage current in the analog SNN when the Vmem rises. By modeling this leakage current as a function of the Vmem, SNNSim can estimate the energy consumed by the leakage current.
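The three energy contributions described in this subsection can be sketched as below (power then follows as total energy divided by latency); the function names and the multiplicative form of each term are illustrative assumptions:

```python
def synapse_energy(i_bitline_sums, t_pulse, v_dd, mirror_gain=1.0):
    """Synapse array + current mirror energy: integrate the drawn currents over
    the pulse width (original branch plus the mirrored copy) times the supply."""
    total_current = sum(i_bitline_sums) * (1.0 + mirror_gain)
    return total_current * t_pulse * v_dd                                  # [J]

def neuron_dynamic_energy(node_caps, v_swing, n_events):
    """Dynamic energy: charge C * dV drawn from the supply at every fast
    switching event of an internal node (~ C * V^2 per event)."""
    return sum(node_caps) * v_swing * v_swing * n_events                   # [J]

def neuron_static_energy(leakage_of_vmem, v_mem_trace, t_step, v_dd):
    """Static energy: leakage current modeled as a function of Vmem (M1/M10 path),
    integrated over the simulated Vmem trajectory."""
    return sum(leakage_of_vmem(v) for v in v_mem_trace) * t_step * v_dd   # [J]

# total power ~ (synapse + dynamic + static energy) / latency
```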
Validation
In this section, we verify the performance estimation of the developed SNNSim. Building upon our preceding research,[8] we fabricated a single-layer analog SNN with CMOS I&F neuron circuits integrated with a charge-trapping flash array. We first fit the electrical characteristics of the fabricated devices with an existing metal-oxide-semiconductor field-effect transistor (MOSFET) model in the SPICE simulation. Then, we predict the spike behaviors of the neurons in the SPICE simulation and compare them with the spike behaviors measured in the fabricated analog SNN. Thereafter, we upscale the network size in the SPICE simulation to compare the estimated performance with that of SNNSim. This process validates that SNNSim accurately estimates the performance of large-scale analog SNNs based on SPICE simulations that cover the hardware-implemented analog SNNs. The details of this process are discussed in the following subsections.
Modeling of the Fabricated Analog SNN with an AND-Type Flash Array in SNNSim
The electrical properties of the fabricated devices can be fitted by tuning the parameters of the BSIM-SOI model in the SPICE simulation, such as mobility, threshold voltage, subthreshold swing, mobility degradation, oxide thickness, and doping concentration. The fitting results of the devices are represented in Figure S3, Supporting Information. As shown in Figure S3, Supporting Information, the electrical properties of the fabricated devices, including n-channel MOSFET, p-channel MOSFET, and flash memory devices, are successfully fitted by the BSIM-SOI model in the SPICE simulation. Based on the fitting results, we design and simulate the analog SNN in the SPICE simulation with identical array architecture and neuron circuits to the fabricated SNN, which consists of a 25 × 4 AND-type flash array connected to four I&F neuron circuits. Figure 4a illustrates the spike behaviors of neurons exhibited in the fabricated analog SNN and the SPICE simulation. This figure indicates that the simulated spike behaviors, including firing timing and spike frequency, accurately match the behaviors of fabricated neurons. The comparison of the spike frequencies measured in the fabricated neurons and emulated in the SPICE simulation is represented in Figure 4b. The transient spike behaviors of the data represented in Figure 4b are shown in Figure S4, Supporting Information.
[IMAGE OMITTED. SEE PDF]
We also investigate the spike behaviors in SNNSim, as shown in Figure 4a. The firing timings of the spikes in SNNSim match those in both the fabricated analog SNN and the analog SNN simulated in SPICE. Figure 4c shows the spike rate in the SPICE simulation versus that in SNNSim, indicating that the rates closely match under various conditions of pulse and membrane capacitance. Although SNNSim cannot reproduce the spike shape precisely because the spikes are binarized in Python for simplicity, it accurately predicts the spike timing in single-layer analog SNNs. Note that the latency and accuracy of analog SNNs are largely determined by the transient spike behaviors of the I&F neurons.
SNNSim Validation in Multi-Layer Perceptron
In addition to the single-layer analog SNN, we validate the estimation of SNNSim in a multi-layer neural network by comparing it with the SPICE simulation. First, we train a network of size 324-40-10 using the ReLU activation function with ternary weights.[26] Then, we construct the network in both the SPICE simulation and SNNSim with the I&F neurons and the trained weights. Subsequently, we compare the latency and power consumption estimated by the SPICE simulation with those estimated by SNNSim. Figure 4d,e show the comparison of the latency and power consumption, respectively. The latency and power consumption are estimated using ten digit images from the Modified National Institute of Standards and Technology (MNIST) test dataset. Given the matched estimations in Figure 4d,e, the validation of SNNSim extends to the MLP, in which a large number of neurons and synaptic devices interact with each other. Figure 4f compares the classified digits from the SPICE simulation and SNNSim.
Optimization of Analog SNN Using SNNSim
Optimization of Analog SNN in MLP
In this section, we employ the validated SNNSim to optimize analog SNNs in various respects, such as area, accuracy, latency, power consumption, and conductance variation tolerance, within a two-layer fully connected network (an MLP) for the MNIST classification task. The weights of this two-layer SNN are derived from an ANN through ANN-to-SNN conversion. We optimize the analog SNNs under various conditions, including membrane capacitance (Cmem), the number of hidden neurons, and technology nodes. A 0.5 μm CMOS technology node (the same node as our fabricated analog SNNs) as well as more advanced technology nodes are investigated for the analog SNN designs.
Figure 5a presents the network structure of the MLP, where the numbers of neurons in the input and output layers are fixed, while the number of hidden neurons (H) is variable. Figure 5b–e show the optimization results of analog SNNs with the MLP structure, including area, accuracy, latency, and power consumption, as functions of Cmem and H. The total area of the analog SNNs increases exponentially as H increases. In addition, a slight area increase is observed with increasing Cmem, but in the 0.5 μm technology, other components, such as the synapse arrays and I&F neurons, dominate the area. From the accuracy perspective, the accuracy approaches the baseline accuracy (ANN-MLP with an H of 512) at an H of 128. As Cmem increases, the accuracy of the analog SNN improves; however, the effect of Cmem on accuracy is not critical at an H of 128. Note that the latency of analog SNNs is significantly affected by Cmem, because the firing rate of the I&F neurons is determined by Cmem in Equation (2), as shown in Figure 5d. If the hidden neurons fire more spikes, more spikes propagate to the deeper layers, accelerating the firing of the output neurons and reducing the latency. The power consumption of the analog SNNs is shown in Figure 5e. The power consumption increases with increasing H, while it is only slightly affected by Cmem (Figure S5, Supporting Information). Additionally, Figure 5e highlights the superior energy efficiency of analog SNNs compared with other studies focused on CIM. Despite operating on a larger array, the analog SNNs demonstrate dramatically reduced power consumption, much lower than that of CIMs, which consume from a few mW to about a hundred mW.[27–29] The optimal Cmem varies with the input voltage pulse width: when the pulse width increases, the optimal membrane capacitance should increase proportionally. This is because a wider voltage pulse injects a greater amount of synaptic charge within a single period; if the membrane capacitance were to remain constant, doubling the pulse width, for example, would double the membrane voltage change, potentially resulting in information loss. Therefore, to preserve the optimized membrane voltage change, the membrane capacitance must also be doubled. In other words, once the membrane capacitance has been optimized for a given voltage pulse width, the product of the pulse width and the membrane capacitance should remain constant.
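The constant-product relationship can be checked with a one-line estimate of the per-pulse membrane voltage change; the numerical values below are illustrative and not taken from the paper:

```python
def delta_v_mem(i_syn, t_pulse, c_mem):
    """Per-pulse membrane voltage change: dV = I_syn * t_pulse / C_mem."""
    return i_syn * t_pulse / c_mem

# Doubling the pulse width doubles dV unless C_mem doubles with it,
# so t_pulse * C_mem stays constant for an optimized design.
base = delta_v_mem(100e-9, 1e-6, 0.4e-12)   # 100 nA, 1 us, 0.4 pF -> 0.25 V
wide = delta_v_mem(100e-9, 2e-6, 0.8e-12)   # 2 us pulse, 0.8 pF   -> 0.25 V
assert abs(base - wide) < 1e-12
```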
[IMAGE OMITTED. SEE PDF]
Figure 5f depicts the accuracy degradation due to conductance variation. The conductance values transferred from the weights trained in ANNs deviate from their nominal values and exhibit variation in the flash memory devices. Thus, it is important for analog SNNs to be tolerant of conductance variation. We simulate the tolerance of analog SNNs in SNNSim with a Cmem of 0.4 pF and various values of H. The accuracy becomes more tolerant to conductance variation as H increases. In addition to the conductance variation, other configurations, such as the on-off ratio of the synaptic devices and the spike loss rate, can also be considered for accuracy, as shown in Figure S6, Supporting Information. We also investigate the scalability of analog SNNs in Figure 5g. Using the Predictive Technology Model (PTM) library,[30] SNNSim estimates the area and power consumption at the 45 and 22 nm technology nodes. The validation of SNNSim against SPICE simulations using the PTM library is presented in Figure S7, Supporting Information. Theoretically, the transistor area is reduced by a squared factor as the technology node is scaled. However, the capacitor area is not scaled by the same factor, which leads to a large area overhead. Therefore, it is important to decrease the size of the membrane capacitors as the technology node shrinks. For instance, a 0.1 pF capacitor using SiO2 as the dielectric requires a footprint of 30 μm², which is considerably large. To address this problem, the requisite capacitance could be lowered below 0.1 pF by either reducing the on-current of the synaptic devices or reducing the width of the current-copying side of the current mirrors. Substituting SiO2 with high-k dielectric materials, coupled with three-dimensional fabrication of the capacitors, also offers a viable way to reduce the area demanded by the capacitors. Figure 5h presents benchmarking results for various analog SNNs. Trade-offs exist in optimizing area, accuracy, latency, and power consumption. Reducing H decreases the area of analog SNNs but degrades their accuracy and increases their latency. Likewise, decreasing the latency requires a smaller Cmem, which reduces the accuracy. Depending on the goal of the analog SNNs and the design considerations, Cmem and H must be chosen based on the trade-off between area, accuracy, latency, and power consumption.
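How the variation tolerance might be evaluated is sketched below: nominal conductances are perturbed and inference is re-run; the multiplicative Gaussian model and the function name are assumptions rather than SNNSim's documented interface:

```python
import numpy as np

def apply_conductance_variation(G_nominal, sigma_rel, rng=None):
    """Perturb nominal conductances with relative Gaussian variation
    (sigma_rel = 0.1 means 10% standard deviation) and clip at zero,
    since a device conductance cannot be negative."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(loc=1.0, scale=sigma_rel, size=G_nominal.shape)
    return np.clip(G_nominal * noise, 0.0, None)

# Accuracy-vs-variation curves (Figure 5f) would then be obtained by running
# inference on many perturbed copies and averaging the classification accuracy.
```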
Optimization of Analog Spiking CNN
We optimize analog SNNs using SNNSim on deeper and larger networks that classify the CIFAR-10 dataset. The analog SNNs are assumed to be implemented with a 22 nm technology node. The networks are trained using the ANN-to-SNN conversion method, similar to the MLP cases, and their performance is evaluated under various Cmem values and network sizes. We conduct the simulation for the two networks depicted in Figure 6a. The first network consists of six layers in total: four convolutional layers with 3 × 3 filters and two fully connected layers. The second network is a variant of the first in which the second and fourth convolutional layers are replaced with convolutional layers with 1 × 1 filters. In summary, we construct two SNNs through ANN-to-SNN conversion: the first adopts the VGG architecture,[31] and the second adopts MobileNet.[32] We refer to the former as VGGSNN and the latter as MobileSNN. MobileSNN conserves resources by reducing the number of weight parameters and operations; implementing the 1 × 1 filters in the analog SNN reduces the total number of required synaptic devices by approximately one-third. Figure 6b–e show the optimization results of the analog SNNs with VGGSNN and MobileSNN, including area, accuracy, latency, and power consumption, as functions of Cmem. In the 0.5 μm CMOS technology, the area overhead due to the capacitor size was negligible, as shown in the previous subsection. The transistor area shrinks proportionally to the square of the feature size, whereas the capacitor area shrinks only in proportion to the gate-oxide thickness. Since scaling of the gate-oxide thickness is limited compared to the feature size, the area occupied by the capacitors becomes dominant at the 22 nm node. As depicted in Figure 6c, enlarging the Cmem is important for attaining high accuracy. Thus, a careful examination of the area-accuracy trade-off is required when determining the capacitor size in the 22 nm CMOS technology. Reducing the capacitor size increases the firing rate of all neurons, causing information loss across the layers and diminishing the accuracy. Figure 6d depicts the relationship between capacitor size and latency. Similar to the trend observed in the MLP, a larger capacitor size correlates with a reduced spike frequency in the neurons, leading to increased latency. This latency effect is further amplified in networks with more layers, as the effect accumulates through the layers. Figure 6e illustrates the power consumption depending on the membrane capacitance. When Cmem is small, spikes are applied to the synaptic devices frequently across the layers, increasing the power consumption of the synapse arrays. As Cmem increases, spikes fire more sparsely and the current flow in the later layers decreases, reducing the power consumption of the synapse arrays. The power consumption of the I&F neurons is relatively insensitive to Cmem. In MobileSNN, the I&F neurons consume more power than in VGGSNN; this is caused by different patterns of voltage variation in the membrane capacitors, which lead to a larger leakage current flowing into the I&F neurons than in VGGSNN.
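The device-count saving from the 1 × 1 layers can be illustrated with a simple count of synaptic devices per convolutional layer; the channel widths below are hypothetical, so the exact reduction differs from the figure quoted above:

```python
def conv_synapse_count(k, c_in, c_out):
    """Devices for one conv layer mapped onto a synapse array:
    (k * k * c_in) word-lines x c_out bit-lines."""
    return k * k * c_in * c_out

# Hypothetical channel widths, only to show the direction of the saving
vgg_like    = [conv_synapse_count(3, cin, cout)
               for cin, cout in [(3, 64), (64, 64), (64, 128), (128, 128)]]
mobile_like = [conv_synapse_count(3, 3, 64),   conv_synapse_count(1, 64, 64),
               conv_synapse_count(3, 64, 128), conv_synapse_count(1, 128, 128)]
print(sum(mobile_like) / sum(vgg_like))   # < 1: fewer devices with 1 x 1 layers
```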
[IMAGE OMITTED. SEE PDF]
This section illustrates the implementation of large-scale analog SNNs using flash memory devices and I&F neurons while preserving high accuracy and ultra-high energy efficiency. Applying CNN structures to analog SNNs with flash memory arrays and I&F neurons involves certain trade-offs: to minimize area and latency, it is more beneficial to employ 1 × 1 filters, whereas adopting 3 × 3 filters and a larger Cmem achieves higher accuracy at the cost of area overhead and latency.
Conclusion
In this work, we have introduced SNNSim, a tool designed to explore the performance metrics of large-scale, hardware-level analog SNNs using flash memory arrays and I&F neuron circuits. These metrics include area, accuracy, latency, and power consumption. Notably, SNNSim exhibits exceptional efficiency: it can process the entire 10 000-image MNIST test dataset in just a few seconds, whereas SPICE simulations require hours to simulate a single MNIST test image. A comparison of SNNSim with previously developed simulators is given in Table 1. Our findings indicate that analog SNNs consisting of flash memory arrays and I&F neurons deliver highly energy-efficient performance while maintaining accuracy comparable to that of ANNs. Historically, research on SNNs has primarily focused on individual components such as synaptic memory devices, neuron circuits, and network structures. Our work bridges these components by demonstrating the competitive edge of analog SNNs in neural network operation. We also highlight the potential of deep analog SNNs for edge-device computing. Moreover, our developed tool, SNNSim, can serve as a valuable resource for benchmarking or verifying the performance of other analog SNNs prior to their silicon implementation. SNNs using other encoding schemes, such as temporal, phase, and burst coding, will be further investigated with algorithm-hardware co-optimization.[33] This will provide a wider view of the performance of analog SNNs under different conditions.
Table 1 Comparison of SNNSim with previous simulators

|          | SpikeSim[7] | NeuroSim[15] | RxNN[34]    | SNNSim |
|----------|-------------|--------------|-------------|--------|
| Neuron   | Digital     | Digital      | Digital     | Analog |
| Network  | SNN         | ANN          | ANN         | SNN    |
| Language | Python      | C++          | Python, C++ | Python |
Acknowledgements
J.K. and D.K. contributed equally to this work. This work was supported in part by the National Research Foundation (NRF) funded by the Korean Ministry of Science and ICT under Grant 2022M3I7A2085479; in part by the BK21 FOUR Program of the Education and Research Program for Future ICT Pioneers, Seoul National University, in 2023; and in part by National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (MSIT) (RS-2023-00258527). [Correction added on 22 January 2024 after online publication: Author name is updated in this version.]
Conflict of Interest
The authors declare no conflict of interest.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Y. LeCun, Y. Bengio, G. Hinton, Nature 2015, 521, 436.
A. Krizhevsky, I. Sutskever, G. E. Hinton, Commun. ACM 2017, 60, 84.
A. Graves, A. Mohamed, G. Hinton, in 2013 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Vancouver, May 2013, p. 6645.
D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, D. Hassabis, Nature 2016, 529, 484.
F. Akopyan, J. Sawada, A. Cassidy, R. Alvarez‐Icaza, J. Arthur, P. Merolla, N. Imam, Y. Nakamura, P. Datta, G. Nam, B. Taba, M. Beakes, B. Brezzo, J. B. Kuang, R. Manohar, W. P. Risk, B. Jackson, D. S. Modha, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2015, 34, 1537.
M. Davies, N. Srinivasa, T. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain, Y. Liao, C. Lin, A. Lines, R. Liu, D. Mathaikutty, S. McCoy, A. Paul, J. Tse, G. Venkataramanan, Y. Weng, A. Wild, Y. Yang, H. Wang, IEEE Micro 2018, 38, 82.
A. Moitra, A. Bhattacharjee, R. Kuang, G. Krishnan, Y. Cao, P. Panda, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2023, 42, 3815.
K. Lee, D. Kwon, S. Woo, J. Ko, W. Choi, B. Park, J. Lee, IEEE Trans. Electron Devices 2022, 69, 6065.
J. Han, M. Kang, J. Jeong, I. Cho, J. Yu, K. Yoon, I. Park, Y. Choi, Adv. Sci. 2022, 9, 2106017.
H. Kim, S. Hwang, J. Park, S. Yun, J. Lee, B. Park, IEEE Electron Device Lett. 2018, 39, 630.
B. Yin, F. Corradi, S. Bohte, Nat. Mach. Intell. 2021, 3, 905.
D. Zhao, Y. Zeng, T. Zhang, M. Shi, F. Zhao, Front. Neurosci. 2020, 14, 1.
J. Kim, H. Kim, S. Huh, J. Lee, K. Choi, Neurocomputing 2018, 311, 373.
A. Sengupta, Y. Ye, R. Wang, C. Liu, K. Roy, Front. Neurosci. 2019, 13, 1.
P. Chen, X. Peng, S. Yu, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2018, 37, 3067.
T. Masquelier, S. Thorpe, PLoS Comput. Biol. 2007, 3, 31.
B. Rueckauer, I. Lungu, Y. Hu, M. Pfeiffer, S. Liu, Front. Neurosci. 2017, 11, 1.
S. Oh, D. Kwon, G. Yeom, W. Kang, S. Lee, S. Woo, J. Kim, J. Lee, IEEE Access 2022, 10, 24444.
S. Lee, H. Kim, S. Lee, B. Park, J. Lee, IEEE Electron Device Lett. 2022, 43, 142.
S. Kheradpisheh, T. Masquelier, Int. J. Neural Syst. 2020, 30, 1.
G. Indiveri, B. Linares‐Barranco, T. Hamilton, A. Schaik, R. Etienne‐Cummings, T. Delbruck, S. Liu, P. Dudek, P. Hafliger, S. Renaud, J. Schemmel, G. Cauwenberghs, J. Arthur, K. Hynna, F. Folowosele, S. Saighi, T. Serrano‐Gotarredona, J. Wijekoon, Y. Wang, K. Boahen, Front. Neurosci. 2011, 5, 1.
D. Kadetotad, Z. Xu, A. Mohanty, P. Chen, B. Lin, J. Ye, S. Vrudhula, S. Yu, Y. Cao, J. Seo, IEEE J. Emerging Sel. Top. Power Electron. 2015, 5, 194.
F. Danneville, C. Loyez, K. Carpentier, I. Sourikopoulos, E. Mercier, A. Cappy, Solid State Electron. 2019, 153, 88.
P. Gibertini, L. Fehlings, S. Lancaster, Q. Duong, T. Mikolajick, C. Dubourdieu, S. Slesazeck, E. Covi, V. Deshpande, in Int. Conf. Electronics and Communication System, Glasgow 2022, p. 1.
C. Sun, X. Wang, H. Xu, J. Zhang, Z. Zheng, Q. Kong, Y. Kang, K. Han, L. Jiao, Z. Zhou, Y. Chen, D. Zhang, G. Liu, L. Liu, X. Gong, in IEEE Int. Electron Devices Meeting, IEEE, San Francisco 2022, p. 2.1.1.
D. Kwon, M. Park, W. Kang, J. Hwang, R. Koo, J. Bae, J. Lee, IEEE Trans. Electron Devices 2023, 70, 4206.
S. Yin, X. Sun, S. Yu, J. S. Seo, IEEE Trans. Electron Devices 2020, 67, 4185.
P. Yao, H. Wu, B. Gao, J. Tang, Q. Zhang, W. Zhang, J. J. Yang, H. Qian, Nature 2020, 577, 641.
P. Narayanan, S. Ambrogio, A. Okazaki, K. Hosokawa, H. Tsai, A. Nomura, T. Yasuda, C. Mackin, S. C. Lewis, A. Friz, M. Ishii, Y. Kohda, H. Mori, K. Spoon, R. Khaddam‐Aljameh, N. Saulnier, M. Bergendahl, J. Demarest, K. W. Brew, V. Chan, S. Choi, I. Ok, I. Ahsan, F. L. Lie, W. Haensch, V. Narayanan, G. W. Burr, IEEE Trans. Electron Devices 2021, 68, 6629.
ASU Predictive Technology Model (PTM), http://ptm.asu.edu/ (accessed: August 2022).
K. Simonyan, A. Zisserman, in Int. Conf. Learning Represent, San Diego 2015.
B. Rueckauer, I. Lungu, Y. Hu, M. Pfeiffer, in Proc. of NIPS, Barcelona 2016.
W. Guo, M. Fouda, A. Eltawil, K. Salama, Front. Neurosci. 2021, 15, 212.
S. Jain, A. Sengupta, K. Roy, A. Raghunathan, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2021, 40, 2.
Abstract
Spiking neural networks (SNNs) have emerged as a novel approach for reducing computational costs by mimicking the biologically plausible operations of neurons and synapses. In this article, large‐scale analog SNNs are investigated and optimized at the hardware‐level by using SNNSim, the novel simulator for SNNs that employ analog synaptic devices and integrate‐and‐fire (I&F) neuron circuits. SNNSim is a reconfigurable simulator that accurately and very quickly models the behavior of the user‐defined device characteristics and returns key metrics such as area, accuracy, latency, and power consumption as output. Notably, SNNSim exhibits exceptional efficiency, as it can process the entire 10 000 Modified National Institute of Standards and Technology (MNIST) test dataset in a few seconds, whereas SPICE simulations require hours to simulate a single MNIST test data. Using SNNSim, the conversion of artificial neural networks (ANNs) to SNNs is simulated and the performance of the large‐scale analog SNNs is optimized. The results enable the design of accurate, high‐speed, and low‐power operation of large‐scale SNNs. SNNSim code is now available at https://github.com/SMDLGITHUB/SNNSim.