1. Introduction
Deep neural networks (DNNs) have shown outstanding performance, surpassing human-level accuracy in many applications such as image processing, voice recognition, and language translation. However, deploying DNNs in resource-constrained edge devices remains challenging for several reasons. DNNs typically require a large number of parameters, leading to substantial memory demands, which are difficult to accommodate in embedded systems. Moreover, the computational demands of DNNs result in high energy dissipation, presenting a major obstacle for edge devices.
To address these issues, computation in memory (CIM) has emerged as a promising paradigm, reducing energy dissipation by storing model parameters in memory and performing computations directly within the memory, thus minimizing energy-intensive data transfers. Prior works have demonstrated notable progress in this area. For example, NeuRRAM, presented in [1], demonstrates a versatile RRAM-based CIM chip that supports multiple model architectures and computational bit precisions. Similarly, the work in [2] focuses on high-precision floating point (FP16 and BF16) computations, proposing an ReRAM-based CIM macro that delivers high throughput and energy efficiency for artificial intelligence (AI) edge devices. Other studies, such as [3,4], investigate multi-bit precision techniques in analog CIM systems by utilizing phase-change memory (PCM). These works leverage the analog computing paradigm to perform multiplications and accumulations within memory. For instance, Ref. [3] introduces a signed multiply-and-accumulation (MAC) feature in embedded PCM arrays, while Ref. [4] develops a multi-bit precision core based on backend-integrated multi-level PCM.
To further enhance energy efficiency and reduce memory demands, binary neural networks (BNNs) have emerged as a promising solution. BNNs binarize weights and activations, significantly reducing model size and enabling efficient deployment on smaller embedded memories. This approach drastically lowers energy consumption without significantly compromising accuracy [5,6,7]. The energy efficiency of BNNs can be further enhanced by directly executing computations in embedded memories such as SRAM, e-FLASH, and STT-MRAM, through CIM techniques [8,9,10,11,12,13,14,15].
In practice, however, process variation significantly degrades the accuracy of BNNs operating on analog CIM platforms. Process variation arises from manufacturing imperfections in semiconductor fabrication, leading to deviations in device parameters such as threshold voltage, channel length, and oxide thickness [16]. These variations affect the behavior of transistors and other components, introducing inaccuracies in analog computations. The low-resolution weights of BNNs make them particularly sensitive to these variations, increasing the likelihood of computation errors. This strongly motivates the development of techniques that alleviate the impact of process variations and maintain the accuracy of BNNs on such platforms.
This work introduces a variation-aware BNN framework designed to ensure accurate analog CIM operations despite process variations in scaled technologies, using the 6T-SRAM bit-cell [9] from Table 1 as an example. Recently, many emerging non-volatile memory (eNVM)-based CIMs have garnered interest due to their high density and low standby power [11,12,17,18,19,20]. However, eNVM-based CIMs face challenges in manufacturing actual hardware, whereas SRAM offers advantages from a design perspective, and thus, plays a dominant role in CIM design. In light of this, we develop the variation-aware BNN framework on an SRAM-based CIM. Nonetheless, the developed framework can be readily extended to CIMs utilizing other memory types and SRAM bit-cell configurations.
Prior work, such as [21], has addressed variation-aware training for memristor-based CIM crossbars, but that approach does not extend to SRAM-based BNNs. In contrast, we develop more realistic models for weight and activation variations through Monte Carlo simulations. Additionally, we optimize the biasing potentials of the word lines and bit lines in SRAM-based CIM circuits, resulting in significant accuracy improvements for more complex BNN models, such as RESNET-18 [22] and VGG-9 [23], when evaluated on the CIFAR-10 dataset [24]. Table 2 summarizes our work and compares it with previous studies on SRAM-based BNN CIM systems. Unlike prior approaches such as [14,15], which introduce additional hardware to address process and temperature variations, our method relies on software-based techniques, avoiding any hardware overhead. Although the variation-aware training process requires multiple training iterations, the overhead is minimal since the trained weights can be reused once training is completed.
The contributions of this paper can be summarized as follows.
- We develop mathematical models to quantify the impact of process variations on SRAM-based BNN CIM circuits. In these circuits, the current of an SRAM cell represents the multiplication result of the stored weight and the input activation. However, parametric process variations cause fluctuations in the SRAM cell current, directly affecting the accuracy of analog computations and, consequently, the BNN inference accuracy. Our model interprets these fluctuations as variations in the weights of the BNN. To model these weight variations, we utilize the distribution of SRAM cell currents obtained through Monte Carlo (MC) simulations in 28 nm FD-SOI technology. Consequently, our method is applicable to SRAM-CIM circuits employing current-based analog computation.
- Based on the derived model, we present a variation-aware BNN training framework that produces variation-resilient training results. During training, BNNs are treated as bi-polar neural networks due to the aforementioned weight variations. We demonstrate the efficacy of the developed framework through extensive simulations.
- We optimize the biasing voltages of the word lines (WLs) and bit lines (BLs) of the SRAM to achieve a balance between maintaining acceptable accuracy and minimizing power consumption.
The remaining part of this paper is organized as follows. In Section 2, we explain the background regarding BNN, the architecture of SRAM-based CIM, how DNNs can be mapped onto SRAM-based CIM arrays, and in-memory batch normalization. In Section 3, we present the variation-aware framework and optimization methodology for biasing voltages of WLs and BLs of SRAM. Section 4 validates the efficacy of our framework. Lastly, we conclude the paper in Section 5.
2. Preliminaries
2.1. Binary Neural Network
In a BNN, all weights and activations are binarized, significantly enhancing the DNN inference energy efficiency. Many researchers have shown that despite such a low-precision format, BNNs deliver good inference accuracy [5,6,7]. The first BNN introduced [5] used the sign function for the binarization of both weight and activation, where all weights and activations become ‘+1’ or ‘−1’. However, some state-of-the-art (SOTA) works have improved the accuracy of BNNs by using the activations of ‘0’ or ‘1’ [7] while still employing the sign function for the binarization of weights. Considering such a trend, we use the following activation function:
$$a = \begin{cases} 1, & x \ge x_{th} \\ 0, & x < x_{th} \end{cases} \qquad (1)$$
where $x_{th}$ is the activation threshold. In our experiments, $x_{th}$ is assumed to be 0.5; the results, shown in Table 3, follow a similar trend to the SOTA works [7]. Since we utilize the activation function in (1), the activations take values of '0' or '1'.
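As a small illustration, a minimal NumPy sketch of the activation in (1) is given below; the threshold value of 0.5 matches the experimental setting, while the function name and sample inputs are ours.

```python
import numpy as np

def binary_activation(x, x_th=0.5):
    """Binarize pre-activations to {0, 1} as in Equation (1)."""
    return (x >= x_th).astype(np.float32)

# Example pre-activations from a weighted-sum layer
x = np.array([-0.3, 0.2, 0.5, 1.7])
print(binary_activation(x))  # [0. 0. 1. 1.]
```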
2.2. The Architecture of SRAM-Based CIM
Figure 1 shows the most widely used 6T-SRAM-based CIM architecture and cell configuration for BNN computation, which follows the design of Rui Liu et al. [9]. We consider this architecture for our proposed BNN framework, discussed in Section 3.
In Figure 1, the weights of the BNN are stored in 6T-SRAM cells, and the bitwise multiplications between weights and input activations of the network are computed directly in an analog fashion inside the SRAM array. Let us assume that the weight '+1' is stored as Q = 1, QB = 0, and the weight '−1' as the inverted cell state. When a BNN is operated on this configuration, the input activations of a given layer become the digital values of the WLs, since the activation function of (1) is used, as mentioned in Section 2.1. During inference, all WLs are biased according to the input activations, and the product of a weight and an activation becomes the difference between the cell currents flowing into BL and BLB, $I_{BL,i} - I_{BLB,i}$ in Figure 1. All cell currents are accumulated on BL and BLB, so $I_{BL} - I_{BLB}$ becomes the multiply-and-accumulate (MAC) output. $I_{BL} - I_{BLB}$ is sensed by the differential current sense amplifier (CSA), producing the binary activation output based on (1). Note that when $x_{th}$ of (1) is not zero, additional circuitry is necessary to implement the threshold. Furthermore, batch normalization must be implemented properly. Both are embedded into the sense amplifier shown in Figure 2, discussed in Section 2.4.
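To make the data flow concrete, the following NumPy sketch gives a simplified behavioral model of one column: a cell storing '+1' is assumed to steer its read current to BL and a cell storing '−1' to BLB whenever its WL is raised; the actual current paths depend on the bit-cell design in [9], and the nominal margin value is illustrative.

```python
import numpy as np

DELTA_I = 1e-6  # assumed nominal per-cell current margin (A)

def column_mac(weights, activations, delta_i=DELTA_I):
    """Ideal current-domain MAC of one SRAM column (no process variation).

    weights: array of +1/-1 values stored in the column's cells
    activations: array of 1/0 word-line states
    Returns I_BL - I_BLB, which the differential CSA then thresholds.
    """
    # Cells storing +1 contribute to the BL current, cells storing -1 to the
    # BLB current, but only when their word line is ON (activation = 1).
    i_bl = np.sum((weights == +1) * activations) * delta_i
    i_blb = np.sum((weights == -1) * activations) * delta_i
    return i_bl - i_blb

w = np.array([+1, -1, +1, +1])
a = np.array([1, 1, 0, 1])
print(column_mac(w, a) / DELTA_I)  # 1.0 = sum of w*a over the active rows
```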
2.3. Mapping DNNs onto SRAM-Based CIM Arrays
2.3.1. Input Splitting
In the CIM architecture of Figure 1, we store weights in the SRAM and control the WL potentials according to the input activations. The SRAM then directly computes the matrix multiplications of the convolutional and fully connected (FC) layers using analog computing techniques. In such a scheme, the maximum matrix size that the SRAM can calculate at once depends on the SRAM array size, which is bounded by physical design constraints. Unfortunately, the computed matrix size often exceeds the SRAM array size. Figure 3 illustrates this situation. Some convolution layers have 4-dimensional weights, and a convolution layer with 4-dimensional weights can be regarded as a 2-dimensional matrix with the size of (kernel size × kernel size × input channel size) × (output channel size). For instance, in Figure 3, the 4-dimensional convolutional layer whose kernel, input channel, and output channel sizes are 3, 128, and 256, respectively, is considered a 1152 × 256 matrix. To compute this matrix on the circuit of Figure 1, we need 1152 memory rows. Since it is challenging to implement an SRAM array with 1152 rows, we need to split the matrix properly by considering the SRAM array size. Under such a circumstance, an SRAM CIM circuit handles one split part of the matrix and produces the corresponding partial sum. All partial sums delivered by the SRAM CIM circuits must then be accumulated to complete the matrix computation. SOTA works have shown that the precision of the partial sums significantly affects the accuracy of the computed BNNs [9,17,20,26]. To obtain multi-bit partial sums in the SRAM CIM circuits, we would need analog-to-digital converters (ADCs) to produce multi-bit outputs, incurring large area and energy overheads.
To address this problem, the authors of [26] developed an input splitting technique, which we employ in this work. A large convolutional or FC layer is reconstructed into several smaller groups, as shown in Figure 3, whose input count is smaller than or equal to the number of rows in an SRAM array. Hence, the SRAM array of Figure 1 computes the weighted sums of each group, and the CSAs produce their own 1-bit outputs. The outputs of all groups are then merged to fit the input size of the following BNN layer, which is performed by digital hardware to obtain accurate merging without the effect of process variations.
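As a structural illustration of this splitting-and-merging flow, a NumPy sketch follows; the group construction and the concatenation-based merge are our assumptions about one possible realization rather than the exact reconstruction procedure of [26].

```python
import numpy as np

def split_inputs(n_inputs, array_rows):
    """Split an input dimension into groups that each fit the SRAM rows."""
    n_groups = int(np.ceil(n_inputs / array_rows))
    return [n_inputs // n_groups + (1 if i < n_inputs % n_groups else 0)
            for i in range(n_groups)]

def grouped_binary_layer(x, weight_groups, x_th=0.5):
    """Each group computes its own weighted sums and 1-bit outputs (one per
    column); the 1-bit group outputs are then merged (concatenated) digitally."""
    outputs, start = [], 0
    for w in weight_groups:                    # w: (rows_in_group, out_channels), +/-1
        rows = w.shape[0]
        partial = x[start:start + rows] @ w    # analog MAC of one group in hardware
        outputs.append((partial >= x_th).astype(np.float32))  # 1-bit CSA outputs
        start += rows
    return np.concatenate(outputs)             # digital merge, fed to the next layer

print(split_inputs(1152, 128))                 # nine groups of 128 rows each
groups = [np.where(np.random.randn(s, 4) >= 0, 1.0, -1.0)
          for s in split_inputs(256, 128)]
print(grouped_binary_layer(np.random.randn(256), groups).shape)  # (8,)
```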
The accuracy of BNNs employing the input splitting technique is compared with the baseline BNN accuracy in Table 3. The results demonstrate that accuracy improves with increasing SRAM array size across all BNN models and both binary activation schemes. This improvement is attributed to the reduction in the number of groups required for splitting as the array size increases, as shown in Table 4, Table 5, and Table 6 for the CONVNET, RESNET-18, and VGG-9 BNN models, respectively. This trend aligns with observations from previous work [26]. It is noteworthy that the first layer, which processes the input image, and the last layer, which computes class scores, are excluded from the input splitting and binary quantization, and are instead managed by digital hardware, in line with SOTA BNN implementations [5,6,7,26]. The split BNN accuracies presented in Table 3 serve as baselines for evaluating the techniques introduced in this work.
2.3.2. Mapping
We now provide a more detailed discussion of the mapping between the convolutional layers of BNNs and SRAM-based CIM arrays, shown in Figure 3. As aforementioned, convolutional layers are split so that their input size is equal to or less than the number of rows in the SRAM array. As shown in Table 6, for the VGG-9 BNN model with an SRAM array size of 256, the layers with 3 × 3 × 128 weights are split into six groups, with about 21 input channels per group. Hence, the input size of each group is 189 (= 3 × 3 × 21), which is smaller than the number of rows in the SRAM array. Consequently, each group can be regarded as a 2-dimensional matrix with 189 rows and one column per output channel. Under this circumstance, we adopt the mapping strategy in which all weights corresponding to each output channel are stored in one column of the SRAM array. The outputs of each group, which are binary ('0' or '1'), are obtained from the macros. FC layers can be managed with the same mapping strategy.
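The reshaping behind this mapping can be sketched as below; the function name is ours, and the (k × k × C_in) × C_out convention follows Section 2.3.1.

```python
import numpy as np

def conv_weights_to_matrix(w):
    """Reshape 4-D conv weights (k, k, C_in, C_out) into the
    (k*k*C_in) x C_out matrix whose columns map onto SRAM columns."""
    k1, k2, c_in, c_out = w.shape
    return w.reshape(k1 * k2 * c_in, c_out)

# Binarized 3x3x128x256 layer, as in the example of Figure 3
w = np.where(np.random.randn(3, 3, 128, 256) >= 0, 1.0, -1.0)
mat = conv_weights_to_matrix(w)
print(mat.shape)  # (1152, 256): each of the 256 columns holds one output channel
```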
2.4. In-Memory Batch Normalization
Batch normalization (BN) is a technique used to stabilize the learning process, significantly reducing the number of training epochs. In BNNs, BN plays a crucial role in enhancing accuracy, making it an essential component [27]. BN can be described by the following equation.
$$Y = \gamma \cdot \frac{X - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta \qquad (2)$$
where X and Y denote the input and output activations of BN, respectively. The parameters $\mu$ and $\sigma^2$ represent the mean and variance of the input activations computed across a mini-batch, while $\gamma$ and $\beta$ are learnable parameters corresponding to the scaling and shifting operations. The term $\epsilon$ is a small constant introduced to ensure numerical stability during normalization. During the backward propagation of training, these four parameters are updated and used to normalize the output of the current batch. In inference, these parameters become constants, and hence BN can be regarded as a linear transformation. As shown in Figure 4, in inference, the output of the BN layer becomes the input of the activation function (1). In this work, we merge (1) and (2) into a single thresholding function. For $\gamma > 0$, the merged function can be expressed as
$$a = \begin{cases} 1, & X \ge X_{th} \\ 0, & X < X_{th} \end{cases} \qquad (3)$$
where X is the output of the weighted-sum layer, and
$$X_{th} = \mu + \frac{(x_{th} - \beta)\sqrt{\sigma^2 + \epsilon}}{\gamma} \qquad (4)$$
When $\gamma < 0$, the comparison direction in (3) is reversed.
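Since (3) and (4) above are our reconstruction of the merged function, the following NumPy sketch should be read under that assumption; it checks the merged thresholding against explicit BN followed by (1), and the function names are ours.

```python
import numpy as np

def merged_threshold(mu, var, gamma, beta, x_th=0.5, eps=1e-5):
    """Per-channel threshold X_th of Equation (4), merging BN (2) with (1)."""
    return mu + (x_th - beta) * np.sqrt(var + eps) / gamma

def merged_activation(x, mu, var, gamma, beta, x_th=0.5, eps=1e-5):
    """Equation (3): compare the weighted sum directly against X_th.
    The comparison flips for channels with negative gamma."""
    X_th = merged_threshold(mu, var, gamma, beta, x_th, eps)
    return np.where(gamma > 0, x >= X_th, x <= X_th).astype(np.float32)

# Consistency check: merged form equals explicit BN followed by (1)
x = np.random.randn(4) * 3
mu, var = np.zeros(4), np.ones(4)
gamma, beta = np.array([1.2, -0.7, 0.5, 2.0]), np.random.randn(4)
bn = gamma * (x - mu) / np.sqrt(var + 1e-5) + beta
assert np.array_equal(merged_activation(x, mu, var, gamma, beta),
                      (bn >= 0.5).astype(np.float32))
```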
Most previous works assume that BN is computed in software [8,9,19,20], which requires analog-to-digital converters (ADCs) to convert the accumulated BL currents into high-precision (32-bit floating point) digital values. These digital values are then processed by digital processors, a method that incurs a significant energy overhead, especially for edge devices with strict power constraints. To mitigate this overhead, Ref. [28] proposed implementing BN directly in the hardware using additional cells.
In our approach, we address this problem by embedding the BN functionality into the differential CSA, as shown in Figure 2. Specifically, we merge the activation function and the BN computation by introducing variable current biasing within the CSA. The required biasing currents for the two branches of the CSA are derived from the conversion rule provided in Table 7 and are needed to handle both positive and negative thresholds ($X_{th}$). This approach eliminates the need for energy-intensive high-resolution ADCs.
During the inference phase of a BNN, the BN layer has unique learned parameters for each output channel, corresponding to each column in the SRAM-based CIM array. For each channel, the threshold value $X_{th}$ is calculated based on Equation (4) and can be quantized from a 32-bit floating-point representation to a fixed-point format. In our experiments, $X_{th}$ can be quantized to a 5-bit integer for CONVNET and VGG-9, and a 6-bit integer for RESNET-18, with an accuracy loss of less than 1%. The corresponding bias currents are then generated by a current-steering digital-to-analog converter (DAC), such as [29], to precisely adjust the CSA biasing, thereby enabling BN directly in the analog domain. A high-resolution ADC, by contrast, would typically consume far more power due to its increased complexity and the 32-bit floating-point processing it would require.
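As an illustration of quantizing the per-channel thresholds before they drive the current-steering DAC, a short sketch follows; the signed fixed-point format and the per-layer scale factor are our assumptions, not the paper's exact encoding.

```python
import numpy as np

def quantize_thresholds(x_th, n_bits=5):
    """Quantize FP32 per-channel thresholds X_th to signed n-bit integer codes.

    The integer codes would set the DAC that biases the CSA; the single
    per-layer scale factor is an assumed implementation detail.
    """
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x_th)) / q_max
    codes = np.clip(np.round(x_th / scale), -q_max - 1, q_max).astype(int)
    return codes, scale

x_th = np.random.randn(256) * 8.0                 # per-channel thresholds
codes, scale = quantize_thresholds(x_th, n_bits=5)
print(codes.min(), codes.max())                   # codes lie within [-16, 15]
```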
While the primary focus of this work is on presenting a variation-aware framework, a detailed examination of the CSA operation and the control of variable current biasing is beyond the scope of this study. A more comprehensive discussion on these aspects, including their implications for system performance and energy efficiency, will be addressed in future work.
3. A Variation-Aware Binary Neural Network Framework
3.1. Variation-Aware Models for SRAM-Based BNN CIM
In this section, we present a variation-aware BNN framework to enhance the reliability of CIM under process variations. The framework assumes SRAM-based CIM, the configuration of which is discussed in Section 2.2. To develop such a BNN framework, firstly, variation-aware models are investigated and derived as follows.
In the given configuration (Figure 1), as discussed, the MAC output is defined by $I_{BL} - I_{BLB}$, which is described as
$$I_{BL} - I_{BLB} = \sum_{i} W_i \cdot A_i \cdot \Delta I \qquad (5)$$
where $W_i$ is the weight stored in the SRAM array (i.e., $W_i$ is '+1' or '−1'), $A_i$ is the word line (WL) status (ON or OFF), which corresponds to the activation value ('1' or '0'), and $\Delta I$ is the current margin, that is, the absolute value of the difference between the BL and BLB currents of one cell (i.e., one bitwise multiply operation) when no process variations are assumed. It is important to acknowledge that, in practice, both $W_i$ and $\Delta I$ are subject to process variations, which can be interpreted as variations in the BNN weights $W_i$ of Equation (5). Let us model the weight variation as $\Delta W_i$. Consequently, the product of the weight and the activation, $W_i \cdot A_i$, can be redefined as
$$(W_i + \Delta W_i) \cdot A_i \qquad (6)$$
In this work, we analyze the impact of process variations on both the BL-side and BLB-side cell currents, $I_{BL,i}$ and $I_{BLB,i}$, through 10,000 MC simulations in 28 nm FD-SOI technology. For the 6T-SRAM bit-cell configuration of Figure 1, we considered the stored weight scenario of '+1' (i.e., Q = 1 and QB = 0), as depicted in Figure 5. When the WL is activated (i.e., the input neuron is 1), the $I_{BL,i}$ and $I_{BLB,i}$ cell currents, affected by process variations, exhibit log-normal or normal distributions, characterized by their means and standard deviations, respectively, as illustrated in Figure 5. We can then derive the following equation.
$$W_i + \Delta W_i = \frac{I_{BL,i} - I_{BLB,i}}{\Delta I} \qquad (7)$$
From the distributions of the $I_{BL,i}$ and $I_{BLB,i}$ cell currents obtained through the MC simulations and Equation (7), we derive the resulting weight distribution under process variations, as shown in Figure 6. This analysis demonstrates that, due to process variations in the given SRAM-based CIM configuration, the binary weights (−1/+1) of the BNN are transformed into analog weights ($-1 + \Delta W$ / $+1 + \Delta W$), whose distributions follow log-normal or normal patterns.
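Below is a minimal sketch of how the MC cell-current samples are converted into the analog weight samples of Equation (7); the distribution parameters are placeholders rather than the measured 28 nm FD-SOI values, and $\Delta I$ is the nominal margin.

```python
import numpy as np

rng = np.random.default_rng(0)
N_MC = 10_000
DELTA_I = 1e-6   # nominal current margin (assumed value)

# Placeholder distributions standing in for the HSPICE MC results of a '+1' cell
i_bl = rng.lognormal(mean=np.log(1.05e-6), sigma=0.08, size=N_MC)    # BL-side current
i_blb = np.abs(rng.normal(loc=0.05e-6, scale=0.02e-6, size=N_MC))    # BLB-side leakage

# Equation (7): effective analog weight of a cell nominally storing +1
w_plus = (i_bl - i_blb) / DELTA_I
print(w_plus.mean(), w_plus.std())   # clusters around +1 with a variation-induced spread
```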
3.2. Variation-Aware Framework for Bi-Polar Neural Networks
The discussion of Section 3.1 shows that, under the effect of process variations, each weight stored in the memory array experiences a weight variation $\Delta W$. The weight stored in each SRAM cell is therefore not an exact digital value of +1 or −1 but can be redefined as
$$W^{pol} = \mathrm{Polarize}(W^{b}) = W^{b} + \Delta W_{W^{b}} \qquad (8)$$
where $W^{b}$ is a binarized weight, and $\Delta W_{+1}$ and $\Delta W_{-1}$ are random stochastic parameters expressing the effect of process variations on cells storing '+1' and '−1', respectively, whose distributions are obtained from (7). Our training framework is described in Algorithm 1, where the function of (8) is exploited. In the variation-aware training, we train BNNs based on Algorithm 1 from scratch. Furthermore, when $V_{WL}$ exceeds a certain threshold, some SRAM cells experience flipping (both $I_{BL,i}$ and $I_{BLB,i}$ are flipped) due to process variations. To account for this, Algorithm 1 incorporates a flipping function that flips the binarized weights (the output of the Sign() function) with a specified probability, determined by the number of instances where both $I_{BL,i}$ and $I_{BLB,i}$ are flipped.
Additionally, due to process variations in the CSA, the activation threshold of (1) varies. To address this, Algorithm 1 employs a stochastic activation function, as given by (9), instead of the deterministic activation function described in (1).
$$a = \begin{cases} 1, & x \ge x_{th}^{sto} \\ 0, & x < x_{th}^{sto} \end{cases} \qquad (9)$$
with
$$x_{th}^{sto} \sim \mathcal{N}(x_{th}, \sigma_{th}^2) \qquad (10)$$
where the standard deviation $\sigma_{th}$ of the stochastic threshold is properly assumed. When the training step is completed, only the binarized weights are kept for inference on the SRAM-based CIM. During inference, however, the quantized weights must again be flipped and polarized to assess the impact of process variations.

In Algorithm 1, C is the cost function of the minibatch, $\lambda$ the learning rate decay factor, and L the number of layers. ∘ indicates element-wise multiplication. The function Sign() specifies how the weights are binarized. The StoFlip() function flips the binarized weights with a specified probability p, which is determined by the number of cases where both $I_{BL,i}$ and $I_{BLB,i}$ are flipped (as described in step 4 of Figure 7). The Polarize() function (8) polarizes the binarized weights. The activations are clipped to [0, 1] by the Clip() function. The function StoQuantize() (9) specifies how the variation-aware activations are binarized. BatchNorm() and BackBatchNorm() define how the activations are batch-normalized and back-propagated, respectively. Update() specifies how the parameters are updated when their gradients are known. The straight-through estimator (STE) is used to estimate gradients through (1), as in [5]. The Split() and Merge() functions implement the input splitting and merging steps discussed in Section 2.3.1. $N_{array}$ is the size of the SRAM array, which is set to 128/256/512.
Algorithm 1 Training a reconstructed L-layer BNN with variation-aware weights and activations.
Require: a minibatch of inputs and targets, previous weights W, previous BatchNorm parameters ($\gamma$, $\beta$), weight initialization coefficients from [30], and the previous learning rate $\eta$.
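The stochastic functions referenced in Algorithm 1 can be sketched as follows; the distribution parameters are placeholders for the values extracted in steps 3–5 of Figure 7, and the exact signatures are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def sign_binarize(w):
    """Sign(): binarize real-valued weights to +1/-1."""
    return np.where(w >= 0, 1.0, -1.0)

def sto_flip(w_b, p_flip):
    """StoFlip(): flip binarized weights with probability p_flip, modeling
    cells in which both bit-line currents are flipped."""
    flips = rng.random(w_b.shape) < p_flip
    return np.where(flips, -w_b, w_b)

def polarize(w_b, pos_params=(1.0, 0.05), neg_params=(-1.0, 0.05)):
    """Polarize(): Equation (8), replace +1/-1 by samples of the analog weight
    distributions derived from Equation (7) (placeholder normals here)."""
    w_pos = rng.normal(pos_params[0], pos_params[1], size=w_b.shape)
    w_neg = rng.normal(neg_params[0], neg_params[1], size=w_b.shape)
    return np.where(w_b > 0, w_pos, w_neg)

def sto_quantize(x, x_th=0.5, sigma_ratio=0.1):
    """StoQuantize(): Equations (9)-(10), activation with a stochastic threshold."""
    x_th_sto = rng.normal(x_th, sigma_ratio * abs(x_th), size=x.shape)
    return (x >= x_th_sto).astype(np.float32)

w_var = polarize(sto_flip(sign_binarize(rng.standard_normal((4, 3))), p_flip=0.03))
a_var = sto_quantize(rng.standard_normal(3))
```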
3.3. Optimization of Biasing Voltages
In this section, we present the optimization methodology for the biasing voltages of the WLs and BLs of the SRAM, denoted $V_{WL}$ and $V_{BL}$, respectively, which is shown in Figure 7. This methodology provides steps to find the optimal biasing voltages that achieve the best balance between accuracy and power consumption. It is important to note that $V_{WL}$ and $V_{BL}$ are critical factors influencing power consumption in SRAM-based CIM. However, power consumption is not directly addressed within our variation-aware BNN framework as outlined in Algorithm 1. Instead, the optimization methodology balances power consumption and accuracy by tuning these biasing voltages, as described in this section. The process begins by setting an initial configuration of $V_{WL}$ and $V_{BL}$ and running MC circuit simulations of the SRAM cell, as depicted in steps 1 and 2 of Figure 7. During these simulations, if the number of flips of $I_{BL,i}$ or $I_{BLB,i}$ matches the total number of MC simulations, the $V_{WL}$/$V_{BL}$ configuration is discarded to ensure reliable operation of the SRAM-based CIM and acceptable BNN accuracy. Otherwise, the mean and variance of the $I_{BL,i}$ and $I_{BLB,i}$ distributions, along with the number of instances where both are flipped (if any), are fed to the variation-aware BNN framework (Section 3.2), as outlined in steps 3, 4, and 5 of Figure 7. Following the variation-aware training (step 6 in Figure 7) and variation-aware inference (step 7 in Figure 7), we collect the average accuracy and compare it across different $V_{WL}$/$V_{BL}$ configurations. This comparison identifies the optimal configuration where accuracy remains acceptable while power consumption is minimized.
Upon determining the optimal biasing voltages for each BNN model, we obtain the corresponding trained weights, which are represented as either ‘−1’ or ‘+1’. These trained weights can be reused after the completion of the training process. However, for each new set of weights, the process of tuning the WL and BL voltages must be repeated to ensure optimal operation and accuracy. Therefore, for each BNN model with the corresponding optimal biasing voltages, we need to store the corresponding trained weights in the SRAM cells, with the stored weight scenarios as ‘+1’ (Q = 1 and QB = 0) and ‘−1’ (Q = 0 and QB = 1), as shown in Figure 5.
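The overall sweep of Figure 7 can be summarized by the driver sketched below; mc_cell_stats and train_and_eval stand in for the HSPICE MC runs and the variation-aware training and inference of Section 3.2, so their bodies, the voltage grids, and the power proxy are assumptions.

```python
import itertools

V_WL_GRID = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]   # word-line voltages (V), illustrative grid
V_BL_GRID = [0.1, 0.2, 0.3, 0.4]             # bit-line voltages (V), illustrative grid
N_MC = 10_000

def sweep_biasing(mc_cell_stats, train_and_eval, acc_target):
    """Steps 1-7 of Figure 7: keep the lowest-power (V_WL, V_BL) pair whose
    variation-aware accuracy stays above acc_target."""
    best = None
    for v_wl, v_bl in itertools.product(V_WL_GRID, V_BL_GRID):
        stats = mc_cell_stats(v_wl, v_bl)        # current means/variances + flip counts
        if stats["n_flipped"] == N_MC:           # every MC sample flipped: discard (step 2)
            continue
        acc = train_and_eval(stats)              # variation-aware training + inference (steps 5-7)
        power_proxy = v_wl * v_bl                # assumed monotone proxy for power
        if acc >= acc_target and (best is None or power_proxy < best[0]):
            best = (power_proxy, v_wl, v_bl, acc)
    return best
```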
3.4. Modeling of IR Drop
The resistance of the power lines in an SRAM array causes an IR drop, lowering the effective supply voltage. This effect is not considered in our experiments; however, it can easily be modeled by applying lower supply voltages in our MC simulations.
4. Validation of Our Framework
4.1. Experimental Setting
We assess the effectiveness of the proposed framework across various SRAM array sizes and different biasing voltages for the WLs and BLs in a 6T-SRAM bit-cell configuration [9]. Notably, the framework can be extended to CIMs employing alternative memory technologies and other SRAM bit-cell configurations. Using 10,000 MC simulations in 28 nm FD-SOI technology, we analyze the mean and variance of the $I_{BL,i}$ and $I_{BLB,i}$ distributions, as well as the occurrence of simultaneous flips of $I_{BL,i}$ and $I_{BLB,i}$, as detailed in Table 8, for the stored weight scenario of '+1' (i.e., Q = 1 and QB = 0) illustrated in Figure 5. These simulations were performed using HSPICE. The results, including the mean and variance of the cell current distributions and the flip instances, are integrated into our variation-aware BNN framework. We then estimate the average inference accuracy of the RESNET-18 and VGG-9 BNN models on the CIFAR-10 dataset, and of the CONVNET BNN model on the MNIST dataset, both before and after applying the variation-aware training. The framework is implemented using the TensorFlow deep learning library.
During training, the loss is minimized using the Adam optimization algorithm [31]. The initial learning rate is set to 0.01 for the VGG-9 and RESNET-18 BNN models, and 0.001 for the CONVNET BNN model. The maximum number of training epochs is set to 100 for CONVNET, 150 for VGG-9, and 300 for RESNET-18. The learning rate is decayed by a factor of 0.31 when the validation accuracy shows insufficient improvement. To account for process variation, variation-aware inference is performed 100 times on 10,000 validation images, and the resulting accuracies are averaged. In Equation (10), the standard deviation $\sigma_{th}$ is assumed to be 10% of the threshold ($x_{th}$) value defined in Equations (1) and (9) for both training and inference.
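For reference, the optimizer and learning-rate schedule described above could be set up in TensorFlow/Keras roughly as follows; build_vgg9_bnn and the plateau patience are hypothetical, while the learning rate, decay factor, and epoch count are taken from the text.

```python
import tensorflow as tf

def compile_and_train(build_vgg9_bnn, train_ds, val_ds):
    """Training setup for the VGG-9 BNN: Adam, initial LR 0.01, LR decayed by
    0.31 when validation accuracy plateaus, up to 150 epochs."""
    model = build_vgg9_bnn()                    # hypothetical BNN model constructor
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_accuracy", factor=0.31, patience=5)   # patience is assumed
    return model.fit(train_ds, validation_data=val_ds,
                     epochs=150, callbacks=[reduce_lr])
```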
For the baseline, we do not consider the effect of process variation, while the input splitting technique described in Section 2.3.1 is used. For the RESNET-18 BNN model, we assume that the short-cuts have a full-precision data format. Many SOTA works [7] employ this approach, since the accuracy of RESNETs is sensitive to the quantization errors of short-cuts; we follow the same approach in this work.
4.2. Results and Discussion
The average inference accuracies before variation-aware training for the CONVNET, RESNET-18, and VGG-9 BNN models, shown in Figure 8, Figure 9, and Figure 10, indicate that in SRAM-based analog CIMs, the inference accuracies on the MNIST and CIFAR-10 datasets are significantly degraded by process variations. Specifically, with an SRAM array size of 128 × 128, a word-line voltage of 0.9 V, and a bit-line voltage of 0.4 V, the accuracies drop below 61%, 20%, and 50% for CONVNET, RESNET-18, and VGG-9, respectively.
Our proposed variation-aware training framework effectively mitigates this degradation. Figure 11 illustrates the inference accuracies of CONVNET after applying the variation-aware training, where the effect of process variations is also considered during inference. The results demonstrate that our framework significantly improves accuracy under process variations. For instance, with an SRAM array size of 128 × 128, a word-line voltage of 0.9 V, and a bit-line voltage of 0.4 V, the accuracy is 60.24% under the process variations of 28 nm FD-SOI (Figure 8). Our variation-aware training framework improves the accuracy for this array size and voltage configuration to 92.33%. Among the various bit-line and word-line voltage configurations, an SRAM array size of 128 × 128 with $V_{WL}$ = 0.4 V and $V_{BL}$ = 0.1 V emerges as the optimal configuration for CONVNET, maintaining an acceptable accuracy of 98.08% (Figure 11) compared to the baseline of 98.92%, while also minimizing power consumption.
Similar results are observed for the RESNET-18 and VGG-9 BNN models; our variation-aware training framework again provides significant accuracy improvements under process variations. For instance, with an SRAM array size of 128 × 128, as shown in Figure 10, the accuracies for VGG-9 are 76.82% and 45.23% for the two biasing cases of $V_{WL}$ = 0.4 V/$V_{BL}$ = 0.1 V and $V_{WL}$ = 0.9 V/$V_{BL}$ = 0.4 V, respectively. In Figure 12, the accuracies for these two cases improve to 85.22% and 78.22%, respectively, validating the efficacy of our variation-aware training framework. Among the various word-line and bit-line voltage configurations, an SRAM array size of 128 × 128 with $V_{WL}$ = 0.7 V/$V_{BL}$ = 0.4 V for RESNET-18 and $V_{WL}$ = 0.6 V/$V_{BL}$ = 0.4 V for VGG-9 are identified as the optimal setups. These configurations achieve accuracies of 77.07% for RESNET-18 (Figure 13) and 86.47% for VGG-9 (Figure 12), close to the baseline accuracies of 78.87% and 87.24%, respectively. Additionally, these setups effectively minimize power consumption.
As illustrated in Figure 11, Figure 12, and Figure 13, the accuracy under process variations improves with increasing $V_{WL}$ and array size for two main reasons. Firstly, as shown in Table 4, Table 5, and Table 6, the number of groups to be split decreases with larger array sizes, resulting in higher accuracy, consistent with the trend observed in [26]. Secondly, a higher $V_{WL}$ results in larger cell currents, providing better immunity to process variations. However, when $V_{WL}$ exceeds a certain threshold, the SRAM cell currents $I_{BL,i}$ or $I_{BLB,i}$ may flip due to a negative read static noise margin [32], as marked with "Flipped" in Figure 8, Figure 9, Figure 10, Figure 11, and Figure 12. Such configurations do not ensure reliable operation of the SRAM-based CIM or acceptable BNN accuracy. Considering this factor, we determined the optimal biasing points to be $V_{WL}$ = 0.4 V/$V_{BL}$ = 0.1 V, $V_{WL}$ = 0.7 V/$V_{BL}$ = 0.4 V, and $V_{WL}$ = 0.6 V/$V_{BL}$ = 0.4 V for the CONVNET, RESNET-18, and VGG-9 BNN models, respectively. With an SRAM array size of 128 × 128, the accuracies of CONVNET on the MNIST dataset, and of RESNET-18 and VGG-9 on the CIFAR-10 dataset, are 98.08%, 77.07%, and 86.47%, respectively.
The results of our experiments demonstrate that process variations significantly impact the inference accuracies of BNN models implemented in SRAM-based CIMs. Prior to applying variation-aware training, the average inference accuracies of the CONVNET, RESNET-18, and VGG-9 BNN models on the MNIST and CIFAR-10 datasets were notably degraded, dropping to as low as 61%, 20%, and 50%, respectively, under a voltage configuration of $V_{WL}$ = 0.9 V and $V_{BL}$ = 0.4 V in a 128 × 128 SRAM array.
The implementation of our variation-aware training framework effectively mitigated these losses. For example, variation-aware training improved the accuracy of CONVNET from 60.24% to 92.33% under the same voltage configuration. Additionally, our analysis identified the optimal biasing points of $V_{WL}$ and $V_{BL}$ for each model, which not only improve accuracy but also minimize power consumption. The optimal configurations for CONVNET, RESNET-18, and VGG-9 were determined to be $V_{WL}$ = 0.4 V/$V_{BL}$ = 0.1 V, $V_{WL}$ = 0.7 V/$V_{BL}$ = 0.4 V, and $V_{WL}$ = 0.6 V/$V_{BL}$ = 0.4 V, respectively. Overall, our variation-aware training framework provides a robust solution to enhance the accuracy and energy efficiency of BNNs on SRAM-based analog CIMs under process variations.
While our variation-aware framework effectively mitigates process variations arising from silicon (Si) manufacturing, we acknowledge that other sources of variation, such as aging, temperature fluctuations, and battery instability, may also impact the reliability of SRAM-based CIM architectures. These variations are particularly critical due to the sensitivity of WLs and BLs in analog computations. However, addressing these factors is beyond the scope of this work, and we leave this as future research to further enhance the robustness of CIM systems.
5. Conclusions
In this work, we address the challenge of process variation in SRAM-based BNN computation-in-memory (CIM) systems, aiming to balance accuracy and power consumption effectively. We developed mathematical models to capture the impact of process variations on analog computations in SRAM cells, validated through Monte Carlo simulations in 28 nm FD-SOI technology. Based on these models, we proposed a variation-aware BNN training framework that enhances accuracy despite process variations, as demonstrated through extensive simulations. Furthermore, we optimized the biasing of word lines and bit lines in SRAM to achieve an optimal trade-off between accuracy and power efficiency, making our approach particularly suitable for energy-efficient edge devices.
Conceptualization, M.-S.L. and I.-J.C.; methodology, M.-S.L. and I.-J.C.; software, M.-S.L.; validation, M.-S.L.; formal analysis, M.-S.L.; investigation, M.-S.L. and T.-D.N.; writing—original draft preparation, M.-S.L., T.-N.P., T.-D.N. and I.-J.C.; writing—review and editing, M.-S.L., T.-N.P., T.-D.N. and I.-J.C.; visualization, M.-S.L.; supervision, I.-J.C.; project administration, I.-J.C. All authors have read and agreed to the published version of the manuscript.
The raw data supporting the conclusions of this article will be made available by the authors on request.
The authors declare no conflicts of interest.
The following abbreviations are used in this manuscript:
CIM | Computation in memory |
DNNs | Deep neural networks |
BA | Binary Activation |
BNNs | Binary neural networks |
ADCs | Analog-to-digital converters |
CSA | Current sense amplifier |
MAC | Multiply-and-accumulation |
eNVM | Emerging non-volatile memories |
MC | Monte Carlo |
BN | Batch normalization |
FC | Fully connected |
BLs | Bit lines |
WLs | Word lines |
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 1. 6T-SRAM-based CIM architecture for a BNN and truth table of input neurons and weights.
Figure 8. Average inference accuracy before variation-aware training of CONVNET BNN model on MNIST dataset at the FS corner and 85 °C.
Figure 9. Average inference accuracy before variation-aware training of RESNET-18 (full-precision shortcut) BNN model on CIFAR-10 dataset at the FS corner and 85 °C.
Figure 10. Average inference accuracy before variation-aware training of VGG-9 BNN model on CIFAR-10 dataset at the FS corner and 85 °C.
Figure 11. Average inference accuracy after variation-aware training of CONVNET BNN model on MNIST dataset at the FS corner and 85 °C.
Figure 12. Average inference accuracy after variation-aware training of VGG-9 BNN model on CIFAR-10 dataset at the FS corner and 85 °C.
Figure 13. Average inference accuracy after variation-aware training of RESNET-18 (full-precision shortcut) BNN model on CIFAR-10 dataset at the FS corner and 85 °C.
SRAM bit cells from SOTA works.

| | IEEE Access'20 [13] | VLSI'19 [14] | ASSCC'21 [15] | DAC'18 [9] |
|---|---|---|---|---|
| SRAM bit cell | 10T + BEOL MOM cap | 6T (split WL) | 8T2C | 6T |
| Technology | 22 nm | 28 nm | 28 nm | 65 nm |
Summary and comparison with previous works on SRAM-based BNN CIM.

| | VLSI'19 [14] | ASSCC'21 [15] | This Work |
|---|---|---|---|
| Technique to mitigate variations | In-memory calibration | Charge-based computation | Software (variation-aware training) |
| Advantages | Robust to process and temperature variations | Robust to random variations | No hardware overhead |
| Disadvantages | Difficulty in coping with random variations | Large cell area | Many training iterations |
Comparison of two binary activation cases: (+1/−1) and (1/0).

| Network | Dataset | Full | BNN (+1/−1) | BNN (1/0) | Split 128 (+1/−1) | Split 128 (1/0) | Split 256 (+1/−1) | Split 256 (1/0) | Split 512 (+1/−1) | Split 512 (1/0) |
|---|---|---|---|---|---|---|---|---|---|---|
| CONVNET | MNIST [25] | 99.43 | 99.29 | 99.33 | 98.85 | 98.92 | 98.89 | 99.17 | 99.13 | 99.22 |
| RESNET-18 [22] | CIFAR-10 [24] | 91.17 | 82.82 | 83.06 | 67.68 | 78.87 | 78.30 | 81.02 | 78.23 | 81.85 |
| VGG-9 [23] | CIFAR-10 [24] | 93.71 | 89.77 | 91.36 | 86.94 | 87.24 | 87.63 | 88.58 | 88.35 | 88.79 |
Number of split groups for the CONVNET BNN model on the MNIST dataset.

| Layer | Input Count per Output | Array 128 | Array 256 | Array 512 |
|---|---|---|---|---|
| 1 | 3 × 3 × 1 | - | - | - |
| 2 | 3 × 3 × 32 | 3 | 2 | 1 |
| 3 | 3 × 3 × 32 | 3 | 2 | 1 |
| 4 | 3 × 3 × 32 | 3 | 2 | 1 |
| 5 | 1568 | 14 | 7 | 4 |
| 6 | 512 | - | - | - |
Number of split groups for the RESNET-18 BNN model on the CIFAR-10 dataset.

| Layer | Input Count per Output | Array 128 | Array 256 | Array 512 |
|---|---|---|---|---|
| 1 | 3 × 3 × 3 | - | - | - |
| 2→7 | 3 × 3 × 16 | 2 | 1 | 1 |
| 8→13 | 3 × 3 × 32 | 3 | 2 | 1 |
| 14→19 | 3 × 3 × 64 | 6 | 3 | 2 |
| 20 | 64 | - | - | - |
Number of split groups for the VGG-9 BNN model on the CIFAR-10 dataset.

| Layer | Input Count per Output | Array 128 | Array 256 | Array 512 |
|---|---|---|---|---|
| 1 | 3 × 3 × 3 | - | - | - |
| 2 | 3 × 3 × 128 | 9 | 6 | 3 |
| 3 | 3 × 3 × 128 | 9 | 6 | 3 |
| 4 | 3 × 3 × 256 | 18 | 9 | 6 |
| 5 | 3 × 3 × 256 | 18 | 9 | 6 |
| 6 | 3 × 3 × 512 | 36 | 18 | 9 |
| 7 | 8192 | 64 | 32 | 16 |
| 8 | 1024 | 8 | 4 | 2 |
| 9 | 1024 | - | - | - |
Software-to-hardware conversion of the activation threshold $X_{th}$.

| Software Implementation | Hardware Implementation |
|---|---|
Number of instances (out of 10,000 MC simulations) where both $I_{BL,i}$ and $I_{BLB,i}$ are flipped, for different $V_{WL}$/$V_{BL}$ configurations.

| $V_{WL}$ (V) \ $V_{BL}$ (V) | 0.1 | 0.2 | 0.3 | 0.4 |
|---|---|---|---|---|
| 0.4 | 0 | 0 | 0 | 0 |
| 0.5 | 0 | 0 | 0 | 0 |
| 0.6 | 138 | 0 | 0 | 0 |
| 0.7 | Flipped | 311 | 0 | 0 |
| 0.8 | Flipped | Flipped | 514 | 0 |
| 0.9 | Flipped | Flipped | Flipped | 807 |
References
1. Wan, W.; Kubendran, R.; Schaefer, C.; Eryilmaz, S.B.; Zhang, W.; Wu, D.; Deiss, S.; Raina, P.; Qian, H.; Gao, B. et al. A compute-in-memory chip based on resistive random-access memory. Nature; 2022; 608, pp. 504-512. [DOI: https://dx.doi.org/10.1038/s41586-022-04992-8] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35978128]
2. Wen, T.-H.; Hsu, H.-H.; Khwa, W.-S.; Huang, W.-H.; Ke, Z.-E.; Chin, Y.-H.; Wen, H.-J.; Chang, Y.-C.; Hsu, W.-T.; Lo, C.-C. et al. 34.8 A 22nm 16Mb Floating-Point ReRAM Compute-in-Memory Macro with 31.2TFLOPS/W for AI Edge Devices. Proceedings of the 2024 IEEE International Solid-State Circuits Conference (ISSCC); San Francisco, CA, USA, 18–22 February 2024; pp. 580-582. [DOI: https://dx.doi.org/10.1109/ISSCC49657.2024.10454468]
3. Antolini, A.; Lico, A.; Zavalloni, F.; Scarselli, E.F.; Gnudi, A.; Torres, M.L.; Canegallo, R.; Pasotti, M. A Readout Scheme for PCM-Based Analog In-Memory Computing With Drift Compensation Through Reference Conductance Tracking. IEEE Open J.-Solid-State Circuits Soc.; 2024; 4, pp. 69-82. [DOI: https://dx.doi.org/10.1109/OJSSCS.2024.3432468]
4. Khaddam-Aljameh, R.; Stanisavljevic, M.; Mas, J.F.; Karunaratne, G.; Braendli, M.; Liu, F.; Singh, A.; Müller, S.M.; Egger, U.; Petropoulos, A. et al. HERMES Core—A 14 nm CMOS and PCM-based In-Memory Compute Core using an array of 300ps/LSB Linearized CCO-based ADCs and local digital processing. Proceedings of the 2021 Symposium on VLSI Circuits; Kyoto, Japan, 13–19 June 2021; pp. 1-2. [DOI: https://dx.doi.org/10.23919/VLSICircuits52068.2021.9492362]
5. Courbariaux, M.; Hubara, I.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or −1. arXiv; 2016; arXiv: 1602.02830
6. Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. arXiv; 2016; arXiv: 1603.05279v4
7. Kim, H.; Kim, K.; Kim, J.; Kim, J.-J. BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations. Proceedings of the International Conference on Learning Representations (ICLR); Addis Ababa, Ethiopia, 30 April 2020; Available online: https://openreview.net/forum?id=r1x0lxrFPS (accessed on 25 September 2024).
8. Yin, S.; Jiang, Z.; Seo, J.-S.; Seok, M. XNOR-SRAM: In-Memory Computing SRAM Macro for Binary/Ternary Deep Neural Networks. IEEE J.-Solid-State Circuits; 2020; 6, pp. 1733-1743. [DOI: https://dx.doi.org/10.1109/JSSC.2019.2963616]
9. Liu, R.; Peng, X.; Sun, X.; Khwa, W.-S.; Si, X.; Chen, J.-J.; Li, J.-F.; Chang, M.-F.; Yu, S. Parallelizing SRAM arrays with customized bit-cell for binary neural networks. Proceedings of the 55th Annual Design Automation Conference (DAC); San Francisco, CA, USA, 24–28 June 2018; [DOI: https://dx.doi.org/10.1109/DAC.2018.8465935]
10. Kim, H.; Oh, H.; Kim, J.-J. Energy-efficient XNOR-free in-memory BNN accelerator with input distribution regularization. Proceedings of the 39th International Conference on Computer-Aided Design (ICCAD); Virtual Event 2–5 November 2020; [DOI: https://dx.doi.org/10.1145/3400302.3415641]
11. Choi, W.H.; Chiu, P.-F.; Ma, W.; Hemink, G.; Hoang, T.T.; Lueker-Boden, M.; Bandic, Z. An In-Flash Binary Neural Network Accelerator with SLC NAND Flash Array. Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS); Seville, Spain, 12–14 October 2020; [DOI: https://dx.doi.org/10.1109/ISCAS45731.2020.9180920]
12. Angizi, S.; He, Z.; Awad, A.; Fan, D. MRIMA: An MRAM-Based In-Memory Accelerator. IEEE Trans.-Comput.-Aided Des. Integr. Circuits Syst. (Tcad); 2020; 5, pp. 1123-1136. [DOI: https://dx.doi.org/10.1109/TCAD.2019.2907886]
13. Saha, G.; Jiang, Z.; Parihar, S.; Xi, C.; Higman, J.; Karim, M.A.U. An Energy-Efficient and High Throughput in-Memory Computing Bit-Cell With Excellent Robustness Under Process Variations for Binary Neural Network. IEEE Access; 2020; 8, pp. 91405-91414. [DOI: https://dx.doi.org/10.1109/ACCESS.2020.2993989]
14. Kim, J.; Koo, J.; Kim, T.; Kim, Y.; Kim, H.; Yoo, S.; Kim, J.-J. Area-Efficient and Variation-Tolerant In-Memory BNN Computing using 6T SRAM Array. Proceedings of the Symposium on VLSI Circuits; Kyoto, Japan, 9–14 June 2019; [DOI: https://dx.doi.org/10.23919/VLSIC.2019.8778160]
15. Oh, H.; Kim, H.; Ahn, D.; Park, J.; Kim, Y.; Lee, I.; Kim, J.-J. Energy-efficient charge sharing-based 8T2C SRAM in-memory accelerator for binary neural networks in 28nm CMOS. Proceedings of the IEEE Asian Solid-State Circuits Conference (A-SSCC); Busan, Republic of Korea, 7–10 November 2021; [DOI: https://dx.doi.org/10.1109/A-SSCC53895.2021.9634784]
16. Bhunia, S.; Mukhopadhyay, S.; Roy, K. Process Variations and Process-Tolerant Design. Proceedings of the 20th International Conference on VLSI Design Held Jointly with 6th International Conference on Embedded Systems (VLSID’07); Bangalore, India, 6–10 January 2007; [DOI: https://dx.doi.org/10.1109/VLSID.2007.131]
17. Yi, W.; Kim, Y.; Kim, J.-J. Effect of Device Variation on Mapping Binary Neural Network to Memristor Crossbar Array. Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE); Florence, Italy, 25–29 March 2019; [DOI: https://dx.doi.org/10.23919/DATE.2019.8714817]
18. Laborieux, A.; Bocquet, M.; Hirtzlin, T.; Klein, J.-O.; Nowak, E.; Vianello, E.; Portal, J.-M.; Querlioz, D. Implementation of Ternary Weights With Resistive RAM Using a Single Sense Operation Per Synapse. IEEE Trans. Circuits Syst. Regul. Pap.; 2021; 1, pp. 138-147. [DOI: https://dx.doi.org/10.1109/TCSI.2020.3031627]
19. Sun, X.; Peng, X.; Chen, P.-Y.; Liu, R.; Seo, J.-s.; Yu, S. Fully parallel RRAM synaptic array for implementing binary neural network with (+1, −1) weights and (+1, 0) neurons. Proceedings of the 23rd Asia and South Pacific Design Automation Conference (ASP-DAC); Jeju, Republic of Korea, 22–25 January 2018; [DOI: https://dx.doi.org/10.1109/ASPDAC.2018.8297384]
20. Sun, X.; Yin, S.; Peng, X.; Liu, R.; Seo, J.-s.; Yu, S. XNOR-RRAM: A scalable and parallel resistive synaptic architecture for binary neural networks. Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE); Dresden, Germany, 19–23 March 2018; [DOI: https://dx.doi.org/10.23919/DATE.2018.8342235]
21. Liu, B.; Li, H.; Chen, Y.; Li, X.; Wu, Q.; Huang, T. Vortex: Variation-aware training for memristor X-bar. Proceedings of the 52nd Annual Design Automation Conference (DAC); San Francisco, CA, USA, 8–12 June 2015; [DOI: https://dx.doi.org/10.1145/2744769.2744930]
22. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Las Vegas, NV, USA, 27–30 June 2016; [DOI: https://dx.doi.org/10.1109/CVPR.2016.90]
23. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv; 2015; arXiv: 1409.1556v6
24. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Technical Report 2009; Available online: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 25 September 2024).
25. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE; 1998; 86, pp. 2278-2324. [DOI: https://dx.doi.org/10.1109/5.726791]
26. Kim, Y.; Kim, H.; Kim, J.-J. Neural Network-Hardware Co-design for Scalable RRAM-based BNN Accelerators. arXiv; 2019; arXiv: 1811.02187
27. Sari, E.; Belbahri, M.; Nia, V.P. How Does Batch Normalization Help Binary Training?. arXiv; 2020; arXiv: 1909.09139v3
28. Kim, H.; Kim, Y.; Kim, J.-J. In-memory batch-normalization for resistive memory based binary neural network hardware. Proceedings of the 24th Asia and South Pacific Design Automation Conference (ASPDAC); Tokyo, Japan, 21–24 January 2019; [DOI: https://dx.doi.org/10.1145/3287624.3287718]
29. Chen, T.; Gielen, G.G.E. A 14-bit 200-MHz Current-Steering DAC With Switching-Sequence Post-Adjustment Calibration. IEEE J.-Solid-State Circuits; 2007; 42, pp. 2386-2394. [DOI: https://dx.doi.org/10.1109/JSSC.2007.906200]
30. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV); Santiago, Chile, 7–13 December 2015; [DOI: https://dx.doi.org/10.1109/ICCV.2015.123]
31. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv; 2017; arXiv: 1412.6980v9
32. Arandilla, C.D.C.; Alvarez, A.B.; Roque, C.R.K. Static Noise Margin of 6T SRAM Cell in 90-nm CMOS. Proceedings of the UkSim 13th International Conference on Computer Modelling and Simulation (UKSIM); Cambridge, UK, 30 March–1 April 2011; [DOI: https://dx.doi.org/10.1109/UKSIM.2011.108]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
Binary neural networks (BNNs) that use 1-bit weights and activations have garnered interest as extreme quantization provides low power dissipation. By implementing BNNs as computation-in-memory (CIM), which computes multiplications and accumulations on memory arrays in an analog fashion, namely, analog CIM, we can further improve the energy efficiency of processing neural networks. However, analog CIMs are susceptible to process variation, which refers to the variability in manufacturing that causes fluctuations in the electrical properties of transistors, resulting in significant degradation in BNN accuracy. Our Monte Carlo simulations demonstrate that in an SRAM-based analog CIM implementing the VGG-9 BNN model, the classification accuracy on the CIFAR-10 image dataset is degraded to below 50% under process variations in a 28 nm FD-SOI technology. To overcome this problem, we present a variation-aware BNN framework. The proposed framework is developed for SRAM-based BNN CIMs since SRAM is most widely used as on-chip memory; however, it is easily extensible to BNN CIMs based on other memories. Our extensive experimental results demonstrate that under the process variation of 28 nm FD-SOI, with an SRAM array size of 128 × 128, the proposed framework maintains inference accuracies of 98.08% for CONVNET on MNIST, and 77.07% and 86.47% for RESNET-18 and VGG-9 on CIFAR-10, respectively.