Automatic Extraction and Compensation of P-Bit

Full text

Turn on search term navigation

1. Introduction

The traditional computing systems based on the von Neumann architecture typically utilize deterministic binary bits to encode information and perform calculations, leading to inefficient solutions for combinatorial optimization problems, which are often Non-deterministic Polynomial-time hard (NP-hard) or NP-complete problems, such as the knapsack problem, integer factorization, and traveling salesman problem (TSP). Probabilistic Bit (P-Bit), as the core device to construct the hardware entity of Ising computation [1], owing to its inherent stochasticity, is proposed for efficiently solving combinatorial optimization problems. Several hardware implementations of P-Bit devices are proposed, for instance, stochastic low barrier magnetic tunneling junctions (MTJs) [2,3,4,5,6], spin orbit torque (SOT)-driven MTJ [7], non-linear oscillators [8,9,10,11,12,13], diffusive memristors [14], metal–insulator-transition-based P-Bit [15], optical parametric oscillators [16,17,18,19,20], CMOS circuits [21,22,23,24,25], and resistive random access memories (RRAMs) [26].

Among them, MTJ P-Bit devices offer advantages like high speed (∼ns), ultra-low power (∼ $μ$ W), and small footprint (∼10 nm). However, in order to implement the stochastic property of P-Bit, the energy barrier $Δ E$ of the MTJ P-Bit needs to be reduced to approximately $1 k_{B} T$ . In practice, it is very hard to fabricate multiple MTJ P-Bits with the same $Δ E$ , especially since the $Δ E$ is too small to be controlled. This leads to severe intrinsic variation in MTJ P-Bit devices [27,28]. To ensure the correctness of Ising computation, the P-Bit devices are required to be calibrated one by one, which is prohibitively expensive for a large P-Bit array. Even worse, within an analog P-Bit system, the individual variations of P-Bits remain elusive to direct measurement and calibration. Several methods have been presented to address the variation issue, including time division multiplexing (TDM) [29,30], applying external magnetic fields and voltages [31], and configuring suitable resistances [2]. However, these approaches are either operationally complex or not scalable.

Boltzmann machines are stochastic recurrent neural networks inspired by Boltzmann distribution and have widespread applications in generative machine learning [32,33]. Recently, hardware implementations of restricted Boltzmann machines have been proposed [34,35]. In response to the variability of MTJ P-Bit devices, an in situ learning approach based on fully visible Boltzmann machine (FVBM) has been introduced [36], which updates the weight matrix to adapt to P-Bit variation. This approach regards P-Bits with variations as a unified system and trains suitable weights at the system level, rather than calibration. However, it requires retraining the weight matrix for different problem instances, significantly affecting its scalability and increasing the complexity.

This paper is structured as follows: Section 2 presents a behavioral model of P-Bits incorporating $α$ and $Δ V$ . Subsequently, by analyzing computations of the AND gate using both ideal and real P-Bits, it underscores the pressing need to address P-Bit variation. A weight compensation algorithm is then proposed in order to nullify variation by rederiving the weight matrix. Central to this methodology is the imperative requirement for precise knowledge of the $α$ and $Δ V$ values for each P-Bit. Consequently, Section 3 introduces a variation extraction algorithm based on the Boltzmann machine to capture individual P-Bit variations. Examination of the extraction inaccuracies on the AND gate array reveals the necessity of constructing a suitable density matrix, leading to the proposal of an adaptable 3D ferromagnetic model capable of achieving accurate and scalable array variation extraction. In Section 4, we demonstrate the effectiveness, transferability, and scalability of our approach by successfully solving the 16-city TSP and 21-bit integer factorization on a large P-Bit array with corrected variations given by the Automatic Extraction and Compensation methods.

2. Weight Matrix Rederivation to Compensate P-Bit Array Variation

The P-Bit device is the main building block of the Ising computer. It is actually a binary stochastic neuron (BSN) [37,38] which can fluctuate between −1 and 1 with probability that is tuned by the input voltage (or current). The curve of controlled probability with respect to input voltage is usually Sigmoid-like. The behavioral model of P-Bit is written as follows [39]:

(1) $m (t + 1) = s g n (t a n h (V (t)) - r (t))$

where

m (t + 1)

is the output P-Bit state at the next time step,

V (t)

represents the input voltage,

t a n h

implements the Sigmoid function, and

r (t)

is a uniformly distributed random number between −1 and 1.

2.1. Detrimental Impact of P-Bit Array Device Variation on Ising Computation

In our previous work, an ultra-fast field-free stochastic SOT P-Bit device is proposed and fabricated [7]. As shown in Figure 1a, a stack was deposited consisting of IrMn(7.5)/CoFe(2)/Ru(0.8)/CoFe(2)/CoFeB(1.9)/MgO /CoFeB(1.2)/W(5). Figure 1b displays the SEM image; the stack was etched into elliptical MTJs with a major axis of 137 nm and a minor axis of 107 nm. As illustrated in Figure 1c, The SOT MTJ structure involves adding a heavy metal layer beneath an MTJ, which is used to apply a SOT voltage. This voltage can influence the double-wall energy barrier via the SOT effect. The elliptical MTJ has a stray field that inherently favors the ‘0’ state. When the SOT voltage is low, the SOT MTJ tends to stay in the ‘0’ state. As the SOT voltage increases, the probabilities of the ‘0’ and ‘1’ states become 50/50%. While the SOT voltage is sufficiently high, the SOT MTJ is more likely to remain in the ‘1’ state. The Sigmoid curve of the SOT P-Bit device is fitted by the behavioral model, as shown in Figure 2. The behavioral model can fit the experimentally measured data well. However, upon measuring numerous SOT P-Bit devices, we found that the experimentally measured Sigmoid curves exhibit large intrinsic variation. As displayed in Figure 3, it is clearly observed that, compared to the ideal Sigmoid curve, the deviations of the real curve fall into two categories: stretched or compressed in shape, and rigid shift. This characteristic is also illustrated in both [7,40]. In ref [7], we have proposed to use $α$ to describe the degree of stretching and compression and $Δ V$ to describe the rigid shift of the Sigmoid curve. The behavioral model that incorporates $α$ and $Δ V$ to describe device variations is written as follows:

(2) $m (t + 1) = s g n (t a n h (α V (t) + Δ V) - r (t))$

Based on the modified behavioral model, we can fit our experimental Sigmoid curves with different combinations of $α$ and $Δ V$ . However, these device-to-device variations severely compromise the accuracy of Ising computation. To further examine the effect of the P-Bit device variation on Ising computation quantitatively, we connect three P-Bits to construct an AND logic gate. In Ising computing, the input voltage for each P-Bit will be determined as follows:

(3) $V_{i} (t + 1) = β (\sum_{j} W_{i j} m_{j} (t) + B_{i})$

where

β

is the inverse pseudo-temperature,

W_{i j}

is the coupling weight between the

i_{t h}

and

j_{t h}

P-Bit, and

B_{i}

is the bias of the

i_{t h}

P-Bit. The input voltage

V_{i}

will be calculated from the previous states of other P-Bits, which means the P-Bits are coupled with each other.

As depicted in Figure 4, three P-Bits are interconnected. The coupling weight matrix for the AND gate is also provided, and the truth table for the AND gate is given as Table 1. Each truth value of AND gate has a probability: $p = 1 / 4 = 0.25$ .

The result of AND gate computation in three scenarios of P-Bit variation is shown in Figure 5. As displayed in Figure 5a, the ideal P-Bit array obtains accurate results in AND gate computation at $β = 1$ ; each of the four correct states is approximately $18 %$ and this make $72 %$ total accuracy. While in Figure 5b, the real P-Bit array extracted from the experimental data with different $α$ and $Δ V$ cannot get accurate results in Ising computing. The accuracy from the real P-Bit array is $59 %$ , much smaller than the ideal P-Bit array. It is even worse that the four correct states are not equal; the states of P-Bits in computing are trapped in two states of high probability rather than fluctuating among the four correct states as expected.

We employed the Monte Carlo method to perform a statistic analysis of the AND gate. To better evaluate the computational results, we utilized the Kullback–Leibler divergence (KLD) [41]:

(4) $K L D [P_{c a l c} | | P_{d e s}] = \sum_{m} P_{d e s} (m) l o g (P_{d e s} (m) / P_{c a l c} (m))$

where the

P_{d e s}

is the designed probability distribution, and

P_{c a l c}

is the probability distribution calculated from Ising computer. The KLD measures the similarity between two probability distributions. A smaller KLD value indicates a better quality of the computed results. As shown in Figure 6, the horizontal axis represents the standard deviation of

α

, while the vertical axis represents the standard deviation of

Δ V

. Each pair of different

(σ (α), σ (Δ V))

is sampled 1000 times, and their KLDs are averaged and plotted as one point in the figure. As illustrated in Figure 6, a small standard deviation of

Δ V

results in a significantly large KLD for the AND gate computation, while the tolerance for

α

is higher. When the standard deviation of

α

is less than 0.5, the computation can ensure sufficient accuracy. Additionally, it has been indicated that the device variation measured from experiments without correction exceeds the range shown in Figure 6.

The variations in the real P-Bit array lead to the wrong results in Ising computing. As the scale of solved problem expands, the negative effect of P-Bit array variation on Ising computation will accumulate. The main drawback in hardware implementation of Ising computing is that the P-Bit devices have to be calibrated individually to get the same Sigmoid curves, which is not practical for a large array. A novel approach is required for addressing the P-Bit device variation, instead of expensively calibrating one by one.

2.2. Transferable Compensation for P-Bit Array Variation by Rederiving Weight Matrix

In this paper, we first propose the weight compensation method for addressing the P-Bit array variation. This method compensates the P-Bit device variation by rederiving the weight matrix instead of calibrating each P-Bit device individually. In Figure 7, the process of the weight matrix compensation is displayed. For the ideal P-Bit, the weight matrix $W_{i d e a l}$ is only calculated from the Ising Hamiltonian corresponding to the problem, while for the real P-Bit array, it is necessary to integrate extra parameters into the weight matrix to mitigate the variations inherent in the P-Bit device array.

In accordance with Equations (1) and (2), achieving identical output probabilities for P-Bit i in both the real P-Bit array and the ideal P-Bit array requires

(5) $t a n h (α V_{r e a l, i} + Δ V_{i}) = t a n h (V_{i d e a l, i})$

therefore, the applied input voltage must adhere to the subsequent relationship:

(6) $α_{i} V_{r e a l, i} + Δ V_{i} = V_{i d e a l, i}$

through a straightforward transformation, we can obtain the following:

(7) $V_{r e a l, i} = \frac{1}{α_{i}} V_{i d e a l, i} - \frac{1}{α_{i}} Δ V_{i}$

In the context of a whole P-Bit array, the above equation can be articulated in a matrix representation:

(8) $V_{r e a l} = d i a g (\frac{1}{α}) V_{i d e a l} - d i a g (\frac{1}{α}) Δ V$

where

α

Δ V

, and V are

n \times 1

vectors, and n is the number of P-Bits in computing. The

d i a g (\frac{1}{α})

is the diagonal matrix generated from vector

1 / α

. Equation (3) can also be expressed in matrix form:

(9) $V_{i n p u t} = W m + B$

where the V, m, and B are

n \times 1

vectors, and W is

n \times n

weight matrix. W and B are calculated from the Ising Hamiltonian based on the solved problem. Upon the substitution of Equation (9) into Equation (8), the resulting expression is as follows:

(10) $\begin{matrix} V_{r e a l} = d i a g (\frac{1}{α}) W_{i d e a l} m + \\ d i a g (\frac{1}{α}) B_{i d e a l} - d i a g (\frac{1}{α}) Δ V \end{matrix}$

therefore, we can derive new W and B for compensating the P-Bit array variation, denoting them as

(11) $\begin{matrix} W_{c o m p} = d i a g (\frac{1}{α}) W_{i d e a l} \\ B_{c o m p} = d i a g (\frac{1}{α}) B_{i d e a l} - d i a g (\frac{1}{α}) Δ V \end{matrix}$

through the aforementioned deductions, we have obtained the rederived weight and bias matrices crucial for ensuring the precise Ising computation of the real P-Bit array. We define two compensation parameters:

C_{α} = 1 / α

C_{Δ V} = - d i a g (\frac{1}{α}) Δ V

. The rederived weight matrix and bias matrix can be written as

(12) $\begin{matrix} W_{c o m p} = d i a g (C_{α}) W_{i d e a l} \\ B_{c o m p} = d i a g (C_{α}) B_{i d e a l} + C_{Δ V} \end{matrix}$

To verify the correctness of the weight matrix compensation, we initially conduct experimental measurements of the Sigmoid curves for three P-Bits and utilize a behavioral model to fit these data, thereby obtaining distinct values of $α$ and $Δ V$ . Then, we alter the weight matrix for computing the AND gate by integrating $C_{α}$ and $C_{Δ V}$ . As shown in Figure 5c, with the weight matrix compensation method, the real P-Bit array can calculate accurate results in the AND gate, and the results are similar to that of ideal P-Bit array without variations, which proves the reliability and rightness of the weight compensation.

It is noteworthy that the compensation parameters $C_{α}$ and $C_{Δ V}$ solely relate to P-Bit variations, rather than the weight matrix. This implies that even with changes in the solving problem, these compensation parameters remain unchanged. In essence, it eliminates the necessity to retrain the weight matrix. The prerequisite is merely obtaining the P-Bit variation parameters, $α$ and $Δ V$ , owing to our consideration of these variations as inherent and immutable attributes of P-Bit devices. This demonstrates the transferability of the weight compensation method across diverse problem domains.

It has been certified that weight compensation enables the real P-Bit array to compute similarly to the ideal P-Bit array without calibrating Sigmoid curves one by one or retraining the weight matrix. However, the most challenging key point lies in how to efficiently extract the $α$ and $Δ V$ from the large P-Bit array. Although calibration is not required, measuring and fitting the variations of P-Bit array are also complicated tasks. We seek to perform an expeditious and precise extraction of variations within an extensive P-Bit array.

3. Simultaneous Device Variation Extraction for Whole P-Bit Array

Boltzmann machine learning plays a significant role in uncovering and analyzing, in particular, complex data distributions. The Boltzmann machine can be mapped to the hardware system for the P-Bit, and it trains the weight matrix W for the given data distribution. The learning rule can be written as follows [36]:

(13) $W_{i, j} (t + 1) = W_{i, j} (t) + ϵ (⟨ v_{i} v_{j} ⟩ - ⟨ m_{i} m_{j} ⟩ - λ W_{i, j} (t))$

Here,

⟨ v_{i} v_{j} ⟩

is the average correlation between the

i_{t h}

and

j_{t h}

P-Bits in the given data distribution, and

⟨ m_{i} m_{j} ⟩

is the correlation of the outputs sampled from P-Bit i and P-Bit j in Ising computing,

λ

is the regulation parameter, and

ϵ

is the learning rate. In the training process, the

W_{i j}

will be learned from

⟨ m_{i} m_{j} ⟩

, while the

B_{i}

in the weight matrix corresponds to

⟨ m_{i} ⟩

Equation (13) can only train the required weight matrix for a specific problem. For another problem, it needs to retrain the other weight matrix, which requires more time consumption, and there is a lack of transferability and scalability. However, our objective is to extract the $α$ and $Δ V$ of each P-Bit device in the real array just once, thinking of them as the inherent attributes of each P-Bit, then use the weight compensation method to solve diverse unconventional problems. Such a process is transferable without complicatedly training again.

3.1. Automatic Extraction Algorithm Based on Boltzmann Machine Learning

To extract the P-Bit variation by Boltzmann machine learning, we need to observe and analyze the impact of $α$ and $Δ V$ on the data distributions in Ising computation. Let us treat $Δ V$ first. As shown in Figure 8, the red Sigmoid curve represents the ideal P-Bit, the black curve is P-Bit 1, and the blue curve is P-Bit 2. When $Δ V > 0$ , the ideal Sigmoid curve rigidly shifts to the left; all points of this curve have also moved upward for the input voltage. Consequently, the $⟨ m ⟩$ increases from $⟨ m_{i d e a l} ⟩$ to $⟨ m_{1} ⟩$ . Conversely, when $Δ V < 0$ , the $⟨ m ⟩$ will decrease from $⟨ m_{i d e a l} ⟩$ to $⟨ m_{2} ⟩$ . Based on this observation, in instances where the $⟨ m ⟩$ is larger than $⟨ m_{i d e a l} ⟩$ during the computational process, it is recommended to diminish the $Δ V$ in the learning, and vice versa.

As depicted in Figure 9, the $α$ of the ideal P-Bit is 1, the Sigmoid curve of P-Bit 1 with $α < 1$ is stretched, and the Sigmoid curve of P-Bit 2 with $α > 1$ is compressed. When $W_{i k} > 0$ , the interaction between P-Bit i and P-Bit k in Ising computing is positive, then the state of two P-Bits are the same direction. For P-Bit 1, $α < 1$ , the absolute value of the $⟨ m_{1} ⟩$ decreased, which resulted in the decrease in $⟨ m_{1} m_{k} ⟩$ from ${⟨ m_{1} m_{k} ⟩}_{i d e a l}$ . For P-Bit 2, $α > 1$ , the absolute value of the $⟨ m_{2} ⟩$ increased, which resulted in the increase in $⟨ m_{2} m_{k} ⟩$ from ${⟨ m_{1} m_{k} ⟩}_{i d e a l}$ . Building on this insight, in cases where the $⟨ m_{i} m_{k} ⟩$ is larger than ${⟨ m_{i} m_{k} ⟩}_{i d e a l}$ , it is advisable to reduce the $α_{i}$ in the Boltzmann machine learning. Furthermore, in cases where the $⟨ m_{i} m_{k} ⟩$ is smaller than ${⟨ m_{i} m_{k} ⟩}_{i d e a l}$ , $α_{i}$ should be increased in the Boltzmann machine learning. When $W_{i k} < 0$ , the above observation will be opposite.

Based on the analysis above, we propose the extraction algorithm for P-Bit array variation. As shown in Figure 10, firstly, the density matrix D is calculated as follows [36]:

(14) $D = X^{T} X / d$

where X is the dataset corresponding to the problem, and d is the order of the matrix. Density matrix D describes the average probability distribution of the correlation between P-Bits. For instance, the

D_{1, 2}

is actually the average correlation

⟨ v_{1} v_{2} ⟩

in Equation (13), which signifies that the desired ideal value of

⟨ m_{1} m_{2} ⟩

in Boltzmann machine learning is

D_{1, 2}

From the perspective of a whole P-Bit array, the $α_{i}$ will affect the correlation between the P-Bit i and other P-Bits; therefore, the compensation parameter $C_{α_{i}}$ will be learned from the sum of average correlation $\sum_{j} ⟨ m_{i} m_{j} ⟩$ , while the compensation parameter $C_{Δ V_{i}}$ is learned from $⟨ m_{i} ⟩$ . After learning of the compensation parameter reaching convergence, the variation of each P-Bit device in the array can be extracted as $α = 1 / C_{α}$ , $Δ V = - C_{Δ V} / C_{α}$ .

3.2. Inefficiency Extraction Based on Traditional Logic Gate

To verify the feasibility of the proposed extraction algorithm, we performed the extraction of the P-Bit array variations on the AND gate. Initially, we can calculate the density matrix D for the AND gate based on the truth table in Table 1 and Equation (14).

(15) $D_{A N D} = [\begin{matrix} 1 & 0 & 0.5 & 0 \\ 0 & 1 & 0.5 & 0 \\ 0.5 & 0.5 & 1 & - 0.5 \\ 0 & 0 & - 0.5 & 1 \end{matrix}]$

The density matrix D describes the ideal data distribution for the AND gate. In the learning of $α$ and $Δ V$ , our aim is to align the sampled data distribution in Ising computing with the ideal distribution by updating the compensation parameters $C_{α}$ and $C_{Δ V}$ . Finally, we can obtain the $α$ and $Δ V$ .

Figure 11 displays the outcomes of our proposed extraction algorithm applied to an AND gate. We randomly choose three fabricated SOT P-Bits of different variations. As depicted in Figure 11a, $α_{i}$ corresponds to P-Bit i, and the dash line means the real value of P-Bit variation. The training trends of $α_{1}$ and $α_{2}$ are correct, and the extracted values of $α_{1}$ and $α_{2}$ are somewhat close to their respective real values. However, the extracted $α_{3}$ is inaccurate. The extracted $Δ V$ results of P-Bit 1 and P-Bit 2 shown in Figure 11b are relatively aligned with the real value, while the inaccuracy of extracted $α_{3}$ has a detrimental impact on the $Δ V_{3}$ , causing the extracted $Δ V_{3}$ to slightly deviate from the real value. It can be explained that the $Δ V_{3}$ is calculated as $Δ V_{3} = - C_{Δ V_{3}} / C_{α_{3}}$ .

The proposed extraction algorithm has demonstrated effectiveness in the AND gate. However, it has not yet reached a high level of accuracy. The performance of extracted $α$ is subpar, which significantly influences the training process for $Δ V$ . We shall carefully examine the reasons for the inaccurate training of $α$ in the AND gate.

In the training rule of $α$ displayed in Figure 10, $α_{i}$ is learned from all $⟨ m_{i} m_{j} ⟩$ in the $i_{t h}$ row of density matrix. Nevertheless, when $W_{i j}$ in weight matrix comprises both positive and negative values, the training of $α_{i}$ will be affected. This occurs due to the contrasting training directions of $W_{i j} > 0$ and $W_{i j} < 0$ as analyzed in Figure 9.

Based on the preceding analysis, the weight matrix from the Ising Hamiltonian will notably influence the extraction of $α$ and $Δ V$ . As a consequence, a suitable Ising Hamiltonian for the proposed extraction algorithm should be meticulously constructed. Firstly, the Hamiltonian we construct needs to be scalable for extending the variation extraction algorithm to larger P-Bit arrays. Secondly, the corresponding density matrix of this Hamiltonian should be sparse to facilitate simpler and more accurate extraction. From the density matrix of the AND gate, we observe reduced accuracy in extraction when the density matrix is uneven. Hence, finally, a homogeneous density matrix is essential, showcasing advantages in training convergence rate.

3.3. 3D Ferromagnetic Hamiltonian Model for Scalable Large P-Bit Array

The Ising model, renowned for its depiction of ferromagnetism, prompts a pertinent inquiry: can this ferromagnetic model be harnessed for variation extraction purposes? Drawing inspiration from this, we have formulated a succinct and scalable ferromagnetic model. Within this constructed regular ferromagnetic lattice, spins exclusively interact with their immediate neighboring spins, ensuring the overall sparsity of the system. Notably, the density matrix associated with our developed ferromagnetic model adheres to uniformity required in the aforementioned criteria.

The constructed ferromagnetic Hamiltonian model has three categories of dimension: 1D, 2D, and 3D. Figure 12 displays three connection types for the ferromagnetic model. All P-Bit states in the ferromagnetic model exhibit a uniform orientation, and each P-Bit simultaneously interacts with $2 \times D i m e n s i o n$ adjacent P-Bits. For example, the P-Bit in the 3D ferromagnetic model has six adjacent P-Bits. The coupling strength J between P-Bits is the same, which means that every P-bit is equivalent. The right side gives the weight matrix and density matrix for the ferromagnetic model. In the weight matrix, the total number of 1 in each row is $2 \times D i m e n s i o n$ , which corresponds to the number of adjacent P-Bits for each P-Bit. The 1 which situated on the last position of the anti-diagonal in the matrix signifies the presence of periodic boundary conditions. The bias B on the diagonal of matrix are all 0. For the calculated density matrix, the total number of 1 in each row is $2 \times D i m e n s i o n + 1$ , since the density matrix has an order higher than the weight matrix, and the diagonal of density matrix is 1. The last column in matrix is 0, which infers that the ideal mean value of P-Bit state in the ferromagnetic Hamiltonian model is 0.

The three types of dimension for the ferromagnetic Hamiltonian model we have developed are meticulously designed to address the critical aspect centered on the variation extraction algorithm highlighted in the preceding analysis. However, ascertaining the dimensionality of the ferromagnetic model that is most compatible with the extraction algorithm requires empirical verification. To initiate this validation process, we measured the standard deviation of the variation for 600 fabricated P-Bit devices. The standard deviation of $α$ is approximately 0.3, and for $Δ V$ , it is around 1 V. Utilizing this level of P-Bit variation, we proceeded to generate a substantial number of P-Bit devices with diverse variations in simulations. In these simulations, the $α$ follows a Gaussian distribution with a mean value of 1 and a standard deviation of 0.3, while that of the $Δ V$ are 0 and 1 V.

As shown in Figure 13, based on these P-Bit devices, we constructed three kinds of P-Bit arrays including 1000 devices, respectively, utilizing 1D, 2D, and 3D ferromagnetic models (FM) to perform variation extraction. In Figure 13, different colors correspond to different P-Bits, where the solid lines demonstrate the evolving trends of extracted $α$ and $Δ V$ with the number of iterations, while the dash lines represent their real values. For the 1D-FM, there is a moderate deviation between the extracted values of $α$ and the real values, while the extraction direction is accurate. The extracted values of $Δ V$ are relatively close to the real values. In the case of 2D-FM, the training outcomes for the difficult $α$ are satisfactory, with the extracted variation being relatively consistent with the real value. Furthermore, the training result for $Δ V$ is highly precise, nearly identical to the real value. With regard to 3D-FM, the learning results are further improved compared to 2D-FM. The extracted values of $α$ are almost identical to the real value, which is remarkable for the challenging extraction of $α$ . As for $Δ V$ , the extracted values perfectly align with the real values.

We conducted a detailed analysis of the statistical standard deviation exhibited by the extraction algorithm across various dimensional ferromagnetic models. As illustrated in Table 2, the implementation of the extraction algorithm based on the ferromagnetic model notably mitigates device variation levels within extensive P-Bit arrays. Of particular significance is the 3D-FM model, which achieves a noteworthy reduction in the standard deviation of $α$ to an impressively low 0.032 and $Δ V$ to an exceptionally minimal 0.034 V. The specific Sigmoid curves before and after extraction are further provided. As shown in Figure 14, the distribution of curves without extraction is quite broad, whereas the extraction based on 3D-FM can reduce the variation level to a sufficiently low range. As shown at the orange point in Figure 6, this variation can achieve accurate Ising computing. After the above comparison, the final results indicate that the extraction algorithm based on 3D-FM can most accurately extract $α$ and $Δ V$ .

3.4. Discussions of the Automatic Extraction Algrithm

Upon scrutinizing the variation extraction outcomes performed on P-Bit array in the aforementioned models, a noteworthy observation emerges. Specifically, the extraction challenge associated with $α$ is markedly more formidable than that associated with $Δ V$ . This discrepancy can be attributed to the inherent nature of their relationships with input voltage V, as delineated in Equation (2). It becomes evident that $α$ exhibits a multiplicative dependence on $V_{i n p u t}$ , while $Δ V$ manifests an additive relationship with the same variable. Delving into the intricacies from the perspective of the weight matrix, the $Δ V_{i}$ singularly impacts the $B_{i}$ corresponding to P-Bit i, whereas $α_{i}$ exerts influence across all the $W_{i j}$ in the $i_{t h}$ row of the weight matrix. This distinction in the impact scope between $α$ and $Δ V$ underlies the observed difference in extraction difficulty. Given the direct influence of $α$ extraction on $Δ V$ in the training process, achieving precise $α$ extraction becomes crucial.

However, within weight matrix of the AND gate, the inconsistency of $W_{i j}$ results in a non-uniform influence of $α$ on weights. This factor collectively affects the training of $α$ . Challenges emerge due to the inherent extraction inaccuracies of $α$ and the non-scalable nature ingrained within the AND gate. Although addressing the accuracy of $α$ training within the AND gate is achievable, achieving seamless scalability presents challenges. Cascading logic gates amplify the complexity of the weight matrix, rendering the training process notably intricate.

Building upon the aforementioned analysis, we have devised the ferromagnetic model characterized by uniform W, B, and inherent scalability. Through the training of 1D, 2D, and 3D ferromagnetic models, a consistent improvement in training effectiveness is observed with the increasing dimensionality, particularly notable in the case of 3D-FM. In this scenario, the proposed extraction algorithm simultaneously and adeptly captures all variations in P-Bits.

The enhanced performance in higher-dimensional ferromagnetic models can be attributed to their increased stability. According to the Landau theorem, 1D ferromagnetic materials are significantly influenced by thermal fluctuations, thus making it challenging for them to exhibit ferromagnetism. In 2D ferromagnetic materials, the quasi-long-range order is lost above the critical temperature. In contrast, the 3D ferromagnetic materials possess higher degrees of freedom, experiencing less influence from entropy effects, thus exhibiting greater stability. In a more stable 3D ferromagnetic Hamiltonian model, the extraction of $α$ becomes notably clearer and more accurate.

It is worth noting that in a digital system involving both read and write processes, our method may not be more efficient than direct exponential fitting. However, in an analog system, the situation changes because convenient read and write operations are absent. Moreover, digital systems have two significant drawbacks compared to analog systems. (1) Power consumption: Digital systems using FPGA and DAC consume considerable power. (2) Speed: Digital systems require clocks for read and write operations, which means that the flipping speed of P-Bit devices is limited by the FPGA clock. The speed advantage of stochastic MTJ (ns) might be compromised by the read operations in digital systems (see Supplementary Materials in [40]). In contrast, an analog Ising system composed of resistors/capacitances networks [42,43] or tunable logic gates will have lower energy consumption as it does not require application-specific integrated circuits (ASICs) and high-precision DACs. Regarding speed, the advantage of ultra-fast stochastic MTJs [44] will be realized because the spins in an analog system can update asynchronously. The local minima in Ising computations will be mitigated by the speed differences between MTJs [29,30]. However, analog Ising systems face calibration challenges since the Sigmoid curve of P-Bits cannot be acquired directly. Therefore, our method is designed to be more suitable for analog Ising systems, where read and write operations are not feasible and direct exponential fitting cannot be performed. In such cases, our method helps extract the individual variations of each device and uses a weight compensation algorithm for correction, thereby achieving accurate results in Ising computing.

4. Transferable and Scalable Ising Computations on Large P-Bit Array with Variations

In addressing the challenge of device-to-device variations on P-Bit array within Ising computations, our proposed approach involves the utilization of both a weight matrix compensation method and an variation extraction algorithm. A crucial sequence for practical implementation becomes apparent: Extraction and Compensation. Specifically, for a substantial P-Bit array with variation, we initially employ the proposed extraction algorithm to extract the variations of each P-Bit, and store them for later use. Owing to the simplicity and scalability of the 3D ferromagnetic model employed in the extraction algorithm, the extraction based P-Bit array can be greatly expanded. Given the invariant Sigmoid curve of P-Bit devices, their $α$ and $Δ V$ remain constant. In essence, after performing the extraction algorithm once, we can consider the variations as inherent attributes of the P-Bits. Subsequently, we can utilize these variations in conjunction with weight matrix compensation to tackle diverse combinatorial optimization problems. Due to the transferability inherent in the weight compensation method, repetitive training is unnecessary. By incorporating specific compensation parameters denoted as $C_{α}$ and $C_{Δ V}$ into the weight matrix, the real P-Bits with variations can be treated as ideal P-Bits without variations for computational purposes. Hence, we can achieve transferable and scalable Ising computation using a large P-Bit array with variation. In the following, we will address the solutions to the two classical problems in probabilistic computation, TSP and semi-prime factorization. As the scale increases, the variation requirements for P-Bit devices also become more stringent. Therefore, validating our algorithm on these two large-scale problems further demonstrates its effectiveness and superiority.

4.1. Successful Solution for 16-City Traveling Salesman Problem

To validate the feasibility, transferability, and scalability of our proposed extraction algorithm and weight compensation method, we employed a large-scale P-Bit array to solve the traveling salesman problem (TSP). It is a challenging NP-complete problem that aims at finding the shortest path to visit N cities. The Ising Hamiltonian of TSP can be written as follows [1]:

(16) $\begin{matrix} H_{T S P} = \sum_{v = 1}^{N} {(1 - \sum_{j = 1}^{N} m_{v, j})}^{2} + \sum_{j = 1}^{N} {(1 - \sum_{v = 1}^{N} m_{v, j})}^{2} \\ + λ_{p} \sum_{u v j} W_{(u v)} m_{u, j} m_{v, j + 1} \end{matrix}$

where m is the visiting matrix, and

m_{v, j}

means that, at the

j_{t h}

time, the selection is made to visit city v. The first constraint indicates that a city can be visited only once; the second constraint implies that each visit should involve only a single city. The third item concerns the calculation of the travel path length, with

λ_{p}

serving as the penalty coefficient, and

W_{u v}

is the distance between city u and city v. It is worth noting that this constraint should take into account the distance between the starting city and the first chosen city, as well as the distance between the last visited city and the starting city, given that the TSP requires returning to the starting point eventually.

For a N-city TSP with given starting point, calculating the optimal tourist route requires ${(N - 1)}^{2}$ P-Bits. As illustrated in Figure 15, we computed a TSP involving 16 cities. Based on the P-Bit array with variations generated in simulations (see Section 3.2), we constructed a 3D ferromagnetic Hamiltonian model, then extracted the variation of each P-Bit from the 225-bit array. By employing the weight compensation method, we could mitigate the device-to-device variations in the computation. With an annealing technique, we finally obtained the correct route.

In Figure 15a,c,e,g, the $β$ represents the inverse temperature. The red point means the starting city is (0, 0). In observing the Automatic Extraction and Compensation P-Bit array, it becomes apparent that, with the progression of annealing, the increment in $β$ gradually directs the salesman to visit each city while progressively reducing the total tour length. Eventually, as the $β$ approaches 5, the Ising system converges to find the shortest and compliant tour length. For comparison, Figure 15g illustrates the inability of the real array with variations to produce the correct outcome when $β$ is set to 5. This highlights the efficacy and precision of our proposed Automatic Extraction and Compensation algorithm. Figure 15b,d,f,h displays the respective visiting matrix m. Figure 16 shows the evolution of the total energy during the annealing process performed by three types of P-Bit array, real, ideal, and Automatic Extraction and Compensation, where the insert figure displays the trend of $β$ with the epochs. We note that the variations present in the real P-Bit array tend to stagnate the Ising system in a local minimum. Nonetheless, the ideal P-Bit array and the real P-Bit array with the Automatic Extraction and Compensation algorithm can facilitate the system’s annealing to the global minimum.

We further evaluated the average path energy of TSP-16 over 100 trials across three different P-Bit arrays. As shown in Table 3, the average path energy differs markedly before and after applying our automatic correction method. The average path energy from the P-Bit array using the Automatic Extraction and Compensation method can be reduced to nearly match that of the ideal array.

4.2. Successful Solution for 21-Bit Integer Factorization

To demonstrate the effectiveness of our proposed algorithm across multiple problems, we further performed a 21-bit semi-prime factorization calculation on a large P-Bit array with variations, successfully factoring 2,035,153. As shown in Figure 17, our proposed Automatic Extraction and weight compensation algorithm can reduce the variation of the P-Bit device array sufficiently to complete the challenging task of semi-prime factorization. This factorization was achieved using the invertible logic method [3,6], which clamps the output of the multiplier to perform semi-prime factorization. According to the invertible logic method, Figure 17a presents the coupling matrix for the factorization of a 21-bit semi-prime 2,035,153. Additionally, we employed the Parallel Tempering (PT) method [45,46], generating multiple replicas at different temperatures to search for the optimal state. As shown in Figure 17b, periodically, these replicas were swapped. These swap stages enable the whole Ising system to escape local optima, ensuring that the lowest energy state operates at the lowest temperature (highest $β$ ). The swap probability between two replicas is written as follows [47]:

(17) $P_{s w a p} = m i n {1, e x p (Δ β (E_{i} - E_{j}))}$

where

E_{i}

E_{j}

are the energies at inverse temperature

β_{i}

β_{j}

. When the energy E is higher at a higher

β

, a lower energy state will be exchanged at this

β

. Figure 17c shows the energy trends of the replicas at different

β

, where different colors represent the energy states at different

β

values. The Ising system continuously searches and swaps low-energy states to the higher

β

, ultimately ensuring that the lowest energy state is maintained at the highest

β

. Figure 17d presents the values of factors A and B during the PT process. After approximately 35,000 iterations, the correct factorization was achieved, 2,035,153

= 2017 \times 1009

The effectiveness of our algorithm has been validated in both the challenging 21-bit semi-prime factorization and the TSP-16 problem performed on a large P-Bit array with variations. It has been demonstrated that the Automatic Extraction and Compensation algorithm can reduce the variations in the P-Bit array to a sufficiently low level to accurately perform complex computations.

The obtained results validate the efficacy of the proposed extraction algorithm and weight compensation method. The individual calibration needs of P-Bit devices are addressed through weight compensation. Due to the transferability inherent in this method, the weight matrix rederivation enables the treatment of real P-Bits as ideal ones for diverse problems, eliminating the necessity for weight matrix retraining. Furthermore, the extraction algorithm, coupled with 3D-FM, accurately extracts all variations within the P-Bit array, demonstrating scalability across array sizes. By clearly following an Extraction–Compensation sequence, a transferable (for any problem) and scalable (for any array size) Ising computation on the P-Bit array with variations is facilitated.

5. Conclusions

In this work, we initially proposed the behavioral model incorporating $α$ and $Δ V$ which fits our fabricated SOT P-Bit devices well. Then we introduced the Automatic Extraction and Compensation algorithm. The Extraction step can extract the variations of all the P-Bit devices across the whole large P-Bit array simultaneously, while the Compensation step mitigates the device variations by modifying the weight matrix according to the extracted variations. Finally, we employed the developed Extraction–Compensation process to facilitate accurate solutions for both 16-city TSP spanning 225 bits and 21-bit semi-prime factorization, thereby confirming the efficiency, transferability, and scalability of our algorithm. The term ’efficiency’ indicates that individual calibration for P-Bits with variations is not required. The term ’transferability’ means that weight matrix retraining for diverse problems is unnecessary. ’Scalability’ refers to the capability of expanding the array scale. The Automatic Extraction and Compensation algorithm enables the accurate and efficient completion of Ising computations on large P-Bit arrays with variations, therefore improving the efficiency, accessibility, and widespread applicability of probabilistic computers.

Author Contributions

Conceptualization, B.Z. and Y.L.; Methodology, B.Z.; Validation, B.Z., Y.L., T.G., J.Y. and Z.G.; Investigation, Y.L., T.G., J.Y. and Z.G.; Writing—original draft, B.Z.; Writing—review & editing, D.Z. and L.Z.; Project administration, D.Z. and L.Z.; Funding acquisition, D.Z. and L.Z. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

View Image - Figure 1. The general overview of fabricated the SOT P-Bit device. (a) The deposited structure of the stack. (b) The SEM image of the SOT MTJ. (c) The operation mechanism of the SOT P-Bit. The double-wall energy barrier can be modulated by SOT voltage.

Figure 1. The general overview of fabricated the SOT P-Bit device. (a) The deposited structure of the stack. (b) The SEM image of the SOT MTJ. (c) The operation mechanism of the SOT P-Bit. The double-wall energy barrier can be modulated by SOT voltage.

Figure 2. The behavioral model can fit the experimentally measured output probability curve very well.

View Image - Figure 3. The experimental Sigmoid curves fitted by the modified behavioral model exhibit large intrinsic variation which includes [Forumla omitted. See PDF.] and [Forumla omitted. See PDF.]. [Forumla omitted. See PDF.] describes stretching or compression in shape and [Forumla omitted. See PDF.] represents rigid shift.

Figure 3. The experimental Sigmoid curves fitted by the modified behavioral model exhibit large intrinsic variation which includes [Forumla omitted. See PDF.] and [Forumla omitted. See PDF.]. [Forumla omitted. See PDF.] describes stretching or compression in shape and [Forumla omitted. See PDF.] represents rigid shift.

Figure 4. The AND gate comprises three P-Bits, and the weight matrix for the AND gate is provided on the right.

View Image - Figure 5. Under three different P-Bit configurations, the computational outcomes of the AND gate are depicted, with red denoting the correct state and blue representing the wrong state. (a) The ideal P-Bit array successfully completes the computation, with an evenly distributed occurrence of the four correct states. (b) The real P-Bit array with variations fails to accurately complete the computation, displaying a significantly low occurrence of three correct states. (c) The real P-Bit array with weight compensation achieves accurate computation akin to the ideal P-Bit array.

Figure 5. Under three different P-Bit configurations, the computational outcomes of the AND gate are depicted, with red denoting the correct state and blue representing the wrong state. (a) The ideal P-Bit array successfully completes the computation, with an evenly distributed occurrence of the four correct states. (b) The real P-Bit array with variations fails to accurately complete the computation, displaying a significantly low occurrence of three correct states. (c) The real P-Bit array with weight compensation achieves accurate computation akin to the ideal P-Bit array.

View Image - Figure 6. The Monte Carlo analysis of [Forumla omitted. See PDF.] and [Forumla omitted. See PDF.] on the KLD for the AND gate. Each pair of different [Forumla omitted. See PDF.] has been sampled 1000 times, and their KLDs are averaged.

Figure 6. The Monte Carlo analysis of [Forumla omitted. See PDF.] and [Forumla omitted. See PDF.] on the KLD for the AND gate. Each pair of different [Forumla omitted. See PDF.] has been sampled 1000 times, and their KLDs are averaged.

View Image - Figure 7. The process of weight matrix compensation. To achieve the same output probability, it is necessary to satisfy that [Forumla omitted. See PDF.]. After the derivation, the new weight and bias matrices incorporating compensation parameters [Forumla omitted. See PDF.] and [Forumla omitted. See PDF.] enable the real P-Bits to compute like the ideal P-Bits.

Figure 7. The process of weight matrix compensation. To achieve the same output probability, it is necessary to satisfy that [Forumla omitted. See PDF.]. After the derivation, the new weight and bias matrices incorporating compensation parameters [Forumla omitted. See PDF.] and [Forumla omitted. See PDF.] enable the real P-Bits to compute like the ideal P-Bits.

Figure 8. The impact of [Forumla omitted. See PDF.] observed on the mean value of P-Bit in Ising computation.

Figure 9. The impact of [Forumla omitted. See PDF.] observed on the correlation between P-Bits in Ising computation.

View Image - Figure 10. Proposed extraction algorithm for P-Bit array variation with given Ising Hamiltonian. It is necessary to initially compute the density matrix D and continuously update the compensation parameters to attain the desired distribution in the sampled Ising computation, ultimately obtaining [Forumla omitted. See PDF.] and [Forumla omitted. See PDF.]. The learning rate [Forumla omitted. See PDF.] used in this algorithm is 0.0001, while the regulation parameter [Forumla omitted. See PDF.] is 0.05.

Figure 10. Proposed extraction algorithm for P-Bit array variation with given Ising Hamiltonian. It is necessary to initially compute the density matrix D and continuously update the compensation parameters to attain the desired distribution in the sampled Ising computation, ultimately obtaining [Forumla omitted. See PDF.] and [Forumla omitted. See PDF.]. The learning rate [Forumla omitted. See PDF.] used in this algorithm is 0.0001, while the regulation parameter [Forumla omitted. See PDF.] is 0.05.

View Image - Figure 11. The learning results of presented extraction algorithm on the AND gate. The solid lines display the respective extracted value of P-Bits, while the dash lines represent the real value. (a) The extracted [Forumla omitted. See PDF.] for AND. (b) The extracted [Forumla omitted. See PDF.] for AND.

Figure 11. The learning results of presented extraction algorithm on the AND gate. The solid lines display the respective extracted value of P-Bits, while the dash lines represent the real value. (a) The extracted [Forumla omitted. See PDF.] for AND. (b) The extracted [Forumla omitted. See PDF.] for AND.

View Image - Figure 12. Constructed 1D, 2D, and 3D ferromagnetic Hamiltonian models for the extraction algorithm. Their uniform weight matrix and density matrix are provided on the right.

Figure 12. Constructed 1D, 2D, and 3D ferromagnetic Hamiltonian models for the extraction algorithm. Their uniform weight matrix and density matrix are provided on the right.

View Image - Figure 13. The learning results of the extraction algorithm employed on constructed 1D, 2D, and 3D ferromagnetic models. As the dimension of the ferromagnetic model increases, the accuracy of variation extraction gradually improves. The 3D-FM achieves flawless extraction, with the extracted values perfectly aligning with the real values.

Figure 13. The learning results of the extraction algorithm employed on constructed 1D, 2D, and 3D ferromagnetic models. As the dimension of the ferromagnetic model increases, the accuracy of variation extraction gradually improves. The 3D-FM achieves flawless extraction, with the extracted values perfectly aligning with the real values.

View Image - Figure 14. The Sigmoid curves obtained from P-Bit arrays w/o correction and with extraction. According to Table 2, the [Forumla omitted. See PDF.] of the P-Bit array without extraction is ([Forumla omitted. See PDF.] V), while that of the P-Bit array with extraction based on 3D-FM is ([Forumla omitted. See PDF.] V).

Figure 14. The Sigmoid curves obtained from P-Bit arrays w/o correction and with extraction. According to Table 2, the [Forumla omitted. See PDF.] of the P-Bit array without extraction is ([Forumla omitted. See PDF.] V), while that of the P-Bit array with extraction based on 3D-FM is ([Forumla omitted. See PDF.] V).

View Image - Figure 15. The computing results of 16-city TSP solved by both the Automatic Extraction and Compensation P-Bit array and the real P-Bit array with variations. (a,c,e) The route of the 16-city TSP based on the Automatic Extraction and Compensation array from the annealing process at [Forumla omitted. See PDF.]. (g) The route of the real P-Bit array at [Forumla omitted. See PDF.]. As [Forumla omitted. See PDF.] gradually increased to a sufficient level, the Ising system ultimately computed the shortest and compliant tour length. For comparison, we observed that at [Forumla omitted. See PDF.], the real array failed to yield the correct results. (b,d,f,h) Their respective visiting matrix m.

Figure 15. The computing results of 16-city TSP solved by both the Automatic Extraction and Compensation P-Bit array and the real P-Bit array with variations. (a,c,e) The route of the 16-city TSP based on the Automatic Extraction and Compensation array from the annealing process at [Forumla omitted. See PDF.]. (g) The route of the real P-Bit array at [Forumla omitted. See PDF.]. As [Forumla omitted. See PDF.] gradually increased to a sufficient level, the Ising system ultimately computed the shortest and compliant tour length. For comparison, we observed that at [Forumla omitted. See PDF.], the real array failed to yield the correct results. (b,d,f,h) Their respective visiting matrix m.

View Image - Figure 16. The total energy of annealing process for 3 situations of the P-Bit array. The system of the real P-Bit array with variation is trapped in a local minimum, while the ideal P-Bit array and the real P-Bit array with the Automatic Extraction and Compensation algorithm can facilitate the systems annealing to the global minimum.

Figure 16. The total energy of annealing process for 3 situations of the P-Bit array. The system of the real P-Bit array with variation is trapped in a local minimum, while the ideal P-Bit array and the real P-Bit array with the Automatic Extraction and Compensation algorithm can facilitate the systems annealing to the global minimum.

View Image - Figure 17. (a) The coupling matrix for the factorization of a 21-bit semi-prime. (b) The illustrator for Parallel Tempering. (c) The energy trends of the replicas at different [Forumla omitted. See PDF.]. (d) The trends of factors A and B with increasing iterations.

Figure 17. (a) The coupling matrix for the factorization of a 21-bit semi-prime. (b) The illustrator for Parallel Tempering. (c) The energy trends of the replicas at different [Forumla omitted. See PDF.]. (d) The trends of factors A and B with increasing iterations.

Table 1

Truth table for AND gate.

A	B	$C_{out}$	$P_{ideal}$
0	0	0	0.25
0	1	0	0.25
1	0	0	0.25
1	1	1	0.25

Table 2

On different dimensional ferromagnetic models, we measured the statistical standard deviation of the variation extraction algorithm. The ’w/o extraction’ signifies the initial deviation predetermined in our simulations. The statistical standard deviation of the extraction algorithm is computed as follows: $σ_{α} = σ (α_{e x t r} - α_{r e a l})$ , $σ_{Δ V} = σ (Δ V_{e x t r} - Δ V_{r e a l})$ .

	w/o Extraction	1D-FM	2D-FM	3D-FM
$σ_{α}$	0.3	0.116	0.061	0.032
$σ_{Δ V}$	1 V	0.143 V	0.063 V	0.034 V

Table 3

The average path energy of 100 trials for TSP16 across 3 P-Bit arrays. The average path energy from the Automatic Extraction and Compensation P-Bit array can be brought very close to that of the ideal array.

P-Bit Array	Ideal	Real	Automatic Extraction and Compensation
Average Path Energy	17.837	34.080	18.080

References

1. Sutton, B.; Camsari, K.Y.; Behin-Aein, B.; Datta, S. Intrinsic optimization using stochastic nanomagnets. Sci. Rep.; 2017; 7, 44370. [DOI: https://dx.doi.org/10.1038/srep44370] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28295053]

2. Borders, W.A.; Pervaiz, A.Z.; Fukami, S.; Camsari, K.Y.; Ohno, H.; Datta, S. Integer factorization using stochastic magnetic tunnel junctions. Nature; 2019; 573, pp. 390-393. [DOI: https://dx.doi.org/10.1038/s41586-019-1557-9] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31534247]

3. Aadit, N.A.; Grimaldi, A.; Carpentieri, M.; Theogarajan, L.; Martinis, J.M.; Finocchio, G.; Camsari, K.Y. Massively parallel probabilistic computing with sparse Ising machines. Nat. Electron.; 2022; 5, pp. 460-468. [DOI: https://dx.doi.org/10.1038/s41928-022-00774-2]

4. Camsari, K.Y.; Salahuddin, S.; Datta, S. Implementing p-bits with embedded MTJ. IEEE Electron Device Lett.; 2017; 38, pp. 1767-1770. [DOI: https://dx.doi.org/10.1109/LED.2017.2768321]

5. Camsari, K.Y.; Sutton, B.M.; Datta, S. P-bits for probabilistic spin logic. Appl. Phys. Rev.; 2019; 6, 011305. [DOI: https://dx.doi.org/10.1063/1.5055860]

6. Camsari, K.Y.; Faria, R.; Sutton, B.M.; Datta, S. Stochastic p-bits for invertible logic. Phys. Rev. X; 2017; 7, 031014. [DOI: https://dx.doi.org/10.1103/PhysRevX.7.031014]

7. Yin, J.; Liu, Y.; Zhang, B.; Du, A.; Gao, T.; Ma, X.; Dong, Y.; Bai, Y.; Lu, S.; Zhuo, Y. et al. Scalable Ising computer based on ultra-fast field-free spin orbit torque stochastic device with extreme 1-bit quantization. Proceedings of the 2022 International Electron Devices Meeting (IEDM); San Francisco, CA, USA, 3–7 December 2022; IEEE: New York, NY, USA, 2022; pp. 36.1.1-36.1.4.

8. Wang, T.; Roychowdhury, J. Oscillator-based Ising machine. arXiv; 2017; arXiv: 1709.08102

9. Dutta, S.; Khanna, A.; Assoa, A.; Paik, H.; Schlom, D.G.; Toroczkai, Z.; Raychowdhury, A.; Datta, S. An Ising Hamiltonian solver based on coupled stochastic phase-transition nano-oscillators. Nat. Electron.; 2021; 4, pp. 502-512. [DOI: https://dx.doi.org/10.1038/s41928-021-00616-7]

10. Albertsson, D.I.; Zahedinejad, M.; Houshang, A.; Khymyn, R.; Åkerman, J.; Rusu, A. Ultrafast Ising Machines using spin torque nano-oscillators. Appl. Phys. Lett.; 2021; 118, 112404. [DOI: https://dx.doi.org/10.1063/5.0041575]

11. Houshang, A.; Zahedinejad, M.; Muralidhar, S.; Chȩciński, J.; Khymyn, R.; Rajabali, M.; Fulara, H.; Awad, A.A.; Dvornik, M.; Åkerman, J. Phase-binarized spin Hall nano-oscillator arrays: Towards spin Hall Ising machines. Phys. Rev. Appl.; 2022; 17, 014003. [DOI: https://dx.doi.org/10.1103/PhysRevApplied.17.014003]

12. McGoldrick, B.C.; Sun, J.Z.; Liu, L. Ising machine based on electrically coupled spin Hall nano-oscillators. Phys. Rev. Appl.; 2022; 17, 014006. [DOI: https://dx.doi.org/10.1103/PhysRevApplied.17.014006]

13. Zhang, B.; Lin, Z.; Liu, Y.; Wang, Y.; Zhang, D.; Gao, T.; Zeng, L. Compact Programmable True Random Number Generator Based on Spin Torque Nano-Oscillator. IEEE Trans. Nanotechnol.; 2022; 21, pp. 648-654. [DOI: https://dx.doi.org/10.1109/TNANO.2022.3216844]

14. Woo, K.S.; Kim, J.; Han, J.; Kim, W.; Jang, Y.H.; Hwang, C.S. Probabilistic computing using Cu0. 1Te0. 9/HfO2/Pt diffusive memristors. Nat. Commun.; 2022; 13, 5762. [DOI: https://dx.doi.org/10.1038/s41467-022-33455-x] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36180426]

15. Rhee, H.; Kim, G.; Song, H.; Park, W.; Kim, D.H.; In, J.H.; Lee, Y.; Kim, K.M. Probabilistic computing with NbOx metal-insulator transition-based self-oscillatory pbit. Nat. Commun.; 2023; 14, 7199. [DOI: https://dx.doi.org/10.1038/s41467-023-43085-6] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37938550]

16. Wang, Z.; Marandi, A.; Wen, K.; Byer, R.L.; Yamamoto, Y. Coherent Ising machine based on degenerate optical parametric oscillators. Phys. Rev. A; 2013; 88, 063853. [DOI: https://dx.doi.org/10.1103/PhysRevA.88.063853]

17. Marandi, A.; Wang, Z.; Takata, K.; Byer, R.L.; Yamamoto, Y. Network of time-multiplexed optical parametric oscillators as a coherent Ising machine. Nat. Photonics; 2014; 8, pp. 937-942. [DOI: https://dx.doi.org/10.1038/nphoton.2014.249]

18. Inagaki, T.; Haribara, Y.; Igarashi, K.; Sonobe, T.; Tamate, S.; Honjo, T.; Marandi, A.; McMahon, P.L.; Umeki, T.; Enbutsu, K. et al. A coherent Ising machine for 2000-node optimization problems. Science; 2016; 354, pp. 603-606. [DOI: https://dx.doi.org/10.1126/science.aah4243]

19. McMahon, P.L.; Marandi, A.; Haribara, Y.; Hamerly, R.; Langrock, C.; Tamate, S.; Inagaki, T.; Takesue, H.; Utsunomiya, S.; Aihara, K. et al. A fully programmable 100-spin coherent Ising machine with all-to-all connections. Science; 2016; 354, pp. 614-617. [DOI: https://dx.doi.org/10.1126/science.aah5178]

20. Honjo, T.; Sonobe, T.; Inaba, K.; Inagaki, T.; Ikuta, T.; Yamada, Y.; Kazama, T.; Enbutsu, K.; Umeki, T.; Kasahara, R. et al. 100,000-spin coherent Ising machine. Sci. Adv.; 2021; 7, eabh0952. [DOI: https://dx.doi.org/10.1126/sciadv.abh0952]

21. Takemoto, T.; Hayashi, M.; Yoshimura, C.; Yamaoka, M. A 2×30k-Spin Multi-Chip Scalable CMOS Annealing Processor Based on a Processing-in-Memory Approach for Solving Large-Scale Combinatorial Optimization Problems. IEEE J. Solid-State Circuits; 2019; 55, pp. 145-156. [DOI: https://dx.doi.org/10.1109/JSSC.2019.2949230]

22. Yamaoka, M.; Yoshimura, C.; Hayashi, M.; Okuyama, T.; Aoki, H.; Mizuno, H. 24.3 20k-spin Ising chip for combinational optimization problem with CMOS annealing. Proceedings of the 2015 IEEE International Solid-State Circuits Conference-(ISSCC) Digest of Technical Papers; San Francisco, CA, USA, 22–26 February 2015; IEEE: New York, NY, USA, 2015; pp. 1-3.

23. Su, Y.; Kim, H.; Kim, B. CIM-spin: A scalable CMOS annealing processor with digital in-memory spin operators and register spins for combinatorial optimization problems. IEEE J. Solid-State Circuits; 2022; 57, pp. 2263-2273. [DOI: https://dx.doi.org/10.1109/JSSC.2021.3139901]

24. Su, Y.; Kim, T.T.H.; Kim, B. Flexspin: A scalable CMOS Ising machine with 256 flexible spin processing elements for solving complex combinatorial optimization problems. Proceedings of the 2022 IEEE International Solid-State Circuits Conference (ISSCC); San Francisco, CA, USA, 20–26 February 2022; IEEE: New York, NY, USA, 2022; Volume 65, pp. 1-3.

25. Bae, J.; Oh, W.; Koo, J.; Yu, C.; Kim, B. CTLE-Ising: A Continuous-Time Latch-Based Ising Machine Featuring One-Shot Fully Parallel Spin Updates and Equalization of Spin States. IEEE J. Solid-State Circuits; 2023; 59, pp. 173-183. [DOI: https://dx.doi.org/10.1109/JSSC.2023.3318586]

26. Shin, J.H.; Jeong, Y.J.; Zidan, M.A.; Wang, Q.; Lu, W.D. Hardware acceleration of simulated annealing of spin glass by RRAM crossbar array. Proceedings of the 2018 IEEE International Electron Devices Meeting (IEDM); San Francisco, CA, USA, 1–5 December 2018; IEEE: New York, NY, USA, 2018; pp. 3.3.1-3.3.4.

27. Abeed, M.A.; Bandyopadhyay, S. Low energy barrier nanomagnet design for binary stochastic neurons: Design challenges for real nanomagnets with fabrication defects. IEEE Magn. Lett.; 2019; 10, 4504405. [DOI: https://dx.doi.org/10.1109/LMAG.2019.2929484]

28. Rahman, R.; Bandyopadhyay, S. The Strong Sensitivity of the Characteristics of Binary Stochastic Neurons Employing Low Barrier Nanomagnets to Small Geometrical Variations. IEEE Trans. Nanotechnol.; 2023; 22, pp. 112-119. [DOI: https://dx.doi.org/10.1109/TNANO.2023.3244139]

29. Zhang, B.; Liu, Y.; Gao, T.; Zhang, D.; Zhao, W.; Zeng, L. Time Division Multiplexing Ising Computer Using Single Tunable True Random Number Generator Based on Spin Torque Nano-Oscillator. Proceedings of the 2021 IEEE International Electron Devices Meeting (IEDM); San Francisco, CA, USA, 11–16 December 2021; IEEE: New York, NY, USA, 2021; pp. 27.6.1-27.6.4.

30. Liu, Y.; Gao, T.; Zhang, B.; Wang, Y.; Zhang, D.; Zeng, L. Time-division multiplexing Ising computer using single stochastic magnetic tunneling junction. IEEE Trans. Electron Devices; 2022; 69, pp. 4700-4707. [DOI: https://dx.doi.org/10.1109/TED.2022.3184651]

31. Lv, Y.; Bloom, R.P.; Wang, J.P. Experimental demonstration of probabilistic spin logic by magnetic tunnel junctions. IEEE Magn. Lett.; 2019; 10, 4510905. [DOI: https://dx.doi.org/10.1109/LMAG.2019.2957258]

32. International Neural Network Society (INNS)IEEE Neural Network Council Cooperating Societies Osborn, T.R. Fast teaching of Boltzmann machines with local inhibition. Proceedings of the International Neural Network Conference, Palais Des Congres; Paris, France, 9–13 July 1990; Springer: Dordrecht, The Netherlnds, 1990; 785.

33. Salakhutdinov, R.; Hinton, G. Deep Boltzmann machines. Proceedings of the Artificial Intelligence and Statistics, PMLR; Clearwater Beach, FL, USA, 16–18 April 2009; pp. 448-455.

34. Tsai, C.H.; Yu, W.J.; Wong, W.H.; Lee, C.Y. A 41.3/26.7 pJ per neuron weight RBM processor supporting on-chip learning/inference for IoT applications. IEEE J. Solid-State Circuits; 2017; 52, pp. 2601-2612. [DOI: https://dx.doi.org/10.1109/JSSC.2017.2715171]

35. Patel, S.; Canoza, P.; Salahuddin, S. Logically synthesized and hardware-accelerated restricted Boltzmann machines for combinatorial optimization and integer factorization. Nat. Electron.; 2022; 5, pp. 92-101. [DOI: https://dx.doi.org/10.1038/s41928-022-00714-0]

36. Kaiser, J.; Borders, W.A.; Camsari, K.Y.; Fukami, S.; Ohno, H.; Datta, S. Hardware-aware in situ learning based on stochastic magnetic tunnel junctions. Phys. Rev. Appl.; 2022; 17, 014016. [DOI: https://dx.doi.org/10.1103/PhysRevApplied.17.014016]

37. Ackley, D.H.; Hinton, G.E.; Sejnowski, T.J. A learning algorithm for Boltzmann machines. Cogn. Sci.; 1985; 9, pp. 147-169.

38. Neal, R.M. Connectionist learning of belief networks. Artif. Intell.; 1992; 56, pp. 71-113. [DOI: https://dx.doi.org/10.1016/0004-3702(92)90065-6]

39. Faria, R.; Kaiser, J.; Camsari, K.Y.; Datta, S. Hardware design for autonomous bayesian networks. Front. Comput. Neurosci.; 2021; 15, 584797. [DOI: https://dx.doi.org/10.3389/fncom.2021.584797] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33762919]

40. Si, J.; Yang, S.; Cen, Y.; Chen, J.; Huang, Y.; Yao, Z.; Kim, D.J.; Cai, K.; Yoo, J.; Fong, X. et al. Energy-efficient superparamagnetic Ising machine and its application to traveling salesman problems. Nat. Commun.; 2024; 15, 3457. [DOI: https://dx.doi.org/10.1038/s41467-024-47818-z]

41. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat.; 1951; 22, pp. 79-86. [DOI: https://dx.doi.org/10.1214/aoms/1177729694]

42. Dutta, S.; Khanna, A.; Gomez, J.; Ni, K.; Toroczkai, Z.; Datta, S. Experimental demonstration of phase transition nano-oscillator based Ising machine. Proceedings of the 2019 IEEE International Electron Devices Meeting (IEDM); San Francisco, CA, USA, 7–11 December 2019; IEEE: New York, NY, USA, 2019; pp. 37-38.

43. Danouchi, K.; Soumah, L.; Bouchard, C.; Disdier, F.; Fassatoui, A.; Phan, N.T.; Ezzadeen, M.; Delaet, B.; Viala, B.; Prenat, G. et al. Designing networks of resistively-coupled stochastic Magnetic Tunnel Junctions for energy-based optimum search. Proceedings of the 2023 International Electron Devices Meeting (IEDM); San Francisco, CA, USA, 9–13 December 2023; IEEE: New York, NY, USA, 2023; pp. 1-4.

44. Hayakawa, K.; Kanai, S.; Funatsu, T.; Igarashi, J.; Jinnai, B.; Borders, W.; Ohno, H.; Fukami, S. Nanosecond random telegraph noise in in-plane magnetic tunnel junctions. Phys. Rev. Lett.; 2021; 126, 117202. [DOI: https://dx.doi.org/10.1103/PhysRevLett.126.117202] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33798384]

45. Wang, C.; Hyman, J.D.; Percus, A.; Caflisch, R. Parallel tempering for the traveling salesman problem. Int. J. Mod. Phys. C; 2009; 20, pp. 539-556. [DOI: https://dx.doi.org/10.1142/S0129183109013893]

46. Aadit, N.A.; Grimaldi, A.; Carpentieri, M.; Theogarajan, L.; Finocchio, G.; Camsari, K.Y. Computing with invertible logic: Combinatorial optimization with probabilistic bits. Proceedings of the 2021 IEEE International Electron Devices Meeting (IEDM); San Francisco, CA, USA, 11–16 December 2021; IEEE: New York, NY, USA, 2021; pp. 40-43.

47. Aadit, N.A.; Mohseni, M.; Camsari, K.Y. Accelerating Adaptive Parallel Tempering with FPGA-based p-bits. Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits); Kyoto, Japan, 11–16 June 2023; IEEE: New York, NY, USA, 2023; pp. 1-2.

Word count: 8743

Show less

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Translate

A Probabilistic Bit (P-Bit) device serves as the core hardware for implementing Ising computation. However, the severe intrinsic variations of stochastic P-Bit devices hinder the large-scale expansion of the P-Bit array, significantly limiting the practical usage of Ising computation. In this work, a behavioral model which attributes P-Bit variations to two parameters, $α$ and $Δ V$ , is proposed. Then the weight compensation method is introduced, which can mitigate $α$ and $Δ V$ of P-Bit device variations by rederiving the weight matrix, enabling them to compute as ideal identical P-Bits without the need for weights retraining. Accurately extracting the $α$ and $Δ V$ simultaneously from a large P-Bit array which is prerequisite for the weight compensation method is a crucial and challenging task. To solve this obstacle, we present the novel automatic variation extraction algorithm which can extract device variations of each P-Bit in a large array based on Boltzmann machine learning. In order for the accurate extraction of variations from an extendable P-Bit array, an Ising Hamiltonian based on a 3D ferromagnetic model is constructed, achieving precise and scalable array variation extraction. The proposed Automatic Extraction and Compensation algorithm is utilized to solve both 16-city traveling salesman problem (TSP) and 21-bit integer factorization on a large P-Bit array with variation, demonstrating its accuracy, transferability, and scalability.

Details

Title

Automatic Extraction and Compensation of P-Bit Device Variations in Large Array Utilizing Boltzmann Machine Training

Author

Zhang, Bolin¹

; Liu, Yu²; Gao, Tianqi¹; Yin, Jialiang¹; Guan, Zhenyu³; Zhang, Deming¹

; Lang, Zeng¹

¹ National Key Laboratory of Spintronics, Hangzhou International Innovation Institute, Beihang University, Hangzhou 311115, China; [email protected] (B.Z.); [email protected] (Y.L.); [email protected] (T.G.); [email protected] (J.Y.); Fert Beijing Institute, MIIT Key Laboratory of Spintronics, School of Integrated Circuit Science and Engineering, Beihang University, Beijing 100191, China
² National Key Laboratory of Spintronics, Hangzhou International Innovation Institute, Beihang University, Hangzhou 311115, China; [email protected] (B.Z.); [email protected] (Y.L.); [email protected] (T.G.); [email protected] (J.Y.); School of Cyber Science and Technology, Beihang University, Beijing 100191, China; [email protected]
³ School of Cyber Science and Technology, Beihang University, Beijing 100191, China; [email protected]

First page

133

Publication year

2025

Publication date

2025

Publisher

MDPI AG

e-ISSN

2072666X

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/mi16020133

ProQuest document ID

3171136360

Automatic Extraction and Compensation of P-Bit Device Variations in Large Array Utilizing Boltzmann Machine Training

Jump to:

Full text

2. Weight Matrix Rederivation to Compensate P-Bit Array Variation

2.1. Detrimental Impact of P-Bit Array Device Variation on Ising Computation

2.2. Transferable Compensation for P-Bit Array Variation by Rederiving Weight Matrix

3. Simultaneous Device Variation Extraction for Whole P-Bit Array

3.1. Automatic Extraction Algorithm Based on Boltzmann Machine Learning

3.2. Inefficiency Extraction Based on Traditional Logic Gate

3.3. 3D Ferromagnetic Hamiltonian Model for Scalable Large P-Bit Array

4. Transferable and Scalable Ising Computations on Large P-Bit Array with Variations

4.1. Successful Solution for 16-City Traveling Salesman Problem

4.2. Successful Solution for 21-Bit Integer Factorization

Abstract

Details

Suggested sources