Implementation of Highly Reliable and

Introduction

To achieve high computing throughput for data-intensive real-time tasks (e.g., autonomous driving, virtual reality, and deep learning), in-memory computing (IMC) architectures have become an active research focus.^[^1,2^] The key concept of IMC is the implementation of arithmetic logic units in memory-based hardware to better exploit memory bandwidth and substantially reduce the energy spent on data migration. Consequently, dynamic random access memory (DRAM)- and static random access memory (SRAM)-based IMC architectures have been widely reported.^[^3,4^] However, these designs suffer from inherent shortcomings, such as high standby power and long latencies caused by the necessary write-back operations, which limit their potential.

Emerging nonvolatile memory (eNVM)-based IMCs, owing to their advantageous energy efficiency and compatibility with CMOS process integration, have been extensively investigated,^[^5–11^] including resistive RAM (RRAM, a.k.a. memristor),^[^12,13^] phase-change memory (PCM),^[¹⁴^] ferroelectric RAM (FeRAM) or tunnel junction (FTJ),^[^15,16^] multiferroic material-based resistive memory,^[¹⁷^] and spin-transfer/spin–orbit torque (STT/SOT)-magnetic random access memory (MRAM).^[^18–20^] In particular, SOT-MRAM is considered a promising candidate for IMC, as it demonstrates competitive (sub-nanosecond) switching speeds and almost infinite endurance owing to its decoupled write and read paths. For example, recent reports have revealed the potential of discrete SOT-MRAM devices that perform complete Boolean logic operations^[^21–23^] and neural network computing.^[^24,25^]

However, several issues remain for SOT-MRAMs to be competitive. SOT-driven perpendicular magnetization switching typically requires the simultaneous application of an in-plane external magnetic field, which increases design and operation complexity.^[^26,27^] A third terminal is also necessary for SOT-MRAM cells to supply the in-plane write current, which incurs a cell area penalty compared to STT-MRAM.^[²⁸^]

Furthermore, exploring IMC applications with minimal electronic hardware is paramount, particularly for embedded systems constrained in power and area.^[²⁹^] The binary digital technology of magnetic tunnel junctions (MTJs) has seen steady improvements in device density and cost.^[^30–36^] However, the further miniaturization of SOT-MTJs and access transistors has reached physical limits.^[³⁷^] Hence, developing a new type of SOT-MTJ device that provides multilevel states is imperative for addressing the aforementioned issues with high speed, low power, and reduced design complexity in both the SOT-MRAM cell array and the peripheral circuits. Some of the reported multistate realizations are based on in-plane magnetic anisotropy structures or geometrically engineered devices with domain wall motion (DWM), e.g., the experimental demonstration of a two-level device based on DWM in a spin valve,^[³⁸^] a three-level device with a half-ring shape,^[³⁹^] and a four-state MTJ switchable with SOT.^[⁴⁰^] Despite the available reports on pure simulation and prototype DW-MTJ devices,^[^8,41–45^] development is hindered by limits in system integration and operation efficiency. To date, few functional implementations exist of SOT-MTJ devices with reliable and switchable multistates in a synthetic, CMOS-compatible, and field-free integration.

In this study, we experimentally implemented a novel DW-based SOT prototype cell structure whose multiple states correspond to the fraction of the free layer (FL) that is magnetized parallel or antiparallel to the reference layer (RL), to address the data processing parallelism and power overhead issues of binary SOT-MTJ devices. Owing to unique DW nucleation and manipulation mechanisms, reliable multistate operation with field-free deterministic magnetization switching is achieved by tailoring the synergistic effect of SOT and the interfacial Dzyaloshinskii–Moriya interaction (iDMI).^[⁴⁶^] Furthermore, the modulation of the iDMI strength also plays an important role in achieving a lower critical current and a larger switching window.^[⁴⁷^] Such cell structures enable a simplified full adder (FA) implementation with only 1 MTJ device and 30 CMOS transistors, facilitating robust SOT-driven DW-MTJ-based IMC execution. To the best of our knowledge, this is the first study to develop an array structure based on the proposed multistate SOT-DW-MTJ cell and a codesigned device–circuit architecture. Extensive device- and circuit-level simulations of the SOT-DW-MTJ-based quaternary-state device and FA design were conducted using a graphics processing unit-accelerated micromagnetic simulation program and the Cadence simulation toolkit, respectively.^[⁴⁸^] The results demonstrate reliable quaternary states within a practical CMOS-compatible process window. Importantly, the write/read latency and dynamic power of the proposed design can be reduced significantly. Furthermore, the proposed DW-MTJ can perform all 16 Boolean logic functions (including BUF/NOT, OR/NOR, AND/NAND, XOR/XNOR, IMP/NIMP, and RIMP/RNIMP) within a single cycle, illustrating great potential for other nonvolatile in-memory computing (nv-IMC) and non-von Neumann architecture applications.

Results and Discussion

Prototype Multistate SOT-DW Devices via iDMI Modulation

Heavy Metal/Ferromagnet Heterostructure with iDMI Modulation

Figure 1a shows the Kerr hysteresis loops of W(t)/Co₂₀Fe₆₀B₂₀(0.9)/MgO(2)/W(3)/Ru(2) film stack samples measured with a polar magneto-optic Kerr effect (p-MOKE) microscope. The as-grown heavy metal/ferromagnet (HM/FM) heterostructures show strong perpendicular magnetic anisotropy (PMA) at room temperature for all W thicknesses (t). As illustrated in Figure 1c,d, the square out-of-plane (OOP) M–H hysteresis loops from vibrating sample magnetometry (VSM) measurements confirm excellent PMA behavior. The film stacks were subsequently patterned using photolithography and ion milling into Hall bar devices with dimensions of 15 × 50 μm. The PMA characteristics of the devices after fabrication were verified by the square-shaped anomalous Hall effect (AHE) loops (Figure 1e,f). The saturation value of R_xy first increases significantly as the W thickness increases beyond t = 3.6 nm, which is explained by overcoming the dead-layer effect at the W(t)/CoFeB interface.^[⁴⁹^] The subsequent decrease in R_xy with increasing t is ascribed to the shunting effect of the W layer.

Figure 1. Stacked PMA HM/FM heterostructures with modulated iDMI. a) Magneto-optical Kerr hysteresis loops of films with different W thicknesses (t = 3.6–5.2 nm) measured by p-MOKE. The inset shows the schematic of the film stack. b) Down-up domain wall motion velocity versus the impulse H_z field, with domain expansion images from the t = 4.7 nm sample captured over a 4.2 s span under H_z = 38 Oe. The linear fits show that the DW velocity obeys the creep law. c) Out-of-plane and d) in-plane magnetic hysteresis loops measured by VSM. e,f) Transverse Hall resistance R_xy measured via AHE on the patterned Hall bar samples with various W thicknesses under sweeping H_z and H_x fields.

To study the dependence of the iDMI on the W thickness in the developed HM/FM heterostructures, impulse-field-driven DWM was imaged with a spatially resolved p-MOKE technique to track the expansion of the magnetic DW under an increasing OOP magnetic field (H_z), as shown in the inset of Figure 1b. From the extracted DWM velocity (v), Figure 1b plots ln(v) versus H_z^(−1/4) for samples with different W thicknesses. The linear fits show that the DWM dynamics closely obey the creep law in the presence of disorder at low magnetic fields.^[^50,51^] Notably, the iDMI at the interfaces of the developed HM/FM heterostructures depends strongly on the HM (W) thickness and can therefore be precisely modulated, as follows from the relation between the fitted slope and the iDMI constant (D) in the equations below, which are derived from the creep law (Supplementary Note 1, Supporting Information)^[^50,52,53^] [Image Omitted. See PDF] where A, K_eff, and M_s refer to the exchange stiffness constant, effective magnetic anisotropy constant, and saturation magnetization, respectively. α₀ is proportional to the correlation length of the disorder potential and the pinning strength of the disorder (Supplementary Note 1, Supporting Information). Specifically, thinner W provides a larger iDMI constant in the thickness range of 3–5 nm, which is in line with previous reports on W-based HM/FM heterostructure systems.^[^54–56^]
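As an illustration of the fitting step only, the following minimal sketch performs a least-squares fit of ln(v) against H_z^(−1/4), i.e., the generic creep form ln(v) = ln(v₀) − ζ·H_z^(−1/4). The relation between the slope and D is given by the omitted equations and is not reproduced here. The function name, the symbol ζ, and the synthetic data are assumptions for illustration, not values from the measurements.

```python
import math


def fit_creep_law(fields_oe, velocities):
    """Least-squares fit of ln(v) = ln(v0) - zeta * H_z**(-1/4).

    Returns (zeta, ln_v0). Units simply follow the inputs.
    """
    xs = [h ** -0.25 for h in fields_oe]
    ys = [math.log(v) for v in velocities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, intercept


# Synthetic, hypothetical data generated from a known slope (not measured values).
fields = [20.0, 25.0, 30.0, 38.0, 45.0]  # Oe
zeta_true, ln_v0_true = 40.0, 10.0
vels = [math.exp(ln_v0_true - zeta_true * h ** -0.25) for h in fields]

zeta_fit, ln_v0_fit = fit_creep_law(fields, vels)
```

Comparing the fitted slope across samples with different W thicknesses is what allows the thickness dependence of D to be inferred, given the slope-to-D relation of Supplementary Note 1.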

Motivated by this HM thickness-dependent iDMI tunability and by previous reports,^[⁵⁷^] we use the iDMI modulation for effective DW pinning, as demonstrated in the experimental and simulation sections. To examine the HM/FM heterostructure properties and interface quality, a series of structural investigations were performed on the representative W(t = 5.2 nm) film stack samples. Figure 2a,b show the low- and high-resolution transmission electron microscopy (TEM) images of the W(5.2)/Co₂₀Fe₆₀B₂₀(0.9)/MgO(2)/W(3)/Ru(2) film stack samples, revealing a clear crystalline structure and the respective thicknesses of the HM and FM layers. Notably, the high-angle annular dark-field scanning transmission electron microscopy (HAADF-STEM) image in Figure 2c shows sharp interfaces in the HM/FM heterostructures, which is consistent with the energy-dispersive X-ray spectroscopy (EDS) elemental mapping and line-scan species distribution results (Figure 2d,e), demonstrating the high quality of the grown multilayer thin-film stack.

Figure 2. Structural and quality properties. a) TEM and b) HRTEM images of the t = 5.2 nm film stack. Corresponding c) HAADF-STEM image, d) EDS elemental mappings, and e) line-scan distribution profiles.

SOT-Driven Magnetic Switching and Basic Device with Effective DW Pinning

To examine the effectiveness of current-induced deterministic SOT switching, anomalous Hall resistance measurements (schematic in Figure 3b) were performed on the W(t = 5.2 nm) Hall bar devices under various assistive in-plane magnetic fields (H_x). The pulsed current was swept from −20 to 20 mA with a pulse width of 1 ms. As depicted in Figure 3a, current-induced SOT-driven perpendicular magnetization switching occurs when the applied current reaches the critical switching current density (J_c), which varies from 0.8 × 10⁷ to 2.4 × 10⁷ A cm⁻². As expected, J_c decreases as the strength of the symmetry-breaking field H_x increases.

Figure 3. Basic device with effective SOT-driven magnetic switching and DW pinning. a) Current-induced SOT switching under different in-plane magnetic fields and b) schematic measurement geometry setup. c) The optical microscopy image of developed device for multistates modulation by pulsed J. d) The schematic of p-MOKE setup with chip carrier holding for in situ magnetoelectrical transport probing. e) Corresponding domain wall position images under different pulsed J illustrating the state 1, 2, 3, and 4.

To further demonstrate the tunability of the DW position, the PMA W(5)/MgO(2)/Co₂₀Fe₆₀B₂₀(0.9)/W(5.2)/Ru(1.5) stacked films were patterned into a stripe of 2 μm (width) × 50 μm (length) with a 2 μm wide DW pinning center (PC), as shown in Figure 3c. Such PCs are induced by an enhanced iDMI, which is generated by thinning the top W layer using a dedicated ion beam etching (IBE) process and by boosting the iDMI strength at the HM/FM interfaces with strain engineering.^[^58,59^] As previously reported, the iDMI strength can be regulated by changing the distance between HM atomic planes with strong 3d–5d orbital hybridization under strain perpendicular to the film.^[^58,59^] Reliable DW nucleation, motion, and pinning were realized in a controlled manner by applying a specific number of current pulses with J = 0.8 × 10⁷ A cm⁻² without H_x (schematic in Figure 3d), as seen in the differential p-MOKE images (Figure 3e), supporting the validity of the subsequently designed SOT-DW-MTJ devices. By reversing the current polarity, the device can be reset to the initial state, making the process reproducible. Furthermore, the DW remains reliably pinned by a robust energy barrier at room temperature more than 5 months after fabrication for different pinning center gaps (Supplementary Video S1, Supporting Information), further corroborating that the pinning originates from the iDMI rather than from sources such as line edge roughness and magnetocrystalline defects. Such DW pinning stability is also confirmed by temperature-dependent DW pinning simulations up to 450 K without collapse or dissociation (data not shown).

Proposed Scalable SOT-DW-MTJ Device with Tailored iDMI for Multistates

Motivated by the effective DW motion and pinning governed by iDMI modulation in the prototype investigations, we consider scalable SOT-DW-MTJ devices that combine high SOT efficiency with robust DW pinning by tailoring the iDMI at the HM/FM interfaces. We further conducted extensive simulations of the DWM velocity as a function of iDMI strength for different spin Hall angle values, i.e., those of interfaces formed with different HMs (α-W, β-W, β-Ta, Pt, and Au_0.9Ta_0.1) (Figure S1, Supporting Information). We chose β-W as the HM material in our proposed device system for its CMOS process compatibility and its large SOT efficiency for energy-efficient IMC.^[²⁴^]

Considering the prerequisite of CMOS compatibility for codesign simulations, as shown in Figure 4a, the proposed scalable SOT-DW-MTJ device structure consists of a basic film stack of W/CoFeB/MgO/CoFeB/W/Ru with a bottom electrode (BE) and a top electrode (TE). The MgO functions as the barrier layer, isolating the two CoFeB layers with PMA. The upper CoFeB acts as the RL. The lower CoFeB is the magnetic FL, hosting the dynamic movement of the DW driven by SOT with two deterministic collective coordinates, i.e., the DW position and the tilt angle along the $\pm x$ axis. With a rigid domain wall width governed by $\delta_{W} = \sqrt{\frac{A}{K_{u}}}$,^[⁶⁰^] where A and K_u are the exchange stiffness constant and effective magnetic anisotropy of the FL, respectively, the chiral DW is initially formed at the end of the FL in a specifically defined region with a low K_u of 8 × 10⁵ J m⁻³. The β-phase W with a high spin Hall angle (θ_SH) of −0.35 serves as the spin–orbit coupling (SOC) layer and is adjacent to the ferromagnetic FL.^[⁶¹^] When the current (J) is injected into the SOC layer through the BE, a perpendicular spin current is generated in the HM owing to the spin Hall effect and is injected into the ferromagnetic FL, where the exerted SOTs drive the magnetic DWM along J.^[²⁰^] The direction of the magnetic DWM is reversed when the current direction is reversed.
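For orientation, the domain wall width expression above can be evaluated numerically. In the minimal sketch below, K_u = 8 × 10⁵ J m⁻³ is the nucleation-region value quoted in the text, whereas the exchange stiffness A is a hypothetical placeholder, since the actual value is listed in Table S1 and not in this section.

```python
import math


def domain_wall_width_nm(exchange_stiffness_j_per_m, anisotropy_j_per_m3):
    """Return delta_W = sqrt(A / K_u) converted from metres to nanometres."""
    return math.sqrt(exchange_stiffness_j_per_m / anisotropy_j_per_m3) * 1e9


K_U_NUCLEATION = 8e5   # J/m^3, low-K_u nucleation region (from the text)
A_ASSUMED = 1.3e-11    # J/m, hypothetical placeholder (see Table S1 for the real value)

width_nm = domain_wall_width_nm(A_ASSUMED, K_U_NUCLEATION)
```

Under these assumed inputs the wall width is a few nanometres, i.e., well below the 20 nm pinning center width used in the simulations.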

Figure 4. Proposed SOT-DW-MTJ device, simulated magnetization characteristics of the FL with DWM, and designed read and write circuits. a) Schematic of the device with tailored DMI between the FL and HM for effective DWM. Dimensions are not to scale. b) 3D plot of the dynamic magnetization dependence on the pulse number under discrete 0.167 ns excitation with different J; the inset shows the specific magnetization switching loop driven by the corresponding pulse number N_pulse. c) M_z/M_s change with reversed J in a 1 ns wide reset pulse. d) CMOS-compatible single SOT-DW-MTJ hybrid circuit. e,f) Schematics of the implemented sense amplifiers.

To effectively convert the DW position into electrical signals with distinguishable multistates in the MTJ, specifically defined W regions, which are thinned by pattern engineering, are embedded periodically in the SOC layer to form FL/HM-interfaced PCs with tailored iDMI, i.e., the short-range antisymmetric exchange that plays an important role in DW motion manipulation. In this study, the four pattern-engineered FL/HM (thin W) interfaces have an iDMI constant of 1 mJ m⁻² larger than that of the FL/HM (thick W) interface, with the same iDMI sign,^[^55,62^] as listed in Table S1, Supporting Information. With this design, effective pinning fields are formed by the resulting effective potential, enabling the confinement of the magnetic DW inside the desired PCs.^[⁶³^] When the magnetic DW is pinned at one of the four engineered FL/HM interfaces (thin W regions, denoted as PC1, PC2, PC3, and PC4) during the motion driven by the optimized J pulse(s), four discrete resistance states of the MTJ (S1–S4) can be achieved and switched, each corresponding to a specific portion of the FL magnetized parallel or antiparallel to the RL. This is conducive to the formation of reliable and programmable multilevel resistance states that go beyond the bistability of conventional memory elements.

Tailored DW Motion Using Engineered iDMI in SOT-DW-MTJ

To demonstrate the magnetic DW motion dynamics under different J pulse number profiles, micromagnetic simulations were performed using the MuMax3 simulator^[⁴⁸^] to investigate the DW nucleation, motion, and dynamic switching behaviors in the proposed device. The DW motion fundamentally arises from the transfer of spin angular momentum to the local magnetization M along the ferromagnetic FL track.

The materials and geometric parameters of the proposed device are listed in Table S1, Supporting Information. The simulation magnetic parameters are adopted from our experimental results and other reports.^[^55,62,64^] An FL is designed with a size of 320 × 70 × 0.8 nm. The SOC layer below the FL employs the same size to first achieve magnetic DW nucleation at the extended overhang and then drive the magnetic DW to proceed along the longitudinal J direction. In the SOC layer, four PCs with a size of 20 × 70 nm are formed to modulate the pinning potential at the corresponding interfaces. PC1 and PC4 are located near each end of the RL. A 30 nm gap separates the outer boundary of the PC and end of the FL to prevent the magnetic DW from being annihilated by the oscillation during the stabilization. The other two PCs, i.e., PC2 and PC3, are identically spaced between PC1 and PC4.
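The layout described above can be turned into explicit coordinates. The following minimal sketch assumes that positions are measured along the FL length from its left end and that "identically spaced" means an equal center-to-center pitch between neighboring PCs; both are interpretations rather than statements from Table S1, and the function name is hypothetical.

```python
def pinning_center_spans(fl_length_nm=320.0, pc_width_nm=20.0,
                         edge_gap_nm=30.0, n_pcs=4):
    """Return (start, end) positions in nm of each PC along the FL.

    Assumes the outer boundaries of the first and last PC sit edge_gap_nm
    from the FL ends, with the remaining PCs placed at equal pitch.
    """
    first_start = edge_gap_nm
    last_start = fl_length_nm - edge_gap_nm - pc_width_nm
    pitch = (last_start - first_start) / (n_pcs - 1)
    return [(first_start + i * pitch, first_start + i * pitch + pc_width_nm)
            for i in range(n_pcs)]


spans = pinning_center_spans()
for name, (start, end) in zip(("PC1", "PC2", "PC3", "PC4"), spans):
    print(f"{name}: {start:.0f}-{end:.0f} nm")
```

Under these assumptions the PCs repeat with an 80 nm pitch, leaving 60 nm of thick-W interface between neighboring PCs.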

Magnetic DW Nucleation

The magnetic up-domain nucleation region is locally defined with a low K_u of 8 × 10⁵ J m⁻³ at each end of the FL, as shown in Figure 4a.^[^65–68^] As demonstrated by our simulation results (Figure S2 and Video S2, Supporting Information), the up-down DW is reproducibly created at the edge of the nucleation zone by applying a preset 0.167 ns current pulse with a density of 4.77 × 10⁸ A cm⁻². Consequently, the up-down DW moves into PC1, as illustrated in Figure 4b. In this case, the corresponding magnetic DW displacement is 30 nm from the left end of the FL. Note that, during the simulation, the nucleation region prevents the annihilation of the magnetic DW at the boundary, sustaining the DW in the FL once the nucleation completes. Because the proposed device is nonvolatile and the DW persists after the initial nucleation, the power consumption of this initial step can be neglected with our device design; the following sections discuss the specific implementation.

Magnetic DW Motion and Magnetization State Manipulation

Upon successful magnetic DW nucleation, an optimized current pulse can be applied when or after the DW enters the PCs (Figure S2 and Video S2, Supporting Information). Note that excessively large current amplitudes or prolonged pulse widths cause the magnetic DW to cross the pinning area. In contrast, small current amplitudes or short pulse widths are insufficient to drive the magnetic DW away from the nucleation area and PCs. Therefore, an optimized driving current window is essential for the application of SOT-DW-MTJ multistate devices with an engineered iDMI, as demonstrated in Figure S3, Supporting Information, which is consistent with the complementary results in Supplementary Video S2, Supporting Information.

To investigate the impact of the magnetic DWM process on the FL magnetization M_z, extensive simulations were performed for the transient M_z/M_s evolution over a timeframe of 0–6 ns, providing guidance for the subsequent cell structure, circuit design, and implementation of multistate SOT-DW-MTJ devices. Figure 4b shows the simulated dynamic change of M_z/M_s driven by J from 4.52 × 10⁷ to 5 × 10⁷ A cm⁻² with a 0.167 ns pulse width, followed by a DW relaxation time of 0.78 ns that was deliberately prolonged to 1 ns so that the stabilized states can be clearly distinguished in the plot. Evidently, the M_z/M_s switching strongly depends on the pulse number, which governs the specific MTJ device resistance state, in agreement with the results plotted in the inset of Figure 4b. In addition, such memristive synapses promise to perform vector–matrix multiplications with flexibly tunable conductance, which is a prerequisite for constructing complex artificial neural networks. For instance, the M_z/M_s value decreases from the first pulse excitation event until the magnetic DW enters and stabilizes inside PC1; in turn, the device settles in state S2. A similar state switching process from S2 to S3 was also observed during the second pulse event, with a damped oscillation when crossing the pinning area. This oscillation is attributed to the tilting of the DW during magnetic domain propagation driven by the combination of the Oersted field resulting from the pulsed current and the effective SOT field caused by SOC.^[⁶⁹^] The DWM dynamics and tilting can be described by the central position q and azimuthal angle φ of the moment at the DW center and the tilting angle β of the DW plane.^[⁷⁰^] However, the MTJ provides the fourth state S4, in which M_z/M_s is 1, without reflecting the oscillation of the magnetic DW,^[⁷¹^] i.e., the magnetic DW escapes the edge of the effective FL area aligned with the RL and proceeds to PC1 or PC4. Nevertheless, when continuous switching pulses are applied for writing, the resistance state can be read reliably immediately after 1 ns, once the oscillation has stabilized, as illustrated by each plateau in Figure 4c. Therefore, this device enables nanosecond-level switching between adjacent states, which further enhances the application potential of SOT-DW-MTJ in high-speed IMC.

From a practical perspective, the proposed DW-SOT-MTJ device must be resettable in a controllable and efficient manner through DWM manipulation. When current pulses of the opposite polarity are applied, the direction of the magnetic DW movement is reversed accordingly, and the magnetic DW moves from the rightmost nucleation zone back to the initial position. In the reset process from state S4 to S1 with direct device switching, as shown in Figure 4c, a relatively long 1 ns wide current pulse was applied in the reverse direction with the same amplitude as the single-pulse writing event. In such cases, the magnetic DW can be reset from any position to the initial state S1 within a relaxation time of 0.6 ns. The complete DWM evolution dynamics are detailed in Figure S3 and Video S2, Supporting Information.

Multistate Write and Read Circuit Design

As shown in Figure 4d, the proposed core multistate write and read circuit consists of a 2-transistor/1-resistor (2T1R) cell structure with two sense amplifiers (SAs). For writing, the 2T1R cell is equipped with an access transistor connected to the BE and SOC layer, which controls the profile and direction of the current pulses injected into the SOC layer to drive DWM in the FL of the SOT-DW-MTJ device under the control of the pulsed gate voltage V_in. The TE connects to the other transistor for read sensing and control. To examine the writing and reading behaviors comprehensively, the Cadence tool was used to simulate and analyze the performance of the designed write/read circuit at the device–circuit level. Table S2, Supporting Information, lists the simulation parameters. For more specifics on the Cadence model, refer to Supplementary Note 2, Supporting Information.

Write Process

The four resistance states in the SOT-DW-MTJ device can reliably represent the binary data “00,” “01,” “10,” and “11” when functioning as a key component of the MRAM-configured IMC. Upon successful DW nucleation, the DWM-driving access transistor, shown in Figure 4d, is turned on when V_in = “1” under appropriate circuit clock control, and a single pulse is injected with a current density of 4.78 × 10⁷ A cm⁻² and a pulse width of 0.167 ns (these parameters are extracted from Figure S2, Supporting Information, to realize stable switching). The effective SOT generated by the current pulse then drives the magnetic DW into the second PC with tailored DMI to write data “01,” i.e., state S2. Subsequently, pulses applied in sequence can switch the SOT-DW-MTJ device to the third or fourth resistance state, storing the data “10” and “11,” respectively. Compared to the traditional binary SOT-MTJ memory cell, the four-state solution offered by our SOT-DW-MTJ device can significantly increase data storage density.
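The write sequence above can be summarized by a simple behavioral model in which each enabled pulse advances the device by one state. The sketch below is a toy abstraction, not the Cadence model: the class and method names are hypothetical, the reset-before-write policy is an assumption, and so is the choice that pulses applied in S4 leave the state unchanged.

```python
STATE_TO_BITS = {1: "00", 2: "01", 3: "10", 4: "11"}  # S1..S4 as encoded in the text


class MultistateCell:
    """Toy model: one enabled write pulse moves the DW to the next PC."""

    def __init__(self):
        self.state = 1  # S1 after nucleation or reset

    def reset(self):
        # Models the reverse-polarity reset pulse returning the device to S1.
        self.state = 1

    def pulse(self, v_in):
        # Assumption: a pulse in S4 leaves the device in S4.
        if v_in and self.state < 4:
            self.state += 1

    def write(self, bits):
        # Assumed policy: reset first, then apply as many pulses as the
        # binary value of the two-bit word.
        self.reset()
        for _ in range(int(bits, 2)):
            self.pulse(1)

    def read(self):
        return STATE_TO_BITS[self.state]


cell = MultistateCell()
cell.write("10")
```

In this abstraction, writing “10” takes two pulses after reset, which reflects the pulse-number dependence of the state described for Figure 4b.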

When such a component is operated as a Boolean logic device, the frequency-encoded binary information is used as the V_in signal to control the pulsed J activation of the access transistor for both magnetic DW nucleation and manipulation. When the input X = “0” with a low V_in, the transistor is turned off and no current pulse is written into the multistate device. When the input X = “1,” the transistor is turned on with a high V_in, enabling a single or a series of well-profiled current pulses to be written to the multistate SOT-DW-MTJ to achieve the target magnetization states. In this manner, the constructed cell device and circuit architecture can realize the BUF/NOT operation on a single signal X; the OR/NOR, AND/NAND, XOR/XNOR, IMP/NIMP, and RIMP/RNIMP operations on two input signals (X and Y); and full addition of three signals (X, Y, and C_i). The following sections discuss this methodology.
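One way to see why a four-state device lends itself to full addition is that, if each “1” among X, Y, and C_i contributes one pulse after reset, the final state simply counts the ones, and the two-bit encoding “00”–“11” of S1–S4 is then the binary form of that count. The sketch below illustrates only this counting argument under that assumption; it does not model the 30-transistor FA circuit, and the function name is hypothetical.

```python
def full_adder_by_pulse_count(x, y, c_in):
    """Return (sum, carry) from a pulse-counting four-state cell.

    Assumption: each input equal to 1 advances the state by one step
    from S1, so the state index minus one equals x + y + c_in.
    """
    state = 1  # reset to S1
    for bit in (x, y, c_in):
        if bit:
            state += 1
    # The two-bit code of the state ("00".."11") is (upper, lower).
    upper, lower = divmod(state - 1, 2)
    return lower, upper  # sum is the lower bit, carry is the upper bit


truth_table = {
    (x, y, c): full_adder_by_pulse_count(x, y, c)
    for x in (0, 1) for y in (0, 1) for c in (0, 1)
}
```

In this picture the upper read bit plays the role of the carry and the lower read bit the role of the sum.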

Read Process

To read the data stored in the magnetic component, the V_in signal is set to a low level, and the V_read signal is ramped up to a high level. At this time, the current generated by the upper circuit flows into the MTJ from the TE, and a conductive channel forms between the TE and BE. The induced voltage V_sen of the MTJ is extracted at the node shown in Figure 4d. With 5000 iterations of the Monte Carlo simulation, the sensing voltage distributions of the four independent resistance states are obtained, as depicted in Figure 5a, for an analysis with 3σ variation at 27 °C. When the circuit is stable, the smallest sensing margin of V_sen is that between V_sen1 and V_sen2, and the sensing margin can be maintained above 110 mV for robust reading reliability.^[⁷²^] Therefore, the reference voltages (V_ref1, V_ref2, and V_ref3) generated by the three reference resistors (Ref₁, Ref₂, and Ref₃) are used to distinguish the four resistance states. Two precharge SAs (SA₁ and SA₂) are employed to output the reading results of the device.^[^72,73^]
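To make the notion of a sensing margin concrete, the following minimal sketch draws 5000 Gaussian samples per state and evaluates the gap between the 3σ bounds of adjacent distributions. This is one common margin definition and not necessarily the one used in the Cadence analysis; the nominal voltages and spread are hypothetical placeholders rather than values from Table S2 or Figure 5a.

```python
import random
import statistics


def three_sigma_margins(nominal_mv, sigma_mv, n_samples=5000, seed=0):
    """Return the 3-sigma gap (mV) between each pair of adjacent states."""
    rng = random.Random(seed)
    stats = []
    for mu in nominal_mv:
        samples = [rng.gauss(mu, sigma_mv) for _ in range(n_samples)]
        stats.append((statistics.mean(samples), statistics.stdev(samples)))
    margins = []
    for (m_lo, s_lo), (m_hi, s_hi) in zip(stats, stats[1:]):
        # Lower 3-sigma bound of the higher state minus the upper
        # 3-sigma bound of the lower state.
        margins.append((m_hi - 3 * s_hi) - (m_lo + 3 * s_lo))
    return margins


# Hypothetical nominal V_sen values for S1..S4 and a hypothetical spread.
margins = three_sigma_margins([100.0, 300.0, 500.0, 700.0], sigma_mv=10.0)
worst = min(margins)
```

The smallest of the three gaps is the figure of merit, since it bounds how far a reference voltage can drift before two states are confused.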

Figure 5. Sensing window verification and proposed IMC architecture based on developed multistate SOT-DW-MTJ devices. a) Monte Carlo simulation results of individual state sensing voltage distribution. b–e) Transient simulation results of four states reading operation. f) Computing in memory architecture based on binary MRAM subarray and SOT-DW-MTJ building blocks.

Specifically, when the device is to be read, the enable signal EN_sen1 (marked in Figure 4e) is turned on. V_ref1 and V_sen, generated by Ref₁ and the device, respectively, then function as inputs to SA₁ within the same clock period, completing the reading of the upper bit. If the device is in resistance state S1 or S2, the output of SA₁, i.e., V_out1 in Figure 4e, is “0.” However, if the device is in resistance state S3 or S4, V_out1 is “1.” V_out1 is then used as the gate voltage that controls the branch where Ref₃ is located, while its complement $\bar{V_{out1}}$ controls the turn-on/off of the branch with Ref₂. If V_out1 = “0,” V_ref2 functions as the input to SA₂. If V_out1 = “1,” V_ref3 functions as the input to SA₂. The system then triggers the enable signal EN_sen2 (Figure 4f) and turns on SA₂ to read the lower bit. For example, if the device stores “00,” the sensing voltage (V_sen) is smaller than V_ref1 and the output of SA₁ is “0.” Consequently, $\bar{V_{out1}}$ is high, which results in a comparison between V_sen and V_ref2, and the output V_out2 of SA₂ is “0” because V_sen < V_ref2. Therefore, the result of the read circuit is “V_out1, V_out2” = “0, 0,” as shown in Figure 5b. Similarly, the result of “V_out1, V_out2” is “0, 1,” “1, 0,” or “1, 1” when the resistance state of the cell is S2, S3, or S4, as shown in Figure 5c–e, respectively.
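The two-step read described above amounts to a two-level decision tree: SA₁ resolves the upper bit against V_ref1, and that result selects whether SA₂ compares against V_ref2 or V_ref3. A minimal behavioral sketch follows; the function name and all voltage values are hypothetical placeholders chosen only so that V_ref2 < V_ref1 < V_ref3 separate the four states.

```python
def read_two_bits(v_sen, v_ref1, v_ref2, v_ref3):
    """Return (V_out1, V_out2) for a sensed voltage using the two-step read."""
    # Step 1 (SA1): upper bit, separates {S1, S2} from {S3, S4}.
    v_out1 = 1 if v_sen > v_ref1 else 0
    # V_out1 enables the Ref3 branch; its complement enables the Ref2 branch.
    selected_ref = v_ref3 if v_out1 else v_ref2
    # Step 2 (SA2): lower bit against the selected reference.
    v_out2 = 1 if v_sen > selected_ref else 0
    return v_out1, v_out2


# Hypothetical sensing and reference voltages in volts (not from the paper).
V_SEN = {"S1": 0.2, "S2": 0.4, "S3": 0.6, "S4": 0.8}
V_REF1, V_REF2, V_REF3 = 0.5, 0.3, 0.7

results = {s: read_two_bits(v, V_REF1, V_REF2, V_REF3) for s, v in V_SEN.items()}
```

Only two comparisons are needed per read, even though three references exist, because the first comparison decides which of the remaining two references is relevant.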

DW-SOT-MTJ-Based nv-IMC Implementation

IMC Circuit Architecture

The developed multistate memory device is further exploited to function simultaneously as a logic operation unit in the IMC architecture,^[⁷⁴^] as demonstrated in Figure 5f. The system is constructed from a binarized MRAM storage array ① and the corresponding individual column SAs (SA’) ②, whose outputs resolve the binary MRAM unit signals as high or low levels. If the MRAM stores data 1 (0), SA’ outputs 1 (0). The output of SA’ is used as the word line signal of the logic operation array ③, which consists of multistate devices in the same column, to implement Boolean logic operations in this IMC architecture. The result of the compute process is read by the output units (i.e., SA₁ and SA₂) and can be stored in other storage units of the MRAM, reducing the redundant data transmission of the traditional von Neumann architecture.

Boolean Logic Operations Design and Control

Owing to its reliable multistate interconversion characteristics, the SOT-DW-MTJ device functions as an elementary storage unit for MRAM and warrants further exploration to implement a series of single-input and full-set Boolean logic functions, e.g., BUF/NOT, OR/NOR, AND/NAND, XOR/XNOR, IMP/NIMP, and RIMP/RNIMP. The complete operation cycle comprises the reset, compute, and reading modes.^[⁷⁴^] At the beginning of each operation-reading cycle, a reset operation is executed. In brief, V_reset is set high, which allows the reset pulse to pass through the SOC layer and initialize the SOT-DW-MTJ device resistance state to S1. In the compute mode, the input signal carries out the Boolean operation within the pulse of V_in to control the device writing. If input signal X = “1,” the write transistor connected to the BE is turned on, and a single current pulse is injected into the SOT-DW-MTJ device to switch the resistance state. When X = “0,” the transistor is turned off and the resistance state of the device is unchanged. Similarly, the second input signal Y is asynchronously enabled after the X signal to regulate the position of the magnetic DW in the FL and, in turn, to switch the resistance state of the device and proceed with the corresponding logic operation, as elucidated below. The computational result is determined by the selected reference resistance and the SA output. Table S3, Supporting Information, lists the associated operations and output results for a single input X, as well as for the X and Y input cases, for the multistate device.

BUF/NOT Operation

For the NOT operation, the X input serves as the gate voltage of the write access transistor and records the information in the device. In the same manner as the read operation, the enable signal EN_sen1 is set to “1.” Because the resistance state of the multistate device under a single-input operation is S1 (X = “0”) or S2 (X = “1”), V_out1 = “0.” Then, the enable signal EN_sen2 is set to “1,” and $\bar{V_{out1}}$ turns on the branch where Ref₂ is located. The output V_out2 of SA₂ is the value of BUF X, and the $\bar{V_{out2}}$ output is the value of NOT X.

AND/NAND and XOR/XNOR Operation

Using multistate devices and two SAs, the X AND/NAND Y and X XOR/XNOR Y operations can be realized. Inputs X and Y asynchronously control the write transistor and write their data to the multistate memory cell. For example, when “X = 0, Y = 0,” the device remains in S1, whereas when only one of X and Y is 1, the device receives a single write pulse and switches from S1 to S2. Finally, if “X = 1, Y = 1,” the device receives two pulses and switches to S3. When the circuit reads the computation results, V_sen is compared with V_ref1. Only when the inputs are “X = 1, Y = 1,” i.e., the device is in S3, is V_out1 “1.” Therefore, the V_out1 output of SA₁ gives the result of the X AND Y logic operation, and the other output of SA₁, $\bar{V_{out1}}$, gives the X NAND Y result.

Once the output of SA₁ has stabilized, the system sets the enable signal EN_sen2 = “1,” which starts the sense process of SA₂. The lower bit is then output through V_out2. When “X = 0, Y = 0” or “X = 1, Y = 1,” V_out2 = “0.” In contrast, when “X = 0, Y = 1” or “X = 1, Y = 0,” V_out2 is “1.” Therefore, V_out2 outputs the result of X XOR Y, and the other output terminal of SA₂, i.e., $\bar{V_{out2}}$, outputs the result of X XNOR Y.
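The reset, compute, and read sequence for the AND/NAND and XOR/XNOR operations can be illustrated with a short state-counting sketch. It assumes, as the text describes, that each input equal to 1 injects one write pulse and advances the device by one resistance state; the integer state index (0 for S1, 1 for S2, 2 for S3) and the function names are illustrative choices, not part of the original design.

```python
from itertools import product


def compute_state(x: int, y: int) -> int:
    """Reset to S1 (index 0), then advance one state per input that equals 1."""
    state = 0    # reset mode: device initialized to S1
    state += x   # X = 1 turns on the write transistor: one pulse
    state += y   # Y = 1, applied after X: one more pulse
    return state  # 0 -> S1, 1 -> S2, 2 -> S3


def read_outputs(state: int) -> tuple[int, int]:
    """Two-step read: V_out1 is the upper bit, V_out2 the lower bit."""
    v_out1 = 1 if state >= 2 else 0  # SA1 against V_ref1: S3 (or S4) reads 1
    v_out2 = state & 1               # SA2 against V_ref2 or V_ref3
    return v_out1, v_out2


def logic_outputs(x: int, y: int) -> dict[str, int]:
    """Map the SA outputs and their complements to the four logic results."""
    v_out1, v_out2 = read_outputs(compute_state(x, y))
    return {"AND": v_out1, "NAND": 1 - v_out1, "XOR": v_out2, "XNOR": 1 - v_out2}


if __name__ == "__main__":
    for x, y in product((0, 1), repeat=2):
        print(x, y, logic_outputs(x, y))
```

The sketch makes the key point explicit: the device stores the count of ones among the inputs, and the two bits of that count are exactly AND (upper bit) and XOR (lower bit).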

OR/NOR Operation

For OR/NOR operations, the compute process is identical to that of XOR/XNOR. The information of X and Y is input asynchronously to drive the movement of the magnetic DW. When the result is read, however, the reference voltage input to SA₁ is changed for the OR/NOR operation. The enable signal EN_sen1 is set to “0” to turn off Ref₁, and EN_OR = “1” is applied instead, generating the reference voltage signal V_ref2’. The amplitude of V_ref2’ lies between V_sen1 and V_sen2 so that the two can be distinguished. When “X = 0, Y = 0,” V_out1 is “0” because V_sen < V_ref2’. Otherwise, V_out1 = “1.” Therefore, with this configuration, the output V_out1 of SA₁ no longer gives the result of the AND logic operation but gives X OR Y, and the other terminal $\bar{V_{out1}}$ outputs X NOR Y. Figure S4, Supporting Information, shows the other Boolean logic operations. This design methodology can be fanned out efficiently for on-demand IMC implementation. Figure 6a presents the circuit-level simulation results, verifying the developed basic units, the circuit configuration, and the implementation of a series of Boolean logic operations.
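The effect of swapping the SA₁ reference can be shown in a few lines. The sketch below is an idealized illustration with hypothetical voltage values; it assumes V_sen1, V_sen2, and V_sen3 denote the sense voltages of states S1, S2, and S3, and that the SA is an ideal comparator. The flag `en_or` stands in for the EN_OR control signal.

```python
# Hypothetical sense voltages for S1, S2, S3 (indexed by number of write pulses).
V_SEN_BY_PULSES = (0.10, 0.20, 0.30)
V_REF1 = 0.25        # assumed: between the S2 and S3 sense voltages
V_REF2_PRIME = 0.15  # assumed: between the S1 and S2 sense voltages


def sa1_output(x: int, y: int, en_or: bool) -> int:
    """V_out1 of SA1 after the inputs X and Y have been written.

    en_or=False: Ref1 is active and SA1 yields X AND Y.
    en_or=True:  Ref1 is off, V_ref2' is active, and SA1 yields X OR Y.
    """
    v_sen = V_SEN_BY_PULSES[x + y]
    v_ref = V_REF2_PRIME if en_or else V_REF1
    return 1 if v_sen > v_ref else 0


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, "AND:", sa1_output(x, y, False), "OR:", sa1_output(x, y, True))
```

The device state and write sequence are unchanged between the two operations; only the control signal that selects the reference differs, which is what makes the gate reconfigurable.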

Figure 6. Specific transient simulation results and verification of Boolean logic and FA implementation. a) Different full set Boolean logic functions implemented and b) the effects verification from FA constructed by SOT-DW-MTJ devices.

Moreover, Table S3, Supporting Information, lists the comparison between different multifunction logic gates. The proposed design realizes all 16 Boolean logic operations and the FA, which is the largest number of logic computations achieved using the same structure. Although the read and write circuits involve a certain transistor cost, the lowest average cost per logic operation reduces the relative complexity of the proposed design. Importantly, the reduced complexity gives the device flexibility in diversified operations, which further demonstrates that the proposed design is reconfigurable. In addition to the FA implementation, more logic operation functions can be performed by changing the circuit control signals instead of the device structure. Further, in the cell-level comparison of multistate devices in Table S5, Supporting Information, our SOT-DW-MTJ cell structure also demonstrates competitive performance in terms of average write energy, latency, area, and CMOS compatibility.

The developed read–write circuit unit accepts the asynchronous input of random binary signals C_i, X, and Y to realize the 1-bit magnetic FA operation, as addition is the basic operation of the foundational arithmetic unit.^[^75,76^] In practice, C_i can also be the carry signal output by SA₁ of the previous adder unit. The operation cycle follows the reset, compute, and reading processes. The multistate SOT-DW-MTJ device realizes the storage and operation of the input signals: the output V_out1 of SA₁ gives the carry result C_{i+1}, and the output V_out2 of SA₂ gives the sum result. Note that the cascading capability provides sufficient flexibility for extending the 1-bit FA to a multibit structure. In the circuit-level simulation, the results of all addition operations were specified in sequential logic, as shown in Figure 6b.
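The FA mapping can be checked with a small behavioral sketch. It assumes that each of the three inputs equal to 1 contributes one write pulse after reset, so the device ends in S1 to S4 according to the number of ones; the ripple-carry wrapper is an illustrative way to show the cascading the text mentions and is not taken from the original circuit.

```python
def full_adder(x: int, y: int, c_in: int) -> tuple[int, int]:
    """1-bit FA modeled as pulse counting on a four-state device.

    Returns (sum_bit, carry_out), corresponding to (V_out2, V_out1).
    """
    state = x + y + c_in    # pulses after reset: 0..3 -> S1..S4
    carry_out = state >> 1  # upper bit, read by SA1 (V_out1)
    sum_bit = state & 1     # lower bit, read by SA2 (V_out2)
    return sum_bit, carry_out


def ripple_add(a: int, b: int, n_bits: int) -> tuple[int, int]:
    """Cascade n_bits 1-bit FAs, feeding each carry into the next stage."""
    carry = 0
    result = 0
    for i in range(n_bits):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    for c_in in (0, 1):
        for x in (0, 1):
            for y in (0, 1):
                print(c_in, x, y, full_adder(x, y, c_in))
    print(ripple_add(5, 3, 4))
```

Because the two-bit state index is the binary count of the three inputs, the two-step read already delivers carry and sum without additional logic gates.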

Compared with other conventional MTJ-based FAs listed in Table S6, Supporting Information, our device can realize FA computations using a single MTJ and fewer peripheral MOS components, which is conducive to the optimization of the array area. SOT-driven DWMs have nanosecond writing and reading times, laying a foundation for high-speed IMC applications. Although all arithmetic units based on nonvolatile MTJ designs show zero static power consumption, this work significantly reduces the dynamic power consumption. Nevertheless, the overall performance requires further optimization to enhance competitiveness. Further study can lead to more in-depth hardware research in magnetic compressors,^[⁷⁷^] SOT-MRAM array integration, and in-memory neural network computing.^[^7,78^]

Conclusion

In summary, based on experimentally corroborated prototype domain wall devices, we demonstrated a new type of reliable single MTJ cell with multiple discrete memory states for IMC applications, implementing an FA and enriched Boolean logic operations. The synergistic utilization of SOT and iDMI at the interface between the ferromagnetic FL and the heavy metal provides effective manipulation of magnetic domain wall motion and pinning, facilitating multistate switching under fully electrical control. To the best of our knowledge, the specific multistate read and write circuits with an IMC architecture based on our developed device and cell structure were demonstrated for the first time, with improved latency and power efficiency. The proposed design realizes all 16 Boolean logic functions and the FA operation with a significantly reduced complexity of device units and peripheral circuits. This enables fan-out and cascading of commercial MTJ functions and holds promise for future reconfigurable and high-speed IMC applications, for instance, edge computing for embedded AIoT and neuromorphic computing.

Experimental Section

Thin-Film Deposition and Characterization

A series of multilayer heterostructure thin films were deposited on thermally oxidized silicon (Si/SiO₂) substrates using DC and RF magnetron sputtering under a base pressure lower than 1.0 × 10⁻⁷ Torr at room temperature, and an annealing process at 350 °C was then carried out inside a high-vacuum chamber at 4 × 10⁻⁸ Torr for 1 h. The fabricated samples were mounted on a precise, nonmagnetic XYZ translation stage at room temperature. An electromagnet with a magnetic field strength of up to ≈10 000 G and 0.2 G resolution at the surface was used to apply an external out-of-plane magnetic field. Using a Hall effect sensor, the direction and strength of the magnetic field were controlled and mapped during data collection. In MOKE imaging, a narrowband light-emitting diode illuminated the target sample, and a high bit-depth CMOS camera was used. A 50× long-working-distance objective lens (Nikon) was used for the MOKE imaging measurements. The magnetic anisotropy of the film sample was evaluated using VSM at room temperature. TEM, STEM, and EDS (JEOL JEM-ARM200F) were used to characterize the cross-sectional structure of the HM/FM heterostructures.

Device Patterning and Electrical Measurement

The film stack samples were patterned into a Hall bar geometry using a standard photolithography process, and excess areas were etched by ion milling, followed by thermal evaporation of Ti(20 nm)/Au(80 nm) electrodes at the ends of the Hall probes. All electrical measurements were performed in a Physical Property Measurement System (PPMS). Multiple Keithley instruments (Keithley 2400, 2182, and 6221) were connected to the PPMS, enabling comprehensive transport measurements of the Hall bar devices. A constant DC of 100 μA was applied for the Hall measurement. In the switching measurement, a 1 ms pulsed write current was first applied. After 100 ms, another small current (read current, 0.5 mA for 1 s) was applied and the Hall voltage signal was picked up simultaneously. The device temperature during the write pulse application was extracted by monitoring the longitudinal resistance. The temperature increase during the read pulse application was negligible.

Micromagnetic Simulation

Micromagnetic simulations were performed using MuMax3.^[⁴⁸^] The magnetic devices were initially relaxed by full energy minimization using a conjugate gradient method for a thin plate of a chiral magnet with a size of 320 × 70 × 0.8 nm³ on a 160 × 35 × 1 mesh. As described by Heisenberg's formalism for the exchange interaction, the magnetic moment originates from all electron angular momentum, and the total Hamiltonian is the sum of multiple magnetic interactions. The dynamics of the DW magnetic moments induced by the antidamping-like SOT are governed by the Landau–Lifshitz–Gilbert (LLG) equation^[⁷⁹^] [Image Omitted. See PDF] where γ is the gyromagnetic ratio, α is the Gilbert damping constant, and $\hat{m}$ is the unit vector along the magnetization of the ferromagnetic FL. $\hat{y}$ is the unit vector perpendicular to both the current ($\hat{x}$) and inversion asymmetry ($\hat{z}$) directions. Neglecting the field-like SOT for simplicity, $\tau_{SOT} = (\hbar/2e)\,[\theta_{SH}/(M_{S} t_{FL})]\,J_{C}$ describes the magnitude of the antidamping-like SOT. In this case, H_eff is considered as^[⁸⁰^] [Image Omitted. See PDF] where A_ex is the exchange stiffness constant, K_d is the DW hard-axis anisotropy, and D₀ is the iDMI constant. In the present system, the current flowing along the x direction induces an SOT that rotates the moments toward the y direction, and the torque from the iDMI effective field H_DMI then drives the moments to switch into the z direction. This antisymmetric exchange contributes to the total magnetic exchange interaction between two neighboring magnetic spins, S_i and S_j. Quantitatively, it is a term in the Hamiltonian that can be written as^[⁸¹^] [Image Omitted. See PDF]

H_SO is the effective magnetic field of the antidamping-like SOT and is expressed as [Image Omitted. See PDF] where μ_B, e, μ₀, and t are the Bohr magneton, electron charge, vacuum permeability, and FM film thickness, respectively. The corresponding dependences of the effective anisotropy (K_eff) and the PMA constant (K_u) were extracted. K_eff is calculated as K_eff = M_s H_s/2, where H_s is the saturation field obtained from the in-plane M–H curves. K_u is determined by K_u = M_s H_K/2 = M_s (H_s + 4πM_s)/2, where 4πM_s is the demagnetization field. The value of M_s is obtained by dividing the magnetic moment by the total volume of the Co₂₀Fe₆₀B₂₀ layers. The effective magnetic anisotropy field (H_K) is evaluated by fitting the hard-axis magnetic field dependence of the R_xy–H loops.
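As a worked illustration of the two anisotropy expressions above, the following sketch evaluates K_eff and K_u in CGS units (M_s in emu cm⁻³, fields in Oe, energy densities in erg cm⁻³). The numerical inputs are placeholders chosen for easy arithmetic and are not measured values from this work.

```python
import math


def anisotropy_constants(m_s: float, h_s: float) -> tuple[float, float]:
    """Return (K_eff, K_u) in erg/cm^3 from M_s (emu/cm^3) and H_s (Oe).

    K_eff = M_s * H_s / 2
    K_u   = M_s * H_K / 2, with H_K = H_s + 4*pi*M_s
    """
    k_eff = m_s * h_s / 2.0
    h_k = h_s + 4.0 * math.pi * m_s  # add back the demagnetization field
    k_u = m_s * h_k / 2.0
    return k_eff, k_u


if __name__ == "__main__":
    # Placeholder inputs, for illustration only.
    m_s_example = 1000.0  # emu/cm^3
    h_s_example = 5000.0  # Oe
    k_eff, k_u = anisotropy_constants(m_s_example, h_s_example)
    print(f"K_eff = {k_eff:.3e} erg/cm^3, K_u = {k_u:.3e} erg/cm^3")
```

The two quantities differ by the shape anisotropy term 2πM_s², which follows directly from substituting H_K = H_s + 4πM_s into the expression for K_u.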

Circuit Simulation

For the circuit-level simulation, a Verilog-A model of the SOT-DW-MTJ device was developed and cosimulated with the CMOS peripheral circuit using SPICE simulators. A foundry's 28 nm process design kit (PDK) was used to verify the proposed design and assess the performance of Boolean logic operations, such as NOT, OR/NOR, AND/NAND, and XOR/XNOR.

Acknowledgements

This work was supported in part by the National Key R&D Program under grant nos. 2021YFB3601300 and 2019YFB2205100, the National Natural Science Foundation of China under grant nos. 62074164, 61888102, and 61821091, the Director Fund of Institute of Microelectronics and the Dedicated Fund of Chinese Academy of Sciences (grant nos. E0SR023002, E0ZR223010, and E0YR063004), and the Strategic Priority Research Program of the Chinese Academy of Sciences under grant no. XDB44010100. The authors acknowledge the support with thin-film stack deposition from Qingdao Research Institute, Beihang University, and are grateful for the fruitful discussions with Dr. Kaihua Cao and Dr. Zhaohao Wang.

Conflict of Interest

The authors declare no conflict of interest.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.


© 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Emerging in‐memory computing (IMC) technology promises to tackle the memory wall bottleneck in modern systems. Promoted as a promising building block, nonvolatile spin–orbit torque (SOT) memory devices with sub‐ns and sub‐pJ processing capabilities are thereby extensively pursued. Herein, a new type of domain wall device is experimentally presented with multistates driven by nonvolatile SOT and Dzyaloshinskii–Moriya interaction, enabling time and energy‐efficient IMC with a full adder (FA) implementation based on magnetic tunnel junctions. Complementary micromagnetic and device–circuit cosimulation results show that the write/read latency of the proposed FA can be shortened to 1.25 ns/0.22 ns with an averaged writing energy of 8.41 fJ bit⁻¹, and the overall dynamic power is 26.25 μW, which is 4.43–51.96 times lower than state‐of‐the‐art alternatives. Moreover, the developed architecture can perform all 16 Boolean logic functions, warranting an extensive arithmetic operation. The experimental, micromagnetic, and circuit‐level simulation results show great potential in both fundamental research and new trajectories in technology development for nonvolatile in‐memory computing applications.

Details

Title: Implementation of Highly Reliable and Energy‐Efficient Nonvolatile In‐Memory Computing using Multistate Domain Wall Spin–Orbit Torque Device

Author: Lin, Huai¹; Xu, Nuo²; Wang, Di¹; Liu, Long¹; Zhao, Xuefeng³; Zhou, Yongjian⁴; Luo, Xuming⁵; Cheng, Song⁴; Yu, Guoqiang⁵; Xing, Guozhong¹

¹ Key Laboratory of Microelectronic Devices & Integrated Technology, Institute of Microelectronics, Chinese Academy of Sciences, Beijing, P. R. China; University of the Chinese Academy of Sciences, Beijing, P. R. China
² Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
³ Key Laboratory of Microelectronic Devices & Integrated Technology, Institute of Microelectronics, Chinese Academy of Sciences, Beijing, P. R. China; University of the Chinese Academy of Sciences, Beijing, P. R. China; School of Microelectronics, University of Science and Technology of China, Hefei, Anhui, P. R. China
⁴ Key Laboratory of Advanced Materials (MOE), School of Materials Science and Engineering, Tsinghua University, Beijing, P. R. China
⁵ Institute of Physics, Chinese Academy of Sciences, Beijing, China; Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing, China

Section: Research Articles

Publication year: 2022

Publication date: Sep 2022

Publisher: John Wiley & Sons, Inc.

e-ISSN: 26404567

Source type: Scholarly Journal

Language of publication: English

DOI: https://doi.org/10.1002/aisy.202200028

ProQuest document ID: 2716403911
