To achieve high computing throughput for data-intensive real-time tasks (e.g., autonomous driving, virtual reality, and deep learning), in-memory computing (IMC) architectures have become an active research focus.[1,2] The key concept of IMC is the implementation of arithmetic logic units in memory-based hardware to better exploit memory bandwidth and substantially improve energy efficiency during data migration. Consequently, dynamic random access memory (DRAM)- and static random access memory (SRAM)-based IMC architectures have been widely reported.[3,4] However, these designs suffer from inherent shortcomings, such as high standby power and long latencies caused by the necessary write-back operations, which limit their potential.
Emerging nonvolatile memory (eNVM)-based IMCs have been extensively investigated owing to their advantageous energy efficiency and compatibility with CMOS process integration,[5–11] including resistive RAM (RRAM, a.k.a. memristor),[12,13] phase-change memory (PCM),[14] ferroelectric RAM (FeRAM) or ferroelectric tunnel junction (FTJ),[15,16] multiferroic material-based resistive memory,[17] and spin-transfer/spin–orbit torque (STT/SOT) magnetic random access memory (MRAM).[18–20] In particular, SOT-MRAM is considered a promising candidate for IMC because it offers competitive switching speeds (sub-nanosecond) and almost infinite endurance owing to its decoupled write and read paths. For example, recent reports have revealed the potential of discrete SOT-MRAM devices to perform complete Boolean logic operations[21–23] and neural network computing.[24,25]
However, several issues remain for SOT-MRAMs to be competitive. First, SOT-driven perpendicular magnetization switching typically requires the simultaneous application of an in-plane external magnetic field, which increases the design and operation complexity.[26,27] Second, a third terminal is necessary for SOT-MRAM cells to supply the in-plane write current, which incurs a cell area penalty compared to STT-MRAM.[28]
Furthermore, exploring IMC applications with minimal electronic hardware is paramount, particularly for embedded systems constrained in power and area.[29] The binary digital technology of magnetic tunnel junctions (MTJs) has experienced steady improvements in device density and cost.[30–36] However, the further miniaturization of SOT-MTJs and access transistors has reached physical limits.[37] Hence, developing a new type of SOT-MTJ device that provides multilevel states is imperative for handling the aforementioned issues, with high speed, low power, and reduced design complexity in both the SOT-MRAM cell array and the peripheral circuits. Some of the reported multistate realizations are based on in-plane magnetic anisotropy structures or geometrically engineered devices with domain wall motion (DWM) models, e.g., the experimental demonstration of a two-level device based on DWM in a spin valve,[38] a three-level device with a half-ring shape,[39] and a four-state MTJ switchable with SOT.[40] Despite the available reports on pure simulation and prototype DW-MTJ devices,[8,41–45] development is hindered by system integration and operation efficiency limits. To date, few functional implementations exist of SOT-MTJ devices with reliable and switchable multistates in a synthetic, CMOS-compatible, and field-free integration.
In this study, we experimentally implemented a novel DW-based SOT prototype cell structure whose multiple states correspond to the specific fraction of the free layer (FL) that is magnetized parallel or antiparallel to the reference layer (RL), to address the data processing parallelism and power overhead issues of binary SOT-MTJ devices. Owing to unique DW nucleation and manipulation mechanisms, reliable multistates with field-free deterministic magnetization switching are achieved by tailoring the synergistic effect of SOT and the interfacial Dzyaloshinskii–Moriya interaction (iDMI).[46] Furthermore, the modulation of iDMI strength also plays an important role in achieving a lower critical current and larger switching window.[47] Such cell structures enable a simplified full adder (FA) implementation with only 1 MTJ device and 30 CMOS transistors, facilitating a robust SOT-driven DW-MTJ-based IMC execution. In particular, to the best of our knowledge, this is the first study to develop an array structure based on the proposed multistate SOT-DW-MTJ cell and a codesigned device–circuit architecture. Extensive device- and circuit-level simulations of the SOT-DW-MTJ-based quaternary-state device and FA design were conducted using a graphics processing unit-accelerated micromagnetic simulation program and the Cadence simulation toolkit, respectively.[48] The results demonstrate that reliable quaternary states are obtained within a practical CMOS-compatible process window. Importantly, the write/read latency and dynamic power of the proposed design can be reduced significantly. Furthermore, the proposed DW-MTJ can perform all 16 Boolean logic functions (i.e., BUF/NOT, OR/NOR, AND/NAND, XOR/XNOR, IMP/NIMP, and RIMP/RNIMP) within a single cycle, illustrating great potential for other promising nonvolatile in-memory computing (nv-IMC) and non-von Neumann architecture applications.
Results and Discussion
Prototype Multistate SOT-DW Devices via iDMI Modulation
Heavy Metal/Ferromagnet Heterostructure with iDMI Modulation
Figure 1a illustrates the Kerr hysteresis loops of W(t)/Co20Fe60B20(0.9)/MgO(2)/W(3)/Ru(2) film stack samples measured with a polar magneto-optic Kerr effect (p-MOKE) microscope. Evidently, the grown samples with all W thicknesses (t) show strong perpendicular magnetic anisotropy (PMA) in the deposited heavy metal/ferromagnet (HM/FM) heterostructures at room temperature. As illustrated in Figure 1c,d, the square out-of-plane (OOP) M–H hysteresis loops from the vibrating sample magnetometry (VSM) measurements demonstrate excellent PMA behavior. The film stacks were subsequently patterned using photolithography and ion milling into Hall bar devices with dimensions of 15 × 50 μm. The PMA characteristics of the devices after fabrication were verified by the square-shaped anomalous Hall effect (AHE) loops (Figure 1e,f). We determined that the saturation value of the Rxy–H loops first increases significantly with increasing W thickness (t > 3.6 nm), which is explained by overcoming the dead-layer effect at the W(t)/CoFeB interface.[49] The subsequent decrease in Rxy–H with increasing t is ascribed to the shunting effect of the W layer.
Figure 1. Stacked PMA HM/FM heterostructures with modulated iDMI. a) Magneto-optical Kerr hysteresis loops of films with different W thicknesses (t = 3.6–5.2 nm) measured by p-MOKE. The inset shows the schematic of the film stack. b) The down-up domain wall motion velocity versus the impulse Hz field, with example images of the domain expansion for the t = 4.7 nm sample captured over a 4.2 s span under Hz = 38 Oe. The linear fits show that the DW velocity obeys the creep law. c) Out-of-plane and d) in-plane magnetic hysteresis loops measured by VSM. e,f) Magnetotransport transverse Hall resistance Rxy characteristics from AHE measurements on the patterned Hall bar samples with various W thicknesses under Hz and Hx sweeping fields.
To study the dependence of the iDMI on the W thickness in the developed HM/FM heterostructures, impulse-field-driven DWM, observed with the spatially resolved p-MOKE technique, was used to determine the expansion trajectory of the magnetic DW under an increasing applied OOP magnetic field (Hz), as shown in the inset of Figure 1b. From the extracted magnetic DWM velocity (v), Figure 1b shows the relationship between ln(v) and Hz^(−1/4) for samples with different W thicknesses. The linear fitting shows that the DWM dynamics closely obey the creep law in the presence of disorder at low magnetic fields.[50,51] Notably, the iDMI at the interfaces of the developed HM/FM heterostructures can be precisely modulated, with a strong dependence on the HM (W) thickness; this follows from the slope of the fit, which involves the iDMI constant (D), as reflected in the equations below derived from the creep law (Supplementary Note 1, Supporting Information)[50,52,53]
[Image Omitted. See PDF]
where A, Keff, and Ms refer to the exchange stiffness constant, effective magnetic anisotropy constant, and saturation magnetization, respectively. α0 is proportional to the correlation length of the disorder potential and pinning strength of the disorder (Supplementary Note 1, Supporting Information). Specifically, the thinner W provides a larger iDMI constant in a thickness range of 3–5 nm, which is in line with previous reports in W-based HM/FM heterostructure systems.[54–56]
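As an illustration of the fitting step, the sketch below extracts the creep-law slope from velocity data. It assumes the standard creep form v = v0 · exp(−α · Hz^(−1/4)), so that ln(v) is linear in Hz^(−1/4) with slope −α. The relation between this slope and D is given by the omitted equations and is not reproduced here. The function name and the synthetic data are hypothetical and are not measured values from this work.

```python
import math

def fit_creep_law(fields_oe, velocities):
    """Least-squares fit of ln(v) = ln(v0) - alpha * Hz**(-1/4).

    Returns (alpha, v0). Velocity units are arbitrary but must be consistent.
    """
    xs = [h ** -0.25 for h in fields_oe]      # abscissa: Hz^(-1/4)
    ys = [math.log(v) for v in velocities]    # ordinate: ln(v)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)

# Synthetic (made-up) data generated from known parameters, used only to
# check that the fit recovers them.
TRUE_ALPHA, TRUE_V0 = 30.0, 1.0e3
fields = [20.0, 25.0, 30.0, 38.0, 45.0]       # Oe
speeds = [TRUE_V0 * math.exp(-TRUE_ALPHA * h ** -0.25) for h in fields]

alpha, v0 = fit_creep_law(fields, speeds)
print(f"alpha = {alpha:.3f}, v0 = {v0:.3f}")
```

Comparing the fitted α between samples of different W thickness is what reveals the thickness dependence discussed above; converting α into D additionally requires A, Keff, Ms, and α0.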
Motivated by this HM-thickness-dependent iDMI tunability and by previous reports,[57] the iDMI can be used for effective DW pinning, as the experimental and simulation sections demonstrate. To examine the HM/FM heterostructure properties and interface quality, a series of structural investigations were performed on the representative W(t = 5.2 nm) film stack samples. Figure 2a,b show the low- and high-resolution transmission electron microscopy (TEM) images with the clear crystalline structure and respective thicknesses of the HM and FM layers of the W(5.2)/Co20Fe60B20(0.9)/MgO(2)/W(3)/Ru(2) film stack samples. Notably, the high-angle annular dark-field scanning transmission electron microscopy (HAADF-STEM) image in Figure 2c illustrates the sharp interfaces of the HM/FM heterostructures, which is consistent with the energy-dispersive X-ray spectroscopy (EDS) elemental mapping and line-scan species distribution results (Figure 2d,e), demonstrating the high quality of the grown multilayer thin-film stack.
Figure 2. Structural and quality properties. a) TEM and b) HRTEM images of the t = 5.2 nm film stack. Corresponding c) HAADF-STEM image and d) EDS elemental mappings with e) line-scan distribution profiles.
To examine the effectiveness of current-induced deterministic SOT switching, anomalous Hall resistance measurements (schematic in Figure 3b) were performed on the W(t = 5.2 nm) Hall bar devices under various assistive in-plane magnetic fields (Hx). The pulsed current was swept from −20 to 20 mA with a pulse width of 1 ms. As depicted in Figure 3a, effective current-induced SOT-driven perpendicular magnetization switching is validated when the applied current reaches the critical switching current density (Jc), which varies from 0.8 × 10⁷ to 2.4 × 10⁷ A cm⁻². As expected, Jc decreases with increasing Hx strength, which breaks the structural symmetry.
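For orientation, the relation between the swept current and the quoted current densities can be estimated with J = I/(w · t). The sketch below is a rough consistency check only. It assumes a 15 μm wide channel (taking the 15 μm Hall bar dimension as the width) and uniform current flow through the 5.2 nm bottom W layer alone, neglecting shunting through the other layers. These assumptions and the function names are ours and are not stated in the measurement description.

```python
def current_density_a_per_cm2(current_a, width_m, thickness_m):
    """Uniform current density J = I / (w * t), returned in A cm^-2."""
    area_cm2 = (width_m * 100.0) * (thickness_m * 100.0)
    return current_a / area_cm2

def current_for_density_a(j_a_per_cm2, width_m, thickness_m):
    """Inverse relation: current (A) needed to reach a given J (A cm^-2)."""
    return j_a_per_cm2 * (width_m * 100.0) * (thickness_m * 100.0)

WIDTH_M = 15e-6        # assumed channel width
THICKNESS_M = 5.2e-9   # assumed conducting thickness (bottom W layer only)

# Current density at the 20 mA sweep limit under these assumptions.
j_max = current_density_a_per_cm2(20e-3, WIDTH_M, THICKNESS_M)
print(f"J at 20 mA: {j_max:.3g} A/cm^2")

# Currents corresponding to the reported Jc range.
for jc in (0.8e7, 2.4e7):
    i_ma = current_for_density_a(jc, WIDTH_M, THICKNESS_M) * 1e3
    print(f"Jc = {jc:.1e} A/cm^2 -> I = {i_ma:.2f} mA")
```

Under these assumptions the reported Jc range falls inside the ±20 mA sweep window; a different effective cross-section would rescale the numbers accordingly.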
Figure 3. Basic device with effective SOT-driven magnetic switching and DW pinning. a) Current-induced SOT switching under different in-plane magnetic fields and b) schematic measurement geometry setup. c) The optical microscopy image of developed device for multistates modulation by pulsed J. d) The schematic of p-MOKE setup with chip carrier holding for in situ magnetoelectrical transport probing. e) Corresponding domain wall position images under different pulsed J illustrating the state 1, 2, 3, and 4.
To further demonstrate the tunability of the DW position, the PMA W(5)/MgO(2)/Co20Fe60B20(0.9)/W(5.2)/Ru(1.5) stacked films were patterned into a stripe geometry of 2 μm (width) × 50 μm (length) with a 2 μm wide DW pinning center (PC), as shown in Figure 3c. Such PCs are induced by an enhanced iDMI, which is generated by thinning the top W layer using a dedicated ion beam etching (IBE) process and by boosting the iDMI strength at the HM/FM interfaces with strain engineering.[58,59] As previously reported, the iDMI strength can be regulated by changing the distance between HM atomic planes with strong 3d–5d orbital hybridization under strain perpendicular to the film.[58,59] Evidently, reliable DW nucleation, motion, and pinning were realized in a controlled manner under a specific number of current pulses with J = 0.8 × 10⁷ A cm⁻² without Hx (schematically in Figure 3d), as reflected in the differential p-MOKE images (Figure 3e), manifesting the validity of the subsequently designed SOT-DW-MTJ devices. By changing the current polarity, the device can be reset to the initial state, making the process reproducible. Furthermore, the DW can be pinned reliably with a robust energy barrier at room temperature more than 5 months after fabrication with different pinning center gaps (Supplementary Video S1, Supporting Information), further corroborating that the pinning originates from the iDMI-induced mechanism rather than from sources such as line edge roughness and magnetocrystalline defects. Such DW pinning stability is also confirmed by temperature-dependent DW pinning simulations up to 450 K without collapse or dissociation (data not shown).
Proposed Scalable SOT-DW-MTJ Device with Tailored iDMI for Multistates
Motivated by the effective DW motion and pinning governed by iDMI modulation in the prototype investigations, it is promising to develop scalable SOT-DW-MTJ devices that combine high SOT efficiency with robust DW pinning by tailoring the iDMI at the HM/FM interfaces. We further conducted extensive simulations of the DWM velocity as a function of iDMI strength for different spin Hall angle values, i.e., those of interfaces formed with different HMs (α-W, β-W, β-Ta, Pt, and Au0.9Ta0.1) (Figure S1, Supporting Information). We chose β-W as the HM material in our proposed device system for its CMOS process compatibility and large SOT efficiency, which favor energy-efficient IMC.[24]
Considering the prerequisite CMOS compatibility for codesign simulations, as demonstrated in Figure 4a, the proposed scalable SOT-DW-MTJ device structure consists of a basic film stack of W/CoFeB/MgO/CoFeB/W/Ru with a bottom electrode (BE) and top electrode (TE). The MgO functions as the barrier layer, isolating two CoFeB layers with PMA. The upper CoFeB acts as the RL. The lower CoFeB is the magnetic FL, hosting the dynamic movement of the DW driven by SOT with two deterministic collective coordinates, i.e., the DW position and tilt angle. With a rigid domain wall width governed by Δ = √(A/Ku),[60] where A and Ku are the exchange stiffness constant and effective magnetic anisotropy of the FL, respectively, the chiral DW is initially formed at the end of the FL in a specifically defined region with a low Ku of 8 × 10⁵ J m⁻³. The β-phase W with a high spin Hall angle (θSH) of −0.35 serves as the spin–orbit coupling (SOC) layer and is adjacent to the ferromagnetic FL.[61] When the current (J) is injected into the SOC layer through the BE, a perpendicular spin current is generated in the HM owing to the spin Hall effect and then injected into the ferromagnetic FL, consequently driving the magnetic DWM along J through the exerted SOTs.[20] The direction of the magnetic DWM is reversed when the current direction is reversed.
Figure 4. Proposed SOT-DW-MTJ device, simulated magnetization characteristics of the FL with DWM, and designed read and write circuits. a) Schematic of the device with tailored DMI between FL and HM for effective DWM. Dimensions are not drawn to scale. b) 3D plot of the dynamic magnetization dependence on the pulse number under discrete 0.167 ns excitation with different J; the inset shows the specific magnetization switching loop driven by the corresponding pulse number Npulse. c) Mz/Ms change with reversed J in a 1 ns wide reset pulse. d) CMOS-compatible single SOT-DW-MTJ hybrid circuit. e,f) Schematics of the implemented sense amplifiers.
To effectively convert DW position information into electrical signals with differentiated multistates in the MTJ, uniquely defined W regions, thinned by pattern engineering, are embedded periodically in the SOC layer to form FL/HM-interfaced PCs with tailored iDMI, i.e., the short-range antisymmetric exchange that plays an important role in DW motion manipulation. In this study, the four pattern-engineered FL/HM (thin W) interfaces bear an iDMI constant of 1 mJ m⁻² larger than that of the FL/HM (thick W) interface with the same iDMI sign,[55,62] as listed in Table S1, Supporting Information. With this design, effective pinning fields are formed by the resulting effective potential, enabling the confinement of the magnetic DW inside the desired PCs.[63] When the magnetic DW is pinned at an interface on top of one of the four engineered FL/HM (thin W) regions (denoted as PC1, PC2, PC3, and PC4) during the motion driven by the optimized J pulse(s), four discrete resistance states of the MTJ (S1–S4) can be achieved and switched, each with a specific portion of net magnetization orientation between FL and RL. This is conducive to the formation of reliable and programmable multilevel resistance states that exceed the bistability of conventional memory elements.
Tailored DW Motion Using Engineered iDMI in SOT-DW-MTJ
To demonstrate the magnetic DW motion dynamics under different J pulse number profiles, micromagnetic simulations were performed using the MuMax3 simulator[48] to investigate the DW nucleation, motion, and dynamic switching behaviors in the proposed device. The DW motion is fundamentally driven by the transfer of spin angular momentum to the local magnetization M along the ferromagnetic FL track.
The materials and geometric parameters of the proposed device are listed in Table S1, Supporting Information. The simulation magnetic parameters are adopted from our experimental results and other reports.[55,62,64] An FL is designed with a size of 320 × 70 × 0.8 nm. The SOC layer below the FL employs the same size to first achieve magnetic DW nucleation at the extended overhang and then drive the magnetic DW to proceed along the longitudinal J direction. In the SOC layer, four PCs with a size of 20 × 70 nm are formed to modulate the pinning potential at the corresponding interfaces. PC1 and PC4 are located near each end of the RL. A 30 nm gap separates the outer boundary of the PC and end of the FL to prevent the magnetic DW from being annihilated by the oscillation during the stabilization. The other two PCs, i.e., PC2 and PC3, are identically spaced between PC1 and PC4.
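The PC layout described above can be reproduced with a short calculation. The sketch below is illustrative only. It assumes that positions are measured along the 320 nm FL length from its left end, that each PC spans the full 70 nm track width, and that "identically spaced" means an equal pitch from PC1 to PC4. The function name and arguments are hypothetical.

```python
def pinning_center_spans(fl_length_nm=320.0, pc_width_nm=20.0,
                         end_gap_nm=30.0, n_pc=4):
    """Return (start, end) positions in nm of each pinning center along the FL.

    The outer edges of the first and last PC sit end_gap_nm away from the
    FL ends; the remaining PCs are placed at equal pitch in between.
    """
    first_start = end_gap_nm
    last_start = fl_length_nm - end_gap_nm - pc_width_nm
    pitch = (last_start - first_start) / (n_pc - 1)
    return [(first_start + i * pitch, first_start + i * pitch + pc_width_nm)
            for i in range(n_pc)]

for index, (start, end) in enumerate(pinning_center_spans(), start=1):
    print(f"PC{index}: {start:.0f}-{end:.0f} nm (center {0.5 * (start + end):.0f} nm)")
```

With the stated dimensions this gives a PC pitch of 80 nm, with PC1 starting 30 nm from the left end of the FL.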
Magnetic DW Nucleation
The magnetic up-domain nucleation region is locally defined with a low Ku of 8 × 10⁵ J m⁻³ at each end of the FL, as shown in Figure 4a.[65–68] As demonstrated by our simulation results (Figure S2 and Video S2, Supporting Information), the up-down DW is reproducibly created at the nucleation zone edge by applying a preset 0.167 ns pulsed current with a density of 4.77 × 10⁸ A cm⁻². Consequently, the up-down DW moves into the first PC (PC1), as illustrated in Figure 4b. In this case, the corresponding magnetic DW displacement is 30 nm from the left end of the FL. Note that, during the simulation process, the nucleation region can prevent the annihilation of the magnetic DW at the boundary, sustaining the DW in the FL once the nucleation completes. Owing to the nonvolatility of the proposed device, the power consumption of the initial nucleation can be neglected in our device design; the following sections discuss the specific implementation.
Magnetic DW Motion and Magnetization State Manipulation
Upon successful magnetic DW nucleation, an optimized current pulse can be applied when or after the DW enters the PCs (Figure S2 and Video S2, Supporting Information). Note that large current amplitudes or prolonged pulse widths cause the magnetic DW to cross the pinning area. In contrast, small current amplitudes or short pulse widths are insufficient to drive the magnetic DW away from the nucleation area and PCs. Therefore, an optimized driving current window is essential in the application of SOT-DW-MTJ multistate devices with an engineered iDMI, as demonstrated in Figure S3, Supporting Information, which is consistent with the complementary results in Supplementary Video S2, Supporting Information.
To investigate the impact of the magnetic DWM process on the FL magnetization Mz, extensive simulations were performed for the transient Mz/Ms evolution in a timeframe of 0–6 ns, providing guidance on the subsequent cell structure, circuit design, and implementation of multistate SOT-DW-MTJ devices. As shown in Figure 4b, the dynamic change of Mz/Ms was simulated for J from 4.52 × 10⁷ to 5 × 10⁷ A cm⁻² with a 0.167 ns pulse width, followed by a DW relaxation time of 0.78 ns that was deliberately prolonged to 1 ns to plot the stabilized states distinctly. Evidently, the Mz/Ms switching strongly depends on the pulse number, which governs the specific MTJ device resistance state; this concurs with the results plotted in the inset of Figure 4b. In addition, such memristive synapses promise to perform vector–matrix multiplications with flexibly tunable conductance, which is a prerequisite for constructing complex artificial neural networks. For instance, the Mz/Ms value decreases from the first pulse excitation event until the magnetic DW enters and stabilizes inside PC1; in turn, the device settles in state S2. A similar state switching process from S2 to S3 was also observed during the second pulse event, with a damped oscillation as the DW crosses the pinning area. This oscillation is attributed to the tilting of the DW during magnetic domain propagation driven by the combination of the Oersted field resulting from the pulsed current and the effective SOT field caused by SOC.[69] The DWM dynamics and tilting can be described by the central position q and azimuthal angle φ of the moment in the central DW and the tilting angle β of the DW plane.[70] However, the MTJ provides the fourth state S4, in which Mz/Ms is 1, without reflecting the oscillation of the magnetic DW,[71] i.e., the magnetic DW escapes the edge of the effective FL area aligned with the RL and proceeds to PC1 or PC4.
Nevertheless, when consecutive switching pulses are applied for writing, the resistance state can be read reliably as soon as 1 ns has elapsed, once the oscillation has stabilized, as illustrated by each plateau in Figure 4c. Therefore, this device enables nanosecond-level switching between adjacent states, which further enhances the application potential of SOT-DW-MTJ in high-speed IMC.
From a practical perspective, the proposed DW-SOT-MTJ device must be resettable in a controllable and efficient manner through DWM manipulation. When the current pulses are applied with the opposite polarity, the direction of the magnetic DW movement is reversed accordingly, and the magnetic DW moves from the rightmost nucleation zone back to the initial position. In the reset process from state S4 to S1 with direct device switching, as shown in Figure 4c, a relatively long 1 ns wide current pulse was applied in the reverse direction with the same amplitude as the single-pulse writing event. In such cases, the magnetic DW can be reset to the initial state S1 from any position within a relaxation time of 0.6 ns. The complete DWM evolution dynamics are detailed in Figure S3 and Video S2, Supporting Information.
Multistate Write and Read Circuit Design
As shown in Figure 4d, the proposed core multistate write and read circuit consists of a 2-transistor/1-resistor (2T1R) cell structure with two sense amplifiers (SAs). In the writing process, this 2T1R cell is equipped with an access transistor connected to the BE and SOC layer, which manages the profile and direction of the pulse currents injected into the SOC layer to drive DWM in the FL of the SOT-DW-MTJ device under the control of the pulsed gate voltage Vin. The TE connects to the other transistor for read sensing and control. To examine the comprehensive writing and reading behaviors, the Cadence tool was used for device–circuit-level simulation and analysis of the designed write/read circuit. Table S2, Supporting Information, lists the simulation parameters. For more specifics on the Cadence model, refer to Supplementary Note 2, Supporting Information.
Write Process
The four resistance states in the SOT-DW-MTJ device can reliably represent the binary data “00,” “01,” “10,” and “11” when functioning as a key component of the MRAM-configured IMC. Upon successful DW nucleation, the DWM-driving access transistor, shown in Figure 4d, is turned on when Vin = “1” with appropriate clock timing control, and a single pulse with a current density of 4.78 × 10⁷ A cm⁻² and a pulse width of 0.167 ns is injected (these parameters, which realize stable switching, are extracted from Figure S2, Supporting Information). The effective SOT generated by the current pulse then drives the magnetic DW into the second PC with tailored DMI to write data “01,” i.e., state S2. Subsequent pulses applied in sequence can switch the SOT-DW-MTJ device to the third or fourth resistance state, storing the data “10” and “11,” respectively. Compared to the traditional binary SOT-MTJ memory cell, the four-state solution offered by our developed SOT-DW-MTJ device can significantly increase data storage density.
When such a component is operated as a Boolean logic device, the frequency-encoded binary information is used as the Vin signal to control the pulsed-J activation of the access transistor for both magnetic DW nucleation and manipulation. When the input X = “0” with a low Vin, the transistor is turned off and no current pulse is written into the multistate device. When the input X = “1,” the transistor is turned on with a high Vin, enabling a single or a series of well-profiled current pulses to be written to the multistate SOT-DW-MTJ to achieve the target magnetization states. In this manner, the constructed cell device and circuit architecture can realize the BUF/NOT operation on a single signal X; the OR/NOR, AND/NAND, XOR/XNOR, IMP/NIMP, and RIMP/RNIMP operations on two input signals (X and Y); and the full addition of three signals (X, Y, and Ci). The following sections discuss this methodology.
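The pulse-counting idea behind the three-input addition can be sketched behaviorally. The example below is our own simplified reading, not the circuit-level FA design: it assumes that each input equal to “1” contributes exactly one write pulse, that each pulse advances the device by one state from the reset state S1, and that the states S1–S4 are read out as “00,” “01,” “10,” and “11.” Under those assumptions the upper and lower readout bits coincide with the carry and sum of X + Y + Ci. All names are hypothetical.

```python
# Assumed readout encoding of the four states: (upper bit, lower bit).
STATE_BITS = {1: (0, 0), 2: (0, 1), 3: (1, 0), 4: (1, 1)}

def full_add(x, y, c_in):
    """Behavioral full adder based on counting write pulses.

    Returns (sum_bit, carry_out).
    """
    state = 1                      # reset: device initialized to S1
    for bit in (x, y, c_in):       # inputs are applied one after another
        if bit:                    # input "1" -> one write pulse
            state += 1             # DW moves to the next pinning center
    carry_out, sum_bit = STATE_BITS[state]
    return sum_bit, carry_out

for x in (0, 1):
    for y in (0, 1):
        for c_in in (0, 1):
            s, c_out = full_add(x, y, c_in)
            print(f"X={x} Y={y} Ci={c_in} -> Sum={s} Cout={c_out}")
```

The model ignores timing, sensing margins, and pulse-window constraints; it only shows why a four-state cell is sufficient to hold the result of a one-bit full addition.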
Read Process
To read the data stored in the magnetic component, the Vin signal is set to a low level, and the Vread signal is ramped up to a high level. At this time, the current generated by the upper circuit flows into the MTJ from the TE, and a conductive channel forms between the TE and BE. The sensing voltage Vsen of the MTJ is extracted at the node shown in Figure 4d. With 5000 iterations of the Monte Carlo simulation, the corresponding sensing voltage distributions of the four independent resistance states are obtained, as depicted in Figure 5a, in an analysis with 3σ variation at 27 °C. When the circuit is stable, the smallest sensing margin of Vsen is between Vsen1 and Vsen2, and the sensing margin can be maintained above 110 mV for robust read reliability.[72] Therefore, the reference voltages (Vref1, Vref2, and Vref3) generated by the three reference resistors (Ref1, Ref2, and Ref3) are used to distinguish the four resistance states. Two precharge SAs (SA1 and SA2) are employed to output the reading results of the device.[72,73]
Figure 5. Sensing window verification and proposed IMC architecture based on developed multistate SOT-DW-MTJ devices. a) Monte Carlo simulation results of individual state sensing voltage distribution. b–e) Transient simulation results of four states reading operation. f) Computing in memory architecture based on binary MRAM subarray and SOT-DW-MTJ building blocks.
Specifically, when the device is to be read, the enable signal ENsen1 (marked in Figure 4e) is turned on. Hence, Vref1 and Vsen, generated by Ref1 and the device, respectively, function as inputs to SA1 within the same clock period, completing the reading of the upper bit. If the device is in resistance state S1 or S2, the output of SA1, i.e., Vout1 in Figure 4e, is “0.” However, if the device is in resistance state S3 or S4, Vout1 is “1.” This allows Vout1 to be used as the gate voltage control in the branch where Ref3 is located, and its inverted signal to control the turn-on/off of the branch with Ref2. If Vout1 = “0,” Vref2 functions as the input to SA2. If Vout1 = “1,” Vref3 functions as the input to SA2. The system then triggers the enable signal ENsen2 (Figure 4f) and turns on SA2 to read the lower bit. For example, if the device stores “00,” the sensing voltage (Vsen) is smaller than Vref1 and the output of SA1 is “0.” Furthermore, the inverted output of SA1 is high, which results in a comparison between Vsen and Vref2; the output Vout2 of SA2 is “0” because Vsen < Vref2. Therefore, the result of the read circuit is “Vout1, Vout2” = “0, 0,” as shown in Figure 5b. Similarly, the result of “Vout1, Vout2” is “0, 1,” “1, 0,” or “1, 1” when the resistance state of the cell is S2, S3, or S4, as shown in Figure 5c–e, respectively.
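The two-step read sequence amounts to a binary search over the four sensing levels. The sketch below models each SA as an ideal comparator. The voltage values are placeholders chosen only so that Vref2 < Vref1 < Vref3 separate four increasing Vsen levels; they are not the simulated values, and the function name is hypothetical.

```python
def read_state(v_sen, v_ref1, v_ref2, v_ref3):
    """Two-step read: SA1 resolves the upper bit, SA2 the lower bit.

    Returns (v_out1, v_out2) as logic values 0 or 1.
    """
    # Step 1 (ENsen1): compare Vsen with Vref1 to get the upper bit.
    v_out1 = 1 if v_sen > v_ref1 else 0
    # Vout1 selects the reference branch for SA2: Ref3 if 1, Ref2 if 0.
    v_ref_sa2 = v_ref3 if v_out1 else v_ref2
    # Step 2 (ENsen2): compare Vsen with the selected reference.
    v_out2 = 1 if v_sen > v_ref_sa2 else 0
    return v_out1, v_out2

# Placeholder sensing voltages (V) for S1..S4 and reference voltages (V).
V_SEN = {"S1": 0.20, "S2": 0.40, "S3": 0.60, "S4": 0.80}
V_REF1, V_REF2, V_REF3 = 0.50, 0.30, 0.70

for name, v_sen in V_SEN.items():
    print(name, read_state(v_sen, V_REF1, V_REF2, V_REF3))
```

Only two comparisons are needed per read, at the cost of the second comparison depending on the first, which is why SA2 is enabled after SA1 has resolved.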
DW-SOT-MTJ-Based nv-IMC Implementation
IMC Circuit Architecture
The developed multistate memory device is further exploited to function simultaneously as a logic operation unit in the IMC architecture,[74] as demonstrated in Figure 5f. The system is constructed from a binary MRAM storage array ① and the corresponding individual column SAs (SA') ②, whose outputs resolve the binary MRAM unit signals as a high or low level. If the MRAM stores data 1 (0), SA' outputs 1 (0). The output of SA' is used as the word line signal of the logic operation array ③, which consists of multistate devices in the same column, to implement Boolean logic operations in this IMC architecture. The result of the compute process is recognized by the output units (i.e., SA1 and SA2) and can be stored in other storage units of the MRAM, reducing the redundant data transmission of the traditional von Neumann architecture.
Boolean Logic Operations Design and Control
Owing to its reliable multistate interconversion characteristics, the SOT-DW-MTJ device functions as an elementary storage unit for MRAM and can be further exploited to implement single-input functions and the full set of Boolean logic functions, e.g., BUF/NOT, OR/NOR, AND/NAND, XOR/XNOR, IMP/NIMP, and RIMP/RNIMP. The complete operation cycle comprises the reset, compute, and reading modes.[74] At the beginning of each operation-reading cycle, a reset operation is executed. In brief, Vreset is set to high, which allows the reset pulse to pass through the SOC layer and initialize the SOT-DW-MTJ device resistance state to S1. In the compute mode, the input signal is applied as the Vin pulse that controls the device writing, thereby carrying out the Boolean operation. If the input signal X = “1,” the write transistor connected to the BE is turned on, and a single current pulse is injected into the SOT-DW-MTJ device to switch the resistance state. When X = “0,” the transistor is turned off and the resistance state of the device is unchanged. Similarly, the second input signal Y is asynchronously enabled after the X signal to regulate the position of the magnetic DW in the FL and, in turn, to switch the resistance state of the device and proceed with the corresponding logic operation, as elucidated below. The computational result is read out according to the selected reference resistance and SA output. Table S3, Supporting Information, lists the associated operations and output results for a single input X, as well as for the X and Y input cases, for the multistate device.
BUF/NOT Operation
For the BUF/NOT operation, the X input functions as the gate voltage of the write access transistor and records the information in the device. In the same manner as the read operation, the enable signal ENsen1 is set to “1.” Because the resistance state of the multistate device under a single-signal operation is S1 (X = “0”) or S2 (X = “1”), Vout1 = “0.” Then, the enable signal ENsen2 is set to “1” and the branch where Ref2 is located is turned on; one output of SA2, Vout2, gives the data of “BUF X,” and its complementary output gives the value of NOT X.
AND/NAND and XOR/XNOR Operation
Using multistate devices and two SAs, the X AND/NAND Y and X XOR/XNOR Y operations can be realized. Inputs X and Y asynchronously control the write transistor and write their data to the multistate memory cell. For example, when “X = 0, Y = 0,” the device remains in S1; when only one of X and Y is 1, the device receives a single write pulse and switches from S1 to S2. Finally, if “X = 1, Y = 1,” the device receives two pulses and switches to S3. When the circuit reads the computation results, Vsen is compared to Vref1. Only when the inputs are “X = 1, Y = 1” is Vout1 “1.” Therefore, Vout1 of SA1 outputs the result of the X AND Y logic operation, and the complementary output of SA1 outputs the X NAND Y result.
Upon the output of SA1 stabilization, the system sets the enable signal ENsen2 = “1,” which starts the sense process of SA2. The data of the lower bit are then output through Vout2. When “X = 0, Y = 0" and “X = 1, Y = 1,” Vout2 = “0.” In contrast, when “X = 0, Y = 1" or “X = 1, Y = 0,” the output of Vout2 is “1.” Therefore, Vout2 outputs the result of X XOR Y, and another output terminal of SA2, i.e., , outputs the result of X XNOR Y.
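One way to see why the two SA outputs give AND and XOR is to treat the number of received write pulses as a 2-bit number, with Vout1 as the upper bit and Vout2 as the lower bit. The following Python sketch shows only this encoding; the function name `read_two_bit` is hypothetical, and the analog sensing and reference selection inside the SAs are not modeled.

```python
def read_two_bit(x: int, y: int) -> dict:
    """Behavioral readout of the AND/XOR pair (not a circuit model)."""
    pulses = x + y        # write pulses received: 0 (S1), 1 (S2), or 2 (S3)
    vout1 = pulses >> 1   # upper bit, SA1: "1" only in S3
    vout2 = pulses & 1    # lower bit, SA2: "1" only in S2
    return {
        "AND": vout1,
        "NAND": 1 - vout1,   # complementary output of SA1
        "XOR": vout2,
        "XNOR": 1 - vout2,   # complementary output of SA2
    }


for x in (0, 1):
    for y in (0, 1):
        print(x, y, read_two_bit(x, y))
```

Viewed this way, the cell with its two SAs behaves like a half adder: the upper bit is the carry (AND) and the lower bit is the sum (XOR).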
OR/NOR Operation
For OR/NOR operations, the compute process is identical to that of XOR/XNOR: X and Y are input asynchronously to drive the motion of the magnetic DW. The difference lies in the read step, where the reference voltage supplied to SA1 is changed. The enable signal ENsen1 is set to “0” to turn off Ref 1, and ENOR = “1” is asserted instead, which generates the reference voltage Vref2′. The amplitude of Vref2′ lies between Vsen1 and Vsen2 so that the two can be distinguished. When “X = 0, Y = 0,” Vout1 is “0” because the sensed voltage is lower than Vref2′; otherwise, Vout1 = “1.” With this configuration, Vout1 of SA1 therefore no longer outputs the AND result but X OR Y, and the complementary terminal outputs X NOR Y. Figure S4, Supporting Information, shows the other Boolean logic operations. This design methodology can be fanned out efficiently for on-demand IMC implementation. Figure 6a presents the circuit-level simulation results, which verify that the developed basic unit and circuit configuration implement a series of Boolean logic operations.
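The effect of swapping the SA1 reference can be sketched as a simple threshold comparison. In the Python example below, all voltage values are hypothetical placeholders chosen only to respect the ordering described in the text (Vref2′ between Vsen1 and Vsen2); the placement of Vref1 between Vsen2 and Vsen3 and the assumption that the sensed voltage rises from S1 to S3 are inferred from the AND readout and are not stated numerically in the source.

```python
# Hypothetical sensed voltages (V) for states S1, S2, S3. Only their
# ordering matters; the numbers are not taken from the paper.
VSEN = {1: 0.30, 2: 0.40, 3: 0.50}
VREF1 = 0.45        # assumed to lie between Vsen2 and Vsen3 (AND readout)
VREF2_PRIME = 0.35  # lies between Vsen1 and Vsen2 (OR readout)


def sa1(x: int, y: int, en_or: bool) -> tuple:
    """Return (Vout1, complementary output) of SA1 for inputs x and y."""
    state = 1 + x + y                         # S1, S2, or S3 after compute
    vref = VREF2_PRIME if en_or else VREF1    # ENOR selects the reference
    vout1 = int(VSEN[state] > vref)
    return vout1, 1 - vout1


for x in (0, 1):
    for y in (0, 1):
        print(x, y, "OR/NOR:", sa1(x, y, True), "AND/NAND:", sa1(x, y, False))
```

The stored state is identical in both modes; only the reference changes, which is the sense in which the logic function is selected by control signals rather than by the device structure.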
Figure 6. Transient simulation results verifying the Boolean logic and FA implementations. a) Full set of Boolean logic functions implemented and b) verification of the FA constructed from SOT-DW-MTJ devices.
Moreover, Table S3, Supporting Information, lists the comparison between different multifunction logic gates. The proposed design realizes all 16 Boolean logic operations and the FA, achieving the most logic computations with a single structure. Although the read and write circuits incur a certain transistor cost, the average cost per logic operation is the lowest, which reduces the relative complexity of the proposed design. Importantly, this reduced complexity gives the device flexibility across diverse operations and further demonstrates that the proposed design is reconfigurable: in addition to the FA implementation, more logic functions can be performed by changing the circuit control signals rather than the device structure. Further, the cell-level comparison of multistate devices in Table S5, Supporting Information, shows that our SOT-DW-MTJ cell structure is also competitive in average write energy, latency, area, and CMOS compatibility.
The developed read–write circuit unit accepts asynchronous random binary inputs Ci, X, and Y to realize a 1-bit magnetic FA, as addition is the basic operation of the arithmetic unit.[75,76] In practice, Ci can also be the carry output from SA1 of the previous adder unit. The operation cycle follows the reset, compute, and read processes. The multistate SOT-DW-MTJ device stores and operates on the input signals; the output Vout1 of SA1 gives the carry-out Ci+1, and the output Vout2 of SA2 gives the sum. Note that this cascading capability provides sufficient flexibility for extending the 1-bit FA to multibit structures. In the circuit-level simulation, the results of all addition input combinations were obtained in sequence, as shown in Figure 6b.
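The FA behavior and its cascading can be checked with a short behavioral sketch. The Python code below assumes, by analogy with the two-input case, that each of Ci, X, and Y equal to “1” contributes one write pulse, so that the cell ends in one of four states (S1 to S4); the function names `fa_cell` and `ripple_add` are hypothetical, and timing, sensing, and energy are not modeled.

```python
def fa_cell(ci: int, x: int, y: int) -> tuple:
    """One FA cycle: returns (sum, carry_out) from the pulse count."""
    pulses = ci + x + y        # 0..3 pulses, assumed to map to S1..S4
    carry_out = pulses >> 1    # Vout1 of SA1: "1" for two or three pulses
    total = pulses & 1         # Vout2 of SA2: "1" for an odd pulse count
    return total, carry_out


def ripple_add(a: int, b: int, n_bits: int = 4) -> int:
    """Cascade 1-bit FA cells; each carry-out feeds the next cell's Ci."""
    carry = 0
    result = 0
    for i in range(n_bits):
        s, carry = fa_cell(carry, (a >> i) & 1, (b >> i) & 1)
        result |= s << i
    return result | (carry << n_bits)   # final carry is the top bit


# Exhaustive check of the 4-bit cascade against integer addition.
assert all(ripple_add(a, b) == a + b for a in range(16) for b in range(16))
print(ripple_add(9, 7))  # 16
```

The exhaustive check confirms that taking the carry from the upper bit and the sum from the lower bit of the pulse count reproduces binary addition when the cells are cascaded.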
Compared with other conventional MTJ-based FAs listed in Table S6, Supporting Information, our device realizes FA computations using a single MTJ and fewer peripheral MOS components, which is conducive to optimizing the array area. SOT-driven DWMs have nanosecond write and read times, laying a foundation for high-speed IMC applications. Although all arithmetic units based on nonvolatile MTJ designs show zero static power consumption, this design also significantly reduces the dynamic power consumption. Nevertheless, the overall performance requires further optimization to enhance competitiveness. Further study can lead to more in-depth hardware research in magnetic compressors,[77] SOT-MRAM array integration, and in-memory neural network computing.[7,78]
Conclusion
In summary, based on experimentally corroborated prototype domain wall devices, we demonstrated a new type of reliable single-cell MTJ with multiple discrete memory states for IMC applications, implementing an FA and a rich set of Boolean logic operations. The synergistic use of SOT and iDMI at the interface between the ferromagnetic FL and the heavy metal provides effective manipulation of magnetic domain wall motion and pinning, facilitating multistate switching under full electrical control. To the best of our knowledge, a dedicated multistate read and write circuit with an IMC architecture based on our device and cell structure was demonstrated for the first time, with improved latency and power efficiency. The proposed design realizes all 16 Boolean logic functions and the FA operation with significantly reduced complexity of device units and peripheral circuits. This enables fan-out and cascading of commercial MTJ functions and holds promise for future reconfigurable, high-speed IMC applications, for instance, edge computing for embedded AIoT and neuromorphic computing.
Experimental Section
Thin-Film Deposition and Characterization
A series of multilayer heterostructure thin films were deposited on thermally oxidized silicon (Si/SiO2) substrates using DC and RF magnetron sputtering at room temperature under a base pressure lower than 1.0 × 10⁻⁷ Torr, and were then annealed at 350 °C for 1 h inside a high-vacuum chamber at 4 × 10⁻⁸ Torr. The fabricated samples were mounted on a precise, nonmagnetic XYZ translation stage at room temperature. An electromagnet with a field strength of up to ≈10 000 G and a resolution of 0.2 G at the surface applied the external out-of-plane magnetic field. Using a Hall effect sensor, the direction and strength of the magnetic field were controlled and mapped during data collection. For MOKE imaging, a narrowband light-emitting diode illuminated the target sample, and images were acquired with a high-bit-depth CMOS camera through a 50× long-working-distance objective lens (Nikon). The magnetic anisotropy of the film samples was evaluated by VSM at room temperature. TEM, STEM, and EDS (JEOL JEM-ARM200F) were used to characterize the cross-sectional structure of the HM/FM heterostructures.
Device Patterning and Electrical Measurement
The film stacks were patterned into a Hall bar geometry using a standard photolithography process, and the excess areas were etched by ion milling, followed by thermal evaporation of Ti(20 nm)/Au(80 nm) electrodes at the ends of the Hall probes. All electrical measurements were performed in a Physical Property Measurement System (PPMS). Multiple Keithley instruments (Keithley 2400, 2182, and 6221) were connected to the PPMS, enabling comprehensive transport measurements of the Hall bar devices. A constant DC of 100 μA was applied for the Hall measurement. In the switching measurement, a 1 ms write current pulse was first applied. After 100 ms, a small read current (0.5 mA for 1 s) was applied and the Hall voltage signal was recorded simultaneously. The device temperature during the write pulse was extracted by monitoring the longitudinal resistance. The temperature increase during the read pulse was negligible.
Micromagnetic Simulation
Micromagnetic simulations were performed using MuMax3.[48] The magnetic devices were initially relaxed by full energy minimization using a conjugate gradient method, for a thin plate of a chiral magnet with a size of 320 × 70 × 0.8 nm³ on a 160 × 35 × 1 mesh. As described by Heisenberg's formalism for the exchange interaction, the magnetic moment originates from all electron angular momentum, and the total Hamiltonian is the sum of multiple magnetic interactions. The dynamics of the DW magnetic moments induced by the antidamping-like SOT is governed by the Landau–Lifshitz–Gilbert (LLG) equation[79] [Image Omitted. See PDF] where γ is the gyromagnetic ratio and α is the Gilbert damping constant; the remaining symbols are the unit vector along the magnetization of the ferromagnetic FL and the unit vector perpendicular to both the current and inversion asymmetry directions. By neglecting the field-like SOT for simplicity, HSO describes the magnitude of the antidamping-like SOT. In this case, Heff is considered as[80] [Image Omitted. See PDF] where Aex is the exchange stiffness constant, Kd is the DW hard-axis anisotropy, and D0 is the iDMI constant. In the present system, the current flowing along the x direction induces an SOT that drives the moments toward the y direction, leading to moment switching into the z direction driven by the torque from the iDMI effective field HDMI. This antisymmetric exchange contributes to the total magnetic exchange interaction between two neighboring magnetic spins, Si and Sj. Quantitatively, this is a term in the Hamiltonian that can be written as[81] [Image Omitted. See PDF]
HSO is the effective magnetic field of the antidamping-like SOT and is expressed as [Image Omitted. See PDF] where μB, e, μ0, and t are the Bohr magneton, electron charge, vacuum permeability, and FM film thickness, respectively. The corresponding effective anisotropy (Keff) and PMA constant (Ku) were also extracted. Keff is calculated as Keff = MsHs/2, where Hs is the saturation field obtained from the in-plane M–H curves. Ku is determined by Ku = MsHK/2 = Ms(Hs + 4πMs)/2, where 4πMs is the demagnetization field. The value of Ms is obtained by dividing the magnetic moment by the total volume of the Co20Fe60B20 layers. The effective magnetic anisotropy field (HK) is evaluated by fitting the hard-axis magnetic field dependence of the Rxy–H loops.
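The anisotropy relations above amount to a few lines of arithmetic, sketched here in Python. The code assumes CGS units (Ms in emu cm⁻³, fields in Oe, energies in erg cm⁻³), which is the convention implied by the 4πMs demagnetization term; the function name `anisotropy_constants` and the numerical inputs are hypothetical placeholders and are not measured values from this work.

```python
import math


def anisotropy_constants(moment_emu: float, volume_cm3: float, hs_oe: float):
    """Return (Ms, Keff, Ku, HK) from the relations in the text (CGS units)."""
    ms = moment_emu / volume_cm3   # Ms = magnetic moment / layer volume
    demag = 4 * math.pi * ms       # demagnetization field 4*pi*Ms
    hk = hs_oe + demag             # HK = Hs + 4*pi*Ms
    keff = ms * hs_oe / 2          # Keff = Ms*Hs/2
    ku = ms * hk / 2               # Ku = Ms*HK/2
    return ms, keff, ku, hk


# Hypothetical inputs for illustration only.
ms, keff, ku, hk = anisotropy_constants(
    moment_emu=1.0e-4, volume_cm3=1.0e-7, hs_oe=5.0e3
)
print(f"Ms   = {ms:.4g} emu/cm^3")
print(f"Keff = {keff:.4g} erg/cm^3")
print(f"Ku   = {ku:.4g} erg/cm^3")
print(f"HK   = {hk:.4g} Oe")
```

Note that the two constants differ by the shape anisotropy term, Ku − Keff = 2πMs², which follows directly from the two expressions.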
Circuit Simulation
For the circuit-level simulation, a Verilog-A model of the SOT-DW-MTJ device was developed and cosimulated with the CMOS peripheral circuit in a SPICE simulator. A foundry 28 nm process design kit (PDK) was used to verify the proposed design and assess the performance of the Boolean logic operations, such as NOT, OR/NOR, AND/NAND, and XOR/XNOR.
Acknowledgements
This work was supported in part by the National Key R&D Program under grant nos. 2021YFB3601300 and 2019YFB2205100, the National Natural Science Foundation of China under grant nos. 62074164, 61888102, and 61821091, the Director Fund of Institute of Microelectronics and the Dedicated Fund of Chinese Academy of Sciences (grant nos. E0SR023002, E0ZR223010, and E0YR063004), and the Strategic Priority Research Program of the Chinese Academy of Sciences under grant no. XDB44010100. The authors acknowledge the support in thin-film stack deposition from Qingdao Research Institute, Beihang University, and are grateful for fruitful discussions with Dr. Kaihua Cao and Dr. Zhaohao Wang.
Conflict of Interest
The authors declare no conflict of interest.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
© 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Emerging in‐memory computing (IMC) technology promises to tackle the memory wall bottleneck in modern systems. Promoted as a promising building block, nonvolatile spin–orbit torque (SOT) memory devices with sub‐ns and sub‐pJ processing capabilities are thereby extensively pursued. Herein, a new type of domain wall device is experimentally presented with multistates driven by nonvolatile SOT and Dzyaloshinskii–Moriya interaction, enabling time and energy‐efficient IMC with a full adder (FA) implementation based on magnetic tunnel junctions. Complementary micromagnetic and device–circuit cosimulation results show that the write/read latency of the proposed FA can be shortened to 1.25 ns/0.22 ns with an averaged writing energy of 8.41 fJ bit⁻¹, and the overall dynamic power is 26.25 μW, which is 4.43–51.96 times lower than state‐of‐the‐art alternatives. Moreover, the developed architecture can perform all 16 Boolean logic functions, warranting an extensive arithmetic operation. The experimental, micromagnetic, and circuit‐level simulation results show great potential in both fundamental research and new trajectories in technology development for nonvolatile in‐memory computing applications.
1 Key Laboratory of Microelectronic Devices & Integrated Technology, Institute of Microelectronics, Chinese Academy of Sciences, Beijing, P. R. China; University of the Chinese Academy of Sciences, Beijing, P. R. China
2 Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
3 Key Laboratory of Microelectronic Devices & Integrated Technology, Institute of Microelectronics, Chinese Academy of Sciences, Beijing, P. R. China; University of the Chinese Academy of Sciences, Beijing, P. R. China; School of Microelectronics, University of Science and Technology of China, Hefei, Anhui, P. R. China
4 Key Laboratory of Advanced Materials (MOE), School of Materials Science and Engineering, Tsinghua University, Beijing, P. R. China
5 Institute of Physics, Chinese Academy of Sciences, Beijing, China; Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing, China