Spatial-temporal information perception is widely used for motion processing in dynamic scenes, but present technology requires relatively huge hardware resource consumption. The attention mechanism helps the human brain extract required information from tremendous data at a low cost. Here, we propose an attention-inspired artificial intelligence architecture based on hetero-dimensional modulations between zero-dimensional contact and two-dimensional electrostatic interfaces. An adaptive spatial-temporal information processing primitive is successfully implemented based on in-memory analog computing. Experiments of attention adjustments responding to different situations validate the adaptation capability to environmental changes. A demonstration of 5×5-unit data stream processing is conducted, and intensities of spatial and temporal information are varied with attention distribution from 0% to 100%. The attention-inspired device is applied to autonomous driving edge intelligence scenarios, showing high adaptability to traffic scene variations. The proposed architecture exhibits a tens-fold latency reduction, hundreds-fold area improvement, and thousands-fold energy saving compared to the conventional transistor-based circuit.
Pan et al. report an attention-inspired architecture for adaptive spatial-temporal information processing based on 0D-2D hetero-dimensional interface between MoS2 and Ag filament. Wafer-scale device array is prepared for in-memory analog computing and applied to autonomous driving edge intelligence scenarios.
Introduction
The wave of edge intelligence leads to more requirements for highly efficient spatial-temporal information perception hardware1. Conventional hardware solutions require complex pathways and separated equipment for sequential data storage, transmission, and processing, which lead to large time latency and energy costs2,3. In contrast with conventional electronics, the human brain understands spatial and temporal information from the surroundings at an extremely low cost4,5. The attention mechanism is used to extract significant information from tremendous data, and attention is dynamically adjustable with varying situations to ensure persistently effective information extraction in ever-changing environments6,7. In the human brain, frontoparietal attention networks adjust the attention in response to present situations, and direct regions in sensory cortexes to focus on specific types of information8–11. The attention mechanism achieves complete information perception and significant information enhancement.
Artificial intelligence hardware designs that mimic cognition approaches of the brain have emerged in recent years based on two-dimensional (2D) materials, exhibiting high operation speed and low power consumption12–15. 2D neuromorphic devices and systems have been developed for in-memory analog multiplication used for artificial neural networks16–19, and emulations of the brain’s neuron and synapse functions to perform artificial synaptic information processing20–23. Temporal information perception hardware has been investigated based on 2D neuromorphic devices. Temporal summation hardware performs an analog weighted summation of the spatial mappings at individual times into the last frame based on the fading memory characteristics of devices24–27. The strategy is very effective for single-object detection against a dark background, but object confusion and information cover-up problems arise in luminous scenes. Temporal difference hardware extracts moving objects from the background by frame-wise subtraction28–31. However, spatial information is lost after temporal information extraction, and the recognition capacity is limited. Complete information perception is hard to achieve in situ with present neuromorphic mechanisms, and a novel physical mechanism for artificial intelligence hardware that emulates the brain’s attention mechanism should be developed.
This work proposes an attention-inspired device for in situ spatial-temporal information processing based on hetero-dimensional modulations. The zero-dimensional (0D) interface exhibits non-volatile state transfer behavior for data storage, and adjustable weighted analog computing is performed between input and stored data based on intrinsic interactions of 0D-2D hetero-dimensional interfaces. The attention-inspired device delivers a large state transfer ratio (10⁹) and a high on/off shunt current range (10⁸). An adaptive spatial-temporal information processing primitive is implemented based on reconfigurable properties of the attention-inspired device to perform attention distribution and determination computing functionalities. The attention is dynamically adjusted in response to situation variations. Adaptive information processing for a data stream with 5×5 units is conducted. Based on the validations, we demonstrate the attention-inspired device used for highly adaptive edge equipment. Attention-enhanced equipment exhibits full-range adjustable spatial and temporal attention. A 190-fold reduction of area, 47-fold reduction of time latency, and 1411-fold reduction of energy consumption are achieved by the attention-inspired device in comparison with the conventional transistor.
Results
Attention-inspired information perception architecture
Here, attention-inspired devices are fabricated to emulate the brain’s attention mechanism (Fig. 1a). 2D monolayer MoS2 is grown by chemical vapor deposition (CVD) and is fabricated as the channel. Ag+ ions with large conductivity and diffusivity are suitable for the top electrode to form the filament32, and 0D contact interfaces between 2D transition metal dichalcogenides and Ag filament exhibit excellent contact properties and non-volatile programmable capabilities33,34. The scanning electron microscope image of a fabricated attention-inspired device, and corresponding transmission electron microscopy and energy dispersive spectroscopy characterizations are shown in Fig. 1b.
Fig. 1 Adaptive spatial-temporal information processing architecture based on attention-inspired devices. [Images not available. See PDF.]
a Schematic of an attention-inspired device. The highlighted structure implements 0D-2D hetero-dimensional modulations. The gray and black arrows represent 0D and 2D interfacial modulations. b Scanning electron microscopy characteristics of a fabricated attention-inspired device (Scale bar, 10 μm) and the corresponding transmission electron microscopy and energy dispersive spectroscopy mapping (Scale bar, 5 nm). c Schematic of the adaptive spatial-temporal information processing architecture. Determination network and attention distribution network perform functionalities of frontoparietal attention networks and attention-controlled sensory cortexes respectively. The output is a single-frame matrix containing adjustable spatial and temporal information. d An attention-inspired device unit in determination network with bidirectional responses to the situation. e An attention-inspired device unit in attention distribution network. In-memory adjustable analog computing of stored and input data is in situ conducted.
The attention-inspired device implements in-memory analog spatial-temporal computing based on interactive modulations of the 0D-2D hetero-dimensional interfaces. An attention-inspired adaptive spatial-temporal information processing architecture is illustrated in Fig. 1c. The attention-inspired device network for determination computing emulates frontoparietal attention networks to dynamically adjust and output the optimized attention. The attention distribution network mimics attention-controlled information enhancement of sensory cortexes, receives the information from a data stream, and outputs a single frame of data containing both spatial and temporal information. The intensities of spatial and temporal information in the output are adjusted by the attention distribution. In practical cases, both spatial and temporal information are required, and one type of information should be intensified to enhance recognition performance in specific situations. When spatial attention is increased, spatial information, including the motionless traffic light and the red body of the bus, is enhanced. Conversely, when temporal attention is enlarged, temporal information, including the outline of the moving bus, is enhanced.
Figures 1d, e illustrate the fundamental mechanisms of the attention-inspired device. For the determination network, the attention-inspired device network exhibits a neuromorphic behavior (Fig. 1d). Weight data is stored by the filament state of each device unit. Situations are encoded as logic signals and input to the network. Situation descriptions include: the vehicle has a high speed, there is heavy traffic on the road, etc. True (logic 1) or False (logic 0) indicates whether the situation description holds. The unit is resting when the situation is false, and is excitatory (w/-filament state) or inhibitory (w/o-filament state) when the situation is true. Network computing of multiple units is implemented to obtain the optimized attention. For the attention distribution network, input data is sequentially stored in the attention-inspired device array by filament states (Fig. 1e). The present input data (t2) interacts with the previously stored data (t1). The transport curve of the w/-filament state (red lines) is modulated by attention to adjust the output spatial and temporal information intensities.
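The resting, excitatory, and inhibitory behavior of the determination units can be pictured with a small software analogy. The sketch below is not the device model: it assumes each unit contributes a signed amount of a hypothetical unit current, with weight +1 in the w/-filament state and −1 in the w/o-filament state, and a contribution of zero when the situation is false. The function name, the symmetric weights, and the unit current are illustrative assumptions.

```python
def determination_current(situations, filament_states, unit_current=1.0):
    """Sum the per-unit contributions of a toy determination network.

    situations: list of 0/1 logic values (1 means the situation is true).
    filament_states: list of booleans (True means w/-filament state).
    unit_current: hypothetical unit current in arbitrary units.
    """
    if len(situations) != len(filament_states):
        raise ValueError("one filament state is needed per situation")
    total = 0.0
    for x, has_filament in zip(situations, filament_states):
        # Assumed symmetric weights: excitatory with filament, inhibitory without.
        weight = 1.0 if has_filament else -1.0
        # A false situation (x = 0) leaves the unit resting and adds nothing.
        total += weight * x * unit_current
    return total


# Two units: the first is excitatory, the second is inhibitory.
states = [True, False]
print(determination_current([1, 0], states))  # only the excitatory unit is active
print(determination_current([0, 1], states))  # only the inhibitory unit is active
print(determination_current([1, 1], states))  # the two contributions cancel
```

In this toy picture the sign of the summed current indicates which situation dominates, and a sum near zero indicates a balance between them.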
Attention-inspired device principles
Figure 2 illustrates the working mechanisms of the attention-inspired device. The Ag electrode (IN) is connected to the input data stream, and the control gate (CG) continuously adjusts the 0D-2D hetero-dimensional modulation characteristics (Fig. 2a). In writing mode, the input voltage VIN is applied to IN, and the drain is grounded (Fig. 2b). The CG voltage VCG controls the on-state current by buried-gate electrostatic modulation of the MoS2 channel. The 0D contact interface between MoS2 and the Ag filament is formed by applying a positive input voltage VIN, and is ruptured by a negative VIN. Output characteristic curves of the MoS2 channel are illustrated in Fig. 2c. The MoS2 channel transfer curves are shown in Supplementary Fig. 1. The programmable filament state transfer curve of the attention-inspired device is illustrated in Fig. 2d. VIN is continuously scanned from path 1 to 4. The filament is formed when scanning VIN from 0.00 to 3.50 V, and is ruptured for VIN from 0.00 to −1.50 V. Compared to two-terminal filament formations35, filament forming by a semiconductor channel can realize lower activation time and parasitic capacitances, which promotes the stability of filament state transfer processes36. The MoS2/HfO2/Ag structure implements semiconductor-controlled filament state transfer without peripheral transistor circuits, eliminating interconnect circuits between transistors and memory devices. Filament states are stable and remain unchanged within an appropriately large VIN interval (−1.20 V to 2.00 V) and after removing the voltage supply. A high state transfer ratio (10⁹) of the drain current ID, which is much larger than the semiconductor channel hysteresis (Supplementary Fig. 2), enables stable electrostatic modulation in the w/o-filament state, and a moderate shunt current induced by the 0D interface in the w/-filament state.
The shunt current IIN in the w/-filament state is modulated by VCG, and enables the attention-inspired device to have bilateral bound characteristics in the w/-filament state (Fig. 2e). The bilateral bound behavior is necessary for attention distribution computing. When VCG < −3.00 V, IIN < 10 nA and the channel is cut off. For VCG from −3.00 to 3.00 V, IIN increases with VCG. When VCG > 3.00 V, IIN is limited by the 0D interface resistance, and IIN no longer varies with VCG. Experiments on filament state transfer processes and device performance have been carried out to verify stable and repeatable data storage functionalities (Supplementary Note 1). The device stability is related to the degradation of the dielectric film during the repeated filament-forming and rupture processes. Strategies to mitigate the degradation and extend the endurance have been reported37–39. The performance of the device would also be affected by external factors, including protons or moisture that can be incorporated into dielectric films40,41. More discussions on device stability are provided in Supplementary Note 2.
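The three VCG regions of the shunt current described above (cut-off, gate-controlled, and resistive, as named in the Fig. 2e caption) can be summarized as a simple lookup. This is a minimal sketch that only encodes the voltage thresholds quoted in the text; the function name is hypothetical, and assigning the exact boundary values of ±3.00 V to the gate-controlled region is an assumption.

```python
def shunt_current_region(v_cg):
    """Classify a control-gate voltage (in volts) into one of three regions.

    Thresholds follow the values quoted in the text; the handling of the
    exact boundary points is an assumption made for this illustration.
    """
    if v_cg < -3.00:
        return "cut-off"          # shunt current below 10 nA, channel cut off
    if v_cg <= 3.00:
        return "gate-controlled"  # shunt current increases with the gate voltage
    return "resistive"            # shunt current limited by the 0D interface resistance


for v in (-5.0, 0.0, 5.0):
    print(v, shunt_current_region(v))
```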
Fig. 2 Working mechanisms of the attention-inspired device. [Images not available. See PDF.]
a Structure of an attention-inspired device working in writing and computing modes. Writing mode: b Writing mode configuration. c Output characteristics of the MoS2 channel. The drain-to-source voltage VDS = 1.00 V. d Attention-inspired device filament state transfer curve. The voltage scanning path is from 1 to 4. The control gate voltage VCG = 5.00 V. e Filament shunt current modulation characteristics with cut-off, gate-controlled, and resistive regions varying with VCG. Computing mode (attention distribution computing): f IS − VIN transport curves under different VCG from −1.80 to 0.00 V. IS is the source current. Device structure schematics illustrate states of the attention-inspired device and current directions under each voltage configuration. IN and CG electrode voltage configurations are represented by blue (low voltage), red (high voltage), or gray (arbitrary voltage) colors. g IS − VCG attention distribution characteristics. The input voltages VIN− = −0.72 V, VIN+ = −0.36 V. h Spatial and temporal attention varying with VCG, exhibiting dynamic attention adjustment properties. Dashed lines show the exponential fitting. The source voltage VS = −0.20 V in f−h. Computing mode (determination computing): i Bidirectional determination computing curves of the attention-inspired device. j Weight plasticity adjusted by VIN+. The dashed line shows the linear fitting. VS = 0.20 V and VCG = 0.60 V in i−j. VD = 0.00 V and the substrate voltage VB = −1.00 V in f−j.
In computing mode, the attention-inspired device exhibits reconfigurable properties to implement attention distribution and determination computing modes, selected by the source voltage VS and the CG voltage VCG. VCG is tunable for attention adjustment. For attention distribution computing, transport curves varying with VCG in different filament states, which emulate attention-directed perception in the brain, are illustrated in Fig. 2f (the positive direction of the source current IS is from drain to source). In the w/o-filament state (blue lines), current flows from drain to source, and bidirectional gating of VIN and VCG co-modulates IS. VIN controls the electron injection from source to channel, thereby determining the upper bound of IS (dark blue lines in Fig. 2f). VCG forms and modulates the potential barrier of the homojunction between the CG- and IN-controlled regions. When VCG < VIN, the large homojunction barrier dominates and limits the electron carrier density in the channel, so IS increases with VCG. When VCG > VIN, the homojunction barrier is forward biased. With fixed VIN, the electron carrier density is unchanged by VCG, and IS is saturated. The saturation characteristics in the w/o-filament state provide an interval of stable IS. In the w/-filament state (red lines), shunt currents induced by the 0D interface modulate the IS transport behavior. VCG controls the potential of the 0D interface (ϕ0D) by modulating the channel between the drain and the 0D interface, and VIN controls the 0D interface shunt current, which is determined by the difference between ϕ0D and VIN according to Ohm's law. When VCG = −1.80 V, the band of the CG-controlled channel is bent upward, reducing the flow of electron carriers from the 0D interface to the drain. As VIN is reduced to −0.8 V, ϕ0D is decreased, and the potential difference between the 0D interface and the source is reversed by the negative ϕ0D, so a negative IS is exhibited (light red lines in Fig. 2f).
When VIN is increased, the shunt current is reduced and the difference between ϕ0D and VS is decreased, so the absolute value of the negative IS is reduced and eventually reaches zero. With the increase of VCG, the electron carrier flow from the 0D interface to the drain is enhanced by the lowered band of the CG-controlled channel, ϕ0D approaches or exceeds VS, and the IS − VIN transport curve is lifted (from light to dark red lines in Fig. 2f). For VCG > −0.60 V, ϕ0D is close to the drain voltage VD, so the IS curve approximates the transfer curve of the w/o-filament state. The energy band variations with different voltage configurations are shown in Supplementary Fig. 6. The saturation characteristics in the w/o-filament state, bidirectional current modulations, and current approximation effect jointly enable the functionality and stability of attention distribution computing. Figure 2g illustrates the attention distribution computing mechanism, where VIN− and VIN+ represent the input voltage levels of logic 0 and 1, respectively. Transport curves in different states vary with VCG. Spatial attention (αspatial) and temporal attention (αtemporal) are used to quantify the intensities of spatial and temporal information in the output, which are given by:
[Equations (1) and (2) not available. See PDF.]
where IS0 denotes the maximum source current under a given VCG. With the increase of VCG, αspatial is increased, and αtemporal is decreased (Fig. 2h), which enables dynamic adjustment of attention.
For determination computing, filament states store weight information, and the determination signal is mapped from situation values. Drain and source are reversely biased. The attention-inspired device exhibits bidirectional IS responses, which enable bipolar weighted analog computing (Fig. 2i). In w/o-filament state, 2D homojunction is formed by the gating of VIN. For negative VIN (VIN < −0.50 V), the homojunction potential barrier is reversely biased to cut off the channel. For positive VIN (VIN > 0.20 V), the barrier is forwardly biased, and IS is turned on (the blue line in Fig. 2i). In w/-filament state, when VIN < −0.50 V, negative shunt current is imported from 0D interface, and ϕ0D is reduced. A large VCG (0.6 V) bends down the CG-controlled channel and increases electron carrier density from 0D interface to drain, which attenuates the potential reduction effect of ϕ0D induced by shunt current. For VIN < −0.50 V, the homojunction potential barrier is reversely biased, and IS is reduced to the sub-threshold level. When VIN > 1.30 V, positive shunt currents are induced, ϕ0D is larger than VS, and IS is positive (the red line in Fig. 2i). Determination computing is performed by:
[Equation (3) not available. See PDF.]
where Ie denotes the unit current, and w is the weight. w = w− in w/o-filament state and w = w+ in w/-filament state. xIN denotes the input data logic (xIN = 0 for VIN = VIN− and xIN = 1 for VIN = VIN+). When xIN = 0, the attention-inspired device is resting (IS is zero). When xIN = 1, the attention-inspired device exhibits excitatory (IS is positive, w/-filament state) or inhibitory (IS is negative, w/o-filament state) behaviors. The weight plasticity property of the attention-inspired device is shown in Fig. 2j, where the ratio of w+ and w− can be linearly adjusted from 0 to 1 by VIN+. Shunt current characterizations of the attention-inspired device corresponding to the functionalities in computing mode are provided in Supplementary Note 3. The switching between writing and computing modes is repeatable, and the 2D electrostatic modulation is stable after multiple filament state transfer cycles (Supplementary Note 4).
Hardware adaptive spatial-temporal information processing
Functionalities of the attention-inspired device can be applied to adaptive spatial-temporal information processing. Figure 3a demonstrates an adaptive spatial-temporal information processing primitive based on attention-inspired device arrays. The array circuit schematic is shown in Supplementary Fig. 9. Situations are input to the determination network to determine the optimized attention (Fig. 3b). Idet denotes the determination current that linearly maps to the spatial attention. When spatial info. demand is true (T) and temporal info. demand is false (F), Idet is positively large (2.99 μA), spatial attention is higher than temporal attention, and spatial information is focused. When there is temporal but not spatial info. demand, Idet is negatively large (−2.81 μA), temporal attention is higher than spatial attention, and temporal information is focused. When both types of information are demanded or neither is demanded, Idet is close to 0, and both spatial and temporal information are obtained. Idet values in different situations are listed in Supplementary Table 2. A data stream with 5×5 units captures a moving object A and a static object B (Fig. 3c), and is input to the attention-inspired device array. Data of each input frame is shown in Supplementary Fig. 10. A single frame is output from the array, containing adjustable spatial and temporal information under given attention distributions. Figure 3d illustrates the information processing flow. Frames of the data stream are sequentially input and stored. For each data update period, the input data (t2) interacts with the stored data (t1) and the array outputs a frame with enhanced spatial or temporal information; the input data is then stored in the array and interacts with the next input data in the following period. The outputs at t1 (Fig. 3e, h, k), t2 (Fig. 3f, i, l), and t3 (Fig. 3g, j, m) are obtained. When spatial information is focused (spatial attention is 100%, Fig. 3e−g), the locations of objects A and B at each time are shown in the output frames, and temporal information is neglected. When complete information is focused (spatial attention is 40%, Fig. 3h−j), complete spatial and temporal information is extracted. The moving direction of A is detected, and the locations of both A (moving) and B (static) are obtained. When temporal information is focused (spatial attention is 0%, Fig. 3k−m), the motion feature of A is strongly highlighted and the static object B is neglected. Output source data matrices are provided in Supplementary Fig. 11. Current values less than 10 nA are below the noise level. The statistical analysis of the fabricated attention-inspired devices is shown in Supplementary Fig. 12.
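The frame-by-frame flow described above can be mimicked in software to show how a single output frame can carry both kinds of information. The sketch below is a loose analogy and not the device computation: it assumes that spatial content is the current frame, that temporal content is the absolute difference between the current and the stored frame, and that temporal attention equals one minus spatial attention. The function names, the blending rule, and the object positions are all illustrative assumptions.

```python
def blend_frames(previous, current, spatial_attention):
    """Blend spatial and temporal content of two binary frames.

    spatial_attention is in [0, 1]; temporal attention is taken as
    1 - spatial_attention (an assumption made for this illustration).
    """
    if not 0.0 <= spatial_attention <= 1.0:
        raise ValueError("spatial_attention must be between 0 and 1")
    temporal_attention = 1.0 - spatial_attention
    output = []
    for prev_row, cur_row in zip(previous, current):
        row = []
        for p, c in zip(prev_row, cur_row):
            spatial = c            # where an object is in the current frame
            temporal = abs(c - p)  # where the frame changed since the stored one
            row.append(spatial_attention * spatial + temporal_attention * temporal)
        output.append(row)
    return output


def empty_frame(size=5):
    """Return a size x size frame filled with zeros."""
    return [[0] * size for _ in range(size)]


# Hypothetical 5x5 stream: object A moves from (1, 0) to (1, 1), object B stays at (3, 3).
frame_t1 = empty_frame()
frame_t1[1][0] = 1
frame_t1[3][3] = 1
frame_t2 = empty_frame()
frame_t2[1][1] = 1
frame_t2[3][3] = 1

for attention in (1.0, 0.4, 0.0):
    out = blend_frames(frame_t1, frame_t2, attention)
    print(attention, "B:", out[3][3], "A now:", out[1][1], "A before:", out[1][0])
```

With spatial attention at 1.0 the static object B stays visible and the trace of A's previous position vanishes; at 0.0 only the two positions touched by A's motion remain; intermediate values keep both at reduced intensity.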
Fig. 3 Attention-inspired device arrays for hardware adaptive spatial-temporal information processing. [Images not available. See PDF.]
a Optical image of a 2-inch wafer production. Scale bar, 1 cm. Inset: CVD monolayer MoS2 on sapphire. Scale bar, 1 cm. Scanning electron microscopy image of a 5 × 5 attention-inspired device array (Scale bar, 100 μm) and each device unit (Scale bar, 10 μm). b Experimental results of the determination network. Vsit1 represents the situation 1 (spatial info. demand) voltage. Vsit2 represents the situation 2 (temporal info. demand) voltage. T (logic 1) means that the situation is true, and F (logic 0) means that the situation is false, which are represented by Vsit+ = 2.50 V and Vsit− = −1.00 V respectively. c Schematic of the input data stream that captures a moving object A (the arrow marks the moving direction), and a static object B. d Data processing flow of the attention-inspired device array. Each frame is sequentially input and stored in the array. e−g Outputs in t1 (e), t2 (f), and t3 (g) contain spatial information and neglect temporal information. h−j Outputs in t1 (h), t2 (i), and t3 (j) process complete spatial and temporal information. k−m Outputs in t1 (k), t2 (l), and t3 (m) focus on temporal information.
Attention-enhanced high-efficiency edge intelligence
The attention-inspired device can be used for edge intelligence scenarios that require immediate processing with reduced hardware overhead. Attention-enhanced autonomous driving platforms are demonstrated to reveal the adaptability of the attention-inspired device to perform full-range attention distribution and real-time response to dynamic situation variations for ever-changing environments. Figure 4a illustrates the output images of the attention-enhanced infrastructure under a full range of attention distribution from 0 to 100%. When spatial attention is 0%, the background and static objects are neglected. Temporal information of moving vehicles and pedestrians is detected. When spatial attention is 100%, the system exclusively detects spatial information. The output image contains all the objects, and does not extract temporal information. When spatial attention is not 0 or 100%, complete spatial and temporal information is detected by the attention-enhanced infrastructure. With the increase of spatial attention, the system draws more attention to spatial information and less to temporal information. Attention is adjustable in real-time to ensure dynamic adaptation to varying traffic situations. Detailed implementation processes are provided in Supplementary Note 5. Besides, attention-enhanced equipment is compatible with present artificial intelligence algorithms for complex information recognition tasks (Supplementary Note 6).
Fig. 4 Highly adaptive edge equipment and performance projections. [Images not available. See PDF.]
a Illustrations of the attention-enhanced infrastructure. Attention can be continuously adjusted from 0 to 100%, and examples under 6 values of spatial attention are shown. b−d Illustrations of the attention-enhanced vehicle. Attentions are dynamically adjusted in different scenes, including wide street (b), congested street (c), and crossroad (d). e−f Simulated latency (e) and energy (f) projections of the conventional transistor-based circuit and the attention-inspired-device-based architecture for spatial-temporal information processing with attention adjustment from 0 to 100%. g Area, latency, and energy improvements of the attention-inspired device.
An attention-enhanced vehicle is demonstrated with dynamic responses to situation variations (Supplementary Table 5) in different scenes. When the vehicle is driving fast on a wide street, temporal information of adjacent vehicles should be emphasized to avoid collision (Fig. 4b). The attention-enhanced vehicle processes the situations and draws major attention to temporal information. As a result, moving cars and the road lines required for routing are highlighted in the output. When the vehicle is driving on a congested street, the movement of relatively fast vehicles should be monitored to alert the system, and the locations of slow or static vehicles should be detected as well (Fig. 4c). Spatial and temporal information should be captured in an optimized proportion. The vehicle analyzes the situations and outputs both spatial and temporal information, where the fast black car at the lower right side is highlighted and other vehicles with low speeds are located. When the vehicle is at a crossroad, static objects including traffic lights should be detected (Fig. 4d). Responding to the situations, the vehicle directs major attention to spatial information. The red left-turn signal and green straight-through signal are shown in the output. Attention-enhanced equipment has been verified to capture significant information in various scenes, and has the potential to be applied to complicated and ever-changing environments.
For the attention-inspired device, both spatial and temporal information are in situ processed, whereby a large amount of computation resources is saved by the integrated multidimensional information processing functionality. Furthermore, temporal information is sequentially stored in the attention-inspired device, saving the peripheral memory units and data transmission operations. The proposed architecture performance is analyzed and illustrated in Fig. 4e−g. We have compared the proposed architecture to a standard complementary metal-oxide-semiconductor (CMOS) circuit composed of conventional transistors. Verilog-A models of the transistors and schematic circuits have been built, which perform 4-bit attention adjustment of spatial and temporal information with 6.7% attention precision (Supplementary Note 7). The time latencies (Fig. 4e) and energy costs (Fig. 4f) are measured with a variety of attention adjustments from 0% to 100%. The energy costs of peripheral memory units are not included in the total energy consumption of the transistor-based circuit. The attention-inspired device exhibits μs-level time latencies that are more than tenfold lower than the transistor, and maintains pJ-level energy consumption, achieving an energy reduction of three orders of magnitude. The area efficiency, average latency, and average energy are benchmarked (Fig. 4g). The attention-inspired device shows a 190-fold area reduction and a 47-fold latency decrease. Due to the largely reduced number of devices in the circuit, and the highly shortened propagation delay of each operation, 1411-fold energy reduction is achieved.
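One way to read the 6.7% attention precision quoted for the 4-bit reference circuit is as the spacing between adjacent levels when 16 evenly spaced levels cover the range from 0% to 100%. The short sketch below works through that arithmetic; the even spacing and the function name are assumptions made for illustration, and the actual circuit is described in Supplementary Note 7.

```python
def attention_levels(bits):
    """Return evenly spaced attention levels from 0 to 1 for a given bit width."""
    steps = 2 ** bits - 1
    return [k / steps for k in range(steps + 1)]


levels = attention_levels(4)
step_percent = 100 * (levels[1] - levels[0])
print(len(levels), round(step_percent, 1))  # prints: 16 6.7
```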
Discussion
In conclusion, 0D-2D hetero-dimensional modulations perform in situ attention-inspired information processing, and the modulation strength can be continuously adjusted in a large range. The attention-inspired device based on adjustable hetero-dimensional modulations realizes complete information perception, and has been utilized to establish the adaptive spatial-temporal information processing architecture. Attention-inspired device arrays have been used to process a 5 × 5-unit data stream. The optimized attention is adjusted with situation variations in real-time. Experiments of attention distribution from 0 to 100% show a full-range adjustment of spatial-temporal information intensities. The attention-inspired device has been applied to autonomous driving platforms. The attention-enhanced infrastructure and vehicle exhibit dynamic response capability to traffic scene variations. We have benchmarked the performance of the attention-inspired device with 99.5% area, 97.9% latency, and 99.9% energy reductions compared to the transistor-based CMOS circuit. We believe that the proposed attention-inspired architecture can lead to advances in spatial-temporal information perception for edge computing applications.
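The percentage reductions quoted above follow from the fold improvements reported in the Results (190-fold area, 47-fold latency, and 1411-fold energy): an N-fold reduction leaves 1/N of the original value, so the reduction is 1 − 1/N. The helper name below is hypothetical; the conversion is plain arithmetic.

```python
def fold_to_percent_reduction(fold):
    """Convert an N-fold reduction into a percentage reduction."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return 100.0 * (1.0 - 1.0 / fold)


for name, fold in (("area", 190), ("latency", 47), ("energy", 1411)):
    print(f"{name}: {fold_to_percent_reduction(fold):.1f}%")
# prints:
# area: 99.5%
# latency: 97.9%
# energy: 99.9%
```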
Methods
Transfer of monolayer MoS2
2-inch CVD Monolayer MoS2 on sapphire substrate was bought from Shenzhen 6Carbon Technology Co., Ltd. 8 wt% Poly(methyl methacrylate) (PMMA) solution was spin-coated at 1,500 rpm for 60 s on sapphire/MoS2 and baked at 100 °C for 120 s. After that, the MoS2/PMMA stack was lifted off by deionized water and transferred to the target sample. Then, the excess moisture was air-dried at room temperature for more than 12 h, and an annealing process at 60 °C for 2 h was conducted to remove residues and ensure adhesion between MoS2 and the substrate. The PMMA was removed by acetone for 1 h twice.
Device fabrication
The SiO2 (300 nm)/p++ Si substrate was patterned by a normal lithography process using AZ601 as the photoresist and a wet etching process in a buffered oxide etch (BOE) solution, followed by deposition of a 15 nm HfO2 layer by atomic layer deposition (ALD). CG electrodes were fabricated through normal lithography using NR9-1000py as the photoresist followed by e-beam evaporation (EBE) of 1 nm Cr/15 nm Pd, and 15 nm ALD-grown HfO2 was deposited. The CVD monolayer MoS2 was wet-transferred onto the substrate. The MoS2 pattern was defined by normal lithography using AZ601, and was etched by oxygen plasma treatment with O2 at 150 sccm and 150 W power for 7 min. Then, the drain and source electrodes were fabricated with normal lithography using NR9-1000py and EBE deposition of 5 nm Cr/35 nm Pd. 14 nm HfO2 was deposited by ALD to form the interfacial dielectric. Finally, normal lithography with NR9-1000py and an EBE process were conducted to pattern and deposit the 50 nm Ag electrode. ALD processes were carried out at 200 °C.
Device characterization
SEM images of the devices were acquired with a Zeiss GeminiSEM 300 (Carl Zeiss, Germany) using InLens secondary-electron and backscattered-electron imaging. The accelerating voltages were 7 kV (the middle image of Fig. 3a) and 10 kV (Fig. 1b, the right-side image of Fig. 3a). TEM and EDS images were acquired with a Talos F200S G2 S/TEM (Thermo Fisher Scientific) at an accelerating voltage of 200 kV. Electrical characterizations were performed on a probe station (TS2000-HP, MPI) connected to an Agilent B1500A semiconductor parameter analyzer. Pulse measurements were conducted using Keysight B1530A waveform generator/fast measurement units (WGFMUs). Electrical tests were carried out at room temperature in an ambient environment.
Acknowledgements
This work was supported by the National Key R&D Program (2022YFB3204100, 2021YFC3002200), the National Natural Science Foundation (U20A20168, No. 62374099) of China, the Young Scientists Fund of the National Natural Science Foundation of China (No. 62404121), STI 2030—Major Projects under Grant 2022ZD0209200, Beijing Natural Science Foundation-Xiaomi Innovation Joint Fund (L233009) and Beijing Natural Science Foundation (L248104), the Research Fund from Tsinghua University Initiative Scientific Research Program, a grant from the Guoqiang Institute, Tsinghua University, Independent Research Program of School of Integrated Circuits, Tsinghua University, Tsinghua University Fuzhou Data Technology Joint Research Institute, and CIE-Tencent Robotics X Rhino-Bird Focused Research Program.
Author contributions
F.W. and J.P. conceived the idea and the project. T.-L.R., D.Y., Y.Y., and H.T. supervised the project. J.P., F.W., and Y.L. conducted the experiments. J.P., K.Q., and K.J. performed the simulations. Z.W., P.G., and J.Y. were involved in device fabrication and characterization. J.P. and F.W. co-wrote the manuscript with inputs from all the co-authors. All the authors discussed the results and commented on the manuscript.
Peer review
Peer review information
Nature Communications thanks Kah-Wee Ang, Ilia Valov and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Data availability
Image data used for demonstrations is preprocessed from the publicly available dataset42. Other data supporting the key findings of this study are provided in the article and the Supplementary Information file. Raw data of the current study are available from the corresponding authors via email.
Code availability
The codes for plotting the data are available from the corresponding author via email.
Competing interests
The authors declare no competing interests.
Supplementary information
The online version contains supplementary material available at https://doi.org/10.1038/s41467-025-62868-7.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1. Zhong, X., Law, M.-K., Tsui, C.-Y. & Bermak, A. A fully dynamic multi-mode CMOS vision sensor with mixed-signal cooperative motion sensing and object segmentation for adaptive edge computing. IEEE J. Solid-State Circuits 55, 1684–1697 (2020).
2. Serrano-Gotarredona, T. & Linares-Barranco, B. A 128×128 1.5% contrast sensitivity 0.9% FPN 3 µs latency 4 mW asynchronous frame-free dynamic vision sensor using transimpedance preamplifiers. IEEE J. Solid-State Circuits 48, 827–838 (2013).
3. Choo, K. D. et al. Energy-Efficient Low-Noise CMOS Image Sensor with Capacitor Array-Assisted Charge-Injection SAR ADC for Motion-Triggered Low-Power IoT Applications. in Proc. IEEE International Solid-State Circuits Conference (ISSCC) 96–98 (IEEE, 2019).
4. Kandel, E. R. et al. Principles of Neural Science Ed. 4 (McGraw-Hill, 2000).
5. Zhang, W. et al. Neuro-inspired computing chips. Nat. Electron. 3, 371–382 (2020).
6. Duncan, J. An adaptive coding model of neural function in prefrontal cortex. Nat. Rev. Neurosci. 2, 820–829 (2001).
7. Jackson, J., Rich, A. N., Williams, M. A. & Woolgar, A. Feature-selective attention in frontoparietal cortex: multivoxel codes adjust to prioritize task-relevant information. J. Cogn. Neurosci. 29, 310–321 (2017).
8. Corbetta, M. Frontoparietal cortical networks for directing attention and the eye to visual locations: identical, independent, or overlapping neural systems? Proc. Natl Acad. Sci. USA 95, 831–838 (1998).
9. Squire, L. R. et al. Fundamental neuroscience Ch. 23 (Elsevier, 2008).
10. Buschman, T. J. & Kastner, S. From behavior to neural dynamics: an integrated theory of attention. Neuron 88, 127–144 (2015).
11. Scolari, M., Seidl-Rathkopf, K. N. & Kastner, S. Functions of the human frontoparietal attention network: evidence from neuroimaging. Curr. Opin. Behav. Sci. 1, 32–39 (2015).
12. Mennel, L. et al. Ultrafast machine vision with 2D material neural network image sensors. Nature 579, 62–66 (2020).
13. Jayachandran, D. et al. A low-power biomimetic collision detector based on an in-memory molybdenum disulfide photodetector. Nat. Electron. 3, 646–655 (2020).
14. Huang, X. et al. An ultrafast bipolar flash memory for self-activated in-memory computing. Nat. Nanotechnol. 18, 486–492 (2023).
15. Ning, H. et al. An index-free sparse neural network using two-dimensional semiconductor ferroelectric field-effect transistors. Nat. Electron. 8, 222–234 (2025).
16. Ning, H. et al. An in-memory computing architecture based on a duplex two-dimensional material structure for in situ machine learning. Nat. Nanotechnol. 18, 493–500 (2023).
17. Wu, G. et al. Ferroelectric-defined reconfigurable homojunctions for in-memory sensing and computing. Nat. Mater. 22, 1499–1506 (2023).
18. Kang, J. H. et al. Monolithic 3D integration of 2D materials-based electronics towards ultimate edge computing solutions. Nat. Mater. 22, 1470–1477 (2023).
19. Ahmed, F. & Sun, Z. Empowering neuromorphic computing with topological states. J. Semicond. 45, 110401–110402 (2024).
20. Pan, C. et al. Reconfigurable logic and neuromorphic circuits based on electrically tunable two-dimensional homojunctions. Nat. Electron. 3, 383–390 (2020).
21. Liu, K. et al. An optoelectronic synapse based on α-In2Se3 with controllable temporal dynamics for multimode and multiscale reservoir computing. Nat. Electron. 5, 761–773 (2022).
22. Kamaei, S. et al. Ferroelectric gating of two-dimensional semiconductors for the integration of steep-slope logic and neuromorphic devices. Nat. Electron. 6, 658–668 (2023).
23. Chen, Q. et al. Complementary memtransistors for neuromorphic computing: how, what and why. J. Semicond. 45, 061701 (2024).
24. Chen, J. et al. Optoelectronic graded neurons for bioinspired in-sensor motion perception. Nat. Nanotechnol. 18, 882–888 (2023).
25. Tan, H. & van Dijken, S. Dynamic machine vision with retinomorphic photomemristor-reservoir computing. Nat. Commun. 14, 2169 (2023).
26. Huang, M. et al. Plasmon-enhanced optoelectronic graded neurons for dual-waveband image fusion and motion perception. Adv. Mater. 37, e2412993 (2024).
27. Huang, H. et al. In-sensor compressing via programmable optoelectronic sensors based on van der Waals heterostructures for intelligent machine vision. Nat. Commun. 16, 3836 (2025).
28. Zhang, Z. et al. All-in-one two-dimensional retinomorphic hardware device for motion detection and recognition. Nat. Nanotechnol. 17, 27–32 (2022).
29. Zhou, Y. et al. Computational event-driven vision sensors for in-sensor spiking neural networks. Nat. Electron. 6, 870–878 (2023).
30. Zhu, X. et al. High-contrast bidirectional optoelectronic synapses based on 2D molecular crystal heterojunctions for motion detection. Adv. Mater. 35, e2301468 (2023).
31. Dang, Z. et al. Object motion detection enabled by reconfigurable neuromorphic vision sensor under ferroelectric modulation. ACS Nano 18, 27727–27737 (2024).
32. Yang, Y. et al. Electrochemical dynamics of nanoscale metallic inclusions in dielectrics. Nat. Commun. 5, 4232 (2014).
33. Wang, X. F. et al. Two-mode MoS2 filament transistor with extremely low subthreshold swing and record high on/off ratio. ACS Nano 13, 2205–2212 (2019).
34. Park, E. et al. Quasi-zero-dimensional source/drain contact for Fermi-level unpinning in a tungsten diselenide (WSe2) transistor: approaching Schottky-Mott limit. ACS Nano 18, 29771–29778 (2024).
35. Du, X. et al. Memristive feature and mechanism induced by laser-doping in defect-free 2D semiconductor materials. J. Semicond. 45, 072701 (2024).
36. Zhu, K. et al. Hybrid 2D-CMOS microchips for memristive applications. Nature 618, 57–62 (2023).
37. Yan, X. et al. Robust Ag/ZrO2/WS2/Pt memristor for neuromorphic computing. ACS Appl. Mater. Interfaces 11, 48029–48038 (2019).
38. Choi, S. et al. SiGe epitaxial memory for neuromorphic computing with reproducible high performance based on engineered dislocations. Nat. Mater. 17, 335–340 (2018).
39. Kim, J., Kwon, O., Seo, J. & Hwang, H. Vertical-switching conductive bridge random access memory with adjustable tunnel gap and improved switching uniformity using 2D electron gas. Adv. Electron. Mater. 11, 2500650 (2024).
40. Valov, I. & Tsuruoka, T. Effects of moisture and redox reactions in VCM and ECM resistive switching memories. J. Phys. D Appl. Phys. 51, 413001 (2018).
41. Tsuruoka, T. et al. Effects of moisture on the switching characteristics of oxide-based, gapless-type atomic switches. Adv. Funct. Mater. 22, 70–77 (2011).
42. Institute for AI Industry Research (AIR), Tsinghua University. Vehicle-Infrastructure Collaborative Autonomous Driving: DAIR-V2X Dataset (AIR, 2021).
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.