Highlights
A comprehensive review of recent advances in advanced and artificial intelligence (AI) chips is presented.
Design strategies for advanced and AI chips can be pursued from diverse aspects, including materials, circuits, architecture, and packaging techniques, with the aim of achieving multimodal data processing, robust reconfigurability, high energy efficiency, and enhanced computing power.
A broad outlook on future considerations for advanced chips is put forward.
Recent years have witnessed transformative changes brought about by artificial intelligence (AI) techniques whose models contain billions of parameters to achieve high accuracy, placing high demands on advanced and AI chips to execute these AI tasks efficiently and powerfully. Rapid progress has recently been made in the field of advanced chips, such as the development of photonic computing, the advancement of quantum processors, and the boost of biomimetic chips. Design strategies for advanced chips can be pursued through elaborate consideration of materials, algorithms, models, architectures, and so on. Although a few reviews have presented the development of chips from their own particular perspectives, reviews focusing on the latest designs of advanced and AI chips are few. Here, the newest developments in the field of advanced chips are systematically reviewed. First, the background and mechanisms are summarized, and subsequently the most important considerations for the co-design of software and hardware are illustrated. Next, strategies for obtaining advanced and AI chips with excellent performance are summarized by taking the important information processing steps into consideration, after which design ideas for future advanced chips are proposed. Finally, some perspectives are put forward.
Introduction
The past decade has witnessed rapid progress in artificial intelligence (AI) techniques, which have revolutionized a wide range of fields, including the way information is interpreted, the approach to discovering new materials, the methods of creative work, and so on [1, 2, 3, 4, 5, 6, 7, 8–9]. In particular, great progress has been made in functional materials and novel devices [10, 11–12], which calls for AI to further promote these fields. AI models contain billions of parameters to achieve high accuracy, which places high demands on the energy efficiency of processors. For instance, deep neural network (DNN) models containing many parameters have greatly promoted image recognition [13], video classification, speech transcription [14, 15], and so on. Specifically, transformer and recurrent neural network transducer (RNNT) models with up to one billion parameters have been shown to remarkably decrease the word error rate (WER) for the automated transcription of spoken English-language sentences. In addition to transcription, deep learning (DL) has also remarkably enhanced the performance of computer vision, which has been widely applied in autonomous driving [16], intelligent robotics [17], smart wearable devices [18, 19], and so on. Accordingly, new challenges have arisen for the chips that handle these AI tasks. Advanced chips, which feature improved computing efficiency, reduced energy consumption, enhanced reliability, and flexible scalability so that they can deal with the massive data, parallel tasks, and highly concurrent requests posed by AI tasks, have drawn great attention. Significant progress in advanced chips has been made not only by improving current silicon materials and technologies, but also by developing novel materials and modes [20].
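Because WER is the metric behind the transcription results above, a minimal sketch of how it is conventionally computed may help: the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` is illustrative only and is not taken from the cited works.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of row i, prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A lower WER means fewer word-level errors per reference word; note that WER can exceed 1 when the hypothesis contains many insertions.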
For instance, data center chips, which are specifically designed for data centers, feature high performance and energy efficiency and are therefore applied to cloud computing, AI training and inference, and big data analysis. Edge computing chips, which mainly focus on low latency, low power consumption, and miniaturization, have advantages for tasks that require real-time processing and environmental adaptability. The design of advanced chips, which refers to the process of transforming circuit structures and functions into physical layouts for high-performance computing applications, covers wide aspects, including but not limited to materials selection, device and circuit design, architecture optimization, and packaging technique development; it is therefore of great importance for the rapid progress of this field.
Many endeavors have been made to meet the challenges posed by AI tasks, and many achievements and techniques have emerged as the most promising approaches to address these issues [21, 22, 23, 24, 25–26]. For example, photonic computing makes it possible to process data faster and more energy efficiently [27]. Meanwhile, AI can also be used to improve the design and control of optical systems [28, 29, 30, 31, 32–33]. Both model training and inference capability have been taken into consideration, with the large-scale photonic chiplet and fully forward mode training being put forward. Computing-in-memory (CIM), which is inspired by the way the human brain processes information, has been put forward to resolve the von Neumann bottleneck [34]. Not only various synaptic arrays but also efficient neuronal devices have been developed. The advanced cognitive capabilities of the human brain have fueled a significant amount of AI research, promoting the development of sophisticated brain-inspired algorithms as well as neuromorphic hardware that aims to simulate various aspects of neural processing. Efforts have been made to develop efficient neuronal electronics; for instance, a novel neuron with dendrite-like functions has been developed [34]. Biocomputing, an interdisciplinary field that combines biology and computer technology and uses units other than electrons or photons for information processing, has also emerged to address the existing issues. In addition to novel materials and new modes, improvements have also been made to conventional silicon-based chips, and more advanced fabrication and packaging technologies have been proposed to deal with the increasing system complexity.
Significant progress has recently been made in both the hardware and the software of advanced chips, which favors chip fabrication. The fabrication of a chip can be likened to the construction of a building. The fabricated chips can then be applied to handle various kinds of information to realize complex AI tasks, including computer vision, speech recognition and transcription, parallel imaging and all-optical classification, and classification of patients’ gaits, in fields such as the Internet of Things (IoT), smart travel, smart robots, and smart homes (Fig. 1). An analog-AI chip with 35 million phase-change memory (PCM) devices has been developed [1]. An all-analog photoelectronic chip has achieved a systemic energy efficiency of 74.8 peta-operations per second per watt [27]. Beyond inference chips, fully forward mode (FFM) learning has been proposed for training optical neural networks, which can carry out the compute-intensive training process on the physical system itself [35]. A fully hardware implementation of CIM has been experimentally realized by integrating neuron devices, with low accuracy loss [34]. Neuromorphic hardware equipped with associative learning abilities has been fabricated [36]. A neuromorphic system on chip in which no input consumes no energy has achieved a processor resting power as low as 0.42 mW, and a real-time power as low as 0.70 mW can be realized for this system through the co-design of algorithm, software, and hardware [37]. The large-scale photonic chiplet Taichi, which offers millions-of-neurons capability with an energy efficiency of 160 tera-operations per second per watt (TOPS/W), has been put forward. It has been verified that this photonic chiplet can realize high-fidelity AI-generated content with up to two orders of magnitude improvement in efficiency [38].
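To put the quoted efficiency figures in physical terms, an efficiency in operations per second per watt can be inverted to give the energy per operation, since 1 W = 1 J/s. The short sketch below performs this conversion for the two photonic figures quoted above. It is illustrative arithmetic only; because the two numbers come from different systems and measurement conditions, a direct comparison should be made with caution.

```python
def energy_per_operation(ops_per_second_per_watt: float) -> float:
    """Energy per operation in joules: 1 W = 1 J/s, so 1 / (ops/s/W) gives J/op."""
    if ops_per_second_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return 1.0 / ops_per_second_per_watt


# Efficiency figures as quoted in the text (different systems and conditions)
figures = {
    "all-analog photoelectronic chip, 74.8 POPS/W": 74.8e15,
    "Taichi photonic chiplet, 160 TOPS/W": 160e12,
}
for name, efficiency in figures.items():
    joules = energy_per_operation(efficiency)
    print(f"{name}: {joules:.3e} J/op ({joules * 1e15:.4f} fJ/op)")
```

For example, 160 TOPS/W corresponds to 6.25 fJ per operation, while 74.8 POPS/W corresponds to roughly 13.4 aJ per operation.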
The number of publications and the citation frequency of papers concerning AI chips were counted using Web of Science. The data were collected with “AI chip” or “advanced chip” as topic words and were also filtered according to their actual relevance to the topic. As shown in Fig. 2, an increasing number of original works with high impact and sharply increasing citation frequency have been published. These results show that research on advanced chips has drawn great attention. Design strategies have been launched from various aspects, including materials, devices, circuits, architecture, and packaging techniques, in pursuit of multimodal data processing, reconfigurability, enhanced computing power, and high energy efficiency (Fig. 3). For instance, multimodal data processing requires handling different types of data, such as images, sounds, and texts; proper packaging technology can integrate different processing units more closely to enhance processing speed while reducing latency. Besides, reconfigurable architectures, which allow the hardware structure to be reconfigured for different tasks, also contribute to multimodal data processing by adapting to different algorithms. However, reviews from the perspective of recent design strategies for AI chips are few. Herein, this review focuses on the advanced design of high-performance chips, achieved not only by improving current silicon materials and technologies, but also by developing novel materials and modes, such as photonic computing and quantum processors, many of which can meet the challenges posed by rapidly developing AI technology.
Fig. 1
Overview of advanced and AI chips. The design of software and hardware favors the fabrication of the chips, which bears some analogy to the construction of buildings. The fabricated chips can then be applied to handle various information to realize complex AI tasks
Fig. 2
Publications and citation frequency of papers concerning AI chips. The data were collected from Web of Science with “AI chip” or “advanced chip” as topic words and were also filtered according to their actual relevance to the topic
Fig. 3
Design strategies for advanced chips. Design strategies carried out for a material/device, reproduced with permission from Ref. [36] Copyright 2024, Springer, b circuit, reproduced with permission from Ref. [39] Copyright 2024, Wiley–VCH GmbH, c architecture, reproduced with permission from Ref. [27] Copyright 2023, Nature, and d packaging technique, reproduced with permission from Ref. [40] Copyright 2024, Nature. The design objectives of realizing e multimodal data processing, reproduced with permission from Ref. [41] Copyright 2024, Nature, f reconfigurability, reproduced with permission from Ref. [42] Copyright 2023, Wiley–VCH GmbH, g high energy efficiency, reproduced with permission from Ref. [1] Copyright 2023, Nature, and h enhanced computing power, reproduced with permission from Ref. [43] Copyright 2024, Nature
In this review, the basic background and working mechanisms of AI chips are introduced first. Design ideas regarding software and hardware are then presented, covering both technique development for conventional silicon-based chips and the adoption of novel modes that extend information processing from electrons to photons, quantum systems, and biological elements. Key factors to be considered when designing advanced chips are discussed from the perspective of the information processing procedures. Last but not least, we put forward some ideas regarding the outlook for advanced chips.
Mechanisms
Chips are applied to deal with various kinds of information and data; for instance, data can be collected from multimodal sensors. In a typical task, the information is first captured by the sensors and then digitized by a large number of analog-to-digital converters (ADCs) [27] (Fig. 4a). The data are then stored and transmitted (Fig. 4b, c). A neural network (NN) on a digital processing unit can then be used to process the information for recognition, classification, and other purposes. Edge computing can implement data processing at the sensors. In particular, in a sensing-computing system on chip (SoC), the sensors can be integrated onto the chip to provide the information to be processed. For example, by leveraging a dynamic vision sensor (DVS) as the eye of the chip, an asynchronous chip can be designed [44, 45–46]. As the brightness of the scene changes, the DVS generates a stream of events asynchronously and sparsely, which can then be processed by the processor in the chip. However, because sensors are diverse, not all of them are solid state, and therefore some are not suitable for integration with computing units. In addition to sensing-computing systems, there is also a high demand for large language model (LLM) acceleration, and therefore how to provide strong computing power support should be taken into consideration.
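The event-generation behavior of a DVS described above can be illustrated with a toy model. The sketch assumes the commonly used idealization that a pixel emits an ON (+1) or OFF (-1) event when its log intensity has changed by more than a contrast threshold since that pixel's last event; sampled frames stand in for continuous brightness. The threshold value and the one-event-per-sample simplification are assumptions for illustration, not parameters of the chips in Refs. [44, 45–46].

```python
import math
from typing import List, Tuple

Event = Tuple[int, int, int]  # (timestep, pixel index, polarity: +1 ON / -1 OFF)


def dvs_events(frames: List[List[float]], threshold: float = 0.2) -> List[Event]:
    """Toy DVS: emit an event where log intensity moved by >= threshold.

    frames: positive pixel intensities per timestep (hypothetical input format).
    A real DVS pixel is asynchronous and may emit several events for a large
    change; this sketch emits at most one event per pixel per sample.
    """
    if not frames:
        return []
    # Per-pixel reference log intensity, updated only when that pixel fires
    ref = [math.log(v) for v in frames[0]]
    events: List[Event] = []
    for t, frame in enumerate(frames[1:], start=1):
        for p, v in enumerate(frame):
            delta = math.log(v) - ref[p]
            if abs(delta) >= threshold:
                events.append((t, p, 1 if delta > 0 else -1))
                ref[p] = math.log(v)  # reset the reference at this pixel only
    return events


frames = [
    [1.0, 1.0, 1.0],
    [1.0, 1.5, 1.0],  # pixel 1 brightens
    [1.0, 1.5, 1.0],  # static scene: no events, so nothing to process
    [1.0, 1.0, 0.5],  # pixel 1 dims back, pixel 2 dims
]
print(dvs_events(frames))
```

The output is sparse: a static scene produces no events, which is what allows the downstream asynchronous processor to stay idle.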
Fig. 4
Schematic illustration of the working mechanism of advanced chips. Schematic illustration of the stages of a sensing, reproduced with permission from Ref. [41] Copyright 2024, Nature, b memorizing, reproduced with permission from Ref. [36] Copyright 2024, Springer, c transmitting, reproduced with permission from Ref. [59] Copyright 2022, Nature, d computing, reproduced with permission from Ref. [27] Copyright 2023, Nature, and e task implementation, reproduced with permission from Ref. [39] Copyright 2024, Wiley–VCH GmbH. Schematic illustration of methods to improve the performance of chips by f borrowing high-level brain dynamic mechanisms, reproduced with permission from Ref. [37] Copyright 2024, Nature, g adopting a bionic design method, reproduced with permission from Ref. [36] Copyright 2024, Springer, and h applying novel modes, reproduced with permission from Ref. [43] Copyright 2024, Nature
Neuromorphic hardware, which learns from the way the human brain processes information, is a promising candidate for next-generation computer architectures because of its massive parallelism, robust fault tolerance, and high efficiency, which distinguish it from conventional architectures. Neuromorphic computing systems make parallel processing possible, enabling separate complex tasks to be executed on several processors simultaneously and thereby enhancing processing efficiency [39, 47, 48, 49–50]. Moreover, neuromorphic systems are also expected to process integrated signals from various inputs. The development of materials has greatly promoted the realization of these functions. Electrochemical artificial synapses can process multi-input signals simultaneously within a single device. The working mechanism of electrochemical artificial synapses, which are composed of an electrolyte-based dielectric and an ion-permeable semiconducting layer, originates from the tuning of channel resistance by penetrating ions and from the retentive relaxation property.
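A phenomenological toy model can make the two ingredients of this mechanism concrete: pulses on any of several input terminals drive ions into the channel and raise its conductance, and each increment partly relaxes back while a fraction is retained. All parameters below (`delta_g`, `tau`, `retention`) are hypothetical, and the model is a sketch rather than a description of the device physics of any cited synapse.

```python
import math
from typing import List


def synapse_conductance(pulse_trains: List[List[int]], g0: float = 1.0,
                        delta_g: float = 0.1, tau: float = 5.0,
                        retention: float = 0.2, dt: float = 1.0) -> List[float]:
    """Toy multi-input electrochemical synapse (hypothetical parameters).

    pulse_trains: one 0/1 sequence per input terminal, all of equal length.
    Each pulse adds delta_g of conductance; a fraction `retention` of it is
    retained (non-volatile) and the rest relaxes with time constant tau.
    """
    n_steps = len(pulse_trains[0]) if pulse_trains else 0
    decay = math.exp(-dt / tau)
    g_volatile, g_retained = 0.0, 0.0
    trace = []
    for t in range(n_steps):
        g_volatile *= decay  # relaxation of the ion-induced increment
        pulses = sum(train[t] for train in pulse_trains)  # inputs summed in one device
        g_volatile += (1.0 - retention) * delta_g * pulses
        g_retained += retention * delta_g * pulses
        trace.append(g0 + g_volatile + g_retained)
    return trace


# Two terminals pulsing together produce a larger change than one alone
print(synapse_conductance([[1, 0, 0], [0, 0, 0]]))
print(synapse_conductance([[1, 0, 0], [1, 0, 0]]))
```

The single device integrates both terminals, and after a long silence only the retained part of the change remains.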
Chips are expected to process information the way the human brain does, including learning, reasoning, and memorizing [39]. Notably, the human brain runs highly complex neural networks with a total power requirement of only 20 W [51, 52]. A variety of behaviors of biological synapses, which are responsible for information transmission between biological neurons, are simulated by artificial neuromorphic electronics to handle the information collected by sensors. Inspiration for the design of neuromorphic chips is also expected to come from high-level brain dynamic mechanisms [53]. An important feature of the human brain is that it allocates its resources dynamically according to demand. Specifically, salient stimuli receive greater attention, which is manifested by heightened spiking activity in the brain regions or neurons associated with the stimulus. Neuromorphic chips are expected to learn from this high-level dynamic computing nature of the human brain, consuming minimal energy in the absence of input and varying significantly in consumption as the input changes. From the perspective of functional materials, potential candidates such as two-terminal memristors, which feature compact synapse-like structures, have been extensively explored to equip electronics with the high complexity and completeness of biological neurons for information transmission and processing [54, 55].
High-capacity and high-throughput computing architectures are then required to handle the complex multimodal information collected from the environment [56] (Fig. 4d), and finally the chips can be applied to implement various tasks (Fig. 4e). Great endeavors have been made to enhance these processes and improve the overall performance of the whole system through a series of approaches, including borrowing high-level brain dynamic mechanisms (Fig. 4f), adopting bionic design (Fig. 4g), and applying novel modes (Fig. 4h). Photonic processors are proposed to be key to hardware-based AI accelerators [23, 57, 58]. To realize in-memory photonic convolutional processing without data movement between the memory and the photonic processors, a photonic tensor core incorporating phase-change-material photonic memories has been used. Generally, the data carried by each input coherent light at a different wavelength are weighted by the phase-change-material photonic memories. As a result, the chips can accomplish various tasks, ranging from computer vision and speech recognition to gait classification, which qualifies them for a diversity of fields, including IoT, smart homes, and intelligent robotics.
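The weighting scheme described above can be sketched numerically: each wavelength channel carries one input value as optical power, a phase-change-material cell attenuates that channel by its programmed transmission, and a photodetector sums the transmitted powers into one multiply–accumulate result. The sketch assumes non-negative weights, ideal summation of powers across wavelengths at the detector, and a hypothetical linear mapping from 16 programming levels to transmission; real photonic tensor cores differ in these details.

```python
from typing import List, Sequence


def pcm_transmission(level: int, n_levels: int = 16) -> float:
    """Hypothetical linear map from a PCM programming level to transmission in [0, 1]."""
    if not 0 <= level < n_levels:
        raise ValueError("level out of range")
    return level / (n_levels - 1)


def photonic_mac(powers: Sequence[float], transmissions: Sequence[float],
                 responsivity: float = 1.0) -> float:
    """One wavelength per input: power P_k is attenuated by T_k, and the
    photodetector output is responsivity * sum_k(T_k * P_k)."""
    if len(powers) != len(transmissions):
        raise ValueError("one transmission per wavelength channel is required")
    return responsivity * sum(p * t for p, t in zip(powers, transmissions))


def photonic_conv1d(signal: Sequence[float], kernel_levels: Sequence[int],
                    n_levels: int = 16) -> List[float]:
    """Slide a PCM-stored kernel over the signal; the weights stay in memory."""
    kernel = [pcm_transmission(level, n_levels) for level in kernel_levels]
    k = len(kernel)
    return [photonic_mac(signal[i:i + k], kernel) for i in range(len(signal) - k + 1)]


print(photonic_conv1d([1.0, 2.0, 3.0, 4.0], [15, 0, 15]))
```

Because the kernel lives in the PCM cells, only the input data move through the processor, which is the "free of data movement" property the text refers to.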
Co-design of the Software and Hardware
AI relies on hardware and software to simulate human intelligence, so it is critical to co-design the software and the hardware of advanced and AI chips. Specifically, software programming is important for constructing and training NNs, while hardware is crucial for processing and handling the data for AI operations [60, 61–62]. For example, although a highly programmable accelerator architecture for analog AI has been proposed, it has yet to be demonstrated in hardware because the simulation study contains several design assumptions: one is a dense and efficient circuit-switched 2D mesh for exchanging massively parallel vectors of neuron-activation data over short distances, and another is the successful realization of DNN models that are large enough to be relevant for commercial applications while maintaining high accuracy [1]. These issues must therefore be solved in the design and fabrication of analog-AI chips. Another case in point is the design of CIM-based hardware systems in accordance with the requirements of AI algorithms, so as to implement extensive AI tasks successfully and promote the commercial production of CIM-based chips [34]. In this case, elaborate designs of both optimized algorithms and innovative hardware are essential for neuromorphic computing systems. Besides, an algorithm-software-hardware co-design has also been put forward to realize spike-based dynamic computing in a neuromorphic chip, in which the hardware consumes no running energy when there is no input and a complete software toolchain enables efficient deployment of algorithms in a variety of dynamic vision applications [37].
Software
Some General Principles for Software Design
AI algorithms have evolved rapidly. The intricate cognitive capabilities of the human brain have sparked extensive AI research and promoted sophisticated brain-inspired algorithms. It is worth mentioning that device-algorithm co-optimization needs to be carried out for real-world applications. In particular, a software toolchain that includes data management, model simulation, and host management is beneficial for deploying algorithms and models efficiently in various applications [37]. Moreover, the software-level challenges and solutions differ from chip to chip, and software design, spanning models, algorithm adaptation, and toolchains, is important for all of these technologies. For instance, memristor-based chips must accommodate the integrated memory-and-computing architecture, whereas photonic computing requires optical path programming.
To Collaborate with the Hardware
Software design plays a crucial role in realizing the advantages of advanced chips by working together with the hardware [37]. For instance, efforts were made to combine the high-level dynamic computing nature of the brain with machine intelligence to give neuromorphic computing an energy advantage. The hardware was developed to meet the demands of dynamic computing, such that no input consumed no energy. Meanwhile, an attention-based framework was designed to meet the other challenge of dynamic computing, namely that varied inputs consume energy with large variance. To accomplish this goal, inspiration for designing dynamic spiking neural networks (SNNs) was drawn from the understanding of visual attention in neuroscience. Specifically, since attention is a limited resource, the brain selectively processes only part of its sensory input. The neural substrates of attention can be divided into four structural levels, namely the circuit, area, neuron, and synaptic levels, and a general classification of attention neural circuits is the top-down versus bottom-up dichotomy (Fig. 5a). Top-down attention is allocated according to the internal behavioral goals of the brain, which can be represented by a priority map, while bottom-up attention is deployed according to the physical salience of a stimulus. For the design of the neuromorphic computing framework, a typical spiking neuron model and attention-based dynamic SNNs are illustrated in Fig. 5b, c. It is worth mentioning that the dynamic framework acted as plug-and-play attention modules that optimized the membrane potential in a data-dependent manner, and it provided combinable refinement and masking strategies. A real-time power as low as 0.70 mW was achieved by this neuromorphic system.
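A minimal sketch of the idea combines a leaky integrate-and-fire (LIF) neuron, a standard spiking neuron model, with a crude bottom-up masking step. Here, inputs weaker than a fraction `mask_ratio` of the strongest input at each timestep are masked and skipped, and the count of processed inputs serves as a rough energy proxy, so a silent input stream costs zero operations. The masking rule, the parameters, and the energy proxy are assumptions for illustration and are not the attention modules of Ref. [37].

```python
from typing import List, Tuple


def lif_with_masking(inputs: List[List[float]], beta: float = 0.9,
                     v_th: float = 1.0, mask_ratio: float = 0.5
                     ) -> Tuple[List[List[int]], int]:
    """LIF layer with a toy salience mask (illustrative assumptions).

    inputs[t][i] is the input current to neuron i at timestep t.
    Returns the spike trains and the number of processed (unmasked) inputs.
    """
    n = len(inputs[0]) if inputs else 0
    v = [0.0] * n  # membrane potentials
    spikes: List[List[int]] = []
    ops = 0
    for x in inputs:
        peak = max((abs(val) for val in x), default=0.0)
        s_t = []
        for i, val in enumerate(x):
            # Masking: only salient inputs are processed; silence costs nothing
            if peak > 0 and abs(val) >= mask_ratio * peak:
                ops += 1
                current = val
            else:
                current = 0.0
            v[i] = beta * v[i] + current  # leaky integration
            if v[i] >= v_th:
                s_t.append(1)
                v[i] = 0.0  # hard reset after a spike
            else:
                s_t.append(0)
        spikes.append(s_t)
    return spikes, ops


print(lif_with_masking([[1.2, 0.1], [0.0, 0.0], [0.6, 0.1]]))
```

The weak input to the second neuron is never processed, and the silent middle timestep adds no operations, mirroring the "no input, no energy" goal in a very simplified way.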
Fig. 5
Schematic of how software designs facilitate the development of advanced chips. a Schematic diagram of the attention-based dynamic response in neuroscience. Illustration of b a typical spiking neuron model and c attention-based dynamic SNNs. a–c Reproduced with permission from Ref. [37] Copyright 2024, Nature. d Schematic diagram of the optical neural network model for multimodal classification. e Schematic diagram of the drop-out algorithm. d–e Reproduced with permission from Ref. [41] Copyright 2024, Nature
To Conduct the Design of Algorithms
Some challenges brought by the explosive growth of AI can be met through algorithm design, such as the need to handle multiple types of data that accompanies the booming development of AI-generated content (AIGC) [63, 64, 65–66]. For example, it was pointed out that most photonic neuromorphic processors for DL could handle only a single data modality because they lacked abundant trainable parameters in the optical domain. To address this issue, a trainable diffractive optical neural network (TDONN) chip was developed. In particular, the optical neural network model designed for multimodal classification tasks consisted of three parts: an input layer, five hidden layers, and an output layer (Fig. 5d). After feature extraction and feature fusion, a feature vector was obtained from the datasets of different modalities and then used as the input of the NN, with the size of the feature vector matching the number of neurons. Each vector element was encoded into an optical signal by intensity modulation. In the hidden layers, the neurons were arranged in a multi-layer layout. The connection weights between neurons were adjusted during training; trainable neurons were therefore deemed a critical prerequisite for a reconfigurable TDONN, since strong reconfigurability is essential for multimodal DL. Training of the TDONN chip took two steps: the first extracted the features, and the second trained the tunable diffractive units to accomplish the target tasks. It is worth mentioning that a customized gradient descent algorithm and a drop-out mechanism for optical neurons were designed to realize this function. First, an iteration threshold T_iter was set for each neuron in the hidden layer of the TDONN.
During the iteration process, if a neuron could not increase CF after T_iter adjustments, it was deactivated, and this inactivated neuron was not adjusted in the following iterations. As training progressed, the number of deactivated neurons increased and only the activated neurons needed to be tuned, reducing the workload (Fig. 5e).
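The neuron drop-out rule described above can be sketched as follows. Since the customized gradient descent of Ref. [41] is not detailed here, the sketch substitutes a simple greedy random-perturbation search that maximizes a generic objective standing in for CF. Only the deactivation rule mirrors the text; reading "after T_iter adjustments" as T_iter consecutive failed adjustments is an assumption, and the target values are hypothetical.

```python
import random
from typing import Callable, List, Set, Tuple


def train_with_neuron_dropout(
    n_neurons: int,
    objective: Callable[[List[float]], float],
    t_iter: int = 5,
    step: float = 0.1,
    max_rounds: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float, int, Set[int]]:
    """Greedy perturbation search with per-neuron deactivation (illustrative).

    objective: score to MAXIMIZE (a stand-in for CF).
    A neuron whose adjustments fail to raise the score t_iter times in a row
    is deactivated and never adjusted again.
    Returns (parameters, best score, number of adjustments, still-active neurons).
    """
    rng = random.Random(seed)
    params = [0.0] * n_neurons
    best = objective(params)
    failures = [0] * n_neurons
    active = set(range(n_neurons))
    adjustments = 0
    for _ in range(max_rounds):
        if not active:
            break  # every neuron has been dropped: no tuning workload remains
        for i in sorted(active):  # sorted() copies, so discarding below is safe
            trial = params.copy()
            trial[i] += rng.choice((-step, step))
            adjustments += 1
            score = objective(trial)
            if score > best:
                params, best = trial, score
                failures[i] = 0
            else:
                failures[i] += 1
                if failures[i] >= t_iter:
                    active.discard(i)  # deactivate this neuron
    return params, best, adjustments, active


target = [0.3, -0.2, 0.5]  # hypothetical optimum for the toy objective


def score(p: List[float]) -> float:
    return -sum((a - b) ** 2 for a, b in zip(p, target))


params, best, adjustments, active = train_with_neuron_dropout(3, score)
print(params, best, adjustments, active)
```

As neurons are frozen, each round touches fewer of them, which is the workload reduction the drop-out mechanism is meant to deliver.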
Hardware
Some General Principles for Hardware Design
Hardware design is imperative for promoting the development of different types of chips, because it can solve the problems specific to each type and allow these chips to be fully exploited in various fields. Specifically, memristors, which can simulate the plasticity of biological synapses, play a critical role in brain-inspired computing. Photonic computing features ultrahigh speed but suffers from poor compatibility with silicon-based electronic chips. The computing power of quantum computing for specific problems far exceeds that of classical computers, but its requirement for extremely low temperatures is usually a challenge. Neuromorphic computing mimics the structure of the human brain and can realize event-driven computing by means of asynchronous SNNs, which makes it well suited to real-time perception and IoT. Accordingly, new circuit layouts or material structures are designed to meet these challenges.
To Develop the Materials
The development of materials serves as one of the most important supports for the thriving chip industry. For instance, CIM-based hardware systems are designed according to the requirements of AI algorithms to accelerate extensive computations by eliminating frequent data transfers between memory and processing units [67, 68–69]. Accordingly, many endeavors have been made to develop emerging non-volatile memories (eNVMs), such as PCM, RRAM, and ferroelectric field effect transistors (FeFETs), for storing the weights of neural networks. Besides, more advanced functions are expected to be realized with high-efficiency algorithms while maintaining low hardware costs and high flexibility across different application scenarios. In hardware design, a series of factors, such as stability, uniformity, and feasibility of large-scale realization, should be taken into consideration. Accordingly, efforts have been made not only to adopt novel modes, such as neuromorphic, photonic, and quantum computing, but also to improve existing silicon chips, for example by developing packaging techniques.
To Exploit a New Mode: Neuromorphic Computing
Much effort has been made to map the biological behavior of the nervous system onto the electrical behavior of various devices, and many techniques have emerged as the most promising approaches to meet the challenges brought by AI tasks. Excessive energy consumption occurs when a significant amount of data moves between memory and processor, which is known as the von Neumann bottleneck [1]. CIM is proposed as a promising approach to meet the increasing computational demands of rapidly booming AI [34]. For DNN models containing many large fully connected (FC) layers for natural language processing (NLP), conventional digital implementations require enormous data movement, with little amortization over subsequent computation. Analog-AI hardware meets this challenge by leveraging arrays of non-volatile memory (NVM) to perform multiply–accumulate (MAC) operations, so that these workloads can be handled directly in memory [70, 71, 72–73]. When neuron-excitation data are moved to the location of the weight data, where the computation is executed, both time and energy can be reduced. Given the finite endurance and power-hungry programming of NVM devices, such analog-AI systems must be fully weight stationary. A highly heterogeneous and programmable accelerator architecture for analog AI has been proposed with energy efficiencies 40–140 times higher than those of cutting-edge graphics processing units, but it has yet to be demonstrated in hardware because it relies on several design assumptions [74].
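The in-memory MAC principle can be sketched as a toy weight-stationary crossbar: weights are programmed once into pairs of quantized conductances (G+ and G-, to represent signed values), input activations are applied as voltages on the rows, and by Ohm's and Kirchhoff's laws each column current equals the sum over rows of conductance times voltage. The number of conductance levels, the optional Gaussian write noise, and the differential-pair mapping are illustrative assumptions rather than the scheme of any cited chip.

```python
import random
from typing import List


class AnalogCrossbar:
    """Toy weight-stationary NVM crossbar (illustrative, not a device model).

    weights[i][j] connects input row i to output column j. Each weight is
    programmed once into a differential pair (G+, G-) of quantized conductances.
    """

    def __init__(self, weights: List[List[float]], g_max: float = 1.0,
                 levels: int = 32, write_noise: float = 0.0, seed: int = 0):
        rng = random.Random(seed)
        w_max = max(abs(w) for row in weights for w in row) or 1.0
        self.scale = w_max / g_max  # weight units per unit conductance
        step = g_max / (levels - 1)

        def program(target: float) -> float:
            g = round(target / step) * step   # nearest available conductance level
            g += rng.gauss(0.0, write_noise)  # programming (write) noise
            return min(max(g, 0.0), g_max)    # physical conductance bounds

        # Programmed once; afterwards only the inputs move (weight stationary)
        self.g_pos = [[program(max(w, 0.0) / self.scale) for w in row] for row in weights]
        self.g_neg = [[program(max(-w, 0.0) / self.scale) for w in row] for row in weights]

    def mvm(self, voltages: List[float]) -> List[float]:
        """Column currents I_j = sum_i (G+_ij - G-_ij) * V_i, in weight units."""
        if len(voltages) != len(self.g_pos):
            raise ValueError("one input voltage per row is required")
        n_cols = len(self.g_pos[0])
        return [
            self.scale * sum(v * (self.g_pos[i][j] - self.g_neg[i][j])
                             for i, v in enumerate(voltages))
            for j in range(n_cols)
        ]


# Ideal result for x = [1, 2]: [1*1 + 2*0.25, 1*(-0.5) + 2*2] = [1.5, 3.5]
xbar = AnalogCrossbar([[1.0, -0.5], [0.25, 2.0]], levels=256)
print(xbar.mvm([1.0, 2.0]))
```

The result deviates slightly from the ideal values because of conductance quantization; raising `write_noise` shows how programming imprecision degrades the analog MAC.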
Although rapid progress has been made in CIM technology, it is crucial to recognize that most of the non-linear computations applied to the results of linear matrix–vector multiplication rely on conventional complementary metal oxide semiconductor (CMOS) circuits, including ADCs and digital circuits for complex arithmetic, leading to excessive area and energy costs [75, 76] (Fig. 6a). It is therefore crucial to explore hardware implementations of activation functions based on emerging devices and functional materials. To deal with the overhead of implementing activation functions in hardware, inspiration was drawn from dendritic computation in the pyramidal neurons of the brain cortex [34]. Researchers focused on the distinctive calcium-mediated dendritic action potentials (dCaAPs) found in pyramidal neurons of layers 2 and 3 of the human cortex. In contrast to conventional all-or-none action potentials (APs), the amplitude of dCaAPs was observed to be maximal for threshold-level stimuli and dampened for stronger stimuli (Fig. 6b). It was therefore proposed that this distinctive dCaAP makes it possible for a single neuron to implement XOR classification, which typically requires a multilayer neural network because XOR is not linearly separable. Electronic elements featuring negative differential resistance (NDR), whose measured response decreases as the stimulus intensity increases, were pointed out as promising candidates for such mimicry (Fig. 6c). NDR characteristics can be found in a wide range of electronics, among which Mott materials are among the best candidates. As a well-studied Mott material, vanadium dioxide (VO2) was investigated as a potential substitute for conventional activation units of NNs.
Moreover, this novel activation unit was managed to be integrated within a non-von Neumann architecture, which was verified by co-implementing 1T1R arrays and these neurons on a single hardware platform (Fig. 6d).
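The XOR argument can be checked with a toy model. The Python sketch below uses a hypothetical piecewise-linear activation that is zero below threshold, maximal at threshold, and dampened for stronger inputs, qualitatively like a dCaAP or an NDR response; it is not the VO2 device model of Ref. [34]. With both input weights set to 1, the sums 0, 1, and 2 map to outputs 0, 1, and 0, which reproduces XOR in a single neuron.

```python
def dcaap_like_activation(s, threshold=1.0, decay_width=1.0):
    """Hypothetical non-monotonic activation (illustrative only).

    Returns 0 below the threshold, 1 exactly at the threshold, and a value
    that falls linearly back to 0 as the stimulus grows beyond it.
    """
    if s < threshold:
        return 0.0
    return max(0.0, 1.0 - (s - threshold) / decay_width)


def single_neuron_xor(x1, x2, w1=1.0, w2=1.0):
    """One neuron: weighted sum followed by the non-monotonic activation."""
    s = w1 * x1 + w2 * x2
    return int(dcaap_like_activation(s) > 0.5)


# Prints the XOR truth table: (0,0)->0, (0,1)->1, (1,0)->1, (1,1)->0.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, single_neuron_xor(a, b))
```

A single neuron with a monotonic activation (a step or ReLU followed by a threshold) can only separate its inputs with one line, so it cannot produce this truth table; the drop in response for the strongest stimulus is what makes the difference.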
Fig. 6
Schematic of how hardware design promotes the development of different types of chips. a Schematic of the DNN structure and how it is realized by conventional hardware. b Schematic illustration of the calcium-mediated dendritic action potentials (dCaAPs) and the conventional all-or-none APs. c Schematic of NDR, insulator–metal transition (IMT), and the XOR operation realized in a single device. d Schematic illustration of the fully hardware implementation of DNN. a–d Reproduced with permission from Ref. [34] Copyright 2024, Wiley–VCH GmbH. e Optical image of the completed spin qubit wafer. f Schematic of the device alignment and contact. g Various measurements used to extract the data. h The data used for statistical analysis. e–h Reproduced with permission from Ref. [100] Copyright 2024, Nature. i Circuit tier prefabrication on a sacrificial substrate. j Physically peeling off circuit tier, and k van der Waals dry lamination. l Optical images and m the zoomed-in image of prefabricated circuit tier on 2 inch sacrificial substrate. n Optical image of the final device. o Schematic diagram and p optical image of a 10-tier M3D system. i–p Reproduced with permission from Ref. [40] Copyright 2024, Nature
In addition to imitating essential synaptic functions, in-depth study of the learning and memory mechanisms underlying the biological brain is vital for realizing intelligent information processing at the hardware level [77]. For instance, hardware realization of associative learning has been proposed to improve the functionality of NNs and enhance the performance of machine learning (ML) algorithms [78, 79]. It can also promote the development of more autonomous machines that adapt and learn in dynamic environments without pre-programming [80, 81–82].
To Exploit New Mode: Photonic Computing
In the post-Moore era, the continuing demand for higher performance poses greater challenges [38]. Photonic computing offers significant advantages through light-speed, low-power computation [21, 22], enabling much faster and more energy-efficient data processing. In this paradigm, properties of light are used to represent information, and propagation and interference are exploited for computation [57, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94–95]. Meanwhile, applying AI to optics can improve the design and control of optical systems. Recently, photons and electrons have been used together in an all-analog manner to provide a practical solution for intelligent computing [27]. Moreover, advances in integrated photonics also contribute to performing intelligent tasks on photonic computing chips [25, 96, 97, 98–99].
To Exploit New Mode: Quantum Computing
In addition to neuromorphic and photonic computing, quantum computing has emerged as another advanced type of computing [100]. To broaden the applications of spin qubit technology, the physical qubit count must increase substantially, which makes it essential to fabricate spin qubit devices with density, volume, and uniformity comparable to those of classical computing chips composed of billions of transistors [101]. Spin qubit technology has inherent scaling advantages owing to its small qubit size, as well as native compatibility with CMOS manufacturing infrastructure. Consequently, manufacturing spin qubit devices with the same infrastructure as classical computing chips could unlock the scaling potential of spin qubits and offer a route to fault-tolerant quantum computers. Furthermore, cryogenic device testing must be scaled up to enable efficient device screening [102, 103]. Spin qubits based on electrons in Si have demonstrated impressive control fidelities, but challenges remain in yield and process variation. Recently, some progress has been made on this issue. For example, a testing technique was developed that uses a cryogenic 300-mm wafer prober to collect high-volume data on the performance of hundreds of industry-manufactured spin qubit devices at 1.6 K. Cooling a 300-mm wafer to an electron temperature of 1.6 K took about 2 h [100], and a transmission electron micrograph of a Si/SiGe quantum dot qubit device cross section is shown in Fig. 6e. As shown in Fig. 6f, the device pads were then aligned to the probe pins, and the devices were connected to measurement electronics at room temperature. A variety of measurements could then be used to extract the data (Fig. 6g). By repeating this process on many devices across the wafer, wafer-scale trends could be analyzed statistically from the device data, as illustrated in Fig. 6h.
To Promote the Integrating Technique
Besides new materials and novel modes, progress in advanced chips has also been made in integration techniques [40]. Monolithic three-dimensional (M3D) integration, in which multiple tiers are fabricated sequentially on the same wafer by depositing the upper tiers, has been proposed to overcome scaling limitations through higher device density. It enables new 3D computation systems in which various tiers, such as logic, memory, and sensors, are vertically interconnected [104, 105–106]. For silicon-based M3D integration, the low thermal budget is a challenge: the process temperature of the upper tiers must not exceed the back-end-of-line temperature, to avoid performance degradation. Two-dimensional (2D) semiconductors are promising for M3D integration owing to their dangling-bond-free surfaces and their ability to be integrated onto various substrates [107, 108, 109, 110, 111, 112–113]. Recently, an alternative low-temperature M3D integration method based on van der Waals lamination of entire prefabricated circuit tiers has been developed. The integration process comprises circuit tier prefabrication on a sacrificial substrate, physical peeling-off of the circuit tier, and van der Waals dry lamination, as shown in Fig. 6i–k. Notably, the prefabrication of all circuit stacks was based on standard photolithography processes and was compatible with wafer-scale M3D integration (Fig. 6l–n). A 10-tier M3D circuit with a total thickness of approximately 8 μm was realized, verifying high-density M3D systems with multiple vertically stacked circuit tiers (Fig. 6o, p).
Strategies to Design Advanced and AI Chip with Enhanced Overall Performance
For Memory Purpose
Artificial neuromorphic devices are expected to achieve complex and comprehensive simulation of biological learning and memory functions [36]. Extensive research has focused on neuromorphic electronics featuring massive parallelism, high efficiency, and high capability. In particular, classical conditioning, a form of associative learning that generally involves conditioned stimuli (CS) and unconditioned stimuli (US), has four features: acquisition, extinction, recovery, and generalization, which relate respectively to information storage, elimination of outdated information, rememorizing, and storage of new information in a cycle [114]. Accordingly, synaptic electronics with associative learning capabilities are potential candidates for next-generation AI. Because purely electrical signaling suffers from crosstalk, poor sustainability, and complex circuitry, light has been used in coordination with electrical devices to fully realize the four features of classical conditioning [36]. Moreover, the difference in relaxation times between electrical and light stimuli gives such devices an inherent advantage in realizing classical conditioning. Associative learning was accomplished with optoelectronic memristors based on Ag/TiO2 nanowires (NWs):ZnO quantum dots (QDs)/FTO (ATZ-based devices). As shown in Fig. 7a, flower nectar serves as the US that triggers proboscis extension, whereas flower odor serves as the CS, which must be trained through coordination of the olfactory and proboscis nerves before it can trigger proboscis extension directly. A two-terminal ATZ-based memristive device with a vertical structure similar to that of synapses was designed to simulate synaptic behavior (Fig. 7b), and an SEM image of the as-prepared device is shown in Fig. 7c.
It was verified that, in addition to basic synaptic behaviors, more advanced synaptic functions such as learning–forgetting–relearning could also be achieved.
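The acquisition, extinction, and relearning behavior described above can be illustrated with a Rescorla–Wagner-style update, a classic psychological model swapped in here purely for illustration; it is not the ATZ device physics of Ref. [36]. The associative strength v of the CS (odor) moves toward 1 when the CS is paired with the US (nectar) and toward 0 when the CS is presented alone; the learning rate alpha and the response threshold are hypothetical.

```python
def rescorla_wagner(trials, alpha=0.3, v0=0.0):
    """Associative strength after each trial.

    trials: sequence of booleans; True means CS paired with US (target 1),
            False means CS presented alone (target 0).
    """
    v = v0
    history = []
    for paired in trials:
        target = 1.0 if paired else 0.0
        v += alpha * (target - v)  # move a fraction alpha toward the target
        history.append(v)
    return history


def conditioned_response(v, threshold=0.5):
    """True if the CS alone would trigger the response (e.g. proboscis extension)."""
    return v >= threshold


# 10 paired trials (acquisition), 10 CS-only trials (extinction),
# then 5 paired trials (relearning).
schedule = [True] * 10 + [False] * 10 + [True] * 5
v = rescorla_wagner(schedule)
print(conditioned_response(v[9]), conditioned_response(v[19]), conditioned_response(v[24]))
# True False True
```

This simple rule captures acquisition, extinction, and relearning but not faster relearning ("savings") or spontaneous recovery, which require additional state in the model, or in a device, a slower internal variable.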
Fig. 7
Schematic for the design strategies of AI chips in regard to data memory and transfer. a Schematic illustration of the proboscis extension response. b Schematic of the ATZ-based device. c SEM of the as-prepared device. a–c Reproduced with permission from Ref. [36] Copyright 2024, Springer. d Schematic illustration of the multi-dimensional communication. d Reproduced with permission from Ref. [59] Copyright 2022, Nature
During the Transmission Process
The data transfer limits of high-performance silicon chips have drawn much attention, and several schemes have been proposed to address them [59]. Optical computing has great potential to speed up a variety of ML applications owing to its enhanced data transfer, low latency, and fast computation, given that light travels much faster than electrical signals [13, 23, 58, 115, 116]. Optical interconnects have also emerged as a potential solution to this problem. Chip-scale optical interconnects have been advanced by the wavelength-division multiplexing (WDM) technique, which enables parallel signal transmission by encoding data independently onto multiple frequencies of light [117, 118]. To further increase link bandwidth, attention has turned to other promising dimensions of signal encoding for multiplexing, such as the spatial domain. Specifically, light can be decomposed into a series of optical beams with orthogonal spatial cross sections, and these orthogonal spatial modes can act as independent communication channels [119, 120, 121, 122, 123, 124, 125–126]. Each mode can support a full WDM link, so mode-division multiplexing (MDM) multiplies the bandwidth of an optical link. Recently, progress has been made in integrating mode and wavelength multiplexing on a chip [127, 128, 129, 130, 131, 132–133].
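The multiplicative effect of combining MDM with WDM is simple arithmetic, sketched below in Python. The channel counts and the per-channel data rate are hypothetical placeholders, not the parameters of the 1.12-Tb/s system of Ref. [59], and real links lose some capacity to crosstalk and coding overhead.

```python
from itertools import product


def mdm_wdm_channels(num_modes, num_wavelengths):
    """Enumerate the independent (mode, wavelength) channels of an MDM-WDM link."""
    return list(product(range(num_modes), range(num_wavelengths)))


def aggregate_bandwidth_gbps(num_modes, num_wavelengths, per_channel_gbps):
    """Each spatial mode carries a full WDM link, so the channel counts multiply."""
    return len(mdm_wdm_channels(num_modes, num_wavelengths)) * per_channel_gbps


# Hypothetical link: 8 wavelengths at 25 Gb/s each.
wdm_only = aggregate_bandwidth_gbps(1, 8, 25.0)  # 1 mode
mdm_wdm = aggregate_bandwidth_gbps(4, 8, 25.0)   # 4 modes
print(wdm_only, mdm_wdm, mdm_wdm / wdm_only)     # 200.0 800.0 4.0
```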
To offer new dimensions of data transfer and meet the growing need for speed, an integrated multi-dimensional system combining wavelength and mode multiplexing on a silicon photonic circuit for on-chip and chip-to-chip interconnects was proposed [59] (Fig. 7d). A multi-wavelength laser source was evenly distributed to multiple WDM transmitter circuits, each encoding data independently onto different frequencies of light. An inverse-designed MDM multiplexer took the overlapping modes from the multiple WDM transmitters and converted them into copropagating, spatially orthogonal modes. The data were then transmitted through chip-to-fiber couplers and multimode fiber to the receiver, where MDM–WDM demultiplexers separated the mode and wavelength channels and photodiodes performed detection. A 1.12-Tb/s natively error-free data transmission was demonstrated.
At the Computing Stage
Dynamic computing is a promising approach in DL: dynamic neural networks adapt their computational graphs to the input during inference, which gives them attractive properties in many respects [134]. Neuromorphic and traditional AI systems are two typical paradigms for dynamic computing [37]. In particular, neurons in spiking neural networks (SNNs) communicate through spike trains, so spike-based neuromorphic computing naturally has a dynamic computational graph, with only a small portion of the spiking neurons active at any moment and the rest idle. In contrast, neurons in traditional artificial neural networks (ANNs) exchange information via continuous values and are governed by static computational graphs. As a result, traditional AI requires dedicated dynamic algorithms to implement dynamic computing (Fig. 8a, b).
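The contrast between the two neuron types can be sketched in a few lines of Python. The leaky integrate-and-fire (LIF) model below is a standard textbook simplification used here as an assumption; the time constant, threshold, and reset-to-zero rule are illustrative choices, not the neuron model of any specific chip in Ref. [37].

```python
import numpy as np


def artificial_neuron(x, w, b=0.0):
    """ANN neuron: a continuous (ReLU) output is computed on every evaluation."""
    return max(0.0, float(np.dot(w, x)) + b)


def lif_neuron(input_current, tau=10.0, v_th=1.0, dt=1.0):
    """Discrete-time leaky integrate-and-fire neuron; returns a binary spike train."""
    decay = np.exp(-dt / tau)
    v = 0.0
    spikes = []
    for i_t in input_current:
        v = decay * v + i_t  # leak, then integrate the input
        if v >= v_th:
            spikes.append(1)
            v = 0.0          # reset after firing
        else:
            spikes.append(0)
    return spikes


print(lif_neuron([0.3] * 4))                       # [0, 0, 0, 1]: integrates, then fires once
print(sum(lif_neuron([0.0] * 100)))                # 0: no input, no spikes, neuron stays idle
print(artificial_neuron([1.0, 2.0], [1.0, 1.0]))   # 3.0: always produces a value
```

The spiking neuron only emits an event when its membrane potential crosses threshold, so downstream work is triggered sparsely, whereas the ANN neuron produces a value, and therefore work, on every pass.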
Fig. 8
Schematic for the design strategies of AI chips in regard to computing. Comparison between a spiking neuron and an artificial neuron, and b neuromorphic and traditional computing for dynamic computing. a, b Reproduced with permission from Ref. [37] Copyright 2024, Nature. c Comparison between conventional and neuromorphic computing. d Schematic illustration and e schematic signal flow of the neuromorphic signal integration system. f The circuit diagram and g photographic image of the hydrogen explosion risk assessment system. c–g Reproduced with permission from Ref. [39]. Copyright 2024, Wiley–VCH GmbH
Energy constraints have become a major obstacle to deploying traditional AI methods, so computing must also be highly energy efficient. Correspondingly, many efforts have been made to develop energy-efficient computing schemes. For example, analog in-memory computing (analog-AI) offers better energy efficiency because it performs matrix–vector multiplications (MVMs) in parallel on 'memory tiles' [1]. Neuromorphic computing also provides a promising route to energy-efficient machine intelligence by learning from how the brain processes information, using artificial neurons and SNNs on neuromorphic chips. Its key challenge is how to learn from high-level brain dynamic mechanisms to achieve excellent computational efficiency [37].
In addition to the demands of dynamic computing and energy constraints, some fields, such as healthcare monitoring, require weight-reconfigurable computing, in which the relative weight of each input must be finely reconfigured. To achieve precise and independent modification of each input, a neuromorphic computing system was designed that integrates two different types of environmental information with reconfigurable weights using simple circuitry based on electrochemical artificial synapses [39]. In a conventional CMOS-based processor, handling more environmental factors requires increasingly complex logic circuits, whereas in neuromorphic computing a single electrolyte-based multi-input synapse can handle this environmental information (Fig. 8c). A schematic of the neuromorphic signal integration system is shown in Fig. 8d. Specifically, sensors convert raw data into electrical signals, a weight control circuit assigns weights to the signals, the processing synapse integrates them, and finally the artificial neuron makes a logical decision. The corresponding signal flow is shown in Fig. 8e: an action is executed if the synapse output exceeds the criterion level. Notably, the potentiation of the processing synapse was modulated by the weights assigned to the signals, so the same environmental signals could lead to different final action states. A hydrogen explosion risk assessment system was designed accordingly; its circuit diagram is shown in Fig. 8f and a photograph in Fig. 8g. Hydrogen concentration and temperature were used as inputs; the signals were weighted by the weight control circuit and then converted by the multi-input artificial synapse into a postsynaptic current representing the hydrogen explosion risk.
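The weighted integrate-and-decide flow of Fig. 8e can be mimicked in software. In the Python sketch below, the two inputs stand for normalized hydrogen concentration and temperature; the normalization, weights, and criterion are hypothetical values chosen only to show that the same signals can lead to different actions when the weights change. In the device, the weighted summation happens physically in the multi-input synapse rather than in code.

```python
def integrate_inputs(signals, weights):
    """Weighted summation (performed physically by the multi-input synapse)."""
    if len(signals) != len(weights):
        raise ValueError("one weight per input is required")
    return sum(w * s for w, s in zip(weights, signals))


def decide(signals, weights, criterion):
    """Artificial-neuron decision: act only if the integrated output exceeds the criterion."""
    return integrate_inputs(signals, weights) > criterion


# Hypothetical normalized inputs: [hydrogen concentration, temperature].
signals = [0.6, 0.5]
criterion = 0.6
print(decide(signals, [1.0, 0.2], criterion))  # True: hydrogen-dominated weighting raises an alarm
print(decide(signals, [0.4, 0.4], criterion))  # False: same signals, different weights, no action
```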
Design Considerations for Future Advanced and AI Chip
The development of AI technology has brought about a sharp increase in computation [39]. The prosperity of AI is largely driven by huge numbers of parameters and improved computing power [34]. Many vision tasks require short exposure times to achieve ultra-low latency, which calls for extremely high computing power [27]. In addition, computing capability and energy efficiency must be balanced in high-performance computing [135].
For High-Performance Computing
To Accelerate Computing Speed
Computing speed must be further increased to keep pace with the improved performance of various tasks at the algorithmic level [13, 136]. Optical AI, which fully leverages optics and photonics, can achieve large-bandwidth and highly energy-efficient computing. However, digital devices remain mainstream, so even after optical computing, the optical signals in vision tasks must be converted into digital form by large-scale photodiode arrays and power-hungry ADCs for the necessary postprocessing [27] (Fig. 9a). To address this issue, an optoelectronic hybrid architecture was designed that greatly reduces the number of ADCs, enabling power-efficient, high-speed vision tasks (Fig. 9b). Specifically, information was encoded into light fields, and the features of high-resolution images were extracted at light speed by a multi-layer diffractive optical computing module, termed optical analog computing (OAC). Notably, all-optical dimension reduction lowered the demand for optoelectronic conversion. Electronic analog computing (EAC) with a 32 × 32 photodiode array then converted the optical signals into analog electronic signals through the photoelectric effect, which also served as a nonlinear activation. Each photodiode was connected to either the positive line V+ or the negative line V− according to the weights stored in static random-access memory (SRAM). Following Kirchhoff's law, the generated photocurrents were summed on both lines, and an analog subtractor computed the differential voltage between the computing lines V+ and V− as the output node. By resetting the computing lines and updating the weights, the system can output another pulse with a different set of photodiode connections.
The output could be used either as predicted labels of classification categories or as inputs to another digital neural network. A schematic diagram of the all-analog photoelectronic chip is shown in Fig. 9c.
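The EAC readout reduces to a signed sum of photocurrents. The Python sketch below assumes ideal photodiodes whose current is proportional to optical intensity (|E|²) and binary SRAM weights, where bit 1 routes a photodiode to V+ and bit 0 to V−; the responsivity, the unit gain of the subtractor, the random stand-in optical field, and the function names are illustrative assumptions rather than the circuit parameters of Ref. [27].

```python
import numpy as np


def photocurrents(field, responsivity=1.0):
    """Square-law detection: current proportional to |E|^2 (the nonlinearity)."""
    return responsivity * np.abs(np.asarray(field)) ** 2


def eac_readout(currents, weight_bits, gain=1.0):
    """One differential readout of the photodiode array.

    weight_bits: 1 connects a photodiode to the V+ line, 0 to the V- line.
    Currents on each line add (Kirchhoff's current law); the analog
    subtractor outputs gain * (I+ - I-).
    """
    i = np.asarray(currents, dtype=float).ravel()
    b = np.asarray(weight_bits).ravel().astype(bool)
    return gain * (i[b].sum() - i[~b].sum())


def eac_outputs(currents, weight_bit_sets, gain=1.0):
    """Reset, update the weights, and read again: one output per weight set."""
    return np.array([eac_readout(currents, bits, gain) for bits in weight_bit_sets])


rng = np.random.default_rng(0)
field = rng.normal(size=(32, 32)) + 1j * rng.normal(size=(32, 32))  # stand-in optical output
i_pd = photocurrents(field)
bits = rng.integers(0, 2, size=(10, 32 * 32))  # 10 hypothetical weight sets (e.g. classes)
outputs = eac_outputs(i_pd, bits)
print(outputs.shape)  # (10,)
```

Each readout is equivalent to a dot product between the photocurrent vector and a weight vector with entries in {−1, +1}, which is why only the final differential outputs, not all 1,024 photodiode signals, need conversion.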
Fig. 9
Design considerations of AI chips for high-performance computing. The workflow of a traditional optoelectronic computing, and b all-analog photoelectronic computing. c Schematic diagram of the all-analog photoelectronic chip. a–c Reproduced with permission from Ref. [27] Copyright 2023, Nature. d Schematic diagram for the conventional optics-related AI and e the general optical systems. f Schematic illustration of FFM onsite ML. d–f Reproduced with permission from Ref. [35] Copyright 2024, Nature. Schematic illustration of g a generalized unit cell with coherent light sources, and h the proposed photonic convolutional processing system with partially coherent light. i Schematic illustration for the N-fold enhancement in regard to parallelism. g–i Reproduced with permission from Ref. [43] Copyright 2024, Nature
Another challenge for optical computing is that optical AI systems are designed and trained in silico on electronic computers, which requires both precise modeling and large amounts of training data (Fig. 9d). In particular, optical AI primarily involves the optical emulation of electronic ANNs, with photonic architectures designed on electronic computers [24, 137]. Correcting experimental system errors therefore requires extensive work to characterize optical propagation spatially and temporally [83, 96, 98, 138]. For AI-empowered optical design, the system must also be modeled analytically or implicitly [139, 140–141]. Analytical and numerical modeling become more time-consuming as system complexity increases, and precise modeling of a general optical system is difficult owing to system imperfections and the complexity of light-wave propagation. Some efforts have been made to address these issues [35]. Fully forward mode (FFM) learning was developed, which maps optical systems to parameterized onsite neural networks. Notably, by exploiting spatial symmetry and Lorentz reciprocity, the need for backward propagation in gradient descent training was eliminated. Specifically, general optical systems, including free-space lens optics and integrated photonics, contain modulation regions (dark green), whose refractive indices are tunable, and propagation regions (light green), whose refractive indices are fixed (Fig. 9e). These regions can be mapped to weights and neuron connections, respectively, making it possible to construct a differentiable onsite neural network between the input and output (Fig. 9f).
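The role of reciprocity can be illustrated with a deliberately simplified, real-valued linear model in Python: a fixed propagation region T1, a tunable modulation mask m, and a second fixed propagation region T2. The gradient of the loss with respect to m normally requires sending the error backward through T2 (multiplication by its transpose); if T2 is symmetric, an assumption standing in here for the reciprocity and spatial-symmetry conditions exploited in Ref. [35], the same quantity is obtained by sending the error forward through T2 again. This is a sketch of the idea only, not the FFM algorithm itself, which works with complex optical fields and physical measurements.

```python
import numpy as np


def forward(T1, T2, m, x):
    """Propagate (fixed T1) -> modulate (tunable mask m) -> propagate (fixed T2)."""
    h = T1 @ x
    y = T2 @ (m * h)
    return h, y


def mask_gradient_backward(T1, T2, m, x, target):
    """Gradient of 0.5*||y - target||^2 w.r.t. m via a backward pass (uses T2.T)."""
    h, y = forward(T1, T2, m, x)
    err = y - target
    return h * (T2.T @ err)


def mask_gradient_forward_only(T1, T2, m, x, target):
    """Same gradient obtained by propagating the error forward through T2.

    Valid only when T2 is symmetric (the reciprocity assumption).
    """
    h, y = forward(T1, T2, m, x)
    err = y - target
    return h * (T2 @ err)


rng = np.random.default_rng(1)
n = 6
T1 = rng.normal(size=(n, n))
A = rng.normal(size=(n, n))
T2 = A + A.T  # symmetric stand-in for a reciprocal propagation region
m, x, target = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)

g_bp = mask_gradient_backward(T1, T2, m, x, target)
g_ff = mask_gradient_forward_only(T1, T2, m, x, target)
print(np.allclose(g_bp, g_ff))  # True for symmetric T2
```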
To Realize High-Capacity Signal Processing
In addition to the methods mentioned above, parallel multi-thread processing is a key approach to high-speed, high-capacity signal processing and a promising way to meet the increasing demand for processing high-capacity datasets [142]. Recently, a photonic convolutional processing system that uses partially coherent light to boost computing parallelism without substantially sacrificing accuracy has been proposed [43]. Previously developed architectures for photonic convolutional processing all relied on coherent light sources. However, coherent nanophotonic circuits require precise control of numerous phase shifters to achieve the desired coherent interference. A generalized unit cell performing multiply-and-accumulate operations is illustrated in Fig. 9g, and the system using partially coherent light for parallelized photonic computing is shown in Fig. 9h. Because this system does not require a coherent light source, its feedback control and thermal management requirements are less stringent. In the partially coherent system, a Gaussian-shaped optical carrier can be sent to all input channels and summed in a bus waveguide, whereas in a coherent system, different input channels must receive optical carriers at distinct wavelengths to avoid intensity fluctuations caused by interference. As a result, one MVM operation on input vectors of dimension N requires only one optical band in the partially coherent system but N optical bands with coherent light, so partially coherent light enables an N-fold enhancement in parallelism (Fig. 9i).
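Both halves of this argument can be sketched in Python under simplifying assumptions: a fixed pool of optical bands, ideal summation in the bus waveguide, and scalar fields. The band count, vector size, and function names are hypothetical, and averaging over evenly spaced relative phases is only a crude stand-in for the rapid phase wander of a partially coherent carrier; this is not the model used in Ref. [43].

```python
import numpy as np


def concurrent_mvms(total_bands, n_inputs, coherent):
    """How many N-input MVMs can share a pool of optical bands at once.

    Coherent scheme: each input channel needs its own band (N bands per MVM).
    Partially coherent scheme: all inputs share one band (1 band per MVM).
    """
    bands_per_mvm = n_inputs if coherent else 1
    return total_bands // bands_per_mvm


def detected_intensity(amplitudes, phases):
    """Intensity of fields summed in a bus waveguide: |sum_k a_k exp(i*phi_k)|^2."""
    a = np.asarray(amplitudes, dtype=float)
    phi = np.asarray(phases, dtype=float)
    return float(np.abs(np.sum(a * np.exp(1j * phi))) ** 2)


# Parallelism with 64 hypothetical bands and 16-dimensional input vectors.
print(concurrent_mvms(64, 16, coherent=True))   # 4
print(concurrent_mvms(64, 16, coherent=False))  # 64, i.e. N = 16 times more

# Two unit-amplitude carriers at the same wavelength interfere: the summed
# intensity swings between about 4 (in phase) and about 0 (out of phase).
print(detected_intensity([1, 1], [0, 0]), detected_intensity([1, 1], [0, np.pi]))

# Averaging over a uniformly spread relative phase recovers, approximately,
# the plain sum of intensities 1 + 1 = 2, which is what an MAC readout needs.
phis = np.linspace(0, 2 * np.pi, 360, endpoint=False)
print(np.mean([detected_intensity([1, 1], [0, p]) for p in phis]))
```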
With Improved Energy Efficiency
General Approaches to Improve the Energy Efficiency
In addition to enhanced computing performance, high energy efficiency is another important requirement for advanced chips. For example, in many vision tasks, high-throughput, high-precision ADCs reduce the imaging frame rate because of limited data bandwidth and consume considerable energy [27]. Accordingly, an all-analog optoelectronic hybrid architecture has been designed to reduce the number of ADCs and accomplish vision tasks power-efficiently. Furthermore, neuromorphic computing, which simulates the neurons of the human brain using spiking neural networks, is a promising approach to energy-efficient machine intelligence [37]. The human brain is thought to allocate its resources dynamically according to demand [143, 144]; as a result, more attention is paid to salient stimuli, as evidenced by heightened spiking activity in the brain regions or neurons associated with the stimulus. Additionally, neuromorphic chips have been designed without global or local clock signals, which avoids the redundant power consumed by idle clock toggling [37]. Moreover, CIM is important in the field of AI because it integrates memory and processing functions within the same module, enhancing efficiency. Memristors, whose device dynamics strikingly resemble those of their biological counterparts, play an important role in this field [145].
Analog In-memory Computing
The vast amounts of data transferred between memory and processor cause unnecessary energy consumption. Analog-AI hardware, which uses NVM arrays to execute MAC operations, is expected to save both time and energy. For example, an analog-AI chip was designed for energy-efficient speech recognition and transcription. Notably, both fully end-to-end software-equivalent (SWeq) accuracy for a small keyword-spotting network and near-SWeq accuracy on the much larger MLPerf RNNT were demonstrated [1]. Specifically, the tiny-model keyword-spotting (KWS) task on the Google Speech Commands dataset was targeted, and the MLPerf version of RNNT, a large data-center network, was implemented on Librispeech. This model contained 45 million weights, implemented with more than 140 million phase-change memory (PCM) devices across five chips. The system showed excellent power performance: Chip 4 achieved the best value of 12.40 TOPS/W, which was attributed to it holding the most on-chip weights (Fig. 10a), suggesting a correlation between the reported TOPS/W and the number of weights encoded on-chip. A further 25% improvement in TOPS/W could be achieved for Chip 4 by reducing the maximum input duration without large WER degradation (Fig. 10b). Energy efficiency at different levels is illustrated in Fig. 10c, which shows how the costs of data communication, incomplete tile usage, and inefficient digital computing reduce the large peak TOPS/W of the analog tile itself to a final sustained value of 6.94 TOPS/W. The full processing time of the overall system was also estimated (Fig. 10d).
Notably, the average processing time per sample was more than 10⁴ times shorter than the actual speech duration, corresponding to a real-time factor of only 8 × 10⁻⁵. The number of operations performed on-chip versus off-chip in the RNNT experiment is shown in Fig. 10e. Compared with the MLPerf submissions, this system achieved a 14-fold improvement in samples per second per watt and in TOPS/W (Fig. 10f).
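The figures above can be cross-checked with simple arithmetic, sketched in Python. The real-time factor is processing time divided by speech duration, so its inverse is the speed-up, and TOPS/W converts directly into energy per operation (1 TOPS/W equals 10¹² operations per joule). The per-operation energies and the 15.5 TOPS/W value are derived here for illustration and are not reported in Ref. [1]; they also depend on how operations are counted (one MAC is often counted as two operations).

```python
def speedup_over_real_time(real_time_factor):
    """RTF = processing time / audio duration; its inverse is the speed-up."""
    return 1.0 / real_time_factor


def energy_per_op_fj(tops_per_watt):
    """1 TOPS/W = 1e12 operations per joule, so energy/op = 1e3 / (TOPS/W) femtojoules."""
    return 1e3 / tops_per_watt


print(speedup_over_real_time(8e-5))  # about 1.25e4, consistent with "more than 10^4 times"
print(energy_per_op_fj(12.40))       # about 80.6 fJ per operation (Chip 4 peak value)
print(energy_per_op_fj(6.94))        # about 144 fJ per operation (sustained value)
print(12.40 * 1.25)                  # about 15.5 TOPS/W if the further 25% gain applies
```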
Fig. 10
Design considerations of AI chips with improved energy efficiency. a Measured power and TOPS/W corresponding to each chip. b An improvement in TOPS/W caused by reducing the maximum input duration. c Energy efficiency at different levels. d Processing time and actual speech time. e Number of operations performed on-chip versus off-chip in the RNNT experiment. f Samples per second per watt and TOPS/W compared with MLPerf submissions. a–f Reproduced with permission from Ref. [1] Copyright 2023, Nature. g Power composition of AI systems. The case of h high resting power and i low resting power. j Physical display of Speck. k Illustration for the sensing-computing end-to-end SoC, and l its application scenarios. m Fully asynchronous architecture of Speck. The design of n SNN core, and o the asynchronous event-driven convolution. g–o Reproduced with permission from Ref. [37] Copyright 2024, Nature
Dynamic Computing with Asynchronous Chip
To achieve energy efficiency, the composition of power consumption must be considered. The power required to operate an AI system usually comprises two parts: resting power, which is determined by the hardware design, and running power, which depends on the model once the hardware is fixed [37] (Fig. 10g). For most hardware, a significant amount of energy is consumed even when no computation is being done, so resting power accounts for a very high proportion of the overall power. Consequently, it is difficult to reduce overall power merely by reducing running power (Fig. 10h, i). Chip architecture (asynchronous or synchronous) strongly affects power consumption: asynchronous architectures, in which the circuit state changes only in response to changes in the external input, consume less power than synchronous circuits. The event-driven mechanism is one approach by which asynchronous chips coordinate the work of each module. For sensing-computing chips, event-driven designs are attractive because the sensor wakes up the chip to collect and transmit data only when environmental changes (such as temperature changes or motion) are detected, which improves energy efficiency and reduces latency.
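Why resting power dominates can be shown with a two-term power budget, sketched in Python. The resting powers, energy per event, and event rate below are hypothetical values chosen only to contrast a high-resting-power chip with a low-resting-power (for example, asynchronous, event-driven) one.

```python
def average_power_w(resting_w, energy_per_event_j, events_per_s):
    """Average power = resting power + running power (energy/event x event rate)."""
    return resting_w + energy_per_event_j * events_per_s


def saving_from_halving_running_power(resting_w, energy_per_event_j, events_per_s):
    """Fractional drop in total power when only the running part is halved."""
    before = average_power_w(resting_w, energy_per_event_j, events_per_s)
    after = average_power_w(resting_w, energy_per_event_j / 2, events_per_s)
    return 1.0 - after / before


# Hypothetical workload: 1 uJ per event at 10^4 events/s, i.e. 10 mW running power.
for resting in (1.0, 1e-3):  # high vs low resting power, in watts
    print(resting,
          average_power_w(resting, 1e-6, 1e4),
          saving_from_halving_running_power(resting, 1e-6, 1e4))
```

With 1 W of resting power, halving the running power saves only about 0.5% of the total; with 1 mW of resting power, the same optimization saves about 45%, which is why lowering resting power comes first.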
In contrast to the most common neuromorphic hardware design approach, which starts from the bottom of the compute stack, neuromorphic hardware for specific edge applications can be elaborately customized with low power consumption in mind. For example, a sensing-computing neuromorphic chip, Speck, was designed with a 128 × 128-pixel dynamic vision sensor (DVS) integrated onto an asynchronous spike-based AI chip (Fig. 10j). Speck is an end-to-end sensing-computing system-on-chip (SoC) with always-on hardware applicable to various scenarios, such as the Internet of Things, smart travel, smart homes, and intelligent robotics (Fig. 10k, l). Notably, its processing pipeline was built with asynchronous digital logic, enabling always-on operation with low resting power consumption and optimal latency. Because asynchronous circuits are complicated to implement, the overall sensing-to-computing strategy was optimized: a central event router can be configured to route events from any of the nine SNN cores to any other, and every core works independently and asynchronously (Fig. 10m). As a result, the design effort could be limited to a single SNN core (Fig. 10n). Additionally, asynchronous event-driven convolution was included as one of the core designs to improve computational efficiency (Fig. 10o).
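Event-driven convolution can be sketched as a scatter operation: each incoming event adds the kernel to the membrane potentials of the output neurons it reaches, and only those neurons are touched. The Python sketch below is a simplified, single-channel model with a hypothetical kernel, threshold, and reset-to-zero rule; it is not Speck's actual core design described in Ref. [37].

```python
import numpy as np


def event_driven_conv(events, kernel, shape, threshold=1.0):
    """Scatter-style event-driven convolution with integrate-and-fire outputs.

    events: iterable of (x, y) input spike coordinates (e.g. DVS events) in
            arrival order.
    kernel: 2D array of synaptic weights.
    shape:  (rows, cols) of the output neuron map.
    Returns output spike coordinates (x, y) in emission order. Work is done
    only for the kernel footprint of each event; other neurons stay idle.
    """
    v = np.zeros(shape)
    kh, kw = kernel.shape
    oy, ox = kh // 2, kw // 2
    out = []
    for x, y in events:
        for dy in range(kh):
            for dx in range(kw):
                ty, tx = y + dy - oy, x + dx - ox
                if 0 <= ty < shape[0] and 0 <= tx < shape[1]:
                    v[ty, tx] += kernel[dy, dx]
                    if v[ty, tx] >= threshold:
                        out.append((tx, ty))
                        v[ty, tx] = 0.0  # reset after firing
    return out


kernel = np.full((3, 3), 0.5)
print(event_driven_conv([(2, 2)], kernel, (5, 5)))               # []: one event only charges neurons
print(len(event_driven_conv([(2, 2), (2, 2)], kernel, (5, 5))))  # 9: the second event fires the 3x3 patch
```

Because computation is triggered per event rather than per frame, the work scales with scene activity, which matches the low resting power argument above.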
Perspectives
Overall, recent developments, including but not limited to software-hardware co-design strategies, the realization of enhanced overall performance, and the potential for broader applications, have been reviewed in depth. Great progress has been made in advanced chips, driven by the demanding challenges posed by AI, which has revolutionized fields ranging from the information industry to materials science. To execute the complex algorithms and advanced tasks arising from these challenges, chip design must address every aspect, including materials, algorithms, architectures, processing technology, and integration methods. Progress has been made in developing novel materials and models, as well as in overcoming the shortcomings of existing conventional materials and chip architectures. New fabrication processes for both device production and packaging have been developed, aiming to reduce cost and enable complex chips. Advanced chips can be applied to video recognition, speech recognition and transcription, visual memory, and many other fields, offering fast and efficient information processing (Fig. 11).
Fig. 11 Outlook of the advanced chips
Table 1 summarizes the performance, scale, other properties, and applications of state-of-the-art advanced and AI chips. The quantitative indicators of chips are critical to the systems built on them. Energy efficiency is the effective amount of work a chip completes per unit of energy consumed while executing a task, which matters for environmental sustainability. Computing speed is the core indicator of a chip's data processing capability and is important for shortening task processing time and supporting complex tasks. For AI training, which involves large numbers of parameters, chips with high computing speed shorten the training cycle and accelerate technological iteration. Latency is the time interval from the triggering of an input to the generation of an effective output and is a key indicator of a chip's response speed. Reducing latency while maintaining high energy efficiency and computing speed has become another challenge in chip design, especially for real-time tasks. In addition, the abilities to integrate more transistors, realize a larger area, or extend to more application fields are also imperative for these systems. For example, scaling up a chip is related to the shift from a single function to multiple functions, or from small-scale to large-scale applications, which affects a series of factors such as cost, power consumption, and design complexity.
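The indicators above are related by simple unit arithmetic: since a watt is a joule per second, TOPS/W equals tera-operations per joule, and its reciprocal is the energy per operation in picojoules. The sketch below computes them from a single hypothetical measurement, together with the LAE figure of merit listed in Table 1. It assumes the operation count of the task is known; counting one multiply-accumulate as two operations is a common, but not universal, convention, so reported TOPS figures are only comparable when the same convention is used. The names and numbers are illustrative.

```python
"""Back-of-the-envelope chip metrics from a single (hypothetical) measurement.

Assumes the operation count is known; counting one multiply-accumulate as two
operations is a common, but not universal, convention.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    operations: float  # operations executed for the task
    seconds: float     # time taken to complete the task
    joules: float      # energy consumed during the task


def tops(m: Measurement) -> float:
    """Computing speed in tera-operations per second."""
    return m.operations / m.seconds / 1e12


def average_watts(m: Measurement) -> float:
    return m.joules / m.seconds


def tops_per_watt(m: Measurement) -> float:
    """Energy efficiency; 1 TOPS/W equals 1e12 operations per joule."""
    return m.operations / m.joules / 1e12


def picojoules_per_op(m: Measurement) -> float:
    """Energy per operation; the reciprocal of TOPS/W."""
    return m.joules / m.operations * 1e12


def lae_fom(latency_s: float, area_mm2: float, energy_j: float) -> float:
    """LAE figure of merit: 1 / (latency x area x energy)."""
    return 1.0 / (latency_s * area_mm2 * energy_j)


if __name__ == "__main__":
    m = Measurement(operations=2e12, seconds=0.5, joules=0.25)
    print(f"{tops(m):.2f} TOPS, {average_watts(m):.2f} W, "
          f"{tops_per_watt(m):.2f} TOPS/W, {picojoules_per_op(m):.3f} pJ/op")
    # Improvement factor of one (hypothetical) circuit over another in LAE terms.
    print(lae_fom(1e-7, 0.01, 1e-12) / lae_fom(1e-6, 0.1, 1e-11))
```

An improvement on the LAE figure of merit is simply the ratio of two such values, so a 10× gain in each of latency, area, and energy compounds to roughly 1000× as in the last line.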
Table 1. Performance summary of state-of-the-art advanced chips

| Category | Chip | Energy efficiency (TOPS/W) | Computing speed (TOPS) | Latency | Scale | Accuracy | Other key features | Application | References |
|---|---|---|---|---|---|---|---|---|---|
| Neuromorphic chip | Analog-AI chip | 12.4 | – | 2.4 μs per audio frame | Combines 35 million phase-change memory devices across 34 tiles | Fully end-to-end software-equivalent (SWeq) accuracy for a small keyword-spotting network | 14-fold improvement compared with traditional chips; demonstrated WER of 9.258% | Speech recognition and transcription | [1] |
| | Integration of trainable dendritic neurons and high-density RRAM chip | – | – | 380 ns | – | ~90% | 516× and 1.3 × 10⁵× improvements on the LAE figure of merit (LAE = Latency⁻¹ × Area⁻¹ × Energy⁻¹) compared with digital and analog CMOS activation circuits | CIM-based neuromorphic computing | [34] |
| | Neuromorphic hardware | – | – | – | 3 × 7 memristor array | 88.9% for handwritten digit recognition | Realizes complex biological associative learning behaviors | Visual memory | [36] |
| | Sensing-computing neuromorphic chip | – | – | < 0.1 ms | Efficient medium-scale neuromorphic sensing-computing edge hardware | 92% | Low processor resting power of 0.42 mW; real-time power as low as 0.70 mW | Edge computing devices for smart-home scenarios | [37] |
| | Neuromorphic computing system | – | – | – | – | – | Weight-reconfigurable | Hydrogen explosion risk assessment | [39] |
| | Neuromorphic optoelectronic computing system | 1.58 | 240.1 | – | – | Blind-testing accuracy of 97.6% on 10,000 digit images | Reconfigurable | High-speed image and video recognition | [83] |
| Photonic chip | Photonic computing | 1 | 0.108 | | 3 × 3 photonic tensor core using phase-change-material photonic memories | 92.2% (92.7% theoretically) | Boosts computing parallelism while maintaining accuracy | Classifying the gaits of ten patients with Parkinson’s disease | [43] |
| | Silicon photonic circuit | – | – | – | Multimode optical transmission between separate silicon chips | – | 1.12-Tb/s natively error-free data transmission | Silicon photonic transmitters | [59] |
| | Trainable diffractive optical neural network | 7.28 | 217.6 | 30.2 ps | Input layer of 1 × 16 neurons, hidden layers of 5 × 16 neurons, and output layer of 1 × 4 neurons | 85.7% on multimodal test sets | High computing density (447.7 TOPS/mm²) | Four-class classification across different modalities | [41] |
| | Photonic chiplet | 160 | – | 3.79 ms | 4256 neurons in total and a net scale of 13.96 million | 91.89% on the 1623-category Omniglot dataset | Experimentally achieves on-chip 1000-category-level classification and high-fidelity AI-generated content with up to two orders of magnitude improvement in efficiency | Large-scale photonic computing and artificial general intelligence (AGI) | [38] |
| | Photonic convolutional accelerator | – | 11.3 | < 200 ps | – | 88% for recognition of handwritten digit images | Generates convolutions of images with 250,000 pixels | Real-time video recognition | [22] |
| | Integrated photonic tensor core | 0.4 | – | – | Matrix size readily scalable to 40 × 40 | 95.3% | Computing density above 400 TOPS/mm² | Parallel convolutional processing | [24] |
| | On-chip photonic DNN | – | 0.27 | 570 ps | Scalable to classifiers with more pixels | 93.8% for two-class classification of handwritten letters | Classification time under 570 ps | Image classification | [25] |
| | Photonic processing unit | 0.2 | – | – | – | 96.6% recognition accuracy | Photonic-core compute density above 1 TOPS/mm² | Image reconstruction, video action recognition, and autonomous driving | [149] |
| | All-analog photoelectronic chip | 7.48 × 10⁴ | 4.6 × 10³ | 72 ns per frame | Two 400 × 400 SiO₂ OAC layers and a 1,024 × 3 EAC layer | 92.6% on a time-lapse video recognition task | Superior system robustness in low-light conditions (0.14 fJ μm⁻² per frame) | Time-lapse video recognition | [27] |
| | All-optical processing | 5.40 × 10⁶ | – | – | – | 94.5% | Enables orders-of-magnitude faster learning | Design of non-conventional imaging modalities | [35] |
| | Chip-integrated metasurfaces | – | – | – | – | – | Potentially compatible with on-chip optical systems; independently encodes multiple optical parameters | Multidimensional encryption | [150] |
| Quantum chip | Quantum simulator | – | – | – | – | – | Stable trapping of 512 ions in a 2D Wigner crystal | Running noisy intermediate-scale quantum algorithms | [151] |
| Silicon-based chip | Biomimetic olfactory chip | – | – | – | 10,000 individually addressable sensors per chip | Prediction accuracy of up to 99.04% | Distinguishes mixed gases and 24 distinct odors | Integrated with vision sensors on a robot dog | [152] |
| | Si-based optical memristive crossbar array | – | – | – | 5 × 5 optoelectronic synapse array | 98.02% classification accuracy | Enables an ultralow-power (2.8 × 10⁻¹³ J) fine-tuning process | Patient-specific issues | [153] |
Advanced chips have improved significantly, accompanied by the discovery of novel modes, improved packaging techniques, accelerated efficiency, and enhanced computing power. This review offers insight into the design strategies for advanced and AI chips and proposes the following perspectives for future chips:
Efforts have been made to give AI chips more intelligent behavior by learning from biology. a) Work has focused on mapping biological behavior onto the electrical behavior of devices, and such systems are expected to reproduce increasingly complex biological behaviors. Associative learning, which is commonly found in the nervous systems of insects and is characterized by acquisition, extinction, restoration, and generalization, has been emulated with ZnO QD-based optoelectronic memristors, providing a novel scheme for machine self-learning. It is desirable to develop chips that learn from still more advanced behaviors of living creatures. b) Extensive investigations have been carried out on brain-inspired neuromorphic devices, a potential candidate for next-generation computer architectures. How to draw on high-level brain dynamics to give neuromorphic computing greater energy advantages remains a question in high demand, and efforts have been made on both the software and hardware sides to address it. Moreover, chips that process image information are expected to handle the dynamic, diverse, and unpredictable scenes of real applications such as autonomous driving. It is desirable to design chips that can efficiently perceive and address even difficult real-world problems in various fields. In particular, dynamic computing, a critical feature of the human brain, has been emulated by the Speck sensing-computing chip described above. In the future, more advanced strategies can be adopted to realize high-level brain dynamic mechanisms and fully capture the brain's advantages in many aspects.
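The acquisition and extinction phases of associative learning mentioned above can be illustrated with the classical Rescorla-Wagner rule, used here purely as a behavioral analogy: it stands in for a synaptic weight (for example, a memristor conductance) that strengthens when a stimulus is paired with a reward and weakens when it is not. This is a swapped-in textbook model, not the device physics of the ZnO QD memristors in [36], and it does not capture restoration or generalization. The function name and learning rate are illustrative assumptions.

```python
"""Acquisition and extinction of an associative weight (Rescorla-Wagner rule).

A behavioral analogy only: the rule stands in for a memristor conductance that
strengthens when a conditioned stimulus is paired with an unconditioned one and
weakens when it is not. It does not model the ZnO QD device physics, nor the
restoration or generalization phases.
"""
from typing import Iterable, List


def rescorla_wagner(paired: Iterable[bool], rate: float = 0.3, v0: float = 0.0) -> List[float]:
    """Return the associative strength V after each trial.

    paired[t] is True when the stimulus is paired with the reward/US on trial t.
    Update: V <- V + rate * (lambda - V), with lambda = 1 if paired else 0.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must lie in (0, 1]")
    v, history = v0, []
    for is_paired in paired:
        target = 1.0 if is_paired else 0.0
        v += rate * (target - v)
        history.append(v)
    return history


if __name__ == "__main__":
    trials = [True] * 10 + [False] * 10  # 10 acquisition trials, then 10 extinction trials
    strength = rescorla_wagner(trials)
    print(f"after acquisition: {strength[9]:.3f}, after extinction: {strength[-1]:.3f}")
```

With the default rate, the associative strength climbs to about 0.97 during acquisition and decays to about 0.03 during extinction; a device that reproduces this rise-and-decay in its conductance is exhibiting the first two phases of the behavior.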
Efforts can be made to make full use of novel modes that extend information processing from electrons to photons, quantum, and biological elements, exploiting their strengths and overcoming their weaknesses. a) Photonic systems can provide high-speed computing units, so efforts have focused on algorithm design that exploits their unique advantages. For instance, high throughput and precision have been achieved by applying cellular automata [146]. An ultrafast silicon photonic reservoir computing engine has been developed, paving the way for high-speed photonic computing [147]. For photonic computing to truly become a leading AI technology, a series of key challenges still need to be met, mainly in integration, dynamic reconfigurability, standardization, and cost. In particular, the compatibility of silicon-based photonic chips with existing CMOS processes needs to be improved, and because the hardware of photonic chips is relatively fixed, photonic chips need to be able to adapt dynamically to different tasks. b) Low power consumption and real-time requirements have promoted the application of CIM in many fields, such as intelligent sensors and the IoT. For example, progress has recently been made in cryogenic in-memory computing [148]. In the future, the computing ability of memory could be further enhanced by using new materials, such as two-dimensional materials and oxide semiconductors, and by optimizing circuit architectures. 3D packaging can also be applied to CIM to obtain systems with excellent overall performance. c) Additionally, cellular computing has emerged, which analyzes and models real cellular processes to implement computing with information-processing and adaptive capabilities.
Reprogrammable circuits have been explored to increase circuit flexibility and enable scalable, complex cell-based computing devices. It has been demonstrated that several circuits can be built from only a small set of engineered cells that can be externally reprogrammed to implement simple logic in response to specific inputs. In the future, more effort could be devoted to exploiting biological circuits to implement logic and meet numerous biological challenges.
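The CIM approach in item b) rests on performing a neural network's matrix-vector multiplications inside the memory array itself: input voltages drive the rows of a crossbar, each device contributes a current by Ohm's law, and Kirchhoff's current law sums those currents on each column. The sketch below models an idealized resistive crossbar under stated assumptions: signed weights are stored as differential conductance pairs, the devices have a hypothetical conductance range `[G_MIN, G_MAX]` with `LEVELS` programmable states, and wire resistance, device noise, and ADC effects are ignored. All names and values are illustrative.

```python
"""Idealized compute-in-memory matrix-vector multiply on a resistive crossbar.

Assumptions: signed weights are stored as differential conductance pairs,
devices have a hypothetical range [G_MIN, G_MAX] with LEVELS programmable
states, and wire resistance, noise and ADC effects are ignored.
"""
import numpy as np

G_MIN, G_MAX = 1e-6, 1e-4  # hypothetical conductance range (siemens)
LEVELS = 16                # hypothetical number of programmable states per device


def quantize(g: np.ndarray) -> np.ndarray:
    """Snap conductances to the nearest programmable state."""
    step = (G_MAX - G_MIN) / (LEVELS - 1)
    return G_MIN + np.round((g - G_MIN) / step) * step


def program(weights: np.ndarray):
    """Map signed weights to (G_plus, G_minus, scale) with G_plus - G_minus ~ scale * W."""
    peak = np.max(np.abs(weights))
    if peak == 0:
        raise ValueError("weights must not all be zero")
    scale = (G_MAX - G_MIN) / peak
    g_plus = G_MIN + scale * np.clip(weights, 0.0, None)
    g_minus = G_MIN + scale * np.clip(-weights, 0.0, None)
    return quantize(g_plus), quantize(g_minus), scale


def crossbar_mvm(g_plus: np.ndarray, g_minus: np.ndarray, volts: np.ndarray) -> np.ndarray:
    """Column currents: Ohm's law per device, Kirchhoff's current law per column."""
    return g_plus.T @ volts - g_minus.T @ volts


if __name__ == "__main__":
    w = np.array([[0.5, -1.0], [1.0, 0.25]])  # rows = inputs, columns = outputs
    x = np.array([0.1, 0.2])                   # input activations applied as voltages
    gp, gm, scale = program(w)
    print("analog:", crossbar_mvm(gp, gm, x) / scale, "ideal:", w.T @ x)
```

The differential pair lets the shared `G_MIN` offset cancel, so only the weight-dependent part of each conductance contributes to the output; the residual error comes from the limited number of conductance levels, which is one reason material and circuit improvements matter for CIM accuracy.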
Advanced chips suited to real-world applications remain in high demand. In open-world applications, advanced processors usually need to properly process multiple input signals carrying diverse external information, and the integrated signals from different inputs must be handled accurately and promptly. GPT-4 has successfully processed multimodal data, such as images and audio. A neuromorphic computing system for risk assessment that takes several kinds of factors into account has also been developed. Future work could focus more on developing algorithms and hardware tailored to open-world applications, so that the overall performance of chips meets the high requirements of real-world applications.
Reconfigurability is an important goal for computing hardware. The function of a reconfigurable chip can be changed even after fabrication, so it can handle multimodal data and different tasks with high flexibility. This is especially critical for chips designed for specific purposes such as healthcare monitoring, where the relative intensity of weight updates from each input must be finely reconfigured. Efforts have been made to give different types of chips strong reconfigurability. A TDONN chip has achieved reconfigurability and multimodal capability by exploiting on-chip diffractive optics with massive numbers of tunable elements. The diffractive-interference hybrid photonic chiplet, which acts as the fundamental building block for a diversity of advanced ML tasks, including 1000-category classification and content generation, is also reconfigurable. An all-analog chip combining electronic and light computing (ACCEL) can likewise be reconfigured for different tasks without changing its OAC module. A neuromorphic computing system has integrated two different types of information with reconfigurable weights. In the future, the high adaptability to different tasks enabled by reconfiguration is expected to become available to more chips where necessary.
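The essence of weight reconfigurability, changing what a fixed datapath computes only by reprogramming its stored weights, can be shown with a toy integrator of two inputs. This is a hypothetical sketch, not the system of [39]: the inputs (for instance, a normalized gas concentration and temperature), the weights, and the class name `ReconfigurableIntegrator` are illustrative assumptions.

```python
"""A fixed 'datapath' whose task is changed only by reprogramming its weights.

Hypothetical example: two normalized sensor readings are integrated into a
score in [0, 1]. Input names and weight values are illustrative assumptions.
"""
import math
from typing import Sequence


class ReconfigurableIntegrator:
    def __init__(self, n_inputs: int) -> None:
        self.weights = [0.0] * n_inputs
        self.bias = 0.0

    def configure(self, weights: Sequence[float], bias: float) -> None:
        """Reprogram stored weights after 'fabrication'; the datapath is unchanged."""
        if len(weights) != len(self.weights):
            raise ValueError("expected %d weights" % len(self.weights))
        self.weights, self.bias = list(weights), bias

    def __call__(self, inputs: Sequence[float]) -> float:
        s = self.bias + sum(w * x for w, x in zip(self.weights, inputs))
        return 1.0 / (1.0 + math.exp(-s))  # logistic squashing to [0, 1]


if __name__ == "__main__":
    unit = ReconfigurableIntegrator(n_inputs=2)
    reading = [0.9, 0.1]  # e.g. (gas concentration, temperature), both normalized
    unit.configure([4.0, 1.0], bias=-2.0)   # task A: emphasize the first input
    print(f"task A score: {unit(reading):.2f}")
    unit.configure([1.0, 4.0], bias=-2.0)   # task B: emphasize the second input
    print(f"task B score: {unit(reading):.2f}")
```

The same reading yields a high score under one configuration and a low score under the other, although nothing but the stored weights has changed; in hardware, these weights would correspond to programmable device states rather than Python floats.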
Further exploration of large-scale integration is expected for chips. As the volume of information grows, chips must be integrated to ever-higher levels to process the booming signals. Large-scale integration of various chips is indispensable for overcoming the shortcomings of each individual chip. Inorganic counterparts such as CMOS chips have reached ultra-large-scale integration but have poor mechanical compatibility with organisms. Ideally, devices would overcome such inherent shortcomings while achieving large-scale integration. Moreover, integration is closely tied to fabrication technology, and a diversity of techniques, such as photolithography, screen printing and other printing methods, and shadow-mask evaporation, has been developed. In the future, these techniques are expected to continue to progress so that the devices can be further miniaturized.
The use of sustainable materials in AI chips is one of the most important trends in this field, with the aims of reducing environmental impact and improving energy efficiency. Efforts can be made from various aspects, such as selecting degradable substrates, developing environmentally friendly manufacturing processes, and preparing environmentally friendly heat dissipation materials. Some bio-elastomers with actively controllable degradation rates have been designed, which can serve as bioelectronic substrates and encapsulation layers. In the future, more effort is needed to strike a balance between meeting the high-performance requirements of AI chips and controlling costs when sustainable materials are used.
Acknowledgements
This work was supported by the Hong Kong Polytechnic University (1-WZ1Y, 1-W34U, 4-YWER).
Author Contributions
Ying Cao helped in methodology, investigation, writing – original draft. Yuejiao Chen and Xi Fan contributed to methodology. Hong Fu helped in resources, methodology, writing – review & editing. Bingang Xu was involved in conceptualization, funding acquisition, methodology, supervision, writing – review & editing.
Declarations
Conflict of interest
The authors declare no conflict of interest. They have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Ambrogio, S; Narayanan, P; Okazaki, A; Fasoli, A; Mackin, C et al. An analog-AI chip for energy-efficient speech recognition and transcription. Nature; 2023; 620.
2. Gao, L; Lin, J; Wang, L et al. Machine learning-assisted design of advanced polymeric materials. Acc. Mater. Res.; 2024; 5.
3. Benavides-Hernández, J; Dumeignil, F. From characterization to discovery: artificial intelligence, machine learning and high-throughput experiments for heterogeneous catalyst design. ACS Catal.; 2024; 14.
4. Fu, Y; Howard, A; Zeng, C; Chen, Y; Gao, P et al. Physics-guided continual learning for predicting emerging aqueous organic redox flow battery material performance. ACS Energy Lett.; 2024; 9.
5. Akbari, P; Zamani, M; Mostafaei, A. Machine learning prediction of mechanical properties in metal additive manufacturing. Addit. Manuf.; 2024; 91, [DOI: https://dx.doi.org/10.1016/j.addma.2024.104320] 104320.
6. Liu, Q; Chen, W; Yakubov, V; Kruzic, JJ; Wang, CH et al. Interpretable machine learning approach for exploring process-structure-property relationships in metal additive manufacturing. Addit. Manuf.; 2024; 85, [DOI: https://dx.doi.org/10.1016/j.addma.2024.104187] 104187.
7. Li, J; Zhou, M; Wu, H-H; Wang, L; Zhang, J et al. Machine learning-assisted property prediction of solid-state electrolyte. Adv. Energy Mater.; 2024; 14.
8. Zhou, X; Xu, C; Guo, X; Apostol, P; Vlad, A et al. Computational and machine learning-assisted discovery and experimental validation of conjugated sulfonamide cathodes for lithium-ion batteries. Adv. Energy Mater.; 2024; 15, 2401658. [DOI: https://dx.doi.org/10.1002/aenm.202401658]
9. Alibagheri, E; Ranjbar, A; Khazaei, M; Kühne, TD; Vaez Allaei, SM. Remarkable optoelectronic characteristics of synthesizable square-octagon haeckelite structures: machine learning materials discovery. Adv. Funct. Mater.; 2024; 34.
10. Jing, T; Xu, B; Yang, Y; Li, M; Gao, Y. Organogel electrode enables highly transparent and stretchable triboelectric nanogenerators of high power density for robust and reliable energy harvesting. Nano Energy; 2020; 78, [DOI: https://dx.doi.org/10.1016/j.nanoen.2020.105373] 105373.
11. Liu, Y; Xie, B; Hu, Q; Zhao, R; Zheng, Q et al. Regulating the Helmholtz plane by trace polarity additive for long-life Zn ion batteries. Energy Storage Mater.; 2024; 66, [DOI: https://dx.doi.org/10.1016/j.ensm.2024.103202] 103202.
12. Wen, J; Xu, B; Zhou, J. Toward flexible and wearable embroidered supercapacitors from cobalt phosphides-decorated conductive fibers. Nano-Micro Lett.; 2019; 11.
13. LeCun, Y; Bengio, Y; Hinton, G. Deep learning. Nature; 2015; 521.
14. Dahl, GE; Yu, D; Deng, L; Acero, A. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process.; 2012; 20.
15. Hsu, W-N; Bolte, B; Tsai, YH; Lakhotia, K; Salakhutdinov, R et al. HuBERT: self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM Trans. Audio Speech Lang. Process.; 2021; 29, pp. 3451-3460. [DOI: https://dx.doi.org/10.1109/TASLP.2021.3122291]
16. Wu, J; Guo, Y; Deng, C; Zhang, A; Qiao, H et al. An integrated imaging sensor for aberration-corrected 3D photography. Nature; 2022; 612.
17. Suleiman, A; Zhang, Z; Carlone, L; Karaman, S; Sze, V. Navion: a 2-mW fully integrated real-time visual-inertial odometry accelerator for autonomous navigation of nano drones. IEEE J. Solid State Circuits; 2019; 54.
18. Bai, J; Lian, S; Liu, Z; Wang, K; Liu, D. Smart guiding glasses for visually impaired people in indoor environment. IEEE Trans. Consum. Electron.; 2017; 63.
19. Starner, T. Project glass: an extension of the self. IEEE Pervasive Comput.; 2013; 12.
20. Si, J; Zhang, P; Zhao, C; Lin, D; Xu, L et al. A carbon-nanotube-based tensor processing unit. Nat. Electron.; 2024; 7.
21. Lin, X; Rivenson, Y; Yardimci, NT; Veli, M; Luo, Y et al. All-optical machine learning using diffractive deep neural networks. Science; 2018; 361.
22. Xu, X; Tan, M; Corcoran, B; Wu, J; Boes, A et al. 11 TOPS photonic convolutional accelerator for optical neural networks. Nature; 2021; 589.
23. Wetzstein, G; Ozcan, A; Gigan, S; Fan, S; Englund, D et al. Inference in artificial intelligence with deep optics and photonics. Nature; 2020; 588.
24. Feldmann, J; Youngblood, N; Karpov, M; Gehring, H; Li, X et al. Parallel convolutional processing using an integrated photonic tensor core. Nature; 2021; 589.
25. Ashtiani, F; Geers, AJ; Aflatouni, F. An on-chip photonic deep neural network for image classification. Nature; 2022; 606.
26. Zangeneh-Nejad, F; Sounas, DL; Alù, A; Fleury, R. Analogue computing with metamaterials. Nat. Rev. Mater.; 2021; 6.
27. Chen, Y; Nazhamaiti, M; Xu, H; Meng, Y; Zhou, T et al. All-analog photoelectronic chip for high-speed vision tasks. Nature; 2023; 623.
28. Genty, G; Salmela, L; Dudley, JM; Brunner, D; Kokhanovskiy, A et al. Machine learning and applications in ultrafast photonics. Nat. Photonics; 2021; 15.
29. Molesky, S; Lin, Z; Piggott, AY; Jin, W; Vučković, J et al. Inverse design in nanophotonics. Nat. Photonics; 2018; 12.
30. Palmieri, AM; Kovlakov, E; Bianchi, F; Yudin, D; Straupe, S et al. Experimental neural network enhanced quantum tomography. NPJ Quantum Inf.; 2020; 6, 20. [DOI: https://dx.doi.org/10.1038/s41534-020-0248-6]
31. Peurifoy, J; Shen, Y; Jing, L; Yang, Y; Cano-Renteria, F et al. Nanophotonic particle simulation and inverse design using artificial neural networks. Sci. Adv.; 2018; 4.
32. Hughes, TW; Minkov, M; Williamson, IAD; Fan, S. Adjoint method and inverse design for nonlinear nanophotonic devices. ACS Photonics; 2018; 5.
33. Piggott, AY; Lu, J; Lagoudakis, KG; Petykiewicz, J; Babinec, TM et al. Inverse design and demonstration of a compact and broadband on-chip wavelength demultiplexer. Nat. Photonics; 2015; 9.
34. Yang, Z; Yue, W; Liu, C; Tao, Y; Tiw, PJ et al. Fully hardware memristive neuromorphic computing enabled by the integration of trainable dendritic neurons and high-density RRAM chip. Adv. Funct. Mater.; 2024; 34.
35. Xue, Z; Zhou, T; Xu, Z; Yu, S; Dai, Q et al. Fully forward mode training for optical neural networks. Nature; 2024; 632.
36. Wang, W; Wang, Y; Yin, F; Niu, H; Shin, Y-K et al. Tailoring classical conditioning behavior in TiO2 nanowires: ZnO QDs-based optoelectronic memristors for neuromorphic hardware. Nano-Micro Lett.; 2024; 16.
37. Yao, M; Richter, O; Zhao, G; Qiao, N; Xing, Y et al. Spike-based dynamic computing with asynchronous sensing-computing neuromorphic chip. Nat. Commun.; 2024; 15.
38. Xu, Z; Zhou, T; Ma, M; Deng, C; Dai, Q et al. Large-scale photonic chiplet Taichi empowers 160-TOPS/W artificial general intelligence. Science; 2024; 384.
39. Choi, YJ; Roe, DG; Li, Z; Choi, YY; Lim, B et al. Weight-reconfigurable neuromorphic computing systems for analog signal integration. Adv. Funct. Mater.; 2024; 34.
40. Lu, D; Chen, Y; Lu, Z; Ma, L; Tao, Q et al. Monolithic three-dimensional tier-by-tier integration via van der Waals lamination. Nature; 2024; 630.
41. Cheng, J; Huang, C; Zhang, J; Wu, B; Zhang, W et al. Multimodal deep learning using on-chip diffractive optics with in situ training capability. Nat. Commun.; 2024; 15.
42. Bu, Y; Xu, T; Geng, S; Fan, S; Li, Q et al. Ferroelectrics-electret synergetic organic artificial synapses with single-polarity driven dynamic reconfigurable modulation. Adv. Funct. Mater.; 2023; 33.
43. Dong, B; Brückerhoff-Plückelmann, F; Meyer, L; Dijkstra, J; Bente, I et al. Partial coherence enhances parallelized photonic computing. Nature; 2024; 632.
44. Indiveri, G; Douglas, R. Neuromorphic vision sensors. Science; 2000; 288.
45. Lichtsteiner, P; Posch, C; Delbruck, T. A 128 × 128 120 dB 15 μs latency asynchronous temporal contrast vision sensor. IEEE J. Solid State Circuits; 2008; 43.
46. Gallego, G; Delbruck, T; Orchard, G; Bartolozzi, C; Taba, B et al. Event-based vision: a survey. IEEE Trans. Pattern Anal. Mach. Intell.; 2022; 44.
47. van de Burgt, Y; Melianas, A; Keene, ST; Malliaras, G; Salleo, A. Organic electronics for neuromorphic computing. Nat. Electron.; 2018; 1.
48. Beck, ME; Shylendra, A; Sangwan, VK; Guo, S; Gaviria Rojas, WA et al. Spiking neurons from tunable Gaussian heterojunction transistors. Nat. Commun.; 2020; 11.
49. Go, G-T; Lee, Y; Seo, D-G; Lee, T-W. Organic neuroelectronics: from neural interfaces to neuroprosthetics. Adv. Mater.; 2022; 34.
50. Qian, C; Choi, Y; Kim, S; Kim, S; Choi, YJ et al. Risk-perceptional and feedback-controlled response system based on NO2-detecting artificial sensory synapse. Adv. Funct. Mater.; 2022; 32.
51. Roy, K; Jaiswal, A; Panda, P. Towards spike-based machine intelligence with neuromorphic computing. Nature; 2019; 575.
52. Mehonic, A; Kenyon, AJ. Brain-inspired computing needs a master plan. Nature; 2022; 604.
53. Maunsell, JHR. Neuronal mechanisms of visual attention. Annu. Rev. Vis. Sci.; 2015; 1, pp. 373-391. [DOI: https://dx.doi.org/10.1146/annurev-vision-082114-035431]
54. Liu, Q; Gao, S; Xu, L; Yue, W; Zhang, C et al. Nanostructured perovskites for nonvolatile memory devices. Chem. Soc. Rev.; 2022; 51.
55. Chen, M; Sun, M; Bao, H; Hu, Y; Bao, B. Flux–charge analysis of two-memristor-based Chua’s circuit: dimensionality decreasing model for detecting extreme multistability. IEEE Trans. Ind. Electron.; 2019; 67.
56. Fei, N; Lu, Z; Gao, Y; Yang, G; Huo, Y et al. Towards artificial general intelligence via a multimodal foundation model. Nat. Commun.; 2022; 13.
57. Zhou, H; Dong, J; Cheng, J; Dong, W; Huang, C et al. Photonic matrix multiplication lights up photonic accelerator and beyond. Light Sci. Appl.; 2022; 11.
58. Shastri, BJ; Tait, AN; Ferreira de Lima, T; Pernice, WHP; Bhaskaran, H et al. Photonics for artificial intelligence and neuromorphic computing. Nat. Photonics; 2021; 15.
59. Yang, KY; Shirpurkar, C; White, AD; Zang, J; Chang, L et al. Multi-dimensional data transmission using inverse-designed silicon photonics and microcombs. Nat. Commun.; 2022; 13.
60. Wang, Z; Joshi, S; Savel’ev, S; Song, W; Midya, R et al. Fully memristive neural networks for pattern classification with unsupervised learning. Nat. Electron.; 2018; 1.
61. Peterson, E; Lavin, A. Physical computing for materials acceleration platforms. Matter; 2022; 5.
62. Hippalgaonkar, K; Li, Q; Wang, X; Fisher, JW, III; Kirkpatrick, J et al. Knowledge-integrated machine learning for materials: lessons from gameplaying and robotics. Nat. Rev. Mater.; 2023; 8.
63. Huang, H; Zheng, O; Wang, D; Yin, J; Wang, Z et al. ChatGPT for shaping the future of dentistry: the potential of multi-modal large language model. Int. J. Oral Sci.; 2023; 15.
64. Meskó, B. The impact of multimodal large language models on health care’s future. J. Med. Internet Res.; 2023; 25, [DOI: https://dx.doi.org/10.2196/52865] e52865.
65. Moor, M; Banerjee, O; Abad, ZSH; Krumholz, HM; Leskovec, J et al. Foundation models for generalist medical artificial intelligence. Nature; 2023; 616.
66. Wang, X; Chen, G; Qian, G; Gao, P; Wei, X-Y et al. Large-scale multi-modal pre-trained models: a comprehensive survey. Mach. Intell. Res.; 2023; 20.
67. Zidan, MA; Strachan, JP; Lu, WD. The future of electronics based on memristive systems. Nat. Electron.; 2018; 1.
68. Yan, B; Yang, Y; Huang, R. Memristive dynamics enabled neuromorphic computing systems. Sci. China Inf. Sci.; 2023; 66.
69. Sebastian, A; Le Gallo, M; Khaddam-Aljameh, R; Eleftheriou, E. Memory devices and applications for in-memory computing. Nat. Nanotechnol.; 2020; 15.
70. Ambrogio, S; Narayanan, P; Tsai, H; Shelby, RM; Boybat, I et al. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature; 2018; 558.
71. Khaddam-Aljameh, R; Stanisavljevic, M; Fornt Mas, J; Karunaratne, G; Brandli, M et al. HERMES-core: a 1.59-TOPS/mm2 PCM on 14-nm CMOS in-memory compute core using 300-ps/LSB linearized CCO-based ADCs. IEEE J. Solid State Circuits; 2022; 57.
72. Yao, P; Wu, H; Gao, B; Tang, J; Zhang, Q et al. Fully hardware-implemented memristor convolutional neural network. Nature; 2020; 577.
73. Wan, W; Kubendran, R; Schaefer, C; Eryilmaz, SB; Zhang, W et al. A compute-in-memory chip based on resistive random-access memory. Nature; 2022; 608.
74. Jain, S; Tsai, H; Chen, C-T; Muralidhar, R; Boybat, I et al. A heterogeneous and programmable compute-In-memory accelerator architecture for analog-AI using dense 2-D mesh. IEEE Trans. VLSI Syst.; 2023; 31.
75. Krestinskaya, O; Salama, KN; James, AP. Learning in memristive neural network architectures using analog backpropagation circuits. IEEE Trans. Circuits Syst. I Regul. Pap.; 2019; 66.
76. Giordano, M; Cristiano, G; Ishibashi, K; Ambrogio, S; Tsai, H et al. Analog-to-digital conversion with reconfigurable function mapping for neural networks activation function acceleration. IEEE J. Emerg. Sel. Top. Circuits Syst.; 2019; 9.
77. Hochstetter, J; Zhu, R; Loeffler, A; Diaz-Alvarez, A; Nakayama, T et al. Avalanches and edge-of-chaos learning in neuromorphic nanowire networks. Nat. Commun.; 2021; 12.
78. Schultz, W; Dickinson, A. Neuronal coding of prediction errors. Annu. Rev. Neurosci.; 2000; 23, pp. 473-500. [DOI: https://dx.doi.org/10.1146/annurev.neuro.23.1.473]
79. Poldrack, RA; Clark, J; Paré-Blagoev, EJ; Shohamy, D; Creso Moyano, J et al. Interactive memory systems in the human brain. Nature; 2001; 414.
80. Wang, Z; Li, C; Song, W; Rao, M; Belkin, D et al. Reinforcement learning with analogue memristor arrays. Nat. Electron.; 2019; 2.
81. Baek, JH; Kwak, KJ; Kim, SJ; Kim, J; Kim, JY et al. Two-terminal lithium-mediated artificial synapses with enhanced weight modulation for feasible hardware neural networks. Nano-Micro Lett.; 2023; 15.
82. He, K; Liu, Y; Yu, J; Guo, X; Wang, M et al. Artificial neural pathway based on a memristor synapse for optically mediated motion learning. ACS Nano; 2022; 16.
83. Zhou, T; Lin, X; Wu, J; Chen, Y; Xie, H et al. Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit. Nat. Photonics; 2021; 15.
84. Fu, T; Zang, Y; Huang, Y; Du, Z; Huang, H et al. Photonic machine learning with on-chip diffractive optics. Nat. Commun.; 2023; 14.
85. Goi, E; Chen, X; Zhang, Q; Cumming, BP; Schoenhardt, S et al. Nanoprinted high-neuron-density optical linear perceptrons performing near-infrared inference on a CMOS chip. Light Sci. Appl.; 2021; 10.
86. Zhang, H; Gu, M; Jiang, XD; Thompson, J; Cai, H et al. An optical neural chip for implementing complex-valued neural network. Nat. Commun.; 2021; 12, 457. [DOI: https://dx.doi.org/10.1038/s41467-020-20719-7]
87. Wang, T; Ma, S-Y; Wright, LG; Onodera, T; Richard, BC et al. An optical neural network using less than 1 photon per multiplication. Nat. Commun.; 2022; 13.
88. Wang, Z; Hu, G; Wang, X; Ding, X; Zhang, K et al. Single-layer spatial analog meta-processor for imaging processing. Nat. Commun.; 2022; 13.
89. Li, J; Mengu, D; Yardimci, NT; Luo, Y; Li, X et al. Spectrally encoded single-pixel machine vision using diffractive networks. Sci. Adv.; 2021; 7.
90. Rahman, MSS; Li, J; Mengu, D; Rivenson, Y; Ozcan, A. Ensemble learning of diffractive optical networks. Light Sci. Appl.; 2021; 10.
91. Feldmann, J; Youngblood, N; Wright, CD; Bhaskaran, H; Pernice, WP. All-optical spiking neurosynaptic networks with self-learning capabilities. Nature; 2019; 569.
92. Shi, W; Huang, Z; Huang, H; Hu, C; Chen, M et al. LOEN: lensless opto-electronic neural network empowered machine vision. Light Sci. Appl.; 2022; 11.
93. Chang, J; Sitzmann, V; Dun, X; Heidrich, W; Wetzstein, G. Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification. Sci. Rep.; 2018; 8.
94. Bueno, J; Maktoobi, S; Froehly, L; Fischer, I; Jacquot, M et al. Reinforcement learning in a large-scale photonic recurrent neural network. Optica; 2018; 5.
95. Silva, A; Monticone, F; Castaldi, G; Galdi, V; Alù, A et al. Performing mathematical operations with metamaterials. Science; 2014; 343.
96. Shen, Y; Harris, NC; Skirlo, S; Prabhu, M; Baehr-Jones, T et al. Deep learning with coherent nanophotonic circuits. Nat. Photonics; 2017; 11.
97. Yuan, X; Wang, Y; Xu, Z; Zhou, T; Fang, L. Training large-scale optoelectronic neural networks with dual-neuron optical-artificial learning. Nat. Commun.; 2023; 14.
98. Zhou, T; Wu, W; Zhang, J; Yu, S; Fang, L. Ultrafast dynamic machine vision with spatiotemporal photonic computing. Sci. Adv.; 2023; 9,
99. Xu, Z; Yuan, X; Zhou, T; Fang, L. A multichannel optical computing architecture for advanced machine vision. Light Sci. Appl.; 2022; 11,
100. Neyens, S; Zietz, OK; Watson, TF; Luthi, F; Nethwewala, A et al. Probing single electrons across 300-mm spin qubit wafers. Nature; 2024; 629,
101. Wecker, D; Bauer, B; Clark, BK; Hastings, MB; Troyer, M. Gate-count estimates for performing quantum chemistry on small quantum computers. Phys. Rev. A; 2014; 90,
102. Brauns, M; Amitonov, SV; Spruijtenburg, P-C; Zwanenburg, FA. Palladium gates for reproducible quantum dots in silicon. Sci. Rep.; 2018; 8,
103. Dodson, JP; Holman, N; Thorgrimsson, B; Neyens, SF; MacQuarrie, ER et al. Fabrication process and failure analysis for robust quantum dots in silicon. Nanotechnology; 2020; 31,
104. Shulaker, MM; Hills, G; Park, RS; Howe, RT; Saraswat, K et al. Three-dimensional integration of nanotechnologies for computing and data storage on a single chip. Nature; 2017; 547,
105. Zhu, K; Pazos, S; Aguirre, F; Shen, Y; Yuan, Y et al. Hybrid 2D-CMOS microchips for memristive applications. Nature; 2023; 618,
106. Kim, S; Seo, J; Choi, J; Yoo, H. Vertically integrated electronics: new opportunities from emerging materials and devices. Nano-Micro Lett.; 2022; 14,
107. Jiang, J; Parto, K; Cao, W; Banerjee, K. Ultimate monolithic-3D integration with 2D materials: rationale, prospects, and challenges. IEEE J. Electron Devices Soc.; 2019; 7, pp. 878-887. [DOI: https://dx.doi.org/10.1109/JEDS.2019.2925150]
108. Tong, L; Wan, J; Xiao, K; Liu, J; Ma, J et al. Heterogeneous complementary field-effect transistors based on silicon and molybdenum disulfide. Nat. Electron.; 2022; 6,
109. Meng, W; Xu, F; Yu, Z; Tao, T; Shao, L et al. Three-dimensional monolithic micro-LED display driven by atomically thin transistor matrix. Nat. Nanotechnol.; 2021; 16,
110. Kang, J-H; Shin, H; Kim, KS; Song, M-K; Lee, D et al. Monolithic 3D integration of 2D materials-based electronics towards ultimate edge computing solutions. Nat. Mater.; 2023; 22,
111. Jayachandran, D; Pendurthi, R; Sadaf, MUK; Sakib, NU; Pannone, A et al. Three-dimensional integration of two-dimensional field-effect transistors. Nature; 2024; 625,
112. Tang, J; Wang, Q; Wei, Z; Shen, C; Lu, X et al. Vertical integration of 2D building blocks for all-2D electronics. Adv. Electron. Mater.; 2020; 6,
113. Liu, Y; Huang, Y; Duan, X. Van der Waals integration before and beyond two-dimensional materials. Nature; 2019; 567,
114. Zhang, H; Zeng, H; Priimagi, A; Ikkala, O. Viewpoint: Pavlovian materials-functional biomimetics inspired by classical conditioning. Adv. Mater.; 2020; 32,
115. Jordan, MI; Mitchell, TM. Machine learning: trends, perspectives, and prospects. Science; 2015; 349,
116. Woods, D; Naughton, TJ. Photonic neural networks. Nat. Phys.; 2012; 8,
117. Miller, DAB. Device requirements for optical interconnects to silicon chips. Proc. IEEE; 2009; 97,
118. Miller, DAB. Attojoule optoelectronics for low-energy information processing and communications. J. Lightwave Technol.; 2017; 35,
119. Miller, DAB. Waves, modes, communications, and optics: a tutorial. Adv. Opt. Photon.; 2019; 11,
120. Wang, J; Yang, J-Y; Fazal, IM; Ahmed, N; Yan, Y et al. Terabit free-space data transmission employing orbital angular momentum multiplexing. Nat. Photonics; 2012; 6,
121. Richardson, DJ; Fini, JM; Nelson, LE. Space-division multiplexing in optical fibres. Nat. Photonics; 2013; 7,
122. Bozinovic, N; Yue, Y; Ren, Y; Tur, M; Kristensen, P et al. Terabit-scale orbital angular momentum mode division multiplexing in fibers. Science; 2013; 340,
123. Ryf, R; Randel, S; Gnauck, AH; Bolle, C; Sierra, A et al. Mode-division multiplexing over 96 km of few-mode fiber using coherent 6 × 6 MIMO processing. J. Lightwave Technol.; 2012; 30,
124. van Uden, RGH; Correa, RA; Lopez, EA; Huijskens, FM; Xia, C et al. Ultra-high-density spatial division multiplexing with a few-mode multicore fibre. Nat. Photonics; 2014; 8,
125. Kahn, JM; Miller, DAB. Communications expands its space. Nat. Photonics; 2017; 11,
126. Rademacher, G; Puttnam, BJ; Luís, RS; Eriksson, TA; Fontaine, NK et al. Peta-bit-per-second optical communications system using a standard cladding diameter 15-mode fiber. Nat. Commun.; 2021; 12,
127. Gabrielli, LH; Liu, D; Johnson, SG; Lipson, M. On-chip transformation optics for multimode waveguide bends. Nat. Commun.; 2012; 3, 1217. [DOI: https://dx.doi.org/10.1038/ncomms2232]
128. Luo, L-W; Ophir, N; Chen, CP; Gabrielli, LH; Poitras, CB et al. WDM-compatible mode-division multiplexing on a silicon chip. Nat. Commun.; 2014; 5, 3069. [DOI: https://dx.doi.org/10.1038/ncomms4069]
129. Miller, SA; Chang, Y-C; Phare, CT; Shin, MC; Zadka, M et al. Large-scale optical phased array using a low-power multi-pass silicon photonic platform. Optica; 2020; 7,
130. Frellsen, LF; Ding, Y; Sigmund, O; Frandsen, LH. Topology optimized mode multiplexing in silicon-on-insulator photonic wire waveguides. Opt. Express; 2016; 24,
131. Chang, W; Lu, L; Ren, X; Li, D; Pan, Z et al. Ultra-compact mode (de)multiplexer based on subwavelength asymmetric Y-junction. Opt. Express; 2018; 26,
132. Tong, Y; Zhou, W; Wu, X; Tsang, HK. Efficient mode multiplexer for few-mode fibers using integrated silicon-on-insulator waveguide grating coupler. IEEE J. Quantum Electron.; 2020; 56,
133. Hu, H; Da Ros, F; Pu, M; Ye, F; Ingerslev, K et al. Single-source chip-based frequency comb enabling extreme parallel data transmission. Nat. Photonics; 2018; 12,
134. Han, Y; Huang, G; Song, S; Yang, L; Wang, H et al. Dynamic neural networks: a survey. IEEE Trans. Pattern Anal. Mach. Intell.; 2021; 44,
135. Ang, JA; Mountain, DJ. New horizons for high-performance computing. Computer; 2022; 55,
136. Schrittwieser, J; Antonoglou, I; Hubert, T; Simonyan, K; Sifre, L et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature; 2020; 588,
137. Sludds, A; Bandyopadhyay, S; Chen, Z; Zhong, Z; Cochrane, J et al. Delocalized photonic deep learning on the Internet’s edge. Science; 2022; 378,
138. Wright, LG; Onodera, T; Stein, MM; Wang, T; Schachter, DT et al. Deep physical neural networks trained with backpropagation. Nature; 2022; 601,
139. Ma, W; Liu, Z; Kudyshev, ZA; Boltasseva, A; Cai, W et al. Deep learning for the design of photonic structures. Nat. Photonics; 2021; 15,
140. Mohammadi Estakhri, N; Edwards, B; Engheta, N. Inverse-designed metastructures that solve equations. Science; 2019; 363,
141. McNamara, A; Treuille, A; Popović, Z; Stam, J. Fluid control using the adjoint method. ACM Trans. Graph.; 2004; 23,
142. Wang, X; Xie, P; Chen, B; Zhang, X. Chip-based high-dimensional optical neural network. Nano-Micro Lett.; 2022; 14,
143. Moran, J; Desimone, R. Selective attention gates visual processing in the extrastriate cortex. Science; 1985; 229,
144. Moore, T; Zirnsak, M. Neural mechanisms of selective visual attention. Annu. Rev. Psychol.; 2017; 68, pp. 47-72. [DOI: https://dx.doi.org/10.1146/annurev-psych-122414-033400]
145. Duan, X; Cao, Z; Gao, K; Yan, W; Sun, S et al. Memristor-based neuromorphic chips. Adv. Mater.; 2024; 36,
146. Oguz, I; Yildirim, M; Hsieh, JL; Dinc, NU; Moser, C et al. Resource-efficient photonic networks for next-generation AI computing. Light Sci. Appl.; 2025; 14,
147. Wang, D; Nie, Y; Hu, G; Tsang, HK; Huang, C. Ultrafast silicon photonic reservoir computing engine delivering over 200 TOPS. Nat. Commun.; 2024; 15,
148. Günkel, T; Alcalà, J; Fernández, A; Barrera, A; Balcells, L et al. Field-induced phase transitions in cuprate superconductors for cryogenic in-memory computing. Small; 2025; 21,
149. Bai, B; Yang, Q; Shu, H; Chang, L; Yang, F et al. Microcomb-based integrated photonic processing unit. Nat. Commun.; 2023; 14, 66. [DOI: https://dx.doi.org/10.1038/s41467-022-35506-9]
150. Wan, S; Qu, K; Shi, Y; Li, Z; Wang, Z et al. Multidimensional encryption by chip-integrated metasurfaces. ACS Nano; 2024; 18,
151. Guo, S-A; Wu, Y-K; Ye, J; Zhang, L; Lian, W-Q et al. A site-resolved two-dimensional quantum simulator with hundreds of trapped ions. Nature; 2024; 630,
152. Wang, C; Chen, Z; Chan, CLJ; Wan, Z-A; Ye, W et al. Biomimetic olfactory chips based on large-scale monolithically integrated nanotube sensor arrays. Nat. Electron.; 2024; 7,
153. Kumar, D; Li, H; Kumbhar, DD; Rajbhar, MK; Das, UK et al. Highly efficient back-end-of-line compatible flexible Si-based optical memristive crossbar array for edge neuromorphic physiological signal processing and bionic machine vision. Nano-Micro Lett.; 2024; 16,
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.