Abstract
The rise of the Internet of Things, autonomous navigation systems, and wearable devices has created a growing need for ultra-compact, low-power, low-latency vision sensors that bridge the physical and digital worlds. Vision sensors capture vast amounts of data that require swift processing for semantic scene understanding. However, most computer vision algorithms suffer from large power consumption and latency, necessitating the sacrifice of spatial resolution. Optical systems can potentially address these issues with large parallelism and spatial bandwidth for visual data processing. In particular, free-space optical systems (encoders) can be easily adapted to conventional imaging systems. This paper details the current state of free-space optical encoders and discusses future opportunities for innovation. We also provide insights, based on empirical evidence, into where optical advantages can be achieved for computer vision tasks.
Introduction
The unprecedented development of artificial intelligence (AI), primarily driven by deep artificial neural networks (ANNs), has transformed versatile computer vision functionalities such as depth sensing1, object detection2, and image classification3. However, most computer vision algorithms incur a large computational burden, making them power hungry and slow and presenting a significant roadblock for real-time, resource-constrained applications. Although digital hardware performance is improving exponentially every year with significant advancements in digital processing units (e.g., GPUs) and algorithmic efficiency, the demand for computing power in AI far exceeds the current pace of progress4,5.
To address power and latency issues, conventional approaches dramatically sacrifice the spatial resolution (i.e., the number of captured pixels) of the image. However, many applications, such as pose estimation and image segmentation, require high-resolution images for accurate predictions6. One promising solution is to perform tailored image processing directly in the optical domain and thus balance the computational burden between optics and digital hardware/software. This can significantly reduce the burden on digital hardware, as optical image processing can be done without additional power consumption at the speed of light. In addition, optical systems can potentially exploit multiple modalities of light, including wavelength and polarization, to extract more information from the scene7, which is beyond the capabilities of conventional cameras. Furthermore, the large information bandwidth and the inherent parallelism of optical systems can provide promising advantages for ANNs8. Especially in the field of computer vision, where the physical input is inherently in the optical domain, it is preferable to insert an optical processor without changing the whole system configuration.
To date, various optical systems have been employed for optical image processing; they can be broadly categorized into two groups: photonic integrated circuits (PICs) and free-space optics (FSOs). Many researchers have used PICs consisting of on-chip lasers, waveguides, modulators, and interferometers to implement optical image processing due to their robust and programmable nature9–13. However, PICs require free-space-to-waveguide coupling (via edge or grating couplers), and these couplers are placed on the edges of the chip, so the allowable input is only one-dimensional (scaling with the chip length L)14. As images are two-dimensional (scaling with L²), an additional serialization step is necessary to convert the two-dimensional scene into a one-dimensional input (Fig. 1a). In contrast, FSOs such as spatial light modulators (SLMs) have a two-dimensional input channel (Fig. 1b). This fundamental dimensional advantage of FSOs over PICs makes FSOs more suitable for handling high-resolution images with millions of pixels15. In addition, FSOs can operate at low photon levels as they do not need any additional couplers8,16. A summarized comparison between PICs and FSOs is given in Table 1.
Fig. 1 Dimensionality and connectivity discrepancies between two optical platforms. [Images not available. See PDF.]
a Photonic integrated circuits feature a one-dimensional input channel, necessitating additional serialization of two-dimensional input images and limiting nearest-channel connectivity. b Free-space optics utilize a two-dimensional input channel, allowing direct processing of input images without altering their dimensionality. Different colors indicate distinct weights. Abbreviations: RR Ring Resonator, MZI Mach-Zehnder Interferometer.
Table 1. Pros and cons of PICs and FSOs for computer vision
| Criteria | PICs | FSOs |
|---|---|---|
| Active functionality (reconfigurability and nonlinearity) | Easier to achieve due to long travel path and resonant structures | Difficult to achieve due to very short interaction length |
| Compactness | On-chip | Bulky, but semiconductor/meta-optics can help |
| Alignment tolerance | High, as monolithically integrated | Poor, but semiconductor/meta-optics can help |
| Scalability | Input scales with boundary (~L) | Input scales with area (~L²) |
| Compatibility with incoherent light | Very poor | Mostly works with incoherent light |
In addition, FSOs offer the benefit of long-range connectivity, which can simplify the physical system. For example, in PICs, coupling input channels 2 and 101 would require at least 99 layers of switches, as each switching layer can only couple nearest neighbors (unless it has unique diffractive units17). Conversely, FSOs can couple far-apart pixels even with a single optic, just by engineering the point-spread function (PSF), as the toy example below illustrates. In fact, any arbitrary vector-matrix multiplication can be performed by stacking only three optics18. Furthermore, FSOs generally possess much larger spectral bandwidths than PICs, as most on-chip switches and grating couplers are wavelength sensitive. This makes FSOs particularly promising for interfacing with incoherent light and natural scenes.
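As a toy illustration of this long-range coupling, the following NumPy sketch (with illustrative sizes and weights, not taken from any cited system) shows a single engineered PSF coupling input channels 2 and 101 in one pass:

```python
# Long-range connectivity via PSF engineering: a PSF with two delta peaks
# 99 pixels apart couples distant input channels in a single convolution,
# whereas nearest-neighbor couplers would need ~99 cascaded stages.
import numpy as np

n = 256
x = np.zeros(n)
x[2] = 1.0                       # "channel 2" carries a signal

psf = np.zeros(n)
psf[0] = 0.5                     # direct path
psf[99] = 0.5                    # engineered sidelobe shifted by 99 pixels

# One pass through free space with this PSF = one circular convolution.
y = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(psf)))
print(np.flatnonzero(y > 0.1))   # -> [  2 101]: channels 2 and 101 coupled
```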
These advantages of FSOs, i.e., input channel dimension, long-range connectivity, and incoherent light compatibility, lead to significant practical benefits in computer vision implementations that operate directly on natural scenes. An FSO encoder can be directly inserted into conventional image sensors with minimal changes to their physical configuration, performing preliminary image processing before the digital computer.
In particular, in deep ANNs, the first few layers perform convolutional operations to extract feature maps from the raw input images19. At the same time, all FSO imaging systems inherently perform convolutions, allowing them to readily replace the convolutional layers in deep ANNs. This makes the configuration of an optical encoder, "an optical front-end and a digital back-end," particularly effective for deep ANNs, where high-dimensional information is initially extracted into low-dimensional features. These features can then be further processed at the digital back-end for tasks such as image classification, object detection, or semantic segmentation. For the rest of this paper, the term FSO encoder refers simultaneously to an encoder, an optical front-end of the network, and an optical co-processor.
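To make this configuration concrete, here is a minimal PyTorch sketch of a hybrid network whose first convolutional layer is frozen, standing in for fabricated optics; the architecture, kernel sizes, and dimensions are our own illustrative assumptions:

```python
import torch
import torch.nn as nn

class HybridOpticalDigitalNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # "Optical front-end": a convolution whose kernels stand in for the
        # PSFs realized in optics; frozen, since fabricated optics cannot
        # be retrained after the fact.
        self.optical = nn.Conv2d(1, 8, kernel_size=5, padding=2, bias=False)
        for p in self.optical.parameters():
            p.requires_grad = False
        # Digital back-end: trainable nonlinearity + classifier.
        self.backend = nn.Sequential(
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(7),
            nn.Flatten(),
            nn.Linear(8 * 7 * 7, num_classes),
        )

    def forward(self, x):
        features = self.optical(x)  # done "for free" in the real system
        return self.backend(features)

model = HybridOpticalDigitalNet()
out = model(torch.rand(4, 1, 28, 28))  # a batch of MNIST-sized images
print(out.shape)                       # torch.Size([4, 10])
```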
Current state of the free-space optical encoder
First, we will discuss how the size and complexity of the FSO encoder can be dramatically reduced using emerging nanophotonic structures. For a long time, all FSO encoders were implemented in discrete macroscopic optics (e.g., lenses, mirrors, polarizers) which have limited functionalities and result in inevitably large and complex systems. These optical encoders widely utilize SLMs and 4f systems. However, their increased size and weight pose a serious limitation for practical usage. They are also more prone to misalignment as they consist of many optical elements. Recently, metasurfaces that manipulate light with subwavelength structures have shown promising results for building more compact and multifunctional systems because of their large degrees of freedom.
Then we will describe physical nonlinear activation. The lack of photon-photon interaction that allows for large parallelism in optical encoders also results in a lack of nonlinear activation, which is crucial for most deep ANNs. To address optical nonlinearity, researchers often utilize either engineered light-matter interactions (which are fundamentally nonlinear)20 or multiple acquisitions of scattering21. Another important aspect of FSO encoders is their reconfigurability. For bulk systems, SLMs are already reconfigurable. Metasurfaces, however, are generally static after fabrication; this lack of tunability is one of their most critical challenges. We will describe several ongoing efforts to achieve reconfigurable metasurfaces for various functional optical encoders in this paper.
Without relying on optics, several works address the challenges of both nonlinearity and reconfigurability on the digital back-end, which we term the hybrid optical/digital configuration. A static optical front-end performs linear operations (inference only) and relies on the digital back-end for the rest of the processing (including nonlinear, reconfigurable operations and error correction).
Figure 2 classifies the FSO encoders for computer vision tasks according to three criteria: (1) the contribution of digital computation, (2) adaptability to the conventional optical system, and (3) the encoder design framework. Regarding the optical/digital configuration, some encoders require only a simple operation (e.g., subtraction) in the digital back-end, while others necessitate much more complex processing (e.g., neural networks). There are also systems that do not require any digital back-end operation at all, often referred to as all-optical systems. In terms of imaging configuration, conventional setups typically rely on both imaging optics and a camera to capture scenes and perform computer vision tasks. In such systems, the FSO encoder can be integrated as an add-on or used to replace one or more optical components. Finally, we will review how the FSO encoders are designed. The design of optical systems falls into three categories: intuitive design, inverse design, and end-to-end design. An example of intuitive design is the use of Fano resonance for edge detection22. Inverse-designed optics can perform tailored convolutional operations by engineering their PSFs to match the convolutional kernels23. The last approach encompasses end-to-end optimization, integrating both optical and digital systems within a single cohesive framework24.
Fig. 2 Classification criteria for the FSO encoders. [Images not available. See PDF.]
a Optical/digital configuration categories based on contributions of the digital backend. The digital backend can perform either simple or complex operations, or even no operation at all. b Imaging configuration categories based on adaptability to the conventional imaging system. The FSO encoder can be added to the conventional system or replace one of the optical components. Multiple layers of encoders can be used to build an entire optical system. c Design frameworks, including fundamental physical laws, digital operation being implemented in optics, and computer-optics co-optimization.
The following sections are primarily organized based on the imaging system configurations, though additional discussion on the digital backend contributions and design frameworks is also provided. In Section 2.1, we describe the traditional FSO architectures based on bulk optics used in neural networks. In Section 2.2, we illustrate how nanophotonics can significantly enhance both the compactness and robustness of the optical system. Finally, Sections 2.3 and 2.4 examine ongoing efforts to address two primary challenges faced by current nano-optical systems: tunability and nonlinearity.
Discrete macroscopic system
Many FSO encoders are inspired by the convolutional nature of optical imaging systems under incoherent illumination25. Manipulating the Fourier domain simplifies control over the PSF of the imaging system, where the image on the camera sensor corresponds to the convolution of the object and the PSF. According to the Fraunhofer approximation, an object placed a focal length away from a lens creates the Fourier transform of the scene in the focal plane. We can then place a mask at that focal plane to enable point-by-point multiplication, after which the field can be inverse Fourier transformed using another lens placed a focal length away. This whole system (from object to sensor) has a track length of four focal lengths and is known as a 4f system. While there are a few exceptions, such as lens-less approaches with programmable spatial masks26, most macroscopic FSO encoders follow the 4f system, placing either passive masks or SLMs on the Fourier plane to perform the convolutions.
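The 4f picture above can be checked numerically. The following NumPy sketch treats the coherent-field case (for incoherent light the intensity PSF, |PSF|², applies instead) with an illustrative low-pass mask of our own choosing:

```python
# 4f system as convolution: a mask on the Fourier plane multiplies the
# object spectrum, and the second lens inverse-transforms it, so the
# sensor sees object (*) PSF, where the PSF is the mask's inverse FFT.
import numpy as np

rng = np.random.default_rng(0)
obj = rng.random((128, 128))          # object at the input plane

fx = np.fft.fftfreq(128)
mask = (np.sqrt(fx[None, :]**2 + fx[:, None]**2) < 0.1).astype(float)

# First lens: FFT; mask: pointwise multiply; second lens: inverse FFT.
sensor = np.real(np.fft.ifft2(np.fft.fft2(obj) * mask))

# Cross-check against a direct convolution with the system PSF.
psf = np.real(np.fft.ifft2(mask))
direct = np.real(np.fft.ifft2(np.fft.fft2(obj) * np.fft.fft2(psf)))
print(np.allclose(sensor, direct))    # True: the 4f system convolves
```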
Passive masks on the Fourier plane within a 4f system can effectively perform convolutional operations. In 2018, Chang et al. used diffractive optical elements made of fused silica to implement phase masks in the Fourier plane27. They experimentally achieved a classification accuracy of ~70.1% for 16-level Google QuickDraw images (Fig. 3a) and 51.8% for the grayscale CIFAR-10 dataset. Their low classification accuracies are due to the lack of nonlinear and fully-connected layers in their architecture. In 2019, Colburn et al. suggested an optical front-end replacing the first linear layer of the entire network, with a digital back-end realizing the subsequent layers23. They investigated how the optical front-end could contribute to digital computational tasks in terms of latency and energy consumption. They also examined the sensitivity of the optical encoder's configuration in a hybrid optical/digital system, achieving a numerical classification accuracy of ~87.1% for cats-and-dogs classification and ~98.71% for the MNIST dataset. One important conclusion of the paper is that implementing just one convolutional layer in optics provides almost no benefit in terms of power and latency compared to a purely digital ANN, because a single layer accounts for only a small fraction of the computation in most convolutional neural networks (see the estimate below). Achieving significant improvements in power efficiency and latency for an optical ANN over its digital counterpart requires either compressing multiple convolutional layers into a single wide and shallow structure that accounts for both convolutional operations and nonlinear activation functions, or implementing a series of optical layers equipped with physical photodetection and amplification hardware. We will discuss these topics in greater detail in later sections.
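The following back-of-the-envelope Python estimate illustrates why offloading a single convolutional layer saves little: with hypothetical layer shapes loosely inspired by small CNNs (our assumption, not any cited architecture), the first layer carries only a few percent of the multiply-accumulate operations.

```python
# Rough per-layer compute bookkeeping for a small, hypothetical CNN.
def conv_macs(h, w, cin, cout, k):
    """Multiply-accumulates for a stride-1, same-padded conv layer."""
    return h * w * cin * cout * k * k

layers = [
    conv_macs(32, 32, 3, 64, 3),     # first conv layer (optical candidate)
    conv_macs(16, 16, 64, 128, 3),
    conv_macs(8, 8, 128, 256, 3),
    conv_macs(4, 4, 256, 512, 3),
    4 * 4 * 512 * 10,                # final fully-connected layer
]
total = sum(layers)
print(f"first layer share: {layers[0] / total:.1%}")  # only a few percent
```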
Fig. 3 Examples of bulk free-space optical encoders. [Images not available. See PDF.]
a Passive phase mask and 4f-system. Reprinted from ref. 27 under a CC BY license. b SLMs for optical front-end with digital back-end. Reprinted from ref. 30. ©The Authors, some rights reserved; exclusive licensee AAAS. Distributed under a CC BY-NC 4.0 license http://creativecommons.org/licenses/by-nc/4.0/”. Reprinted with permission from AAAS. c SLMs for all-optical processor. Reprinted with permission from ref. 33 ©Optical Society of America.
SLMs are the most representative building blocks for macroscopic FSOs because they can modulate either the phase or the amplitude of the input light field in two dimensions. In 2020, Pad et al. utilized a liquid-crystal-based SLM and a 4f system to replace the first layer of a convolutional neural network and reduced the computational steps of the hybrid optical/digital system by more than two orders of magnitude28. The SLM in between the lens pair modulated the PSF of the imaging system using an end-to-end optimization framework. After that, several other hybrid optical/digital configurations were implemented for computer vision tasks. Yuan et al. utilized SLMs and refractive optics to represent each ANN layer separately, minimizing the computational burden for architecture design and enabling large-scale neural networks29. They demonstrated classification tasks for the MNIST, CIFAR-10, and 5-class ImageNet datasets, achieving accuracies of about 96%, 51%, and 55%, respectively. Bernstein et al. demonstrated single-shot image classification using a pulsed laser and multiple SLMs, achieving classification accuracies of about 95% and 83% for the MNIST and Fashion-MNIST datasets, respectively (Fig. 3b)30. While the aforementioned works have the hybrid optical/digital configuration, there are several all-optical systems using SLMs, which can fully exclude the digital processor for computer vision (Fig. 3c)31–33. In 2020, Zhou et al. proposed in situ optical backpropagation training of diffractive optics using SLMs updated by sensor output31. In 2024, Latifpour et al. performed hyperdimensional vector-matrix and matrix-matrix multiplications in all-optical configurations using diffraction gratings and SLMs34. Digital micromirror devices can modulate the amplitude of the input light with two-dimensional spatial degrees of freedom and faster dynamics. Zhou et al. utilized both spatial and temporal encoding to maximize the degrees of freedom with fewer parameters35, achieving a classification accuracy of about 88.4% for the MNIST dataset. We note that most works neglect to report system-level energy and latency, making it unclear whether they are more energy-efficient or faster than fully digital systems. They often overlook the energy consumption of the SLMs and the camera, with commercial SLMs being particularly power-hungry and their energy efficiency unverified.
Passive compact system
Metasurfaces, or flat optics, provide an unprecedented level of multi-functionality due to their extensive degrees of freedom. Various optimization methodologies, such as adjoint optimization, neural surrogate modeling, and differentiable photonic simulators, have been implemented for metasurface design36–38. Very recently, a metasurface design tool based on large language models has been investigated for wider user accessibility39. Metasurfaces also offer significant advantages over conventional optical systems in terms of weight and form factor, making them highly compatible with compact optical setups. While various constraints and functionalities can be incorporated into metasurface designs, it is nearly impossible to control all individual sub-wavelength scatterers for large phase variance within the metasurface. Electro-optical modulation enables rapid reconfigurability but incurs significant additional power consumption and faces challenges in individual pixelized control40,41. Thermo-optic modulation, on the other hand, suffers from low speed and significant crosstalk. Phase-change materials have garnered interest due to their nonvolatile nature, which eliminates static energy consumption; however, achieving large phase modulation for individual scatterers remains nearly impossible42–44. Consequently, most meta-optical FSO encoders are passive (neither tunable nor consuming additional energy), in stark contrast to macroscopic optical systems that rely on SLMs. In this section, we will describe the current state of FSO encoders constructed from passive and compact (mostly) metasurfaces, categorized into four different design frameworks: (1) physical inspiration, (2) digital-to-optical, (3) end-to-end, and (4) all-optical.
Physical inspiration played an important role in the early stage of meta-optical image processing, such as edge detection. In 2019, Cordaro et al. reported that a one-dimensional meta-optical grating structure can perform either first- or second-order differentiation optically, enabling one-dimensional edge detection45. In 2020, Zhou et al. utilized a Fano-resonant metasurface to achieve second-order differentiation in momentum space, extracting high-frequency components of the image for edge detection (Fig. 4a)22. Two years later, from the same group, Zhang et al. employed multilayer film structures to generate a wavelength-dependent optical transfer function, demonstrating edge detection without lithography46. In 2024, Dias and van de Groep utilized exciton-polariton dispersion for two-dimensional edge detection47. In the same year, Swartz et al. utilized the orbital angular momentum of the incoming light to create a Laguerre-Gaussian beam and perform edge detection with digital subtraction48. While edge detection via metasurfaces can drastically reduce information content, in practice its utility is limited because the entire sensor still needs to be read out, which negates energy efficiency gains. However, integrating this optical edge detection with event cameras, which read out only the pixels where intensity changes occur, can enhance energy efficiency. This observation underscores that the compression achievable through metasurfaces, as a generalizable framework for a non-learned front-end, must be coupled with an appropriate sensor to realize the desired performance benefits49.
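The momentum-space differentiation underlying these works can be emulated with an idealized quadratic transfer function; the sketch below (illustrative only, not any specific metasurface response) applies a Laplacian-type OTF to a test image:

```python
# Second-order differentiation in momentum space: an OTF proportional to
# -(kx^2 + ky^2) suppresses low spatial frequencies, so the response
# concentrates on edges of the input image.
import numpy as np

img = np.zeros((128, 128))
img[32:96, 32:96] = 1.0                        # bright square, sharp edges

kx = 2 * np.pi * np.fft.fftfreq(128)
otf = -(kx[None, :] ** 2 + kx[:, None] ** 2)   # Laplacian in momentum space

edges = np.real(np.fft.ifft2(np.fft.fft2(img) * otf))
print(f"edge response:     {abs(edges[32, 64]):.2f}")  # large at the border
print(f"interior response: {abs(edges[64, 64]):.2f}")  # near zero inside
```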
Fig. 4 Examples of compact and passive free-space optical encoders. [Images not available. See PDF.]
a Physics-inspired Fano resonator for edge detection. Reprinted with permission from the authors22. b Optical metasurfaces having digital convolutional kernels as their PSFs optically replace the convolutional layer of the deep neural network. Reprinted with permission from the authors52. c End-to-end optimized metasurface with better edge detection performance for thermal imaging. Reprinted from ref. 48 under a CC BY license.
The digital-to-optical approach represents a one-way process in which the metasurface is designed for predetermined digital operations that have already been established in the digital domain. This mostly involves convolutional optical systems replacing digital convolutional layers of the network. In many cases, metasurfaces whose PSFs match the two-dimensional kernels of the convolutional layers are employed, so that the optical convolution occurs naturally via light propagation. Wang et al. showed in simulation that free-form meta-optics can replace general convolutional operations50. In 2022, Zheng et al. used one metasurface for image duplication and another metasurface to implement polarization-based kernel weights51, achieving about 93.1% accuracy for classifying MNIST hand-written digits. In 2024, they employed one metasurface to split the input image (based on the helicity of light), while a second metasurface implemented the kernel weights; two circularly-polarized images implemented the positive and negative kernels, respectively (Fig. 4b)52 (see the sketch of this sign-splitting trick below). Their hybrid optical/digital network achieved about 98.6% and 88.8% accuracies for MNIST hand-written digits and fashion images, respectively. We emphasize that none of these accuracies are better than what can be achieved using state-of-the-art digital computing architectures, although some benefit in latency is expected. Additionally, these datasets are extremely simple, and as such irrelevant to the power/latency limitations of modern electronic ANNs.
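Because incoherent PSFs are nonnegative intensity patterns, signed kernels require two channels. The following NumPy sketch illustrates the positive/negative splitting idea with a toy Laplacian kernel; the physical two-channel realization (polarization or helicity) is abstracted away:

```python
# Split a signed digital kernel into two nonnegative PSFs, convolve each
# "optically", and recover the signed result with one digital subtraction.
import numpy as np
from scipy.signal import convolve2d

kernel = np.array([[0., -1., 0.],
                   [-1., 4., -1.],
                   [0., -1., 0.]])            # signed Laplacian kernel

k_pos = np.clip(kernel, 0, None)              # realized as PSF of channel 1
k_neg = np.clip(-kernel, 0, None)             # realized as PSF of channel 2

img = np.random.default_rng(1).random((64, 64))
ch1 = convolve2d(img, k_pos, mode="same")     # optical convolution, ch. 1
ch2 = convolve2d(img, k_neg, mode="same")     # optical convolution, ch. 2

signed = ch1 - ch2                            # one digital subtraction
print(np.allclose(signed, convolve2d(img, kernel, mode="same")))  # True
```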
Note that almost all deep ANNs contain multiple convolutional layers in their front-end, with nonlinear (e.g., ReLU) and pooling layers placed between the convolutional layers. Replacing all the convolutional layers with optics requires a series of optics-electronics-optics conversions, which eventually results in more energy consumption and latency than the purely digital system. This is particularly critical for FSO systems. While in principle such conversions can be performed in PICs with very low power53, performing them for many pixels (which is again the primary benefit of an optical front-end) is non-trivial. As such, extrapolating results from a single beam to multiple optical beams is naive and often leads to unfounded optimism about such repeated conversion systems. A contiguous optical front-end (without any intermediate conversion to the electrical domain) effectively implementing multiple convolutions is therefore more desirable. We note here that AI research has already shown that it is possible to achieve performance similar to a deep ANN using a shallow but wide network54. Thus it is possible to envision a large linear front-end without any nonlinearity as a dimensionality reduction block. Following this philosophy, Xiang et al. compressed AlexNet to realize a simplified neural network with a single convolutional layer followed by a nonlinear and a fully-connected layer55 (a minimal sketch of this compression is given below). With this framework, Wirth-Singh et al. replaced the digital convolutional layer with a single metasurface array having the PSFs of the compressed kernels, and achieved about 93% accuracy for the MNIST handwritten dataset56. Choi et al. leveraged the multi-functionality of metasurfaces to create polychromatic metasurfaces, where a single metasurface has three different PSFs for red, green, and blue colors, significantly reducing the complexity of the network57. They achieved a state-of-the-art classification accuracy of about 73.2% with more than two orders of magnitude reduction in total system-level energy consumption and four orders of magnitude reduction in latency. They also employed the same metasurface encoder for another dataset (i.e., High-10) with an additional transfer learning approach, paving the way for a generalized passive optical encoder with a variable digital back-end. We emphasize that while this is one of the first demonstrations on somewhat non-trivial datasets beyond MNIST, these are still very simple problems compared to current state-of-the-art AI problems.
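A minimal sketch of such shallow-wide compression via knowledge distillation follows; the teacher network, student width, temperature, and loss are illustrative assumptions rather than the cited implementations:

```python
# Distill a deep teacher into a "wide and shallow" student whose single
# convolution could be realized optically (the ReLU and the final linear
# layer remain digital).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(  # stand-in deep CNN (pretrained in practice)
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(64 * 28 * 28, 10),
)

student = nn.Sequential(  # one wide conv -> ReLU -> FC, per the compression
    nn.Conv2d(1, 96, 7, padding=3, bias=False),  # the "optical" layer
    nn.ReLU(), nn.Flatten(), nn.Linear(96 * 28 * 28, 10),
)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.rand(8, 1, 28, 28)                     # stand-in image batch
with torch.no_grad():
    soft_targets = F.softmax(teacher(x) / 2.0, dim=1)  # temperature T = 2

loss = F.kl_div(F.log_softmax(student(x) / 2.0, dim=1),
                soft_targets, reduction="batchmean")
loss.backward()
opt.step()                                       # one distillation step
```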
The end-to-end design framework puts both the optical front-end and the digital back-end in the same training loop to co-optimize them. End-to-end design can in principle fully utilize the large degrees of freedom of metasurfaces to handle more complex tasks; however, it may require large computational resources for training (which can be prohibitive in many cases) and may be ill-posed under very specific constraints (e.g., metasurface dimensions, scatterer size, wavelength, focal length). Swartz et al. showed that the end-to-end design framework can outperform purely physics-inspired design in terms of edge detection performance for thermal imaging (Fig. 4c)48. Li et al. employed triplet metasurfaces to capture diffracted images of the MNIST handwritten dataset with a single-pixel camera and reconstructed the images using a neural network at the digital back-end58. Bai et al. demonstrated all-optical selective imaging of target classes of objects using triplet metasurfaces59. They also used information from multiple diffracted wavelengths to classify the MNIST handwritten dataset with about 87.74% accuracy using simple differential processing in the digital back-end60. Because these works operate in the terahertz regime (hence at long wavelengths), they are less susceptible to misalignment. We note that while both terahertz and visible/infrared photonics involve electromagnetic waves, the fabrication and alignment tolerances are vastly different. As such, translating these experiments to visible wavelengths may require fundamental innovations in both manufacturing and packaging technologies.
In the visible wavelength range, Huang et al. employed an end-to-end optimized metasurface for image classification tasks, advancing the classification accuracy for the MNIST handwritten dataset in the low-power, low-latency regime61. An anecdotal conclusion of the paper is that system-level power and latency benefits may be achievable, but only when the overall classification accuracy is lower. We emphasize that almost all published experimental results on optical ANNs show a similar trend. This finding can be explained by noting that, since a deep ANN can essentially approximate any function, it can also implement the functionality of the optics. As such, we believe any research claiming better classification performance of an optical ANN over a full-fledged electronic one might have a poorly trained electronic ANN. In that sense, applications that require the FSO encoder must operate under strict computing power and/or latency constraints, such as battery-powered mobile devices, where efficiency matters most, even at the expense of marginal accuracy loss. This field of low-power computer vision is becoming more significant as more mobile devices are equipped with high-resolution cameras and need to process images in real time with very limited computing, battery, and memory resources. Wei et al. considered the angle-dependent PSFs of obliquely incident light to create a large number of optical convolutional kernels; their end-to-end optimized optical encoder achieved 72.76% accuracy for the CIFAR-10 dataset while showing four orders of magnitude reduction in the training variables24. Liang et al. showed that a single metasurface (not an array) can simultaneously operate multiple kernel convolutions, achieving moderate classification accuracies of about 98.59%, 92.63%, and 68.67% for the MNIST, Fashion-MNIST, and CIFAR-10 datasets, respectively62.
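To illustrate the end-to-end framework, the following PyTorch sketch couples a trainable phase mask (a toy scalar-diffraction model of a metasurface) to a digital classifier so that one loss updates both; all dimensions and the propagation model are simplifying assumptions:

```python
# End-to-end co-design: phase mask -> incoherent PSF -> convolution with the
# scene -> digital classifier; gradients flow from the task loss back into
# the optical parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EndToEndEncoder(nn.Module):
    def __init__(self, n=28, num_classes=10):
        super().__init__()
        self.phase = nn.Parameter(torch.zeros(n, n))   # metasurface phase
        self.backend = nn.Linear(n * n, num_classes)   # digital back-end

    def psf(self):
        pupil = torch.exp(1j * self.phase)             # unit-amplitude pupil
        field = torch.fft.ifft2(pupil)
        return field.abs() ** 2                        # incoherent PSF >= 0

    def forward(self, img):                            # img: (B, 1, n, n)
        k = self.psf()[None, None]                     # PSF as conv kernel
        out = torch.fft.ifft2(torch.fft.fft2(img) * torch.fft.fft2(k)).real
        return self.backend(out.flatten(1))

model = EndToEndEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x, y = torch.rand(8, 1, 28, 28), torch.randint(0, 10, (8,))
loss = F.cross_entropy(model(x), y)                    # one co-design step
loss.backward()                                        # grads reach the phase
opt.step()
```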
All-optical configurations are those in which computer vision tasks (inference, in particular) are performed entirely in the optical domain without digital computational contributions. Some do not require digital computation even during the training process, demonstrating backpropagation in the optical domain31. In 2018, Lin et al. demonstrated all-optical classification using multilayer diffractive optics and achieved an accuracy of about 86.33% for the MNIST handwritten dataset63. The following year, Li et al. utilized multilayer diffractive optics along with predefined positive/negative regions of interest in the camera sensor, achieving much higher classification accuracies: 98.59%, 91.06%, and 51.44% for the MNIST, Fashion-MNIST, and grayscale CIFAR-10 datasets, respectively64. In 2021, Léonard et al. investigated in detail how various metasurface parameters affect the performance (i.e., accuracy) of an all-optical neural network65. In 2022, Mengu et al. used multilayer diffractive optics to classify objects from two different classes simultaneously even when they overlap66. Qian et al. utilized double-layer metasurfaces to recognize a rabbit's posture (i.e., sitting, standing, or walking)67. Gao et al. utilized triple-layer metasurfaces to measure the angle of incidence of radio-frequency light beyond the diffraction limit68. Mengu et al. demonstrated a multispectral imaging system using multilayer diffractive optics at terahertz wavelengths69, and Li et al. demonstrated a polarization-multiplexed diffractive neural network70. Similarly, Wang et al. designed polarization-sensitive metasurfaces with a Jones-matrix configuration, demonstrating polarization-multiplexed neural networks for multiple orthogonal tasks (i.e., MNIST, Fashion-MNIST, and KMNIST) with simultaneous accuracies of about 98%, 89%, and 89%71. Yan et al. used two different metasurfaces of eight-level stepped phase plates as encoder and decoder to retrieve the depth information of the image with ultralow energy72. All-optical systems have the benefit that no digital computing unit is required and computer vision tasks can be performed entirely in the analog domain. However, the lack of nonlinearity and tunability of the compact optics hinders their performance (e.g., accuracy) on even slightly complex datasets such as CIFAR-10.
Tunable compact system
As discussed above, post-fabrication tunability of metasurfaces remains one of the most significant challenges. To date, only a few tunable metasurfaces have been demonstrated for computer vision tasks. Zhang et al. employed a stretchable metasurface to switch its functionality between bright-field imaging and differential imaging (Fig. 5a)73. Wang et al. used a configuration similar to Zheng et al.52, but increased the number of convolutional kernels by utilizing liquid crystals and the polarization-dependent PSFs of the metasurface74. They produced four times more convolutional kernels, corresponding to four different linear polarization directions of the input light modulated by the liquid crystal, and achieved classification accuracies of 98.5% and 90.9% for the MNIST and Fashion-MNIST datasets, respectively. Recently, Cheng et al. employed semi-passive masks made of a nonvolatile phase-change material (Ge2Sb2Te5), as shown in Fig. 5b75. They employed spectrum-specific write-and-erase modulation of the phase-change material, enabling multiple tasks depending on the wavelength of the carrier photons and making the optical encoder simultaneously compatible with the MNIST, Fashion-MNIST, KMNIST, OracleMNIST, and OverheadMNIST datasets75.
Fig. 5 Examples of compact and tunable free-space optical encoders. [Images not available. See PDF.]
a Liquid-crystal-integrated metasurface with a 4-fold enhanced number of PSFs according to the polarization direction of the input light. Reprinted with permission from ref. 74 ©Optical Society of America. b Phase-change-material-based quasi-tunable write-and-erase metasurface array for multiple computer vision tasks. Reprinted from ref. 75 under a CC BY license.
Nonlinear system
Almost all deep ANNs require nonlinear activation functions (except for the simplest handwritten datasets, which are well known to be nearly linearly separable). In this section, we will discuss various nonlinear optics for computer vision tasks. We note that nonlinear activation using physical optics is generally difficult, as photons do not interact with each other. This becomes even more difficult for FSO encoders, as all the nonlinear activations must be performed in parallel to preserve the speed-up from the parallel linear operations76.
In 1995, Saxena et al. introduced the ferroelectric liquid-crystal light valve as a nonlinear optical element with a sigmoid response profile77. In 2019, Yan et al. implemented a ferroelectric thin film made of a photorefractive crystal in the Fourier domain and successfully demonstrated a diffractive deep neural network78. Recently, all-optical fully-forward training was implemented using a photorefractive all-optical nonlinearity, with the input-output relationship given by Eq. (1):
(1) [Equation not available. See PDF.]
This nonlinearity achieved about 60.0% and 59.5% classification accuracies for the 4-class CIFAR-10 and 4-class ImageNet datasets, respectively79. They also implemented an electro-optical intensity-induced phase nonlinearity, with the input-output relationship given by Eq. (2):
(2) [Equation not available. See PDF.]
They applied this nonlinearity to the MNIST handwritten dataset, achieving about 93.0% classification accuracy79. Matuszewski et al. proposed exciton-polaritons as an energy-efficient, all-optical nonlinear activation platform80. Wright et al. utilized optical second-harmonic generation in a nonlinear crystal, achieving about 97% accuracy for the MNIST handwritten dataset using a digital front-end and multilayer optical implementations81. In addition, they applied the same configuration to audio classification tasks and employed other configurations of mechanical and electronic nonlinear circuits for the MNIST dataset.
Although we have described various nonlinear optical platforms, saturable absorbers are among the most representative, being comparatively easy to demonstrate with atomic vapor cells and quantum dots. Zuo et al. utilized electromagnetically induced transparency in a magneto-optically trapped atom system (85Rb), where the on-resonant probe laser output is nonlinear in the input beam82. Ryou et al. used a vapor cell filled with thermal rubidium atoms as a saturable absorber, achieving about 6% accuracy enhancement for MNIST handwritten classification compared to the passive linear network83. Similarly, Yang et al. used a vapor cell filled with cesium atoms as a saturable absorber and achieved much better accuracy (around 84%) for the MNIST handwritten dataset84. These studies used the same thermo-optical nonlinear response between the input and output intensities, described by Eq. (3):
(3) [Equation not available. See PDF.]
with fitting parameters a and b. On the other hand, saturable absorbers using quantum dots effectively emulate the ReLU function85. Huang et al. showed that the classification accuracy of an optical neural network can be improved by more than 10% by inserting a nonlinear quantum-film layer, for 10-class hand-drawn classification and 5-class action recognition tasks (Fig. 6a)20.
Fig. 6 Examples of nonlinear free-space optical encoders. [Images not available. See PDF.]
a Quantum dot based saturable absorber for nonlinear optical neural network. Reprinted from ref. 20. ©The Authors, some rights reserved; exclusive licensee AAAS. Distributed under a CC BY-NC 4.0 license. Reprinted with permission from AAAS. b Multilayer light-emitting-diodes and photodetectors generating nonlinear optical layers for deep neural network. Reprinted from ref. 87 under a CC BY-NC-ND license. c Sequential linear scattering in a cavity converted into a nonlinear scattering. Reprinted from ref. 21 under a CC BY license.
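The vapor-cell saturable absorbers discussed above can be emulated with the textbook saturable-absorption model; since the fitted form of Eq. (3) is not reproduced here, the sketch below uses T(I) = exp(-a / (1 + I/b)) purely as an illustrative stand-in, with a the small-signal absorbance and b the saturation intensity:

```python
# Generic saturable-absorber activation: dim inputs are absorbed, bright
# inputs are transmitted, yielding a ReLU-like intensity nonlinearity.
import numpy as np

def saturable_absorber(intensity, a=3.0, b=1.0):
    """Intensity in/out of a saturable absorber (illustrative parameters)."""
    transmission = np.exp(-a / (1.0 + intensity / b))
    return intensity * transmission

i_in = np.linspace(0, 10, 6)
print(saturable_absorber(i_in))   # dim inputs suppressed, bright ones passed
```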
In another approach, Wang et al. employed an image intensifier, which amplifies photons via photon-to-electron and electron-to-photon conversions and provides nonlinearity86. The nonlinear response between the input and output illumination is described by Eq. (4):
(4) [Equation not available. See PDF.]
with fitting parameters a, b, c, and d. Wang et al. used 36 different local intensifying channels, achieving about 79.0% accuracy for 10-class handwritten figure classification, around 93.0% accuracy for 5-class cell organelle classification, and about 61.1% accuracy for 8-class speed-limit-sign recognition, all with a miniaturized digital back-end. One important aspect of such intensifier-based nonlinearity is signal regeneration, which allows cascading multiple layers; in our opinion, this approach is one of the most promising ways to implement nonlinear activation. Recently, Song et al. employed multilayer two-dimensional arrays of light-emitting diodes and photodetectors to create alternating optical and optoelectronic layers (Fig. 6b)87. Their energy-efficient, nonlinear neural network achieved classification accuracies far exceeding a linear neural network for the MNIST and four-class nonlinear spiral classifications. The improvements are far greater for nonlinear tasks than for linearly solvable tasks (e.g., handwritten digits), as the former necessarily require nonlinear operations in the network.
Other works utilized multiple scattering inside a cavity, converting a linear scattering process into a nonlinear one. Here, the basic idea is to employ time-varying systems to effectively implement an optical nonlinearity, as sketched below. We emphasize that any time modulation also requires a physical nonlinearity, so classifying such multiple-scattering systems as purely "physically linear" is not accurate; however, the input-output relationship of the light is indeed nonlinear. Yildirim et al. used multiple reflections between a mirror and an SLM, creating higher-order (i.e., nonlinear) activation based on linear modulation88. They achieved classification accuracies of about 36%, 84%, and 88% for the ImageNet, Fashion-MNIST, and MNIST handwritten datasets, respectively, with scalable representations of their system. Eliezer et al. and Xia et al. integrated a digital micromirror device with a spherical cavity (Fig. 6c)21,89. Due to multiple scattering inside the spherical cavity, the modulation from the digital micromirror device became nonlinear, similar to what has been demonstrated in a multimode fiber90. Wang et al. implemented multiple nonlinear scattering inside a diffuser made of lithium niobate nanocrystals and achieved large enhancements in classification accuracy for 10-class sign-language digits, the 24-class American Sign Language alphabet, and the CIFAR-10 RGB image dataset91. However, given the poor classification accuracy, it remains unclear whether such nonlinearity can be useful in practice.
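A toy model of this re-encoding mechanism follows; the random matrices stand in for the (unknown) cavity transmission, and the doubling test exposes the effective nonlinearity of the input-output map:

```python
# "Linear scattering made nonlinear": if the data is re-encoded onto the
# field at each pass through a linear scatterer (as with repeated SLM
# bounces in a cavity), the output contains products of the input with
# itself, i.e., polynomial nonlinearity.
import numpy as np

rng = np.random.default_rng(0)
n = 16
A1, A2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))

def two_pass_cavity(x):
    field = A1 @ x             # first pass: linear scattering
    field = A2 @ (field * x)   # second pass: data re-encoded -> x_i*x_j terms
    return np.abs(field) ** 2  # intensity detection adds another square

x = rng.normal(size=n)
# A linear system would double its output when the input doubles; this
# system does not, revealing the effective nonlinearity:
print(np.allclose(two_pass_cavity(2 * x), 2 * two_pass_cavity(x)))  # False
```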
Opportunities
As we discussed, none of the reported works on optical encoders so far has solved a problem that is relevant for real-time, low-power ANNs. To date, FSO encoders have been demonstrated on well-refined image datasets (e.g., MNIST, CIFAR-10, or ImageNet) for image classification tasks. The most challenging AI problems today involve object detection, image segmentation, and semantic scene understanding, but these have not yet been experimentally demonstrated by any FSO encoder. This remains an important open problem, although some theoretical proposals exist92. In addition to more advanced computational tasks, the development of tunable and universal FSO encoders for multiple computer vision applications using knowledge transfer methods57 will further enhance their significance in the near future. While wavelength- and polarization-discriminating meta-optics have been employed in meta-optical front-ends52,57,74, meta-optics are expected to provide even more multi-functionality to achieve higher information density. Engineered sensors also present an important research direction. Finally, it is important to identify the application space where optical encoders will be most beneficial.
Near-sensor or in-sensor computing
To address the challenges posed by the large number of optical channels in FSO encoders, near-sensor and in-sensor computing configurations have garnered attention, where some processing is performed at the sensor level93. This approach can significantly reduce the amount of redundant data that needs to be transferred to digital back-ends, thereby mitigating issues related to energy consumption, latency, and security during image readout. To achieve this, the photoelectronic responses of pixelized sensors need to be engineered at the individual pixel level to enable data processing at the sensor. Wang et al. produced an array of photodetectors made of a two-dimensional heterostructure (WSe2/hBN/Al2O3) and utilized its gate-voltage-dependent photoresponses for edge detection, image classification, and object tracking94. Yang et al. developed a graphene-germanium heterostructure to create multi-terminal photodetectors capable of amplifying dim events in the scene95. By selectively controlling the dynamic responses of pixelized photodetectors, they successfully extracted edge information from the scene and utilized it for facial recognition. Recently, Song et al. employed multiple layers of photodetectors and light-emitting diodes to demonstrate a scalable physical nonlinear optical network, achieving about 92% accuracy on the MNIST handwritten dataset87. In addition to near-sensor and in-sensor computing, there are also studies on multidimensional imaging (e.g., intensity, spectrum, and polarization) that utilize reconstruction neural networks to decouple the dispersion-related responses of photodetectors96,97.
Task-specific or general-purpose FSOs
In deep neural networks, the presence of multiple convolutional layers and nonlinear activation functions in the front-end provides significant advantages, including hierarchical feature extraction (from low-level to high-level features) and efficient optimization within spatial structures. Although an infinitely wide but shallow neural network can theoretically approximate any continuous function54, it is impractical due to the exorbitant number of neurons required. Conversely, in the optical domain (particularly in FSOs), applying extremely wide linear layers is often more practical than implementing nonlinear activation functions. As a result, FSO encoders generally replace only the linear operations, deferring nonlinear activation to the digital back-end unless the dataset is simple enough that nonlinear activation is unnecessary. However, executing a very large number of operations optically remains impractical, leading to inherent trade-offs between accuracy and generalizability in FSO-based networks. Furthermore, FSOs typically exhibit limited reconfigurability compared to PICs (Table 1), making them more suitable for task-specific applications than for general-purpose architectures.
Applications
Amid the rapid development of unmanned vehicles, autonomous decision-making and on-site data processing have become inevitable for several reasons. First, human resources are often insufficient to manage the large number of unmanned vehicles (e.g., swarm flights of drones)98. Second, data transfer from these vehicles requires significant energy and introduces latency, making on-site decision-making essential99. Furthermore, vehicles that solely capture data without real-time interaction with their surroundings (e.g., habitat/surveillance monitoring100) face substantial memory storage challenges. Consequently, on-site digital data processing and the associated energy consumption become critical issues for unmanned vehicles, especially given their stringent constraints on weight, battery life, and computing capacity. In addition, most of the information collected by unmanned vehicles is visual. In this regard, FSO encoders offer significant potential to mitigate these issues, minimizing energy consumption and latency for on-site data processing and decision-making, especially when power and latency requirements are very stringent. However, this advantage will likely come with reduced accuracy; we therefore think the FSO encoder will be more suitable for statistical-analysis-type tasks rather than for life-critical decision-making.
Alongside the power and latency benefits that FSO encoders can provide, they can also offer security benefits. Scenes that are processed by FSO encoders before being captured by cameras lose their high-resolution information, as only low-dimensional features are recorded. Since only the optically processed features need to be captured on the camera, the fear of image misuse (one of the biggest concerns of widespread computer vision101) is much reduced, as the processed images are useful only for specific predefined tasks (e.g., object recognition). Moreover, since fewer pixels are needed to capture the low-dimensional features, the camera size can be minimized, making it much easier to integrate with micro-robots102.
Methods
Written informed consent to publish the identifiable image in Fig. 1 was obtained from the subject’s parent/legal guardian (see Declarations).
Acknowledgements
The research is supported by National Science Foundation (EFRI-BRAID-2223495). Part of this work was conducted at the Washington Nanofabrication Facility/ Molecular Analysis Facility, a National Nanotechnology Coordinated Infrastructure (NNCI) site at the University of Washington with partial support from the National Science Foundation via awards NNCI-1542101 and NNCI-2025489.
Author contributions
M.C. and A.M. wrote and reviewed the manuscript.
Data availability
No datasets were generated or analysed during the current study.
Competing interests
The authors declare no competing interests.
Ethics declarations
Written consent to publish the photograph in Fig. 1 was obtained from the subject’s parent/legal guardian. The image was provided as a voluntary contribution to this article.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1. Fu, H., Gong, M., Wang, C., Batmanghelich, K. & Tao, D. Deep ordinal regression network for monocular depth estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2002–2011 (2018).
2. Tan, M., Pang, R. & Le, Q. V. Efficientdet: Scalable and efficient object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10781–10790 (2020).
3. Simonyan, K. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
4. Sevilla, J. et al. Compute trends across three eras of machine learning. In 2022 International Joint Conference on Neural Networks (IJCNN), 1–8 (IEEE, 2022).
5. Narayanan, D. et al. Efficient large-scale language model training on gpu clusters using megatron-lm. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1–15 (2021).
6. Yuan, Y. et al. HRFormer: high-resolution vision transformer for dense prediction. Adv. Neural Inform. Process. Syst.; 2021; 34, pp. 7281-7293.
7. Chen, H; Wolff, LB. Polarization phase-based method for material classification in computer vision. Int. J. Comput. Vis.; 1998; 28, pp. 73-83.
8. Wang, T et al. An optical neural network using less than 1 photon per multiplication. Nat. Commun.; 2022; 13, 123.
9. Xu, Z et al. Large-scale photonic chiplet Taichi empowers 160-TOPS/W artificial general intelligence. Science; 2024; 384, pp. 202-209.
10. Pai, S et al. Experimentally realized in situ backpropagation for deep learning in photonic neural networks. Science; 2023; 380, pp. 398-404.
11. Chen, Y et al. All-analog photoelectronic chip for high-speed vision tasks. Nature; 2023; 623, pp. 48-57.
12. Chen, Z et al. Deep learning with coherent VCSEL neural networks. Nat. Photonics; 2023; 17, pp. 723-730.
13. Senanian, A; Wright, LG; Wade, PF; Doyle, HK; McMahon, PL. Programmable large-scale simulation of bosonic transport in optical synthetic frequency lattices. Nat. Phys.; 2023; 19, pp. 1333-1339.
14. Gu, Z; Ma, Q; Gao, X; You, JW; Cui, TJ. Direct electromagnetic information processing with planar diffractive neural network. Sci. Adv.; 2024; 10, eado3937.
15. Hamerly, R; Bernstein, L; Sludds, A; Soljačić, M; Englund, D. Large-scale optical neural networks based on photoelectric multiplication. Phys. Rev. X; 2019; 9, 021032.
16. Ma, S.-Y., Wang, T., Laydevant, J., Wright, L. G. & McMahon, P. L. Quantum-limited stochastic optical neural networks operating at a few quanta per activation. Nat. Commun.16, 359 (2025).
17. Zhu, H et al. Space-efficient optical computing with an integrated chip diffractive neural network. Nat. Commun.; 2022; 13, 1044.
18. Spall, J; Guo, X; Barrett, TD; Lvovsky, A. Fully reconfigurable coherent optical vector–matrix multiplication. Opt. Lett.; 2020; 45, pp. 5752-5755.
19. Yamashita, R; Nishio, M; Do, RKG; Togashi, K. Convolutional neural networks: an overview and application in radiology. Insights Imaging; 2018; 9, pp. 611-629.
20. Huang, Z et al. Pre-sensor computing with compact multilayer optical neural network. Sci. Adv.; 2024; 10, eado8516.
21. Eliezer, Y; Rührmair, U; Wisiol, N; Bittner, S; Cao, H. Tunable nonlinear optical mapping in a multiple-scattering cavity. Proc. Natl. Acad. Sci. USA; 2023; 120, e2305027120.
22. Zhou, Y; Zheng, H; Kravchenko, II; Valentine, J. Flat optics for image differentiation. Nat. Photonics; 2020; 14, pp. 316-323.
23. Colburn, S; Chu, Y; Shilzerman, E; Majumdar, A. Optical frontend for a convolutional neural network. Appl. Opt.; 2019; 58, pp. 3179-3186.
24. Wei, K et al. Spatially varying nanophotonic neural networks. Sci. Adv.; 2024; 10, eadp0391.
25. Lu, T; Wu, S; Xu, X; Yu, FT. Two-dimensional programmable optical neural network. Appl. Opt.; 1989; 28, pp. 4908-4913.
26. Liu, C et al. A programmable diffractive deep neural network based on a digital-coding metasurface array. Nat. Electron.; 2022; 5, pp. 113-122.
27. Chang, J; Sitzmann, V; Dun, X; Heidrich, W; Wetzstein, G. Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification. Sci. Rep.; 2018; 8, pp. 1-10.
28. Pad, P. et al. Efficient neural vision systems based on convolutional image acquisition. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12285–12294 (2020).
29. Yuan, X; Wang, Y; Xu, Z; Zhou, T; Fang, L. Training large-scale optoelectronic neural networks with dual-neuron optical-artificial learning. Nat. Commun.; 2023; 14, 7110.
30. Bernstein, L et al. Single-shot optical neural network. Sci. Adv.; 2023; 9, eadg7904.
31. Zhou, T et al. In situ optical backpropagation training of diffractive optical neural networks. Photonics Res.; 2020; 8, pp. 940-953.
32. Pierangeli, D; Marcucci, G; Conti, C. Photonic extreme learning machine by free-space optical propagation. Photonics Res.; 2021; 9, pp. 1446-1454.
33. Gu, Z; Gao, Y; Liu, X. Optronic convolutional neural networks of multi-layers with different functions executed in optics for image classification. Opt. Express; 2021; 29, pp. 5877-5889.
34. Latifpour, MH; Park, BJ; Yamamoto, Y; Suh, M-G. Hyperspectral in-memory computing with optical frequency combs and programmable optical memories. Optica; 2024; 11, pp. 932-939.
35. Zhou, T; Wu, W; Zhang, J; Yu, S; Fang, L. Ultrafast dynamic machine vision with spatiotemporal photonic computing. Sci. Adv.; 2023; 9, eadg4391.
36. Deng, Y; Ren, S; Fan, K; Malof, JM; Padilla, WJ. Neural-adjoint method for the inverse design of all-dielectric metasurfaces. Opt. Express; 2021; 29, pp. 7526-7534.
37. Pestourie, R; Mroueh, Y; Nguyen, TV; Das, P; Johnson, SG. Active learning of deep surrogates for PDEs: application to metasurface design. npj Comput. Mater.; 2020; 6, 164.
38. Li, Z et al. Inverse design enables large-scale high-performance meta-optics reshaping virtual reality. Nat. Commun.; 2022; 13, 2409.
39. Kim, M., Park, H. & Shin, J. Nanophotonic device design based on large language models: multilayer and metasurface examples. Nanophotonics14, 1273–1282 (2025).
40. Wu, PC et al. Dynamic beam steering with all-dielectric electro-optic III–V multiple-quantum-well metasurfaces. Nat. Commun.; 2019; 10, 3654.
41. Benea-Chelmus, I-C et al. Gigahertz free-space electro-optic modulators based on Mie resonances. Nat. Commun.; 2022; 13, 3170.
42. Tara, V. et al. Non-volatile reconfigurable transmissive notch filter using wide bandgap phase change material antimony sulfide. IEEE Journal of Selected Topics in Quantum Electronics30, 1–8 (2024).
43. Fang, Z et al. Nonvolatile phase-only transmissive spatial light modulator with electrical addressability of individual pixels. ACS nano; 2024; 18, pp. 11245-11256.
44. Shalaginov, MY et al. Reconfigurable all-dielectric metalens with diffraction-limited performance. Nat. Commun.; 2021; 12, 1225.
45. Cordaro, A et al. High-index dielectric metasurfaces performing mathematical operations. Nano Lett.; 2019; 19, pp. 8418-8423.
46. Zhang, X; Bai, B; Sun, H-B; Jin, G; Valentine, J. Incoherent optoelectronic differentiation based on optimized multilayer films. Laser Photonics Rev.; 2022; 16, 2200038.
47. Dias, B. S. & van de Groep, J. High-na 2d image edge detection using tamm plasmon polaritons in few-layer stratified media. ACS Photonics12, 311–319 (2024).
48. Swartz, BT; Zheng, H; Forcherio, GT; Valentine, J. Broadband and large-aperture metasurface edge encoders for incoherent infrared radiation. Sci. Adv.; 2024; 10, eadk0024.
49. Gehrig, D; Scaramuzza, D. Low-latency automotive vision with event cameras. Nature; 2024; 629, pp. 1034-1040.
50. Wang, H et al. Design of compact meta-crystal slab for general optical convolution. ACS Photonics; 2022; 9, pp. 1358-1365.
51. Zheng, H et al. Meta-optic accelerators for object classifiers. Sci. Adv.; 2022; 8, eabo6410.
52. Zheng, H et al. Multichannel meta-imagers for accelerating machine vision. Nat. Nanotechnol.; 2024; 19, pp. 471-478.
53. Miller, DA. Attojoule optoelectronics for low-energy information processing and communications. J. Lightwave Technol.; 2017; 35, pp. 346-396.
54. Jacot, A., Gabriel, F. & Hongler, C. Neural tangent kernel: convergence and generalization in neural networks. Adv. Neural Inform. Process. Syst.31, 8571–8580 (2018).
55. Xiang, J; Colburn, S; Majumdar, A; Shlizerman, E. Knowledge distillation circumvents nonlinearity for optical convolutional neural networks. Appl. Opt.; 2022; 61, pp. 2173-2183.
56. Wirth-Singh, A. et al. Compressed meta-optical encoder for image classification. Adv. Photonics Nexus4, 026009 (2025).
57. Choi, M. et al. Transferable polychromatic optical encoder for neural networks. Nat. Commun.16, 5623 (2025).
58. Li, J et al. Spectrally encoded single-pixel machine vision using diffractive networks. Sci. Adv.; 2021; 7, eabd7690.2021SciA..7.7690L
59. Bai, B et al. To image, or not to image: class-specific diffractive cameras with all-optical erasure of undesired objects. ELight; 2022; 2, 14.
60. Bai, B et al. All-optical image classification through unknown random diffusers using a single-pixel diffractive network. Light Sci. Appl.; 2023; 12, 69.2023LSA..12..69B
61. Huang, L et al. Photonic advantage of optical encoders. Nanophotonics; 2024; 13, pp. 1191-1196.2023Nanop.13.1191H
62. Liang, R. et al. Metasurface-generated large and arbitrary analog convolution kernels for accelerated machine vision. ACS Photonics11, 5430–5438 (2024).
63. Lin, X et al. All-optical machine learning using diffractive deep neural networks. Science; 2018; 361, pp. 1004-1008.2018Sci..361.1004L3837095
64. Li, J; Mengu, D; Luo, Y; Rivenson, Y; Ozcan, A. Class-specific differential detection in diffractive optical neural networks improves inference accuracy. Adv. Photonics; 2019; 1, 046001.2019AdPho..1d6001L
65. Léonard, F; Backer, AS; Fuller, EJ; Teeter, C; Vineyard, CM. Co-design of free-space metasurface optical neuromorphic classifiers for high performance. ACS Photonics; 2021; 8, pp. 2103-2111.
66. Mengu, D; Veli, M; Rivenson, Y; Ozcan, A. Classification and reconstruction of spatially overlapping phase images using diffractive optical networks. Sci. Rep.; 2022; 12, 2022NatSR.12.8446M 8446.
67. Qian, C et al. Dynamic recognition and mirage using neuro-metamaterials. Nat. Commun.; 2022; 13, 2022NatCo.13.2694Q 2694.
68. Gao, S et al. Super-resolution diffractive neural network for all-optical direction of arrival estimation beyond diffraction limits. Light Sci. Appl.; 2024; 13, 161.
69. Mengu, D; Tabassum, A; Jarrahi, M; Ozcan, A. Snapshot multispectral imaging using a diffractive optical network. Light Sci. Appl.; 2023; 12, 86.2023LSA..12..86M
70. Li, J; Hung, Y-C; Kulce, O; Mengu, D; Ozcan, A. Polarization multiplexed diffractive computing: all-optical implementation of a group of linear transformations through a polarization-encoded diffractive network. Light Sci. Appl.; 2022; 11, 153.2022LSA..11.153L
71. Wang, Y; Yu, A; Cheng, Y; Qi, J. Matrix diffractive deep neural networks merging polarization into meta-devices. Laser Photonics Rev.; 2024; 18, 2300903.2024LPRv..1800903W
72. Yan, T et al. Nanowatt all-optical 3d perception for mobile robotics. Sci. Adv.; 2024; 10, eadn2031.
73. Zhang, X et al. Reconfigurable metasurface for image processing. Nano Lett.; 2021; 21, pp. 8715-8722.2021NanoL.21.8715Z
74. Wang, M et al. Multichannel meta-imagers based on electrically tunable metasurfaces for accelerating matrix operations. Opt. Express; 2024; 32, pp. 39915-39923.
75. Cheng, Y et al. Photonic neuromorphic architecture for tens-of-task lifelong learning. Light Sci. Appl.; 2024; 13, 56.2024LSA..13..56C
76. Ryou, A; Colburn, S; Majumdar, A. Image enhancement in a miniature self-imaging degenerate optical cavity. Phys. Rev. A; 2020; 101, 013824.2020PhRvA.101a3824R
77. Saxena, I; Fiesler, E. Adaptive multilayer optical neural network with optical thresholding. Opt. Eng.; 1995; 34, pp. 2435-2440.1995OptEn.34.2435S
78. Yan, T et al. Fourier-space diffractive deep neural network. Phys. Rev. Lett.; 2019; 123, 023901.2019PhRvL.123b3901Y
79. Xue, Z et al. Fully forward mode training for optical neural networks. Nature; 2024; 632, pp. 280-286.
80. Matuszewski, M et al. Energy-efficient neural network inference with microcavity exciton polaritons. Phys. Rev. Appl.; 2021; 16, 024045.2021PhRvP.16b4045M
81. Wright, LG et al. Deep physical neural networks trained with backpropagation. Nature; 2022; 601, pp. 549-555.2022Natur.601.549W
82. Zuo, Y et al. All-optical neural network with nonlinear activation functions. Optica; 2019; 6, pp. 1132-1137.2019Optic..6.1132Z
83. Ryou, A et al. Free-space optical neural network based on thermal atomic nonlinearity. Photonics Res.; 2021; 9, pp. B128-B134.
84. Yang, M; Robertson, E; Esguerra, L; Busch, K; Wolters, J. Optical convolutional neural network with atomic nonlinearity. Opt. Express; 2023; 31, pp. 16451-16459.2023OExpr.3116451Y
85. Shi, W et al. Lensless opto-electronic neural network with quantum dot nonlinear activation. Photonics Res.; 2024; 12, pp. 682-690.
86. Wang, T et al. Image sensing with multilayer nonlinear optical neural networks. Nat. Photonics; 2023; 17, pp. 408-415.2023NaPho.17.408W
87. Song, A; Murty Kottapalli, SN; Goyal, R; Schölkopf, B; Fischer, P. Low-power scalable multilayer optoelectronic neural networks enabled with incoherent light. Nat. Commun.; 2024; 15, 10692.
88. Yildirim, M; Dinc, NU; Oguz, I; Psaltis, D; Moser, C. Nonlinear processing with linear optics. Nat. Photonics; 2024; 18, pp. 1076-1082.
89. Xia, F et al. Nonlinear optical encoding enabled by recurrent linear scattering. Nat. Photonics; 2024; 18, pp. 1067-1075.
90. Teğin, U. Scalable optical learning operator. Nat. Comput. Sci.; 2021; 1, pp. 542-549.2021NatCo.12.542T
91. Wang, H. et al. Large-scale photonic computing with nonlinear disordered media. Nat. Comput. Sci.4, 429–439 (2024).
92. Liu, Q., Swartz, B. T., Kravchenko, I., Valentine, J. G. & Huo, Y. Extrememeta: high-speed lightweight image segmentation model by remodeling multi-channel metamaterial imagers. Journal of Imaging Science and Technology, 1–10 (2024).
93. Zhou, F; Chai, Y. Near-sensor and in-sensor computing. Nat. Electron.; 2020; 3, pp. 664-671.
94. Wang, S. et al. Networking retinomorphic sensor with memristive crossbar for brain-inspired visual perception. Natl. Sci. Rev.8, nwaa172 (2021).
95. Yang, Y et al. In-sensor dynamic computing for intelligent machine vision. Nat. Electron.; 2024; 7, pp. 225-233.
96. Fan, Y. et al. Dispersion-assisted high-dimensional photodetector. Nature630, 77–83 (2024).
97. Jiang, H et al. Metasurface-enabled broadband multidimensional photodetectors. Nat. Commun.; 2024; 15, 8347.
98. Hassanalian, M; Abdelkefi, A. Classifications, applications, and design challenges of drones: a review. Prog. Aerosp. Sci.; 2017; 91, pp. 99-131.
99. Kaufmann, E et al. Champion-level drone racing using deep reinforcement learning. Nature; 2023; 620, pp. 982-987.2023Natur.620.982K
100. Woodget, AS; Austrums, R; Maddock, IP; Habit, E. Drones and digital photogrammetry: from classifications to continuums for monitoring river habitat and hydromorphology. Wiley Interdiscip. Rev. Water; 2017; 4, e1222.
101. Zeng, E., Mare, S. & Roesner, F. End user security and privacy concerns with smart homes. Thirteenth Symposium on Usable Privacy and Security, 65–80 (2017).
102. Kim, S; Hsiao, Y-H; Chen, Y; Mao, J; Chen, Y. Firefly: an insect-scale aerial robot powered by electroluminescent soft artificial muscles. IEEE Robot. Autom. Lett.; 2022; 7, pp. 6950-6957.
© The Author(s) 2025. This article is published under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-nc-nd/4.0/).