Introduction
The computer-aided design of functional molecules, such as those related to materials and drugs, has gained increasing interest in both scientific and industrial communities1–4. A central concept in functional molecule design is molecular editing, which encompasses the generation, modification and evolution of molecules towards desired properties with specific structural features. A common demand in drug design is the alteration or optimization of a lead compound to enhance its potential activity and its suitability for development into a better drug candidate5–7. However, such function-oriented molecular editing is challenging because it poses a non-linear constrained optimization problem within the vast chemical space. Consequently, conventional in-silico lead optimization typically involves resource-intensive trial-and-error screening and relies on specific expert knowledge8–10.
Recent advances in diffusion-based generative artificial intelligence (GenAI)11,12 have brought significant progress in image editing. In computer vision (CV) in particular, scalable GenAIs built upon converged architectures13,14, usually termed foundation models, have pushed the boundaries of applications such as text-to-image generation, image inpainting, compositing and style transfer15–17. The success of GenAI in image processing demonstrates the potential of applying well-established generative learning algorithms to molecular sciences, offering promising solutions to the challenges of molecular editing. Unfortunately, these powerful GenAIs cannot be applied directly to molecular generation because, unlike images, 3D molecular entities are strictly constrained by intrinsic physical and chemical principles. In particular, in addition to translational-rotational equivariance, which is known to cause incompatibility with modern foundation model architectures18,19, molecules also exhibit ubiquitous and property-determining symmetries embedded within various point groups20. To address these challenges, many researchers are developing domain-specific GenAIs for molecules, at the cost of scalability or compatibility with existing foundation models. These attempts mostly focus on designing new model architectures based on domain-specific priors and assumptions, and they diverge significantly from mainstream GenAI, which has largely converged. Such a gap in technical momentum not only causes incompatibility with the rapid progress of mainstream GenAI, but also obstructs the transfer of well-established GenAI methodologies into molecular science.
Aiming at a more general approach that keeps pace with the momentum of foundational GenAI, we develop here a methodology that allows reuse...