Introduction
By selecting metals and organic linkers, we can potentially synthesize millions of possible metal-organic frameworks (MOFs)1. Given this high chemical tunability, MOFs can be tailor-made for various applications. Over the past two decades, MOFs have been explored extensively for applications ranging from catalysis2 and gas separation3 to sensing and electronics4, and more than 15,000 porous MOFs5,6 and 120,000 nonporous MOFs7 have been synthesized. With the integration of automation and artificial intelligence (AI) into chemical synthesis8–10, we can expect the discovery rate of new MOFs to grow even further.
Despite this significant progress, connecting newly synthesized materials to their best possible applications remains a challenge11. Determining the properties of a material requires extensive characterization and testing, often demanding expertise, resources, and infrastructure that may not be available to the researchers who synthesized the material. This hinders the realization of the full potential of new materials. Several materials have turned out to be most effective in applications other than those initially intended. Al-PMOF, initially synthesized for its photocatalytic properties12, was found only years later to be highly effective at separating CO2 from wet flue gases13. Similarly, SBMOF-1, originally created for CO2 capture14, turned out to be exceptional at separating xenon from krypton. These remarkable rediscoveries were enabled by high-throughput computational screening and machine learning studies conducted years after the initial studies were published11,13,15–17. However, such methods require precise crystal structure information, often in a computation-ready format, which is difficult to obtain and generally unavailable immediately after a new MOF is synthesized5,18,19. Therefore, methods that use only the data available upon synthesis can greatly accelerate the matching of materials to potential applications.
In this work, we present a multimodal model that utilizes data readily available at the point of MOF synthesis, specifically the powder X-ray diffraction (PXRD) pattern, represented as a spectrum, and the chemical precursors (metal and linker), encoded as text strings. To enhance the model’s performance (particularly on small datasets), we leverage existing MOF structures from databases, represented as crystal graphs, to pretrain the model using a self-supervised learning framework. This pretraining enables the model to achieve high...
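The two synthesis-time inputs described above can be made concrete with a short sketch. It assumes that a PXRD pattern arrives as a list of (2θ, intensity) peaks, which are broadened with Gaussian profiles onto a fixed 2θ grid to give a fixed-length spectrum, and that the metal and linker (the linker as a SMILES string) are joined into a single text string. The function names `pxrd_to_spectrum` and `precursor_text`, the 5–40° window, the 0.2° peak width, and the dot-joined string format are all hypothetical choices for illustration, not the encoding used in this work; the crystal-graph pretraining step is not shown.

```python
import math


def pxrd_to_spectrum(peaks, two_theta_min=5.0, two_theta_max=40.0,
                     n_points=351, fwhm=0.2):
    """Broaden (2-theta, intensity) peaks into a fixed-length spectrum.

    Hypothetical encoding: each peak becomes a Gaussian with the given
    full width at half maximum (degrees 2-theta) on a uniform grid, and
    the summed spectrum is scaled so that its maximum equals 1.
    """
    # Convert FWHM to the Gaussian standard deviation.
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    step = (two_theta_max - two_theta_min) / (n_points - 1)
    grid = [two_theta_min + i * step for i in range(n_points)]
    spectrum = [0.0] * n_points
    for center, intensity in peaks:
        for i, x in enumerate(grid):
            spectrum[i] += intensity * math.exp(-0.5 * ((x - center) / sigma) ** 2)
    # Max-normalize so patterns measured on different scales are comparable.
    peak_max = max(spectrum)
    if peak_max > 0:
        spectrum = [y / peak_max for y in spectrum]
    return grid, spectrum


def precursor_text(metal, linker_smiles):
    """Join metal and linker into one string (hypothetical format)."""
    return f"{metal}.{linker_smiles}"


# Usage with two made-up peaks and terephthalic acid as the linker.
grid, spec = pxrd_to_spectrum([(6.8, 100.0), (9.7, 40.0)])
text = precursor_text("Zn", "OC(=O)c1ccc(cc1)C(=O)O")
```

A fixed grid gives every pattern the same length, which is what a standard neural-network encoder for one-dimensional spectra expects; the string form leaves tokenization to whatever text encoder is used downstream.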




