Abstract

Natural language-based generative artificial intelligence (AI) has become increasingly prevalent in scientific research. Intriguingly, capabilities of generative pre-trained transformer (GPT) language models beyond the scope of natural language tasks have recently been identified. Here we explored how GPT-4 might be able to perform rudimentary structural biology modeling. We prompted GPT-4 to model 3D structures for the 20 standard amino acids and an α-helical polypeptide chain, with the latter incorporating Wolfram mathematical computation. We also used GPT-4 to perform structural interaction analysis between the antiviral nirmatrelvir and its target, the SARS-CoV-2 main protease. Geometric parameters of the generated structures typically closely approximated experimental references. However, modeling was sporadically error-prone and molecular complexity was not well tolerated. Interaction analysis further revealed the ability of GPT-4 to identify specific amino acid residues involved in ligand binding along with corresponding bond distances. Despite these limitations, we show the current capacity of natural language generative AI to perform basic structural biology modeling and interaction analysis with atomic-scale accuracy.

Details

Title
Generative artificial intelligence performs rudimentary structural biology modeling
Author
Ille, Alexander M. 1 ; Markosian, Christopher 1 ; Burley, Stephen K. 2 ; Mathews, Michael B. 3 ; Pasqualini, Renata 4 ; Arap, Wadih 5 

1 Rutgers, The State University of New Jersey, School of Graduate Studies, Newark, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796); Rutgers Cancer Institute, Newark, USA (GRID:grid.516084.e) (ISNI:0000 0004 0405 0718); Rutgers New Jersey Medical School, Division of Cancer Biology, Department of Radiation Oncology, Newark, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796)
2 Rutgers, The State University of New Jersey, Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Piscataway, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796); Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, Piscataway, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796); Rutgers Cancer Institute, New Brunswick, USA (GRID:grid.516084.e) (ISNI:0000 0004 0405 0718); University of California-San Diego, Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, San Diego, USA (GRID:grid.266100.3) (ISNI:0000 0001 2107 4242)
3 Rutgers, The State University of New Jersey, School of Graduate Studies, Newark, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796); Rutgers New Jersey Medical School, Division of Infectious Disease, Department of Medicine, Newark, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796)
4 Rutgers Cancer Institute, Newark, USA (GRID:grid.516084.e) (ISNI:0000 0004 0405 0718); Rutgers New Jersey Medical School, Division of Cancer Biology, Department of Radiation Oncology, Newark, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796)
5 Rutgers Cancer Institute, Newark, USA (GRID:grid.516084.e) (ISNI:0000 0004 0405 0718); Rutgers New Jersey Medical School, Division of Hematology/Oncology, Department of Medicine, Newark, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796)
Pages
19372
Publication year
2024
Publication date
2024
Publisher
Nature Publishing Group
e-ISSN
20452322
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3095332045
Copyright
© The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.