
Abstract

The flood of protein structural Big Data is coming. With the belief that biotech researchers deserve powerful analysis engines to overcome the challenge of rapidly increasing computational demands, we are devoted to developing efficient protein structural alignment search algorithms to assist researchers as they push the frontiers of biological sciences and technology. Here, we present SARST2, an algorithm that integrates primary, secondary, and tertiary structural features with evolutionary statistics to perform accurate and rapid alignments. In large-scale benchmarks, SARST2 outperforms state-of-the-art methods in accuracy, while completing AlphaFold Database searches significantly faster and with substantially less memory than BLAST and Foldseek. It employs a filter-and-refine strategy enhanced by machine learning, a diagonal shortcut for word-matching, a weighted contact number-based scoring scheme, and a variable gap penalty based on substitution entropy. SARST2, implemented in Golang as standalone programs available at https://10lab.ceb.nycu.edu.tw/sarst2 and https://github.com/NYCU-10lab/sarst, enables massive database searches even on ordinary personal computers.
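For readers unfamiliar with the weighted contact number (WCN) mentioned above, the sketch below computes the commonly used per-residue WCN profile, WCN_i = Σ_{j≠i} 1/r_ij², where r_ij is the Cα-Cα distance between residues i and j. It is written in Go to match SARST2's implementation language, but it is not SARST2's code: the type `Vec3` and the function `WeightedContactNumbers` are hypothetical names, and how SARST2 converts WCN values into alignment scores is not described in this record and is not reproduced here.

```go
package main

import "fmt"

// Vec3 holds the Cartesian coordinates (in ångströms) of one residue's Cα atom.
// Hypothetical type for illustration; not taken from the SARST2 source.
type Vec3 struct{ X, Y, Z float64 }

// WeightedContactNumbers returns, for each residue i, the sum over all other
// residues j of 1/r_ij^2, where r_ij is the Cα-Cα distance.
// Each pair is visited once and its weight is added to both residues.
func WeightedContactNumbers(ca []Vec3) []float64 {
	wcn := make([]float64, len(ca))
	for i := 0; i < len(ca); i++ {
		for j := i + 1; j < len(ca); j++ {
			dx := ca[i].X - ca[j].X
			dy := ca[i].Y - ca[j].Y
			dz := ca[i].Z - ca[j].Z
			d2 := dx*dx + dy*dy + dz*dz // squared distance; no sqrt needed for 1/r^2
			if d2 == 0 {
				continue // skip coincident atoms to avoid division by zero
			}
			w := 1.0 / d2
			wcn[i] += w
			wcn[j] += w
		}
	}
	return wcn
}

func main() {
	// Toy example: three Cα atoms on a straight line, 3.8 Å apart.
	ca := []Vec3{{0, 0, 0}, {3.8, 0, 0}, {7.6, 0, 0}}
	for i, w := range WeightedContactNumbers(ca) {
		fmt.Printf("residue %d: WCN = %.4f\n", i, w)
	}
}
```

Residues buried in the protein core have many close neighbours and therefore large WCN values, while exposed surface residues have small ones; this is why WCN is often used as a simple, coordinate-derived descriptor of residue packing. The naive double loop is O(n²); a production tool would likely use a distance cutoff or spatial indexing instead.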

SARST2 enables rapid exploration of protein structure space. In minutes, it scans the 214-million-entry AlphaFold Database on a personal computer, revealing homologs with higher accuracy and lower memory/disk usage than leading methods.

Details

Title
SARST2 high-throughput and resource-efficient protein structure alignment against massive databases
Author
Lo, Wei-Cheng 1; Warshel, Arieh 2; Lo, Chia-Hua 3; Choke, Chia Yee 4; Li, Yan-Jie 5; Yen, Shih-Chung 6; Yang, Jyun-Yi 4; Weng, Shih-Wen 7

1 Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan (ROR: https://ror.org/00se2k293) (GRID: grid.260539.b) (ISNI: 0000 0001 2059 7017); Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan (ROR: https://ror.org/00se2k293) (GRID: grid.260539.b) (ISNI: 0000 0001 2059 7017); Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, Hsinchu, Taiwan (ROR: https://ror.org/00se2k293) (GRID: grid.260539.b) (ISNI: 0000 0001 2059 7017); The Center for Bioinformatics Research, National Yang Ming Chiao Tung University, Hsinchu, Taiwan (ROR: https://ror.org/00se2k293) (GRID: grid.260539.b) (ISNI: 0000 0001 2059 7017)
2 Department of Chemistry, University of Southern California, Los Angeles, CA, USA (ROR: https://ror.org/03taz7m60) (GRID: grid.42505.36) (ISNI: 0000 0001 2156 6853)
3 Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan (ROR: https://ror.org/00se2k293) (GRID: grid.260539.b) (ISNI: 0000 0001 2059 7017); Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, Taiwan (ROR: https://ror.org/00zdnkx70) (GRID: grid.38348.34) (ISNI: 0000 0004 0532 0580)
4 Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan (ROR: https://ror.org/00se2k293) (GRID: grid.260539.b) (ISNI: 0000 0001 2059 7017)
5 Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan (ROR: https://ror.org/00se2k293) (GRID: grid.260539.b) (ISNI: 0000 0001 2059 7017)
6 Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan (ROR: https://ror.org/00se2k293) (GRID: grid.260539.b) (ISNI: 0000 0001 2059 7017); Department of Chemistry, University of Southern California, Los Angeles, CA, USA (ROR: https://ror.org/03taz7m60) (GRID: grid.42505.36) (ISNI: 0000 0001 2156 6853)
7 Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan (ROR: https://ror.org/00se2k293) (GRID: grid.260539.b) (ISNI: 0000 0001 2059 7017); Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan (ROR: https://ror.org/00se2k293) (GRID: grid.260539.b) (ISNI: 0000 0001 2059 7017)
Publication title
Nature Communications
Volume
16
Issue
1
Pages
8691
Number of pages
16
Publication year
2025
Publication date
2025
Section
Article
Publisher
Nature Publishing Group
Place of publication
London
Country of publication
United States
e-ISSN
2041-1723
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Online publication date
2025-09-30
Milestone dates
2024-03-02 (Received); 2025-08-26 (Accepted); 2025-08-28 (Registration)
First posting date
30 Sep 2025
ProQuest document ID
3255958618
Document URL
https://www.proquest.com/scholarly-journals/sarst2-high-throughput-resource-efficient-protein/docview/3255958618/se-2?accountid=208611
Copyright
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-10-01
Database
ProQuest One Academic