Content area

Abstract

The flood of protein structural Big Data is coming. With the belief that biotech researchers deserve powerful analysis engines to overcome the challenge of rapidly increasing computational demands, we are devoted to developing efficient protein structural alignment search algorithms to assist researchers as they push the frontiers of biological sciences and technology. Here, we present SARST2, an algorithm that integrates primary, secondary, and tertiary structural features with evolutionary statistics to perform accurate and rapid alignments. In large-scale benchmarks, SARST2 outperforms state-of-the-art methods in accuracy, while completing AlphaFold Database searches significantly faster and with substantially less memory than BLAST and Foldseek. It employs a filter-and-refine strategy enhanced by machine learning, a diagonal shortcut for word-matching, a weighted contact number-based scoring scheme, and a variable gap penalty based on substitution entropy. SARST2, implemented in Golang as standalone programs available at https://10lab.ceb.nycu.edu.tw/sarst2 and https://github.com/NYCU-10lab/sarst, enables massive database searches using even ordinary personal computers.

SARST2 enables rapid exploration of protein structure space. In minutes, it scans the 214-million-entry AlphaFold Database on a personal computer, revealing homologs with higher accuracy and lower memory/disk usage than leading methods.

Full text

Turn on search term navigation

© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.