Abstract

Accurately predicting how DNA sequence drives gene regulation and how genetic variants alter gene expression is a central challenge in genomics. Borzoi, which models over ten thousand genomic assays including RNA-seq coverage from over half a megabase of sequence context alone promises to become an important foundation model in regulatory genomics, both for massively annotating variants and for further model development. However, its reliance on handcrafted, relative positional encodings within the transformer architecture limits its computational efficiency. Here we present Flashzoi, an enhanced Borzoi model that leverages rotary positional encodings and FlashAttention-2. This achieves over 3-fold faster training and inference and up to 2.4-fold reduced memory usage, while maintaining or improving accuracy in modeling various genomic assays including RNA-seq coverage, predicting variant effects, and enhancer-promoter linking. Flashzoi's improved efficiency facilitates large-scale genomic analyses and opens avenues for exploring more complex regulatory mechanisms and modeling.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

* https://github.com/johahi/borzoi-pytorch

Details

Title
Flashzoi: An enhanced Borzoi model for accelerated genomic analysis
Author
Hingerl, Johannes C; Karollus, Alexander; Gagneur, Julien
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2024
Publication date
Dec 21, 2024
Publisher
Cold Spring Harbor Laboratory Press
Source type
Working Paper
Language of publication
English
ProQuest document ID
3147788671
Copyright
© 2024. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.