Abstract

DNA data storage is rapidly emerging as a promising solution for long-term data archiving, largely due to its exceptional durability. However, the synthesis of DNA strands remains a significant bottleneck in terms of cost and speed. To address this, new methods have been developed that encode information by assembling long data-carrying DNA sequences from pre-synthesized DNA subsequences, known as motifs, drawn from a library. Reading data back from DNA storage relies on basecalling, the process of translating raw nanopore sequencing signals into DNA base sequences using machine learning models. These sequences are then decoded back into binary data. However, current basecalling approaches are not optimized for decoding motif-carrying DNA: they first predict individual bases from the raw signal and only afterward attempt to identify higher-level motifs. This two-step, motif-agnostic process is both imprecise and inefficient. In this paper we introduce Motif Caller, a machine learning model designed to detect entire motifs directly from raw nanopore signals, bypassing the need for intermediate basecalling. By targeting motifs directly, Motif Caller leverages the richer signal features associated with each motif, resulting in significantly improved accuracy. This direct approach also enhances the efficiency of data retrieval in motif-based DNA storage systems.
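As a rough illustration of the motif-based scheme described in the abstract, the sketch below encodes bits by concatenating motifs from a library and then recovers them from an already basecalled sequence, which corresponds to the second stage of the conventional two-step pipeline. The motif sequences, the fixed motif length, the two-bits-per-motif mapping, and the function names are all assumptions made for this example; they are not taken from the paper, and the sketch does not represent Motif Caller itself.

```python
# Hypothetical 4-motif library (illustrative only; real libraries differ).
MOTIFS = ["ACGTAC", "TGCATG", "GATCGA", "CTAGCT"]
MOTIF_LEN = 6          # assumed fixed motif length
BITS_PER_MOTIF = 2     # log2(len(MOTIFS)) bits carried by each motif


def encode(bits: str) -> str:
    """Map each 2-bit group to a motif and concatenate the motifs."""
    if len(bits) % BITS_PER_MOTIF != 0:
        raise ValueError("bit string length must be a multiple of 2")
    return "".join(
        MOTIFS[int(bits[i:i + BITS_PER_MOTIF], 2)]
        for i in range(0, len(bits), BITS_PER_MOTIF)
    )


def decode_from_bases(bases: str) -> str:
    """Recover bits from an error-free base sequence by motif lookup.

    This is the motif identification step that follows basecalling in
    the two-step approach; it assumes no insertions, deletions, or
    substitutions in the basecalled sequence.
    """
    index = {motif: i for i, motif in enumerate(MOTIFS)}
    out = []
    for i in range(0, len(bases), MOTIF_LEN):
        chunk = bases[i:i + MOTIF_LEN]
        if chunk not in index:
            raise ValueError(f"unrecognized motif at offset {i}: {chunk!r}")
        out.append(format(index[chunk], "02b"))
    return "".join(out)


strand = encode("0110")
recovered = decode_from_bases(strand)
```

In this toy setting a single basecalling error makes the exact lookup fail, which hints at why identifying motifs only after per-base prediction can be fragile.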

Details

Title
Motif caller for sequence reconstruction in motif-based DNA storage
Author
Agarwal, Parv 1; Pinnamaneni, Nimesh 2; Heinis, Thomas 1

1 Department of Computing, Imperial College London, London, UK (ROR: https://ror.org/041kmwe10) (GRID: grid.7445.2) (ISNI: 0000 0001 2113 8111)
2 Helixworks Technologies, Cork, Ireland
Volume
15
Issue
1
Pages
39236
Number of pages
13
Publication year
2025
Publication date
2025
Section
Article
Publisher
Nature Publishing Group
Place of publication
London
Country of publication
United States
e-ISSN
20452322
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-11-10
Milestone dates
2025-10-01 (Registration); 2025-07-10 (Received); 2025-10-01 (Accepted)
First posting date
10 Nov 2025
ProQuest document ID
3270646425
Document URL
https://www.proquest.com/scholarly-journals/motif-caller-sequence-reconstruction-based-dna/docview/3270646425/se-2?accountid=208611
Copyright
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-11-11
Database
ProQuest One Academic