Content area

Abstract

Background

The Sequence Alignment/Map Format Specification (SAM) is one of the most widely adopted file formats in bioinformatics and many researchers use it daily. Several tools, including most high-throughput sequencing read aligners, use it as their primary output and many more tools have been developed to process it. However, despite its flexibility, SAM encoded files can often be difficult to query and understand even for experienced bioinformaticians. As genomic data are rapidly growing, structured, and efficient queries on data that are encoded in SAM/BAM files are becoming increasingly important. Existing tools are very limited in their query capabilities or are not efficient. Critically, new tools that address these shortcomings, should not be able to support existing large datasets but should also do so without requiring massive data transformations and file infrastructure reorganizations.

Results

Here we introduce SamQL, an SQL-like query language for the SAM format with intuitive syntax that supports complex and efficient queries on top of SAM/BAM files and that can replace commonly used Bash one-liners employed by many bioinformaticians. SamQL has high expressive power with no upper limit on query size and when parallelized, outperforms other substantially less expressive software.

Conclusions

SamQL is a complete query language that we envision as a step to a structured database engine for genomics. SamQL is written in Go, and is freely available as standalone program and as an open-source library under an MIT license, https://github.com/maragkakislab/samql/.

Details

1009240
Title
SamQL: a structured query language and filtering tool for the SAM/BAM file format
Publication title
Volume
22
Pages
1-8
Publication year
2021
Publication date
2021
Section
Software
Publisher
Springer Nature B.V.
Place of publication
London
Country of publication
Netherlands
Publication subject
e-ISSN
14712105
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Online publication date
2021-10-02
Milestone dates
2021-03-26 (Received); 2021-09-22 (Accepted)
Publication history
 
 
   First posting date
02 Oct 2021
ProQuest document ID
2582941463
Document URL
https://www.proquest.com/scholarly-journals/samql-structured-query-language-filtering-tool/docview/2582941463/se-2?accountid=208611
Copyright
© 2021. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2024-10-08
Database
ProQuest One Academic