Abstract

Motivation: Accurate sequence length profiling is essential in bioinformatics, particularly in genomics and proteomics. Existing tools such as SeqKit and the Trinity toolkit, among others, provide basic sequence statistics but often fall short in offering comprehensive analytics and plotting options. For instance, SeqKit is a very complete and fast tool for sequence analysis that delivers useful metrics (e.g., number of sequences; average, minimum, and maximum length) and can return the sequences shorter or longer than a given length (one side at a time, not both at once). Similarly, Trinity's Perl-based utility scripts provide detailed contig length distributions (e.g., N50, median, and average lengths) but neither report the total number of sequences nor offer graphical representations of the data.

Results: Given that key sequence analysis tasks are distributed among separate tools, we introduce SeqLengthPlot: an easy-to-use Python-based script that fills existing gaps in bioinformatics tools for sequence length profiling. SeqLengthPlot generates comprehensive statistical summaries, filters the input FASTA file (nucleotide or protein) and automatically retrieves its sequences into two distinct files based on a tunable, user-defined sequence length, and produces plots or dynamic visualizations of the corresponding sequences.
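The workflow described in the abstract (length statistics plus a split at a user-defined length) can be illustrated with a minimal sketch. This is not the SeqLengthPlot script itself: the function names `parse_fasta`, `n50`, `summarize`, and `split_by_length`, the in-memory example data, and the choice to place sequences exactly at the cutoff in the "longer" group are all assumptions made for illustration, and plotting is omitted.

```python
from statistics import mean, median


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lengths):
    """Basic length statistics for a non-empty list of sequence lengths."""
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "n50": n50(lengths),
    }


def split_by_length(records, cutoff):
    """Split records into (shorter than cutoff, cutoff or longer).

    Which side the cutoff itself belongs to is an assumption of this sketch.
    """
    below = [(h, s) for h, s in records if len(s) < cutoff]
    at_or_above = [(h, s) for h, s in records if len(s) >= cutoff]
    return below, at_or_above


# Hypothetical three-record input; seq2 spans two lines.
EXAMPLE = """>seq1
ACGT
>seq2
ACGTACGTAC
GG
>seq3
ACGTAC
"""

records = list(parse_fasta(EXAMPLE))
lengths = [len(seq) for _, seq in records]
stats = summarize(lengths)
below, at_or_above = split_by_length(records, cutoff=6)

print(stats)
print("below cutoff:", [h for h, _ in below])
print("at or above cutoff:", [h for h, _ in at_or_above])
```

In a real run, the two groups would be written to separate FASTA files and the length list passed to a plotting library; both steps are left out here to keep the sketch dependency-free.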

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

* In this version of the preprint, I have included the SeqLengthPlot Python script, along with the input and output files used for the analysis, to facilitate reproducibility of this previous version. Please note that a new version of SeqLengthPlot, introducing significant enhancements such as command-line functionality through flags, has been released alongside the corresponding peer-reviewed article (https://academic.oup.com/bioinformaticsadvances/article/5/1/vbae183/7905457). The GitHub repository has also been updated to reflect the latest version of the script. This update aims to avoid confusion for users cloning the repository and provides resources for testing the new version.

* https://github.com/danydguezperez/SeqLengthPlot

* http://dx.doi.org/10.17632/pmxwfjyyvy.1

* http://dx.doi.org/10.17632/3rtbr7c9s8.1

* http://dx.doi.org/10.17632/wn5kbk5ryy.1

* http://dx.doi.org/10.17632/sh79mdcm2c.1

* http://dx.doi.org/10.17632/zmvvff35dx.1

Details

1009240
Title: SeqLengthPlot: An easy-to-use Python-based Tool for Visualizing and Retrieving Sequence Lengths from fasta files with a Tunable Splitting Point
Publication title: bioRxiv; Cold Spring Harbor
Publication year: 2025
Publication date: Jan 15, 2025
Section: Confirmatory Results
Publisher: Cold Spring Harbor Laboratory Press
Source: BioRxiv
Place of publication: Cold Spring Harbor
Country of publication: United States
University/institution: Cold Spring Harbor Laboratory Press
Publication subject:
ISSN: 2692-8205
Source type: Working Paper
Language of publication: English
Document type: Working Paper
Publication history
Milestone dates: 2024-06-09 (Version 1)
ProQuest document ID: 3155862245
Document URL: https://www.proquest.com/working-papers/seqlengthplot-easy-use-python-based-tool/docview/3155862245/se-2?accountid=208611
Copyright: © 2025. This article is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated: 2025-01-16
Database: ProQuest One Academic