SparkBLAST: scalable BLAST processing using in-memory operations

Abstract

Background

The demand for processing ever-increasing amounts of genomic data has raised new challenges for the implementation of highly scalable and efficient computational systems. In this paper we propose SparkBLAST, a parallelization of a sequence alignment application (BLAST) that employs cloud computing to provision computational resources and Apache Spark as the coordination framework. As a proof of concept, a set of radionuclide-resistant bacterial genomes was selected for similarity analysis.
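SparkBLAST's own code is not reproduced in this record. What follows is a minimal Python sketch, under stated assumptions, of one common way to parallelize BLAST-style searches. Query sequences are split into partitions, and a separate worker searches each partition in memory against a shared reference set. The per-partition hits are then merged. The thread pool stands in for Spark executors. `toy_score` (a count of shared 3-mers) is a hypothetical placeholder for a real BLAST alignment, and all function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (sequence id, sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (id, sequence) pairs, joining wrapped lines."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def partition(items: list, n: int) -> List[list]:
    """Round-robin split into n partitions (analogous to RDD partitions)."""
    return [items[i::n] for i in range(n)]


def kmers(seq: str, k: int) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_score(query: str, subject: str, k: int = 3) -> int:
    """Hypothetical placeholder for BLAST: number of shared k-mers."""
    return len(kmers(query, k) & kmers(subject, k))


def search_partition(queries: List[Record], database: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Find the best-scoring subject for every query in one partition, in memory."""
    hits = []
    for qid, qseq in queries:
        best_id, best_score = max(
            ((sid, toy_score(qseq, sseq)) for sid, sseq in database.items()),
            key=lambda pair: pair[1],
        )
        hits.append((qid, best_id, best_score))
    return hits


def parallel_search(queries: List[Record], database: Dict[str, str], n_partitions: int = 4):
    """Search partitions concurrently, then merge and sort the hits."""
    parts = partition(queries, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = pool.map(search_partition, parts, [database] * len(parts))
    return sorted(hit for part in results for hit in part)
```

For example, `parallel_search(parse_fasta(">q1\nACGTACGT\n>q2\nTTTTGGGG\n"), {"s1": "ACGTACGTAA", "s2": "TTTTGGGGCC"})` pairs `q1` with `s1` and `q2` with `s2`. In a Spark deployment, the analogous step would distribute the partitions across executors and keep intermediate results in memory rather than writing them to local files. The Conclusions attribute SparkBLAST's advantage to this kind of reduction in local I/O.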

Results

Experiments on the Google and Microsoft Azure clouds demonstrated that SparkBLAST outperforms an equivalent Hadoop-based system in both speedup and execution time.

Conclusions

The superior performance of SparkBLAST is mainly due to the in-memory operations available through the Spark framework, which reduce the number of local I/O operations required for distributed BLAST processing.

Details

Title

SparkBLAST: scalable BLAST processing using in-memory operations

Author

Marcelo Rodrigo de Castro; Catherine dos Santos Tostes; Alberto M. R. Davila; Hermes Senger; Fabricio A. B. da Silva

Publication year

2017

Publication date

2017

Publisher

Springer Nature B.V.

e-ISSN

1471-2105

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1186/s12859-017-1723-8

ProQuest document ID

1915060472
