Abstract

Apache Hadoop has evolved significantly over the last few years, with more than 60 releases bringing new features. By implementing the MapReduce programming paradigm and leveraging HDFS, its distributed file system, Hadoop has become a reliable and fault-tolerant middleware for parallel and distributed computing over large datasets. Nevertheless, Hadoop may struggle under certain workloads, resulting in poor performance and high energy consumption. Users increasingly demand that high-performance computing solutions begin to address sustainability and limit power consumption. In this paper, we introduce HDFSH, a hybrid storage mechanism for HDFS, which uses a combination of hard disks (HDs) and solid-state disks (SSDs) to achieve higher performance while saving power in Hadoop computations. HDFSH brings to the middleware the best of HDs (affordable cost per GB and high storage capacity) and SSDs (high throughput and low energy consumption) in a configurable fashion, using dedicated storage zones for each storage device type. We implemented our mechanism as a block placement policy for HDFS and assessed it over six recent releases of the Hadoop project, representing different designs of the Hadoop middleware. Results indicate that our approach increases overall job performance while decreasing energy consumption under most of the hybrid configurations evaluated. Our results also show that in many cases storing only part of the data in SSDs results in significant energy savings and execution speedups.

Details

Title
Hybrid HDFS: decreasing energy consumption and speeding up Hadoop using SSDs
Author
Polato, Ivanilton; Barbosa, Denilson; Hindle, Abram; Kon, Fabio
Publication year
2015
Publication date
Aug 24, 2015
Publisher
PeerJ, Inc.
e-ISSN
2167-9843
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1960499795
Copyright
© 2015 Polato et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.