Abstract

We present CosmoHub (https://cosmohub.pic.es), a web application based on Hadoop to perform interactive exploration and distribution of massive cosmological datasets. Modern cosmology seeks to unveil the nature of both dark matter and dark energy by mapping the large-scale structure of the Universe through the analysis of massive amounts of astronomical data, which have grown steadily over recent decades, and are expected to keep growing, with the digitization and automation of experimental techniques. CosmoHub, hosted and developed at the Port d'Informació Científica (PIC), provides support to a worldwide community of scientists, without requiring the end user to know any Structured Query Language (SQL). It serves data from several large international collaborations such as the Euclid space mission, the Dark Energy Survey (DES), the Physics of the Accelerating Universe Survey (PAUS) and the Marenostrum Institut de Ciències de l'Espai (MICE) numerical simulations. While CosmoHub was originally developed as a web frontend to a PostgreSQL relational database, this work describes its current version, built on top of Apache Hive, which facilitates reading, writing and managing huge datasets at scale. As CosmoHub's datasets are seldom modified, Hive is a better fit. Over 60 TiB of catalogued information and \(50 \times 10^9\) astronomical objects can be interactively explored using an integrated visualization tool which includes 1D histogram and 2D heatmap plots. In our current implementation, online exploration of datasets of \(10^9\) objects takes on the order of tens of seconds. Users can also download customized subsets of data in standard formats, generated in a few minutes.

Details

Title
CosmoHub: Interactive exploration and distribution of astronomical data on Hadoop
Publication title
arXiv.org; Ithaca
Publication year
2020
Publication date
Mar 10, 2020
Section
Computer Science; Astrophysics; Physics (Other)
Publisher
Cornell University Library, arXiv.org
Source
arXiv.org
Place of publication
Ithaca
Country of publication
United States
University/institution
Cornell University Library arXiv.org
e-ISSN
2331-8422
Source type
Working Paper
Language of publication
English
Document type
Working Paper
Publication history
Online publication date
2020-03-11
Milestone dates
2020-03-04 (Submission v1); 2020-03-10 (Submission v2)
First posting date
11 Mar 2020
ProQuest document ID
2374911174
Document URL
https://www.proquest.com/working-papers/cosmohub-interactive-exploration-distribution/docview/2374911174/se-2?accountid=208611
Copyright
© 2020. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-03-29
Database
ProQuest One Academic