Abstract

We present a model for the operation of computing nodes at a site using Virtual Machines (VMs), in which VMs are created and contextualized for experiments by the site itself. To the experiment, these VMs appear to be produced spontaneously "in the vacuum", rather than having to ask the site to create each one. This model takes advantage of the pilot job frameworks already adopted by many experiments. In the Vacuum model, the contextualization process starts a job agent within the VM, and real jobs are fetched from the central task queue as normal. We present Vac, an implementation of the Vacuum scheme in which a VM factory runs on each physical worker node to create and contextualize its set of VMs. With this system, each node's VM factory decides which experiments' VMs to run, based on site-wide target shares and on a peer-to-peer protocol in which the site's VM factories query each other to discover which VM types they are running. A consequence of this architecture is that there is no gatekeeper service, head node, or batch system accepting and then directing jobs to particular worker nodes, which avoids several central points of failure. Finally, we describe tests of the Vac system using jobs from the central LHCb task queue, with the same VM contextualization procedure that LHCb developed for Clouds.
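The per-node decision making described in the abstract can be illustrated with a minimal sketch: each factory compares the site-wide target shares against the mix of VM types currently running at the site, as reported by the other factories. The function name, data layout, and peer-query input below are illustrative assumptions, not the actual Vac implementation.

```python
# Minimal sketch of the per-node decision described in the abstract:
# each VM factory starts the experiment whose share of running VMs at the
# site falls furthest below its site-wide target share. Names and the
# peer-query mechanism here are assumptions, not the real Vac code.

from collections import Counter
from typing import Dict, List


def choose_vm_type(target_shares: Dict[str, float],
                   peer_vm_types: List[str]) -> str:
    """Return the experiment whose running VMs are most under-represented.

    target_shares -- site-wide target shares, e.g. {"lhcb": 0.7, "atlas": 0.3}
    peer_vm_types -- VM types reported by the site's factories, one entry per
                     running VM (gathered via the peer-to-peer queries the
                     paper describes).
    """
    running = Counter(peer_vm_types)
    total = sum(running.values())

    def deficit(experiment: str) -> float:
        # Fraction of running VMs belonging to this experiment ...
        current = running[experiment] / total if total else 0.0
        # ... compared with the fraction the site would like to run.
        return target_shares[experiment] - current

    # Start the VM type that is furthest below its target share.
    return max(target_shares, key=deficit)


if __name__ == "__main__":
    shares = {"lhcb": 0.7, "atlas": 0.3}
    peers = ["lhcb", "lhcb", "atlas", "atlas"]  # as reported by peer factories
    print(choose_vm_type(shares, peers))         # -> "lhcb"
```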

Details

Title
Running Jobs in the Vacuum
Author
McNab, A 1; Stagni, F 2; Ubeda Garcia, M 2

1 School of Physics and Astronomy, University of Manchester, UK
2 CERN, Switzerland
Publication year
2014
Publication date
Jun 2014
Publisher
IOP Publishing
ISSN
1742-6588
e-ISSN
1742-6596
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2576665331
Copyright
© 2014. This work is published under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.