Stateful data-parallel processing

Castro Fernandez, Raul. Imperial College London (United Kingdom). ProQuest Dissertations & Theses, 2016. 10175343.

Abstract (summary)

Democratisation of data means that more people than ever are involved in the data analysis process. This is beneficial because it brings domain-specific knowledge from a broad range of fields, but data scientists do not have adequate tools to write algorithms and execute them at scale. The processing models of current data-parallel processing systems, designed for scalability and fault tolerance, are stateless. Stateless processing makes it easier to capture parallelisation opportunities and hides fault tolerance. However, data scientists want to write stateful programs, with explicit state that they can update (such as matrices in machine learning algorithms), and are used to imperative-style languages. These programs struggle to execute with high performance in stateless data-parallel systems. Representing state explicitly, in turn, makes data-parallel processing at scale challenging: to achieve scalability, state must be distributed and coordinated across machines, and in the event of failures, state must be recovered to provide correct results. We introduce stateful data-parallel processing, which addresses these challenges by: (i) representing state as a first-class citizen so that a system can manipulate it; (ii) introducing two distributed mutable state abstractions for scalability; and (iii) providing an integrated approach to scale out and fault tolerance that recovers large state spanning the memory of multiple machines. To support imperative-style programs, a static analysis tool analyses Java programs that manipulate state and translates them to a representation that can execute on SEEP, an implementation of a stateful data-parallel processing model. SEEP is evaluated with stateful Big Data applications and shows performance comparable to or better than state-of-the-art stateless systems.
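
As a rough illustration of the ideas summarised in the abstract (explicit mutable state, partitioning of that state, and recovery from a checkpoint), the following Java sketch may help. It is not taken from the thesis and does not use SEEP's API: the class `PartitionedStateSketch`, the nested `PartitionedCounts` type, and all of its methods are hypothetical names chosen for this example, and the "failure" is simulated in a single process by clearing one partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these names come from SEEP. */
public class PartitionedStateSketch {

    /** Mutable state split by key hash into independent partitions. */
    static final class PartitionedCounts {
        private final List<Map<String, Long>> partitions = new ArrayList<>();

        PartitionedCounts(int numPartitions) {
            for (int i = 0; i < numPartitions; i++) {
                partitions.add(new HashMap<>());
            }
        }

        /** Maps a key to the partition that owns it. */
        int partitionOf(String key) {
            return Math.floorMod(key.hashCode(), partitions.size());
        }

        /** Imperative, in-place update of the owning partition. */
        void increment(String key, long delta) {
            partitions.get(partitionOf(key)).merge(key, delta, Long::sum);
        }

        long get(String key) {
            return partitions.get(partitionOf(key)).getOrDefault(key, 0L);
        }

        /** Copies one partition; stands in for a checkpoint. */
        Map<String, Long> checkpoint(int partition) {
            return new HashMap<>(partitions.get(partition));
        }

        /** Simulates losing the contents of one partition. */
        void fail(int partition) {
            partitions.get(partition).clear();
        }

        /** Replaces one partition with a previously taken snapshot. */
        void restore(int partition, Map<String, Long> snapshot) {
            partitions.set(partition, new HashMap<>(snapshot));
        }
    }

    public static void main(String[] args) {
        PartitionedCounts counts = new PartitionedCounts(4);
        for (String word : "a b a c b a".split(" ")) {
            counts.increment(word, 1);
        }

        int p = counts.partitionOf("a");
        Map<String, Long> snapshot = counts.checkpoint(p);

        counts.fail(p);               // state for "a" is lost
        counts.restore(p, snapshot);  // and recovered from the snapshot

        System.out.println("a -> " + counts.get("a")); // prints "a -> 3"
    }
}
```

In a distributed setting each partition would live on a different machine and the snapshot would be stored remotely; the sketch keeps everything in one process only to show the shape of the update, checkpoint, and restore steps.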

Indexing (details)


Identifier / keyword: (UMI)AAI10175343; Social sciences
Title: Stateful data-parallel processing
Author: Castro Fernandez, Raul
Number of pages: 0
Degree date: 2016
School code: 8350
Source: DAI-C 74/12, Dissertation Abstracts International
University/institution: Imperial College London (United Kingdom)
University location: England
Degree: Ph.D.
Source type: Dissertation or Thesis
Language: English
Document type: Dissertation/Thesis
Note: Bibliographic data provided by EThOS, the British Library’s UK thesis service: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.684352
Dissertation/thesis number: 10175343
ProQuest document ID: 1827578565
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL: https://www.proquest.com/pqdtglobal/docview/1827578565