Abstract

Translate

Democratisation of data means that more people than ever are involved in the data analysis process. This is beneficial - it brings domain-specific knowledge from broad fields - but data scientists do not have adequate tools to write algorithms and execute them at scale. Processing models of current data-parallel processing systems, designed for scalability and fault tolerance, are stateless. Stateless processing facilitates capturing parallelisation opportunities and hides fault tolerance. However, data scientists want to write stateful programs - with explicit state that they can update, such as matrices in machine learning algorithms - and are used to imperative-style languages. These programs struggle to execute with high-performance in stateless data-parallel systems. Representing state explicitly makes data-parallel processing at scale challenging. To achieve scalability, state must be distributed and coordinated across machines. In the event of failures, state must be recovered to provide correct results. We introduce stateful data-parallel processing that addresses the previous challenges by: (i) representing state as a first-class citizen so that a system can manipulate it; (ii) introducing two distributed mutable state abstractions for scalability; and (iii) an integrated approach to scale out and fault tolerance that recovers large state - spanning the memory of multiple machines. To support imperative-style programs a static analysis tool analyses Java programs that manipulate state and translates them to a representation that can execute on SEEP, an implementation of a stateful data-parallel processing model. SEEP is evaluated with stateful Big Data applications and shows comparable or better performance than state-of-the-art stateless systems.

Details

Title

Stateful data-parallel processing

Author

Castro Fernandez, Raul

Year

2016

Publisher

ProQuest Dissertations & Theses

Source type

Dissertation or Thesis

Language of publication

English

ProQuest document ID

1827578565

Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.

Stateful data-parallel processing

Content area

Abstract

Details

Suggested sources