Abstract

Performing complex computations on high-volume streaming data in real time is a challenge for many organizational data processing systems. Such systems must produce results with low latency while processing billions of messages daily. Distributed stream processing systems have been developed to address these requirements. Although high performance is one of the main goals of these systems, less attention has been paid to inter-node communication performance, which is key to overall system performance. In this thesis we describe a framework for enhancing inter-node communication efficiency. We compare the performance of our system with Twitter Storm and Yahoo S4 using a two-node graph implementation of the Pan-Tompkins algorithm, which detects the QRS complexes of an ECG signal. Our results show that our solution performs 4 times better than the other systems. We also use a four-level node graph that processes smart plug data to test the performance of our system on a complex graph. Finally, we demonstrate that our system is scalable and resilient to faults.

Details

Title
Achieving high-throughput distributed, graph-based multi-stage stream processing
Author
Suriarachchi, Amila
Year
2015
Publisher
ProQuest Dissertations & Theses
ISBN
978-1-339-39233-2
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
1757999334
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.