Abstract

Processing complex computations on high-volume streaming data in real time is a challenge for many organizational data processing systems. Such systems should produce results with low latency while processing billions of messages daily. To address these requirements, distributed stream processing systems have been developed. Although high performance is one of the main goals of these systems, less attention has been paid to inter-node communication performance, which is a key factor in overall system performance. In this thesis we describe a framework for enhancing inter-node communication efficiency. We compare the performance of our system with Twitter Storm and Yahoo S4 using an implementation of the Pan-Tompkins algorithm, which detects the QRS complexes of an ECG signal, deployed as a two-node graph. Our results show that our solution performs four times better than the other systems. We also use a four-level node graph that processes smart plug data to test the performance of our system on a complex graph. Finally, we demonstrate that our system is scalable and resilient to faults.

Details

1010268
Classification
Title
Achieving high-throughput distributed, graph-based multi-stage stream processing
Number of pages
39
Degree date
2015
School code
0053
Source
MAI 55/03M(E), Masters Abstracts International
ISBN
978-1-339-39233-2
Committee member
Pallickara, Sangmi L.; Venkatachalam, Chandrasekaran
University/institution
Colorado State University
Department
Computer Science
University location
United States -- Colorado
Degree
M.S.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
1606492
ProQuest document ID
1757999334
Document URL
https://www.proquest.com/dissertations-theses/achieving-high-throughput-distributed-graph-based/docview/1757999334/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic