Abstract

XML parsing is a core operation performed on an XML document, and it can become a performance bottleneck in applications and systems that process large volumes of XML data. Parallelism is a natural way to boost parsing performance, and leveraging multicore processors can offer a cost-effective solution. We study a data parallel algorithm called ParDOM for XML DOM parsing, which builds an in-memory tree structure for an XML document. ParDOM has two phases. In the first phase, an XML document is partitioned and parsed in parallel. In the second phase, the partial DOM node tree structures are linked together (in parallel) to build a complete DOM node tree. ParDOM offers fine-grained parallelism by adopting a flexible chunking scheme, and it can be conveniently implemented using a data parallel programming model that supports map and sort operations. We show that ParDOM yields better scalability than PXP [24], a recently proposed parallel DOM parsing algorithm.
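
The two-phase idea can be illustrated with a small Python sketch. This is not the ParDOM implementation from the thesis, only a toy under stated assumptions: the input contains elements and text only (no XML declaration, comments, CDATA, or '>' inside attribute values), chunk boundaries are moved forward to the next '<' so that no token straddles two chunks, the per-chunk results are flat token lists rather than partial DOM subtrees, and the linking step runs sequentially instead of in parallel. All names (chunk_bounds, parse_chunk, link, parse, Node) are hypothetical. Python threads are used only to show the map step; because of the interpreter lock they would not give a real speedup for this workload.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# Matches a start, end, or empty-element tag, or else a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w:.\-]*)[^<>]*?(/?)>|([^<]+)")


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    text: str = ""


def chunk_bounds(doc, n_chunks):
    """Cut doc into ranges; every cut is moved forward to the next '<'."""
    step = max(1, len(doc) // n_chunks)
    cuts = {0, len(doc)}
    for pos in range(step, len(doc), step):
        nxt = doc.find("<", pos)
        if nxt != -1:
            cuts.add(nxt)
    cuts = sorted(cuts)
    return list(zip(cuts, cuts[1:]))


def parse_chunk(doc, bounds):
    """Phase 1 (map): tokenise one chunk, tagging tokens with offsets."""
    start, end = bounds
    tokens = []
    for m in TOKEN.finditer(doc, start, end):
        closing, name, empty, text = m.groups()
        if text is not None:
            if text.strip():
                tokens.append((m.start(), "text", text.strip()))
        elif closing:
            tokens.append((m.start(), "end", name))
        elif empty:
            tokens.append((m.start(), "empty", name))
        else:
            tokens.append((m.start(), "start", name))
    return tokens


def link(tokens):
    """Phase 2 (sort + link): order tokens by offset and build the tree."""
    root = Node("#document")
    stack = [root]
    for _, kind, value in sorted(tokens):
        if kind == "text":
            stack[-1].text += value
        elif kind == "end":
            if len(stack) < 2 or stack[-1].name != value:
                raise ValueError(f"unexpected </{value}>")
            stack.pop()
        else:
            node = Node(value)
            stack[-1].children.append(node)
            if kind == "start":
                stack.append(node)
    if len(stack) != 1:
        raise ValueError("unclosed element")
    return root


def parse(doc, n_chunks=4):
    """Tokenise chunks concurrently, then link them into one tree."""
    bounds = chunk_bounds(doc, n_chunks)
    with ThreadPoolExecutor(max_workers=max(1, n_chunks)) as pool:
        parts = pool.map(lambda b: parse_chunk(doc, b), bounds)
        tokens = [tok for part in parts for tok in part]
    return link(tokens)
```

Calling parse("<a><b>hi</b><c/><b>yo</b></a>", n_chunks=3) tokenises three chunks independently and should produce the same tree as a single-chunk run, since the sort on byte offsets restores document order before linking.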

Details

Title
A study of a data parallel algorithm for XML DOM parsing
Author
Shah, Bhavik Bharatkumar
Year
2009
Publisher
ProQuest Dissertations & Theses
ISBN
978-1-109-61969-0
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
304944228
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.