Abstract

In scientific computing, two trends are clear: data sets are growing rapidly, and microprocessor performance is improving through increased parallelism rather than higher clock rates. Further, Extensible Markup Language (XML) is increasingly used to encode large data sets, and SOAP is increasingly used to provide Grid services. Neither XML nor SOAP was designed for these uses, and naïve implementations of the standards can incur performance penalties. As these trends continue, past assumptions about the value of seeking out parallel algorithms should be revisited.

Lexical analysis has traditionally been seen as an inherently serial process. This work challenges that viewpoint. We start by tracking the performance of state-of-the-art XML parsers and SOAP toolkits through benchmarks for scientific computing applications. We then examine how the caching mechanisms of current workstation- and server-class computer systems affect parser performance. Finally, we propose Piximal, an NFA-based parser that uses spare processors to reduce XML parse time, and we examine the limits of the Piximal approach to parallel XML parsing.

Details

Title
Analysis and optimization for processing grid-scale XML datasets
Author
Head, Michael Reuben
Year
2009
Publisher
ProQuest Dissertations & Theses
ISBN
978-1-109-56418-1
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
305108571
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.