Abstract-As an extensible markup language, XML plays an increasingly important role in data representation and data exchange over the Internet. XML parsing, however, has a reputation for poor performance. Many methods have been proposed to solve this problem, but none has been entirely satisfactory. Reusing XML parsing results is a novel, effective, and promising way to improve XML parsing performance. Serializing the parsing results to persistent media, such as files and databases, and later restoring the original parsing results from them avoids parsing the same XML document repeatedly. To achieve this, the content and structure information of XML nodes must be kept in primitive types, such as integers, so that the parsing results can be serialized and restored without distortion. Test results show that reusing XML parsing results significantly improves XML parsing performance and also saves a large amount of space.
Index Terms-XML parsing, DOM, reusability, VTD Record, R-MED-struct
I. INTRODUCTION
Although XML is used more and more widely in data representation and data exchange, parsing it efficiently remains the key obstacle to its further adoption. Many techniques have been proposed to improve XML parsing performance, most of them built around DOM, the most widely used XML parsing model. Because of DOM's intrinsic limitations [1, 5], they cannot solve this problem fundamentally. We argue that reusing XML parsing results is an effective and promising approach to high-performance XML parsing: the parsing results are serialized to persistent media, such as files and databases, and the original parsing results are later restored from them directly. This avoids parsing the same XML document repeatedly and is especially attractive when the XML document does not need to be updated.
This paper presents a simple non-extractive XML parser named R-NEMXML (Reusable NEMXML) and describes how it reuses XML parsing results. Unlike DOM parsers, R-NEMXML does not create and destroy discrete node objects during parsing. Instead, it encodes the content and structure information of XML nodes in 64-bit integers and performs all operations on these integers. In this way, R-NEMXML improves XML parsing performance significantly and can benefit from reusing XML parsing results.
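The idea of keeping node information in 64-bit integers, and of writing those integers out for later reuse, can be illustrated with a minimal Python sketch. The bit layout, the token type codes, and all function names below are assumptions made for illustration; this section does not specify R-NEMXML's actual record format, so the sketch should not be read as the parser's implementation.

```python
import struct

# Hypothetical 64-bit record layout (an assumption for illustration only):
#   bits 60-63  token type                        (4 bits)
#   bits 52-59  nesting depth                     (8 bits)
#   bits 32-51  token length in bytes             (20 bits)
#   bits  0-31  starting byte offset in document  (32 bits)
TYPE_SHIFT, DEPTH_SHIFT, LEN_SHIFT = 60, 52, 32
TYPE_MAX, DEPTH_MAX, LEN_MAX, OFF_MAX = 0xF, 0xFF, 0xFFFFF, 0xFFFFFFFF

ELEMENT, TEXT = 1, 2  # hypothetical token type codes


def pack_record(tok_type, depth, length, offset):
    """Encode one node's type, depth, length, and offset in a single integer."""
    for value, limit in ((tok_type, TYPE_MAX), (depth, DEPTH_MAX),
                         (length, LEN_MAX), (offset, OFF_MAX)):
        if not 0 <= value <= limit:
            raise ValueError("field out of range for the assumed layout")
    return ((tok_type << TYPE_SHIFT) | (depth << DEPTH_SHIFT)
            | (length << LEN_SHIFT) | offset)


def unpack_record(rec):
    """Decode a record back into (type, depth, length, offset)."""
    return ((rec >> TYPE_SHIFT) & TYPE_MAX, (rec >> DEPTH_SHIFT) & DEPTH_MAX,
            (rec >> LEN_SHIFT) & LEN_MAX, rec & OFF_MAX)


def token_text(doc, rec):
    """Non-extractive access: slice the token out of the original bytes."""
    _, _, length, offset = unpack_record(rec)
    return doc[offset:offset + length]


def serialize(records):
    """Write the records as little-endian unsigned 64-bit integers."""
    return struct.pack("<%dQ" % len(records), *records)


def restore(blob):
    """Rebuild the record list from bytes produced by serialize()."""
    return list(struct.unpack("<%dQ" % (len(blob) // 8), blob))


# Hand-built records for a tiny document; a real parser would emit these.
doc = b"<a><b>hi</b></a>"
records = [
    pack_record(ELEMENT, 0, 1, 1),  # element name "a"
    pack_record(ELEMENT, 1, 1, 4),  # element name "b"
    pack_record(TEXT, 2, 2, 6),     # text content "hi"
]
restored = restore(serialize(records))
```

Because each record is a plain integer that refers to the original document by offset and length, the restored list is usable as soon as it is read back, with no need to parse the document again. A tree of node objects with internal references would first have to be flattened before it could be stored this way.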
The rest of this paper is...





