Abstract: In distributed query processing, accurate estimation of communication costs is critical, and distributed XML queries are no exception. Techniques exist for estimating the communication cost of distributed SQL query processing, and some of them have been adopted in numerous distributed SQL processors. Adopting these SQL techniques for communication cost-based processing of distributed XML queries therefore seems natural. However, a tree-structured XML document differs from table-shaped relational data, and these structural differences make the SQL techniques difficult to adopt. This study identifies several considerations for estimating the communication cost of distributed XML queries and proposes a method for communication cost-based query processing. The experiments show that the proposed algorithm gives reasonable estimates of the communication cost of distributed XML queries.
(ProQuest: ... denotes formulae omitted.)
1 Introduction
On the Internet, there exist not only XML data but also non-XML data, such as relational data and web information sources described by URIs. One popular method for integrating heterogeneous data under XML is to treat non-XML data as XML data by using XML views [1-3]. Users can see the distributed heterogeneous data through the XML view and search it using a distributed XML query. In this environment, processing a distributed XML query is one of the open issues, and cost-based processing is one of the popular optimisation techniques.
Whereas the database community has been researching distributed processing of relational queries for 30 years, the field of distributed XML query processing is still in its infancy. It is therefore a natural choice to refer to approaches for traditional distributed SQL query processing when developing distributed XML query processing [4]. Experience with relational query processing has shown that cost-based approaches outperform heuristic-based approaches in most cases. Accordingly, many approaches to distributed SQL query processing aim primarily at reducing the communication cost, which clearly dominates in a wide area network [5, 6]. Cost-based processing of XML queries is thus a new research issue, and communication cost-based processing is one of the most efficient cost-based query processing techniques. However, adopting the techniques for distributed SQL query processing is...





