Content area
In distributed query processing, good estimation algorithms of communication costs are critical for query processing, including distributed XML queries. There are techniques that estimate a communication cost for distributed SQL query processing, and some of techniques are adopted in numerous distributed SQL processors. Therefore adopting the processing techniques for SQL queries for the communication cost-based processing of the distributed XML queries seems natural. Unfortunately, however, the tree-structured XML document is different from the table-shaped relational data. These structural differences make adopting the techniques for SQL queries difficult. This study defines some of the considerations for estimating the communication cost of distributed XML queries, and proposes a method for communication cost-based query processing. The experiments show that the proposed algorithm is reasonable for estimating the communication cost for distributed XML queries. [PUBLICATION ABSTRACT]
Abstract: In distributed query processing, good estimation algorithms of communication costs are critical for query processing, including distributed XML queries. There are techniques that estimate a communication cost for distributed SQL query processing, and some of techniques are adopted in numerous distributed SQL processors. Therefore adopting the processing techniques for SQL queries for the communication cost-based processing of the distributed XML queries seems natural. Unfortunately, however, the tree-structured XML document is different from the table-shaped relational data. These structural differences make adopting the techniques for SQL queries difficult. This study defines some of the considerations for estimating the communication cost of distributed XML queries, and proposes a method for communication cost-based query processing. The experiments show that the proposed algorithm is reasonable for estimating the communication cost for distributed XML queries.
(ProQuest: ... denotes formulae omitted.)
1 Introduction
In the Internet, there exist not only XML data, but also non-XML data such as relational data and web information source described by URI. One of the popular methods for integrating heterogeneous data under XML is to make non-XML data to be considered as XML data by using XML views [1-3]. Users can see the distributed heterogeneous data via the XML view and search using a distributed XML query. In this environmental, the processing a distributed XML query is one of the issues and a cost-based processing is one of popular optimisation techniques.
Whereas the database community has been doing research on distributed query processing of relational queries for 30 years, the field of distributed XML query processing is still in its infancy. Therefore it is a natural choice to refer to approaches for traditional distributed SQL query processing for distributed XML query processing [4]. We have learned from the experience on relational query processing, that cost-based approaches outperform heuristic-based approaches in most cases. Accordingly a lot of approaches for distributed SQL query processing are primarily aimed at reducing the communication cost and especially the cost of communication clearly dominates in a wide area network [5, 6]. Therefore research on a cost-based processing for XML query is a new issue and the communication cost-based processing is one of the most efficient cost-based query processing techniques. However, adopting the techniques for distributed SQL query processing is difficult because of the tree-like structure of the XML data, which is in contrast with the table-structure of the relational data [7, 8]. For example, a parent-child relationship between two XML nodes is one of the considerations for distributed XML query processing, but it does not exist between the relational data. Therefore the structural differences between SQL and XML data should be considered.
The contributions of this paper are 3-fold. First, some problems are defined for applying the method for estimating communication cost for distributed SQL query processing to XML query processing. Second, an algorithm for estimating the communication cost of a distributed XML query is proposed. Third, an experiment is performed to show that the algorithm is reasonable for distributed XML query processing based on the communication cost.
The remainder of this paper is organised as follows. Section 2 describes the related work. Section 3 describes some considerations for estimating a communication cost of a distributed XML query and our estimating methods. In Section 4, we evaluate our algorithm. Finally, we summarise the contributions of this paper and indicate some directions for future research work in Section 5.
2 Related work
There are several studies on the optimisation of traditional query processing over distributed environments [5, 6, 9-13] that consider the structure of distributed queries. One of the popular previous techniques for query optimisation is to estimate communication cost and select a processing order with a minimum. In [12, 14] proposed a cost model for finding a minimal communication cost in SQL queries. In case of the optimisation for a distributed XML query, the approaches used for query optimisation are not different from the traditional queries because XML query expressions borrow features from SQL query expressions. However, using their cost model without modifying the XML queries is impossible because of the structural difference between relational data and XML data. In [15, 16] proposed some laws for XML query optimisation derived from relational algebra optimisation and object-oriented query optimisation. However, their optimisation method is intended for some specific cases of XML queries only and distributed environments are not considered. Some studies on the processing of a distributed XML query have been conducted in the research on XML query processing. In [17-19] focused on the processing cost for a distributed XML query, such as XPath expression, I/O cost, execution time of operations etc. However, these approaches are not general methods because they are useful only in native XML systems, resulting in their dependence on the storage method or indexing technique of a specific application.
To process a distributed XML query, Re et al. [20] defined the XQueryD language as an extension of the XQuery language and proposed a method for distributed query processing based on the XQyeryD language. In [21, 22] generated local XML queries from a distributed XML query and processed the distributed query through query shipping. Their basic approach for the local query shipping is similar to that of this paper. However, the communication cost to generate a query plan was not considered in their work. In previous work [23], we proposed the method to estimate a join cost for processing a distributed XQuery query. However, the join cost is just one element to estimate a communication cost. Therefore this paper focuses on the communication cost to process a distributed XML query, especially considering XML data model. Our proposed approach is a general method given that it only uses the given input XML query and views for communication cost-based processing. Therefore the results of this paper can be adopted by any application for processing a distributed XML query.
3 Communication cost-based distributed XML query processing
3.1 Considerations
When more than two local systems are involved, the distributed query processing algorithm often needs to estimate incrementally the communication cost after the join operation. The processor initially determines the communication cost from joining the two local systems and then determines the cost of joining with a third local system and so on. Thus, the distributed query processor incrementally estimates the final communication cost from the intermediate cost produced.
Fig. 1 shows the two kinds of possible strategies for processing the distributed query written for three local systems. In the first case, the operation between the 'A' and 'B' systems is processed and then joins the 'C' system. In the second case, 'B' and 'C' systems are joined before joining the 'A' system. In the third case, all join operations are finally processed in a single system. However, the third case is not considered because we intend to determine the optimal processing order of local queries based on the communication cost. For this purpose, this paper estimates the communication cost of all possible processing cases and then selects the best case with the minimal communication cost.
As previous mentioned, some studies like [5, 12, 14] have proposed a method for processing a distributed SQL query based on the communication cost. However, there are some considerations to adopt the method for processing a distributed SQL query for a distributed XML query because the difference between two query languages. The difference of SQL query and XML query is caused by the difference of relational data model and XML data model which are target data models for two query languages. Table 1 shows the differences between the relational and XML data models.
Fig. 2 shows the relational views that describe the name and cardinality of the relational data in three distributed local systems and a distributed SQL query. In [14] proposed a method for estimating the communication cost of a distributed relational query. Using the method proposed in [14], we can estimate that it can provide the lowest communication cost to join the 'B' and 'C' systems before joining the 'A' system. The communication cost to transfer the data of 'C' to the 'B' system is 6. The result after joining the 'B' and 'C' systems is 3 using the following formula defined in [14]. Therefore the communication cost to transfer to the 'A' system is 9 because each of the three records has three attributes.
...
Consequently, we can estimate that the total communication cost is 15.
Fig. 3 shows the three XML views that describe the structure of the data stored in the three local systems and an example of an XML query written for the three views. This paper used XQuery as the XML query language because it is one of the standard query languages for searching through XML data. The XML view can be expressed in various languages such as XML DTD, XML Schema and XQuery. The XML view contains information about the structure of stored data, occurrence information and cardinality, among others. In the case of SQL, the cardinality of the attributes described in the view is used to estimate the join cost [8, 24]. Therefore this paper also assumes that the cardinality information of each node from the local XML views can be obtained. Of course, we can use the Count() function which is defined in XQuery functions and operations [25], if the local view does not provide the cardinality information. However, this paper does not intend to estimate the cardinality even if it has already been researched in [26]. The labels under each node express the pair of occurrence and cardinality. For example, the label (*, 7) for the 'b1' node expresses that it can occur zero or more times in the 'B' node as a child node and that the total number of the 'b1' node stored in the local system is 7. The communication cost is one of the main factors for processing a distributed XQuery query written for XML views as in the case of a distributed SQL query. However, applying the method for estimating the communication cost of a distributed SQL query to an XQuery query is difficult because of two reasons.
The first reason is the relationship between the XML nodes is not always 1:1, whereas relational data have a 1:1 relationship in all cases. For example, we can guess that the relational data stored in three local systems is like Fig. 4 based on Fig. 2. However, it is difficult to guess the XML data in local systems from XML views. If the XML instances in the local systems are like Fig. 5, then the method for estimating a communication cost of SQL query can be applied to XQuery query without modification. However, if the structure of XML data in 'B' local system is like Fig. 6, then the algorithm for relational data may not be a valid method for the estimation of the communication cost.
Fig. 7 shows the difference of relationship between relational data and XML data.
The occurrence of an XML node is represented by '*', '+', '?' and '1' symbols, indicating that the child element occurrences are zero or more, one or more, zero or one and only one, respectively. Therefore the mapping relationship between two XML nodes is one of the sixteen cases: one-to-one (1:1, 1:?, ?:1 and ?:?), one-to-many (1:*, 1: + , ?: * and ?: + ), many-to-one (*:1, *:?, + :1 and + :?) and many-to-many (*:*, + :*, *: + and + : + ). Consequently, we should consider these 16 cases for estimating the communication cost of a distributed XQuery query.
The second reason is due to the tree-structure of the XML data. Fig. 8 shows the local views for the two local systems, 'A' and 'B', and an example of the distributed XQuery query written for the two systems. The example query returns the subtree composed of the 'a2', 'b4' and 'b5' nodes filtered by the join operation between the 'a4' node and 'b2' nodes. If the example data in Fig. 8 is relational data and a distributed query processor ships the data in the local system 'A' to the local system 'B' then the query processor will only transfer the two nodes, 'a4' and 'a2', to process the example query. However, in the case of XML data, these two nodes are not enough to generate a correct result. Thus, the 'a1' and the 'a3' nodes are required to confirm the relationship between the 'a2' and the 'a4' nodes, and to generate a final result tree that includes all the descendants of the 'a2' node. If the distributed query processor ships the data in the local system 'B' then the 'b1' and 'b3' nodes have to be transferred to the 'A' system with the 'b2', 'b4' and 'b5' nodes. Thus, there are two kinds of required nodes to process a distributed query aside from the nodes described in the query. One is the descendant of the return node described in the RETURN clause of a distributed XML query. If the node described in the RETURN clause is non-terminal, then all the descendant nodes of the return nodes should be considered to estimate the communication cost. The other required node is the edge node related to the nodes described in a distributed query. We can find the edge nodes in the shortest path between the two nodes written in the query, and they are also considered for the cost estimation.
3.2 Estimating the communication cost
In the case of SQL, the total cost for processing a distributed query is calculated by the CPU cost, I/O cost and communication cost. The communication cost is clearly dominant, especially in a wide area network. The typical ratio between the communication cost and the I/O cost (for one page) is about 20:1. Hence, most distributed query processors of wide area networks ignore any local processing cost and concentrate on minimising the communication cost [14, 27]. The communication cost can be estimated by the following formula
Communication cost = A + B . X
Where A is the communication initialisation time, B is the reciprocal of the network speed and X is the size of the data transmitted. The environment for processing a distributed XML query is also similar to that for a distributed SQL query. Therefore this paper adopts the aforementioned formula and assumes that A = 0 and B = 1 to simplify our algorithm. This assumption will not affect our algorithm performance.
The three XML local views showing the data structure in the local systems 'A', 'B' and 'C' are presented in Fig. 9. Table 2 is an example of an XQuery query written for Fig. 9; the query includes two join operations. If the operation between the two local systems 'A' and 'B' is processed first, then the local system 'C' will receive the XML fragment, including the 'b4' and 'b6' nodes filtered by the previous operation. In contrast, if the operation between the 'b6' and the 'c1' nodes is processed first, the data size for shipping depends on the number of nodes in the XML subtree, including the filtered 'b3' and 'b4' nodes. Therefore, to decide on the optimal processing order, we should estimate how many nodes are filtered by the previous operations. This paper defines two kinds of structural relationship between two nodes to estimate the size of nodes needed for processing the next operations from the nodes selected by the previous operations. One is the 'PfromC', which estimates the expected number of parent nodes from the number of selected child nodes. For example, we can estimate the number of 'b2' nodes from the number of 'b3' nodes filtered by the operation in the example query in Fig. 9. The other is the 'CfromP', which is opposite of the 'PfromC'.
A parent node can have child nodes or not. Therefore finding a parent node that includes a child node selected by the previous operation is not easy. The 'PfromC' is classified into four cases (1| ?| *| + ) according to the occurrence of a child node. If the occurrence of a child node is '1,' then the relationship between a parent node and the child node is always 1:1. Therefore the number of parent nodes is the same as the number of child nodes. If the occurrence of a child node is '?,' then a parent node has zero or one child node. Therefore the expected number of parent nodes is the same as the number of child nodes because the number of child nodes is less than the number of parent nodes in all cases. In Fig. 9, the number of the 'b3' nodes is estimated to be 4 after the operation between the two local systems 'A' and 'B' using the formula for a distributed SQL query as described in Section 3.1. Moreover, the number of the 'b2' nodes can be estimated to be 4 because the 'b2' node is a parent node of the 'b3' node, and the occurrence of the 'b3' node is '1.' Definition 1 estimates the expected number of parent nodes from the number of selected child nodes when the occurrence of a child node is '1' or '?.'
Definition 1:
- c : Child Nodes
- sc : The number of selected child nodes
- ep : The expected number of parent nodes . ep = E(sc) = sc, if the occurrence of c is e1f or e?f
If the occurrence of a child node is '*', then a parent node has zero or more child nodes. Therefore this paper defines the minimum and maximum number of parent nodes and estimates the expected number of parent nodes considering every possible case. Definition 2 estimates the expected number of parent nodes related to the child nodes filtered by previous operations when the occurrence of a child node is '*'. The maximum number (MaxP) of the parent node is when a single parent node has a single child node. MaxP is between the total number of parent nodes and the number of selected child nodes.
Definition 2:
...
In Fig. 9, the minimum expected number of 'b1' nodes is 1, and the maximum is 4 when the occurrence of the 'b2' node is '*' and the number of selected 'b2' nodes is 4. By Definition 2, the number of cases to relate the selected 'b2' nodes to a single 'b1' node can be calculated by 4C1 * R(1), the answer of which is 4. The 4C2 * R(2), 4C3 * R(3) and 4C4 * R(4) are used to calculate the number of cases to relate the selected 'b2' nodes to the two, three and four 'b1' nodes, respectively. The answers for each case are 84, 144 and 24, respectively, and the expected number of 'b1' nodes is 2.7. Thus, if the occurrence of a child node is '*', and the number of selected child node is 4 among four child nodes, then we can estimate the expected number of parent nodes to be 2.7 among the four parent nodes.
If the occurrence of a child node is ' + ', then a parent node has one or more child nodes. Therefore child nodes always outnumber the parent nodes. If the number of child nodes is the same as the parent nodes, every parent node has a single child node. Definition 3 estimates the expected number of parent nodes when the occurrence of a child node is ' + '. In Fig. 9, the number of selected 'b3' nodes is estimated to be 5 after the operation between the two local systems 'B' and 'C'. By Definition 3, the number of cases to relate the selected 'b6' nodes to a single 'b5' node is calculated by 3C1 * R(1), the result of which is 3. The 3C2 * R(2) and 3C3 * R(3) are used to calculate all possible cases for relating two and three 'b5' nodes, the answers of which are 90 and 150, respectively. Therefore the final expected number of the 'b5' node is ~ 2.6.
Definition 3:
...
The 'CfromP' is also classified into four cases (1| ?| *| + ) according to the occurrence of a parent node. However, in all cases, the expected probability of the child nodes is the same as the probability of the selected parent nodes because every child node has to be related to a single parent node. Thus, we can assume that the parent nodes have child nodes at the same rate. It is possible that a single parent node has all child nodes, but this case is also possible for every parent node. Therefore Definition 4 estimates the expected number of child nodes based on the number of parent nodes filtered by previous operations. In Fig. 9, if two 'b2' nodes are selected among the four nodes, we can estimate the expected number of 'b4' node to be 1.
Definition 4:
...
In the case of the distributed XQuery query in Table 2, when the operation between the two local systems 'A' and 'B' is processed first, a distributed query processor transfers 'b1,' 'b4,' 'b5' and 'b6' nodes to the local system 'C'. Using our methods, the expected number of 'b1,' 'b4,' 'b5,' and 'b6' nodes are 2.7, 1.4, 2.1 and 2.6, respectively. Therefore the total cost for the communication is about 8.8. If the operation between the local systems 'B' and 'C' is processed before the other operation, then 'b1,' 'b2,' 'b3' and 'b4' nodes are shipped to the local system 'A', and the number of each node is 2.6, 1.3, 2.6 and 2.6, respectively. Therefore the total cost is 9.1. In the end, the communication cost to process the operation between the local systems 'B' and 'C' and then join the 'A' system is more expensive than that in the other case.
4 Experiment
The focus of this paper is to estimate a communication cost based on the number of XML nodes which are transferred among local systems. Therefore the specification of hardware and software for our experiment is out of this evaluation. In a real situation, of course, there are a lot of other factors for calculating a communication cost of XML data such as network traffic, different size of XML nodes and various performances of local systems. However, this paper assumes that the environment of a system and network for query processing is a stable and the size of all XML nodes in the local systems is the same. Therefore the experiment is executed by the comparison between the number of nodes estimated by our algorithm and the real number of XML nodes transferred after processing a local query. For the first experiment, we selected a subset of XML data describing research papers published by KIISE (The Korean institute of information science and engineering). The sample XML data are a real data served by KISTI (Korea institute of science and technology information). KISTI defined the XML schema for describing papers published in domestic journals and conferences. They also manage reference information describing a relationship among papers. Fig. 10 shows the views for three local systems and the query graph of XQ0 written for the views. The XML view shows the structure and size of XML document in a local system. Two views for 'J' local system and 'C' local system in Fig. 10 shows partial information about articles published in KIISE journals and conference proceedings from 2001 to 2010. The view for 'R' local system describes partial information about reference in 5000 articles which are randomly selected from articles published in KIISE during the same period. The labels around each node in the views express the pair of occurrence and cardinality of the node. For example, the 'keywords' node in the view for the 'J' local system can occur zero or one time as a child node of the 'article' node and the total number of the 'keywords' node is 3801 in our sample XML document. The destination of XQ0 is toward the three local systems of 'R', 'J' and 'C'. These systems are related by the two join operations expressed by a dotted line. The nodes in italics and red font are the result nodes declared in the RETURN clause of the XQ0. For example, 'R' system and 'J' system are joined by 'title' node and 'reference' node and 'keyword' node are returned as the result of operation. The method to generate local XQuery queries from a distributed XQuery query is introduced in our previous research [28].
To process the distributed XQuery query XQ0, we have two kinds of possible strategies. In the first case, the operation between the 'R' and 'J' systems is processed and then joins the 'C' system (R-J-C). In the second case, 'C' and 'J' systems are joined before joining the 'R' system (C-J-R). In order to find the method with the lower communication cost between above two cases, this paper uses three kinds of costs. One is a real communication cost, another is a cost estimated by our algorithm, and the other is a cost using the approach for processing a distributed SQL query.
Table 3 shows the cost estimated by the method for distributed SQL query. If the method for distributed SQL is applied for the 'R-J-C' case, then the cost for transferring 'title' and 'reference' nodes in 'R' system is 83843 and the cost for transferring nodes after join between 'R' and 'J' systems is 90602. Therefore the total cost is 174445 nodes. In comparison, the total communication cost of the 'C-J-R' case is 158047. Finally, a distributed query processor will select the 'C-J-R' approach with the lower cost than the 'R-J-C'. In real situation, however, the communication cost of the 'R-J-C' case is 152485 and the cost for the 'C-J-R' is 249564. The reason two costs are not the same is because the estimating method for SQL is not suitable for semi-structural data. Table 4 shows the communication cost estimated by our approach. To process the join operation between 'R' system and 'J' system, all nodes in the path between 'title' and 'reference' nodes are transferred to 'J' system. Also an occurrence of parent-child node is not always 1:1. For example, the estimated number of 'authors' node after join 'C' system and 'J' system is not 2854, even if the filtering rate of 'name' node is 11251: 39615.
For another experiment, we used XMark, which describes 'auction' data and is one of the popular datasets for the XML benchmark [29]. The 'auction' data are XML documents with single XML schema. Fig. 11 shows an overview of the references that connect sub-trees in XMark data. Each sub-tree has cross references. To generate local XML instances in distributed local systems, we split the 'auction' document into multiple documents. The root nodes of the local XML documents are nodes in the second or third level of the ¡®auction¡- document, such as ¡®open_auctions¡-, ¡®people¡- node and so on. The XMark standard supplies a random generator which generates a sample document by inputting data size automatically. For the first experiment, we select three local systems and generate three local XML documents by splitting sub-trees of the sample XML document. The total size of the sample XML documents for the experiment is ¡«10 MB.
Fig. 12 shows the views for three local systems and the query graph of XQ1 written for the views. Local systems are related by the three join operations. 'I' system, 'P' system and 'OA' system describe item information, person information and auction information, respectively. Graphs 'A' and 'B' in Fig. 13 are the comparison results between a real cost and an estimated cost using our algorithm and between a real cost and an estimated cost using the approach for processing a distributed SQL query. The result of the actual measurement of the real cost shows that the processing order with minimum communication cost is the 'I-P-OA' method, which joins between the 'I' and 'P' systems and then processes the join operation related to the 'OA' system. The result of the actual measurement of the real cost shows that the processing order with minimum communication cost is the 'I-P-OA' method, which joins between the 'I' and 'P' systems and then processes the join operation related to the 'OA' system. The result of our approach is also the same as the actual measurement. However, the result using the approach for a distributed SQL query shows that the 'I-OA-P' method has the lowest cost. In the case of the 'I-P-OA' and 'P-I-OA' methods, the difference between the real cost and the estimated cost using our approach is larger than that of the other methods. The reason is that the join operation between the two systems 'I' and 'P' does not significantly reduce the 'category' nodes that are less frequent compared with the other nodes.
Fig. 14 is the query graph of XQ2 written for the four XML views. Fig. 15 shows the comparison between a real cost and an estimated cost using our approach for processing the XQ2. The total size of the sample XML data is ¡«13 MB. The cost graph drawn by our algorithm is similar to a real communication cost. We determined that our method is independent of the number of local systems. After the evaluation, our algorithm was found to be an efficient approach for deciding the optimal processing order of a distributed XQuery query.
5 Conclusion
In this paper, we proposed an algorithm to estimate the communication cost for processing a distributed XML query. In this algorithm, we addressed the different points for estimating the communication cost between distributed SQL and XML queries. Moreover, we classified the relationships between the nodes in an XML document and then proposed four definitions to estimate the communication cost based on the classified relationships. To evaluate our algorithm, we compared the estimated communication cost with the cost in a real situation. Experiments showed that our approach is one of the reasonable methods for estimating the communication cost of a distributed XML query. The result can be applied to any XML query processor that focuses on the integration and search of distributed data in the Internet because our algorithm is a general method.
In the future, we will focus on the feature of the functions and operations in distributed XML queries. Additionally, we will deal with the parallel processing of partial expressions in XML queries.
6 References
1 Murali, M.: 'XML views'. Proc. Int. Conf. Encyclopedia of Database Systems, September 2009, pp. 3656-3659
2 Vania, M.P.V., Fernando, C.L., Valdiana, S.A., Marco, A.C.: 'A mapping-driven approach for SQL/XML view maintenance'. Proc. Int. Conf. ICEIS 2008, Barcelona, Spain, June 2008, pp. 65-73
3 Ioana, M., Daniela, F., Donald, K.: 'Answering XML queries over heterogeneous data sources'. Proc. Int. Conf. 27th VLDB, Rome, Italy, September 2001, pp. 241-250
4 Hanna, K., Krzysztof, S., Kazimierz, S.: 'Distributed query optimization in the stack-based approach'. Proc. Int. Conf. HPCC 2005, Sorrento, Italy, September 2005, pp. 904-909
5 Min, J.Y., Phillip, C.Y.S.: 'Adaptive join algorithms in dynamic distributed databases', Distributed and Parallel Databases, 1997, 5, (1), pp. 5-30
6 Liu, L., Pu, C., Richine, K.: 'Distributed query scheduling service: an architecture and its implementation', Intl. J. Cooperative Inf. Syst., 1998, 7, (2 & 3), pp. 123-166
7 Dan, S.: 'Query decomposition and view maintenance for query languages for unstructured data'. Proc. Int. Conf. VLDB'96, Bombay, India, February 1996, pp. 227-238
8 Marko, S., Ling, F., Willem, J.: 'Web-based distributed XML query processing'. Proc. Int. Conf Intelligent Search on XML Data, May 2003, pp. 207-216
9 Philip, A.B., Nathan, G., Eugene,W., Christopher, L.R., James, B.R. Jr.: 'Query processing in a system for distributed databases (SDD-1)', ACM Trans. Database Syst., 1981, 6, (4), pp. 602-625
10 Ali, I., Eldosouky, H., Arafat, A.M.T., Ali, Eldin: 'New heuristic approaches for improving distributed query processing based on the enhancement of semi-join strategies'. Proc. Int. Conf Statistics, Computer Science, and Operational Research, Egypt, December 2001, pp. 22-24
11 Donald, K.: 'The State of the art in distributed query processing', ACM Computing Surveys, 2001, 32, (4), pp. 422-469
12 Xuemin, L., Maria, E.O.: 'An efficient processing of a chain join with the minimum communication cost in distributed database systems', Distributed and Parallel Databases, 1995, 3, (1), pp. 69-83
13 Jian, L., Amol, D., Samir, K.: 'Minimizing communication cost in distributed multi-query'. Proc. Int. Conf. ICDE 2009, China, March 2009, pp. 772-783
14 Min, J.Y., Phillip, C.-Y.S.: 'Adaptive join algorithms in dynamic distributed databases', Distributed and Parallel Databases, 1997, 5, (1), pp. 5-30
15 Flavius, F., Geert-Jan, H., Cristian, P.: 'XAL: an algebra for XML query optimization'. Proc. Int. Conf. Australasian Database Conf., Melbourne, Australia, January 2002, pp. 49-56
16 Xin, Z., Bradford, P., Elke, A.R.: 'Honey, I shrunk the XQuery!: an XML algebra optimization approach'. Proc. Int. Conf. WIDM 2002, LcLean, Virginia, USA, November 2002, pp. 15-22
17 Norman, M., Sven, H., Carl-Christian, K., Guido, M.: 'XQuery processing in Natix with an emphasis on join ordering'. Proc. Int. Conf. XIME-P 2004, Paris, France, June 2004, pp. 49-54
18 Alan, H., Josef, B., Leonidas, G., et al.: 'Mixed mode XML query processing'. Proc. Int. Conf. 29th VLDB, Berlin, Germany, September 2003, pp. 225-236
19 Andreas, M.W., Christian, M., Theo, H.: 'Towards cost-based query optimization in native XML database management systems'. Proc. Int. Conf. SYRCoDIS, Saint-Petersburg, Russia, May 2008, pp. 1-12
20 Re, C., Brinkley, J.F., Hinshaw, K.P., Suciu, D.: 'Distributed XQuery'. Proc. Int. Conf. IIWeb, Toronto, Canada. August 2004, pp. 116-121
21 Ying, Z., Nan, T., Peter, A.B.: 'Efficient distribution of full-fledged XQuery'. Proc. Int. Conf. ICDE, Shanghai, China, March 2009, pp. 565-576
22 Le, T.T.T., Doan, D.D.: 'Query decomposition using the XML declarative description language'. Proc. Int. Conf. ICCSA, Singapore, May 2005, pp. 1066-1075
23 Park, J.-H., Kang, J.-H.: 'Processing strategy for global XQuery queries based on XQuery join cost', J. Inf. Sci. Eng., 26, (2), pp. 659-672
24 Liang, H.Y., Mong-Li, L., Wynne, H.: 'Finding hot query patterns over an XQuery stream', VLDB J J., 13, (4), pp. 318-332
25 Vanja, J., Tore, R.: 'Query decomposition for a distributed object-oriented mediator system', Distributed and Parallel Databases, 11, (3), pp. 307-336
26 Le, T.T.T., Doan, D.D., Virendrakumar, C.B., Harold, B.: 'A bottom-up strategy for query decomposition'. Proc. Int. Conf. ICDIM 2006, Hong Kong, China, December 2006, pp. 215-221
27 Tamer, M.O., Patrick, V.: 'Principles of distributed database systems' (Prentice-Hall, New Jersey, USA, 1999, 2nd edn.).
28 Park, J.-H., Kang, J.-H.: 'Local Query Generation for Distributed XQuery Queries'. Proc. Int. Conf. CyberC 2010, Huangshan, China, October 2010, pp. 9-12
29 Albrecht, S., Florian, W., Martin, L.K., Michael, J.C., Ioana, M., Ralph, B.: 'XMark: a benchmark for XML data management'. Proc. Int. Conf. 28th VLDB, Hong Kong, China, August 2002, pp. 974-985
Jong-Hyun Park, Ji-Hoon Kang
Department of Computer Science & Engineering, Chungnam National University, Gung-dong, Youseong-Gu, Daejeon, South Korea
E-mail: [email protected]
Copyright The Institution of Engineering & Technology May 2013
