Content area
P2P systems allow users to share and access different resources over Internet. Actually, they are an excellent alternative to client/server ones, since they are fault tolerant and more scalable. Nonetheless, searching pertinent peers (i.e., peers which share pertinent resources) to user queries in large scale P2P systems is a major problem. In the literature, this task is commonly named query routing. In this respect, the author introduces in this paper a new query-based routing method for unstructured P2P systems. The main originality of our method is the exploitation of the strong connections between query terms and relevant peers, which provided pertinent results for those queries, to define a user profile model. Thus, the computation of the user profile comes down to extracting special types of triadic formal concepts. The user profile is used latter to efficiently route the forthcoming queries. Simulation results show that our method is more effective than the baseline method.
Asbtract - P2P systems allow users to share and access different resources over Internet. Actually, they are an excellent alternative to client/server ones, since they are fault tolerant and more scalable. Nonetheless, searching pertinent peers (i.e., peers which share pertinent resources) to user queries in large scale P2P systems is a major problem. In the literature, this task is commonly named query routing. In this respect, the author introduces in this paper a new query-based routing method for unstructured P2P systems. The main originality of our method is the exploitation of the strong connections between query terms and relevant peers, which provided pertinent results for those queries, to define a user profile model. Thus, the computation of the user profile comes down to extracting special types of triadic formal concepts. The user profile is used latter to efficiently route the forthcoming queries. Simulation results show that our method is more effective than the baseline method.
Keywords: P2P, User Profile, Learning, Query Routing
(ProQuest: ... denotes formulae omitted.)
1Introduction
P2P systems allow users to share and access different resources over Internet. Indeed, in P2P systems, peers play equal roles (i.e., client and server at the same time). Hence, each peer is responsible for indexing its shared documents and for searching pertinent documents to queries received from other peers (i.e., server role). Furthermore, it can also send queries to retrieve pertinent documents shared by other peers (i.e., client role). Actually, P2P systems are an excellent alternative to client/server systems, since they are fault tolerant, autonomic, scalable and cost effective. Nonetheless, locating relevant peers (i.e., peers that share pertinent resources) in large scale P2P systems remains a challenging issue. In the literature, this task is commonly named query routing [1, 2]. Existing query routing approaches in fully distributed P2P systems (i.e., unstructured P2P systems) rely on pure flooding or controlled flooding mechanisms. Within the former mechanism the requesting peer broadcasts the search query to all its neighbors. Whenever a neighboring peer receives the query it returns a response to the requesting peer if it shares pertinent documents. Thereafter, it continues to forward the query to all its neighbors and so on until a Time To Live (TTL) threshold is decremented to 0. Indeed, this method generates an excessive number of messages leading to scalability issues. Furthermore, it does not quickly locate the target peers since the query is randomly propagated through the P2P overlay. The literature witnesses a wealthy number of approaches that have attempted to enhance the effectiveness and the efficiency of the pure flooding mechanism. Existing methods could be roughly categorized as content-based methods [1-5] or query-based ones [6-13]. The formers rely on meta-data of the shared documents by neighboring peers to create a global index at each peer. This latter will be of use to guide the query forwarding process, and therefore, it increases the effectiveness and the efficiency of the routing method. Nonetheless, as a downside of content-based methods peers must exchange a lot of messages to create their indexes. Query-based methods rely on past queries and the related query hits to build a local knowledge base at each peer. The latter stores a set of user interests, which are implicitly captured from past queries and query hits and will be of use to route the forthcoming queries. In fact, query-based methods are more interesting than the content-based ones, since they don't have to exchange extra messages for building the knowledge base. In this respect, the author introduces in this paper a new query-based routing method for unstructured P2P systems. The main originality of our method is the exploitation of the strong connections between query terms and relevant peers that provided pertinent results for these queries to model the user profile. Thus, the computation of the user profile comes down to extracting special types of triadic formal concepts [18]. The user profile is used latter to efficiently route the forthcoming queries.
The remainder of this paper is organized as follows. In Section 2, the author presents the existing query routing methods in unstructured P2P systems. Section 3 details the proposed query-based routing method. The performance evaluation and the obtained results are reported in Section 5. Section 6 concludes this study and sketches future directions.
2Related work
The dedicated literature witnesses several studies that have tried to enhance the pure flooding query routing method in unstructured P2P systems. As aforementioned, the existing methods can be categorized as query-based methods content-based ones. In this study, the researcher focuses on the query-based routing methods since our method is belong this category. In this respect, in [7] authors were based on social metaphors to introduce a routing scheme called REMINDIN. The latter built at each peer a local repository (i.e., ontology), which consists of a set of RDF statements describing the past sent queries ant their associated query hits. In addition, it maintains the source of each statement, its relative confidence and the overall confidence in each peer. Kalogeraki et al. [8] were introduced a query routing method referred as Intelligent Search (IS). Within IS, every peer builds a profile for each of its neighbors. The profile contains the most recent queries for which the neighbor provided pertinent documents. It will be used by the forwarder peer to rank its neighbors with respect to their relevance to the query. Thereafter, the forwarder peer routes the query to the first k neighbors having the highest score. In Route Learning [9]every peer gradually accumulates knowledge from past queries sent to its neighbors. Indeed, it maintains a feature space for each of its neighbors. The forwarder uses a Parzen Windows Estimation, which is an estimation technique widely used in classification problems [9], to choose the relevent neighbors with respect to the query content and the neighbors feature space. Indeed, authors considered the neighbors as n classes and they defined a classifier to classify the query based the feature space of the class (i.e., neighbor) and the query content then forward the query to the relevant neighbors (i.e., classes). Self-Learning Query Routing [12] makes a friendship relation between peers having close interests. Indeed, it learns the interest of the contacted peers based on their answers to the previous sent queries. The forwarder peer routes the forthcoming queries to its friend peers instead of randomly chosen neighbors. If the search fails in the selected peers, the forwarder peer will broadcast the query to its neighborhood. In [6], the authors introduced a semantic query routing model including three main components: Past information component, User profile component and Peers selection component. The first component formally defines the gathered data from past queries and the associated query hits. Indeed, such as data are generally stored in the log file of each peer. The second component formally defines a representation of the user profile based on the gathered data. Authors considered that the user profile is a set of user interests. Each user interest is triplet (a set of query identifiers, their common terms and a set of positive peers which provided answers for all the queries). Finally, the third component offers a set of operators allowing to select the relevant peers from the user profile. In addition, the authors suggested a query routing method as an instance of the proposed model. They relied on two formal contexts (i.e., two binary matrices). The first context represents relationships between queries and associated terms. The second context represents relationships between queries and positive peers. Within these representations, the suggested method didn't be able to find the relevant interests as described in their model. In this paper, the researcher relies on the user profile model suggested in [6] and introduces an efficient method for extracting the relevant user interests.
3Proposed method
Our proposed query routing method includes two main parts:
1. User's profile extraction: It aims to generate a user profile for each peer. To this end, the researcher relies on past queries to learn the user profile, which consists on a set of user interests.
2. Query routing: It uses the user profile to efficiently routes the search queries.
3.1 User profile extraction
The main aim of this part is to create a user profile for each peer from its sent queries. Worth noting that the log file contains historical information of past queries, such as, the identifier of the query, queries terms (i.e., a set of key words), peers that provided pertinent resources for the query and the downloaded documents. In the following, give a formal definition of the user profile.
3.1.1 Key notions
Definition 1 (User profile)
A user profile Pr is a set of user interests. Formally,
...
Where I is a user interest.
Definition 2 (User interest)
A user interest I is a triplet (Qi, Ti, Pi), where:
* Qi: is a set of queries;
* T{. is a set of common terms of all the queries belong to the Qi set;
* Pr. is a set of positive peers, which replied to all the queries belong to the Qi set;
Hence, the computing of the user profile comes down to find strong relationships between the set of queries, their common key-words and the associated peers. To this end, the researcher adopts the Triadic Formal Concept Analysis (TFCA) theory [16]. In the following, the researcher defines and adapts some notion of TFCA.
Definition 3 (Formal context)
A triadic formal context is a quadruple (Q, T, P, I), where I is an incidence relation between the sets Q, T and P (IQ Q x T x P). In our case, Q, T and P represent the sets of queries (objects), query-terms (attributes) and positive peers (conditions), respectively. Consequently, < q x
t x p > Ç / indicates that the query q has the term t and has received an answer from the peer p.
To better understand the above definition, the researcher provides, in the following, an illustrative example.
Example 1
Table 1 depicts a triadic context (Q, T, P, I) representing a part of a peer log file, where Q = ⅛, q2, q3} is a set of queries, T = ⅛, t2, t3] is a set of query terms and P = (Pi, P2, Рз} is a set of peers.
Definition 4
Let (Q, T, P, I) be a triadic context and Q' is a subset of Q (Q' Q Q) . For T' Q T, P' Q P and the projected formal context (T, P, 1q) two concept-forming operators are formally defined as follows:
...
Definition 5
Let (Q, T, P, I) be a triadic context and T' is a subset of T (T' Q T) . For Q' Q Q , P' Q P and the projected formal context (Q, P, IT) two concept-forming operators are formally defined as follows:
...
Definition 6
Let (Q, T, P, I) be a triadic context and P' is a subset of P (P' Q P) . For Q' Q Q, T' Q T and the projected formal context (Q, T, IP) two concept-forming operators are formally defined as follows:
...
Definition 7 (Triadic concept)
Let (Q, T, P, I) be a triadic context, the triple (Q!, T', P') E Q x T x P is called a triadic concept if:
...
Where Q', T' and P' represent respectively the "extent", "intent", and "condition".
Example 2
According to Definition 7, from the triadic formal context of Example 1, The researcher extracts the following triadic concepts:
...
3.1.2Application of the TFCA theory of extracting the user profile
From Definition 7 and Example 2, it is deduced that each triadic concept represents exactly a user interest as defined in Definition 2. Indeed, the issue of extracting the user interests from the logfile comes down to find a set of triadic concepts from a triadic formal context (Q, T, P, I), where Q, T and P represent, respectively, the sets of queries (objects), the set of query-terms (attributes) and the set of positive peers (conditions). The following algorithm illustrates the different steps of extracting the user profile, which is a set of user interests, from the logfile using a triadic concepts extraction algorithm.
Input:
L: Logfile data
Output:
Pr : User profile
Step 1: Generate a triadic formal context K=(Q, T, P, I) from logfile L
Step 2: Generate a set of triadic concepts Tr form Kusing an algorithm of triadic concepts extraction.
Step 3: For each triadic concept Tų E Tr do:
1. Create a user interest = Tų
2. Add Ii to Pr
End For
3.2Query routing
The researcher introduces here a query routing algorithm for unstructured P2P systems based on the gossiping mechanism. Indeed, to route a given query q the forwarder send it to the k relevant peers. It is important to note that the forwarding peer exploits its user profile to select the appropriate peers. The following algorithm illustrates the different steps to select the relevant k peers from the user profile P with respect to the query q.
Input:
Pr : User profile
q: Query to forward
k: Number of peers to select
Output:
Rel: List of relevant peers
Step 1: Compute the similarity between each user interest Ii = (Q¡, T¡, P¡) e Pr and the query q using the following Jacob similarity function:
...
Step 2: Sort Pr according to the similarity between user interests and the query.
Step 3:
Set a counter i to 1
Set Rel = 0
While 3 Iļ = (Qi, Tt, ) e Pr and \Rel\ < κ do
...
End While
4Performance evaluation
In the sequel, the researcher presents the evaluation of our query routing method versus pure flooding method.
4.1 Simulation settings
The network simulation is performed by the PeerSim simulator [15]. It is important to note that the author used the dataset presented in [13]. The simulation settings are summarized in Table 1.
4.2 Evaluation metrics
To assess the proposed query routing method, the following metrics are used:
Recall: For a query q, the recall is defined as:
...
Precision: For a query q, the precision is defined as:
...
Search cost: It is defined as the number of exchanged messages per query.
4.3 Results
Figure [1] depicts the average recall of our method compared with pure flooding. The author observes that the recall of the pure flooding method is stable at around 0.32. However, the recall of our method increases whenever the number of sent queries increases. Indeed, the recall is improved over the time, as peer profiles are learned. The author notes that our method increases the overall recall of pure flooding by around 80 %.
Figure [2] shows the average precision of our method compared with pure flooding. The author remarks that the precision of the pure flooding method is stable at around 0.29. Nonetheless, the precision of our method goes up whenever the number of sent queries goes down. Hence, our method increases the overall precision of pure flooding by around 86%. Indeed, the author obtained these results since our method routes the query according to its content to the target peers using the created user profile by peer.
Figure [3] depicts the search cost, in terms of the number of exchanged messages, of our method compared with pure flooding. The author notes that the number of exchanged messages of pure flooding is stable at around 25 messages per query. However, the average number of exchanged messages per query within our method decreases as far as the number of sent queries increases and consequently the users' profiles are learned. Indeed, it achieved up to 10 % less messages than pure flooding.
5 Conclusion
In this paper, the author has introduced a new querybased learning method for P2P systems. Indeed, our method exploits the strong connections between query terms and relevant peers that provided pertinent results for these queries to model the user profile. This latter has been computed through extracting triadic formal concepts from past queries then used latter to efficiently route the forthcoming queries. The obtained results have showed that our method outperforms the baseline approach. As future work, the author suggests to perform different simulation scenarios by varying various parameters.
6 References
[1]. Callan J P, Lu Z, Croft W B, Searching distributed collections with inference networks. In: The 18th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR '95), pp 21-28.
[2]. Chawathe Y, Ratnasamy S, Breslau L, Making gnutella-like p2p systems scalable. In: Proceedings of the 2003 conference on applications, technologies, architectures, and protocols for computer communications (SiGcOMM'03). Karlsruhe, Germany, pp 407-418.
[3]. Jin H, Ning X, Chen H, Yin Z, Efficient query routing for information retrieval in semantic overlays. In: Proceedings of the 21st Annual ACM Symposium on Applied Computing (SAC' 2006). Dijon, France, pp 23-27.
[4]. Kumar A, Xu J, Zegura E W, Efficient and scalable query routing for unstructured peer-to-peer networks. In: Proceedings of the 24th IEEE International Conference on Computer Communications (INFOCOM' 2005). Miami, USA, pp 1162-1173.
[5]. Lu J, Callan J, Content-based retrieval in hybrid peerto-peer networks. In: Proceedings of the twelfth international conference on Information and knowledge management (CIKM 2003). New Orleans, LA, USA.
[6]. Khedija Arour, Taoufik Yeferny, Learning model for efficient query routing in P2P information retrieval systems. Peer-to-Peer Networking and Applications 8(5): 741-757 (2015).
[7]. Christoph T, Steffen S, Adrian W, Semantic query routing in peer-to-peer networks based on social metaphors. In: 13th International World Wide Web Conference (WWW' 2004). New York City, USA, pp 55-68.
[8]. Kalogeraki Vana G D, D Z Y, A local search mechanism for peer-to-peer networks. In: Proceedings of the eleventh international conference on Information and knowledge management (CIKM' 2002). McLean, Virginia, USA, pp 300-307.
[9]. Ciraci S, Korpeoglu I, Ulusoy z, Reducing query overhead through route learning in unstructured peerto-peer network. Netw Comput Appl 32(3): 550-567.
[10]. Crespo A, Garcia-Molina H, Routing indices for peer-to-peer systems. In: Proceedings of the 22 nd International Conference on Distributed Computing Systems (ICDCS'02). Vienna, Austria, pp 23-30.
[11]. Tsoumakos D, Roussopoulos N, Adaptive probabilistic search for peer-to-peer networks. In: Proceedings of the 3rd International Conference on Peer-to-Peer Computing (P2P'2003), pp 102-109.
[12]. Haitao Chen, Zhenghu Gong, and Zunguo Huang, Self-learning Routing in Unstructured P2P Network, International Journal of Information Technology Vol. 11 No. 12, 2005.
[13]. Goh S, Kalnis P, Bakirs S, Tan K L, Real datasets for file-sharing peer-to-peer systems. In: 10th International Conference on Database Systems for Advanced Applications (DASFAA' 2005). Beijing, China, pp 201-213.
[14]. Jelasity M, Montresor A, Jesi G P, Voulgaris S The peersim simulator. http://peersim.sf.net.
[15]. Fei Hao, Doo-Soon Park, Geyong Min, Young-Sik Jeong, Jong-Hyuk Park, k-Cliques mining in dynamic social networks based on triadic formal concept analysis, Neurocomputing, 57-66, 2016.
Copyright The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp) 2019