Content area
Full Text
Abstract-The tremendous explosion of Big Data such as social network data has made it urgent to mine important information embedded in the huge graphs. One of the mining tasks is to discover the most frequent graph in a large graph. This task is equivalently to the subgraph isomorphism problem, which is to discover the subgraph of a large graph that is isomporphic to a small input graph. The subgraph isomorphism problem is inherently a NP-hard problem, thus an exact algorithm is impossible when graphs are huge, which are the cases these days. In this study, we reported the implementation of three heuristic optimization algorithms: simulated annealing (SA), (1+1) evolutionary algorithm, and genetic algorithm (GA). We applied these algorithms on randomly generated graphs for identifying the isomorphic subgraphs, and compared the performance of these three algorithms. Our experimental results have shown that GA performed better than the other two algorithms overall.
(ProQuest: ... denotes formulae omitted.)
I. INTRODUCTION
Graph models have been applied in many fields such as Bioinformatics, Chemoinformatics, Computer Science, and Socialology. Graphs can be represented to study proteinprotein interaction networks [1], chemical components [2], computer networks [3], social networks [4], semantic web [5] and XML data [6]. For instance, the availability of huge amounts of social media data has made it possible for researchers to analyze social networks using graph models, and lead new discoveries and insights through studying graph structures.
While there are many problems have been solved by using graph models in the past decades, this is not the end. Instead, the explosive growth of data will put more challenges to researchers, especially in the era of Big Data. For instance, finding most frequent subgraphs in a large graph will pose a great challenge as the graphs get larger and larger. In the frequent subgraph problem, frequent subgraph candidates are first generated, and the frequency of each candidate will be checked [7]. The checking of the candidates within a large graph is is known as the subgraph isomorphism problem, which is a NP-complete problem [8].
The algorithms of finding most frequent subgraphs in graph datasets can be roughly classifed into three categories: (1) Mathematical graph-theory based approaches, which include FSG [9], MoFa/Moss [10], Gaston [11], gSpan [12], CloseGraph [13],...