Content area
Full Text
ABSTRACT-The Google internet search engine is well known for its success. In addition to it being of intrinsic interest, the underlying algorithm can be applied to any matrix Markov process and so has broad interdisciplinary applications. This article is directed to readers across the sciences and technology, and is intended to give a generally accessible introduction to the mathematical basis of the Google search algorithm.
Early search engines for the internet gave out long lists of unordered "title matches". In 1998, Brin and Page published their results (Brin and Page, 1998) on PageRank, a new approach to ranking websites, which in part led to the success of Google(TM). The approach appeals directly to the natural graph structure of the internet. In an efficient way, it uses the citation (link) pattern of the network to rank sites. The result, therefore, does not give some absolute rank of a document, but measures how a document fares in the "citation competition" relative to other documents in the same network. It is commonly supposed that once a month or so Google runs a crawl and selection from the internet to allow for an update of their database and website rankings.
The success of the Google search engine now is common knowledge; and mathematical analysis of the PageRank algorithm is based on well-known (but non-trivial) results that reach back to the 20th century. The mathematician may recall the fixed-point theorems of Brower, Hahn-Banach (Istratescu, 1981) and the matrix theorem of Perron-Frobenius (Berman and Plemmons, 1994; Meyer, 2000). There is a body of literature on search methods. An excellent survey article that includes a discussion of PageRank (as well as other eigenvector methods for web information retrieval) is (Langville and Meyer, 2005). A further bibliographical source can be found in (Kamvar et al., 2004; Sankaralingam et al., 2003). Specialized articles take the defining equation for PageRank as a starting point, and go on to consider: Aspects of how to use the defining equation (Craven ace. 2004; Rogers, 2002); Theoretical results on convergence rates (Kamvar et al., 2004); and Illustrations of non-trivial implementation of algorithms, and their refinements (Sankaralingam et al., 2003). There is a very good article for mathematics students (Bryan and Leise, 2006). The derivation of the equation,...