Int J Digit Libr (2016) 17:119–141 DOI 10.1007/s00799-015-0144-4
A quantitative approach to evaluate Website Archivability using the CLEAR+ method
Vangelis Banos · Yannis Manolopoulos
Received: 28 April 2014 / Revised: 15 February 2015 / Accepted: 25 February 2015 / Published online: 12 March 2015 © Springer-Verlag Berlin Heidelberg 2015
Abstract Website Archivability (WA) is a notion established to capture the core aspects of a website, crucial in diagnosing whether it has the potential to be archived with completeness and accuracy. In this work, aiming at measuring WA, we introduce and elaborate on all aspects of CLEAR+, an extended version of the Credible Live Evaluation Method for Archive Readiness (CLEAR) method. We use a systematic approach to evaluate WA from multiple different perspectives, which we call Website Archivability Facets. We then analyse archiveready.com (http://archiveready.com), a web application we created as the reference implementation of CLEAR+, and discuss the implementation of the evaluation workflow. Finally, we conduct thorough evaluations of all aspects of WA to support the validity, the reliability and the benefits of our method using real-world web data.
Keywords Web archiving · Website Archivability · Web harvesting
V. Banos (B) · Y. Manolopoulos
Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece
e-mail: [email protected]
Y. Manolopoulos
e-mail: [email protected]
1 Introduction
The number of indexed World Wide Web pages is estimated to be 1.75 billion in early 2014 according to major search engines [35].¹ The approximate numbers of Tumblr blogs (over 80 million) and WordPress websites (over 50 million) also suggest the Web as a popular channel of information exchange. For example, 3.5 billion of the WordPress webpages are being visited every month. These channels of communication are not limited to the younger generation: the average age of active users on social media networks is estimated to be around 37–45 years [12]. Volume, rate of production, and associated demographics are not in themselves reasons for archiving material. However, it would be foolish to consider that all the information being produced has no value. The level of traffic, at least, suggests social...