Keywords
Web site classification, Neural networks, Non-profit organizations
Abstract
Describes an approach to automatically classify and evaluate publicly accessible World Wide Web sites. The suggested methodology is equally valuable for analyzing the content and hypertext structures of commercial, educational and non-profit organizations. Outlines a research methodology for model building and validation and defines the most relevant attributes of such a process. A set of operational criteria for classifying Web sites is developed. The introduced software tool supports the automated gathering of these parameters, and thereby assures the necessary "critical mass" of empirical data. Based on the preprocessed information, a multi-methodological approach is chosen that comprises statistical clustering, textual analysis, supervised and unsupervised neural networks, and manual classification for validation purposes.
Introduction and analytical objectives
The principal idea behind this paper is the use of autonomous software tools to capture the characteristics of commercial Web information systems, determine their specific importance, and store them in a central data repository. The use of dedicated software agents to examine Web sites is more efficient, and more immune to intra- and interpersonal variances, than human evaluation. Thus, the inclusion of thousands of systems becomes feasible, compared to samples limited to tens or hundreds in previous efforts (Bucy et al., 1999; Selz and Schubert, 1997; Witherspoon, 1999). Naturally, these advantages come at the expense of sacrificing non-quantifiable, frequently recipient-dependent information. Those previous efforts relied on manual Web site assessment and therefore lacked the resources to cover larger samples. Schubert and Selz (1999), for example, developed an on-line survey tool with an extensive list of evaluation criteria for Web assessment. While their results include a wealth of valuable information, their data collection was limited to around 70 assessments covering several Web sites.
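As a rough illustration of what such a software agent might look like, the following sketch fetches a single page and derives a handful of measurable attributes from it. It is not the paper's actual tool: the function gather_attributes and the particular attribute set are hypothetical stand-ins, and only the Python standard library is used.

# Illustrative sketch only; all names and attribute choices are hypothetical.
import json
import urllib.request
from html.parser import HTMLParser

class AttributeCounter(HTMLParser):
    """Counts simple, measurable page features while parsing HTML."""
    def __init__(self):
        super().__init__()
        self.counts = {"a": 0, "img": 0, "form": 0, "script": 0}

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1

def gather_attributes(url):
    """Fetch one page and derive operational criteria from it."""
    with urllib.request.urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    parser = AttributeCounter()
    parser.feed(html)
    return {
        "url": url,
        "page_bytes": len(html),            # raw document size
        "links": parser.counts["a"],        # hypertext connectivity
        "images": parser.counts["img"],     # multimedia usage
        "forms": parser.counts["form"],     # interactivity
        "scripts": parser.counts["script"],
    }

if __name__ == "__main__":
    record = gather_attributes("http://www.example.com/")
    # A JSON line stands in for the central data repository.
    print(json.dumps(record))

In a full system, an agent of this kind would be run against thousands of sites, with each record written to the central repository for later preprocessing and analysis.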
The ultimate goal is to develop a consistent analysis and evaluation framework for publicly accessible hypertext structures. Evaluating and grouping such structures, however, requires awareness of their relevant attributes (Tesch, 1990). Thus both the framework and the software prototype require the definition of measurable, operational criteria, which the tool investigates and preprocesses for each Web site separately. Possible applications of such a tool fall into three areas (see...
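To suggest how such preprocessed, per-site criteria could feed the statistical clustering mentioned in the abstract, the sketch below groups a few attribute vectors with k-means. This is only a stand-in: the paper's actual feature set, algorithm and cluster count are not specified here, the numbers are invented for illustration, and scikit-learn is assumed to be available.

# Hedged sketch: k-means with three clusters is a generic stand-in for the
# paper's statistical clustering step; all data below is made up.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Each row is one Web site's preprocessed criteria, e.g.
# [page_bytes, links, images, forms, scripts] from the earlier sketch.
sites = np.array([
    [54000, 120, 35, 2, 10],
    [3200, 15, 4, 0, 1],
    [210000, 640, 180, 8, 40],
    [4100, 22, 6, 1, 2],
    [98000, 310, 90, 5, 25],
])

# Standardize so large-valued criteria (bytes) do not dominate small counts.
features = StandardScaler().fit_transform(sites)

# Group sites into tentative classes; labels_ assigns one cluster per site.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
print(model.labels_)

Cluster memberships produced this way would then be candidates for validation through the textual analysis, neural network and manual classification steps named in the abstract.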