Purpose
This paper aims to present an objective summary of the current state of research concerning the evaluation criteria of map metadata. The undertaken research identifies which authors addressed issues related to the metadata of objects collected in digital libraries, and to what extent, with particular emphasis on cartographic materials.
Design/methodology/approach
Independent reviewers analysed the basic data of the articles. Selected papers were subject to quality assessment based on the full text and 12 questions. Finally, an iterative backward reference search was conducted.
Findings
The results demonstrate that there are no universal criteria for metadata evaluation. There are no works that assess the metadata of cartographic studies, although numerous publications point to the need for this type of work.
Practical implications
Metadata evaluation allows users to determine whether objects found in a library are relevant to their needs.
Originality/value
The criteria and methods most often used for assessing metadata quality, which can be adapted to map metadata evaluation, have been identified. The authors identified the existing research gaps and demonstrated that there is a need for research contributions in the field of evaluating map metadata.
1. Introduction
Descriptions of maps have been collected for centuries. In the beginning, they were paper catalogues (Andrew and Larsgaard, 1999) and the data referred to paper maps. In the 1990s, many cartographic collections were digitised. For example, in the mid-1990s, maps were scanned and made available on CDs at the Malpass Library at Western Illinois University (Allen, 2008). At the same time, the technology required for geographic information systems (GIS) was developing rapidly (March and Scarletto, 2017). In the US, the Association of Research Libraries’ (ARL) GIS Literacy Project was launched (Davie et al., 1999), which enabled ARL member libraries to create GIS services within their libraries. An early example of integrating GIS into non-geographic subject areas advised business liaisons to encourage GIS technology to complement other data sources and analyses (Norris and Tenner, 2000). At the beginning of the twenty-first century, digital libraries were considering how to store map scans: whether they should be colour or monochrome, what the scanning resolution should be and what the costs of storing maps using various techniques were (Bidney, 2019).
Nowadays, many large and small libraries, archives and museums have put their rapidly growing collections of digital content online, creating an immense wealth of scholarly and cultural information. Currently, there is a dynamic development of digital humanities. Digital humanities can be defined as new ways of doing research that involve collaborative, transdisciplinary and computationally engaged research, teaching and publishing. They bring digital tools and methods to the study of the humanities with the recognition that the printed word is no longer the main medium for knowledge production and distribution (Burdick et al., 2012).
Margaritopoulos et al. (2012), Neumaier et al. (2016) and Stvilia and Gasser (2008) emphasised that end users and providers have different needs related to the description of objects in digital libraries. It could be that end users are mostly interested in descriptive metadata, whereas providers may use additional kinds of metadata, such as administrative metadata, to maintain the collection.
Description of metadata for objects stored in libraries, museums, or archives is very well known and widespread. Metadata schemas were developed, such as the MARC standard (for MAchine Readable Cataloguing) for the representation and communication of bibliographic and related information in machine-readable form (LoC MARC, 2019); the Dublin Core metadata schema, that can be used to describe digital resources, as well as physical resources (DCMI, 2012); and the ISO 19115 metadata standard designed specifically for representing geographical information (ISO, 2014). There are also some well-known data content standards – for example, resource description and access (RDA) for descriptive cataloguing, provides instructions and guidelines for formulating bibliographic data (RDA, 2019) – and conceptual models, such as the functional requirements for bibliographic records (FRBR). Developed by the International Federation of Library Associations and Institutions (IFLA), the FRBR is a conceptual entity-relationship model that relates user tasks of retrieval and access in online library catalogues and bibliographic databases from a user’s perspective (IFLA, 2009). Its extensions – the functional requirements for authority data (FRAD) which added a model for the description of authority data and relates that to the user’s needs (IFLA, 2013) and functional requirements for subject authority data (FRSAD) which was intended to support global sharing and reuse of subject authority data (IFLA, 2010) – were also developed by IFLA. The consolidation of the separately developed conceptual models: FRBR, FRAD and FRSAD is a library reference model (LRM): a high-level conceptual reference model developed within an entity-relationship modelling framework (IFLA, 2017).
All the above initiatives aim to provide the correct characteristics of the objects, facilitate access to them and allow the exchange of information between institutions. However, the metadata of shared resources are not always of good quality, which may impede their use and leave the collections underused (Stvilia and Gasser, 2008). Theoretically, a sufficient description exists when all metadata elements are populated with values; in practice, however, this rarely happens. Relevant surveys by Friesen (2004), Guinchard (2002) and Najjar et al. (2003) have shown that indexers tend to fill out only particular metadata elements that could be considered popular, while they ignore other, less popular elements. The creation of metadata is a task that requires significant effort and financial expenditure and, most importantly, the involvement of knowledgeable and experienced people (Barton et al., 2003; Liddy et al., 2002). Since all these requirements are generally difficult to fully meet, incomplete metadata are rather common in the majority of digital repositories (Margaritopoulos et al., 2012).
However, the resources of digital libraries are not only books and articles. They also include such objects as cartographic materials: maps or atlases. The description of their metadata has not yet been standardised, and the standards in force in digital libraries for describing other resources are often used to describe them. Studies have been carried out (Larsgaard, 2007; Morse, 2012; Reese, 2006) which indicate that currently used standards do not fully reflect the wealth of information found in cartographic resources. Therefore, the authors have undertaken research on the development of a method of describing cartographic materials in digital libraries. For this purpose, it is important to analyse the current methods of describing the metadata of objects collected in digital libraries, as well as the criteria and methods for assessing their quality, to be able to assess to what extent commonly used metadata standards can be adapted to the description of maps or atlases.
Currently, the method of data sharing is changing. Initially, the basic way of sharing maps on the Internet was through catalogues, in which maps are accessible as images without references to space (UTL, 2019; WIG, 2019). With time, they started to be additionally equipped with tools allowing the display of maps with the ability to zoom in and out (LoC, 2019; NLA, 2019). Together with technological development, maps started to be published with the use of geoinformation solutions. These present the coverage of the old map content against the background of present geographical data, but the map itself is still presented as a picture without references to geographical space (NYPL, 2019; Old Maps Online, 2019). The next step in the development of such systems resulted in presenting old maps with references to the present geographical space (Mapire, 2019; NLS, 2019) or in mixing the different ways of publication described above (DRMC, 2019). Nowadays, spatial data sharing services are used to publish cartographic materials. The most popular one is the Web Map Service (WMS) (OGC, 2019), which allows a referenced old map to be opened and used in desktop software (Maps with a Past, 2019). A growing number of non-geographers and non-cartographers use maps shared in all these ways in their research.
Neumaier et al. (2016) highlighted that metadata quality issues in open data portals have been identified as one of the core problems for wider adoption and also as a barrier for the overall success of open data. In fact, there have been a number of reports confirming the existence of a quality problem in open data. The formulation and implementation of metadata quality requirements are extremely important for developing interoperability in the area of digital libraries. To achieve that, different metadata systems need to move towards integration that should be based on the standardization, normalization and enrichment of metadata schemas through the application of common terminologies, vocabularies, classifications and so on (Solodovnik, 2011).
Data quality will require more attention in the coming years. Authors should question the quality of the data being shared, because inaccurate or less rigorous data can have deleterious effects when being repurposed in a research context. Moreover, authors also need to ensure that standards are in place and being used when creating and sharing metadata, establishing stable file formats and securing storage options. In the era of open data and cloud storage, data literacy is extremely important. As more and more data are created and shared, people need to understand how they can and cannot be used (March and Scarletto, 2017). The answers to these questions can be obtained by a reliable assessment of the metadata of these maps. The difficulty in conducting such an evaluation results from the absence of a metadata description standard dedicated to maps that would reflect the specificity of cartographic studies. Important information for maps includes, for example, geographical coordinates, which allow users to search for studies of the same area and often indicate the level of detail of these studies.
Therefore, the main goal of this work is to provide an objective summary of the current state of research concerning the evaluation criteria of metadata in general, and especially map metadata. The authors looked for a “systematic literature review” (SLR) in a list of selected journals but, unfortunately, were unable to find existing SLRs regarding the evaluation of map metadata. However, during the search, the authors found good reviews and SLRs concerning GIS in libraries, GIScience and cartography (Klonner et al., 2016; March and Scarletto, 2017; Steiger et al., 2015). Although those reviews do not cover the area of the present research interests, they have been used as a source of good practice for the development of a systematic review of the literature on the assessment of metadata.
The undertaken research aimed at identifying which authors addressed, and to what extent, issues related to the metadata of objects collected, among others, in digital libraries, with particular emphasis on cartographic materials. In particular, it was analysed whether studies were undertaken to assess the quality of metadata, including map metadata, and, if so, which evaluation criteria were taken into account. The research aimed at summarizing the current state of knowledge as well as indicating the directions of possible further research in this area.
A classic, non-systematic literature review could be insufficient to answer the above questions, because it is usually of little scientific value and cannot be reproduced on the basis of a written paper. An SLR summarizes the existing work in a reliable way. Conducting a literature review requires specific procedures in line with the defined search strategy, which ensures the completeness of the search (Kitchenham and Charters, 2007). A systematic review is an overview of primary studies which contains an explicit statement of objectives, materials and methods and has been conducted according to an explicit and reproducible methodology. It can include results against a given hypothesis (Greenhalgh, 1998).
2. Review methodology
The main subject of this systematic literature review is to analyse the criteria of evaluating metadata, especially in the case of cartographic documents. The review was prepared based on the guidelines of Kitchenham and Charters (2007). The review started with the development of the review protocol (Attachment 1). It describes the project’s background, selection criteria, quality assessment and methodology of data extraction, as well as methods of synthesis and results.
The review began with defining clear and precise research questions, as this enables the proper selection of primary studies (Okoli and Schabram, 2010). The key question refers to the evaluation of metadata in digital libraries (first research question, RQ1), but the authors were also interested in the details of evaluating map metadata (second research question, RQ2). Therefore, the following research questions have been defined:
RQ1. What are the most important criteria when assessing the quality of metadata?
RQ2. How are archival cartographic documents metadata evaluated in digital libraries?
The authors chose the following inclusion criteria to determine the usability of particular papers for the review:
Only journals and conference proceedings coming from peer reviewed sources because they are widely recognised and appreciated sources presenting the results of research.
Primary studies selected from sources published in the last 17 years (2003-2019), because this was the period when the intensive development of research in the area of digital libraries took place. Earlier, in the late 1990s, several digital repository metadata evaluation studies were carried out (Moen et al., 1998a, 1998b). In the early 2000s, the rules currently in force were being developed and the technology used in digital libraries began to develop quickly. ISO 19115 was created in 2003 and this year was, therefore, adopted as the cut-off date.
Papers should be written in English, because English is the most popular language for the publication of research results, conferences and publications.
The following exclusion criteria were applied:
Elimination of duplicate content; that is, studies written by the same authors with just slight differences and minor extensions or those that were based on the same data. Authors sometimes publish the same results in different places, in works slightly different in title.
Papers that were not devoted to the following topics to eliminate works that do not discuss any aspect of the issues that are of interest:
method for metadata quality evaluation;
standard of spatial data;
standard of archival documents; and
data in digital libraries.
The authors used an iterative search strategy to develop the most appropriate set of keywords to use (Kitchenham and Charters, 2007). It also allowed for the minimisation of search bias. The authors used the keywords described in the review protocol to search in five electronic databases and included all articles published and indexed before 1 May 2019.
All the selected studies were managed with the use of the Mendeley (2019) reference management software and Zotero (2019), into which titles, keywords and abstracts were imported. In the Mendeley software, duplicates were checked and excluded. Afterwards, two independent reviewers analysed the basic data (title, keywords and abstract) and tagged the articles using specific tags. Those tags were related to methods for metadata quality evaluation, standards of archival documents description and types of data used in digital libraries. The tags that were used include:
MetEva, related to the methods of metadata quality evaluation;
StanSd, related to a standard of spatial data;
StanArc, related to a standard of archival documents description; and
DatDl, related to a data type in a digital library.
The same applied to the excluded materials. They received tags too, in this case explaining the reason for exclusion. Further explanations can be found in the Notes section for the selected entries:
papers not mentioning any of the topics of interest;
papers not falling in the search range 2003-2019;
papers written in a language that none of the reviewers knew;
duplicate content not detected by the automatic tool within Mendeley; and
problems with obtaining the full text.
Selected papers were taken for quality assessment, based on the full text and 12 questions. For each question, the authors assigned a score of zero, one or two points. When the condition was not fulfilled, the paper received zero points. Papers scored one point for positive answers to the following questions:
Did untoward events occur during the study (i.e. did the authors encounter any difficulties during the research and were they described in the article)?
Is practical significance discussed?
Are there any negative findings presented/limitations presented?
Are there any future research studies mentioned? (Or has the work been completed?).
Finally, a score of two points was assigned for positive answers to the following questions:
Are the aims/research questions clearly stated?
Are there any methods of evaluation metadata mentioned?
Does the study analyse how archival cartographic documents are described?
Does it analyse how spatial data are described?
Are there any related works mentioned (from the past)?
Are there any project results presented (tool/platform/method/prototype/final product)?
Do the numbers add up across different tables and subgroups?
Have all study questions been answered or has the research goal been fulfilled?
The total number of points was calculated and papers that had not achieved a minimum number of points were excluded. After the quality assessment of the papers, an iterative backward reference search was also conducted to identify additional primary studies related to the topic of this review.
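To make the scoring scheme concrete, the sketch below encodes it as a short script; the question identifiers and the example answers are illustrative abbreviations of the questions above, not part of the original review protocol.

```python
# Illustrative sketch of the 12-question quality scoring used for full-text
# assessment (4 one-point questions, 8 two-point questions, 20 points maximum).
# Question identifiers are abridged; the answer values are hypothetical.

ONE_POINT_QUESTIONS = [
    "untoward_events_described",
    "practical_significance_discussed",
    "negative_findings_or_limitations",
    "future_research_mentioned",
]

TWO_POINT_QUESTIONS = [
    "aims_clearly_stated",
    "metadata_evaluation_methods_mentioned",
    "cartographic_description_analysed",
    "spatial_data_description_analysed",
    "related_works_mentioned",
    "project_results_presented",
    "numbers_consistent_across_tables",
    "research_goal_fulfilled",
]

def quality_score(answers: dict) -> int:
    """Sum 0/1/2 points over the 12 questions; unanswered questions score 0."""
    score = sum(1 for q in ONE_POINT_QUESTIONS if answers.get(q, False))
    score += sum(2 for q in TWO_POINT_QUESTIONS if answers.get(q, False))
    return score

# Hypothetical paper with one negative answer; the review later applies
# minimum thresholds (14 points for RQ1, 12 points for RQ2).
example = {q: True for q in ONE_POINT_QUESTIONS + TWO_POINT_QUESTIONS}
example["untoward_events_described"] = False
print(quality_score(example))  # 19
```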
At the same time, an analysis of the year and the type of publication was also conducted. It helped to determine if any particular year or term was more fruitful in valuable papers, as well as whether the type of publication was reflected in the quality of research. Based on the topic of primary studies, selected papers were divided into groups answering the research questions. The authors added detailed questions to help answer each research problem separately. RQ1 aimed at checking which criteria were the most important when assessing the quality of metadata. To this end, the following detailed questions were distinguished:
What kind of metadata was used (standards)?
What evaluation criteria were used?
Were these methods automatic, semi-automatic, or manual?
What tools, software, or programming language were used to perform the evaluation?
Was there any information about the accuracy of the process provided? How was it measured?
The aim of the second research question was to check how the metadata of cartographic documents were evaluated. To this end, the question has been further elaborated on with:
What types of archival cartographic documents were used?
How were archival cartographic documents described (standards)?
How detailed was the metadata used?
Which institutions distributed those data?
What was the data extent on a map (scale, range)?
Were cartographic metadata evaluated?
Based on these questions, specific research was conducted about the evaluation of metadata, in particular the metadata of cartographic documents. The results are presented in the Review results section.
3. Review results
The review was conducted according to the above methodology. Each step of the search strategy, together with the number of papers selected in each step, is presented in Figure 1. A total of 304 studies were identified in five different databases; these search results are presented in Table I. The most relevant papers were found in Web of Science (Web of Science, 2019) and Scopus (Scopus, 2019): in each of these databases, more than 100 articles were selected, which accounts for almost 75 per cent of the total number of selected papers. Among the 304 articles, 47 duplicates were found, which were then excluded. As a result of the basic data analysis (title, keywords and abstract) and the discussion of contradictions between the two reviewers' choices, a set of 25 primary studies was selected. They were then used for quality assessment based on the full text of the papers.
The set of 25 primary studies was assessed in an Excel sheet by assigning points to each detailed question. Primary studies scored from 3 to 16 out of 20 points. Minimum scores of 14 for RQ1 and 12 for RQ2 were established, because papers at this level were valuable for the topic. This level was achieved by nine papers, one of which answered both research questions. Based on the references of the nine papers, two primary studies were added that went through full-paper screening. Finally, 11 primary studies that showed relevance to the research questions were identified.
Figure 2 shows the relationship between the year of publication and the number of publications retrieved from the databases. Unfortunately, it is impossible to identify a clear publishing trend in the 2003-2019 period. Most publications were published in 2007 and 2015 (21), while the largest number of publications included in the final result came from 2008, 2009 and 2016.
The analysis of item type revealed that most publications (161) were journal articles (Figure 3). In particular, for the final result, 74 were journal papers. It seems that this type of publication is the most valuable. The main journal was the International Journal on Digital Libraries (6 excluded and 2 included papers); other important outlets were Lecture Notes in Computer Science (7 excluded papers) and The Electronic Library (4 excluded and 1 included paper). Analysing the articles in terms of the research questions, nine were connected to the first research question and two were related to the second, while one of the papers was related to both research questions. Authors very often emphasised the importance of automatic evaluation of metadata and showed different possibilities for developing it. Unfortunately, only a few of the papers described metadata for cartographic documents and presented an assessment of this kind of metadata.
RQ1: What are the most important criteria when assessing the quality of metadata?
Most authors emphasised the complexity of this process due to the fact that quality means the degree to which objects fit the given needs.
Table II presents a summary of the articles used. The subjects of particular research projects were mainly papers, documents, books and untypical documents, such as maps (Kuźma and Mościcka, 2018) and archaeological documents (Gavrilis et al., 2015) collected by digital libraries. The size of the analysed material ranged from 2,000 (Kortemeyer, 2016) to 2,000,000 items (Gavrilis et al., 2015). The standards of metadata were different, but the most typical was Dublin Core: four out of nine papers used this standard. Some of the papers examined queries to databases and focused on the users of objects in digital libraries (Kortemeyer, 2016; Ochoa and Duval, 2009).
In view of the fact that there is no universal methodology to evaluate metadata, the authors of the chosen papers defined their own methods of metadata quality assessment. All of the authors who developed evaluation of metadata emphasised that it is important to know what quality means. Commonly, quality is defined as the measure of fitness for a task. The tasks that metadata should enable in a digital repository are to help the user to find, identify, select and obtain resources (O’Neill, 2011). The need for good quality metadata records becomes a necessity given the large quantities of digital content that are available through digital repositories and the increasing number of web services that use this content (Gavrilis et al., 2015). The quality of the metadata will be directly proportional to how much it facilitates those tasks. Ochoa and Duval (2009) stated that an ideal measurement of metadata quality for fast growing repositories should have two characteristics: to provide useful measurement of the quality (meaningfulness) and to be automatically calculated for each metadata instance inserted in the repository (scalability). Gavrilis et al. (2015) emphasised that the assessment of metadata quality is multidimensional and several parameters should be taken into account to specify the context in which metadata are generated and used.
The authors point out that, if we want to create a useful measurement of quality, it is important to define the criteria of measurement (evaluation). The authors of nine papers described 16 such criteria (Figure 4): ability, accessibility/retrievability/function of the extracted information, accuracy, appropriateness/existence, auditability, completeness, confidence, conformance to expectations, consistency/conformance (including logical consistency and coherence), difficulty, discrimination, efficiency, provenance, significance/size, similarity and timeliness.
Some of the criteria were named differently but the definitions were very similar, such as significance (Moreira et al., 2009) and size (Ivanyukovich et al., 2007). Definitions of these criteria are presented in Table III.
The highest number of criteria was eight (Moreira et al., 2009) and the lowest was one (Margaritopoulos et al., 2012; Stvilia and Gasser, 2008) (Table IV). The most popular and most important criterion was completeness, defined as the degree to which the metadata instance contains all the information needed to have a comprehensive representation of the described resource (Ochoa and Duval, 2009), as it was used in six out of nine examinations. A very similar definition was provided by Moreira et al. (2009) “completeness reflects how many of the attributes specified in a metadata standard have their values defined in a metadata specification”. On the other hand, Gavrilis et al. (2015) emphasised that completeness is a complex measure that can be analysed into three partial measures:
completeness of the mandatory set of elements;
completeness of the ‘recommended’ element set; and
completeness of optional elements.
Completeness is easy to automate in digital libraries, and the projects that considered this criterion computed quality automatically.
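As an illustration, the following is a minimal sketch of an automatic completeness check following the split into mandatory, recommended and optional element sets described by Gavrilis et al. (2015); the element lists and the equal weighting are assumptions made for the example, not part of any cited standard.

```python
# Sketch: completeness of a metadata record split into mandatory, recommended
# and optional element sets (after Gavrilis et al., 2015). The element lists
# and the unweighted averaging are illustrative assumptions.

MANDATORY   = ["title", "creator", "date", "identifier"]
RECOMMENDED = ["subject", "description", "coverage"]
OPTIONAL    = ["relation", "source", "rights"]

def partial_completeness(record: dict, elements: list[str]) -> float:
    """Share of elements in the given set that are populated with a value."""
    filled = sum(1 for e in elements if record.get(e) not in (None, "", []))
    return filled / len(elements)

def completeness(record: dict) -> dict:
    parts = {
        "mandatory": partial_completeness(record, MANDATORY),
        "recommended": partial_completeness(record, RECOMMENDED),
        "optional": partial_completeness(record, OPTIONAL),
    }
    parts["overall"] = sum(parts.values()) / 3  # simple unweighted average
    return parts

record = {"title": "Plan of Warsaw", "creator": "Unknown", "date": "1831",
          "identifier": "oai:example:123", "coverage": "Warsaw"}
print(completeness(record))
```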
Moreira et al. (2009) highlighted that assessment depends on different components and that, for a realistic evaluation, one needs to assess all the components used for sharing a digital object. They distinguished three components of evaluation. Based on the review, the present authors expanded Moreira et al.'s classification and added one group, users. All 16 criteria can thus be divided into four groups (a machine-readable sketch of this grouping is given after the list):
digital objects:
accessibility (i.e., the rights to the digital object);
conformance to expectations;
significance or size;
similarity; and
timeliness.
metadata specification:
completeness;
conformance or consistency;
accuracy;
provenance;
appropriateness or existence; and
auditability.
service:
efficiency; and
confidence.
users:
difficulty;
discrimination; and
ability.
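To keep this grouping easy to reuse, it can also be captured as a simple data structure; the sketch below is illustrative and simply mirrors the four groups listed above and in Table IV.

```python
# The 16 evaluation criteria grouped by the component they assess,
# as listed above (digital objects, metadata specification, service, users).
CRITERIA_GROUPS = {
    "digital object": [
        "accessibility/retrievability/function of the extracted information",
        "conformance to expectations",
        "significance/size",
        "similarity",
        "timeliness",
    ],
    "metadata specification": [
        "completeness",
        "consistency/conformance",
        "accuracy",
        "provenance",
        "appropriateness/existence",
        "auditability",
    ],
    "service": ["efficiency", "confidence"],
    "user": ["difficulty", "discrimination", "ability"],
}

# Sanity check: the four groups together cover all 16 criteria.
assert sum(len(v) for v in CRITERIA_GROUPS.values()) == 16
```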
The analysed papers focused on different aspects of evaluation (Table IV). Five out of nine studies described criteria related to digital objects. Ivanyukovich et al. (2007) highlighted size as a measure of the significance of digital objects, together with the function of the extracted information, which shows that information already extracted from metadata can be reused. Moreira et al. (2009) described four criteria connected with digital objects. The first of them is accessibility, which reflects the rights of a certain community of users to access (parts of) the digital objects of a digital library (DL). The second one is significance, which indicates the importance of digital objects regarding a specific factor, such as the number of downloads, the number of citations and so on. The third is similarity, which estimates how related or close two digital objects are. Finally, the fourth is timeliness, which indicates how recent the digital objects of the DL are. Ochoa and Duval (2009) presented three criteria (accessibility, conformance to expectations and timeliness) and stated that metadata should change whenever the described object changes (currency). A complete metadata instance should also be available by the time the object is inserted in the repository.
In the evaluation of metadata, six criteria may be distinguished: accuracy, appropriateness or existence, auditability, completeness, consistency or conformance and provenance. Seven out of nine papers used criteria of metadata evaluation. Neumaier et al. (2016) mentioned accuracy, appropriateness and consistency; Moreira et al. (2009), completeness and conformance; Ochoa and Duval (2009), accuracy, completeness and provenance; Gavrilis et al. (2015), accuracy, appropriateness, auditability, completeness and consistency; Margaritopoulos et al. (2012) and Stvilia and Gasser (2008), completeness; and Kuźma and Mościcka (2018), accuracy, completeness and consistency.
Only two of nine papers assessed service. Efficiency was assessed by Moreira et al. (2009) and Ochoa and Duval (2009) and confidence only by Moreira et al. (2009). Efficiency indicates the speed of execution of services and confidence indicates the probability of success (no failures) of the execution of a particular service (Moreira et al., 2009).
Based on Kortemeyer's (2016) research, criteria related to users can be added. He proposed difficulty, discrimination and ability in the context of e-learning. The prevalent latent trait is frequently called “ability” and, loosely speaking, it measures how “good” the student is. On summative assessments, the goal is frequently to distinguish the “good” from the “bad” students, so good test items are those that can be solved by students with high ability and cannot be solved by students with low ability. This is usually expressed in terms of the probability that a student with a given ability successfully solves a particular item. The simplest measure of difficulty is again the ratio of successful transactions to total transactions, this time for a particular item rather than for a particular student. Discrimination denotes how well the item separates “good” and “bad” students.
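Kortemeyer's simplest measures can be illustrated directly; the sketch below computes per-student and per-item success ratios over a hypothetical transaction log and is not the LON-CAPA implementation.

```python
# Sketch: simplest success-ratio measures for student ability and item
# difficulty over a transaction log (after Kortemeyer, 2016). The log format
# (student, item, success) is a hypothetical example, not LON-CAPA data.
from collections import defaultdict

transactions = [
    ("s1", "item1", True), ("s1", "item2", False),
    ("s2", "item1", True), ("s2", "item2", True),
    ("s3", "item1", False), ("s3", "item2", False),
]

def success_ratio(transactions, key_index):
    """Ratio of successful to total transactions, grouped by student (0) or item (1)."""
    totals, successes = defaultdict(int), defaultdict(int)
    for student, item, success in transactions:
        key = (student, item)[key_index]
        totals[key] += 1
        successes[key] += int(success)
    return {k: successes[k] / totals[k] for k in totals}

ability = success_ratio(transactions, 0)     # per student
difficulty = success_ratio(transactions, 1)  # per item (a higher ratio means an easier item)
print(ability, difficulty)
```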
The next aspect of the research was connected with the automation of evaluation. Ochoa and Duval (2009) pointed out that quality measurement should be automatic and described the three main disadvantages of manual quality measurement:
Manual quality estimation is only valid at sampling time. If a considerable amount of new resources is inserted into the repository, the assessment could become inaccurate and the estimation must be redone;
Only the average quality can be inferred with these methods. The quality of individual metadata instances can only be obtained for those instances contained in the sample; and
Obtaining the quality estimation in this way is costly. Human experts should review a number of objects that is always increasing due to the growth of repositories.
Various approaches to the automation of evaluation were described by Moen et al. (1998a, 1998b), Najjar et al. (2003), Shreeves et al. (2005) and Stvilia and Gasser (2008). These studies automatically obtained a basic estimation of the quality of each individual metadata instance without the cost generated by manual quality review; however, they do not provide a level of “meaningfulness” similar to a human-generated estimation (Ochoa and Duval, 2009).
It has been noticed that six out of nine papers described automatic evaluation (Gavrilis et al., 2015; Kortemeyer, 2016; Margaritopoulos et al., 2012; Moreira et al., 2009; Neumaier et al., 2016; Stvilia and Gasser, 2008), while two discussed semi-automatic evaluation (Kuźma and Mościcka, 2018; Ochoa and Duval, 2009) and one manual evaluation (Ivanyukovich et al., 2007). During the projects, different tools for evaluation were created; for example, the “Open Data Portal Watch” (Neumaier et al., 2016). In other cases, the researchers used known tools, which had been modified to suit the given purpose. They also created their own applications, such as 5SQual (Moreira et al., 2009), LON-CAPA (Kortemeyer, 2016) and OAIster (Stvilia and Gasser, 2008), or prototypes/models (Ivanyukovich et al., 2007; Kuźma and Mościcka, 2018; Ochoa and Duval, 2009).
Unfortunately, only one paper (Ochoa and Duval, 2009) described the accuracy of the process: the quality metrics were correlated with a human-made quality assessment.
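As an illustration of how such a check can be performed, the sketch below correlates an automatic metric with human ratings using the Pearson coefficient; the scores are invented, and Ochoa and Duval (2009) may have used a different correlation measure.

```python
# Sketch: checking an automatic quality metric against human judgements by
# computing a Pearson correlation. All scores below are invented examples.
from math import sqrt

automatic = [0.45, 0.80, 0.60, 0.95, 0.30]   # metric values per record
human     = [2, 4, 3, 5, 1]                  # expert ratings per record

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(round(pearson(automatic, human), 3))  # close to 1 means the metric tracks human judgement
```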
RQ2: How are archival cartographic documents metadata evaluated in digital libraries?
Unfortunately, only two articles were found that addressed the subject matter of describing cartographic documents and took into account spatial data. Table V presents a summary of the articles used. The main subjects of particular studies were: the maps most heavily used by the students (Allen, 2008) and maps in academic digital libraries (Kuźma and Mościcka, 2018). The digital objects are distributed by different institutions: Western Illinois University (WIU) (Allen, 2008), the Jagiellonian Digital Library (JDL), the Digital Library of the University of Wroclaw (DLUW), the Silesian University of Technology Digital Library (SUTDL), the Pedagogical Digital Library (PDL), the Digital Library of the Warsaw University (e-BUW), the Digital Library University of Lodz (DLUL) and the Maria Curie-Skłodowska University Digital Library (MSCUDL) (Kuźma and Mościcka, 2018).
The size of the analysed material ranged from 1,000 (Allen, 2008) to 3,000 items (Kuźma and Mościcka, 2018). Various standards of metadata were used, usually in combination: core metadata according to ISO 19115 together with Dublin Core metadata (Kuźma and Mościcka, 2018) and MARC (Allen, 2008), although Allen (2008) pointed out that nowadays MARC is not robust enough for cataloguing digital maps and data, as the number of fields that allow geographic information to be entered is limited, which in turn limits its searchability.
The authors used such administrative data as abstract, keywords, type of content, date, date range, geographic location and subject (Kuźma and Mościcka, 2018). For those projects, adding geospatial data was very important, as this information could support searching for geospatial data on the Internet and make these data more findable.
In general, archival cartographic documents are described by such metadata as geographic location, scale of map, orientation, reference system, mapping methods, map format and source materials used to develop the map (Kuźma and Mościcka, 2018) or only geospatial data (Allen, 2008).
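The following is a hypothetical sketch of what such a record could look like when descriptive elements are combined with the map-specific fields listed above; the field names and values are illustrative and do not follow any single formal schema.

```python
# Hypothetical metadata record for an archival map, combining descriptive
# elements with the map-specific fields mentioned above (geographic location,
# scale, orientation, reference system, etc.). Field names are illustrative.
map_record = {
    "title": "Topographic map of the environs of Krakow",
    "type": "cartographic material / map",
    "date": "1904",
    "geographic_location": {"west": 19.70, "east": 20.25,
                            "south": 49.95, "north": 50.20},  # bounding box in degrees
    "scale": "1:75 000",
    "orientation": "north at top",
    "reference_system": "none stated (historical sheet grid)",
    "mapping_methods": "topographic survey",
    "map_format": "single sheet, colour lithograph",
    "source_materials": "military survey sheets",
    "rights": "public domain",
}
```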
The SLR shows that there are no studies that would assess the metadata of cartographic studies, although numerous publications (Ahonen-Rainio, 2005; March and Scarletto, 2017) point to the need for this type of work.
4. Discussion
The results of the research demonstrate that the number of papers connected with metadata quality, particularly for maps, is quite limited. The research questions posed at the beginning of the article (What are the most important criteria when assessing the quality of metadata? and How are archival cartographic documents metadata evaluated in digital libraries?) themselves underline the need to evaluate metadata.
Based on the conducted research, it was noted that the largest number of, and the most valuable, publications in this field can be found in the Scopus (109 excluded and 4 included) and Web of Science (109 excluded and 4 included) databases. The publications found during the search came from the period 2003-2019; most of them were published in 2007 and 2015 (21). Most research results were published in the form of journal papers, and eight out of 304 were presented in the International Journal on Digital Libraries. Based on this methodology, 11 articles were found that were used to answer the two research questions, RQ1 and RQ2.
The conducted research provided an answer to the first research question; that is, it defined the most important criteria used by researchers when assessing the quality of metadata. The results also show that there is still a lot to do in this area. The first problem encountered in the classification of metadata evaluation criteria is the lack of coherence in defining them. Some concepts had the same names but were understood differently by individual authors or, the other way round, the same concepts had different names. For example, some of the criteria were named differently but their definitions were very similar, such as significance (Moreira et al., 2009) and size (Ivanyukovich et al., 2007), as well as the definitions of accessibility, retrievability and function of the extracted information (Ivanyukovich et al., 2007; Kuźma and Mościcka, 2018; Moreira et al., 2009; Neumaier et al., 2016; Ochoa and Duval, 2009). It would be very helpful if common criteria names and definitions were established, so that these concepts would be understood by everyone in the same way.
According to RQ1, the authors of nine papers presented 16 different evaluation criteria related to digital objects, metadata specification, service and users. The most popular criterion (related to metadata) was completeness, because it is the easiest to count and automate. Librarians very often use metadata standards or their own metadata specifications, and it is easy to compare the expected value to the given value. Unfortunately, librarians often limit the values to the mandatory set of elements, while the information that is most important for users is collected in optional or recommended elements (Gavrilis et al., 2015). The second most popular criterion (related to digital objects) is accessibility. It is very important for users and software developers because it provides information on what a user can use the data (object) for and on how developers can build software functionality to share the data. This shows that these two criteria are connected with providers and users: completeness answers the needs of providers, while accessibility shows whether objects in a digital library can be used by users.
Evaluation of metadata should be an automated process. The potential benefits of automated quality evaluation are as follows: determining the percentage of digital objects completely, partially, or not accessible; the average significance of the digital objects; the similarity of pairs of digital objects; the average age of the digital objects; the level of detail contained in the metadata specifications of a DL; the level of compliance of metadata specifications with respect to a given standard; and the efficiency and robustness of the DL services (Moreira et al., 2009).
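The following is a minimal sketch of how a few of these repository-level indicators could be aggregated automatically over a collection; the record structure and the accessibility categories are assumptions made for illustration.

```python
# Sketch: aggregating repository-level indicators over a collection
# (share of fully/partially/not accessible objects, average object age).
# Record fields and the accessibility categories are illustrative assumptions.
from datetime import date

records = [
    {"accessibility": "full", "year": 1998},
    {"accessibility": "partial", "year": 2005},
    {"accessibility": "none", "year": 2012},
    {"accessibility": "full", "year": 2016},
]

def accessibility_shares(records):
    """Percentage of objects that are fully, partially or not accessible."""
    counts = {"full": 0, "partial": 0, "none": 0}
    for r in records:
        counts[r["accessibility"]] += 1
    return {k: v / len(records) for k, v in counts.items()}

def average_age(records, today=date.today().year):
    """Average age of the digital objects in years."""
    return sum(today - r["year"] for r in records) / len(records)

print(accessibility_shares(records))
print(round(average_age(records), 1))
```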
Unfortunately, the present research did not bring a direct answer to the second research question. The research studies were based on digital objects, but the majority did not examine maps. First of all, it demonstrates that maps are hardly ever treated as resources which require a different scope of metadata and that the evaluation criteria dedicated to maps are practically non-existent in the literature.
Only two papers focussed on cartographic materials: Allen (2008) and Kuźma and Mościcka (2018). When assessing map metadata, these authors created their own evaluation criteria based on different metadata standards. On the one hand, typical standards for objects in digital collections were used, such as Dublin Core (Kuźma and Mościcka, 2018) or MARC (Allen, 2008); on the other hand, standards that allow the description of spatial data were applied, such as ISO 19115 (Kuźma and Mościcka, 2018) and the Federal Geographic Data Committee standard (Allen, 2008). Standards for spatial data were used because they make it possible to describe such metadata as the geographic location, scale of map, orientation, reference system, mapping methods, map format and source materials used to develop the map (Kuźma and Mościcka, 2018) or geospatial data (Allen, 2008). This means that there is still a lot to do in the future. Evaluation allows users to learn whether the objects they find are accurate and suitable for their needs.
This research shows that the evaluation of map metadata is a pivotal issue, because there are plenty of large collections shared by different institutions that cannot be exchanged and integrated. The first stage of the required work should be the development of a metadata scope description for old maps, so as to provide proper characteristics of the cartographic material. This could later be the basis for proposing map metadata evaluation criteria. Such criteria are necessary for map cataloguers and GIS librarians. Map cataloguers work with printed and digital cartographic resources, and they are responsible for the direction in which these two different types of map collections develop. They create and maintain bibliographic access to all cartographic resources, both hardcopy and digital. Their activities include descriptive cataloguing, classification of items and ensuring appropriate access. A different point of view is represented by GIS librarianship. A GIS allows a user to ask and answer questions in an entirely new way. GIS librarians have a collaborative role in ensuring that the principles and expertise of library science are present in the fast-evolving geoinformatics and spatial literacy movements. Both map cataloguers and GIS librarians, even if they realize the importance of the data needed to describe a map, need clear guidelines on where and how data are (or should be) saved in metadata, so that different people preparing metadata do not record the same data differently. This requires the development of consistent rules for recording data about maps and their consistent application (Kuźma and Mościcka, 2018), as well as competencies that allow these data to be understood.
Library staff must be equipped with competencies that will enable them to properly understand cartographic resources and to keep map collections useful. The required skills of a map librarian are varied and overlap in many areas with those of the GIS librarian and the map cataloguer. This is inevitable because the map librarian should be involved in all aspects of the creation and sharing of the collection, which includes digital resources and services, for example, GIS (Weimer et al., 2008). Understanding contemporary needs is the basis for understanding why cartographic resources should be described in a special way. Therefore, map librarians need to be involved in the changing context of new technology and new media, which gives them new solutions for accessing and disseminating resources (Parry and Perkins, 2001). For example, staff need not only to be proficient in using GIS software but also to have a conceptual understanding and knowledge of GIS applications. Staff also need instruction on GIS theory, GIS databases and GIS applications in a discipline. Such instruction is necessary because the ability to create and analyse data in a GIS requires understanding the inputs and processes needed to yield a result (Longstreth, 1995).
5. Conclusions
This literature review, focussed on journal papers, book chapters and conference proceedings, demonstrated that works in the field of the evaluation of map metadata are scarce. By using a transparent workflow, the aim was to create a review that is replicable and reproducible. To ensure an exhaustive data collection, results from many scientific databases were analysed. By applying different quantitative and qualitative techniques, the authors attempted to keep this review as free of bias as possible, although bias can never be eliminated entirely.
Further research needs to focus even more on the quality of metadata. It seems promising that several authors of papers tried to calculate different criteria and presented certain formulas (Kuźma and Mościcka, 2018; Ochoa and Duval, 2009). Additionally, it is clear that the evaluation of metadata should be automated. The review provided answers to the research questions. The overview of the existing methods might motivate researchers from across the globe to cooperate. Existing research gaps were also identified and it was proven that there is a strong need for new research contributions in the evaluation of map metadata. This might be an inspiration for many researchers.
The research was carried out as part of the statutory project “Acquisition and processing of geodata for the needs of geospatial recognition systems”, task “Processing spatial data for the purpose of terrain assessment”, grant number PBS 23-885/2019, realized in 2019 at the Military University of Technology in Warsaw, Poland.
Figure 1. Number of studies in each step
Figure 2. Relation between publication year and the number of papers
Figure 3. Item type
Figure 4. Number of papers for each evaluation criterion
Table I. Electronic databases used
| Source | URL | Date of search | Search results |
|---|---|---|---|
| Scopus | (Scopus, 2019) | 2019-04-30 | 109 |
| IEEE Library | (IEEE, 2019) | 2019-04-30 | 13 |
| Springer | (SpringerLink, 2019) | 2019-05-01 | 43 |
| EBSCO | (EBSCO, 2019) | 2019-05-01 | 27 |
| Web Of Science | (Web of Science, 2019) | 2019-04-30 | 112 |
| Total | | | 304 |
Table II. RQ1: Which of the quality criteria are mainly focussed on in metadata evaluation?
| No. | Study name | What kind of metadata was used? (standards) | What evaluation criteria were used? | Were these methods automatic, semi-automatic, or manual? | What tools, software, or programming language were used to perform the evaluation? | Is there any information about the accuracy of the process provided? How was it measured? |
|---|---|---|---|---|---|---|
| 1 | Ivanyukovich et al. (2007) | 500k scientific documents, the papers (IEEE, ACM, Springer, etc.) on CiteSeer.IST: title, authors, publication source and other similar metadata that uniquely characterise an article; references – title | Function of the extracted information from the repository and the size of the repository | Manual | The ScienceTreks prototype based on a FSM-based lexical grammar parser | No |
| 2 | Neumaier et al. (2016) | Metadata for open data in DCAT, and CKAN standards; set of over 260 open data portals with 1.1 M data sets. A data set aggregates a group of data files (referred to as resources or distributions) that are available for access or download in one or more formats (e.g., CSV, PDF, Microsoft Excel, etc.). Additionally, a data set contains metadata | Existence | Automatic | “Open Data Portal Watch” based on the W3C metadata schema DCAT | No |
| 3 | Moreira et al. (2009) | Dublin Core. These dimensions were evaluated based on the behaviour of the searching and browsing services. Initially, the data necessary to calculate these dimensions would be extracted from the XMLLog file | Accessibility | Automatic | 5SQual, a tool which provides ways to perform automatic and configurable evaluations of some of the most important DL components, among them, digital objects, metadata, and services | No |
| 4 | Ochoa and Duval (2009) | A title of a picture, or the description of the content of a document, author, and date, 5,000 queries to the repository | Completeness | Semi-automatic | Text Information Content | Quality metrics correlation with human-made quality assessment |
| 5 | Gavrilis et al. (2015) | 67 packages containing approximately 2 million metadata items (2,113,266) from the CARARE (Connecting ARchaeology and ARchitecture for Europeana) project were used for the two validation experiments. CARARE aggregated content from over 22 different providers related to the archaeological and architectural heritage and delivered to Europeana over 2 million records ensuring a high degree of homogeneity and quality. Three element groups: the Identity element group such as ids, titles, etc.; the Spatial element group such as place labels, addresses, point or area coordinates; the Temporal element group such as information about dates as well as period names | Completeness | Automatic | A flexible metadata quality evaluation model (MQEM) | |
| 6 | Kortemeyer (2016) | The former, smaller data set has 383,281 transactions on 431 items by 472 students, while the latter data set has 1,361,107 transactions on 1,560 items by 1,330 students. The data sets represent about 1% of the data produced by LON-CAPA | Ability | Automatic | LON-CAPA | No |
| 7 | Margaritopoulos et al. (2012) | 15 fields of the simple Dublin Core: identifier, title, creator, format, editor, date, language, rights, type, contributor, description, coverage, relation, source, subject | Completeness | Automatic | XML | No |
| 8 | Stvilia and Gasser (2008) | 15 fields of the simple Dublin Core: identifier, title, creator, format, editor, date, language, rights, type, contributor, description, coverage, relation, source, subject | Completeness | Automatic | OAIster | No |
| 9 | Kuźma and Mościcka (2018) | Fields of the metadata from Dublin Core, ISO 19115 | Accessibility – accuracy, consistency, and completeness | Semi-automatic | Algorithm, database | No |
Table III. Criteria and their definitions
| No. | Criteria | Definition |
|---|---|---|
| 1 | Ability | The prevalent latent trait is frequently called “ability” and, loosely speaking, it measures how “good” the student is. On summative assessments, the goal is frequently to distinguish “good” from “bad” students, so good test items are those that can be solved by students with high ability and cannot be solved by students with low ability. This is usually expressed in terms of the probability pi (θj) of a student j with ability θj to successfully solve item i (Kortemeyer, 2016) |
| 2 | Accessibility/Retrievability/function of the extracted information | Accessibility implies the level to which a metadata instance can be found and later understood (Ochoa and Duval, 2009). Subsequent statistical correction of the extracted metadata allowed us to re-use already extracted information for its correction and further precision improvement (Ivanyukovich et al., 2007). The existence dimensions (analogous to completeness) can be categorised as contextual (Zaveri et al., 2016) or context-based dimensions (Bizer and Cyganiak, 2009), that is, as dimensions that “highly depend on the context of the task at hand” (Zaveri et al., 2016) (Neumaier et al., 2016). It reflects the rights of a certain community of users to access (parts of) the digital objects of a digital library (Moreira et al., 2009). Accessibility - three main metadata quality criteria have been defined: accuracy, consistency and completeness (Park, 2009; Park and Tosaka, 2010). Metadata accuracy is defined as the degree of conformity of the values saved in metadata records with the characteristics of the described object (Stvilia and Gasser, 2008). Metadata consistency consists of semantic (the same values represent similar concepts) and structural (the degree of conformity of the same structure in representing information in certain metadata elements) consistency (Park, 2009; Bruce and Hillmann, 2004). Metadata completeness is the degree to which objects are described using all metadata elements (Moen et al., 1998a; Kuźma and Mościcka, 2018) |
| 3 | Accuracy | The accuracy dimension reflects the degree of how accurately the available metadata values describe the actual data (Neumaier et al., 2016). The information provided about the resource in the metadata instance should be as correct as possible (Ochoa and Duval, 2009). It indicates how accurate the information provided to describe a certain element is. For example, a thematic subject term encoded as text is less accurate than a subject term accompanied by a SKOS URI (Gavrilis et al., 2015) |
| 4 | Appropriateness/Existence | It indicates whether the values provided are appropriate for the targeted use (Gavrilis et al., 2015). The existence dimensions (analogous to completeness) can be categorised as contextual (Zaveri et al., 2016) or context-based dimensions (Bizer and Cyganiak, 2009), that is, as dimensions that “highly depend on the context of the task at hand” (Zaveri et al., 2016; Neumaier et al., 2016) |
| 5 | Auditability | It indicates whether the record can be tracked back to its original form. This is most useful, especially when a curation service is present (Gavrilis et al., 2015) |
| 6 | Completeness | Completeness reflects how many of the attributes specified in a metadata standard have their values defined in a metadata specification (Moreira et al., 2009). Completeness is the degree to which the metadata instance contains all the information required to have a comprehensive representation of the described resource (Ochoa and Duval, 2009) |
| 7 | Confidence | Confidence indicates the probability of success (no failures) of the execution of a particular service (Moreira et al., 2009) |
| 8 | Conformance to expectations | Conformance to expectations is the degree to which metadata fulfills the requirements of a given community of users for a given task to be considered as a major dimension of the quality of a metadata instance. If the information stored in the metadata helps a community of practice to find, identify, select and obtain resources without a major shift in their workflow it could be considered to conform to the expectations of the community. According to the definition of quality (“fitness for purpose”) used in this paper, this is one of the most important qualities (Ochoa and Duval, 2009) |
| 9 | Consistency/Conformance | It indicates whether the metadata values are consistent with the acceptable types of the metadata elements described by the metadata schema (Gavrilis et al., 2015). Conformance indicates whether the attributes and their respective values in a metadata specification follow the rules defined in a given metadata standard (Moreira et al., 2009). The conformance dimension is inspired by the “representational-consistency” dimension, which is defined as “the degree to which the format and structure of the information conform to previously returned information” (Zaveri et al., 2016). However, our conformance dimension differs from consistency in terms of not comparing values to previously returned information but by checking the conformance of values w.r.t. a given schema or standard (Neumaier et al., 2016). Logical consistency and coherence: Metadata should be consistent with standard definitions and concepts used in the domain. The information contained in the metadata should also have internal coherence, which means that all the fields describe the same resource (Ochoa and Duval, 2009) |
| 10 | Difficulty | The simplest measure for difficulty is again the ratio of successful transactions to total transactions, this time on a particular item rather than by a particular student (Kortemeyer, 2016) |
| 11 | Discrimination | The more commonly used model contains an additional parameter for “discrimination,” denoting how well the item separates “good” and “bad” students (Kortemeyer, 2016) |
| 12 | Efficiency | Efficiency indicates the speed of execution of services (Moreira et al., 2009) |
| 13 | Provenance | The source of the metadata can be another factor that determines its quality. Knowledge about who created the instance, the level of expertise of the indexer, the methodologies that were followed at indexing time and the transformations the metadata has passed through, could provide insight into the quality of the instance (Ochoa and Duval, 2009) |
| 14 | Significance/size | Significance indicates the importance of digital objects regarding a specific factor, such as number of downloads, number of citations, etc. (Moreira et al., 2009) |
| 15 | Similarity | Similarity estimates how related or close two digital objects are (Moreira et al., 2009) |
| 16 | Timeliness | Timeliness indicates how recent the digital objects of the DL are (Moreira et al., 2009). Metadata should change whenever the described object changes (currency). Also, a complete metadata instance should be available by the time the object is inserted in the repository (lag) (Ochoa and Duval, 2009) |
Table IV. Components of evaluation and related papers
| No. | Criteria | Component of evaluation | Ivanyukovich et al. (2007) | Neumaier et al. (2016) | Moreira et al. (2009) | Ochoa and Duval (2009) | Gavrilis et al. (2015) | Kortemeyer (2016) | Margaritopoulos et al. (2012) | Stvilia and Gasser (2008) | Kuźma and Mościcka (2018) | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Ability | User | | | | | | x | | | | 1 |
| 2 | Accessibility/retrievability/function of the extracted information | Digital object | x | x | x | x | | | | | x | 5 |
| 3 | Accuracy | Metadata | | x | | x | x | | | | x | 4 |
| 4 | Appropriateness/existence | Metadata | | x | | | x | | | | | 2 |
| 5 | Auditability | Metadata | | | | | x | | | | | 1 |
| 6 | Completeness | Metadata | | | x | x | x | | x | x | x | 6 |
| 7 | Confidence | Service | | | x | | | | | | | 1 |
| 8 | Conformance to expectations | Digital object | | | | x | | | | | | 1 |
| 9 | Consistency/conformance | Metadata | | x | x | | x | | | | x | 4 |
| 10 | Difficulty | User | | | | | | x | | | | 1 |
| 11 | Discrimination | User | | | | | | x | | | | 1 |
| 12 | Efficiency | Service | | | x | x | | | | | | 2 |
| 13 | Provenance | Metadata | | | | x | | | | | | 1 |
| 14 | Significance/size | Digital object | x | | x | | | | | | | 2 |
| 15 | Similarity | Digital object | | | x | | | | | | | 1 |
| 16 | Timeliness | Digital object | | | x | x | | | | | | 2 |
| Sum | | | 2 | 4 | 8 | 7 | 5 | 3 | 1 | 1 | 4 | |
Table V. RQ2: How are archival cartographic documents metadata evaluated in digital libraries?
| Study name | What types of archival cartographic documents were used? | How were archival cartographic documents described? (standard) | How detailed was the metadata used? | Which institutions distributed those data? | What was the data extent on a map (scale, range)? | Was cartographic metadata evaluated? |
|---|---|---|---|---|---|---|
| Allen, 2008 | The Illinois map collections as a starting point, since those were the maps most heavily used by the students. More than 1,000 maps | FGDC standard (Federal Geographic Data Committee); without detail, we do not know what exactly was used | The MARC metadata standard in its current form is not robust enough for cataloguing digital maps and data; the number of fields that allow geographic information to be entered is limited, which in turn limits searching capability | Western Illinois University (WIU) | Geospatial data only | Yes |
| Kuźma and Mościcka, 2018 | 2,700 documents: atlases, plans, maps and old prints from seven digital libraries | Metadata based on Core metadata according to ISO 19115 and Dublin Core metadata element set | Type of content, Date, Date range, Geographic location, Subject, Scale of map, Orientation, Reference system, Mapping methods, Map format, Source materials used to develop the map, Access Rights, Distribution format, Rights, Language | Jagiellonian Digital Library (JDL), the Digital Library of the University of Wroclaw (DLUW), the Silesian University of Technology Digital Library (SUTDL), the Pedagogical Digital Library (PDL), the Digital Library of the Warsaw University (e-BUW), the Digital Library of the University of Lodz (DLUL), the Maria Curie-Skłodowska University Digital Library (MSCUDL) | Geographic location, Subject, Scale of map, Orientation, Reference system, Mapping methods, Map format, and Source materials used to develop the map | Yes |
