1. Introduction
Understanding the tourism competitiveness of countries has become a key concern for destinations. Tourism has been shown to strongly affect the socio-cultural environment and economic growth of a country [1]. Therefore, countries invest large amounts of money in collecting data related to tourism industries, attractions, infrastructure, and so on. In addition, several organizations, such as the World Economic Forum (WEF), collect and analyze data from many countries in order to determine how competitive countries are in the tourism sector. WEF is a well-known organization devoted to the dissemination of worldwide data, including data that show the state of tourism competitiveness of countries. Broadly speaking, WEF is an organization for public–private cooperation that engages the foremost political, business, and other leaders of society to shape global, regional, and industry agendas [2]. WEF has published the Travel & Tourism Competitiveness Report since 2007.
The analysis of the impact of tourism on economies typically relies on official tourism statistics provided by governments and institutions. In parallel with the dissemination of official statistical data, Information and Communication Technologies (ICT) in general, and mobile and social network technologies in particular, have opened a new door, and data coming from these new sources are used to analyze tourism, as shown in several recent studies [3,4]. These online tools, social networks, and collaborative platforms have emerged as a relevant data source to understand tourism behavior and traveling trends [5,6,7,8], to create accurate tourist profiles [9,10], and to elicit a picture of the tourism industry [11].
A remarkable example of these new sources is the free mapping service offered by the collaborative mapping platform OpenStreetMap (OSM) [12], with around 37,000 active contributors during a typical month. OSM is claimed to be the largest freely and openly accessible database of geographic data in the world [13]. It emerges as an alternative to the restricted use of other mapping services, such as Google Maps. One argument in favor of Google Maps could be the wide range of advanced features that it offers (street-view images, multimodal navigation, social recommendations, etc.). However, some services based on the OSM database also provide them. For example, Mapillary (
This paper presents an exploratory analysis of the OSM data set and compares the obtained insight with the publicly available data of the tourism competitiveness provided by WEF for a group of about 130 countries worldwide. Specifically, we are interested in studying the representativeness and reliability of tourism-related data found in an open and collaborative platform, such as OSM; that is, our aim is to analyze how well the OSM data reflect the actual tourism competitiveness data from the WEF across eight indicators. We will investigate the relationship between OSM and the WEF tourism competitiveness report through regression models to study the relationship between the data collected from OSM for an indicator and the official values of such indicators in WEF.
Sometimes, official information is difficult to find, cannot be accessed at the desired level of granularity, or is not easily updatable. As explained above, social networks and collaborative platforms have emerged as a relevant alternative data source that can be used in these cases. Therefore, in this paper, we will examine the tourism-related information of OSM and determine in which cases OSM is a reliable alternative data source to WEF and can be used for forecasting. In a nutshell, given the common acknowledgement that OSM is a powerful and user-friendly geo-data platform extensively used for tourism purposes, our aim is to answer the following question: does OSM provide an accurate picture of the studied components of tourism competitiveness? That is, we are interested in analyzing whether the elements mapped in OSM can be used to infer some WEF data. If the answer is yes, OSM data can be used to, for example, analyze the same components of tourism competitiveness for a more specific area (not necessarily at the country level, as WEF provides). Otherwise, we will analyze which aspects make this task difficult.
Given the nature of the OSM data, which is mainly related to attractions, accommodation, and infrastructure, the components of tourism competitiveness that will be analysed in this paper are those concerning the endowment of these elements in each country. Therefore, other tourism competitiveness aspects, such as the dimension of touristic flows, pricing policies, destination marketing, the reputation of the place, and so forth, are out of the scope of the analysis presented in this paper. Specifically, we will focus on attractions and accommodation, which are related to eight WEF indicators.
We will carry out a statistical and regression analysis of eight different tourism indicators over 133 countries from two different points of view: (1) considering all the countries as a whole, and (2) splitting the countries into three groups according to their ICT level given by the ICT readiness pillar of WEF. The reason for this double analysis is that, according to [17], the status of a country’s ICT services will determine, for instance, the success of a Volunteered Geographic Information (VGI) initiative or the expected growth in the years to come. Moreover, previous investigations [18] found that although OSM has had great global success, there is still a clear difference in the volume of contributed data between affluent and poorer communities. Therefore, we will also examine whether the country ICT level is an influential factor in the relation between OSM and WEF. We hypothesize that a higher ICT level implies a better representativeness of OSM with respect to official data sources, given that technology in these countries is more easily accessible and hence users will participate more intensely in collaborative platforms (OSM, in this case).
An additional aspect that must be mentioned is that the two data sources we handle in this work, WEF and OSM, are of a very different nature, and therefore it is not always possible to measure exactly the same concept in both sources. For example, a particular variable may be measured in different units in OSM and WEF, or it may not be possible to find an exact counterpart in OSM for a given WEF indicator. In both cases, some approximations have been computed, and we will discuss the limitations we have found in this regard.
Our Research Questions can be summarized as follows:
Question 1: Can OSM data be used as a reliable alternative source to extract the WEF tourism indicators?
Question 2: Is it possible to model the trend reflected in WEF tourism indicators with OSM data?
Question 3: Does the ICT level of a country influence the models built to answer Question 2?
The paper is structured in the following sections. Section 2 gives an overview of previous work that uses OSM data in several contexts. Section 3 describes the WEF and OSM data sources used in our analysis. Section 4 describes the analysis we performed with WEF and OSM data, Section 5 presents the outcomes of this analysis, and Section 6 discusses these results. Finally, in the last section, we outline the conclusions and future research directions.
2. Related Work
Volunteered Geographic Information (VGI) [19] systems have emerged as an answer to the need for open and easy-to-use geographic data and as an alternative to Commercial Geographic Information systems, which impose restrictions on the use of the data. Technological advancement has fostered the emerging role of the citizen as a source of data. Citizen sensing has dramatically affected mapping and map use, impacting on routine daily life activities, such as gaming and tourism, as well as on science and technology more generally [20]. Due to the proliferation of location-aware devices and the opportunities of Web 2.0, it is now possible for citizens to easily acquire geographical information, which may dramatically reduce the cost of map acquisition [21] and usually helps keep maps up to date [22]. Additionally, it can become a tool for the empowerment of marginalized individuals and social groups [23].
However, citizen-derived data are also often of varied quality and trust levels. For example, the data generated may be poorly described and associated with little metadata. Additionally, there are other considerations in the use of VGI, including ownership rights, as well as privacy, legal, and ethical issues [20].
OpenStreetMap (OSM) is one of the most well-known VGI projects. The crowdsourced approach of OSM derives its success from citizens mapping and collecting data and information about their locality [13]. Features being mapped range from the location of garbage cans, pedestrian crossings, land cover types, shops, and education facilities to government buildings, roads, and river networks. All data in the OSM database can be downloaded for free in a variety of spatial data formats. Additionally, a number of open source tools are available to process this data and produce other formats [21]. The OSM project counts on experienced volunteers that spend time checking, updating, and improving OSM data. The process of validation aims to ensure the completeness and quality of data. Nevertheless, the fact that OSM is neither commercial nor governmental and that validation is carried out by volunteers sometimes puts the validation of data in question [20].
In order to alleviate the doubts concerning the quality and precision of OSM data, a large number of works have investigated the robustness and validity of OSM in several fields, such as environmental epidemiological and exposure assessment studies [24]. The study in [24] compared OSM and Governmental Major Road Data in three different regions: Massachusetts (USA), Bern (Switzerland), and Beer-Sheva (South Israel). This investigation found that OSM data was fairly complete and accurate in all regions, and that the results in all regions were robust, with Massachusetts showing the best fit ($R^2$ of 0.93).
In the same direction, the work [25] evaluates the quality of OSM data with respect to its suitability for a certain application, specifically for pedestrian navigation. The analysis compares routes calculated with OSM data and routes done with the German topographic data set, using accessibility and length of routes as quality criteria. The study concludes that OSM is fairly accurate on average within about six meters of the position recorded by the Ordnance Survey, and with approximately 80% overlap of motorway objects between the two datasets.
Another relevant work is about comparing the accuracy of the OSM data on land use in four German metropolitan areas versus the Global Monitoring for Environment and Security Urban Atlas as a reference [26]. The study reveals the suitability of using OSM as an alternative complementary source for extracting land use information as it also highlights the potential of collaboratively collected land use features by mappers.
There have also been attempts to evaluate the quality of OSM in the cultural sector in terms of completeness and of positional and semantic accuracy. In [27], the authors show that the number of museums of Italy mapped in OSM accounts for 86% of the official total. In addition, OSM has records of positional and semantic information of 39% of the museums overall. The study also states that for 77.7% of the museums, the location reported by OSM is less than 150 m away from the actual location of the museum. Likewise, 90% of the museums have a similar denomination in OSM and in the official sources.
OSM has also been used to predict socio-economic indicators (sustainability, human development, vulnerability, risk, resilience, and climate change adaptation) for municipalities. In [28], authors present an interesting study that highlights the prospects of OSM to analyze interdisciplinary topics and factors like social cohesion, and provide meaningful insight into the spatial differences in social, environmental, or economic inequalities. One of the conclusions of this study is that further research is needed to determine the impact of regional and international differences in user contributions on the outputs.
In the specific field of tourism, we found some works that use OSM in analysis tasks. For instance, in [29], a framework for the assessment of the quality of OpenStreetMap is presented. The approach analyses several quality measures, such as completeness, compliance, consistency, granularity, richness, and trust of OSM tags in Spain. The authors conclude that the current status of the Spanish OSM data can be considered satisfactory in some indicators (compliance and consistency), while in some others (granularity and richness) it should be improved. For tourism POIs, some elements are still missing. For instance, shopping and amenity destinations should include opening hours, phone numbers, and so forth, and specific categories like restaurants or hotels should include more detailed information (prices, cuisine, stars, etc.).
In the same way, ref. [30] evaluated the consistency of the information contained in the Compendium of Tourism Statistics of the World Tourism Organization with respect to the information published in OSM, especially information on places of accommodation, food and beverages, and travel agencies. Among the results shown in this paper, the high correlation that exists between the data from both sources with respect to information on accommodation (0.81), food and beverage sites (0.87), and travel agencies (0.82) is remarkable.
In [31], the authors described how they used OSM data along with data from official sources and other platforms with the objective of identifying spatial patterns in park popularity in the state of Victoria, Australia. Statistically significant correlations were found between official data and OSM data, indicating that the density of OSM vertices in a given area can be used to infer the number of visitors.
Finally, in [32], a methodology for computing composite indicators derived from OSM data as an alternative to statistical offices was presented. To demonstrate its use, they applied this methodology to a number of indicators used for real estate valuation of properties in Italy. Among these indicators, they considered a number of sites of historical relevance and a number of nearby hotels and hotel-related features.
3. Data
This section first describes the tourism indicators from the WEF data sources that will be used in our analysis. Subsequently, we overview some basic aspects of OSM and define the concept of direct and indirect variables.
3.1. WEF
Tourism competitiveness is regarded as the set of regulations, infrastructure, and resources that enable the sustainable development of the Travel & Tourism (T&T) sector. For our analysis, data on tourism competitiveness were retrieved from sources of the WEF organization. In particular, we focus on the Travel & Tourism Competitiveness Report, of which the first edition was published in 2007. This report is based on secondary data from various international organisms and provides engaged leaders in T&T with an in-depth analysis of the tourism competitiveness of a large number of economies across the world. The 2017 edition covers 141 economies and features data about 14 key factors and policies, also called pillars, that enable the sustainable development of the T&T sector and contribute to the development and tourism competitiveness of a country [33].
A pillar measures the strengths and weaknesses of a country on a scale of 1 (bad) to 7 (excellent), and it is based on a set of 90 indicators that are collected either from surveys or official national statistics. These indicators are mainly extracted from two sources:
Survey indicators: These are data derived from responses to the WEF’s Executive Opinion Survey that capture the opinions of business leaders around the world on a broad range of topics. These indicators are aimed to measure critical concepts to complement the traditional sources of statistics and provide a more accurate assessment of drivers of economic development. Survey indicators range in value from 1 to 7 (1: the lowest negative perception; 7: the highest positive perception).
Hard data indicators: These are data which objectively represent the state of some resource or abstract concept, and they are often measured by official international or national organizations (e.g., number of stadiums, airports, ATMs, etc.). These indicators are normalized to a scale of 1 to 7 in order to align them with the Executive Opinion Survey’s results.
WEF uses the survey and hard data indicators to shape the 14 pillars, which are then compiled into a global Travel and Tourism Competitiveness index that represents how viable a country is within the T&T sector.
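The rescaling of hard data onto the 1 to 7 survey scale can be illustrated with a short sketch. The text above does not state the formula, so the min-max transformation used here is an assumption, and the function name and its parameters are hypothetical.

```python
def normalize_to_1_7(value, sample_min, sample_max):
    """Rescale a hard-data value onto the 1-7 scale (assumed min-max transform).

    sample_min and sample_max are the smallest and largest values of the
    indicator across all countries in the sample.
    """
    if sample_max == sample_min:
        raise ValueError("sample_min and sample_max must differ")
    return 6.0 * (value - sample_min) / (sample_max - sample_min) + 1.0
```

Under this assumption, the country with the lowest raw value scores 1 and the country with the highest scores 7, which makes hard data comparable with the survey responses.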
For our analysis, we selected indicators that measure tangible aspects that are rather directly perceived by tourists and can be decisive in the selection of a particular destination. The nine indicators selected to build our study variables are shown in Table 1. The first column is the pillar that the indicator belongs to. The second column shows the indicator name alongside a brief description. The third column indicates the name of the variable in our study. The fourth column shows whether the indicator is a hard data indicator (H) or a survey indicator (S). Finally, the fifth column is explained in Section 3.2, as it is directly involved with the retrieval of the OSM data.
As can be observed, each variable is drawn from only one WEF indicator except for the variable
All in all, we have a total of eight variables covering the most relevant aspects of tourism competitiveness that influence the tourist perception of the country. The selected indicators embody aspects that have a major impact on a tourist trip, such as the presence of car rental companies, the availability of accommodation, or the number of cultural/natural sites. Some of the variables in Table 1 refer to elements related to the tourism infrastructure, while others are intended to survey the tourism attractiveness of the country. The values of the indicators for every country are extracted from the Travel & Tourism Competitiveness Report, which is directly available and downloadable in electronic format [33].
3.2. OSM
In this section, we describe the elements of OSM that will be used in our analysis. Objects drawn on an OSM map are called map features, and these map features are not tourism-specific. However, the aggregation of web maps and user-generated content is fed with a broad variety of metadata (OSM tags) that provide valuable tourism information, like the location of accommodation, food establishments, or tourist attractions. Hence, we are able to collect information about tourism competitiveness within a geographical or administrative area, such as a country [34].
In this sense, Table 2 shows a list of five keys alongside a brief textual description of each one. At the end of the description, we show some examples of tags that represent a particular map feature. For instance, a bar is an element tagged in OSM as
There are no specific guidelines for the type of tags to define a map feature, except that they must always be string values. Although OSM contributors are allowed to use free-style attributes to define features, there exists a wiki page (
The two data sources we handle in this work, WEF and OSM, are of a very different nature, and thereby it is not always possible to measure exactly the same concept in both sources. Hence, a relevant aspect that must be considered in the data extraction is whether or not the OSM value of a particular variable is given in the same measurement units as the value of the corresponding indicator in WEF, which gives rise to:
Direct variables: This is the case when the variable is measured in the same terms as the WEF indicator. For instance, the value retrieved from OSM for the variable CAR is the number of establishments that provide such a service, as are the values obtained from WEF for the indicator “Presence of major car rental companies”.
Indirect variables: This is the case when the variable in OSM is measured in units other than the ones used in the WEF indicator. For instance, the value of the WEF indicator “Attractiveness of natural assets” is a value within the range 1 to 7 that comes from a survey, while the value we obtain from OSM for the variable NAT is the number of natural beauty spots.
We can observe in the fifth column of Table 1 that variables are classified as direct (D) or indirect (I).
4. Methods
Our aim is to analyze how well the OSM data approximate the values of the WEF indicators and thus determine whether OSM is a reliable data source to evaluate tourism competitiveness.
Figure 1 shows the workflow followed in our analysis. First, the Travel & Tourism Competitiveness Report 2017 was reviewed and, as explained in Section 3, eight variables related to attractions and accommodation infrastructure were selected. The data for each country corresponding to these variables in 2017 were downloaded from WEF. Then, the OSM database was studied, and the most appropriate data for each variable were extracted in 2017 (this will be explained in Section 4.1). Data from both WEF and OSM were combined to build statistical models, as shown in Section 4.2. To evaluate these models, the following steps were performed: (1) OSM data were downloaded in 2019, (2) these new OSM data were used to infer the WEF values by using the regression models, and (3) the inferred values were compared to the actual WEF values in the Travel & Tourism Competitiveness Report 2019.
4.1. OSM Data Processing
We follow a straightforward two-step process to retrieve the OSM data for each variable:
Step 1. We identify the specific combination of OSM tags that best captures the meaning of the variable. As an example, for the WEF variable CAR (car rental companies), we selected the tags amenity, name, and operator, since this particular combination reveals whether a specific car rental company is present in a geographical area.
Step 2. We query the OSM tags selected in Step 1 through the Overpass API within the delimited geographical area of a specific country. (The Overpass API serves custom-selected parts of the OSM map data according to search criteria such as location, type of objects, tag properties, proximity, or combinations of them; see https://wiki.openstreetmap.org/wiki/Overpass_API/Language_Guide (accessed on 3 July 2020).) Algorithm 1 shows a query to retrieve the car rental companies in Colombia. Once the objects of type amenity = "car_rental" are retrieved, we can apply the query name = "Europcar" or the query operator = "Europcar" over the retrieved objects to find out whether the car rental company Europcar is present in Colombia.
Algorithm 1: Excerpt of Overpass code.
In some cases, it is necessary to apply two or more queries as described in Step 2 to retrieve the value of a particular variable. Aggregation, arithmetic operations, or more complex operations are needed to approximate the value of some variables with OSM data. Both Overpass queries and the subsequent approximation operations have been implemented in Python.
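The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the country is selected through its ISO 3166-1 code, and a case-insensitive substring match on the name and operator tags stands in for the exact-match queries described in Step 2.

```python
def build_car_rental_query(iso_code, timeout=180):
    """Return an Overpass QL query for all car rental amenities in a country.

    Assumption: the country boundary is found via its ISO3166-1 tag.
    """
    return (
        f'[out:json][timeout:{timeout}];\n'
        f'area["ISO3166-1"="{iso_code}"][admin_level=2]->.country;\n'
        'nwr["amenity"="car_rental"](area.country);\n'
        'out tags;'
    )


def company_present(elements, company):
    """Return True if any element mentions the company in name or operator.

    `elements` is the "elements" list of an Overpass JSON response.
    Assumption: case-insensitive substring matching is good enough here.
    """
    target = company.lower()
    for element in elements:
        tags = element.get("tags", {})
        for key in ("name", "operator"):
            if target in tags.get(key, "").lower():
                return True
    return False
```

The query string would be sent to an Overpass endpoint, and the "elements" list of the JSON response passed to company_present once per company of interest, for example company_present(elements, "Europcar") for Colombia with iso_code "CO".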
In the following, we explain the tags used to retrieve the variables, as well as the operations needed in some cases to approximate the value of the WEF indicator.
4.2. Statistical Analysis
In this section, we carry out a statistical analysis and investigate the relationship between the values of the official WEF indicators and the data collected from OSM. In particular, a linear correlation analysis between each WEF variable (denoted as variable-WEF) and its counterpart in OSM (denoted as variable-OSM) is performed first, and then regression models are calculated to measure how well the OSM data fit the WEF indicators. In order to obtain the most accurate model for the data at hand, linear and non-linear regression models were tested, such as multiplicative, double-squared, and square-root-Y models, among others (see Table 3). Non-linear regression models are an alternative when linear models do not achieve the desired accuracy, or when the phenomenon under study behaves non-linearly. To assess the accuracy of each model, we calculate the determination coefficient ($R^2$), which measures the proportion of variation of the dependent variable (variable-WEF) that is explained by the independent variable (variable-OSM). Finally, the models are tested with new data from 2019, and the values predicted by these models are compared with the actual WEF values. These analyses will help us to answer our Research Questions 1 and 2.
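The model selection described above can be sketched as follows. This is an illustrative sketch, not the procedure run in the study: each candidate model is linearized through a pair of transforms, fitted by ordinary least squares, and ranked by $R^2$ computed on the transformed scale. The model equations in the comments, the function names, and the choice to compare $R^2$ across transformed scales are assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r2 = (sxy * sxy) / (sxx * syy)  # squared linear correlation
    return a, b, r2


# Each model is a (transform of X, transform of Y) pair that makes it linear.
# Log and square-root transforms require strictly positive / non-negative data.
MODELS = {
    "linear": (lambda x: x, lambda y: y),                  # Y = a + b*X
    "multiplicative": (math.log, math.log),                # ln Y = a + b*ln X
    "square-root-Y": (lambda x: x, math.sqrt),             # sqrt(Y) = a + b*X
    "double-squared": (lambda x: x * x, lambda y: y * y),  # Y^2 = a + b*X^2
}


def best_model(osm_values, wef_values):
    """Fit every candidate and return (name, a, b, r2) with the highest R^2."""
    results = []
    for name, (tx, ty) in MODELS.items():
        a, b, r2 = fit_line([tx(x) for x in osm_values],
                            [ty(y) for y in wef_values])
        results.append((name, a, b, r2))
    return max(results, key=lambda item: item[3])
```

Here osm_values plays the role of variable-OSM and wef_values that of variable-WEF for one indicator, with one entry per country.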
As stated in [17], the status of a country’s ICT services will determine how successful a VGI initiative could be and what growth may be expected in the years to come. Previous investigations [18] found that although OSM has had great global success, there is still a clear difference in the volume of contributed data between affluent and poorer communities. Since OSM relies upon volunteers and the amount of time and effort spent to the relevant area of the map, broader OSM coverage will happen in wealthier countries that have a high ICT level, given that this pillar measures the existence of modern infrastructure (mobile network coverage and quality of electricity supply), but also the capacity of businesses and individuals to use and provide online services. Therefore, in order to answer our Research Question 3, our analysis is carried out from two different points of view: (1) considering all the countries as a whole, and (2) splitting the countries into three groups according to their ICT level given by the ICT readiness pillar of WEF.
Therefore, we used the value of the ICT readiness pillar (score from 1 to 7) to break up the analysis of countries into meaningful segments. In particular, the values of this pillar that appear in the Travel & Tourism Competitiveness Report 2017 range from 1.57 (Burundi) to 6.47 (Hong Kong SAR), so we created three ICT segments that stand for low, medium, and high ICT levels. Specifically, low ICT comprises countries that have values in , medium ICT includes countries with values in , and in the high ICT segment we found countries with values within . According to these intervals, 32 countries are classified as low ICT, 54 countries are classified as medium ICT, and 47 countries are classified as high ICT. In Figure 2, we can observe how the countries are distributed according to the ICT level.
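The segmentation can be sketched as a simple thresholding of the pillar score. The cut points are passed in as parameters because the interval bounds are not reproduced here; the function names and the choice of inclusive upper bounds are assumptions made for illustration.

```python
def ict_segment(score, low_upper, medium_upper):
    """Classify an ICT readiness score as 'low', 'medium', or 'high'.

    low_upper and medium_upper are hypothetical cut points; upper bounds
    are treated as inclusive by assumption.
    """
    if score <= low_upper:
        return "low"
    if score <= medium_upper:
        return "medium"
    return "high"


def split_by_ict(ict_scores, low_upper, medium_upper):
    """Group country names by ICT segment from a {country: score} mapping."""
    groups = {"low": [], "medium": [], "high": []}
    for country, score in ict_scores.items():
        groups[ict_segment(score, low_upper, medium_upper)].append(country)
    return groups
```

Each resulting group can then be analyzed separately, alongside the analysis of all countries taken together.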
In summary, we performed the analysis of each variable by taking into account all the countries together, and also with respect to low, medium, and high ICT levels. First, data included in the OSM database at the beginning of 2018 is collected and processed as explained in Section 3.2. Then, the Statgraphics (
Finally, we are interested in checking the applicability of the obtained models with new data. The main idea is to compare the last published WEF indicators (from 2019 Travel & Tourism Competitiveness Report) with the predicted values given by our models, using as input data those that are included in the OSM database at the beginning of 2020. This way, data from the same period will be compared. In order to collect this new OSM data, we apply the same procedure explained in Section 3.2.
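The comparison between predicted and published values can be sketched as below. The sketch assumes a multiplicative model with already fitted coefficients a and b, and it measures agreement with a coefficient of determination computed on the original scale; both the function names and this particular way of scoring the predictions are assumptions.

```python
import math


def predict_multiplicative(a, b, osm_value):
    """Predict a WEF value from an OSM value with ln Y = a + b*ln X."""
    return math.exp(a) * osm_value ** b


def r_squared(actual, predicted):
    """Coefficient of determination of predictions against actual values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot
```

With this setup, the coefficients fitted on the earlier data are applied to the newer OSM values, and the resulting score is compared with the one obtained on the data used for fitting.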
5. Results
In what follows, we analyze how well the OSM data represent the eight WEF variables that measure tourism competitiveness. Table 4 shows a summary of the results obtained in our analysis for each variable. Column Best ICT segm. indicates whether the best model has been found when considering the countries all together or when using the segmentation by ICT level. Columns Best fit model and Overall adequacy to OSM indicate the type of model that best fits the data and how well the data fit this model in each case. Each of the following sections is devoted to one variable; the details of the models for each ICT level, together with the correlation and $R^2$ values, are shown in Appendix A. The best model is selected for each variable, and then each of these models is applied to new OSM data (2019 data) in order to assess whether the model still gives a good fit. Column Fit to 2019 data in Table 4 compares how well the 2017 data and the 2019 data fit the model (Appendix B shows the $R^2$ value for each variable with both data sets).
5.1. CAR
Firstly, we recall that this variable measures the presence of seven major car rental companies, so the variable
(1)
The p-value lower than 0.05 indicates that there is a statistically significant relationship between
That said, the values obtained when the countries are classified by ICT are also acceptable, reflecting in all cases a strong and significant association. In general, the OSM coverage of this indicator across countries is relatively good as compared with the car rental companies registered in WEF.
Additionally, Figure 3a shows the mean values of
Countries that belong to the medium ICT level show a good correlation, partly supported by the positive correlation of some well-mapped countries like Morocco (5/6), Peru and Thailand (5/7), or Dominican Republic and Mexico (7/7), all important tourist destinations. In contrast, the relationship of countries that belong to the high ICT group is slightly worse because no car rental companies are mapped for quite a few countries that present high values of
Regarding the analysis with 2019 data, we can observe in Appendix B that the $R^2$ value is slightly worse than that obtained with data from 2017. This indicates that the model is not as well-adjusted to the 2019 data as to the 2017 data. However, the difference is small.
As a conclusion, we can say that OSM reflects the official values of car rental companies across world economies quite well. More importantly, we can conclude that
5.2. ATM
In this case,
The figures for the variable
(2)
Regarding the ICT segmentation models, a remarkable point is that the goodness of fit is inversely proportional to the ICT readiness, and the relationship for countries that belong to the high ICT level is neither strong nor significant, which is a clear indication that ATMs are not well-mapped in OSM. In developed countries that count on a huge number of ATMs, it seems reasonable that OSM contributors are not very interested in mapping such facilities, as an ATM is easily found all around. The null correlation comes from the fact that although the
It is important to note that the number of ATMs is an estimation, as explained in Section 3.2, and results reflect that this estimation should be improved. The countries with the largest actual number of ATMs, those at the high ICT level, also have the largest number of ATMs in OSM (as shown in Figure 3b), but the difference between the expected (WEF) and calculated (OSM) value is significant, which makes it difficult to find a good model. In contrast,
All in all, we can conclude that
5.3. HOT
In order to compare the values for this variable, we transformed the value provided by WEF (see Section 3.2) into the total number of hotel rooms available in a country using the World Bank population estimates. Hence, we will analyze the relationship between the number of hotels (
Unlike previous variables, in this case, the best-fitted models are those obtained for countries classified according to the different ICT levels, as shown in Appendix A. The medium and low levels follow quite similar models, unlike the high level. Specifically:
(3)
(4)
(5)
Moreover, it can be observed that both the linear correlation and $R^2$ are significant and quite similar for the high and medium ICT levels, since the more developed, richer countries with a higher level of ICT also have better hotel infrastructure and a more organized and competitive tourism industry, as is the case of countries like Mexico and Greece at the medium level and Spain and France at the high level. However, it has not been possible to find a good model for countries in the low ICT level. This may reflect uneven data and the presence of outliers. In fact, a closer look at the data identifies four outliers (Burundi, Nigeria, Tajikistan, and Uganda). A new model is generated with the low ICT level countries by eliminating these outliers; this model obtains an $R^2$ of 0.4723 and an acceptable fit for outliers, quite similar in some cases, compared to the previous model for low ICT level countries.
On the other hand, the model with all the countries also obtains acceptable fitness to the data, comparable to those obtained for the
When the models by ICT level are applied to the 2019 data (see Appendix B), the $R^2$ value is slightly worse for high ICT countries, remains essentially the same for medium ICT countries, and is better for low ICT countries.
In conclusion, the number of hotels mapped in OSM is a significant data source for countries at the medium and high ICT levels, even taking into account that the two variables measure different concepts.
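The conversion described at the start of this subsection can be sketched as follows. Table 1 states the WEF hotel indicator as rooms per 100 population, so the total is the rate multiplied by the population and divided by 100. The function name and the sample figures are illustrative assumptions, not values from the study.

```python
def rate_to_total(rate: float, population: float, per: float = 100.0) -> float:
    """Convert a per-capita rate (e.g. rooms per 100 people) into a total count."""
    if per <= 0:
        raise ValueError("per must be positive")
    return rate * population / per


# Hypothetical country: 1.5 hotel rooms per 100 people, 10 million inhabitants.
total_rooms = rate_to_total(1.5, 10_000_000)
print(total_rooms)  # 150000.0
```

The `per` parameter lets the same helper handle indicators expressed on another base, such as a rate per 10,000 population.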
5.4. HBD
As with the variable
In this case, it is clear that the best models are those obtained for countries classified according to the ICT level. Specifically:
(6)
(7)
(8)
Appendix A shows that the strength and significance of the relationship between
These models behave better when 2019 data are used. As shown in Appendix B, the value of $R^2$ is higher in all cases, even reaching 0.97 in the case of low ICT level countries.
All in all, we can say that health care institutions are generally well-mapped in OSM, which makes them valuable data for tourism purposes.
5.5. WHS
As we can observe in Appendix A, in this case, the model obtained for all the countries is not the best option. The best figures are obtained for countries that belong to the low ICT level, and models for countries in the medium and high ICT levels are comparable with the model with all the countries. The models for the different ICT levels are:
(9)
(10)
(11)
Unlike other variables, in the case of
The good measures in the low ICT level are due to the fact that a group of 25 countries of this level present
For countries that belong to a medium or high ICT level, there is no such strong positive relation. The main reason lies in the existence of some countries that have large values of
Appendix B shows that the fit of the models for the medium and high ICT levels improves with 2019 data, by around 20% in both cases. This indicates that the models are still valid and that the 2019 OSM data contain fewer outliers than the 2017 data. The model for the low ICT level shows a very good fit with both datasets.
5.6. AIR
For this variable, we converted the value of
Therefore, the model with all the countries, which reaches an $R^2$ of 0.93, is considered the best model for this variable. The obtained regression model is:
(12)
Appendix B shows that the $R^2$ for this model is slightly worse when applied to 2019 data, but it still has a good fit (0.916).
All in all, we can conclude that the higher the ICT level, the more representative the relationship between
5.7. CDD
As explained above, in this case, the analysis is focused on the relationship between the online search index of cultural and entertainment activities (
(13)
(14)
(15)
A close look at the collected data reveals that the highest coverage of mapped locations corresponds by far to European countries, which also have the highest search index globally. This is the main reason that justifies the stronger correlation of the high-ICT countries, since most European countries fall within this group. The second-ranked group of countries in relation to OSM coverage corresponds to both North and South American countries, and finally the Southeast Asian countries.
The disparity between the search index and mapped locations, which makes the correlation weak and moderate in medium and high ICT countries, respectively, is mostly due to the high coverage of European countries in comparison to the rest of the countries. As an example, the search index of countries like the Czech Republic (6.5) and Poland (14) is about 5 and 2.5 times lower than that of the USA (34), while the number of mapped locations in these two countries is two and three times higher than in the USA. If we focus exclusively on medium ICT countries, Peru and Chile have almost the same search index as Greece, but 60% fewer mapped locations. This provides evidence that, globally, Europe is much better mapped than the rest of the world, especially concerning cultural interests.
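The ratios in the example above can be checked directly from the quoted search-index values:

```latex
\frac{34}{6.5} \approx 5.2, \qquad \frac{34}{14} \approx 2.4
```

That is, the USA index is roughly 5 and 2.5 times that of the Czech Republic and Poland, respectively.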
As for low-ICT countries, the relationship is highly significant. Furthermore, the coefficient of determination in this case is $R^2 = 0.9947$ (see Appendix A), thus indicating that 99% of variation of
5.8. NAT
In this case,
Therefore, we conclude that OSM is not a very informative source when looking for the natural spots of a country.
6. Discussion
This section discusses the results presented in the previous section, describes the limitations encountered in this analysis, and provides suggestions to make OSM a user-generated VGI reference platform in tourism management.
From Table 4 and Appendix A and Appendix B, we can conclude that OSM is representative of WEF data for CAR, HBD, and AIR variables; in the case of HOT, WHS, and CDD, it depends on the ICT level, and for ATM and especially NAT, the adequacy is not good. Moreover, we can observe that there is not a clear pattern regarding the OSM representativeness in comparison to WEF when the ICT level is taken into account. That is, in some cases, countries with a high ICT level show the best values (for example, for the AIR and HOT variables), whereas in other cases, such as WHS and CDD, countries with a low ICT level show better values. In the following, we will explain the difficulties we have faced that may explain these results.
The first limitation of OSM is the incompleteness of the data regarding the mapped elements; that is, many spots are not mapped (for example, ATMs), especially in countries with a low ICT level. In fact, the maps provided by Anderson [36] show huge differences in editing density across countries, with Europe having the highest density, in contrast with low-ICT countries. These maps also show that editing concentrates on specific areas within some countries. In general, well-governed countries with good Internet access tend to be more complete, and both sparsely populated areas and dense cities are the best-mapped [37]. However, in the last few years there has been a significant effort to map many areas of Africa, as shown by Kateregga [38], which will have a positive impact on the representativeness of OSM with respect to WEF in these countries.
Another limitation is the incompleteness of the data with respect to the value of tags; that is, many spots are mapped but some lack information in key tags, and so we were not able to extract the same exact information as represented by WEF. This happens in variables such as HBD and HOT; there are tags defined in OSM to specify the value of the number of hospital beds or the hotel rooms but, in many cases, this information is not registered. As explained in Section 4, we have (quite successfully) overcome this difficulty in these cases by using an approximation. On the other hand, as explained above, in countries with a high ICT level, the information regarding World Heritage Sites is not registered in the appropriate tag, which has made it difficult to identify these spots. Given that these factors are important for the image of a country, authorized initiatives to record these types of data in OSM could be encouraged.
Additionally, we have missed some tags in the OSM catalog that would be very helpful in our analysis. For instance, in the case of
On the other hand, apart from the incompleteness of OSM data, our interpretation of the WEF variables in terms of OSM tags may also affect the accuracy of the results. For example, the estimation we used in our analysis for the variable HOT works well for high and medium ICT countries, but it should be adjusted for low-ICT countries. This is especially noticeable in the variable AIR, where the $R^2$ is 0.96 for high-ICT countries and only 0.13 for low-ICT countries. In the latter case, it would be interesting to add some additional information for a better estimation. Sometimes, however, such information is not easy to find; for example, [39] publishes the airport traffic data for the top 60 airports worldwide with respect to passenger traffic, but we have not found data about small airports. Another variable that would benefit from the combination of OSM data with external resources is WHS for high and medium ICT level countries: Wikipedia gives an exhaustive list of World Heritage Sites by country [40]; however, in this case, a better approach would be to use the information in Wikipedia to complete the corresponding tag in OSM data.
We envision the following challenges to make OSM a user-generated VGI reference platform in tourism management: (1) expand the OSM tagging system by including specific tourism-related tags; (2) encourage users, representatives, authorities, and tourism industry managers to participate in OSM; and (3) foster a balance between the general freedom of OSM contributors to fill in data and the production of data in a standardized way. Additionally, interesting initiatives like LinkedGeoData, which collects spatial data from OSM and makes it available as an RDF knowledge base, will help increase the visibility of OSM and incentivize its use by visitors.
7. Conclusions
Tourism research has fostered the exploitation of OSM in smart tourism projects, encouraged by promising outcomes of studies that regard OSM as a holistic tourism platform. This new vision of tourism that deals with hyper-connected tourists who consume content any time and through different channels revolves around two core elements, smart phones and geolocation, with OSM being mostly a globally used geodata platform.
In this paper, we have presented an exploratory analysis to study the representativeness of data gathered in OSM. We have undertaken a thorough analysis of eight variables of WEF that cover different tourism aspects, and examined how well OSM data reflect the official values of such variables. We carefully selected the most representative OSM tags to retrieve the information comprised in the eight variables, and then studied for each variable the relationship between the official value and the OSM value.
The presented analysis is a small sample that illustrates the adequacy of OSM user-generated content for obtaining a picture of the tourism industry in a country. We selected a few variables representing concepts that are measurable and comparable with official statistics, but the analysis is extensible to the large variety of maps, data, and volunteered geo-information offered by OSM.
Studies such as the one presented in this article are relevant because they serve to determine whether OSM data can be used as a reliable data source for tourism-related applications.
Further work can be done to study other indicators that highly influence tourism behaviour, such as road density, railroad infrastructure, or protected areas, as well as to extend the analysis to other collaborative data sources, such as DBPedia and Foursquare, among others. In addition to the ICT level, other aspects could also be considered in the model generation, such as the country's population, geographical area, gross domestic product, or the International Monetary Fund classification into advanced countries and emerging and developing countries, among others.
Author Contributions
Alexander Bustamante: Conceptualization, Visualization, Software, Writing—original draft; Laura Sebastia: Methodology, Writing—review & editing; Eva Onaindia: Supervision, Writing—review & editing. All authors have read and agreed to the published version of the manuscript.
Funding
This work has been supported by COLCIENCIAS through a PhD scholarship.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Acknowledgments
This work is supported by the Spanish MINECO project TIN2017-88476-C2-1-R. Map data copyrighted OpenStreetMap contributors and available from
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Models Obtained for All the Variables
This table shows the correlation value and the determination coefficient ($R^2$) for each variable and ICT-based segment (All refers to the analysis of all countries, and High, Medium, and Low refer to the analysis based on country segmentation by ICT level). Column Model indicates the type of model that best fits the data at hand, and column p-Value shows the significance level of this model.
Table A1
Models by ICT level.
VAR. | ICT | Model | Correlation | $R^2$ | p-Value |
---|---|---|---|---|---|
CAR | All | Squared-Y Squared Root-X | 0.83 | 0.7041 | <0.05 |
CAR | High | Double Squared Root | 0.71 | 0.5174 | <0.05 |
CAR | Medium | Squared-Y Squared Root-X | 0.78 | 0.619 | <0.05 |
CAR | Low | Double Square | 0.80 | 0.6433 | <0.05 |
ATM | All | Log-Y Squared Root-X | 0.64 | 0.4209 | <0.05 |
ATM | High | Reciprocal-Y Squared-X | −0.27 | 0.0752 | >0.05 |
ATM | Medium | Log-Y Squared Root-X | 0.46 | 0.2186 | <0.05 |
ATM | Low | Multiplicative | 0.56 | 0.3151 | <0.05 |
HOT | All | Squared Root-Y | 0.83 | 0.6986 | <0.05 |
HOT | High | Squared Root-Y | 0.88 | 0.7802 | <0.05 |
HOT | Medium | Multiplicative | 0.88 | 0.7920 | <0.05 |
HOT | Low | Multiplicative | 0.61 | 0.3799 | <0.05 |
HBD | All | Log-Y Squared Root-X | 0.77 | 0.6021 | <0.05 |
HBD | High | Double Squared Root | 0.91 | 0.8290 | <0.05 |
HBD | Medium | Multiplicative | 0.85 | 0.7325 | <0.05 |
HBD | Low | Double Square | 0.83 | 0.7018 | <0.05 |
WHS | All | Double Squared Root | 0.67 | 0.4508 | <0.05 |
WHS | High | Log-Y Squared Root-X | 0.69 | 0.4791 | <0.05 |
WHS | Medium | Log-Y Squared Root-X | 0.63 | 0.4089 | <0.05 |
WHS | Low | Double Square | 0.95 | 0.9121 | <0.05 |
AIR | All | Double Square | 0.96 | 0.9311 | <0.05 |
AIR | High | Double Square | 0.98 | 0.9680 | <0.05 |
AIR | Medium | Double Squared Root-X | 0.70 | 0.4933 | <0.05 |
AIR | Low | Reciprocal-Y Squared-X | 0.37 | 0.1391 | <0.05 |
CDD | All | Squared-Y Squared Root-X | 0.63 | 0.4038 | <0.05 |
CDD | High | Log-Y Squared Root-X | 0.66 | 0.4409 | <0.05 |
CDD | Medium | Squared Root-Y Log-X | 0.52 | 0.2765 | <0.05 |
CDD | Low | Double Square | 0.99 | 0.9947 | <0.05 |
NAT | All | Log-Y Squared Root-X | 0.17 | 0.0304 | <0.05 |
NAT | High | Log-Y Squared Root-X | 0.23 | 0.0570 | <0.05 |
NAT | Medium | Double Square | −0.12 | 0.0150 | <0.05 |
NAT | Low | Double Square | −0.08 | 0.0072 | <0.05 |
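Assuming, as the model definitions suggest, that every model in Table A1 is a least-squares line fitted on transformed variables, the determination coefficient should equal the square of the correlation on the transformed scale. The sketch below checks this on four rows copied from the table; the helper name and the 0.01 tolerance are assumptions, the latter allowing for correlations being reported to two decimals.

```python
import math

# (variable, ICT segment, correlation, R^2) copied from Table A1
rows = [
    ("CAR", "All", 0.83, 0.7041),
    ("ATM", "High", -0.27, 0.0752),
    ("AIR", "High", 0.98, 0.9680),
    ("NAT", "Medium", -0.12, 0.0150),
]


def consistent(r: float, r2: float, tol: float = 0.01) -> bool:
    """True when |r| matches sqrt(R^2) within the rounding tolerance."""
    return abs(abs(r) - math.sqrt(r2)) < tol


for var, ict, r, r2 in rows:
    # Print the correlation implied by R^2 next to the consistency flag.
    print(var, ict, round(math.sqrt(r2), 3), consistent(r, r2))
```

All four rows pass the check, which suggests that the two columns describe the same fit.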
Appendix B. Comparison between 2017 and 2019 Data
This table compares OSM data from 2017 (original data) and 2019 (test data), showing the determination coefficient ($R^2$) of the selected models for both datasets.
Comparison between results with the original data from 2017 and test data from 2019.
Table A2
Model fit comparison.
VAR. | ICT | $R^2$ 2017 (Original Data) | $R^2$ 2019 (Test Data) |
---|---|---|---|
CAR | All | 0.7041 | 0.6697 |
ATM | All | 0.4209 | 0.4149 |
HOT | High | 0.7802 | 0.7587 |
HOT | Medium | 0.7920 | 0.795 |
HOT | Low | 0.3799 | 0.4343 |
HBD | High | 0.829 | 0.86586 |
HBD | Medium | 0.7325 | 0.7949 |
HBD | Low | 0.7018 | 0.9733 |
WHS | High | 0.4791 | 0.615 |
WHS | Medium | 0.4089 | 0.6575 |
WHS | Low | 0.91527 | 0.91966 |
AIR | All | 0.9311 | 0.91628 |
CDD | High | 0.4409 | 0.59992 |
CDD | Medium | 0.2765 | 0.3077 |
CDD | Low | 0.9947 | 0.9986 |
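One way to obtain a comparison like the one in this table is to keep the coefficients fitted on the 2017 data fixed and score their predictions against the 2019 values. The sketch below shows that calculation for a square root-Y model. The coefficients and data are made-up placeholders, and computing $R^2$ on the transformed scale is an assumption, since the text does not state how the 2019 values were obtained.

```python
import math
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def score_sqrt_y_model(a: float, b: float,
                       x_new: Sequence[float], y_new: Sequence[float]) -> float:
    """Score a fixed model sqrt(Y) = a + b*X on new data, on the sqrt(Y) scale."""
    observed = [math.sqrt(y) for y in y_new]
    predicted = [a + b * x for x in x_new]
    return r_squared(observed, predicted)


# Placeholder coefficients (as if fitted on 2017 data) and placeholder 2019 data.
a_2017, b_2017 = 2.0, 0.5
x_2019 = [0.0, 2.0, 4.0, 6.0]
y_2019 = [4.0, 9.0, 16.0, 25.0]  # sqrt(y) = 2, 3, 4, 5, an exact fit
print(score_sqrt_y_model(a_2017, b_2017, x_2019, y_2019))  # 1.0
```

Because the coefficients are not refitted, a score computed this way can fall below the original value, or even below zero, when the new data depart from the old relationship.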
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Figures and Tables
Table 1. Tourism competitiveness variables. Source for indicators can be (S)urvey or (H)ard data. Indicators can be computed (D)irectly or (I)ndirectly from OSM data.
Pillar | Indicator | Variable | Source | Comp. |
---|---|---|---|---|
Tourist service infrastructure | Presence of major car rental companies: This indicator measures the presence of seven major car rental companies: Avis, Budget, Europcar, Hertz, National Car Rental, Sixt and Thrifty. For each country WEF counts how many of these companies operate via online research. | CAR | H | D |
Tourist service infrastructure | ATMs per adult population: Number of automated teller machines (ATMs) per adult population of 100,000. | ATM | H | D |
Tourist service infrastructure | Hotel rooms: Number of hotel rooms per population of 100. | HOT | H | I |
Health and hygiene | Hospital beds: Hospital beds include inpatient beds available in public, private, general and specialized hospitals and rehabilitation centers. In most cases, beds for both acute and chronic care are included, per population of 10,000. | HBD | H | I |
Cultural resources and business travel | Number of World Heritage cultural sites: Number of properties that the World Heritage Committee considers as having outstanding universal cultural value. | WHS | H | D |
Natural resources | Number of World Heritage natural sites: Number of properties that the World Heritage Committee considers as having outstanding universal natural value. | WHS | H | D |
Air Transport Infrastructure | Airport density: Number of airports with at least one scheduled flight per million of urban population. | AIR | H | D |
Cultural resources and business travel | Cultural and entertainment tourism digital demand: This indicator measures the total online search volume related to the following cultural brandtags: Historical Sites, Local People, Local Traditions, Museums, Performing Arts, UNESCO, City Tourism, Religious Tourism, Local Gastronomy, Entertainment Parks, Leisure Activities, Nightlife and Special Events. | CDD | H | I |
Natural resources | Attractiveness of natural assets: To what extent do international tourists visit your country mainly for its natural assets (i.e., parks, beaches, mountains, wildlife, etc.)? (1 = not at all; 7 = to a great extent). | NAT | S | I |
Table 2. Keys of OSM to represent tourism elements.
Key | Description |
---|---|
Amenity | This key is used to map facilities used by visitors and residents. For example: bar ( |
Aeroway | This is mainly related to aerodromes |
Historic | This key is used to describe various historic places. For example: archeological sites |
Leisure | This key is used to tag leisure and sports facilities, such as water parks |
Tourism | This key represents places and things of specific interest to tourists, including places to see, places to stay, and things and places providing information and support to tourists. A museum is one of the possible values of this tag (
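Counts for a key/value pair such as those listed in the table above can be retrieved through the Overpass API. The sketch below only builds the query text and sends nothing over the network. The helper name, the use of Overpass QL, and the country lookup by ISO code are assumptions for illustration and are not necessarily the extraction procedure used in this study.

```python
def build_count_query(iso_code: str, key: str, value: str, timeout: int = 120) -> str:
    """Return an Overpass QL query counting objects tagged key=value in a country."""
    return (
        f"[out:json][timeout:{timeout}];\n"
        # Select the country area by its ISO 3166-1 alpha-2 code.
        f'area["ISO3166-1"="{iso_code}"][admin_level=2]->.country;\n'
        # nwr matches nodes, ways, and relations carrying the tag.
        f'nwr["{key}"="{value}"](area.country);\n'
        "out count;"
    )


# Example: museums (a value of the Tourism key) mapped in Spain.
print(build_count_query("ES", "tourism", "museum"))
```

Counting nodes, ways, and relations together may count a place twice when it is mapped both as a point and as a building outline, so raw counts obtained this way are best treated as approximate.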
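Table 3. Models used in our analysis. Equations are written in the linearized form implied by the transformations, with $a$ and $b$ denoting the fitted intercept and slope.
Model | Equation | Transformation on Y | Transformation on X |
---|---|---|---|
Linear | $Y = a + bX$ | None | None |
Double Squared Root | $\sqrt{Y} = a + b\sqrt{X}$ | Square root | Square root |
Multiplicative | $\log Y = a + b\log X$ | Log | Log |
Double Square | $Y^2 = a + bX^2$ | Square | Square |
Log-Y Squared Root-X | $\log Y = a + b\sqrt{X}$ | Log | Square root |
Squared-Y Squared Root-X | $Y^2 = a + b\sqrt{X}$ | Square | Square root |
Squared Root-Y | $\sqrt{Y} = a + bX$ | Square root | None |
Squared Root-Y Log-X | $\sqrt{Y} = a + b\log X$ | Square root | Log |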
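Each model in the table above amounts to an ordinary least-squares line fitted after transforming Y, X, or both. The following sketch fits a subset of these forms and compares them by $R^2$ on the transformed scale. The helper names, the synthetic data, and the use of plain Python are assumptions for illustration; the original analysis may have relied on a statistics package and a different selection criterion.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Transform = Callable[[float], float]

# (transform on Y, transform on X) for a subset of the models in the table
MODELS: Dict[str, Tuple[Transform, Transform]] = {
    "Linear": (lambda y: y, lambda x: x),
    "Double Squared Root": (math.sqrt, math.sqrt),
    "Multiplicative": (math.log, math.log),
    "Double Square": (lambda y: y ** 2, lambda x: x ** 2),
    "Squared Root-Y": (math.sqrt, lambda x: x),
}


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit ys = a + b*xs; returns (a, b, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    a = my - b * mx
    return a, b, (sxy * sxy) / (sxx * syy)


def fit_all(x: Sequence[float], y: Sequence[float]) -> Dict[str, Tuple[float, float, float]]:
    """Fit every model in MODELS; the data must be strictly positive."""
    return {
        name: fit_line([tx(v) for v in x], [ty(v) for v in y])
        for name, (ty, tx) in MODELS.items()
    }


# Synthetic data generated from Y = 3 * X^1.5, which is exactly multiplicative.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0 * v ** 1.5 for v in x]
results = fit_all(x, y)
best = max(results, key=lambda name: results[name][2])
print(best)  # Multiplicative
```

Note that $R^2$ values computed on different transformed scales are not strictly comparable, so a ranking of this kind is a screening heuristic rather than a formal model test.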
Table 4. Summary of results for all the variables.
VAR. | Best ICT Segm. | Best Fit Model | Overall Adequacy to OSM | Fit to 2019 Data |
---|---|---|---|---|
CAR | All | Squared-Y Squared Root-X | Good | Slightly worse |
ATM | All | Log-Y Squared Root-X | Fair | Slightly worse |
HOT | ICT Segmentation | High: Squared Root-Y | Good | Slightly worse |
HOT | ICT Segmentation | Med: Multiplicative | Good | Similar |
HOT | ICT Segmentation | Low: Multiplicative | Poor | Slightly better |
HBD | ICT Segmentation | High: Double Squared Root | Good | Slightly better |
HBD | ICT Segmentation | Med: Multiplicative | Good | Slightly better |
HBD | ICT Segmentation | Low: Double Square | Good | Better |
WHS | ICT Segmentation | High: Log-Y Squared Root-X | Fair | Better |
WHS | ICT Segmentation | Med: Log-Y Squared Root-X | Fair | Better |
WHS | ICT Segmentation | Low: Double Square | Very good | Similar |
AIR | All | Double Square | Very good | Slightly worse |
CDD | ICT Segmentation | High: Log-Y Squared Root-X | Fair | Better |
CDD | ICT Segmentation | Med: Squared Root-Y Log-X | Poor | Slightly better |
CDD | ICT Segmentation | Low: Double Square | Very good | Similar |
NAT | - | No model found | - | - |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Since 2007, the World Economic Forum (WEF) has issued data on the factors and policies that contribute to the development of tourism and competitiveness across countries worldwide. While WEF compiles the yearly report out of data from governmental and private stakeholders, we seek to analyze the representativeness of the open and collaborative platform OpenStreetMap (OSM) to the international tourism scene. For this study, we selected eight parameters indicative of the tourism development of each country, such as the number of beds or cultural sites, and we extracted the OSM objects representative of these indicators. Then, we performed a statistical and regression analysis of the OSM data to compare and model the data emitted by WEF with data from OSM. Our aim is to analyze the tourist representativeness of the OSM data with respect to official reports to better understand when OSM data can be used to complement the official information and, in some cases, when official information is scarce or non-existent, to assess whether the OSM information can be a substitute. Results show that OSM data provide a fairly accurate picture of official tourism statistics for most variables. We also discuss the reasons why OSM data is not so representative for some variables in some specific countries. All in all, this work represents a step towards the exploitation of open and collaborative data for tourism.
Details



1 Valencia Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València, 46022 Valencia, Spain;
2 Valencia Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València, 46022 Valencia, Spain;