Abstract

Cultural and touristic information is increasingly available through a multitude of heterogeneous sources, including official repositories, community platforms, and open data initiatives. While prominent landmarks are typically covered across sources, lesser-known attractions are also documented, with varying degrees of detail, resulting in fragmented, overlapping, or complementary content. To enable integrated access to this wealth of information, harvesting and consolidation mechanisms are required to collect, reconcile, and unify distributed content referring to the same entities. This paper presents a machine learning-driven framework for harvesting, homogenizing, and augmenting cultural and touristic data across multilingual sources. Our approach addresses entity resolution, duplicate detection, and content harmonization, laying the foundation for enriched, unified representations of attractions and points of interest. The framework is designed to support scalable integration pipelines and can be deployed in applications aimed at tourism promotion, digital heritage, and smart travel services.

Details

1009240
Title
A Machine Learning Framework for Harvesting and Harmonizing Cultural and Touristic Data
Publication title
Volume
16
Issue
12
First page
1038
Number of pages
46
Publication year
2025
Publication date
2025
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
e-ISSN
20782489
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-11-28
Milestone dates
2025-09-27 (Received); 2025-11-14 (Accepted)
First posting date
28 Nov 2025
ProQuest document ID
3286306681
Document URL
https://www.proquest.com/scholarly-journals/machine-learning-framework-harvesting-harmonizing/docview/3286306681/se-2?accountid=208611
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-12-24
Database
2 databases
  • Coronavirus Research Database
  • ProQuest One Academic