A Machine Learning Framework for Harvesting and Harmonizing Cultural and Touristic Data

Published: 2025, Last Modified: 26 Jan 2026, Inf. 2025, CC BY-SA 4.0
Abstract: Cultural and touristic information is increasingly available through a multitude of heterogeneous sources, including official repositories, community platforms, and open data initiatives. While prominent landmarks are typically covered across sources, lesser-known attractions are also documented with varying degrees of detail, resulting in fragmented, overlapping, or complementary content. To enable integrated access to this wealth of information, harvesting and consolidation mechanisms are required to collect, reconcile, and unify distributed content referring to the same entities. This paper presents a machine learning-driven framework for harvesting, homogenizing, and augmenting cultural and touristic data across multilingual sources. Our approach addresses entity resolution, duplicate detection, and content harmonization, laying the foundation for enriched, unified representations of attractions and points of interest. The framework is designed to support scalable integration pipelines and can be deployed in applications aimed at tourism promotion, digital heritage, and smart travel services.