Abstract: In the digital age, social media platforms have amassed a wealth of user-generated content that contains valuable geographic information. However, the irregularities and noise in user-generated text have led to suboptimal performance in traditional text-based user geolocation methods. We propose an unsupervised framework for user geolocation based on Large Language Models (LLMs), which uses the strong text-processing abilities of LLMs to infer users' locations from their user-generated text without supervision. First, the text is preprocessed with regularization rules and an LLM to denoise and normalize it, thereby improving data quality. Next, appropriate prompts are designed to guide the knowledgeable LLM in understanding how user text reveals geographic location, thereby profiling users. To improve geolocation accuracy, five independent positioning runs are conducted, and the most frequently predicted location is taken as the user's final location. A series of experiments demonstrates the potential of large language models for processing noisy text and the effectiveness of the framework for geolocating users in an unsupervised setting.
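The rule-based cleaning step and the five-run majority vote can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each independent LLM run returns a free-text location label. The regular-expression rules (`_URL`, `_MENTION`, `_SPACES`) and the helper names `clean_text` and `vote_location` are hypothetical, since the abstract does not specify the actual regularization rules or the voting code.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical cleaning rules; the abstract does not list the actual ones.
_URL = re.compile(r"https?://\S+")
_MENTION = re.compile(r"@\w+")
_SPACES = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Apply simple rule-based denoising before any LLM-based normalization."""
    text = _URL.sub("", text)              # drop links
    text = _MENTION.sub("", text)          # drop @mentions
    return _SPACES.sub(" ", text).strip()  # collapse runs of whitespace


def vote_location(predictions: List[str]) -> Optional[str]:
    """Return the most frequent location among independent LLM runs.

    Labels are compared after trimming and lowercasing. On a tie, the label
    seen first wins: Counter keeps insertion order, and most_common(1)
    returns the first element with the maximal count.
    """
    normalized = [p.strip().lower() for p in predictions if p and p.strip()]
    if not normalized:
        return None  # every run failed to produce a location
    label, _count = Counter(normalized).most_common(1)[0]
    return label


if __name__ == "__main__":
    print(clean_text("Great game @bob http://t.co/x  tonight"))
    # Five hypothetical run outputs for one user.
    runs = ["Boston, MA", "boston, ma", "New York, NY", "Boston, MA ", "Chicago, IL"]
    print(vote_location(runs))
```

Voting over several runs reduces the effect of sampling variance in any single LLM response; normalizing labels before counting keeps trivially different spellings of the same place from splitting the vote.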