Improving Occupational ISCO Classification of Multilingual Swiss Job Postings with LLM-Refined Training Data
Abstract: Classifying occupations in multilingual job postings is challenging due to label noise, language variation, and domain-specific terminology. We propose an approach that refines existing silver-standard job labels using large language model (LLM) assessments and integrates them into Multiple Negatives Ranking (MNR) training for SBERT-based ISCO classification. Our method improves classification accuracy across languages while retaining partial ontology alignment. Experimental results show that LLM-assisted curation enhances training data quality, increasing Top-1 accuracy by over 20 percentage points on job postings. Additionally, multilingual performance benefits from positive cross-lingual transfer, with substantial gains in French and Italian. While fine-tuning leads to a slight drop in ontology-specific accuracy, the overall alignment between job ads and occupational classifications improves. Our findings highlight the potential of LLM-guided refinement for enhancing occupation classification in multilingual labor market data.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: NLP tools for social analysis; document-level extraction; multilingual extraction; zero/few-shot extraction
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: German, English, French, Italian
Submission Number: 7213