Skill Extraction from Resumes and Job Offers across Six Languages

17 Mar 2026 (modified: 19 May 2026)SwissText 2026 Conference SubmissionEveryoneRevisionsCC BY 4.0
Track: Scientific Track
Keywords: Natural Language Processing, NLP, Skill Extraction, Human Resources, Multilinguality
TL;DR: We evaluate rule-based, semantic, and supervised skill extraction across 1,200 expert-annotated job offers and resumes in six languages, identifying a significant performance gap between structured job postings and highly variable resume data.
Abstract: We comprehensively evaluate multiple skill extraction approaches, including rule-based, semantic, and supervised methods, using resumes and job offers in English, French, German, Italian, Spanish, and Portuguese. Due to inherent privacy concerns in Human Resources (HR) data and the high cost of manual annotations, research on identifying relevant skills for the job market remains limited, often restricted to specific domains, datasets, and entity types, and is available in only a few languages. In the context of an industrial project, we have annotated 1,200 job offers and resumes across diverse domains and six languages, through a multidisciplinary collaboration among HR researchers, NLP researchers, and HR tech professionals. Our evaluation assesses the effectiveness of these systems in a multilingual, multidomain setting, capturing both standardized job offers and highly variable resumes. The results show that supervised models achieve F1 scores of up to 0.6, while rule-based methods offer better interpretability. Furthermore, we find large differences between how skills are formulated in job offers and resumes, while the latter is understudied in academic research.
Submission Number: 30
Loading