Abstract: In recent years, natural language processing (NLP) technologies have made a significant contribution in addressing a number of labour market tasks. One of the most interesting challenges is the automatic extraction of competences from unstructured texts.This paper presents a pipeline for efficiently extracting and standardizing skills from job advertisements using NLP techniques. The proposed methodology leverages open-source Transformer and Large Language Models to extract skills and map them to the European labour market taxonomy, ESCO.To address the computational challenges of processing lengthy job advertisements, a BERT model was fine-tuned to identify text segments likely containing skills. This filtering step reduces noise and ensures that only relevant content is processed further. The filtered text is then passed to an LLM, which extracts implicit and explicit hard and soft skills through prompt engineering. The extracted skills are subsequently matched with entries in a vector store containing the ESCO taxonomy to achieve standardization.Evaluation by domain experts shows that the pipeline achieves a precision of 91% for skill extraction, 80% for skill standardization and a combined overall precision of 79%. These results demonstrate the effectiveness of the proposed approach in facilitating structured and standardized skill extraction from job postings.
Loading