Abstract: The purpose of this paper is to present the design of, and the results of experiments with, a universal, autonomous data extraction (web scraping) system fed by publicly available online job listings. In particular, we present methods for automatically crawling, preprocessing, and classifying data from job offers, as well as for aggregating the acquired data into large-scale, structured databases. We tested two models for classifying the content of job portals: fastText and XGBoost. The experiments yielded promising results, with both methods achieving 88% accuracy.