Abstract: Text mining of CVs (or resumes) is becoming increasingly relevant because it can simplify the evaluation of prospective employees and their suitability for the positions they apply for. This paper proposes a procedure for automatic information extraction from text documents, specifically candidates’ CVs. The described algorithm is based on Natural Language Processing methods and transforms text information into categorical features, or classes. These features can then be used as inputs to a machine learning model that predicts a candidate’s suitability for a position. In addition to the general method, the paper describes experiments in which the algorithm was used to cluster prospective employees according to their previous positions and the job spheres they worked in. The resulting classes were then used to predict the probability of a candidate’s turnover in the first six months, and adding them improved the model score.