Applying NLP Techniques to Classify Businesses by their International Standard Industrial Classification (ISIC) Code
Abstract: Machine learning has played an important role in text classification across domains and has substantially changed the state of the art. In this paper, we propose a novel application of NLP techniques: classifying businesses by their International Standard Industrial Classification (ISIC) code, based on the business names and on descriptions provided by the business owners themselves. Faced with irregularity and a small amount of noisy training data, we employ several different NLP models and data enhancement strategies. We identify DistilBERT, under normalized loss, as the best model for our task, with 77.9% average accuracy on a 56-label multiclass classification task.
External IDs: dblp:conf/bigdataconf/BecharaZYJ22