Abstract: Our society is inundated with massive amounts of unstructured text data, making it difficult for people to retrieve the data they need, digest critical information, and derive actionable knowledge. This motivates the development of text classification, a fundamental task in structuring unstructured web data. Existing methods either require heavy human annotation or work only within a limited scope (e.g., classification into a small number of classes), falling well short of real-world needs. Recently developed deep learning methods and pre-trained language models (PLMs) have boosted our research substantially, but many problems remain. We therefore propose a minimally-supervised approach to structuring massive text into a multi-granularity text space. We explore the following four subtasks: (1) weak supervision enrichment, (2) PLM-enhanced weakly-supervised text classification, (3) empowering fine-grained text classification with an enriched taxonomy, and (4) joint classification of multi-granular text units.