Weak Supervision Text Classification using Cosine Similarity and SVM for Hardware Constrained Systems

Anonymous

16 Feb 2024
ACL ARR 2024 February Blind Submission
Readers: Everyone
Abstract: Weakly supervised text classification is the task of classifying large volumes of diverse, unstructured text data while requiring only a small amount of manual guidance. With open-source pre-trained language models becoming widely available over the last few years, weakly supervised text classification has received renewed interest due to the potential for transfer learning. Recent weak supervision methods built on pre-trained language models have performed well on the popular WRENCH benchmark datasets (Zhang et al., 2021), demonstrating the capability of transfer learning. However, these methods rely on pre-trained language models that are computationally expensive at inference time and infeasible to fine-tune without specialized accelerated hardware. Methods that do not require fine-tuning often require repeated inference or large amounts of storage to achieve their results. In this paper, an alternative solution is proposed that uses a single inference step, has minimal storage and memory requirements, does not require accelerated hardware, and provides results competitive with much more hardware-intensive methods.
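To make the pipeline implied by the title and abstract concrete, the following is a minimal illustrative sketch, not the paper's actual implementation: documents are embedded with a pre-trained encoder in a single inference pass, pseudo-labels are assigned by cosine similarity to class seed phrases, and a linear SVM is trained on those pseudo-labels. The encoder choice (all-MiniLM-L6-v2), the seed phrases, and the 0.3 confidence threshold are assumptions made for illustration only.

```python
# Minimal sketch (assumptions noted above): pseudo-label by cosine similarity
# to class seed embeddings, then train a linear SVM on the pseudo-labels.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.preprocessing import normalize
from sklearn.svm import LinearSVC

def weakly_supervised_classifier(documents, class_seed_texts):
    """documents: list[str]; class_seed_texts: one seed phrase per class."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice

    # Single inference pass over the corpus and the class seeds.
    doc_emb = normalize(encoder.encode(documents))           # (n_docs, dim), unit norm
    seed_emb = normalize(encoder.encode(class_seed_texts))   # (n_classes, dim)

    # On unit-normalized vectors, cosine similarity is a dot product.
    sims = doc_emb @ seed_emb.T                               # (n_docs, n_classes)
    pseudo_labels = sims.argmax(axis=1)

    # Keep only confident pseudo-labels (threshold is an assumed value).
    confident = sims.max(axis=1) >= 0.3
    clf = LinearSVC()
    clf.fit(doc_emb[confident], pseudo_labels[confident])
    return clf, doc_emb

# Usage sketch:
# clf, doc_emb = weakly_supervised_classifier(corpus, ["sports", "politics", "technology"])
# predictions = clf.predict(doc_emb)
```

This kind of pipeline runs the encoder exactly once per document and stores only the resulting embeddings and a linear model, which is consistent with the single-inference, low-storage, CPU-only setting the abstract describes.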
Paper Type: long
Research Area: Semantics: Sentence-level Semantics, Textual Inference and Other areas
Contribution Types: Approaches to low-resource settings, Approaches to low-compute settings (efficiency)
Languages Studied: English
Preprint Status: There is no non-anonymous preprint and we do not intend to release one.
A1: yes
A1 Elaboration For Yes Or No: Section 9
A2: yes
A2 Elaboration For Yes Or No: Section 8
A3: yes
A3 Elaboration For Yes Or No: Sections 4 and 5
B: yes
B1: yes
B1 Elaboration For Yes Or No: Section 4.3
B2: no
B2 Elaboration For Yes Or No: These are common datasets used in many papers across the NLP domain. I have referenced the original source papers for the datasets in case anyone would like to review the original intent behind each dataset.
B3: no
B3 Elaboration For Yes Or No: I describe my intended use of the datasets in Section 4.2 but do not elaborate on their original intended use. I have referenced the original source papers for the datasets in case anyone would like to review the original intent behind each dataset.
B4: no
B4 Elaboration For Yes Or No: These are commonly used datasets from publicly available sources and therefore should not raise concerns about privacy or offensive content. This paper's intent is to classify data, not to store, display, or focus on any particular individual's text data from the datasets.
B5: yes
B5 Elaboration For Yes Or No: Section 4, Table 1 provides an overview of the text classification domains. The paper references all of the source papers that the datasets came from, so more in-depth information can be found in the creators' papers.
B6: yes
B6 Elaboration For Yes Or No: Section 4, Table 1.
C: yes
C1: yes
C1 Elaboration For Yes Or No: Section 4, Table 1 and Table 2
C2: yes
C2 Elaboration For Yes Or No: Section 4.2
C3: yes
C3 Elaboration For Yes Or No: Section 4.3
C4: yes
C4 Elaboration For Yes Or No: Appendix B
D: no
D1: n/a
D2: n/a
D3: n/a
D4: n/a
D5: n/a
E: no
E1: n/a