SWEET - Weakly Supervised Person Name Extraction for Fighting Human Trafficking

Javin Liu; Hao Yu; Vidya Sujaya; Pratheeksha Nair; Kellin Pelrine; Reihaneh Rabbany

Published: 07 Oct 2023, Last Modified: 01 Dec 2023

Venue: EMNLP 2023 Findings

Submission Type: Regular Long Paper

Submission Track: Information Extraction

Submission Track 2: NLP Applications

Keywords: Information Extraction, Large Language Models, Generation, NLP Applications, Resources and Evaluation

Abstract: In this work, we propose SWEET (Supervise Weakly for Entity Extraction to fight Trafficking), a weak supervision pipeline for extracting person names from noisy escort advertisements. Our method combines the simplicity of rule matching (through antirules, i.e., negated rules) with the generalizability of large language models fine-tuned on benchmark, domain-specific, and synthetic datasets, treating the outputs of both as weak labels. One of the major challenges in this domain is the scarcity of labeled data. SWEET addresses this by obtaining multiple weak labels through labeling functions and aggregating them effectively. SWEET outperforms the previous supervised SOTA method for this task by 9% F1 score on domain data and generalizes better to common benchmark datasets. We also release HTGEN, a synthetically generated dataset of escort advertisements (built using ChatGPT), to facilitate further research within the community.
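The idea of combining antirules and model predictions as weak labels can be illustrated with a minimal Python sketch. This is not the authors' implementation: the word list, the function names, the stub standing in for a fine-tuned language model, and the use of a simple majority vote as the aggregator are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import Counter

# Label values for token-level weak labels (illustrative convention).
NAME, NOT_NAME, ABSTAIN = 1, 0, -1

# Hypothetical list of words that are unlikely to be person names.
COMMON_WORDS = {"hi", "i", "am", "call", "me", "now", "sweet", "new", "in", "town"}


def antirule_common_word(token):
    """Antirule: vote NOT_NAME for known non-name words, otherwise abstain."""
    return NOT_NAME if token.lower() in COMMON_WORDS else ABSTAIN


def rule_capitalized(token):
    """Weak positive rule: a capitalized alphabetic token might be a name."""
    return NAME if token.isalpha() and token[0].isupper() else ABSTAIN


def stub_model(token):
    """Stand-in for a fine-tuned language model's token prediction (assumption)."""
    return NAME if token.lower() in {"amber", "lily"} else NOT_NAME


LABELING_FUNCTIONS = [antirule_common_word, rule_capitalized, stub_model]


def aggregate(votes):
    """Majority vote over non-abstaining labels; ties and no votes give NOT_NAME."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    return NAME if counts[NAME] > counts[NOT_NAME] else NOT_NAME


def extract_names(text):
    """Return whitespace-separated tokens whose aggregated weak label is NAME."""
    return [
        token
        for token in text.split()
        if aggregate([lf(token) for lf in LABELING_FUNCTIONS]) == NAME
    ]


print(extract_names("Hi I am Amber call me now"))
```

In this sketch the antirule never asserts that a token is a name; it only rules tokens out, which is what lets it offset the over-eager capitalization rule on words such as "Hi" or "Sweet".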

Submission Number: 5089
