Triage of Messages and Conversations in a Large-Scale Child Victimization Corpus

Published: 01 Jan 2024, Last Modified: 15 Jun 2024, WWW 2024, CC BY-SA 4.0
Abstract: Children are among the most vulnerable online populations. Reports of child sexual exploitation on social media and apps have grown annually at an alarming rate and are overwhelming investigators. Even a single case can require examining millions of messages involving hundreds of victims. Triage and prioritization based on victims' experiences is an unfortunate necessity. Using a chat dataset of more than 3 million messages between victims and perpetrators, we evaluate and contribute tools for analyzing the experiences of victims of sexual exploitation. We develop both supervised and unsupervised methods to classify messages into categories of interest to law enforcement, such as age requests, persuasion, and sexual messages. We also introduce a conversation clustering technique to illuminate differences among victims' experiences based on their chat history. Through a qualitative analysis, we demonstrate that the learned clusters are coherent and represent distinct conversation patterns. For example, we can distinguish groups of users who never comply with sexual requests, comply after a few conversations, or comply immediately after being targeted. We expect this approach and associated visualizations will aid law enforcement, industry moderators, and sociologists who need to analyze massive corpora in this domain. Finally, we validate prior models derived from conversations involving adults pretending to be minors and provide statistics that could help undercover adults more accurately portray minor victims.