Laying foundations for effective machine learning in law enforcement. Majura - A labelling schema for child exploitation materials
Abstract: The health impacts of repeated exposure to distressing content such as child exploitation materials (CEM, also known as ‘child pornography’) have become a major concern for law enforcement agencies and associated entities. Existing methods for ‘flagging’ such materials largely rely on prior knowledge of previously identified items, whilst predictive methods remain unreliable, particularly when compared with equivalent tools used for detecting ‘lawful’ pornography. In this paper we detail the design and implementation of a deep-learning-based CEM classifier, leveraging existing pornography detection methods to overcome infrastructure and corpus limitations in this field. Specifically, we extend existing research through direct access to numerous contemporary, real-world, annotated cases taken from Australian Federal Police holdings, demonstrating the risk of overfitting caused by the proclivities of individual users. We quantify the performance of skin tone analysis in CEM cases and show it to be of limited use. We assess the performance of our classifier and show it to be sufficient for forensic triage and ‘early warning’ of CEM, but of limited efficacy for categorising material against existing scales of child abuse severity.
We identify limitations currently faced by researchers and practitioners in this field, whose restricted access to training material is compounded by inconsistent and unsuitable annotation schemas. We show that existing schemas, whilst adequate for their intended use, are unsuitable for training machine learning (ML) models, and we introduce a new, flexible, objective, and tested annotation schema specifically designed for cross-jurisdictional collaborative use.
This work, combined with a world-first ‘illicit data airlock’ project currently under construction, has the potential to bring a ‘ground truth’ dataset and processing facilities to researchers worldwide without compromising quality, safety, ethics, or legality.