Detecting Complex Sensitive Information via Phrase Structure in Recursive Neural Networks

Jan Neerbek; Ira Assent; Peter Dolog

Published: 01 Jan 2018, Last Modified: 17 Dec 2024. Venue: PAKDD (3) 2018. License: CC BY-SA 4.0

Abstract: State-of-the-art sensitive information detection in unstructured data relies on how frequently keywords co-occur with sensitive seed words. In practice, however, this approach may fail to detect more complex patterns of sensitive information. In this work, we propose using recursive neural networks to learn phrase structures that separate sensitive from non-sensitive documents. Our evaluation on real data with human-labeled sensitive content shows that our new approach outperforms existing keyword-based strategies.
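The abstract does not specify the network's composition function, so the following is only a minimal sketch of a generic recursive neural network forward pass over a binary phrase-structure tree. It assumes the common formulation p = tanh(W[c1; c2] + b) for each parent node and a logistic output layer on the root vector. The class `RecursiveNN`, its parameters, the random untrained weights and the example sentence are hypothetical illustrations, not the authors' model.

```python
import math
import random
from typing import Dict, List, Tuple, Union

# A binary phrase-structure tree: a leaf is a word, an inner node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


class RecursiveNN:
    """Hypothetical, untrained forward pass of a generic recursive neural network."""

    def __init__(self, dim: int = 4, seed: int = 0) -> None:
        self.dim = dim
        self.rng = random.Random(seed)
        self.embeddings: Dict[str, List[float]] = {}
        # Composition matrix W (dim x 2*dim) and bias b, randomly initialised (assumption).
        self.W = [[self.rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
        self.b = [0.0] * dim
        # Logistic-regression weights u, c mapping the root vector to P(sensitive).
        self.u = [self.rng.uniform(-0.5, 0.5) for _ in range(dim)]
        self.c = 0.0

    def embed(self, word: str) -> List[float]:
        # Lazily assign each word a random vector (a stand-in for learned embeddings).
        if word not in self.embeddings:
            self.embeddings[word] = [self.rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
        return self.embeddings[word]

    def compose(self, left: List[float], right: List[float]) -> List[float]:
        # Parent vector p = tanh(W [left; right] + b).
        concat = left + right
        return [
            math.tanh(sum(w * x for w, x in zip(row, concat)) + bi)
            for row, bi in zip(self.W, self.b)
        ]

    def encode(self, tree: Tree) -> List[float]:
        # Bottom-up: leaves are embedded, inner nodes combine their two children.
        if isinstance(tree, str):
            return self.embed(tree)
        left, right = tree
        return self.compose(self.encode(left), self.encode(right))

    def prob_sensitive(self, tree: Tree) -> float:
        # Logistic output on the root vector: sigmoid(u . root + c).
        root = self.encode(tree)
        z = sum(ui * ri for ui, ri in zip(self.u, root)) + self.c
        return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    model = RecursiveNN(dim=4, seed=0)
    # Hypothetical parse of "the merger was kept confidential".
    tree = (("the", "merger"), ("was", ("kept", "confidential")))
    print(model.prob_sensitive(tree))  # some value in (0, 1); meaningless until trained
```

Because every inner node is built with the same W, the root vector depends on how the words are grouped in the tree and not only on which words occur, which is the property that sets this kind of model apart from a keyword co-occurrence count. The weights above are untrained, so the probability carries no meaning until W, b, u, c and the embeddings are fit on labeled documents.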

Loading