Investigating Robustness and Interpretability of Link Prediction via Adversarial Modifications
Keywords: Adversarial Attack, Knowledge Base Completion
Abstract: Representing entities and relations in an embedding space is a well-studied approach for machine learning on relational data. Existing approaches, however, primarily focus on improving ranking metrics and ignore other aspects of knowledge base representations, such as robustness, interpretability, and ability to detect errors. In this paper, we propose adversarial attacks on link prediction models (AALP): identifying the fact to add into or remove from the knowledge graph that changes the prediction of a target fact. Using these attacks, we are able to identify the most influential related fact for a predicted link and investigate the sensitivity of the model to additional made-up facts. We introduce an efficient approach to estimate the effect of making a change by approximating the change in the embeddings upon altering the knowledge graph. In order to avoid the combinatorial search over all possible facts, we introduce an inverter function and gradient-based search to identify the adversary in a continuous space. We demonstrate that our models effectively attack the link prediction models by reducing their accuracy between 6-45% for different metrics. Further, we study patterns in the most influential neighboring facts, as identified by the adversarial attacks. Finally, we use the proposed approach to detect incorrect facts in the knowledge base, achieving up to 55% accuracy in identifying errors.