On Leakage in Some Popular Benchmarks on Graphs

Anonymous

16 Jan 2022 (modified: 05 May 2023) | ACL ARR 2022 January Blind Submission | Readers: Everyone

Abstract: Many benchmarks are based on graphs. Edges are typically partitioned at random into train, validation and test splits. Leakage has been discovered in several popular benchmarks: FB15k has been replaced by FB15k-237, and WN18 by WN18RR, though leakage has been reported even after these corrections. This paper reports a new type of leakage, $A$-leakage, on benchmarks for synonym-antonym classification. $A$-leakage infers the label for a pair of words in the test split, $(w_i, w_j)$, by exploiting labels on paths from $w_i$ to $w_j$ in the training split. We conclude that it is safer to partition vertices, $V$, than edges, $E$.
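
The idea of path-based leakage can be illustrated with a minimal sketch. The abstract does not spell out the exact procedure, so everything here is an assumption made for illustration: the encoding of synonym as +1 and antonym as -1, the rule that a path's label is the product of its edge labels, the toy word pairs, and the hypothetical helper names `infer_from_paths` and `split_by_vertices`.

```python
from collections import deque

SYN, ANT = 1, -1  # assumed encoding: synonym = +1, antonym = -1


def infer_from_paths(train_edges, wi, wj):
    """Guess the label of (wi, wj) as the product of edge labels along a
    shortest path in the training graph; return None if no path exists."""
    adj = {}
    for u, v, label in train_edges:
        adj.setdefault(u, []).append((v, label))
        adj.setdefault(v, []).append((u, label))
    sign = {wi: 1}          # product of labels on the path from wi
    queue = deque([wi])
    while queue:            # breadth-first search from wi
        u = queue.popleft()
        if u == wj:
            return sign[u]
        for v, label in adj.get(u, []):
            if v not in sign:
                sign[v] = sign[u] * label
                queue.append(v)
    return None


def split_by_vertices(edges, test_vertices):
    """Vertex partition: train edges have no endpoint in test_vertices,
    test edges have both endpoints in it; cross edges are dropped."""
    train = [e for e in edges
             if e[0] not in test_vertices and e[1] not in test_vertices]
    test = [e for e in edges
            if e[0] in test_vertices and e[1] in test_vertices]
    return train, test


# Toy data (invented for illustration only).
edges = [("hot", "warm", SYN), ("warm", "cold", ANT),
         ("hot", "cold", ANT), ("big", "large", SYN)]

# Edge partition: the test pair (hot, cold) is linked by a training path.
edge_train, edge_test = edges[:2] + edges[3:], [edges[2]]
leaked = infer_from_paths(edge_train, "hot", "cold")

# Vertex partition: no training edge touches a test vertex.
vert_train, vert_test = split_by_vertices(edges, {"hot", "cold"})
blocked = infer_from_paths(vert_train, "hot", "cold")
```

Under the edge partition, the training path hot, warm, cold is enough to recover the held-out label without any model. Under the vertex partition, no such path exists in this toy graph, which is consistent with the conclusion that partitioning $V$ is safer than partitioning $E$.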

Paper Type: short
