Why does Negative Sampling not Work Well? Analysis of Convexity in Negative Sampling

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission · Readers: Everyone
Keywords: Knowledge Graph Embedding, KGE, Negative Sampling, Convexity
Abstract: The negative sampling (NS) loss function is widely used across tasks because its noise distribution can be chosen to suit the properties of the target task. In particular, since the NS loss has no normalization term, it is computationally efficient for classification problems with a very large label space, such as knowledge graph embedding (KGE). On the other hand, properties of the NS loss that matter for learning, such as the relationship between the noise distribution and the number of negative samples, have not been investigated theoretically. By analyzing the gradient of the NS loss, we show that it is non-convex but has a partially convex domain, and we derive the conditions on the noise distribution and the number of negative samples required for efficient learning under this property. We find that when these conditions are satisfied and the loss is combined with a scoring method that handles only non-negative values, the NS loss behaves as a convex loss function, which enables efficient learning. Experimental results on FB15k-237, WN18RR, and YAGO3-10 show that an NS loss satisfying the proposed conditions improves the performance of KG completion with TransE and RotatE, whose scoring functions take only non-negative values.
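For reference, a standard formulation of the NS loss analyzed in this line of work (following Mikolov et al., 2013) is sketched below; the notation is a generic placeholder rather than the paper's own, assuming a scoring function s_\theta, \nu negative samples per positive example, and a noise distribution p_n:

\ell_{\mathrm{NS}}(\theta) = -\frac{1}{|D|} \sum_{(x,y) \in D} \Big[ \log \sigma\big(s_\theta(x,y)\big) + \sum_{i=1}^{\nu} \mathbb{E}_{y_i \sim p_n} \big[ \log \sigma\big(-s_\theta(x,y_i)\big) \big] \Big],

where D is the set of observed (positive) pairs and \sigma is the sigmoid function. The paper's analysis concerns the gradient and convexity of this objective as a function of the score.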
One-sentence Summary: We investigate the non-convexity of the negative sampling loss function and its inflection point to identify the conditions under which training improves model performance.