Abstract: Highlights
• Mutual information (MI) estimation and triplet loss are employed for the first time to learn the semantic relationships over the entire dataset in PLL.
• MI estimation makes the learned representation contain more distinguishing information. Triplet loss makes the distance between graph representations with different label classes as large as possible.
• Instead of random sampling, positive and negative instances are intentionally sampled based on the graph, so that MI is estimated with higher accuracy.
• MI estimation is carried out for both the local representation and the global representation, so that the learned representation contains more distinguishing information for disambiguation.
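
The highlights do not state the exact form of the triplet loss, so the following is only a minimal sketch under assumptions: it uses the common margin-based formulation with Euclidean distance, and the names `triplet_loss`, `euclidean`, and `margin` are hypothetical and not taken from the paper. It illustrates the stated goal: the loss reaches zero only when the representation of a different label class lies farther from the anchor than the same-class representation by at least the margin.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss (assumed form, not the paper's exact loss).

    anchor, positive: representations assumed to share a label class.
    negative: representation assumed to belong to a different class.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    # Penalize the triplet unless the negative is at least `margin`
    # farther from the anchor than the positive is.
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D representations (illustrative values only).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # distance 1.0 from the anchor
far_negative = [3.0, 4.0]    # distance 5.0 from the anchor
near_negative = [0.0, 0.5]   # distance 0.5 from the anchor

loss_far = triplet_loss(anchor, positive, far_negative)
loss_near = triplet_loss(anchor, positive, near_negative)
print(loss_far, loss_near)
```

With these toy values, the well-separated negative incurs no loss, while the negative that sits closer to the anchor than the positive is penalized, which is the pressure that pushes representations of different label classes apart.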