In the paper 'Pretraining Methods for Dialog Context Representation Learning', the related work section notes that incorporating a useful auxiliary loss function to complement the primary objective has been shown to improve the performance of deep neural network models. One of the examples given is cross-lingual speech tagging, a setting that comes from another paper you have read. Provide the full name of that paper.