The paper 'Pretraining Methods for Dialog Context Representation Learning' mentions a related paper, which you have also read, that incorporates a useful auxiliary loss function for error
detection. Provide the full title of that work.