The paper 'Pretraining Methods for Dialog Context Representation Learning' mentions a related paper that incorporates a useful auxiliary loss function for latent variable inference in dialog generation. Provide the full title of that work.