In paper 'Attention-Guided Curriculum Learning for Weakly Supervised Classification and Localization of Thoracic Diseases on Chest Radiographs', an important baseline is an end-to-end CNN-RNN architecture for learning to embed visual images and text reports for image classification and report generation, which is from another paper that you've also read. Provide the title of that paper.