1. The simulation folder contains the code for the two-dimensional scatter plots in the main text: data generation, estimation of oracle b and oracle s, and plotting. oracle_generate.py performs estimation without errors; generate_with_error.py performs estimation with errors, where error_type and gamma_type can be set to reproduce the different experimental results in the main text.

2. The classification folder contains the experiments on MNIST and FashionMNIST. The code is primarily based on [Conditional DDPM](https://github.com/byrkbrk/conditional-ddpm). We have added the following: the checkpoints folder holds checkpoints of generative models trained on different training set sizes, the generated_images folder holds samples generated from these checkpoints, and classification.py contains the code for training the classification model.

3. The solo-learn folder contains the code for contrastive learning, which is mainly based on [solo-learn](https://github.com/vturrisi/solo-learn). We wrote generate_extra_data.py to generate noisy samples. The samples generated by the generative model are based on [adainf](https://github.com/PKU-ML/adainf). We trained the model on these additional data using the contrastive learning code from solo-learn.

4. The memory_quality folder contains the experiments from the appendix section "The Relation between Generation Quality and Training Set." This code is mainly based on the ImageNet training code from [SiT](https://github.com/willisma/SiT). We modified train.py to record the quality and memorability of the generated images at regular intervals during training. quality_memory_combing plots the line graph of image quality versus memorability. select_train_data.py selects a given number of images from the ImageNet dataset to form the training set.
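
The subset selection described for select_train_data.py in item 4 can be illustrated with a minimal sketch. This is not the repository's script: the function names `list_images` and `select_subset`, the class-per-subfolder directory layout, the accepted file suffixes, and uniform random sampling with a fixed seed are all assumptions made for illustration.

```python
import random
from pathlib import Path

# Assumed set of image file suffixes (hypothetical, not taken from the repository).
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def list_images(root):
    """Return all image files under `root`, sorted so results are reproducible.

    Assumes a layout such as root/<class_name>/<image file>.
    """
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def select_subset(paths, num_images, seed=0):
    """Draw `num_images` paths uniformly at random, without replacement."""
    if num_images > len(paths):
        raise ValueError(
            f"requested {num_images} images but only {len(paths)} are available"
        )
    rng = random.Random(seed)  # local generator, so the global RNG state is untouched
    return sorted(rng.sample(paths, num_images))
```

With a fixed seed, repeated calls such as `select_subset(list_images("path/to/imagenet/train"), 10000, seed=0)` return the same training set, which helps when comparing models trained on different training set sizes. The path and the subset size here are placeholders.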



