# CQMU
This repo contains the source code of CQMU. 

## Folder structure
 - In CQMU, one can find the code for CQMU, CPU, and RT:
   - 1) cqmu_general generates the output for CQMU.
   - 2) cpu_general generates the output for CPU.
   - 3) rt_general generates the output for RT.
   - In helper_functions, one can find the utility functions used in CQMU, CPU,
     and RT, including data loading, model training, and evaluation.
 - In reproducing_others_third_party, there are two subfolders containing prior works that we adapted:
   - BADT_UNSIR contains the codes for BADT and UNSIR:
     - 3) badt_unsir_general generates the output for BADT and UNSIR.
     - In helper_functions, one can find the same utility functions used by CQMU and the other methods.
     - The rest of the files are the original method-specific utility functions of
       BADT and UNSIR, which we did not modify beyond the necessary adaptation. The original license file of BADT is also included, unchanged.
   - NABLA_AMN contains the codes for nabla tau, SSD, SCRUB, and AMN:
     - 4) nabla_amn_general generates the output for nabla tau, SSD, SCRUB, and AMN.
     - In helper_functions, one can find the same utility functions used by CQMU
       and the other methods.
     - The rest of the files are the original method-specific utility functions of
       nabla tau, SSD, SCRUB, and AMN, which we did not modify beyond the necessary adaptation.
       The original license file of nabla tau is also included, unchanged.
 - In models, the base pre-trained models are supposed to be stored. Due to the size limit of OpenReview,
   we could not include them here. However, the code is designed to train the models from scratch when it does not find them.
 - In results, one can find the output results of all the methods after running the codes.
 - In data, the datasets are supposed to be stored. Due to the size limit of OpenReview,
   we could not include them here. However, the code is designed to download some datasets (such as CIFAR-100) automatically when it does not find them.
 - 5) requirements.txt contains the required packages to run the codes. It can be used to create a
   conda environment by running
   `conda create --name <env_name> --file requirements.txt`
 - LICENSE does not contain the authors' names, as the work is under double-blind review, but it will
   be updated after the review process.
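
The fallback described for the models folder (use a stored model if present, otherwise train one) follows a common find-or-train pattern. The snippet below is only a minimal sketch of that pattern, not the repository's implementation: `load_or_train` and its `train_fn`, `save_fn`, and `load_fn` parameters are hypothetical names introduced for illustration.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_train(
    path: Path,
    train_fn: Callable[[], T],          # hypothetical: trains a model from scratch
    save_fn: Callable[[T, Path], None], # hypothetical: writes the model to disk
    load_fn: Callable[[Path], T],       # hypothetical: reads the model from disk
) -> T:
    """Return the stored model if it exists, otherwise train and store it."""
    if path.exists():
        return load_fn(path)
    model = train_fn()
    # Create the target folder (e.g. models/) if it is missing.
    path.parent.mkdir(parents=True, exist_ok=True)
    save_fn(model, path)
    return model
```

With this pattern, the first run pays the training cost and later runs reuse the stored file.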

## Running the code
 - All the scripts follow a similar command-line convention. For example, to run 1) CQMU, one can use
   `python cqmu_general.py --scenarios [[[100],[0.1],20]] --seeds [0] --epoch 20 --lr 0.01 --lam 0.1 --gamma 1.0 --mode label --dataset cifar100 --model_type resnet18 --text 0 --batch_size 256`
    - The arguments are as follows:
      - scenarios: a list of scenarios, where each scenario is a list of three elements:
        - the first element is the critical set size c,d (can include multiple values for testing) (e.g., [100,50,20] for CIFAR-100),
        - the second element is the coverage level alpha (e.g., [0.1] for coverage level 0.90),
        - the third element is the number of forgotten classes/clusters/random points (e.g., 20 or 1000).
      - seeds: a list of random seeds to run the experiments (e.g., [0, 1, 2]).
      - epoch: number of training epochs (e.g., 20).
      - lr: learning rate (e.g., 0.01).
      - lam: lambda parameter for CQMU (e.g., 0.1).
      - gamma: gamma parameter for CQMU (e.g., 1.0).
      - delta: delta parameter for CQMU (default is 0.5).
      - cp_calib: unused.
      - dataset: dataset name (e.g., cifar100, imagenet).
      - model_type: model architecture (e.g., resnet18 or berta_distill).
      - batch_size: batch size for training (e.g., 256).
      - mode: mode of operation, e.g. 'label' or 'cluster' (default is 'label').
      - text: whether to use text features (0 for no, 1 for yes; default is 0).

    - The other scripts can be run similarly, with slight differences in the arguments depending on the parameters used by each method. The arguments of each script can be listed by running `python <code_name> --help`.
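
To make the structure of the `--scenarios` argument concrete, the sketch below parses a value such as `[[[100],[0.1],20]]` into its three documented parts and converts alpha into a coverage level (alpha = 0.1 corresponds to coverage 0.90). It is an illustration only and assumes the string is a Python-style nested list. `Scenario`, `parse_scenarios`, and `coverage_from_alpha` are hypothetical names, and the scripts in this repo may parse the argument differently.

```python
import ast
from typing import List, NamedTuple


class Scenario(NamedTuple):
    set_sizes: List[int]   # first element: critical set size(s) to test
    alphas: List[float]    # second element: coverage level(s) alpha
    num_forget: int        # third element: number of forgotten classes/clusters/points


def parse_scenarios(text: str) -> List[Scenario]:
    """Parse a string like "[[[100],[0.1],20]]" into a list of Scenario tuples."""
    raw = ast.literal_eval(text)  # safe evaluation of Python literals only
    scenarios = []
    for item in raw:
        if len(item) != 3:
            raise ValueError(f"each scenario needs three elements, got {item!r}")
        sizes, alphas, num_forget = item
        scenarios.append(
            Scenario([int(s) for s in sizes], [float(a) for a in alphas], int(num_forget))
        )
    return scenarios


def coverage_from_alpha(alpha: float) -> float:
    """Target coverage implied by alpha, i.e. 1 - alpha."""
    return 1.0 - alpha


if __name__ == "__main__":
    for scenario in parse_scenarios("[[[100,50,20],[0.1],20]]"):
        for alpha in scenario.alphas:
            print(scenario.set_sizes, coverage_from_alpha(alpha), scenario.num_forget)
```

Some shells (zsh, for example) treat unquoted square brackets as glob patterns, so quoting the list arguments on the command line, e.g. `--scenarios "[[[100],[0.1],20]]"`, is a safe choice.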