Section 1: Environment Installation

Run the following commands from within the neurips_supplementary directory to install the environment:
1. conda create -c conda-forge python=3.9.12 -n cps_neurips_env 
2. conda activate cps_neurips_env
3. conda install poetry 
4. pip3 -V 
5. pip3 install poetry 
6. export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring 
7. poetry install

In addition, for reproducibility, we provide the environment YAML file: see cps_neurips_env.yml in the neurips_supplementary directory.

Download links:

Datasets: https://drive.google.com/file/d/11Gz-thfqaqn5-O0bsjeYhUz-FOb1nckA/view?usp=sharing
Checkpoints: https://drive.google.com/file/d/1iFTCmQxFZVQd6SwxbsAGT1XgvNRg9gvG/view?usp=sharing

########################################################################################################################

Section 2: Training the models

The Air Quality, Stores, and Traffic datasets are all publicly available. We have preprocessed them, and the preprocessed data can be downloaded from the following link:

https://drive.google.com/file/d/11Gz-thfqaqn5-O0bsjeYhUz-FOb1nckA/view?usp=sharing

Extract the zip file and place the data in the following directory:
PATH_TO_SUPPLEMENTARY_MATERIAL/data/

The data directory should contain the following subdirectories:

--air_quality
--stores
--traffic
--waveforms 
--waveforms_truncated
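
As a quick sanity check, the layout above can be verified with a short shell snippet. This is only a sketch: the subdirectory names are taken from the list above, and it assumes you run it from PATH_TO_SUPPLEMENTARY_MATERIAL (so that the data directory is simply "data"):

```shell
# Check that the extracted data/ directory contains the expected subdirectories.
DATA_DIR="${DATA_DIR:-data}"
check_data_dirs() {
  for d in air_quality stores traffic waveforms waveforms_truncated; do
    if [ -d "$DATA_DIR/$d" ]; then
      echo "ok: $DATA_DIR/$d"
    else
      echo "missing: $DATA_DIR/$d"
    fi
  done
}
check_data_dirs
```

If any line reads "missing", re-check where the zip file was extracted.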

All the commands below are run within the neurips_supplementary directory.

All the processed datasets are stored in the following directory:
PATH_TO_SUPPLEMENTARY_MATERIAL/data/

TRAFFIC_DATASET_LOG_DIR_FULL_PATH = PATH_TO_SUPPLEMENTARY_MATERIAL/data/traffic
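
For convenience, the placeholders above can be set as shell variables before running the commands below. This is a sketch; defaulting PATH_TO_SUPPLEMENTARY_MATERIAL to the current directory is our assumption:

```shell
# Define the placeholder paths used throughout these instructions.
PATH_TO_SUPPLEMENTARY_MATERIAL="${PATH_TO_SUPPLEMENTARY_MATERIAL:-$PWD}"
TRAFFIC_DATASET_LOG_DIR_FULL_PATH="$PATH_TO_SUPPLEMENTARY_MATERIAL/data/traffic"
echo "$TRAFFIC_DATASET_LOG_DIR_FULL_PATH"
```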

Here are the commands to train the models, generate data, and evaluate the models. The training commands are specific to the traffic dataset.

Train Models:

Command to train conditional diffusion model:
poetry run python timeseries_synthesis/scripts/train_pl_model.py --config-name=config_timeweaver_diffusion.yaml base_path=BASE_DIRECTORY_FULL_PATH dataset_name=traffic traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH

Command to train unconditional diffusion model:
poetry run python timeseries_synthesis/scripts/train_pl_model.py --config-name=config_timeweaver_diffusion.yaml base_path=BASE_DIRECTORY_FULL_PATH dataset_name=traffic traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH csdi_timeseries_denoiser_v4_config.use_metadata=False

Command to train the conditional GAN model:
poetry run python timeseries_synthesis/scripts/train_pl_model.py --config-name=config_timeweaver_gan.yaml base_path=BASE_DIRECTORY_FULL_PATH dataset_name=traffic traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH wavegan_v1_config.use_metadata=True wavegan_v1_config.generator_config.blow_up_factor=10

Command to train the unconditional GAN model:
poetry run python timeseries_synthesis/scripts/train_pl_model.py --config-name=config_timeweaver_gan.yaml base_path=BASE_DIRECTORY_FULL_PATH dataset_name=traffic traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH wavegan_v1_config.use_metadata=False wavegan_v1_config.generator_config.blow_up_factor=30

Note: Please set the use_metadata and generator_config.blow_up_factor fields in the config file to the appropriate values according to the dataset.

Command to train the FTSD model:
poetry run python timeseries_synthesis/scripts/train_pl_model.py --config-name=config_cltsp_metric.yaml base_path=BASE_DIRECTORY_FULL_PATH dataset_name=traffic traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH training.use_timeseries_scl_loss=True training.use_condition_scl_loss=True training.use_clip_loss=False

Command to train the J-FTSD model:
poetry run python timeseries_synthesis/scripts/train_pl_model.py --config-name=config_cltsp_metric.yaml base_path=BASE_DIRECTORY_FULL_PATH dataset_name=traffic traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH

For a different dataset, update the dataset_name and its corresponding log_dir fields in the config file.
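The dataset substitution can be sketched as a dry run: the echo only prints the command instead of executing it, and the <dataset>_dataset.log_dir key naming for datasets other than traffic is an assumption based on the pattern above (it does match the stocks_dataset.log_dir key used in Section 10):

```shell
# Print (dry-run) the conditional diffusion training command for a given dataset.
build_train_cmd() {
  name="$1"
  log_dir="PATH_TO_SUPPLEMENTARY_MATERIAL/data/$name"
  echo poetry run python timeseries_synthesis/scripts/train_pl_model.py \
    --config-name=config_timeweaver_diffusion.yaml \
    base_path=BASE_DIRECTORY_FULL_PATH \
    "dataset_name=$name" \
    "${name}_dataset.log_dir=$log_dir"
}
build_train_cmd air_quality
```

Drop the echo (and substitute the real paths) to actually launch training.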

########################################################################################################################

Section 3: Storing the checkpoint files

Copy the checkpoint files to the right location. 
For example, for the diffusion model trained on the conditional variant of the traffic dataset, the checkpoint files should be copied to the following location:

PATH_TO_SUPPLEMENTARY_MATERIAL/logs/diffusion_model_logs/traffic_conditional/checkpoints/traffic_conditional.ckpt 

The same directory layout applies to the GAN model, the FTSD model, and the J-FTSD model.

We have already provided the checkpoint files for inference at the following link:

https://drive.google.com/file/d/1iFTCmQxFZVQd6SwxbsAGT1XgvNRg9gvG/view?usp=sharing

Extract the zip file and place the checkpoint files in the following location:

PATH_TO_SUPPLEMENTARY_MATERIAL/logs

The logs directory should contain the following subdirectories:

--diffusion_model_logs
--loss_difftime_logs
--cltsp_logs
--gan_logs
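
A sketch of the expected checkpoint path, following the diffusion example above. The assumption is that the other model logs follow the same <run_name>/checkpoints/<run_name>.ckpt pattern:

```shell
# Compose the expected checkpoint path for a given model log dir and run name.
LOGS_DIR="${LOGS_DIR:-logs}"
expected_ckpt() {
  # $1 = model log subdirectory, $2 = run name
  echo "$LOGS_DIR/$1/$2/checkpoints/$2.ckpt"
}
expected_ckpt diffusion_model_logs traffic_conditional
```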


########################################################################################################################

Section 4: Generating the synthetic data

The generation commands are specific to the conditional variant of the traffic dataset.

Generating data:

Main command: poetry run python timeseries_synthesis/scripts/generate_constrained_synthetic_dataset.py --config-name=traffic_conditional.yaml synthesizer_wrapper_checkpoint_path=DIFFUSION_MODEL_CHECKPOINT_FULL_PATH traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH discriminator_params.discriminator_wrapper_checkpoint_path=GAN_MODEL_CHECKPOINT_FULL_PATH

Append the following flags to the main command to obtain the baseline methods and our proposed method.
Note that normal data generation is required before COP-FT can run.

To run normal data generation:
Main command 

To run CPS: 
Main command + use_projection=True project_during_synthesis=True use_penalty_based_projection=True

To run COP:
Main command + use_projection=True project_after_synthesis=True use_real_seed=True

To run COP-FT: 
Main command + use_projection=True project_after_synthesis=True 

To run Diffusion-TS:
Main command + use_guidance=True

To run Guided-Difftime:
Main command + use_diffts=True

To run PDM:
Main command + use_projection=True project_during_synthesis=True use_strict_projection=True

To run PRODIGY:
Main command + use_projection=True project_during_synthesis=True use_strict_projection=True use_prodigy=True
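
When scripting these runs, the flag combinations above can be kept in a single lookup. The method keys below are our own shorthand; the flag strings are copied verbatim from this section:

```shell
# Map a method name to the extra flags appended to the main generation command.
method_flags() {
  case "$1" in
    normal)          echo "" ;;
    cps)             echo "use_projection=True project_during_synthesis=True use_penalty_based_projection=True" ;;
    cop)             echo "use_projection=True project_after_synthesis=True use_real_seed=True" ;;
    cop_ft)          echo "use_projection=True project_after_synthesis=True" ;;
    diffusion_ts)    echo "use_guidance=True" ;;
    guided_difftime) echo "use_diffts=True" ;;
    pdm)             echo "use_projection=True project_during_synthesis=True use_strict_projection=True" ;;
    prodigy)         echo "use_projection=True project_during_synthesis=True use_strict_projection=True use_prodigy=True" ;;
  esac
}
method_flags cps
```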

These executions create a directory with "train", "val", and "test" subdirectories. Each of these subdirectories contains a "timeseries" file with the generated timeseries. 
For example, if we run the CPS method for the conditional variant of the traffic dataset, the generated data will be stored in the following directory:

PATH_TO_SUPPLEMENTARY_MATERIAL/logs/diffusion_model_logs/traffic_conditional/11_constraints/projection_during_synthesis_with_penalty_based_projection_using_exp/ 

And within this directory, we have the "train", "val", and "test" subdirectories with the generated timeseries. 

For a different dataset, update the config file name and the corresponding log_dir fields in the config file. Additionally, update the checkpoints according to the new dataset.

######################################################################################################################## 

Section 5: Evaluating the models (Tables 1 and 2 in the manuscript)

The evaluation commands are specific to the conditional variant of the traffic dataset.

In all these commands, we refer to the directory with the generated synthetic data as SYNTHETIC_DATASET_DIR_FULL_PATH.

As an example, for the conditional variant of the traffic dataset, the SYNTHETIC_DATASET_DIR_FULL_PATH is:
PATH_TO_SUPPLEMENTARY_MATERIAL/logs/diffusion_model_logs/traffic_conditional/

Evaluation Commands:

To obtain the constraint violation metric:
Main command: poetry run python timeseries_synthesis/scripts/compute_constraint_violation_rate.py --config-name=traffic_conditional_eval.yaml similarity_checker_wrapper_checkpoint_path=CLTSP_MODEL_CHECKPOINT_FULL_PATH synthetic_dataset_dir=SYNTHETIC_DATASET_DIR_FULL_PATH traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH

To the main command, we can add the same flags as in the case of data generation.

To obtain the FTSD or the J-FTSD metric:
Main command: poetry run python timeseries_synthesis/scripts/generate_timeseries_and_condition_embeddings.py --config-name=traffic_conditional_eval.yaml similarity_checker_wrapper_checkpoint_path=CLTSP_MODEL_CHECKPOINT_FULL_PATH synthetic_dataset_dir=SYNTHETIC_DATASET_DIR_FULL_PATH traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH

To the main command, we can add the same flags as in the case of data generation.

To obtain the DTW metric:
Main command: poetry run python timeseries_synthesis/scripts/compute_dtw_metric.py --config-name=traffic_conditional_eval.yaml similarity_checker_wrapper_checkpoint_path=CLTSP_MODEL_CHECKPOINT_FULL_PATH synthetic_dataset_dir=SYNTHETIC_DATASET_DIR_FULL_PATH traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH

To the main command, we can add the same flags as in the case of data generation to obtain the results for all the methods.

For a different dataset, update the config file name and the corresponding log_dir fields in the config file. Additionally, update the checkpoints and the synthetic data directory according to the new dataset.

########################################################################################################################

Section 6: Obtaining the DTW and FTSD for varying number of constraints (Figure 5 in the manuscript)

To generate the synthetic data for the cases with different number of constraints (Figure 5 in the manuscript), use the "traffic1.yaml", ..., "traffic9.yaml" configs and execute the commands in Section 4.
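The sweep over the per-constraint configs can be scripted as a dry run (the echo only prints each command; drop it to execute). The command body is the CPS generation command from Section 4, and the assumption is that the configs run from traffic1.yaml through traffic9.yaml:

```shell
# Print (dry-run) the CPS generation command for each constraint-count config.
gen_sweep() {
  for i in 1 2 3 4 5 6 7 8 9; do
    echo poetry run python timeseries_synthesis/scripts/generate_constrained_synthetic_dataset.py \
      "--config-name=traffic${i}.yaml" \
      synthesizer_wrapper_checkpoint_path=DIFFUSION_MODEL_CHECKPOINT_FULL_PATH \
      traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH \
      discriminator_params.discriminator_wrapper_checkpoint_path=GAN_MODEL_CHECKPOINT_FULL_PATH \
      use_projection=True project_during_synthesis=True use_penalty_based_projection=True
  done
}
gen_sweep
```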

To evaluate the DTW, FTSD, and J-FTSD scores for the cases with different number of constraints (Figure 5 in the manuscript), use the "traffic_eval.yaml" config and update the "equality_constraints_to_extract" field in the "traffic_dataset" section of the config file. The evaluation command is then the same as in Section 5.

########################################################################################################################

Section 7: Obtaining the results for models trained with constraints (Table 3 in the manuscript)

For training the model with constraints, we use the following command:

poetry run python timeseries_synthesis/scripts/train_pl_model.py --config-name=config_timeweaver_constrained_diffusion.yaml base_path=BASE_DIRECTORY_FULL_PATH dataset_name=traffic traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH

Place the checkpoint file in the following location for the traffic dataset:

CONSTRAINED_DIFFUSION_MODEL_CHECKPOINT_FULL_PATH = PATH_TO_SUPPLEMENTARY_MATERIAL/logs/loss_difftime_logs/traffic/checkpoints/traffic.ckpt

Then, for the traffic dataset, we use the following command to generate the synthetic data:

poetry run python timeseries_synthesis/scripts/generate_constrained_synthetic_dataset.py --config-name=traffic_trained.yaml synthesizer_wrapper_checkpoint_path=CONSTRAINED_DIFFUSION_MODEL_CHECKPOINT_FULL_PATH traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH

For reference, the data generation command is the same as the one for "normal" data generation in Section 4.

To evaluate the FTSD scores for the traffic dataset, we use the following command:

poetry run python timeseries_synthesis/scripts/generate_timeseries_and_condition_embeddings.py --config-name=traffic_trained_eval.yaml similarity_checker_wrapper_checkpoint_path=CLTSP_MODEL_CHECKPOINT_FULL_PATH synthetic_dataset_dir=SYNTHETIC_DATASET_DIR_FULL_PATH traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH

Similarly, we can obtain the DTW and constraint violation scores for the traffic dataset using the compute_dtw_metric.py and compute_constraint_violation_rate.py scripts.

For a different dataset, update the config file name and the corresponding log_dir fields in the config file. Additionally, update the checkpoints and the synthetic data directory according to the new dataset.

########################################################################################################################

Section 8: Obtaining the inference latency for different constraints for CPS (Figure 7)

To obtain the inference latency for different constraints (Figure 7), we run the following command:

# for the case with 1 constraint
poetry run python timeseries_synthesis/scripts/compute_inference_time.py --config-name=traffic1.yaml use_projection=True project_during_synthesis=True use_penalty_based_projection=True synthesizer_wrapper_checkpoint_path=DIFFUSION_MODEL_CHECKPOINT_FULL_PATH traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH

# for the case with 3 constraints
poetry run python timeseries_synthesis/scripts/compute_inference_time.py --config-name=traffic3.yaml use_projection=True project_during_synthesis=True use_penalty_based_projection=True synthesizer_wrapper_checkpoint_path=DIFFUSION_MODEL_CHECKPOINT_FULL_PATH traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH

and so on.
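
The "and so on" above can be made concrete with a small loop (dry run via echo; we assume the configs run from traffic1.yaml through traffic9.yaml, as in Section 6):

```shell
# Print (dry-run) the inference-latency command for each constraint count.
latency_sweep() {
  for n in 1 2 3 4 5 6 7 8 9; do
    echo poetry run python timeseries_synthesis/scripts/compute_inference_time.py \
      "--config-name=traffic${n}.yaml" \
      use_projection=True project_during_synthesis=True use_penalty_based_projection=True \
      synthesizer_wrapper_checkpoint_path=DIFFUSION_MODEL_CHECKPOINT_FULL_PATH \
      traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH
  done
}
latency_sweep
```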

########################################################################################################################

Section 9: Obtaining the inference latency comparison for all methods (Table 6 in the manuscript)

For the case with all constraints, we use the following command (Table 6):

poetry run python timeseries_synthesis/scripts/compute_inference_time.py --config-name=traffic.yaml use_projection=True project_during_synthesis=True use_penalty_based_projection=True synthesizer_wrapper_checkpoint_path=DIFFUSION_MODEL_CHECKPOINT_FULL_PATH traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH

For the baselines, we use the following commands:

For COP-FT:
poetry run python timeseries_synthesis/scripts/compute_inference_time.py --config-name=traffic.yaml use_projection=True project_after_synthesis=True synthesizer_wrapper_checkpoint_path=DIFFUSION_MODEL_CHECKPOINT_FULL_PATH traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH discriminator_params.discriminator_wrapper_checkpoint_path=GAN_MODEL_CHECKPOINT_FULL_PATH 

For Guided-Difftime:
poetry run python timeseries_synthesis/scripts/compute_inference_time.py --config-name=traffic.yaml use_guidance=True synthesizer_wrapper_checkpoint_path=DIFFUSION_MODEL_CHECKPOINT_FULL_PATH traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH

########################################################################################################################

Section 10: Non-Convex Experiments (Table 4 in the manuscript)

To generate the synthetic data for the non-convex experiments, we use the following main command:

Main command: poetry run python timeseries_synthesis/scripts/generate_constrained_synthetic_test_set.py --config-name=stocks_non_convex.yaml discriminator_params.discriminator_wrapper_checkpoint_path=GAN_MODEL_CHECKPOINT_FULL_PATH synthesizer_wrapper_checkpoint_path=DIFFUSION_MODEL_CHECKPOINT_FULL_PATH stocks_dataset.log_dir=STOCKS_DATASET_LOG_DIR_FULL_PATH

For each approach, like CPS, COP-FT, etc., update the command with the appropriate set of flags, similar to the case of data generation in Section 4.

To evaluate the FTSD scores for the non-convex experiments, we use the following main command:

Main command: poetry run python timeseries_synthesis/scripts/generate_timeseries_and_condition_embeddings.py --config-name=stocks_non_convex_eval.yaml similarity_checker_wrapper_checkpoint_path=CLTSP_MODEL_CHECKPOINT_FULL_PATH synthetic_dataset_dir=SYNTHETIC_DATASET_DIR_FULL_PATH stocks_dataset.log_dir=STOCKS_DATASET_LOG_DIR_FULL_PATH

Similarly, for different approaches, update the command with the appropriate set of flags, similar to the case of data generation in Section 4. 

To obtain the DTW and constraint violation scores for the non-convex experiments, we use the compute_dtw_metric.py and compute_constraint_violation_rate.py scripts.

########################################################################################################################

Section 11: Choice of Penalty coefficients (Table 5 in the manuscript)

To run the experiments for the choice of penalty coefficients, we use the following command for the traffic dataset to generate the synthetic data:

For the linear choice:
poetry run python timeseries_synthesis/scripts/generate_constrained_synthetic_test_set.py --config-name=traffic.yaml discriminator_params.discriminator_wrapper_checkpoint_path=GAN_MODEL_CHECKPOINT_FULL_PATH synthesizer_wrapper_checkpoint_path=DIFFUSION_MODEL_CHECKPOINT_FULL_PATH traffic_dataset.log_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH use_projection=True project_during_synthesis=True use_penalty_based_projection=True gamma_choice=lin

Similarly, we can run the experiments for the quadratic choice.

The evaluation command is the same as in Section 5.

########################################################################################################################

Section 12: TSTR Experiments 

Now, we provide the commands to run the TSTR experiments for the conditional variant of the traffic dataset. 

For this, we use the Time-Series-Library (https://github.com/thuml/Time-Series-Library).

Please follow the instructions in the Time-Series-Library README to install the environment. 

In addition, for reproducibility, we provide the yml file for the environment - check cps_neurips_tstr.yml in the Time-Series-Library directory.

To run the TSTR experiments for the conditional variant of the traffic dataset, execute the scripts in the following directory:

PATH_TO_SUPPLEMENTARY_MATERIAL/Time-Series-Library/scripts/tstr_imputation_scripts/traffic_conditional_scripts/

In each script, update the following paths:

real_data_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH
synthetic_data_dir=SYNTHETIC_DATASET_DIR_FULL_PATH (This is the directory with the generated synthetic data and it varies for each method, refer to Section 4. This directory should contain the "train", "val", and "test" subdirectories with the generated timeseries.)

The scripts execute the TSTR imputation task for 3 seeds, and the MSE values are printed in the terminal.

For a different dataset, update the real_data_dir and synthetic_data_dir variables in the scripts according to the new dataset.

########################################################################################################################

Section 13: Discriminative Score Experiments

Now, we provide the commands to run the discriminative score experiments.

For this, we use the TimeGAN repository (https://github.com/jsyoon0823/TimeGAN).

Please follow the instructions in the TimeGAN README to install the environment.

In addition, for reproducibility, we provide the yml file for the environment - check cps_neurips_ds.yml in the TimeGAN directory.

For each approach, like CPS, COP-FT, etc., update the paths in the metrics_computation.py file.

More specifically, update the real_timeseries_dir and synthetic_timeseries_dir variables in the get_real_and_synthetic_paths function.

real_timeseries_dir=TRAFFIC_DATASET_LOG_DIR_FULL_PATH
synthetic_timeseries_dir=SYNTHETIC_DATASET_DIR_FULL_PATH (This is the directory with the generated synthetic data and it varies for each method, refer to Section 4. This directory should contain the "train", "val", and "test" subdirectories with the generated timeseries.)

Then, execute the following command from within the TimeGAN directory:

poetry run python metrics_computation.py traffic_conditional

This will print the discriminative score values, with results for 5 seeds, for the conditional variant of the traffic dataset for the particular approach.

For a different dataset, update the real_timeseries_dir and synthetic_timeseries_dir variables in the get_real_and_synthetic_paths function.
