# Code for Personal and Relational Event Sequence Modeling
Code to create five datasets containing both personal and relational event types, and to benchmark a variety of approaches on them.

# Environment Set Up

The code is designed to run within a Kubeflow Jupyter Notebook. The base image should include:
* PyTorch 2.5.1
* CUDA 12.4
* CUDA compiler toolchain (including gcc)

On top of this base image, simply run `setup.sh` to create a conda environment named `tgs`. A full dump of all installations in the virtual environment used for benchmarking is provided in `pip_freeze.txt`. All code should be run within the `tgs` conda environment.
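
Before running anything, it can help to confirm that the active environment matches the base image described above. The following is a minimal sketch, not part of the repo: the function name `check_environment` and the report keys are hypothetical, and it only relies on the standard `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()` attributes.

```python
import importlib.util

# Versions listed in this README for the base image.
EXPECTED_TORCH = "2.5.1"
EXPECTED_CUDA = "12.4"


def check_environment():
    """Return a small report comparing the installed PyTorch/CUDA to the README.

    Hypothetical helper for illustration; it is not shipped with the repo.
    """
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch  # imported lazily so the check still works without PyTorch

    # torch.__version__ may carry a local suffix such as "+cu124"; drop it.
    report["torch_version"] = torch.__version__
    report["torch_matches"] = torch.__version__.split("+")[0] == EXPECTED_TORCH
    # torch.version.cuda is None for CPU-only builds.
    report["cuda_version"] = torch.version.cuda
    report["cuda_matches"] = torch.version.cuda == EXPECTED_CUDA
    report["gpu_visible"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run this inside the `tgs` conda environment; any `False` entry suggests the environment differs from the one used for benchmarking, in which case `pip_freeze.txt` is the reference to compare against.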

# Repo Structure
The repo consists of two main parts: Creating Datasets, and Running Benchmarks.

# Creating Datasets
You can skip dataset creation by downloading the processed datasets and tasks from (Removed for double-blind review). To recreate the datasets:
1. Download the corresponding raw dataset
2. Run `process_*.ipynb` to create a CSV containing **all** events
   * NOTE: the GitHub dataset requires running `github_extract.py` first
3. Run `task_*.py` to create the CSVs needed for training, validating, and testing each task on that dataset

The task directories created from this process (or available from the download link) will be used as input to the benchmark running scripts.
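
To see which processing notebooks and task scripts are available before starting, the file patterns named in the steps above can be globbed. This is a rough sketch under assumptions: the helper `list_pipeline_files` is hypothetical, and because this README does not state where the files live, it takes the directory as an argument and searches it recursively.

```python
from pathlib import Path


def list_pipeline_files(repo_dir):
    """List files matching the README's `process_*.ipynb` and `task_*.py` patterns.

    Hypothetical helper for illustration; the search root is an assumption.
    """
    root = Path(repo_dir)
    return {
        # Step 2: notebooks that build the CSV of all events.
        "process_notebooks": sorted(p.name for p in root.rglob("process_*.ipynb")),
        # Step 3: scripts that build the train/validation/test CSVs per task.
        "task_scripts": sorted(p.name for p in root.rglob("task_*.py")),
    }


if __name__ == "__main__":
    for kind, names in list_pipeline_files(".").items():
        print(kind)
        for name in names:
            print(f"  {name}")
```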

# Running Benchmarks
Within the `run_benchmarks` directory, the following subdirectories are present:
* `relational_social`: Used for relational (friend recommendation) tasks on BrightKite and Gowalla datasets
* `personal_social`: Used for personal (location checkin) tasks on BrightKite and Gowalla datasets
* `relational_amazon`: Used for relational (coview prediction) tasks on amazon-clothing and amazon-electronics datasets
* `personal_amazon`: Used for personal (next product review) tasks on amazon-clothing and amazon-electronics datasets
* `relational_github`: Used for relational (collaborator recommendation) tasks on the GitHub dataset

The commands used for the paper's experiments are included within each directory. Depending on the dataset you are running, you may need to change specific hyperparameters in the sample commands. For example, within `relational_social`, Gowalla trains for 20 epochs while BrightKite trains for 100.
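
One way to avoid running a sample command with the wrong setting is to keep dataset-specific values in a single lookup. The sketch below records only the two epoch counts stated in this README for `relational_social`; the names `RELATIONAL_SOCIAL_EPOCHS` and `epochs_for` are hypothetical and do not correspond to anything in the benchmark scripts.

```python
# Epoch counts stated in this README for the `relational_social` experiments.
RELATIONAL_SOCIAL_EPOCHS = {
    "gowalla": 20,
    "brightkite": 100,
}


def epochs_for(dataset):
    """Return the documented epoch count for a dataset (case-insensitive).

    Hypothetical helper for illustration; other datasets and hyperparameters
    should be taken from the sample commands in each directory.
    """
    key = dataset.strip().lower()
    if key not in RELATIONAL_SOCIAL_EPOCHS:
        raise KeyError(f"No documented epoch count for dataset: {dataset!r}")
    return RELATIONAL_SOCIAL_EPOCHS[key]


if __name__ == "__main__":
    for name in ("Gowalla", "BrightKite"):
        print(name, epochs_for(name))
```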

# License
This work is published under the MIT license, as found in the `LICENSE.md` file.