# ICLR23 Submission "Some Practical Concerns and Solutions For Using Pretrained Representations In Industrial Systems"

#### We use this temporary repo to provide the experiment code for our manuscript.

## The scripts for replicating our main experiments are provided in:

```bash
python ML-1m_scripts.py
python IMDB_scripts.py
```

For the ML-1m experiments, the script covers embedding pretraining (using either user behavior history or contextual data), the downstream sequential recommendation task with GRU4Rec, and the downstream movie genre classification task.

The arguments are as follows:
```bash
  --log_dir: the logging directory for saving the results
  --dat_dir: the data directory, e.g. data/ml-1m/
  --emb_dim: the dimension of the embeddings. We use 32 in our experiments unless otherwise specified
  --rnn_dim: the hidden dimensions of the GRU4Rec model
  --phi_dim: the dimension of the featurized representation
  --ns: the number of negative samples for item2vec pretraining
  --ws: the window size for item2vec pretraining
  --epochs: the number of training epochs
  --save_emb: whether to save the pre-trained embeddings under each CL setting
  --GPU: specify the GPU usage.
```
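
To make the `--ws` and `--ns` arguments concrete, the following is a minimal sketch of how skip-gram training examples with negative sampling could be built from one user's item sequence. It is not the code in `ML-1m_scripts.py`: the function name `make_item2vec_examples`, the uniform negative sampler, and the toy sequence are all assumptions made for illustration.

```python
import random


def make_item2vec_examples(sequence, num_items, ws=2, ns=3, seed=0):
    """Build (target, context, label) triples from one item sequence.

    Hypothetical helper: ws is the window size on each side of the target,
    ns is the number of negatives drawn uniformly per positive pair.
    """
    rng = random.Random(seed)
    examples = []
    for i, target in enumerate(sequence):
        lo = max(0, i - ws)
        hi = min(len(sequence), i + ws + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            context = sequence[j]
            examples.append((target, context, 1))  # observed co-occurrence
            drawn = 0
            while drawn < ns:
                negative = rng.randrange(num_items)
                # Redraw if the sample equals the true context item.
                if negative == context:
                    continue
                examples.append((target, negative, 0))
                drawn += 1
    return examples


if __name__ == "__main__":
    history = [10, 42, 7, 99]  # toy item ids, not real ML-1m data
    triples = make_item2vec_examples(history, num_items=100, ws=1, ns=2)
    positives = [t for t in triples if t[2] == 1]
    print(len(positives), len(triples))
```

Each positive pair is followed by `ns` negatives, so a larger `--ns` increases the number of training examples linearly, while a larger `--ws` adds more positive pairs per target item.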

For the IMDB experiment, the script includes embedding pretraining, using the last hidden layer's output as the embedding.

The arguments are as follows:
```bash
  --log_dir: the logging directory for saving the results
  --emb_dim: the dimension of the embeddings. We use 32 in our experiments unless otherwise specified
  --epochs: the number of training epochs
  --lr: the learning rate for the pretraining model
  --save_emb: whether to save the pre-trained embeddings under each CL setting
  --GPU: specify the GPU usage.
```
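
For orientation, the IMDB argument list above could be wired up with `argparse` roughly as follows. This is a sketch, not the parser in `IMDB_scripts.py`: the flag names come from the list above, but the types and every default except `--emb_dim` (stated as 32) are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented IMDB arguments."""
    parser = argparse.ArgumentParser(description="IMDB pretraining (sketch)")
    parser.add_argument("--log_dir", type=str, default="logs/")  # assumed default
    parser.add_argument("--emb_dim", type=int, default=32)  # 32 per the text above
    parser.add_argument("--epochs", type=int, default=10)  # assumed default
    parser.add_argument("--lr", type=float, default=1e-3)  # assumed default
    parser.add_argument("--save_emb", action="store_true")  # assumed boolean switch
    parser.add_argument("--GPU", type=str, default="0")  # assumed device id string
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--emb_dim", "64", "--lr", "0.01", "--save_emb"])
    print(args.emb_dim, args.lr, args.save_emb)
```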


For the sequential recommendation tasks, we implement a two-tower GRU4Rec model; its source code is in **src/model.py**.

We also follow the standard data processing steps from the recommendation literature; the implementation is in **src/data.py**.
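
As one illustration of such preprocessing, a common convention in sequential recommendation is to order each user's interactions by time and hold out the last item for testing and the second-to-last for validation. The sketch below applies that convention to lines in the public ML-1m `ratings.dat` layout (`UserID::MovieID::Rating::Timestamp`). Whether **src/data.py** follows exactly this scheme is an assumption, and the function names and the minimum-length filter are hypothetical.

```python
from collections import defaultdict


def parse_ratings(lines):
    """Group (timestamp, movie_id) pairs by user from '::'-separated rating lines."""
    by_user = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, movie, _rating, ts = line.split("::")
        by_user[int(user)].append((int(ts), int(movie)))
    return by_user


def leave_one_out(by_user, min_len=3):
    """Split each user's time-ordered sequence into train / validation / test.

    Users with fewer than min_len interactions are dropped (an assumed filter).
    """
    splits = {}
    for user, events in by_user.items():
        if len(events) < min_len:
            continue
        items = [movie for _ts, movie in sorted(events)]
        splits[user] = {"train": items[:-2], "valid": items[-2], "test": items[-1]}
    return splits


if __name__ == "__main__":
    toy = [  # made-up rows in the ratings.dat layout
        "1::10::5::300",
        "1::20::4::100",
        "1::30::3::200",
        "1::40::5::400",
        "2::10::4::100",
    ]
    print(leave_one_out(parse_ratings(toy)))
```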





