# Feature-Based Pairwise Prediction

This repository implements methods for **predicting task pairwise affinities** and analyzing **multi-task learning (MTL) gains** using dataset-driven features.  
It includes baseline methods (TAG and GRADTAE), MTL training pipelines, feature-based prediction models, and miscellaneous scripts for group selection and analysis.

---

## 🧩 Components

### **1. Baseline Implementation**
Contains **TAG** and **GRADTAE** implementations for three benchmark datasets:  
**Chemical**, **Landmine**, and **School**.  

- `*_GRADTAE_Training.py` – train GRADTAE models  
- `*_GRADTAE_Estimation.py` – estimate task affinities using trained models  
- `*_TAG.py` – TAG baseline implementation  
- `average_pairwise_affinities_*.py` – compute average pairwise affinities for baseline comparison  

---

### **2. MTL Training**
Scripts for **single-task (STL)** and **multi-task (MTL)** training:

- `prepare_datasets.py` – prepare dataset splits for training  
- `*_Model_Training.py` – train models on Chemical, Landmine, or School datasets  

---

### **3. Prediction with Task Similarity**
Feature-based prediction pipeline for estimating pairwise task gains:

- `prep_similarity_features.py` – prepare task-pair features  
- `FeatureBased_Predictive_Utility.py` – evaluate predictive utility of features  
- `final_model_for_pairwise_prediction.py` – train and evaluate feature-based predictive model  
- `pairwise_Affinities_GroundTruth.py` – compute ground-truth task affinities  
- `pairwise_prediction_performance.py` – evaluate prediction performance  
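
The last two steps of this pipeline compare predicted pairwise gains against ground truth. The following is a minimal sketch of such a comparison, not the code in `pairwise_prediction_performance.py`. It assumes ground-truth gain is the relative loss reduction of joint training over STL (a common convention, not confirmed by this repository) and scores predictions with mean absolute error and Pearson correlation. The names `relative_gain`, `pearson`, and `evaluate_predictions`, and all numbers, are made up for illustration.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple

Pair = Tuple[str, str]


def relative_gain(stl_loss: float, mtl_loss: float) -> float:
    """Assumed definition: fraction of the STL loss removed by joint training."""
    return (stl_loss - mtl_loss) / stl_loss


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def evaluate_predictions(pred: Dict[Pair, float], truth: Dict[Pair, float]) -> Dict[str, float]:
    """Compare predicted and ground-truth gains over the pairs present in `truth`."""
    pairs = sorted(truth)
    p = [pred[k] for k in pairs]
    t = [truth[k] for k in pairs]
    mae = sum(abs(a - b) for a, b in zip(p, t)) / len(pairs)
    return {"mae": mae, "pearson": pearson(p, t)}


# Toy ground truth built from illustrative STL/MTL losses.
truth = {
    ("A", "B"): relative_gain(1.0, 0.8),
    ("A", "C"): relative_gain(1.0, 1.1),
    ("B", "C"): relative_gain(2.0, 1.0),
}
# Toy predictions: every gain overestimated by 0.1.
pred = {pair: gain + 0.1 for pair, gain in truth.items()}
metrics = evaluate_predictions(pred, truth)
print(metrics)
```

A constant offset leaves the correlation perfect while the absolute error stays nonzero, which is why the sketch reports both metrics.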

---

### **4. Miscellaneous Scripts**
Other analyses and group selection utilities:

- `FeatLabel_correlation_analysis.py` – raw feature-label correlation analysis  
- `data_prep_for_groups.py` – prepares group-level data (binary vectors with ground-truth MTL gains)  
- `group_prediction_by_averaging.py` – baseline group prediction by averaging pairwise predictions  
- `Beam_Search_GroupSelection.py` – selects task groups using beam search  
- `SDP_Cluster_GroupSelection.py` – selects task groups via SDP clustering  
- `runtime_analysis.py` – run-time benchmarking  
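
To illustrate how averaging pairwise predictions and beam search can fit together, here is a minimal sketch. It is not the code in `group_prediction_by_averaging.py` or `Beam_Search_GroupSelection.py`. It assumes pairwise gains are symmetric and stored per unordered pair, and that a group is scored by the mean gain over its pairs. The names `average_pairwise_gain` and `beam_search_group` and the toy gains are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Sequence

Pair = FrozenSet[str]


def average_pairwise_gain(group: Iterable[str], pairwise: Dict[Pair, float]) -> float:
    """Score a group as the mean predicted gain over all unordered task pairs in it."""
    pairs = [frozenset(p) for p in combinations(sorted(set(group)), 2)]
    if not pairs:
        raise ValueError("a group needs at least two distinct tasks")
    return sum(pairwise[p] for p in pairs) / len(pairs)


def beam_search_group(tasks: Sequence[str], pairwise: Dict[Pair, float],
                      group_size: int, beam_width: int = 3) -> FrozenSet[str]:
    """Grow groups one task at a time, keeping the `beam_width` best partial groups."""
    unique = sorted(set(tasks))
    if not 2 <= group_size <= len(unique):
        raise ValueError("group_size must be between 2 and the number of tasks")
    if beam_width < 1:
        raise ValueError("beam_width must be at least 1")

    def rank(groups: Iterable[FrozenSet[str]]) -> List[FrozenSet[str]]:
        # Highest score first; ties are broken by task names for reproducibility.
        ordered = sorted(
            groups,
            key=lambda g: (-average_pairwise_gain(g, pairwise), tuple(sorted(g))),
        )
        return ordered[:beam_width]

    # Start from all pairs, then extend each surviving group by one unused task.
    beam = rank({frozenset(p) for p in combinations(unique, 2)})
    for _ in range(group_size - 2):
        beam = rank({g | {t} for g in beam for t in unique if t not in g})
    return beam[0]


# Toy predicted gains for four tasks (illustrative values only).
gains = {
    frozenset(pair): value
    for pair, value in {
        ("A", "B"): 0.4, ("A", "C"): 0.2, ("A", "D"): -0.1,
        ("B", "C"): 0.3, ("B", "D"): 0.0, ("C", "D"): -0.2,
    }.items()
}
best = beam_search_group(["A", "B", "C", "D"], gains, group_size=3, beam_width=2)
print(sorted(best), average_pairwise_gain(best, gains))
```

Beam search only explores extensions of the top-scoring partial groups, so it can miss the best group when `beam_width` is small. Widening the beam trades run time for coverage.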

---

## Getting Started

### **Prerequisites**
- Python 3.9+  
- Recommended: create a virtual environment

### **Setup**
- Before running a script, set the **data paths** it uses for reading datasets and storing results.  
- In particular, make sure the `datapath` variable points to a valid location in every script you run.
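
One way to avoid editing the path by hand in every script is to resolve it in a single place. The sketch below is a hypothetical pattern, not something the scripts currently do. The `MTL_DATA_ROOT` environment variable, the `resolve_paths` helper, and the `results` subdirectory are all assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Tuple


def resolve_paths(default: str = "./data") -> Tuple[Path, Path]:
    """Return (datapath, results_dir), creating the results directory if needed."""
    # MTL_DATA_ROOT is a hypothetical environment variable, not one the scripts read.
    datapath = Path(os.environ.get("MTL_DATA_ROOT", default)).expanduser().resolve()
    if not datapath.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {datapath}")
    # Assumed layout: results are written to a subdirectory of the data root.
    results_dir = datapath / "results"
    results_dir.mkdir(exist_ok=True)
    return datapath, results_dir
```

In a script, a hard-coded assignment could then be swapped for `datapath, results_dir = resolve_paths()`, which fails early with a clear message if the dataset directory is missing.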

### **Install Dependencies**
```bash
pip install -r requirements.txt
```

## Example Usage

### **Train STL and MTL Models**
```bash
python mtl_training/Landmine_Model_Training.py
```

### **Prediction with Task Similarity**
```bash
python prediction_with_task_similarity/prep_similarity_features.py
python prediction_with_task_similarity/final_model_for_pairwise_prediction.py
```

### **Train & Estimate Baselines**

- GRADTAE
```bash
python baseline_implementation/Chemical_GRADTAE_Training.py
python baseline_implementation/Chemical_GRADTAE_Estimation.py
python baseline_implementation/average_pairwise_affinities_GRADTAE.py
```
- TAG
```bash
python baseline_implementation/School_TAG.py
python baseline_implementation/average_pairwise_affinities_TAG.py
```

