# 🧠 Learning Hidden Cascades via Classification

This project learns transmission dynamics in networks via classification. It supports both **synthetic** and **real-world** experiments. The goal is to infer two transmission probabilities, **p** (propagation probability) and **q** (symptom probability), based on hidden node statuses (e.g., "informed" or "uninformed") in graphs where only partial behavior (such as trades) is observable.


---

## 📁 Project Structure

```
Learning_transmission_probabilities/
├── src/
│   ├── Monte_carlo_experiments/
│   │   ├── main.py                 # Runs synthetic experiments (Tree, Loopy, Insiders without transactions)
│   │   └── ...                     # Supporting modules and simulation logic
│   └── Empirical_experiment/
│       ├── main.py                 # Runs real-data experiments (Insiders graph with transactions)
│       └── ...
├── data/
│   ├── announcements.csv                      # Company names and announcement dates
│   ├── transaction_data.csv                   # Investor trades, timestamps, and related metadata
│   ├── company_names.csv                      # List of all 28 companies
│   ├── insiders_network_links.txt             # Edges of the insiders trading graph (source → target)
│   ├── investor_company.csv                   # Mapping of investors to their respective companies
│   └── Company_baseline_probabilities/
│       ├── Company 1_baseline_trade_probabilities.csv
│       ├── <Company ..>_baseline_trade_probabilities.csv
│       └── ...                                 # Per-company investor baseline trade probabilities
├── Results/
│   ├── synthetic_results/                     # Output tables for synthetic experiments
│   ├── empirical_results/                     # Inferred p, q, accuracy for real-data experiments
│   └── ...
├── README.md
└── requirements.txt
```

---

## ⚙️ Requirements

Install dependencies:

```bash
pip install -r requirements.txt
```

**Main libraries:**

* `scikit-learn`
* `networkx`
* `matplotlib`
* `pandas`
* `fire`

---

## 🚀 Running the Code

### 🔬 Monte Carlo Experiments (Synthetic Graphs)

Performs classification over synthetic graphs (Tree, Loopy, and Insiders graph **without transactions**). In each run, the model infers:

* `p`: **Propagation probability** (how likely information spreads between nodes)
* `q`: **Symptom probability** (how likely informed nodes exhibit observable behavior)

Each experiment is repeated `n` times for robustness, and results are saved in `Results/synthetic_results`.
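
To make the roles of `p` and `q` concrete, here is a minimal sketch of one synthetic run, assuming an independent-cascade style process on a small tree. The function name `simulate_cascade`, the 15-node binary tree, and the parameter values are illustrative assumptions and are not taken from this repository's code. The sketch also assumes that uninformed nodes never show the symptom, which may differ from the actual simulation logic.

```python
import random


def simulate_cascade(adjacency, source, p, q, rng):
    """Spread information from `source`; return (informed, symptomatic) node sets."""
    informed = {source}
    frontier = [source]
    while frontier:
        node = frontier.pop()
        for neighbor in adjacency[node]:
            # A newly informed node gets one chance to inform each
            # uninformed neighbor, succeeding with probability p.
            if neighbor not in informed and rng.random() < p:
                informed.add(neighbor)
                frontier.append(neighbor)
    # Assumption: only informed nodes show the observable symptom
    # (e.g. a trade), each independently with probability q.
    symptomatic = {node for node in sorted(informed) if rng.random() < q}
    return informed, symptomatic


# Hypothetical test graph: a binary tree on 15 nodes, edges pointing to children.
tree = {i: [c for c in (2 * i + 1, 2 * i + 2) if c < 15] for i in range(15)}

rng = random.Random(0)  # fixed seed so repeated runs are reproducible
n = 1000                # number of repetitions
sizes = [len(simulate_cascade(tree, 0, 0.5, 0.3, rng)[0]) for _ in range(n)]
print(f"mean number of informed nodes over {n} runs: {sum(sizes) / n:.2f}")
```

Only the `symptomatic` set would be visible to an observer; the `informed` set is the hidden status that the classifier has to recover.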

**Command:**

```bash
python -m Learning_transmission_probs.src.Monte_carlo_experiments.main \
    --classifier_name="SVM" \
    --graph_type="Tree" \
    --feature_type="Limited"
```

**Arguments:**

| Argument          | Description            | Accepted Values                                                                             |
| ----------------- | ---------------------- | ------------------------------------------------------------------------------------------- |
| `classifier_name` | Classifier to use      | "Random Forest", "Decision Tree", "Naive Bayes", "KNN", "SVM", "SGD", "Logistic Regression" |
| `graph_type`      | Graph type             | "Tree", "Loopy", "Insiders\_graph"                                                          |
| `feature_type`    | Type of feature vector | "Extended", "Limited"                                                                       |

> ⚠️ Quote any value that contains spaces, e.g. `--classifier_name="Random Forest"`.
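
Since `fire` is listed among the main libraries, the flags above are presumably mapped onto the keyword arguments of a Python function. The sketch below shows how such a mapping can look. The names `run_experiment` and `cli`, the default values, and the validation step are hypothetical and are not taken from this repository's `main.py`.

```python
CLASSIFIERS = {
    "Random Forest", "Decision Tree", "Naive Bayes",
    "KNN", "SVM", "SGD", "Logistic Regression",
}
GRAPH_TYPES = {"Tree", "Loopy", "Insiders_graph"}
FEATURE_TYPES = {"Extended", "Limited"}


def run_experiment(classifier_name="SVM", graph_type="Tree", feature_type="Limited"):
    """Hypothetical entry point: validate the three arguments and echo them back."""
    checks = (
        ("classifier_name", classifier_name, CLASSIFIERS),
        ("graph_type", graph_type, GRAPH_TYPES),
        ("feature_type", feature_type, FEATURE_TYPES),
    )
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return {
        "classifier_name": classifier_name,
        "graph_type": graph_type,
        "feature_type": feature_type,
    }


def cli():
    # python-fire exposes each keyword argument of the function as a --flag.
    import fire  # third-party dependency, imported lazily

    fire.Fire(run_experiment)
```

Calling `cli()` under an `if __name__ == "__main__":` guard would then accept the same `--classifier_name`, `--graph_type`, and `--feature_type` flags as the command shown earlier.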

---

### 📈 Empirical Experiments (Real Insider Trading Network)

Runs classification on the **real insider trading network**, using investor-level trade activity and announcement timing. Computes:

* `p`: Information propagation probability
* `q`: Probability that an informed investor makes the right trade

Runs across **all 28 companies**, using investor-specific features. Output is saved to `Results/empirical_results/Infered_p_and_q_of_all_companies.csv`.

**Command:**

```bash
python -m Learning_transmission_probs.src.Empirical_experiment.main
```

This uses:

* Classifier: **SVM**
* Features: **Limited**
* Graph: Insiders network with transaction data

---

## 📊 Data Overview (`data/`)

| File / Folder                     | Description                                                                 |
| --------------------------------- | --------------------------------------------------------------------------- |
| `announcements.csv`               | Mapping of companies to their announcement dates                            |
| `transaction_data.csv`            | Timestamped trading data of investors                                       |
| `company_names.csv`               | List of 28 companies used in empirical analysis                             |
| `insiders_network_links.txt`      | Edges of the insiders network (investor → investor)                         |
| `investor_company.csv`            | Maps investors to their associated companies                                |
| `Company_baseline_probabilities/` | Per-company baseline trading probability files (e.g., `Amer sports_...csv`) |

Each `*_baseline_trade_probabilities.csv` file contains statistics on investors' trading behavior **outside the pre-announcement window**. These serve as the baseline against which abnormal behavior is assessed.

---

## 🧪 Output

* **Monte Carlo Results**:
  Inferred `p`, `q`, and accuracy across repetitions, saved to `Results/synthetic_results/Result_{graph_type}_graph_{classifier_name}_classifier_{feature_type}_features.csv`.

* **Empirical Results**:
  Final table of `p`, `q`, and accuracy per company saved to:
  `Results/empirical_results/Infered_p_and_q_of_all_companies.csv`
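
To locate the output of a particular synthetic run, the file name pattern above can be filled in programmatically. This is a small sketch, assuming the three argument values are substituted verbatim (including any spaces in the classifier name). The helper name `synthetic_result_path` is hypothetical.

```python
from pathlib import Path


def synthetic_result_path(graph_type, classifier_name, feature_type,
                          root="Results/synthetic_results"):
    """Build the expected CSV path for one synthetic experiment."""
    filename = (
        f"Result_{graph_type}_graph_{classifier_name}"
        f"_classifier_{feature_type}_features.csv"
    )
    return Path(root) / filename


path = synthetic_result_path("Tree", "SVM", "Limited")
print(path.name)  # Result_Tree_graph_SVM_classifier_Limited_features.csv
```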


---

## ⚙️ Configurations

Configurations are set in `configs.py` using Python `@dataclass` objects. You can change:

* Number of simulation iterations
* Trade label probabilities
* Graph parameters (size, edge count)
* Random seeds
* Feature type and classifier
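
As an illustration of the `@dataclass` pattern, a configuration object covering the options above might look like the following sketch. The class name, every field name, and every default value here are hypothetical; check `configs.py` for the actual definitions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical experiment settings; not the repository's actual fields."""
    n_iterations: int = 100               # number of simulation repetitions
    trade_label_probability: float = 0.5  # placeholder trade label probability
    n_nodes: int = 50                     # graph size
    n_edges: int = 49                     # edge count
    random_seed: int = 0
    feature_type: str = "Limited"
    classifier_name: str = "SVM"


default = ExperimentConfig()
# Derive a variant without mutating the original (the dataclass is frozen).
loopy = replace(default, n_edges=80, random_seed=1)
print(loopy)
```

Keeping the dataclass frozen and deriving variants with `dataclasses.replace` is one way to ensure that each experiment's settings stay fixed once a run starts.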

---

