# Multi-view Adaptively Partitioned Embedding (MAPE)  
*A unified framework that restores manifold connectedness and captures global dependencies for Graph Anomaly Detection.*


[![Python](https://img.shields.io/badge/Python-3.9%2B-blue.svg)](#)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.x-lightgrey)](#)

---

## Table of Contents
- [Highlights](#highlights)
- [Installation](#installation)
- [Dataset Preparation](#dataset-preparation)
- [Training&Evaluation](#trainingevaluation)



---

## Highlights
- **Fragmented-manifold problem solved.** MAPE discretises tabular feature space via multiple *adaptive partition operators*, assigning learnable embeddings that **restore manifold connectedness** for deep models.  
- **Global context preserved.** The **Multi-Pattern Global Association (MPGA)** module treats shared sub-spaces as high-order affinities, capturing long-range dependencies without quadratic cost.  
- **Consistent SOTA gains.** On ten public benchmarks, MAPE surpasses state-of-the-art GAD methods by an average of **+7.6 pp AUROC**, with peak improvements of **+26.97 pp on *Reddit*** and **+25.33 pp on *Questions***.  
- **Scalable & lightweight.** The framework scales linearly with edge count and trains million-node graphs in under two hours on a single GPU.

![MAPE Framework](Framework.png)
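
To make the first highlight concrete, here is a minimal sketch of "partition the feature space, then look up one embedding per cell". It is not the code in this repository: quantile binning stands in for the adaptive partition operators, the embedding tables are randomly initialised rather than trained, and every name (`fit_partition`, `assign_cell`, `make_table`, `embed`) is hypothetical.

```python
import bisect
import random
import statistics

def fit_partition(values, n_bins):
    """Return the n_bins - 1 quantile cut points that partition `values`."""
    return statistics.quantiles(values, n=n_bins)

def assign_cell(value, cut_points):
    """Map a scalar to the index of the partition cell that contains it."""
    return bisect.bisect_right(cut_points, value)

def make_table(n_cells, dim, rng):
    """One embedding vector per cell. Randomly initialised here; a real model
    would train these vectors (e.g. with torch.nn.Embedding)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_cells)]

def embed(value, views):
    """Concatenate the cell embeddings of `value` across all views."""
    out = []
    for cut_points, table in views:
        out.extend(table[assign_cell(value, cut_points)])
    return out

rng = random.Random(0)
feature = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # one toy feature column

# Two "views" of the same feature: a coarse partition and a finer one.
views = []
for n_bins in (4, 16):
    cuts = fit_partition(feature, n_bins)
    views.append((cuts, make_table(n_bins, dim=8, rng=rng)))

vec = embed(0.05, views)
print(len(vec))  # 16 (2 views x 8 dimensions)
```

Values that fall into the same cell share an embedding, so nearby points stay connected in the embedded space even when the raw feature values are sparse or irregular.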

---

## Installation

**Quick start (pip)**

```bash
# clone repo
git clone https://github.com/<user>/MAPE.git
cd MAPE

# install dependencies
pip install -r requirements.txt
```
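
After installing, a quick sanity check can confirm that the interpreter meets the Python 3.9+ badge and that PyTorch is importable. This is a small sketch; `check_environment` is a hypothetical helper, not a script shipped with the repo.

```python
import importlib.util
import sys

def check_environment(min_python=(3, 9), package="torch"):
    """Report whether the interpreter and a required package look usable."""
    return {
        "python_ok": sys.version_info >= min_python,
        "package_found": importlib.util.find_spec(package) is not None,
    }

if __name__ == "__main__":
    print(check_environment())
```

Both values should be `True` before you start a training job.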


---

## Dataset Preparation
| # | Dataset      | Download Link                                                                                                      |
|---|--------------|--------------------------------------------------------------------------------------------------------------------|
| 0 | **Reddit**   | [Google Drive](https://drive.google.com/uc?id=1txzXrzwBBAOEATXmfKzMUUKaXh6PJeR1)                                   | 
| 1 | **Weibo**    | [Google Drive](https://drive.google.com/uc?id=1txzXrzwBBAOEATXmfKzMUUKaXh6PJeR1)                                   | 
| 2 | **Amazon**   | [Google Drive](https://drive.google.com/uc?id=1txzXrzwBBAOEATXmfKzMUUKaXh6PJeR1)                                   | 
| 3 | **YelpChi**  | [Google Drive](https://drive.google.com/uc?id=1txzXrzwBBAOEATXmfKzMUUKaXh6PJeR1)                                   | 
| 4 | **Tolokers** | [Google Drive](https://drive.google.com/uc?id=1txzXrzwBBAOEATXmfKzMUUKaXh6PJeR1)                                   | 
| 5 | **Questions**| [Google Drive](https://drive.google.com/uc?id=1txzXrzwBBAOEATXmfKzMUUKaXh6PJeR1)                                   |
| 6 | **T-Finance**| [Google Drive](https://drive.google.com/uc?id=1txzXrzwBBAOEATXmfKzMUUKaXh6PJeR1)                                   | 
| 7 | **Elliptic** | [Kaggle](https://www.kaggle.com/datasets/ellipticco/elliptic-data-set)                                             | 
| 8 | **DGraph-Fin**| [Official Site](https://dgraph.xinye.com/dataset)                                                                 | 
| 9 | **T-Social** | [Google Drive](https://drive.google.com/uc?id=1txzXrzwBBAOEATXmfKzMUUKaXh6PJeR1)                                   | 

**Detailed Description of the Datasets**

| Dataset        | #Nodes   | #Edges     | #Features | Anomaly Rate | Train Ratio | Relation Concept     | Feature Type      |
|--------------- |---------:|-----------:|----------:|-------------:|------------:|----------------------|-------------------|
| Reddit   | 10,984   | 168,016     | 64    | 3.3 %   | 40 % | Under Same Post       | Text Embedding    |
| Weibo      | 8,405    | 407,963     | 400   | 10.3 %  | 40 % | Under Same Hashtag    | Text Embedding    |
| Amazon    | 11,944   | 4,398,392   | 25    | 9.5 %   | 70 % | Review Correlation    | Misc. Information |
| YelpChi    | 45,954   | 3,846,979   | 32    | 14.5 %  | 70 % | Reviewer Interaction  | Misc. Information |
| Tolokers      | 11,758   | 519,000     | 10    | 21.8 %  | 40 % | Work Collaboration    | Misc. Information |
| Questions     | 48,921   | 153,540     | 301   | 3.0 %   | 52 % | Question Answering    | Text Embedding    |
| T-Finance     | 39,357   | 21,222,543  | 10    | 4.6 %   | 50 % | Transaction Record    | Misc. Information |
| Elliptic     | 203,769  | 234,355     | 166   | 9.8 %   | 50 % | Payment Flow          | Misc. Information |
| DGraph-Fin  | 3,700,550| 4,300,999   | 17    | 1.3 %   | 70 % | Loan Guarantor        | Misc. Information |
| T-Social     | 5,781,065| 73,105,508  | 10    | 3.0 %   | 40 % | Social Friendship     | Misc. Information |
                                                                                                                         
                                                                                                                         
**Step by step**
  
```bash
# 0) make a root folder
mkdir -p datasets && cd datasets

# 1) grab the eight public graphs in one archive (~2 GB)
#    (requires gdown:  pip install gdown)
gdown 1txzXrzwBBAOEATXmfKzMUUKaXh6PJeR1   # GADBench datasets.zip
unzip datasets.zip && rm datasets.zip

# 2) Elliptic (manual – Kaggle)
#    After downloading, place the extracted folder as shown below.

# 3) DGraph-Fin (manual – registration site)
#    Follow the provider’s instructions, then move the files likewise.
```

Final directory layout:

```text
datasets/
├── reddit/
├── weibo/
├── amazon/
├── yelpchi/
├── tolokers/
├── questions/
├── tfinance/
├── tsocial/
├── elliptic/      # ← from Kaggle
└── dgraph-fin/    # ← from official site
```
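
Before training, you may want to confirm that every folder from the layout above is in place. The sketch below only checks that the ten directories exist; it does not validate their contents, and `missing_datasets` is a hypothetical helper rather than part of the repo.

```python
from pathlib import Path

# Folder names copied from the directory layout above.
EXPECTED = ["reddit", "weibo", "amazon", "yelpchi", "tolokers",
            "questions", "tfinance", "tsocial", "elliptic", "dgraph-fin"]

def missing_datasets(root="datasets"):
    """Return the expected dataset folders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).is_dir()]

if __name__ == "__main__":
    missing = missing_datasets()
    print("all datasets present" if not missing else f"missing: {missing}")
```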

---

## Training&Evaluation

### Running a training job

```bash
# generic usage
python main.py --dataset <DATASET_NAME> --train_ratio <RATIO>
```

| Argument        | Description                                                                                                                  |
| --------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `--dataset`     | Dataset to load. Choose one of the names listed in the **Dataset Preparation** table above.                                  |
| `--train_ratio` | Fraction of the whole dataset used for training, written as a decimal (e.g. `0.5` for 50 %). Must match the training ratio listed for the selected dataset in the dataset description table under **Dataset Preparation**. |

```bash
# example
python main.py --dataset elliptic --train_ratio 0.5
```
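
To run every benchmark with its matching ratio, the commands can be generated from the ratios in the dataset table. This is a sketch under one assumption: that `--dataset` accepts the lowercase folder names from the directory layout. Only `elliptic` is confirmed by the example above. `TRAIN_RATIOS` and `build_command` are hypothetical names.

```python
import shlex

# Training ratios taken from the dataset description table.
# Keys are ASSUMED to match the lowercase folder names under datasets/.
TRAIN_RATIOS = {
    "reddit": 0.4, "weibo": 0.4, "amazon": 0.7, "yelpchi": 0.7,
    "tolokers": 0.4, "questions": 0.52, "tfinance": 0.5,
    "elliptic": 0.5, "dgraph-fin": 0.7, "tsocial": 0.4,
}

def build_command(dataset):
    """Build the argument list for one training run."""
    ratio = TRAIN_RATIOS[dataset]
    return ["python", "main.py", "--dataset", dataset, "--train_ratio", str(ratio)]

if __name__ == "__main__":
    for name in TRAIN_RATIOS:
        print(shlex.join(build_command(name)))
```

Each printed line can be pasted into a shell, or the argument list can be passed to `subprocess.run(cmd, check=True)` to launch the runs one after another.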



### Evaluation Metrics

| Metric | What it Measures | Interpretation |
| ------ | ---------------- | -------------- |
| **AUC**<br>(Area Under the ROC Curve) | Discrimination ability across every possible classification threshold. The ROC curve plots *True-Positive Rate* vs. *False-Positive Rate*. | **1.0** = perfect ranking, **0.5** = random guessing. |
| **ACC**<br>(Accuracy) | Proportion of correctly classified samples among all test samples:  \(\text{ACC} = \tfrac{\text{TP}+\text{TN}}{\text{TP}+\text{TN}+\text{FP}+\text{FN}}\). | Easy to read, but can be misleading when classes are highly imbalanced. |
| **AUPRC**<br>(Area Under the Precision–Recall Curve) | Ranking quality when positive (anomalous) samples are rare. Integrates the *Precision*–*Recall* curve over all thresholds. | Values closer to **1.0** indicate better separation; a random ranking scores roughly the anomaly rate. Usually more informative than AUC for imbalanced data. |
| **rec@K**<br>(Recall at K) | Fraction of all positive (anomalous) test samples that appear among the top-\(K\) ranked test samples:  \(\text{rec@}K = \tfrac{\text{positives among top }K}{\text{total positives}}\). <br>In this repo, \(K\) equals the number of positive (e.g., fraud) samples in the test set, so \(\text{rec@}K\) coincides with precision@\(K\) and can be read as a *Hit Rate*. | **1.0** means all anomalies appear within the top-\(K\) scores. |
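
The ranking metrics in the table can be written out in a few lines. The sketch below is a plain-Python illustration, not the evaluation code used by this repo: the function names are hypothetical, AUPRC is approximated by average precision, the AUC loop is quadratic for readability, and binary labels (1 = anomaly) without tied scores are assumed.

```python
def rec_at_k(labels, scores, k=None):
    """Recall among the k highest-scoring samples; k defaults to #positives."""
    n_pos = sum(labels)
    if k is None:
        k = n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    return hits / n_pos

def auc(labels, scores):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy test set: 3 anomalies among 8 samples, already sorted by score.
labels = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

print(round(rec_at_k(labels, scores), 3))           # 0.667
print(round(auc(labels, scores), 3))                # 0.667
print(round(average_precision(labels, scores), 3))  # 0.698
```

With `k` left at its default, `rec_at_k` uses the number of positives in the test set, which matches the setting described in the table.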

**Why these four metrics?**

* **AUC** and **AUPRC** evaluate the *ranking* quality of the anomaly scores over all thresholds.  
* **ACC** presents a single operating point, useful when a probability threshold is chosen for deployment.  
* **rec@K** reflects a realistic workflow in fraud or intrusion settings where analysts only inspect the top-\(K\) alerts.
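
A small worked example shows why ACC should not be read alone. It uses a toy test set with a 3 % anomaly rate, the same rate as the *Questions* row of the dataset table, and a trivial detector that flags nothing. The data is illustrative, not a result from this repo.

```python
def accuracy(labels, preds):
    """Share of samples whose predicted label matches the true label."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)

labels = [1] * 30 + [0] * 970   # 30 anomalies among 1,000 samples (3 %)
preds = [0] * 1000              # trivial detector: predicts "normal" for all

recall = sum(p for y, p in zip(labels, preds) if y == 1) / sum(labels)

print(accuracy(labels, preds))  # 0.97
print(recall)                   # 0.0
```

The detector reaches 97 % accuracy without finding a single anomaly, which is why the ranking metrics and rec@K are reported alongside ACC.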

> **Tip** When reporting results, always pair a ranking metric (AUC / AUPRC) with a cut-off metric (ACC / rec@K) to give a complete view of performance.













