# Results Summary

This document provides instructions for reproducing the results of the DGSM-SCAM-GAT and MMT-ViT models, as reported in the ICLR submission. Results include classification metrics (accuracy, precision, recall, F1 score) and visualizations (confusion matrices). 

## 1. DGSM-SCAM-GAT (2 classes)
- **Dataset**:
  - `dynamic_api_call_sequence` (2 classes): Dynamic API call sequences for malware classification.
  - `mal_api_2019` (2 classes): API and DLL sequence data for fine-tuning and validation.
  - The two-class dataset is formed by combining the benign samples from `dynamic_api_call_sequence` with the malicious samples from `mal_api_2019`.
- **Model Components**:
  - **Dynamic Gated Sequence Module (DGSM)**: Dynamically models temporal dependencies in API call sequences using a gated mechanism, capturing evolving patterns in malware behavior.
  - **Sequence Context Aggregation Module (SCAM)**: Employs self-attention to capture intra-sequence patterns and cross-attention to align API calls with malware behavior, enhancing contextual understanding.
  - **Graph Attention Network (GAT)**: Models relationships between API calls using multi-head attention (8 heads, hidden dimension 256, as defined in `config.py`).
- **Output Location**: `ProgectPytorch/results/dgsm_scam_gat/`
  - **Metrics**: accuracy, precision, recall, F1 score.
  - **Checkpoint**: `ProgectPytorch/results/dgsm_scam_gat/dynamic_api_epoch/` (not included due to size constraints).
 
- **Reproducing Results**:
  1. Download and place `dynamic_api_call_sequence` in `ProgectPytorch/data/dynamic_api_call_data/` (see `docs/dataset_description.md`).
  2. Download and place `mal_api_2019` in `ProgectPytorch/data/mal_api_2019/` (contact submission authors via OpenReview if not publicly available).
  3. Run training, testing, and validation: `bash scripts/run_dgsm_scam_gat.sh`. This executes:
     - `ProgectPytorch/dgsm_scam_gat_model.py`: Main training and testing on `dynamic_api_call_sequence`.
  4. Run validation and fine-tuning: `bash scripts/run_dgsm_scam_gat_fine-tuning.sh`. This executes:
     - `ProgectPytorch/dgsm-scam-gat_model_yz.py`: Validation.
     - `ProgectPytorch/dgsm-scam-gat_model_wt.py`: Fine-tuning validation on `mal_api_2019`.
  5. Check results in `ProgectPytorch/results/dgsm_scam_gat/`.
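
The two-class dataset described above (benign samples from `dynamic_api_call_sequence` plus malicious samples from `mal_api_2019`) could be assembled roughly as follows. This is a minimal sketch, not the repository's preprocessing code: the record layout (a `(sequence, label)` pair), the label convention (0 = benign, 1 = malicious), and the function name `build_two_class_dataset` are assumptions made for illustration.

```python
import random

BENIGN, MALICIOUS = 0, 1  # assumed label convention, not taken from the repository


def build_two_class_dataset(dynamic_records, mal_api_sequences, seed=42):
    """Combine benign samples from one source with malicious samples from another.

    dynamic_records: iterable of (api_sequence, label) pairs; only benign ones are kept.
    mal_api_sequences: iterable of API sequences, all treated as malicious.
    """
    benign = [(seq, BENIGN) for seq, label in dynamic_records if label == BENIGN]
    malicious = [(seq, MALICIOUS) for seq in mal_api_sequences]
    combined = benign + malicious
    # Shuffle with a private, seeded generator so the order is repeatable.
    random.Random(seed).shuffle(combined)
    return combined


if __name__ == "__main__":
    # Toy records standing in for the real datasets.
    dynamic = [([1, 2, 3], 0), ([4, 5], 1), ([6], 0)]
    mal = [[7, 8], [9]]
    for seq, label in build_two_class_dataset(dynamic, mal):
        print(label, seq)
```

Malicious records from `dynamic_api_call_sequence` are dropped in this sketch, so every malicious sample in the combined set comes from `mal_api_2019`.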

## 2. MMT-ViT

**Note**: `run_mmt_vit.sh` and `run_mmt_vit_finetune.sh` (the scripts that train, fine-tune, and validate the MMT-ViT model) load the pretrained model `google/vit-base-patch16-224`. Ensure a stable internet connection so that the model can be downloaded; otherwise the scripts will fail with an error.
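
Before starting a long run, it can help to check whether the pretrained weights are already in the local cache. The sketch below assumes the scripts load the model through the Hugging Face libraries and that the default hub cache layout (`models--<org>--<name>` under the hub cache directory) applies; both are assumptions, and the helper names `hub_cache_root` and `cached_model_dir` are hypothetical.

```python
import os
from pathlib import Path

MODEL_ID = "google/vit-base-patch16-224"


def hub_cache_root(env=None):
    """Resolve the Hugging Face hub cache directory from environment variables."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    # Fall back to HF_HOME, then to the usual default under the home directory.
    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    return Path(hf_home) / "hub"


def cached_model_dir(model_id=MODEL_ID, env=None):
    """Return the folder where the hub cache would store this model (assumed layout)."""
    return hub_cache_root(env) / ("models--" + model_id.replace("/", "--"))


if __name__ == "__main__":
    path = cached_model_dir()
    status = "found" if path.is_dir() else "missing (will be downloaded on first run)"
    print(f"{path}: {status}")
```

If the folder is present, setting the environment variable `HF_HUB_OFFLINE=1` should let the Hugging Face libraries load the model without network access, although this has not been verified against the scripts in this repository.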

### big2015 (9 classes)
- **Dataset**: `big2015` (9 classes): Malware dataset with `.bytes`, `.asm`, and labels in `big2015_Labels.csv`.
- **Output Location**: `ProgectPytorch/results/mmt_vit/`
  - **Metrics**: accuracy, precision, recall, F1 score.
  - **Visualizations**: Confusion matrices.
  - **Checkpoint**: `ProgectPytorch/results/mmt_vit/mmt_vit_results/mmt-ViT_epoch/` (not included).
- **Reproducing Results**:
  1. Download the `big2015` dataset and place it as described in `docs/dataset_description.md`.
  2. Run preprocessing: `bash scripts/preprocess_data.sh`.
  3. Run training and testing: `bash scripts/run_mmt_vit.sh`. This executes `ProgectPytorch/mmt-ViT_multimodal_model.py`.
  4. Check results in `ProgectPytorch/results/mmt_vit/`.
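
For reference, the reported metrics (accuracy, precision, recall, F1 score) and the confusion matrix can be derived from true and predicted labels as sketched below. This is an illustrative, dependency-free version, not the repository's evaluation code; in particular, macro averaging over classes is an assumption, since the averaging mode is not stated here, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def macro_metrics(matrix):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n))  # column sum
        actual = sum(matrix[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy labels for a 3-class problem.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(y_true, y_pred, num_classes=3)
    print(cm)
    print(macro_metrics(cm))
```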

### malimg (25 classes, fine-tuning)
- **Dataset**: `malimg` (25 classes): Grayscale images for malware classification.
- **Output Location**: `ProgectPytorch/results/mmt_vit/big2015_yz/yz_results_25/`
  - **Metrics**: accuracy, precision, recall, F1 score.
  - **Visualizations**: Confusion matrices.
  - **Checkpoint**: `ProgectPytorch/results/mmt_vit/big2015_yz/yz_results_25/` (not included).
- **Reproducing Results**:
  1. Download the `malimg` dataset and place it as described in `docs/dataset_description.md`.
  2. Run fine-tuning: `bash scripts/run_mmt_vit_finetune.sh`. This executes `ProgectPytorch/mmt-ViT_multimodal_wt_yz_25.py`.
  3. Check results in `ProgectPytorch/results/mmt_vit/big2015_yz/yz_results_25/`.

### Malevis_malimg (31 classes, fine-tuning)
- **Dataset**: `Malevis_malimg` (31 classes): RGB and grayscale byteplot images.
- **Output Location**: `ProgectPytorch/results/mmt_vit/big2015_yz/yz_results_31/`
  - **Metrics**: accuracy, precision, recall, F1 score.
  - **Visualizations**: Confusion matrices.
  - **Checkpoint**: `ProgectPytorch/results/mmt_vit/big2015_yz/yz_results_31/` (not included).
- **Reproducing Results**:
  1. Download the `Malevis_malimg` dataset and place it as described in `docs/dataset_description.md`.
  2. Run fine-tuning: `bash scripts/run_mmt_vit_finetune.sh`. This executes `ProgectPytorch/mmt-ViT_multimodal_wt_yz_31.py`.
  3. Check results in `ProgectPytorch/results/mmt_vit/big2015_yz/yz_results_31/`.

## Notes
- **Checkpoints**: Due to ICLR size constraints (<100MB), checkpoint files (`.pth`) are not included in the submission package. Run the training scripts to generate them.
- **Hardware**: NVIDIA RTX 4070 or higher with CUDA 11.8 is recommended. Adjust `cfg.BATCH_SIZE` in `ProgectPytorch/config.py` if GPU memory is limited.
- **Reproducibility**: Random seed is fixed at 42 (`cfg.SEED=42`) in `ProgectPytorch/config.py` to ensure reproducible results.
- **Dependencies**: Install dependencies using `pip install -r requirements.txt` before running scripts.
- **Running Order**: Execute scripts in the following order:
  1. `bash scripts/run_dgsm_scam_gat.sh`
  2. `bash scripts/run_dgsm_scam_gat_fine-tuning.sh`
  3. `bash scripts/preprocess_data.sh`
  4. `bash scripts/run_mmt_vit.sh`
  5. `bash scripts/run_mmt_vit_finetune.sh`
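
As a point of reference for the fixed seed mentioned above, seeding in a PyTorch project typically looks like the sketch below. This is not the code from `ProgectPytorch/config.py`; the helper name `set_seed` is hypothetical, and NumPy and PyTorch are seeded only if they are installed. Even with a fixed seed, results may still differ slightly across GPUs, drivers, or library versions.

```python
import os
import random


def set_seed(seed=42):
    """Seed the common sources of randomness; optional libraries are skipped if absent."""
    random.seed(seed)
    # Only affects Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call when CUDA is unavailable
        # Prefer deterministic cuDNN kernels over auto-tuned ones.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


if __name__ == "__main__":
    set_seed(42)
    first = random.random()
    set_seed(42)
    print(first == random.random())
```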