# LLM-DAMVC: Experimental Reproduction Guide

## 1. Overview

This guide describes how to reproduce the experimental results of the LLM-DAMVC (Large Language Model-guided Dynamic Adversarial Multi-View Clustering) framework. It covers environment setup, dataset preparation, per-dataset hyperparameters, and the training entry point, so that the research community can replicate our results and validate our findings.

## 2. Code Structure and Environment Setup

### 2.1 Environment Requirements

```plaintext
Python >= 3.8
PyTorch >= 1.10.0
CUDA >= 11.3 (recommended)
numpy >= 1.20.0
scipy >= 1.7.0
scikit-learn >= 1.0.0
tqdm >= 4.62.0
ollama >= 0.1.14
matplotlib >= 3.5.0
```
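Before running any experiment, it can help to confirm that the installed packages meet these minimums. The following is a minimal sketch, not part of the LLM-DAMVC codebase: the helper names `version_tuple` and `check_environment` are hypothetical, and the distribution names assume the packages were installed from PyPI (where PyTorch is published as `torch`).

```python
from importlib import metadata

# Minimum versions transcribed from the requirements list above
# (distribution names as published on PyPI; "torch" is PyTorch).
MINIMUMS = {
    "torch": "1.10.0",
    "numpy": "1.20.0",
    "scipy": "1.7.0",
    "scikit-learn": "1.0.0",
    "tqdm": "4.62.0",
    "ollama": "0.1.14",
    "matplotlib": "3.5.0",
}


def version_tuple(text):
    """Turn '1.12.1+cu113' into (1, 12, 1); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.10' compares equal to '1.10.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment(minimums=MINIMUMS):
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, version_tuple(installed) >= version_tuple(minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "OK" if ok else "MISSING or too old"
        print(f"{name:<14} {installed or '-':<16} {status}")
```

The comparison is deliberately simple (numeric release segments only), which is enough for the plain `X.Y.Z` minimums listed here.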

### 2.2 Installation Steps

```bash
# Create a virtual environment
conda create -n llm-damvc python=3.8
conda activate llm-damvc

# Install dependencies
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
pip install numpy==1.23.5 scipy==1.9.3 scikit-learn==1.1.3 tqdm==4.64.1 matplotlib==3.6.2
pip install ollama==0.1.14
```
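After installation, a quick check of the PyTorch build can catch a CPU-only wheel or a driver mismatch early. This is a small sketch under the assumption that a CUDA-capable GPU is intended; `describe_torch` is a hypothetical helper, not part of the project.

```python
def describe_torch():
    """Report the installed PyTorch build, or note that it cannot be imported."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,                 # e.g. "1.12.1+cu113"
        "cuda_build": torch.version.cuda,             # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),  # False if no usable GPU/driver
    }


if __name__ == "__main__":
    print(describe_torch())
```

If `cuda_available` is `False` while `cuda_build` is set, the wheel was built for CUDA but no usable GPU or driver was found at runtime.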

### 2.3 LLM Setup

```bash
# Install Ollama (https://ollama.com/)
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows/macOS: download the installer from https://ollama.com/download

# Pull the required model
ollama pull deepseek-r1:1.5b
```
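To confirm that the Ollama server is running and the model has been pulled, the server's `/api/tags` endpoint can be queried. The sketch below assumes Ollama is listening on its default local address (`http://localhost:11434`); the helper names `fetch_tags` and `model_listed` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address
MODEL = "deepseek-r1:1.5b"


def model_listed(tags_payload, model=MODEL):
    """Return True if `model` appears in a parsed /api/tags response."""
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    return model in names


def fetch_tags(base_url=OLLAMA_URL, timeout=5):
    """Ask the local Ollama server which models are available."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        available = model_listed(fetch_tags())
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        print(f"Ollama server not reachable: {exc}")
    else:
        print(f"{MODEL} available: {available}")
```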

## 3. Dataset Preparation

We use six standard multi-view clustering datasets, listed below. Store every dataset file in the `datasets/` directory.

### 3.1 Dataset Download

| Dataset    | File Format |
| ---------- | ----------- |
| Reuters    | .mat        |
| MNIST-USPS | .mat        |
| BDGP       | .mat        |
| CCV        | .mat        |
| hand       | .mat        |
| NUS        | .mat        |

### 3.2 Data Format Requirements

Every dataset must be saved as a `.mat` file with one of the following layouts:

- Single-variable datasets: an `X` variable holding the view data, plus the labels in `Y` or `truth`.
- Datasets with one variable per view: `X1`, `X2`, ..., `Xn` (one per view), plus the labels in `Y`.
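The two layouts above can be told apart by the variable names in the file. The following is a rough sketch of such a loader, not the project's own data pipeline: `split_mat_contents` and `load_dataset` are hypothetical names, and the sketch does not unpack an `X` stored as a MATLAB cell array (which `scipy.io.loadmat` returns as an object array).

```python
import re


def split_mat_contents(mat):
    """Separate view data from labels in a dict as returned by scipy.io.loadmat."""
    label_key = next((key for key in ("Y", "truth") if key in mat), None)
    if label_key is None:
        raise KeyError("no label variable named 'Y' or 'truth' found")

    # Per-view layout: X1, X2, ..., Xn (sorted numerically so X10 follows X9).
    view_keys = sorted(
        (key for key in mat if re.fullmatch(r"X\d+", key)),
        key=lambda key: int(key[1:]),
    )
    if view_keys:
        views = [mat[key] for key in view_keys]
    elif "X" in mat:
        # Single-variable layout: returned as-is, wrapped in a list.
        views = [mat["X"]]
    else:
        raise KeyError("no view variable named 'X' or 'X1', 'X2', ... found")
    return views, mat[label_key]


def load_dataset(path):
    """Load a .mat file and return (views, flat label vector)."""
    from scipy.io import loadmat  # imported lazily so the helper above has no dependencies

    views, labels = split_mat_contents(loadmat(path))
    return views, labels.ravel()
```

A call such as `load_dataset("datasets/BDGP.mat")` would then return the view arrays and a one-dimensional label vector; the file name here is only an illustration.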

## 4. Reproducing Experimental Results

### 4.1 Core Configuration Parameters

The following table lists the training hyperparameters used for each dataset:

| Dataset    | batch_size | learning_rate | num_epochs | weight_decay | Special Parameters |
| ---------- | ---------- | ------------- | ---------- | ------------ | ------------------ |
| NUS        | 256        | 0.001         | 200        | 0.0001       | -                  |
| CCV        | 256        | 0.001         | 200        | 0.0001       | -                  |
| Reuters    | 60         | 0.001         | 200        | 0.0001       | -                  |
| MNIST-USPS | 256        | 0.0001        | 100        | 0.0001       | -                  |
| BDGP       | 128        | 0.0001        | 100        | 0.0001       | -                  |
| hand       | 200        | 0.001         | 200        | 0.0001       | -                  |
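For scripted runs, the table above can be kept as a lookup so that each dataset is always trained with its own settings. The values below are transcribed from the table; the names `DATASET_PARAMS` and `build_config` are hypothetical and not part of the released code.

```python
# Per-dataset hyperparameters transcribed from the table above.
DATASET_PARAMS = {
    "NUS":        {"batch_size": 256, "learning_rate": 0.001,  "num_epochs": 200, "weight_decay": 0.0001},
    "CCV":        {"batch_size": 256, "learning_rate": 0.001,  "num_epochs": 200, "weight_decay": 0.0001},
    "Reuters":    {"batch_size": 60,  "learning_rate": 0.001,  "num_epochs": 200, "weight_decay": 0.0001},
    "MNIST-USPS": {"batch_size": 256, "learning_rate": 0.0001, "num_epochs": 100, "weight_decay": 0.0001},
    "BDGP":       {"batch_size": 128, "learning_rate": 0.0001, "num_epochs": 100, "weight_decay": 0.0001},
    "hand":       {"batch_size": 200, "learning_rate": 0.001,  "num_epochs": 200, "weight_decay": 0.0001},
}


def build_config(dataset, base=None):
    """Overlay the table values for `dataset` on an optional base config dict."""
    if dataset not in DATASET_PARAMS:
        raise KeyError(f"unknown dataset {dataset!r}; expected one of {sorted(DATASET_PARAMS)}")
    config = dict(base or {})               # copy so the caller's dict is not modified
    config.update(DATASET_PARAMS[dataset])  # table values take precedence over the base
    config["dataset"] = dataset
    return config
```

For example, `build_config("Reuters", {"momentum": 0.9})` yields a config with `batch_size` 60 and the shared `momentum` setting preserved.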

### 4.2 Running Experiments

```python
# MultiviewClusterTrainer is provided by the LLM-DAMVC codebase;
# import it from the module in which it is defined.

# batch_size, learning_rate, num_epochs, and weight_decay vary by dataset:
# check them against Section 4.1 for the dataset you select.
config = {
    "dataset": "MNIST-USPS",         # Dataset name
    "batch_size": 256,               # Batch size
    "learning_rate": 0.001,          # Learning rate
    "num_epochs": 200,               # Total training epochs
    "llm_epochs": 10,                # LLM intervention epochs
    "min_epoch": 100,                # LLM start epoch
    "weight_decay": 0.0001,          # Weight decay
    "momentum": 0.9,                 # Momentum
    "patience": 20,                  # Early stopping patience
    "scheduler": "cosine",           # Learning rate scheduler
    "warmup_epochs": 10,             # Warm-up epochs
    "min_lr": 1e-5,                  # Minimum learning rate
    "mixup_alpha": 0.1,              # Mixup strength
    "use_self_gating": True,         # Enable self-gating
    "self_gating_only_final": True,  # Apply self-gating only to the final layer
}

trainer = MultiviewClusterTrainer(config)
trainer.train()
```
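The `scheduler`, `warmup_epochs`, and `min_lr` entries in the config above interact with `learning_rate` and `num_epochs`. As an illustration only, the sketch below assumes a linear warm-up followed by cosine annealing down to `min_lr`; the trainer's actual schedule may differ, and `lr_at_epoch` is a hypothetical helper.

```python
import math


def lr_at_epoch(epoch, base_lr=0.001, num_epochs=200, warmup_epochs=10, min_lr=1e-5):
    """Learning rate for a 0-indexed epoch: linear warm-up, then cosine decay."""
    if epoch < warmup_epochs:
        # Ramp linearly so the last warm-up epoch reaches base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the decay phase completed, from 0.0 to 1.0 at the final epoch.
    progress = (epoch - warmup_epochs) / max(1, num_epochs - warmup_epochs - 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 9, 10, 100, 199):
        print(epoch, f"{lr_at_epoch(epoch):.6f}")
```

Under these assumptions the rate rises to `learning_rate` by the end of warm-up and falls to `min_lr` at the final epoch.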

Outputs for each dataset are saved in the `outputs/{dataset_name}/` directory, including:

- Model checkpoints (`models/` directory)
- Training logs (`logs/` directory)
- Performance metrics (`metrics/` directory)
- Visualization results (`visualizations/` directory)
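After a run, the expected output layout can be verified programmatically. This is a minimal sketch assuming the directory names listed above; `output_dirs` and `missing_outputs` are hypothetical helpers.

```python
from pathlib import Path

# Subdirectories expected under outputs/{dataset_name}/, as listed above.
SUBDIRS = ("models", "logs", "metrics", "visualizations")


def output_dirs(dataset_name, root="outputs"):
    """Map each expected subdirectory name to its path for one dataset."""
    base = Path(root) / dataset_name
    return {name: base / name for name in SUBDIRS}


def missing_outputs(dataset_name, root="outputs"):
    """List the expected subdirectories that do not exist yet."""
    return [
        name
        for name, path in output_dirs(dataset_name, root).items()
        if not path.is_dir()
    ]


if __name__ == "__main__":
    print(missing_outputs("MNIST-USPS"))
```

An empty list from `missing_outputs` indicates that all four subdirectories were created for that dataset.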

  


