# RAM++ ADE20K Adapter Usage Guide

## 📋 Overview

This implementation adds an MLP adapter to RAM++ for ADE20K classification. The adapter maps RAM++'s 4584-dimensional output to ADE20K's 150 classes.

## 🏗️ Architecture

```
Input Image → RAM++ (frozen) → [4584 logits] → MLP Adapter → [150 ADE20K logits] → Predictions
```

**MLP Adapter Architecture:**
- 4584 → 2048 → 1024 → 150
- ReLU activations + Dropout(0.3)
- Only adapter parameters are trained (RAM++ frozen)
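The adapter described above can be sketched as a small PyTorch module. This is a minimal sketch that assumes Dropout follows each hidden ReLU. The class name `ADE20KAdapterSketch` is hypothetical and may not match the class in `cltag/models/ram_plus_ade20k.py`.

```python
import torch
import torch.nn as nn


class ADE20KAdapterSketch(nn.Module):
    """Hypothetical MLP adapter: 4584 RAM++ logits -> 150 ADE20K logits."""

    def __init__(self, in_dim=4584, hidden_dims=(2048, 1024), num_classes=150, dropout=0.3):
        super().__init__()
        layers = []
        prev = in_dim
        for h in hidden_dims:
            # Linear -> ReLU -> Dropout for each hidden layer (assumed ordering)
            layers += [nn.Linear(prev, h), nn.ReLU(inplace=True), nn.Dropout(dropout)]
            prev = h
        # Final projection to raw ADE20K logits (sigmoid is applied at prediction time)
        layers.append(nn.Linear(prev, num_classes))
        self.mlp = nn.Sequential(*layers)

    def forward(self, ram_logits):
        return self.mlp(ram_logits)


adapter = ADE20KAdapterSketch()
x = torch.randn(2, 4584)          # stand-in for a batch of RAM++ logits
out = adapter(x)
num_params = sum(p.numel() for p in adapter.parameters())
print(out.shape, num_params)      # torch.Size([2, 150]) 11642006
```

The parameter count works out to 11,642,006 (about 11.6M). This agrees with the "~12M parameters" figure under Training Strategy.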

## 🚀 Quick Start

### 1. Training the Adapter

```bash
cd /path/to/recognize-anything

python cltag/training/train_ade20k.py \
    --ram-checkpoint /path/to/ram_plus_swin_large_14m.pth \
    --ade20k-root /path/to/ADE20K \
    --batch-size 16 \
    --epochs 50 \
    --lr 1e-4 \
    --output-dir ./logs/ade20k_training \
    --device cuda:0
```

### 2. Inference with Trained Model

```python
from cltag.models.ram_plus_ade20k import load_ram_plus_ade20k_pretrained
import torch
from PIL import Image
import torchvision.transforms as transforms

# Load model
model = load_ram_plus_ade20k_pretrained(
    ram_plus_checkpoint='/path/to/ram_plus_swin_large_14m.pth',
    ade20k_adapter_checkpoint='/path/to/trained_adapter.pth',
    device='cuda:0'
)

# Prepare image
transform = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

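image = Image.open('your_image.jpg').convert('RGB')  # ensure 3 channels to match Normalize
image_tensor = transform(image).unsqueeze(0).to('cuda:0')  # add batch dim, move to model device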

# Make predictions
with torch.no_grad():
    predictions, probabilities = model.predict(image_tensor)
    predicted_classes = model.predict_classes(image_tensor)

print(f"Predicted classes: {predicted_classes[0]}")
```

### 3. Evaluation on ADE20K

```python
import torch
from torch.utils.data import DataLoader

from cltag.models.ram_plus_ade20k import load_ram_plus_ade20k_pretrained
from cltag.datasets.ade20k_dataset import ADE20KDataset

device = 'cuda:0'

# Load model (RAM++ backbone + trained adapter)
model = load_ram_plus_ade20k_pretrained(
    ram_plus_checkpoint='/path/to/ram_plus.pth',
    ade20k_adapter_checkpoint='/path/to/adapter.pth',
    device=device
)

# Create validation dataset
val_dataset = ADE20KDataset(split='val')
val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)

# Evaluate
model.eval()
all_predictions = []
all_targets = []

with torch.no_grad():
    for batch in val_loader:
        images, targets = batch['image'].to(device), batch['labels']
        predictions, probs = model.predict(images)

        # Keep results on CPU so GPU memory does not grow across batches
        all_predictions.append(probs.cpu())
        all_targets.append(targets)

# Stack per-batch results into (num_images, 150) tensors
all_probs = torch.cat(all_predictions)
all_targets = torch.cat(all_targets)

# Calculate metrics...
```
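The metric step above is left open. A common choice for multi-label evaluation is mean average precision (mAP): compute AP per class, then average over the classes that have at least one positive. This sketch assumes the collected probabilities and targets have been stacked into `(N, 150)` NumPy arrays, for example with `torch.cat(all_predictions).numpy()`. The function names are hypothetical, and score ties are broken by original order instead of being handled specially.

```python
import numpy as np


def average_precision(scores, labels):
    """AP for one class: mean of precision@k taken at the rank k of each positive."""
    order = np.argsort(-scores, kind="stable")   # rank by descending score
    labels = labels[order]
    num_pos = labels.sum()
    if num_pos == 0:
        return None                               # AP is undefined without positives
    hits = np.cumsum(labels)                      # true positives within the top k
    ranks = np.arange(1, len(labels) + 1)
    precision_at_k = hits / ranks
    return float((precision_at_k * labels).sum() / num_pos)


def mean_average_precision(probs, targets):
    """Average per-class AP over the classes (columns) that have a positive."""
    aps = [average_precision(probs[:, c], targets[:, c]) for c in range(probs.shape[1])]
    aps = [ap for ap in aps if ap is not None]
    return float(np.mean(aps))


# Toy example: 3 images, 2 classes
probs = np.array([[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]])
targets = np.array([[1, 0], [0, 1], [1, 1]])
print(mean_average_precision(probs, targets))
```

For a cross-check, `sklearn.metrics.average_precision_score` computes the same per-class quantity when there are no tied scores.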

## 📁 File Structure

```
cltag/
├── models/
│   └── ram_plus_ade20k.py          # Main model implementation
├── training/
│   └── train_ade20k.py             # Training script
├── datasets/
│   └── ade20k_dataset.py           # ADE20K dataset (already exists)
├── docs/
│   └── ADE20K_USAGE.md             # This guide
└── test_ade20k_model.py            # Test script
```

## 🔧 Key Classes

### `RAM_plus_ADE20K`
Main model class that wraps RAM++ with the MLP adapter.

**Key Methods:**
- `forward(image)` → ADE20K logits
- `predict(image, threshold)` → binary predictions + probabilities  
- `predict_classes(image)` → class name lists
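The three methods are likely related as follows: logits from `forward` go through a sigmoid, `predict` thresholds them, and `predict_classes` maps the positive indices to names. This sketch rests on two assumptions: a threshold of 0.5 and a list of class names in index order. The helper names and the short class list are hypothetical. The real defaults and the 150-name list live in `ram_plus_ade20k.py`.

```python
import torch


def predict_from_logits(logits, threshold=0.5):
    """Sigmoid the ADE20K logits and threshold them into multi-hot predictions."""
    probabilities = torch.sigmoid(logits)
    predictions = (probabilities > threshold).long()
    return predictions, probabilities


def classes_from_predictions(predictions, class_names):
    """Turn each row of a multi-hot matrix into a list of class names."""
    return [
        [class_names[i] for i in row.nonzero(as_tuple=True)[0].tolist()]
        for row in predictions
    ]


# Hypothetical 3-class subset; the real list has 150 ADE20K categories.
class_names = ["wall", "building", "sky"]
logits = torch.tensor([[2.0, -1.0, 0.5], [-3.0, 4.0, -0.2]])
preds, probs = predict_from_logits(logits)
print(classes_from_predictions(preds, class_names))  # [['wall', 'sky'], ['building']]
```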

### `load_ram_plus_ade20k_pretrained()`
Utility function that loads the pretrained RAM++ model together with the trained adapter weights.

## 🎯 Training Configuration

**Recommended Settings:**
- Batch size: 16-32 (depending on GPU memory)
- Learning rate: 1e-4
- Epochs: 30-50
- Optimizer: Adam with weight decay 1e-5
- Loss: BCEWithLogitsLoss
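Under these settings, a single optimization step might look like the sketch below. It assumes multi-hot float targets of shape `(B, 150)`. To stay self-contained, it uses random tensors in place of the frozen RAM++ logits and a plain `nn.Sequential` with the documented layer sizes. That `nn.Sequential` is not the repository's adapter class.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in adapter with the documented layer sizes (4584 -> 2048 -> 1024 -> 150).
adapter = nn.Sequential(
    nn.Linear(4584, 2048), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(2048, 1024), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(1024, 150),
)

# Adam (lr 1e-4, weight decay 1e-5) optimizing the adapter parameters only.
optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-4, weight_decay=1e-5)
# BCEWithLogitsLoss takes raw logits and float multi-hot targets.
criterion = nn.BCEWithLogitsLoss()

batch_size = 16
ram_logits = torch.randn(batch_size, 4584)              # placeholder for frozen RAM++ output
targets = (torch.rand(batch_size, 150) < 0.05).float()  # placeholder multi-hot labels

adapter.train()
optimizer.zero_grad()
loss = criterion(adapter(ram_logits), targets)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```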

**Training Strategy:**
- Freeze RAM++ backbone completely
- Only train MLP adapter parameters (~12M parameters)
- Use standard data augmentation for training set
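Freezing the backbone amounts to two steps: turn off gradients on its parameters, then hand only the trainable parameters to the optimizer. In this sketch, tiny `nn.Linear` layers stand in for RAM++ and the adapter, which keeps the example runnable. The wrapper class is hypothetical. It also keeps the backbone in `eval()` mode so that any dropout or normalization layers inside it behave as they do at inference.

```python
import torch
import torch.nn as nn


class FrozenBackboneWithAdapter(nn.Module):
    """Hypothetical wrapper: frozen backbone followed by a trainable adapter."""

    def __init__(self, backbone, adapter):
        super().__init__()
        self.backbone = backbone
        self.adapter = adapter
        for p in self.backbone.parameters():
            p.requires_grad = False      # backbone weights are never updated

    def train(self, mode=True):
        super().train(mode)
        self.backbone.eval()             # backbone stays in inference mode during training
        return self

    def forward(self, x):
        with torch.no_grad():            # no activations stored for the frozen part
            features = self.backbone(x)
        return self.adapter(features)


# Toy sizes stand in for RAM++ (4584 outputs) and the 150-class adapter.
model = FrozenBackboneWithAdapter(nn.Linear(32, 64), nn.Linear(64, 10))
model.train()
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4, weight_decay=1e-5)
print(sum(p.numel() for p in trainable))  # 650
```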

## 📊 Expected Performance

**Zero-shot baseline (existing mapping):**
- Around 10-20% mAP on ADE20K validation

**With trained adapter:**
- Expected 25-40% mAP improvement over the zero-shot baseline
- Better precision/recall on common classes

## 🐛 Troubleshooting

1. **CUDA out of memory:**
   - Reduce batch size
   - Use gradient accumulation

2. **Poor performance:**
   - Check that data preprocessing (384×384 resize, ImageNet mean/std normalization) matches what RAM++ expects
   - Verify ADE20K dataset loading
   - Try different learning rates

3. **Model loading errors:**
   - Ensure RAM++ checkpoint path is correct
   - Check adapter checkpoint format
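Gradient accumulation lowers per-step memory without changing the effective batch size. Gradients from several small micro-batches are summed, and the optimizer steps once per group. This sketch uses a stand-in model and random data. Whether `train_ade20k.py` exposes an option for accumulation is not documented here, so `accumulation_steps` and the loop are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(4584, 150)                  # stand-in for the trainable adapter
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)
criterion = nn.BCEWithLogitsLoss()

accumulation_steps = 4                        # 4 micro-batches of 4 ~ one batch of 16
micro_batches = [
    (torch.randn(4, 4584), (torch.rand(4, 150) < 0.05).float())
    for _ in range(8)
]

optimizer.zero_grad()
num_updates = 0
for step, (inputs, targets) in enumerate(micro_batches, start=1):
    loss = criterion(model(inputs), targets)
    # Scale so the accumulated gradient matches the mean over the full batch.
    (loss / accumulation_steps).backward()
    if step % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        num_updates += 1

print(num_updates)  # 2
```

If the number of micro-batches is not a multiple of `accumulation_steps`, the leftover gradients need either a final `optimizer.step()` or a `zero_grad()`.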

## 📈 Next Steps

1. **Train the adapter** using the provided training script
2. **Evaluate performance** on ADE20K validation set
3. **Fine-tune hyperparameters** if needed
4. **Compare with zero-shot baseline** using existing mapping

Before running, make sure the RAM++ pretrained weights and the ADE20K dataset are set up correctly.