# TASM: Task-Aware Structured Memory for Dynamic Multi-modal In-Context Learning

## Overview

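TASM (Task-Aware Structured Memory) is a framework for extending the usable context length of Multi-modal Large Language Models (MLLMs) in In-Context Learning (ICL). It targets the limitations of existing KV cache compression methods with three components: task-vector guided importance scoring, semantics-aware token merging, and a dynamic memory that pairs a compact core memory with query-adaptive retrieval from a latent bank.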

## Installation

```bash
# Enter the repository root
cd TASM

# Install dependencies
pip install -r requirements.txt

# Install the modified transformers
cd transformers
pip install -e .
cd ..

# Install lmms-eval
cd lmms-eval
pip install -e .
cd ..

# Install qwen-vl-utils
pip install qwen-vl-utils
```

## Quick Start

### Using TASM with Qwen2-VL-7B-Instruct

```bash
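# Run evaluation with TASM.
# --model_args is a single comma-separated string, so its continuation
# lines must start at column 0 (no leading spaces).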
accelerate launch --num_processes 1 -m lmms_eval \
    --model qwen2_vl_tasm \
    --model_args pretrained=Qwen/Qwen2-VL-7B-Instruct,num_fewshot=20,\
enable_token_merging=True,enable_dynamic_retrieval=True,\
target_compression_ratio=0.2 \
    --tasks illusionvqa_soft_localization_fewshot \
    --batch_size 1 \
    --output_path ./logs/
```
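
If the command is generated from a script, the `--model_args` string can be assembled from a dictionary instead of hand-written line continuations. The following is a minimal sketch, not part of TASM itself: `build_model_args` is a hypothetical helper, and the keys and values are copied from the command above.

```python
import shlex


def build_model_args(options: dict) -> str:
    """Join key/value pairs into the comma-separated form used by --model_args.

    Hypothetical helper for illustration; it is not shipped with TASM.
    """
    return ",".join(f"{key}={value}" for key, value in options.items())


# Values taken from the Quick Start command.
options = {
    "pretrained": "Qwen/Qwen2-VL-7B-Instruct",
    "num_fewshot": 20,
    "enable_token_merging": True,
    "enable_dynamic_retrieval": True,
    "target_compression_ratio": 0.2,
}

model_args = build_model_args(options)

command = [
    "accelerate", "launch", "--num_processes", "1", "-m", "lmms_eval",
    "--model", "qwen2_vl_tasm",
    "--model_args", model_args,
    "--tasks", "illusionvqa_soft_localization_fewshot",
    "--batch_size", "1",
    "--output_path", "./logs/",
]

# Print the shell-quoted command for inspection before running it.
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run(command, check=True)`, which avoids shell quoting issues altogether.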

### Configuration Options

| Parameter | Default | Description |
|-----------|---------|-------------|
| `task_vector_method` | `combined` | Task vector extraction method: `embedding_diff`, `head_activation`, `combined` |
| `enable_token_merging` | `True` | Enable semantics-aware token merging |
| `preserve_spatial` | `True` | Preserve spatial locality for visual tokens |
| `merge_similarity_threshold` | `0.5` | Minimum similarity for token merging |
| `enable_dynamic_retrieval` | `True` | Enable query-adaptive dynamic activation |
| `core_compression_ratio` | `0.1` | Compression ratio for core memory |
| `latent_compression_ratio` | `0.5` | Compression ratio for latent bank |
| `retrieval_top_k` | `32` | Number of tokens to retrieve per query |
| `target_compression_ratio` | `0.2` | Overall target compression ratio |
| `js_threshold` | `0.005` | JS divergence threshold for adaptive compression |
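
To give a feel for the scale of `js_threshold`, the sketch below computes the Jensen-Shannon divergence between two discrete distributions in plain Python. How TASM applies the threshold internally is not documented here, so the comparison at the end (accepting a compression step when the divergence between an uncompressed and a compressed distribution stays below the threshold) is an assumption made purely for illustration, as are the example distributions.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0
        # because b is the midpoint distribution.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    midpoint = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, midpoint) + 0.5 * kl(q, midpoint)


JS_THRESHOLD = 0.005  # default from the table above

# Illustrative distributions only, not measurements from TASM.
full_cache = [0.50, 0.30, 0.20]
compressed_cache = [0.48, 0.32, 0.20]

divergence = js_divergence(full_cache, compressed_cache)

# Assumed usage: treat the compression as acceptable if the divergence is small.
acceptable = divergence <= JS_THRESHOLD
print(f"JS divergence = {divergence:.6f}, acceptable = {acceptable}")
```

With the natural logarithm, the divergence ranges from 0 for identical distributions to ln 2 (about 0.693) for distributions with disjoint support, so the default of `0.005` corresponds to a small permitted shift.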

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    TASM Framework                            │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────────────────────────────────────────────┐   │
│  │          Innovation 1: Task-Vector Guided            │   │
│  │                                                       │   │
│  │   ICL Examples → Task Vector Extraction →            │   │
│  │   Projection-based Importance Scoring                │   │
│  └─────────────────────────────────────────────────────┘   │
│                           ↓                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │          Innovation 2: Semantic Token Merging        │   │
│  │                                                       │   │
│  │   Low-importance tokens → Bipartite Matching →       │   │
│  │   Merge into high-importance tokens                  │   │
│  └─────────────────────────────────────────────────────┘   │
│                           ↓                                  │
│  ┌─────────────────────────────────────────────────────┐   │
│  │          Innovation 3: Dynamic Memory                │   │
│  │                                                       │   │
│  │   ┌──────────────┐    ┌─────────────────────────┐   │   │
│  │   │ Core Memory  │ ← ─│ Query-Adaptive Retrieval│   │   │
│  │   │ (GPU, 10%)   │    │ from Latent Bank (CPU)  │   │   │
│  │   └──────────────┘    └─────────────────────────┘   │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                              │
└─────────────────────────────────────────────────────────────┘
```
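
The three stages in the diagram can be sketched end to end on toy data. This is a simplified illustration under stated assumptions, not the TASM implementation: plain Python lists stand in for tensors, all function names are hypothetical, importance is taken to be the magnitude of each token's projection onto the task vector, merging is a greedy nearest-neighbour match with averaging, and tokens that fail the similarity threshold are assumed to go to the latent bank.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def importance_scores(tokens, task_vector):
    """Stage 1 (assumed form): magnitude of each token's projection onto the task vector."""
    norm = math.sqrt(dot(task_vector, task_vector))  # task vector must be non-zero
    return [abs(dot(tok, task_vector)) / norm for tok in tokens]


def merge_tokens(tokens, scores, num_keep, threshold=0.5):
    """Stage 2 (assumed form): average each low-importance token into its most
    similar high-importance token when cosine similarity meets the threshold."""
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep_idx, low_idx = sorted(order[:num_keep]), order[num_keep:]
    groups = {k: [tokens[k]] for k in keep_idx}
    unmerged = []
    for i in low_idx:
        best = max(keep_idx, key=lambda k: cosine(tokens[i], tokens[k]))
        if cosine(tokens[i], tokens[best]) >= threshold:
            groups[best].append(tokens[i])
        else:
            unmerged.append(tokens[i])
    # Each kept token becomes the element-wise mean of its group.
    merged = [
        [sum(column) / len(groups[k]) for column in zip(*groups[k])]
        for k in keep_idx
    ]
    return merged, unmerged


def retrieve(query, bank, top_k=32):
    """Stage 3 (assumed form): return the top_k bank entries most similar to the query."""
    ranked = sorted(bank, key=lambda tok: cosine(query, tok), reverse=True)
    return ranked[:top_k]


# Toy two-dimensional "tokens"; real hidden states are far larger.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.2]]
task_vector = [1.0, 0.0]

scores = importance_scores(tokens, task_vector)
core_memory, latent_bank = merge_tokens(tokens, scores, num_keep=1, threshold=0.5)
retrieved = retrieve([0.0, 1.0], latent_bank, top_k=1)

print("core memory:", core_memory)
print("latent bank:", latent_bank)
print("retrieved:  ", retrieved)
```

In this toy run the second token is similar enough to the top-scoring token to be merged into it, while the other two fall below the `0.5` threshold and remain available for retrieval. The `threshold` and `top_k` defaults mirror `merge_similarity_threshold` and `retrieval_top_k` from the configuration table.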
