# Test-Time Model Merging (TTMM) Implementation

This directory contains the implementation of Test-Time Model Merging (TTMM), which trains specialized language model adapters (one per data cluster) and merges them at inference time according to the input context.

## File Descriptions

### bisectingCluster.py
Clusters the dataset embeddings with the bisecting k-means algorithm (`BisectingKMeans`). Takes a configuration file as input, divides the dataset into k clusters (k is set in the config), and saves the cluster assignments and centroids.
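For readers new to the algorithm, here is a minimal NumPy sketch of bisecting k-means: start with a single cluster and repeatedly split the cluster with the largest within-cluster sum of squared errors (SSE) into two using 2-means, until `k` clusters exist. Splitting the highest-SSE cluster mirrors the default `bisecting_strategy="biggest_inertia"` of scikit-learn's `BisectingKMeans`. Whether bisectingCluster.py uses that class or these settings is an assumption, and the function names here are illustrative.

```python
import numpy as np

def _two_means(X, rng, n_iter=20):
    """Split X into two groups with k-means (k=2); return a boolean mask for group 1."""
    centroids = X[rng.choice(len(X), size=2, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to both centroids, shape (n, 2).
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(2):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels == 1

def bisecting_kmeans(X, k, seed=0):
    """Split the highest-SSE cluster in two until k clusters exist; return labels and centroids."""
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(seed)
    labels = np.zeros(len(X), dtype=int)
    for new_label in range(1, k):
        # Within-cluster sum of squared errors for every current cluster.
        sse = [((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in range(new_label)]
        members = np.flatnonzero(labels == int(np.argmax(sse)))
        if len(members) < 2:
            raise ValueError("cannot bisect a cluster with fewer than two points")
        mask = _two_means(X[members], rng)
        if mask.all() or not mask.any():
            raise ValueError("degenerate split; the data may contain duplicate points")
        labels[members[mask]] = new_label
    centroids = np.stack([X[labels == c].mean(axis=0) for c in range(k)])
    return labels, centroids
```

With scikit-learn installed, `BisectingKMeans(n_clusters=k).fit(X)` exposes the equivalent results as `labels_` and `cluster_centers_`.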

### createDatasetEmbeddings.py
Generates embeddings for the entire dataset with SentenceTransformer. To handle large datasets efficiently, it processes the data in batches and uses memory-mapped arrays and checkpointing. These embeddings are the input to clustering.
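The batching, memory-mapping, and checkpointing pattern can be sketched as follows, assuming embeddings are written to a `.npy` file through `numpy.lib.format.open_memmap` and progress is recorded in a small JSON checkpoint. The function name, file layout, and checkpoint format are hypothetical, not necessarily what createDatasetEmbeddings.py does. `encode` is any callable that maps a list of strings to a `(batch, dim)` array.

```python
import json
import os

import numpy as np

def embed_in_batches(texts, encode, dim, out_path, ckpt_path, batch_size=256):
    """Embed `texts` batch by batch into a memory-mapped .npy file, resuming from a checkpoint."""
    n = len(texts)
    start = 0
    mode = "w+"  # create a new array on disk
    if os.path.exists(ckpt_path) and os.path.exists(out_path):
        with open(ckpt_path) as f:
            start = json.load(f)["next_index"]
        mode = "r+"  # reopen the existing array without truncating it
    emb = np.lib.format.open_memmap(out_path, mode=mode, dtype=np.float32, shape=(n, dim))
    for i in range(start, n, batch_size):
        batch = texts[i:i + batch_size]
        emb[i:i + len(batch)] = encode(batch)
        emb.flush()  # persist the batch before recording progress
        with open(ckpt_path, "w") as f:
            json.dump({"next_index": i + len(batch)}, f)
    return emb
```

With sentence-transformers, `encode` could be `model.encode` for a `model = SentenceTransformer(...)`, and `dim` could be `model.get_sentence_embedding_dimension()`. Because only the checkpoint decides where to resume, an interrupted run restarts at the first batch that was not yet flushed.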

### createPerClusterEmbeddings.py
Computes the mean embedding of each cluster and normalizes it. Stacks all cluster embeddings into a single tensor and saves it; at inference time, TTMM compares each input against these embeddings to select experts.
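A minimal sketch of the per-cluster computation, assuming the embeddings are a NumPy array (possibly memory-mapped) and the labels are integers in `[0, k)`. It averages first and then L2-normalizes each mean, following the order described above; the function name is illustrative.

```python
import numpy as np

def cluster_mean_embeddings(embeddings, labels, k):
    """Return a (k, dim) array of L2-normalized mean embeddings, one row per cluster."""
    labels = np.asarray(labels, dtype=int)
    sums = np.zeros((k, embeddings.shape[1]), dtype=np.float64)
    np.add.at(sums, labels, embeddings)  # add each row to its cluster's running sum
    counts = np.bincount(labels, minlength=k)
    if np.any(counts == 0):
        raise ValueError("every cluster must have at least one member")
    means = sums / counts[:, None]
    norms = np.linalg.norm(means, axis=1, keepdims=True)
    return (means / norms).astype(np.float32)
```

The result can be turned into a single tensor with `torch.from_numpy(...)` if the rest of the pipeline expects PyTorch.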

### generateDataset.py
Takes the full dataset and the cluster assignments, adds a cluster label to each example, and splits the dataset by cluster. Each cluster is saved as a separate dataset file, which is used to train that cluster's adapter.
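The labeling and splitting step can be sketched with plain Python records and one JSONL file per cluster. The `"cluster"` field name and the `cluster_{label}.jsonl` file naming are hypothetical; generateDataset.py may instead use a dataset library (for example Hugging Face `datasets`, with `Dataset.add_column` and `Dataset.filter`).

```python
import json
import os
from collections import defaultdict

def split_by_cluster(records, labels, out_dir):
    """Attach a cluster label to each record and write one JSONL file per cluster."""
    if len(records) != len(labels):
        raise ValueError("records and labels must have the same length")
    groups = defaultdict(list)
    for record, label in zip(records, labels):
        groups[int(label)].append({**record, "cluster": int(label)})
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for label, rows in sorted(groups.items()):
        path = os.path.join(out_dir, f"cluster_{label}.jsonl")
        with open(path, "w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
        paths[label] = path
    return paths
```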

### trainExpertAdapters.py
Trains a LoRA adapter for each cluster using Parameter-Efficient Fine-Tuning (PEFT), configures the training arguments, and saves the trained adapters for later use by the TTMM model.
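To show what each trained adapter contributes, here is a NumPy sketch of a single LoRA-wrapped linear layer: the base weight `W` stays frozen and the adapter adds a rank-`r` update `(alpha / r) * B @ A`, with `B` initialized to zero so training starts from the base model. The values `r=8` and `alpha=16` are placeholders (in PEFT they correspond to `r` and `lora_alpha` in `LoraConfig`), not the settings used by trainExpertAdapters.py.

```python
import numpy as np

class LoRALinear:
    """A frozen linear layer W plus a low-rank update (alpha / r) * B @ A; only A and B would be trained."""

    def __init__(self, W, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = W.shape
        self.W = W                                 # frozen base weight, shape (out, in)
        self.A = rng.normal(0, 0.01, (r, in_dim))  # down-projection, small random init
        self.B = np.zeros((out_dim, r))            # up-projection, zero init
        self.scale = alpha / r

    def delta(self):
        """The weight update this adapter contributes: (alpha / r) * B @ A."""
        return self.scale * self.B @ self.A

    def __call__(self, x):
        # x has shape (batch, in); the output has shape (batch, out).
        return x @ (self.W + self.delta()).T
```

Because each adapter is just a low-rank `delta()` on top of the same frozen `W`, adapters trained on different clusters can later be combined without touching the base model.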

### TTMM.py
Core implementation of Test-Time Model Merging. For each input, it measures the input's similarity to the cluster embeddings, converts these similarities into adapter weights with an RBF kernel, and loads and merges the adapters using those weights.
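A sketch of RBF-weighted selection and merging, under several assumptions: the weights are `exp(-gamma * ||x - c_i||^2)` normalized to sum to 1, an optional `top_k` keeps only the closest adapters, and merging is a weighted sum of each adapter's full update `B_i @ A_i`. TTMM.py may use a different bandwidth, normalization, or merge rule (for example, combining the `A` and `B` factors separately). When the query and centroids are unit-normalized, `||x - c||^2 = 2 - 2 cos(x, c)`, so the kernel is a monotone function of cosine similarity.

```python
import numpy as np

def rbf_weights(query, centroids, gamma=1.0, top_k=None):
    """Weights proportional to exp(-gamma * ||query - c_i||^2), normalized to sum to 1."""
    sq_dist = ((centroids - query) ** 2).sum(axis=1)
    logits = -gamma * sq_dist
    logits -= logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    if top_k is not None:
        keep = np.argsort(w)[-top_k:]  # indices of the top_k largest weights
        mask = np.zeros(w.shape, dtype=bool)
        mask[keep] = True
        w = np.where(mask, w, 0.0)
    return w / w.sum()

def merge_deltas(deltas, weights):
    """Weighted sum of per-adapter weight updates (e.g. LoRA B @ A products)."""
    return sum(w * d for w, d in zip(weights, deltas))
```

Restricting to `top_k` adapters bounds the merging cost per input, since adapters with zero weight never need to be loaded.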

### testHarness.py
Evaluation script for TTMM. Loads the test datasets, evaluates model performance, and reports perplexity. Testing is configured through command-line arguments.
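Perplexity is the exponential of the mean negative log-likelihood per token. The sketch below pools all tokens across sequences before averaging; testHarness.py may instead average per batch or per sequence, which gives different numbers. For a Hugging Face causal LM called with `labels`, the returned `loss` is already the mean token NLL of that batch.

```python
import numpy as np

def perplexity(token_logprobs):
    """exp(mean negative log-likelihood), pooling the tokens of all sequences."""
    lp = np.concatenate([np.asarray(seq, dtype=np.float64) for seq in token_logprobs])
    return float(np.exp(-lp.mean()))
```

As a sanity check, a model that assigns probability 1/50 to every token has a perplexity of 50.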

### config.json
Configuration file that controls the entire TTMM pipeline. Its parameters include:
- Number of clusters for bisecting k-means
- Paths for dataset, embeddings, and adapter storage

Multiple scripts read this file, so the settings stay consistent throughout the workflow.
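Since every script reads the same file, a shared loader that validates it once helps keep the steps consistent. This is a sketch only: the key names below are hypothetical stand-ins for the cluster count and paths listed above, and the real config.json may name them differently.

```python
import json

# Hypothetical key names; the actual config.json may use different ones.
REQUIRED_KEYS = ("num_clusters", "dataset_path", "embeddings_path", "adapters_dir")

def load_config(path):
    """Load config.json and check that the keys the pipeline relies on are present."""
    with open(path) as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {missing}")
    if not isinstance(config["num_clusters"], int) or config["num_clusters"] < 1:
        raise ValueError("num_clusters must be a positive integer")
    return config
```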

## Workflow

1. Generate embeddings for the dataset (createDatasetEmbeddings.py)
2. Cluster the dataset using these embeddings (bisectingCluster.py)
3. Create per-cluster embeddings for the router (createPerClusterEmbeddings.py)
4. Generate dataset splits based on clusters (generateDataset.py)
5. Train expert adapters for each cluster (trainExpertAdapters.py)
6. Use TTMM for inference by dynamically merging adapters (TTMM.py)
7. Evaluate performance using the test harness (testHarness.py)
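The preparation steps (1 to 5) could be chained by a small driver like the one below. It assumes each script accepts the config path as its only positional argument, which the README states only for bisectingCluster.py. Steps 6 and 7 are left out because TTMM.py is used at inference time and testHarness.py takes its own command-line arguments.

```python
import subprocess
import sys

# Preparation steps, in the order listed above.
STEPS = [
    "createDatasetEmbeddings.py",
    "bisectingCluster.py",
    "createPerClusterEmbeddings.py",
    "generateDataset.py",
    "trainExpertAdapters.py",
]

def run_pipeline(config_path="config.json", dry_run=False):
    """Run each step in order, stopping at the first failure; return the commands."""
    commands = [[sys.executable, script, config_path] for script in STEPS]
    for cmd in commands:
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Passing `dry_run=True` returns the command list without executing anything, which is useful for checking the order before a long run.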

## Benchmarking Test-Time Efficiency

The `benchmark_merging` directory contains Python scripts that benchmark generation speed, adapter selection speed, and merging speed.
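A generic timing helper of the kind such benchmarks typically rely on is sketched below; the name and defaults are hypothetical and not taken from `benchmark_merging`. It runs a few warm-up calls first, so one-time costs such as lazy initialization are excluded, then reports the median of several timed runs, which is less sensitive to outliers than the mean.

```python
import statistics
import time

def benchmark(fn, *args, warmup=3, repeats=10):
    """Return the median wall-clock time of fn(*args) in seconds, after warm-up calls."""
    for _ in range(warmup):
        fn(*args)  # absorb one-time costs such as lazy initialization
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```

When timing GPU work in PyTorch, `fn` should call `torch.cuda.synchronize()` before returning; otherwise the clock stops before the asynchronous kernels finish.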
