# Install dependencies
pip install -r requirements.txt
# Verify installation
python -c "import gradient_attribution; print('✅ Installation successful!')"
# Set up environment variables and cache directories
python setup_environment.py
# Run the complete TIDPO training pipeline
python run_tidpo_example.py
This script runs the pipeline end to end and writes its output under the .cache/ directory.
python -u train.py \
model=gpt2_small \
datasets=[hh] \
loss=sft \
exp_name=my_experiment \
batch_size=4 \
eval_batch_size=4 \
n_epochs=1 \
lr=1e-5 \
max_length=256 \
max_prompt_length=128 \
gradient_accumulation_steps=1 \
activation_checkpointing=true
python -u train.py \
model=gpt2_small \
datasets=[hh] \
loss=tidpo \
exp_name=my_experiment \
batch_size=4 \
eval_batch_size=4 \
n_epochs=1 \
lr=1e-5 \
max_length=256 \
max_prompt_length=128 \
gradient_accumulation_steps=1 \
activation_checkpointing=true
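The two commands above differ only in the loss override, so they can be generated from one shared set of key=value pairs. The following is a minimal sketch, not part of the repository: `build_train_command` and the `shared` dictionary are hypothetical helpers introduced for illustration, and the override names are copied from the commands above.

```python
import shlex

def build_train_command(overrides, script="train.py"):
    """Build the argv list for `python -u train.py key=value ...` (hypothetical helper)."""
    args = ["python", "-u", script]
    for key, value in overrides.items():
        if isinstance(value, bool):
            # The commands above use lowercase true/false
            value = str(value).lower()
        elif isinstance(value, (list, tuple)):
            # Lists are written as [a,b] with no spaces, e.g. datasets=[hh]
            value = "[" + ",".join(str(v) for v in value) + "]"
        args.append(f"{key}={value}")
    return args

# Settings shared by both stages, taken from the commands above
shared = {
    "model": "gpt2_small",
    "datasets": ["hh"],
    "exp_name": "my_experiment",
    "batch_size": 4,
    "eval_batch_size": 4,
    "n_epochs": 1,
    "lr": "1e-5",
    "max_length": 256,
    "max_prompt_length": 128,
    "gradient_accumulation_steps": 1,
    "activation_checkpointing": True,
}

sft_cmd = build_train_command({**shared, "loss": "sft"})
tidpo_cmd = build_train_command({**shared, "loss": "tidpo"})

# shlex.join quotes arguments such as datasets=[hh] so the shell passes them through unchanged
print(shlex.join(sft_cmd))
print(shlex.join(tidpo_cmd))
```

Either list can be passed to `subprocess.run(cmd, check=True)` to launch the stages one after the other.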
TIDPO extends TDPO to give finer-grained control over preference learning. The TDPO loss is:
L_TDPO = -log σ(β * Σ_t [log π_θ(y_t) - log π_ref(y_t)] - α * δ)
TIDPO introduces token importance weights based on gradient attribution:
L_TIDPO = -log σ(β * Σ_t w_t * [log π_θ(y_t) - log π_ref(y_t)] - α * δ)
Where w_t is the importance weight calculated using gradient attribution.
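To make the weighting concrete, here is a minimal pure-Python sketch of the L_TIDPO formula exactly as written above, for a single sequence. It is an illustration under assumptions, not the repository's implementation: the function name `tidpo_loss` and its argument names are hypothetical, δ is passed in as a precomputed number, and the defaults for β and α are taken from config/loss/tidpo.yaml.

```python
import math

def tidpo_loss(logp_policy, logp_ref, weights, delta, beta=0.1, alpha=0.5):
    """-log sigmoid(beta * sum_t w_t * (log pi_theta(y_t) - log pi_ref(y_t)) - alpha * delta)."""
    weighted_sum = sum(
        w * (lp - lr) for w, lp, lr in zip(weights, logp_policy, logp_ref)
    )
    z = beta * weighted_sum - alpha * delta
    # -log(sigmoid(z)) equals log(1 + exp(-z)); this form avoids overflow for large |z|
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# Per-token log-probabilities under the policy and the reference model (made-up numbers)
logp_policy = [-1.0, -2.0, -0.5]
logp_ref = [-1.5, -2.0, -1.0]

# Uniform weights reduce the expression to the TDPO form
tdpo_value = tidpo_loss(logp_policy, logp_ref, [1.0, 1.0, 1.0], delta=0.1)

# Emphasizing the first token changes its contribution to the sum
tidpo_value = tidpo_loss(logp_policy, logp_ref, [2.0, 0.5, 0.5], delta=0.1)
print(tdpo_value, tidpo_value)
```

Setting every w_t to 1 recovers the L_TDPO expression, which is a quick sanity check when experimenting with different weighting schemes.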
TIDPO incorporates triplet loss to enhance training by learning better representations:
L_triplet = max(d(anchor, positive) - d(anchor, negative) + margin, 0)
Where:
- anchor: Reference model outputs
- positive: Chosen responses
- negative: Rejected responses
- d(·,·): Distance function (typically L2 norm)
- margin: Minimum distance margin (default: 0.2)

The complete TIDPO loss combines both components:
L_total = L_TIDPO + α_triplet * L_triplet
Where α_triplet controls the weight of triplet loss (default: 0.2).
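The triplet term and the combined loss can be sketched in a few lines of plain Python. This is only an illustration of the two formulas above, assuming the representations are plain lists of floats and d is the L2 distance; the function names are hypothetical and the defaults (margin 0.2, α_triplet 0.2) are the ones stated above.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    return max(
        l2_distance(anchor, positive) - l2_distance(anchor, negative) + margin,
        0.0,
    )

def total_loss(l_tidpo, l_triplet, alpha_triplet=0.2):
    """L_total = L_TIDPO + alpha_triplet * L_triplet."""
    return l_tidpo + alpha_triplet * l_triplet

anchor = [0.0, 0.0]
positive = [1.0, 0.0]

# Negative is far from the anchor: the margin is already satisfied, so the loss is zero
far_negative = [0.0, 3.0]
satisfied = triplet_loss(anchor, positive, far_negative)

# Negative is as close as the positive: the loss equals the margin
near_negative = [0.0, 1.0]
violated = triplet_loss(anchor, positive, near_negative)

print(satisfied, violated, total_loss(1.0, violated))
```

The term is zero whenever the rejected response is at least `margin` farther from the anchor than the chosen response, so it only contributes gradient for triplets that violate the margin.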
The gradient attribution module calculates token importance by:
The complete training pipeline consists of two stages:
Key configuration files:
- config/config.yaml: Main configuration
- config/loss/tidpo.yaml: TIDPO-specific parameters
- config/model/gpt2_small.yaml: Model configuration
- config/config_memory_optimized.yaml: Memory-optimized settings

Supported models:
- gpt2_small: GPT-2 small (124M parameters)
- gpt2_large: GPT-2 large (774M parameters)
- pythia28: Pythia-2.8B
- pythia69: Pythia-6.9B
- llama7b: LLaMA-7B
- mistral7b: Mistral-7B
- mistral7b_instruct: Mistral-7B-Instruct
- llama3b: LLaMA-3B

Supported datasets:
- hh: Anthropic's Helpful and Harmless (HH) dataset
- shp: Stanford Human Preferences dataset
- se: StackExchange dataset

Evaluation benchmarks: MMLU, TruthfulQA, GSM8K, MTBench, etc.
# config/loss/tidpo.yaml
name: tidpo
use_tidpo: true # Enable TIDPO
alpha_triplet: 0.2 # Triplet loss weight
gamma: 0.1 # Loss combination weight
enable_gradient_attribution: true # Enable gradient attribution
alpha: 0.5 # KL divergence weight
beta: 0.1 # Temperature parameter
For limited GPU memory:
# config/config_memory_optimized.yaml
batch_size: 4
eval_batch_size: 4
max_length: 512
max_prompt_length: 256
gradient_accumulation_steps: 1
activation_checkpointing: true
Recommended settings:
| Parameter | SFT | TIDPO |
|---|---|---|
| Learning Rate | 1e-5 | 1e-5 |
| Batch Size | 4-16 | 4-16 |
| Epochs | 1 | 1-3 |
| Max Length | 256 | 256 |
| Gradient Accumulation | 1-4 | 1-4 |
from gradient_attribution import compute_language_model_gradient_attribution
# Calculate token importance
tokens, importances = compute_language_model_gradient_attribution(
    model=model,
    tokenizer=tokenizer,
    text="Your input text here",
    device=device,
)
def custom_importance_function(model, tokenizer, text, device):
    # Start from the built-in gradient attribution scores
    tokens, importances = compute_language_model_gradient_attribution(
        model=model, tokenizer=tokenizer, text=text, device=device
    )
    # Apply your custom logic here (placeholder: scores are passed through unchanged)
    modified_importances = importances
    return modified_importances
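As one example of custom logic, the raw scores can be rescaled so that they average to 1, which keeps the weighted sum on the same scale as the unweighted one. This is a hypothetical post-processing step, not something the repository is known to do: `normalize_importances` is an illustrative name, and the sketch assumes `importances` can be treated as a flat sequence of numbers.

```python
def normalize_importances(importances, eps=1e-8):
    """Rescale importance magnitudes so their mean is 1 (hypothetical helper)."""
    # Use magnitudes so that negative attribution scores still count as important
    magnitudes = [abs(float(x)) for x in importances]
    if not magnitudes:
        return []
    mean = sum(magnitudes) / len(magnitudes)
    if mean < eps:
        # All scores are (near) zero: fall back to uniform weights
        return [1.0] * len(magnitudes)
    return [m / mean for m in magnitudes]

# Made-up raw scores for a four-token input
raw_scores = [0.2, -0.6, 0.0, 0.8]
weights = normalize_importances(raw_scores)
print(weights)
```

Inside `custom_importance_function`, the placeholder logic could then be replaced by a call such as `modified_importances = normalize_importances(importances)`.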
TIDPO includes triplet loss for enhanced training:
# Triplet loss is automatically computed when alpha_triplet > 0
alpha_triplet: 0.2 # Enable triplet loss
Run the comprehensive test suite:
# Test gradient attribution
python test_gradient_attribution.py
# Test TIDPO functionality
python test_tidpo.py
# Test triplet loss
python test_triplet_loss.py
# Test batch processing
python test_batch_size_fix.py
# Debug batch issues
python debug_batch_issue.py
# Monitor training progress
tail -f .cache/your_experiment_name_*/train.log
# Check GPU usage
nvidia-smi -l 1
# Enable debug mode for detailed output
python -u train.py ... debug=true
Symptoms: CUDA out of memory errors
Solutions:
- Set batch_size: 2
- Set activation_checkpointing: true
- Use config/config_memory_optimized.yaml
- Set gradient_accumulation_steps: 4

Symptoms: "can't retain_grad on Tensor that has requires_grad=False"
Solutions:
- Check that inputs_embeds requires gradients (requires_grad=True) before retain_grad is called on it

Symptoms: Loss becomes NaN during training
Solutions:
- Use float32 precision: policy_dtype: float32
- Set lr: 1e-6
- Set max_grad_norm: 1.0

Symptoms: "cannot reshape tensor of 0 elements"
Solutions:
- Set batch_size: 4

General tips:
- Use float16 for faster training
- Adjust max_length for memory constraints

We welcome contributions! Please follow these steps:
- Create a feature branch: git checkout -b feature-name
- Run the tests: python -m pytest tests/

# Install development dependencies
pip install -r requirements.txt
# Run tests
python -m pytest tests/
# Run linting
flake8 .
# Run type checking
mypy .
This project is licensed under the MIT License - see the LICENSE file for details.
Note: This is a research implementation. For production use, additional testing and optimization may be required.