<!DOCTYPE html>
<html>

<head>
    <title>README.md</title>
    <meta http-equiv="Content-type" content="text/html;charset=UTF-8">
    
<style>
/* https://github.com/microsoft/vscode/blob/master/extensions/markdown-language-features/media/markdown.css */
/*---------------------------------------------------------------------------------------------
 *  Copyright (c) Microsoft Corporation. All rights reserved.
 *  Licensed under the MIT License. See License.txt in the project root for license information.
 *--------------------------------------------------------------------------------------------*/

body {
	font-family: var(--vscode-markdown-font-family, -apple-system, BlinkMacSystemFont, "Segoe WPC", "Segoe UI", "Ubuntu", "Droid Sans", sans-serif);
	font-size: var(--vscode-markdown-font-size, 14px);
	padding: 0 26px;
	line-height: var(--vscode-markdown-line-height, 22px);
	word-wrap: break-word;
}

html,footer,header{
	font-family: var(--vscode-markdown-font-family, -apple-system, BlinkMacSystemFont, "Segoe WPC", "Segoe UI", "Ubuntu", "Droid Sans", sans-serif);
	font-size: var(--vscode-markdown-font-size, 14px);
}

#code-csp-warning {
	position: fixed;
	top: 0;
	right: 0;
	color: white;
	margin: 16px;
	text-align: center;
	font-size: 12px;
	font-family: sans-serif;
	background-color:#444444;
	cursor: pointer;
	padding: 6px;
	box-shadow: 1px 1px 1px rgba(0,0,0,.25);
}

#code-csp-warning:hover {
	text-decoration: none;
	background-color:#007acc;
	box-shadow: 2px 2px 2px rgba(0,0,0,.25);
}

body.scrollBeyondLastLine {
	margin-bottom: calc(100vh - 22px);
}

body.showEditorSelection .code-line {
	position: relative;
}

body.showEditorSelection .code-active-line:before,
body.showEditorSelection .code-line:hover:before {
	content: "";
	display: block;
	position: absolute;
	top: 0;
	left: -12px;
	height: 100%;
}

body.showEditorSelection li.code-active-line:before,
body.showEditorSelection li.code-line:hover:before {
	left: -30px;
}

.vscode-light.showEditorSelection .code-active-line:before {
	border-left: 3px solid rgba(0, 0, 0, 0.15);
}

.vscode-light.showEditorSelection .code-line:hover:before {
	border-left: 3px solid rgba(0, 0, 0, 0.40);
}

.vscode-light.showEditorSelection .code-line .code-line:hover:before {
	border-left: none;
}

.vscode-dark.showEditorSelection .code-active-line:before {
	border-left: 3px solid rgba(255, 255, 255, 0.4);
}

.vscode-dark.showEditorSelection .code-line:hover:before {
	border-left: 3px solid rgba(255, 255, 255, 0.60);
}

.vscode-dark.showEditorSelection .code-line .code-line:hover:before {
	border-left: none;
}

.vscode-high-contrast.showEditorSelection .code-active-line:before {
	border-left: 3px solid rgba(255, 160, 0, 0.7);
}

.vscode-high-contrast.showEditorSelection .code-line:hover:before {
	border-left: 3px solid rgba(255, 160, 0, 1);
}

.vscode-high-contrast.showEditorSelection .code-line .code-line:hover:before {
	border-left: none;
}

img {
	max-width: 100%;
	max-height: 100%;
}

a {
	text-decoration: none;
}

a:hover {
	text-decoration: underline;
}

a:focus,
input:focus,
select:focus,
textarea:focus {
	outline: 1px solid -webkit-focus-ring-color;
	outline-offset: -1px;
}

hr {
	border: 0;
	height: 2px;
	border-bottom: 2px solid;
}

h1 {
	padding-bottom: 0.3em;
	line-height: 1.2;
	border-bottom-width: 1px;
	border-bottom-style: solid;
}

h1, h2, h3 {
	font-weight: normal;
}

table {
	border-collapse: collapse;
}

table > thead > tr > th {
	text-align: left;
	border-bottom: 1px solid;
}

table > thead > tr > th,
table > thead > tr > td,
table > tbody > tr > th,
table > tbody > tr > td {
	padding: 5px 10px;
}

table > tbody > tr + tr > td {
	border-top: 1px solid;
}

blockquote {
	margin: 0 7px 0 5px;
	padding: 0 16px 0 10px;
	border-left-width: 5px;
	border-left-style: solid;
}

code {
	font-family: Menlo, Monaco, Consolas, "Droid Sans Mono", "Courier New", monospace, "Droid Sans Fallback";
	font-size: 1em;
	line-height: 1.357em;
}

body.wordWrap pre {
	white-space: pre-wrap;
}

pre:not(.hljs),
pre.hljs code > div {
	padding: 16px;
	border-radius: 3px;
	overflow: auto;
}

pre code {
	color: var(--vscode-editor-foreground);
	tab-size: 4;
}

/** Theming */

.vscode-light pre {
	background-color: rgba(220, 220, 220, 0.4);
}

.vscode-dark pre {
	background-color: rgba(10, 10, 10, 0.4);
}

.vscode-high-contrast pre {
	background-color: rgb(0, 0, 0);
}

.vscode-high-contrast h1 {
	border-color: rgb(0, 0, 0);
}

.vscode-light table > thead > tr > th {
	border-color: rgba(0, 0, 0, 0.69);
}

.vscode-dark table > thead > tr > th {
	border-color: rgba(255, 255, 255, 0.69);
}

.vscode-light h1,
.vscode-light hr,
.vscode-light table > tbody > tr + tr > td {
	border-color: rgba(0, 0, 0, 0.18);
}

.vscode-dark h1,
.vscode-dark hr,
.vscode-dark table > tbody > tr + tr > td {
	border-color: rgba(255, 255, 255, 0.18);
}

</style>

<style>
/* Tomorrow Theme */
/* http://jmblog.github.com/color-themes-for-google-code-highlightjs */
/* Original theme - https://github.com/chriskempson/tomorrow-theme */

/* Tomorrow Comment */
.hljs-comment,
.hljs-quote {
	color: #8e908c;
}

/* Tomorrow Red */
.hljs-variable,
.hljs-template-variable,
.hljs-tag,
.hljs-name,
.hljs-selector-id,
.hljs-selector-class,
.hljs-regexp,
.hljs-deletion {
	color: #c82829;
}

/* Tomorrow Orange */
.hljs-number,
.hljs-built_in,
.hljs-builtin-name,
.hljs-literal,
.hljs-type,
.hljs-params,
.hljs-meta,
.hljs-link {
	color: #f5871f;
}

/* Tomorrow Yellow */
.hljs-attribute {
	color: #eab700;
}

/* Tomorrow Green */
.hljs-string,
.hljs-symbol,
.hljs-bullet,
.hljs-addition {
	color: #718c00;
}

/* Tomorrow Blue */
.hljs-title,
.hljs-section {
	color: #4271ae;
}

/* Tomorrow Purple */
.hljs-keyword,
.hljs-selector-tag {
	color: #8959a8;
}

.hljs {
	display: block;
	overflow-x: auto;
	color: #4d4d4c;
	padding: 0.5em;
}

.hljs-emphasis {
	font-style: italic;
}

.hljs-strong {
	font-weight: bold;
}
</style>

<style>
/*
 * Custom MD PDF CSS
 */
html,footer,header{
	font-family: -apple-system, BlinkMacSystemFont, "Segoe WPC", "Segoe UI", "Ubuntu", "Droid Sans", sans-serif, "Meiryo";

 }
body {
	font-family: -apple-system, BlinkMacSystemFont, "Segoe WPC", "Segoe UI", "Ubuntu", "Droid Sans", sans-serif, "Meiryo";
	padding: 0 12px;
}

pre {
	background-color: #f8f8f8;
	border: 1px solid #cccccc;
	border-radius: 3px;
	overflow-x: auto;
	white-space: pre-wrap;
	overflow-wrap: break-word;
}

pre:not(.hljs) {
	padding: 23px;
	line-height: 19px;
}

blockquote {
	background: rgba(127, 127, 127, 0.1);
	border-color: rgba(0, 122, 204, 0.5);
}

.emoji {
	height: 1.4em;
}

code {
	font-size: 14px;
	line-height: 19px;
}

/* for inline code */
:not(pre):not(.hljs) > code {
	color: #C9AE75; /* Change the old color so it seems less like an error */
	font-size: inherit;
}

/* Page Break : use <div class="page"/> to insert page break
-------------------------------------------------------- */
.page {
	page-break-after: always;
}

</style>
</head>

<body>
    <h1 id="token-importance-direct-preference-optimization-tidpo">Token-importance Direct Preference Optimization (TIDPO)</h1>
<p><a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a><br>
<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python 3.8+"></a><br>
<a href="https://pytorch.org/"><img src="https://img.shields.io/badge/PyTorch-1.10+-red.svg" alt="PyTorch"></a></p>
<h2 id="%F0%9F%9A%80-features">🚀 Features</h2>
<ul>
<li><strong>TIDPO Extension</strong>: Token Importance DPO with gradient attribution</li>
<li><strong>Gradient Attribution</strong>: Advanced token importance calculation using gradient-based attribution</li>
<li><strong>Memory Optimization</strong>: Efficient memory usage with gradient checkpointing and mixed precision</li>
<li><strong>Multiple Model Support</strong>: Support for Mistral, Llama, GPT-2, Pythia, and other transformer models</li>
<li><strong>Comprehensive Testing</strong>: Extensive test suite for all components</li>
<li><strong>Easy Configuration</strong>: YAML-based configuration system</li>
</ul>
<h2 id="%F0%9F%93%8B-table-of-contents">📋 Table of Contents</h2>
<ul>
<li><a href="#%F0%9F%94%A7-installation">Installation</a></li>
<li><a href="#%F0%9F%9A%80-quick-start">Quick Start</a></li>
<li><a href="#%F0%9F%A7%A0-core-concepts">Core Concepts</a></li>
<li><a href="#%F0%9F%93%96-usage">Usage</a></li>
<li><a href="#%E2%9A%99%EF%B8%8F-configuration">Configuration</a></li>
<li><a href="#%F0%9F%94%AC-advanced-features">Advanced Features</a></li>
<li><a href="#common-issues">Troubleshooting</a></li>
<li><a href="#%F0%9F%A4%9D-contributing">Contributing</a></li>
</ul>
<h2 id="%F0%9F%94%A7-installation">🔧 Installation</h2>
<h3 id="prerequisites">Prerequisites</h3>
<ul>
<li>Python 3.8+</li>
<li>PyTorch 1.10+</li>
<li>CUDA (optional, for GPU acceleration)</li>
</ul>
<h3 id="install-dependencies">Install Dependencies</h3>
<pre class="hljs"><code><div><span class="hljs-comment"># Install dependencies</span>
pip install -r requirements.txt

<span class="hljs-comment"># Verify installation</span>
python -c <span class="hljs-string">"import gradient_attribution; print('✅ Installation successful!')"</span>
</div></code></pre>
<h3 id="environment-setup">Environment Setup</h3>
<pre class="hljs"><code><div><span class="hljs-comment"># Set up environment variables and cache directories</span>
python setup_environment.py
</div></code></pre>
<h2 id="%F0%9F%9A%80-quick-start">🚀 Quick Start</h2>
<h3 id="method-1-use-the-example-script-recommended">Method 1: Use the Example Script (Recommended)</h3>
<pre class="hljs"><code><div><span class="hljs-comment"># Run the complete TIDPO training pipeline</span>
python run_tidpo_example.py
</div></code></pre>
<p>This script will:</p>
<ol>
<li>Perform supervised fine-tuning (SFT)</li>
<li>Run TIDPO training with gradient attribution</li>
<li>Save models and logs to <code>.cache/</code> directory</li>
</ol>
<h3 id="method-2-manual-training">Method 2: Manual Training</h3>
<h4 id="step-1-supervised-fine-tuning-sft">Step 1: Supervised Fine-tuning (SFT)</h4>
<pre class="hljs"><code><div>python -u train.py \
    model=gpt2_small \
    datasets=[hh] \
    loss=sft \
    exp_name=my_experiment \
    batch_size=4 \
    eval_batch_size=4 \
    n_epochs=1 \
    lr=1e-5 \
    max_length=256 \
    max_prompt_length=128 \
    gradient_accumulation_steps=1 \
    activation_checkpointing=<span class="hljs-literal">true</span>
</div></code></pre>
<h4 id="step-2-tidpo-training">Step 2: TIDPO Training</h4>
<pre class="hljs"><code><div>python -u train.py \
    model=gpt2_small \
    datasets=[hh] \
    loss=tidpo \
    exp_name=my_experiment \
    batch_size=4 \
    eval_batch_size=4 \
    n_epochs=1 \
    lr=1e-5 \
    max_length=256 \
    max_prompt_length=128 \
    gradient_accumulation_steps=1 \
    activation_checkpointing=<span class="hljs-literal">true</span>
</div></code></pre>
<h2 id="%F0%9F%A7%A0-core-concepts">🧠 Core Concepts</h2>
<h3 id="tidpo-algorithm">TIDPO Algorithm</h3>
<p>TIDPO extends TDPO (Token-level Direct Preference Optimization), which gives finer-grained, token-level control over preference learning. The TDPO loss it builds on is:</p>
<pre class="hljs"><code><div>L_TDPO = -log σ(β * Σ_t [log π_θ(y_t) - log π_ref(y_t)] - α * δ)
</div></code></pre>
<h3 id="tidpo-extension">TIDPO Extension</h3>
<p>TIDPO introduces token importance weights based on gradient attribution:</p>
<pre class="hljs"><code><div>L_TIDPO = -log σ(β * Σ_t w_t * [log π_θ(y_t) - log π_ref(y_t)] - α * δ)
</div></code></pre>
<p>Here <code>w_t</code> is the importance weight of token <code>y_t</code>, computed by gradient attribution.</p>
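<p>As a minimal numeric sketch of the formula above (not the repository's implementation), the weighted loss can be evaluated in plain Python. The function name <code>tidpo_loss</code>, its argument names, and the choice to pass <code>δ</code> in as a precomputed scalar are assumptions made for illustration; the defaults mirror <code>beta: 0.1</code> and <code>alpha: 0.5</code> from <code>config/loss/tidpo.yaml</code>.</p>
<pre class="hljs"><code><div>import math


def tidpo_loss(policy_logps, ref_logps, weights, delta, beta=0.1, alpha=0.5):
    """Evaluate -log(sigmoid(beta * sum_t w_t * (logp_policy - logp_ref) - alpha * delta))."""
    # Importance-weighted sum of per-token log-ratios (policy vs. reference)
    weighted_sum = sum(
        w * (p - r) for w, p, r in zip(weights, policy_logps, ref_logps)
    )
    z = beta * weighted_sum - alpha * delta
    # -log(sigmoid(z)) equals log(1 + exp(-z)); this form avoids overflow
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical per-token log-probabilities for a three-token response
policy_logps = [-1.0, -0.5, -2.0]
ref_logps = [-1.2, -0.9, -1.8]
weights = [0.2, 0.7, 0.1]  # importance weights w_t (illustrative values)

loss = tidpo_loss(policy_logps, ref_logps, weights, delta=0.0)
print(round(loss, 4))
</div></code></pre>
<p>With every weight set to 1, the same function evaluates the TDPO expression shown earlier, so the weights are the only difference between the two losses in this sketch.</p>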
<h3 id="triplet-loss-component">Triplet Loss Component</h3>
<p>TIDPO incorporates triplet loss to enhance training by learning better representations:</p>
<pre class="hljs"><code><div>L_triplet = max(d(anchor, positive) - d(anchor, negative) + margin, 0)
</div></code></pre>
<p>Where:</p>
<ul>
<li><code>anchor</code>: Reference model outputs</li>
<li><code>positive</code>: Chosen responses</li>
<li><code>negative</code>: Rejected responses</li>
<li><code>d(·,·)</code>: Distance function (typically L2 norm)</li>
<li><code>margin</code>: Minimum distance margin (default: 0.2)</li>
</ul>
<p>The complete TIDPO loss combines both components:</p>
<pre class="hljs"><code><div>L_total = L_TIDPO + α_triplet * L_triplet
</div></code></pre>
<p>Where <code>α_triplet</code> controls the weight of triplet loss (default: 0.2).</p>
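<p>The two formulas above can be sketched in plain Python as follows. This is an illustration under assumptions, not the repository's code: the helper names are hypothetical, the representations are short lists of floats, and the L2 distance is used because the text names it as the typical choice.</p>
<pre class="hljs"><code><div>import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    gap = l2_distance(anchor, positive) - l2_distance(anchor, negative)
    return max(gap + margin, 0.0)


def total_loss(l_tidpo, l_triplet, alpha_triplet=0.2):
    """L_total = L_TIDPO + alpha_triplet * L_triplet."""
    return l_tidpo + alpha_triplet * l_triplet


# Hypothetical 2-D representations
anchor = [0.0, 0.0]
chosen = [0.3, 0.4]    # distance 0.5 from the anchor
rejected = [0.6, 0.8]  # distance 1.0 from the anchor

l_triplet = triplet_loss(anchor, chosen, rejected)
print(l_triplet)                   # zero: chosen is closer by more than the margin
print(total_loss(0.68, l_triplet))
</div></code></pre>
<p>The loss is zero whenever the chosen response is already closer to the anchor than the rejected one by at least <code>margin</code>, so the triplet term only contributes for pairs that are not yet well separated.</p>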
<h3 id="gradient-attribution">Gradient Attribution</h3>
<p>The gradient attribution module calculates token importance by:</p>
<ol>
<li>Computing gradients with respect to input embeddings</li>
<li>Using L1 norm for importance scoring</li>
<li>Normalizing scores for stable training</li>
<li>Applying a mixed strategy that blends the scores with a Gaussian prior for robustness</li>
</ol>
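<p>Steps 2 to 4 above can be sketched in plain Python, assuming step 1 has already produced one gradient vector per token. The function name <code>token_importance</code>, the <code>mix</code> coefficient, and the shape of the prior (a Gaussian over token positions centered mid-sequence) are assumptions for illustration; the README does not specify how the prior is defined, so the actual module may differ.</p>
<pre class="hljs"><code><div>import math


def token_importance(embedding_grads, mix=0.5, sigma=None):
    """Turn per-token embedding gradients into importance weights that sum to 1."""
    n = len(embedding_grads)
    if n == 0:
        return []
    # Step 2: L1 norm of each token's gradient vector
    scores = [sum(abs(g) for g in grad) for grad in embedding_grads]
    # Step 3: normalize to sum to 1 (uniform fallback if every gradient is zero)
    total = sum(scores)
    scores = [s / total for s in scores] if total else [1.0 / n] * n
    # Step 4 (assumed form): Gaussian prior over positions, centered mid-sequence
    center = (n - 1) / 2.0
    sigma = sigma or max(n / 4.0, 1.0)
    prior = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(n)]
    prior_total = sum(prior)
    prior = [p / prior_total for p in prior]
    # Convex mix of gradient-based scores and the prior
    return [(1 - mix) * s + mix * p for s, p in zip(scores, prior)]


# Hypothetical gradients for three tokens with 2-D embeddings
grads = [[1.0, -1.0], [0.0, 0.0], [2.0, 2.0]]
print(token_importance(grads, mix=0.0))  # pure gradient-based scores
print(token_importance(grads, mix=0.5))  # blended with the prior
</div></code></pre>
<p>Because both the scores and the prior are normalized, any <code>mix</code> between 0 and 1 keeps the weights summing to 1, and a token with zero gradient still receives a small non-zero weight when <code>mix</code> is positive.</p>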
<h2 id="%F0%9F%93%96-usage">📖 Usage</h2>
<h3 id="training-pipeline">Training Pipeline</h3>
<p>The complete training pipeline consists of two stages:</p>
<ol>
<li><strong>Supervised Fine-tuning (SFT)</strong>: Fine-tune the base model on the preference data</li>
<li><strong>TIDPO Training</strong>: Apply token importance preference optimization</li>
</ol>
<h3 id="configuration-files">Configuration Files</h3>
<p>Key configuration files:</p>
<ul>
<li><code>config/config.yaml</code>: Main configuration</li>
<li><code>config/loss/tidpo.yaml</code>: TIDPO-specific parameters</li>
<li><code>config/model/gpt2_small.yaml</code>: Model configuration</li>
<li><code>config/config_memory_optimized.yaml</code>: Memory-optimized settings</li>
</ul>
<h3 id="available-models">Available Models</h3>
<ul>
<li><code>gpt2_small</code>: GPT-2 small (124M parameters)</li>
<li><code>gpt2_large</code>: GPT-2 large (774M parameters)</li>
<li><code>pythia28</code>: Pythia-2.8B</li>
<li><code>pythia69</code>: Pythia-6.9B</li>
<li><code>llama7b</code>: LLaMA-7B</li>
<li><code>mistral7b</code>: Mistral-7B</li>
<li><code>mistral7b_instruct</code>: Mistral-7B-Instruct</li>
<li><code>llama3b</code>: LLaMA-3B</li>
</ul>
<h3 id="available-datasets">Available Datasets</h3>
<ul>
<li><code>hh</code>: Anthropic's Helpful and Harmless (HH-RLHF) dataset</li>
<li><code>shp</code>: Stanford Human Preferences dataset</li>
<li><code>se</code>: StackExchange dataset</li>
</ul>
<p>Evaluation benchmarks: MMLU, TruthfulQA, GSM8K, MT-Bench, etc.</p>
<h2 id="%E2%9A%99%EF%B8%8F-configuration">⚙️ Configuration</h2>
<h3 id="tidpo-parameters">TIDPO Parameters</h3>
<pre class="hljs"><code><div><span class="hljs-comment"># config/loss/tidpo.yaml</span>
<span class="hljs-attr">name:</span> <span class="hljs-string">tidpo</span>
<span class="hljs-attr">use_tidpo:</span> <span class="hljs-literal">true</span>              <span class="hljs-comment"># Enable TIDPO</span>
<span class="hljs-attr">alpha_triplet:</span> <span class="hljs-number">0.2</span>           <span class="hljs-comment"># Triplet loss weight</span>
<span class="hljs-attr">gamma:</span> <span class="hljs-number">0.1</span>                   <span class="hljs-comment"># Loss combination weight</span>
<span class="hljs-attr">enable_gradient_attribution:</span> <span class="hljs-literal">true</span>  <span class="hljs-comment"># Enable gradient attribution</span>
<span class="hljs-attr">alpha:</span> <span class="hljs-number">0.5</span>                   <span class="hljs-comment"># KL divergence weight</span>
<span class="hljs-attr">beta:</span> <span class="hljs-number">0.1</span>                    <span class="hljs-comment"># Temperature parameter</span>
</div></code></pre>
<h3 id="memory-optimization">Memory Optimization</h3>
<p>For limited GPU memory:</p>
<pre class="hljs"><code><div><span class="hljs-comment"># config/config_memory_optimized.yaml</span>
<span class="hljs-attr">batch_size:</span> <span class="hljs-number">4</span>
<span class="hljs-attr">eval_batch_size:</span> <span class="hljs-number">4</span>
<span class="hljs-attr">max_length:</span> <span class="hljs-number">512</span>
<span class="hljs-attr">max_prompt_length:</span> <span class="hljs-number">256</span>
<span class="hljs-attr">gradient_accumulation_steps:</span> <span class="hljs-number">1</span>
<span class="hljs-attr">activation_checkpointing:</span> <span class="hljs-literal">true</span>
</div></code></pre>
<h3 id="training-parameters">Training Parameters</h3>
<p>Recommended settings:</p>
<table>
<thead>
<tr>
<th>Parameter</th>
<th>SFT</th>
<th>TIDPO</th>
</tr>
</thead>
<tbody>
<tr>
<td>Learning Rate</td>
<td>1e-5</td>
<td>1e-5</td>
</tr>
<tr>
<td>Batch Size</td>
<td>4-16</td>
<td>4-16</td>
</tr>
<tr>
<td>Epochs</td>
<td>1</td>
<td>1-3</td>
</tr>
<tr>
<td>Max Length</td>
<td>256</td>
<td>256</td>
</tr>
<tr>
<td>Gradient Accumulation</td>
<td>1-4</td>
<td>1-4</td>
</tr>
</tbody>
</table>
<h2 id="%F0%9F%94%AC-advanced-features">🔬 Advanced Features</h2>
<h3 id="gradient-attribution">Gradient Attribution</h3>
<pre class="hljs"><code><div><span class="hljs-keyword">from</span> gradient_attribution <span class="hljs-keyword">import</span> compute_language_model_gradient_attribution

<span class="hljs-comment"># Calculate token importance</span>
tokens, importances = compute_language_model_gradient_attribution(
    model=model,
    tokenizer=tokenizer,
    text=<span class="hljs-string">"Your input text here"</span>,
    device=device
)
</div></code></pre>
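<p>One way to inspect the result is to rank tokens by their score. The sketch below assumes that <code>tokens</code> is a sequence of strings and <code>importances</code> a same-length sequence of numeric scores, which the call above suggests but this README does not state; the helper <code>top_k_tokens</code> and the sample values are hypothetical.</p>
<pre class="hljs"><code><div>def top_k_tokens(tokens, importances, k=3):
    """Return the k (token, score) pairs with the highest importance."""
    pairs = zip(tokens, [float(score) for score in importances])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical output of the attribution call
tokens = ["The", "movie", "was", "terrible"]
importances = [0.05, 0.30, 0.10, 0.55]

for token, score in top_k_tokens(tokens, importances, k=2):
    print(f"{token}: {score:.2f}")
</div></code></pre>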
<h3 id="custom-token-importance">Custom Token Importance</h3>
<pre class="hljs"><code><div><span class="hljs-keyword">from</span> gradient_attribution <span class="hljs-keyword">import</span> compute_language_model_gradient_attribution

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">custom_importance_function</span><span class="hljs-params">(model, tokenizer, text, device)</span>:</span>
    <span class="hljs-comment"># Start from the built-in gradient attribution scores</span>
    tokens, importances = compute_language_model_gradient_attribution(
        model, tokenizer, text, device
    )
    <span class="hljs-comment"># Apply your custom logic here (placeholder: scores are returned unchanged)</span>
    modified_importances = importances
    <span class="hljs-keyword">return</span> modified_importances
</div></code></pre>
<h3 id="triplet-loss">Triplet Loss</h3>
<p>TIDPO includes triplet loss for enhanced training:</p>
<pre class="hljs"><code><div><span class="hljs-comment"># Triplet loss is automatically computed when alpha_triplet &gt; 0</span>
alpha_triplet: <span class="hljs-number">0.2</span>  <span class="hljs-comment"># Enable triplet loss</span>
</div></code></pre>
<h2 id="testing">Testing</h2>
<p>Run the comprehensive test suite:</p>
<pre class="hljs"><code><div><span class="hljs-comment"># Test gradient attribution</span>
python test_gradient_attribution.py

<span class="hljs-comment"># Test TIDPO functionality</span>
python test_tidpo.py

<span class="hljs-comment"># Test triplet loss</span>
python test_triplet_loss.py

<span class="hljs-comment"># Test batch processing</span>
python test_batch_size_fix.py

<span class="hljs-comment"># Debug batch issues</span>
python debug_batch_issue.py
</div></code></pre>
<h2 id="monitoring-and-debugging">Monitoring and Debugging</h2>
<h3 id="training-logs">Training Logs</h3>
<pre class="hljs"><code><div><span class="hljs-comment"># Monitor training progress</span>
tail -f .cache/your_experiment_name_*/train.log

<span class="hljs-comment"># Check GPU usage</span>
nvidia-smi -l 1
</div></code></pre>
<h3 id="debug-mode">Debug Mode</h3>
<pre class="hljs"><code><div><span class="hljs-comment"># Enable debug mode for detailed output</span>
python -u train.py ... debug=<span class="hljs-literal">true</span>
</div></code></pre>
<h3 id="common-issues">Common Issues</h3>
<h4 id="1-out-of-memory-oom">1. Out of Memory (OOM)</h4>
<p><strong>Symptoms</strong>: CUDA out of memory errors</p>
<p><strong>Solutions</strong>:</p>
<ul>
<li>Reduce batch size: <code>batch_size: 2</code></li>
<li>Enable gradient checkpointing: <code>activation_checkpointing: true</code></li>
<li>Use memory-optimized config: <code>config/config_memory_optimized.yaml</code></li>
<li>Increase gradient accumulation: <code>gradient_accumulation_steps: 4</code></li>
</ul>
<h4 id="2-gradient-attribution-failures">2. Gradient Attribution Failures</h4>
<p><strong>Symptoms</strong>: &quot;can't retain_grad on Tensor that has requires_grad=False&quot;</p>
<p><strong>Solutions</strong>:</p>
<ul>
<li>Ensure model supports <code>inputs_embeds</code></li>
<li>Check text length limits</li>
<li>Verify model is in training mode</li>
</ul>
<h4 id="3-nan-loss-values">3. NaN Loss Values</h4>
<p><strong>Symptoms</strong>: Loss becomes NaN during training</p>
<p><strong>Solutions</strong>:</p>
<ul>
<li>Use <code>float32</code> precision: <code>policy_dtype: float32</code></li>
<li>Reduce learning rate: <code>lr: 1e-6</code></li>
<li>Enable gradient clipping: <code>max_grad_norm: 1.0</code></li>
<li>Check data quality</li>
</ul>
<h4 id="4-empty-batches">4. Empty Batches</h4>
<p><strong>Symptoms</strong>: &quot;cannot reshape tensor of 0 elements&quot;</p>
<p><strong>Solutions</strong>:</p>
<ul>
<li>Increase batch size: <code>batch_size: 4</code></li>
<li>Check data preprocessing</li>
<li>Verify dataset loading</li>
</ul>
<h2 id="%F0%9F%93%8A-performance-optimization">📊 Performance Optimization</h2>
<h3 id="memory-optimization">Memory Optimization</h3>
<ol>
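<li><strong>Gradient Checkpointing</strong>: Reduces memory usage by ~50%, at the cost of recomputing activations during the backward pass</li>
<li><strong>Mixed Precision</strong>: Use <code>float16</code> for faster training; switch back to <code>float32</code> if the loss becomes NaN</li>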
<li><strong>Batch Size Tuning</strong>: Balance memory and training stability</li>
<li><strong>Sequence Length</strong>: Reduce <code>max_length</code> for memory constraints</li>
</ol>
<h3 id="computational-optimization">Computational Optimization</h3>
<ol>
<li><strong>Gradient Attribution Caching</strong>: Cache importance scores</li>
<li><strong>Batch Processing</strong>: Process multiple samples together</li>
<li><strong>Parallel Computation</strong>: Use multiple GPUs if available</li>
</ol>
<h3 id="training-stability">Training Stability</h3>
<ol>
<li><strong>Learning Rate Scheduling</strong>: Use warmup and decay</li>
<li><strong>Gradient Clipping</strong>: Prevent gradient explosion</li>
<li><strong>Loss Monitoring</strong>: Track loss values for stability</li>
</ol>
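<p>The first two points can be illustrated with a small plain-Python sketch. It is not the trainer's implementation: the function names, the linear warmup and decay shape, and the step counts are assumptions, while the defaults <code>1e-5</code> and <code>1.0</code> echo the <code>lr</code> and <code>max_grad_norm</code> values used elsewhere in this README.</p>
<pre class="hljs"><code><div>import math


def lr_at_step(step, base_lr=1e-5, warmup_steps=100, total_steps=1000):
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    warmup = (step + 1) / max(1, warmup_steps)
    decay = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, min(1.0, warmup, decay))


def clip_by_global_norm(grads, max_grad_norm=1.0):
    """Rescale a flat list of gradient values so its L2 norm is at most max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_grad_norm / (norm + 1e-6))
    return [g * scale for g in grads], norm


# Learning rate at a few points of a hypothetical 1000-step run
for step in (0, 99, 550, 1000):
    print(step, lr_at_step(step))

# A gradient with norm 5 is scaled down to norm ~1
clipped, norm_before = clip_by_global_norm([3.0, 4.0])
print(norm_before, clipped)
</div></code></pre>
<p>Logging the pre-clipping norm returned by <code>clip_by_global_norm</code> alongside the loss is a simple way to cover the third point, since a sudden jump in the norm often precedes a diverging loss.</p>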
<h2 id="%F0%9F%A4%9D-contributing">🤝 Contributing</h2>
<p>We welcome contributions! Please follow these steps:</p>
<ol>
<li>Fork the repository</li>
<li>Create a feature branch: <code>git checkout -b feature-name</code></li>
<li>Make your changes</li>
<li>Add tests for new functionality</li>
<li>Run the test suite: <code>python -m pytest tests/</code></li>
<li>Submit a pull request</li>
</ol>
<h3 id="development-setup">Development Setup</h3>
<pre class="hljs"><code><div><span class="hljs-comment"># Install development dependencies</span>
pip install -r requirements.txt

<span class="hljs-comment"># Run tests</span>
python -m pytest tests/

<span class="hljs-comment"># Run linting</span>
flake8 .

<span class="hljs-comment"># Run type checking</span>
mypy .
</div></code></pre>
<h2 id="%F0%9F%93%84-license">📄 License</h2>
<p>This project is licensed under the MIT License - see the <a href="LICENSE">LICENSE</a> file for details.</p>
<h2 id="%F0%9F%99%8F-acknowledgments">🙏 Acknowledgments</h2>
<ul>
<li>Original DPO implementation by <a href="https://github.com/eric-mitchell/direct-preference-optimization">Eric Mitchell</a></li>
<li>Hugging Face Transformers for model support</li>
<li>Anthropic for the HH-RLHF dataset</li>
</ul>
<hr>
<p><strong>Note</strong>: This is a research implementation. For production use, additional testing and optimization may be required.</p>

</body>

</html>