<!DOCTYPE html>
<html>

<head>
    <title>readme.md</title>
    <meta http-equiv="Content-type" content="text/html;charset=UTF-8">
    
<style>
/* https://github.com/microsoft/vscode/blob/master/extensions/markdown-language-features/media/markdown.css */
/*---------------------------------------------------------------------------------------------
 *  Copyright (c) Microsoft Corporation. All rights reserved.
 *  Licensed under the MIT License. See License.txt in the project root for license information.
 *--------------------------------------------------------------------------------------------*/

body {
	font-family: var(--vscode-markdown-font-family, -apple-system, BlinkMacSystemFont, "Segoe WPC", "Segoe UI", "Ubuntu", "Droid Sans", sans-serif);
	font-size: var(--vscode-markdown-font-size, 14px);
	padding: 0 26px;
	line-height: var(--vscode-markdown-line-height, 22px);
	word-wrap: break-word;
}

html,footer,header{
	font-family: var(--vscode-markdown-font-family, -apple-system, BlinkMacSystemFont, "Segoe WPC", "Segoe UI", "Ubuntu", "Droid Sans", sans-serif);
	font-size: var(--vscode-markdown-font-size, 14px);
}

#code-csp-warning {
	position: fixed;
	top: 0;
	right: 0;
	color: white;
	margin: 16px;
	text-align: center;
	font-size: 12px;
	font-family: sans-serif;
	background-color:#444444;
	cursor: pointer;
	padding: 6px;
	box-shadow: 1px 1px 1px rgba(0,0,0,.25);
}

#code-csp-warning:hover {
	text-decoration: none;
	background-color:#007acc;
	box-shadow: 2px 2px 2px rgba(0,0,0,.25);
}

body.scrollBeyondLastLine {
	margin-bottom: calc(100vh - 22px);
}

body.showEditorSelection .code-line {
	position: relative;
}

body.showEditorSelection .code-active-line:before,
body.showEditorSelection .code-line:hover:before {
	content: "";
	display: block;
	position: absolute;
	top: 0;
	left: -12px;
	height: 100%;
}

body.showEditorSelection li.code-active-line:before,
body.showEditorSelection li.code-line:hover:before {
	left: -30px;
}

.vscode-light.showEditorSelection .code-active-line:before {
	border-left: 3px solid rgba(0, 0, 0, 0.15);
}

.vscode-light.showEditorSelection .code-line:hover:before {
	border-left: 3px solid rgba(0, 0, 0, 0.40);
}

.vscode-light.showEditorSelection .code-line .code-line:hover:before {
	border-left: none;
}

.vscode-dark.showEditorSelection .code-active-line:before {
	border-left: 3px solid rgba(255, 255, 255, 0.4);
}

.vscode-dark.showEditorSelection .code-line:hover:before {
	border-left: 3px solid rgba(255, 255, 255, 0.60);
}

.vscode-dark.showEditorSelection .code-line .code-line:hover:before {
	border-left: none;
}

.vscode-high-contrast.showEditorSelection .code-active-line:before {
	border-left: 3px solid rgba(255, 160, 0, 0.7);
}

.vscode-high-contrast.showEditorSelection .code-line:hover:before {
	border-left: 3px solid rgba(255, 160, 0, 1);
}

.vscode-high-contrast.showEditorSelection .code-line .code-line:hover:before {
	border-left: none;
}

img {
	max-width: 100%;
	max-height: 100%;
}

a {
	text-decoration: none;
}

a:hover {
	text-decoration: underline;
}

a:focus,
input:focus,
select:focus,
textarea:focus {
	outline: 1px solid -webkit-focus-ring-color;
	outline-offset: -1px;
}

hr {
	border: 0;
	height: 2px;
	border-bottom: 2px solid;
}

h1 {
	padding-bottom: 0.3em;
	line-height: 1.2;
	border-bottom-width: 1px;
	border-bottom-style: solid;
}

h1, h2, h3 {
	font-weight: normal;
}

table {
	border-collapse: collapse;
}

table > thead > tr > th {
	text-align: left;
	border-bottom: 1px solid;
}

table > thead > tr > th,
table > thead > tr > td,
table > tbody > tr > th,
table > tbody > tr > td {
	padding: 5px 10px;
}

table > tbody > tr + tr > td {
	border-top: 1px solid;
}

blockquote {
	margin: 0 7px 0 5px;
	padding: 0 16px 0 10px;
	border-left-width: 5px;
	border-left-style: solid;
}

code {
	font-family: Menlo, Monaco, Consolas, "Droid Sans Mono", "Courier New", monospace, "Droid Sans Fallback";
	font-size: 1em;
	line-height: 1.357em;
}

body.wordWrap pre {
	white-space: pre-wrap;
}

pre:not(.hljs),
pre.hljs code > div {
	padding: 16px;
	border-radius: 3px;
	overflow: auto;
}

pre code {
	color: var(--vscode-editor-foreground);
	tab-size: 4;
}

/** Theming */

.vscode-light pre {
	background-color: rgba(220, 220, 220, 0.4);
}

.vscode-dark pre {
	background-color: rgba(10, 10, 10, 0.4);
}

.vscode-high-contrast pre {
	background-color: rgb(0, 0, 0);
}

.vscode-high-contrast h1 {
	border-color: rgb(0, 0, 0);
}

.vscode-light table > thead > tr > th {
	border-color: rgba(0, 0, 0, 0.69);
}

.vscode-dark table > thead > tr > th {
	border-color: rgba(255, 255, 255, 0.69);
}

.vscode-light h1,
.vscode-light hr,
.vscode-light table > tbody > tr + tr > td {
	border-color: rgba(0, 0, 0, 0.18);
}

.vscode-dark h1,
.vscode-dark hr,
.vscode-dark table > tbody > tr + tr > td {
	border-color: rgba(255, 255, 255, 0.18);
}

</style>

<style>
/* Tomorrow Theme */
/* http://jmblog.github.com/color-themes-for-google-code-highlightjs */
/* Original theme - https://github.com/chriskempson/tomorrow-theme */

/* Tomorrow Comment */
.hljs-comment,
.hljs-quote {
	color: #8e908c;
}

/* Tomorrow Red */
.hljs-variable,
.hljs-template-variable,
.hljs-tag,
.hljs-name,
.hljs-selector-id,
.hljs-selector-class,
.hljs-regexp,
.hljs-deletion {
	color: #c82829;
}

/* Tomorrow Orange */
.hljs-number,
.hljs-built_in,
.hljs-builtin-name,
.hljs-literal,
.hljs-type,
.hljs-params,
.hljs-meta,
.hljs-link {
	color: #f5871f;
}

/* Tomorrow Yellow */
.hljs-attribute {
	color: #eab700;
}

/* Tomorrow Green */
.hljs-string,
.hljs-symbol,
.hljs-bullet,
.hljs-addition {
	color: #718c00;
}

/* Tomorrow Blue */
.hljs-title,
.hljs-section {
	color: #4271ae;
}

/* Tomorrow Purple */
.hljs-keyword,
.hljs-selector-tag {
	color: #8959a8;
}

.hljs {
	display: block;
	overflow-x: auto;
	color: #4d4d4c;
	padding: 0.5em;
}

.hljs-emphasis {
	font-style: italic;
}

.hljs-strong {
	font-weight: bold;
}
</style>

<style>
/*
 * Custom MD PDF CSS
 */
html,footer,header{
	font-family: -apple-system, BlinkMacSystemFont, "Segoe WPC", "Segoe UI", "Ubuntu", "Droid Sans", sans-serif, "Meiryo";

 }
body {
	font-family: -apple-system, BlinkMacSystemFont, "Segoe WPC", "Segoe UI", "Ubuntu", "Droid Sans", sans-serif, "Meiryo";
	padding: 0 12px;
}

pre {
	background-color: #f8f8f8;
	border: 1px solid #cccccc;
	border-radius: 3px;
	overflow-x: auto;
	white-space: pre-wrap;
	overflow-wrap: break-word;
}

pre:not(.hljs) {
	padding: 23px;
	line-height: 19px;
}

blockquote {
	background: rgba(127, 127, 127, 0.1);
	border-color: rgba(0, 122, 204, 0.5);
}

.emoji {
	height: 1.4em;
}

code {
	font-size: 14px;
	line-height: 19px;
}

/* for inline code */
:not(pre):not(.hljs) > code {
	color: #C9AE75; /* Change the old color so it seems less like an error */
	font-size: inherit;
}

/* Page Break : use <div class="page"/> to insert page break
-------------------------------------------------------- */
.page {
	page-break-after: always;
}

</style>
</head>

<body>
    <h1 id="tidpo-token-importance-detection--prediction-optimization-visualization-system">TIDPO (Token Importance Detection &amp; Prediction Optimization) Visualization System</h1>
<p>The TIDPO demo system is an interactive web application for visualizing the importance scores of individual tokens (words or subword units) in text. It uses gradient attribution to show how strongly each token influences the model's prediction, which helps in understanding and interpreting the model's decisions.</p>
<p><img src="https://via.placeholder.com/800x450/4285F4/FFFFFF?text=TIDPO+System+Interface+Example" alt="TIDPO System Interface Example"></p>
<h2 id="features">Features</h2>
<h3 id="core-features">Core Features</h3>
<ul>
<li><strong>Text Analysis and Importance Calculation</strong>: Uses gradient attribution to calculate each token's importance score for the model's prediction</li>
<li><strong>Multiple Intuitive Visualization Methods</strong>:
<ul>
<li>Heatmap mode: Highlights token importance with gradient colors</li>
<li>Dual-color contrast mode: Uses different colors to distinguish positive and negative influences</li>
</ul>
</li>
<li><strong>Detailed Data Display</strong>: Shows the complete tokenization alongside a table of per-token importance scores</li>
<li><strong>Interactive Charts</strong>: Uses Chart.js to dynamically display token importance distribution</li>
<li><strong>Multi-model Support</strong>: Integrates various pre-trained language models, including:
<ul>
<li>BERT (English/Chinese)</li>
<li>DistilBERT</li>
<li>RoBERTa</li>
<li>XLM-RoBERTa (Multilingual)</li>
<li>Pythia-2.8B (Local Model)</li>
</ul>
</li>
</ul>
<h3 id="user-interface-features">User Interface Features</h3>
<ul>
<li><strong>Intuitive and Friendly Interactive Interface</strong>: Clear layout and operation flow</li>
<li><strong>Responsive Design</strong>: Adapts to different screen sizes and devices</li>
<li><strong>Beautiful Visualization Effects</strong>: Professional color schemes and layouts</li>
<li><strong>Real-time Feedback</strong>: Loading indicators and analysis status prompts</li>
<li><strong>Flexible Label Selection</strong>: Can specify target labels or use model predictions</li>
</ul>
<h2 id="technical-architecture">Technical Architecture</h2>
<h3 id="backend-technology">Backend Technology</h3>
<ul>
<li><strong>Python 3.8+</strong>: Core development language</li>
<li><strong>Flask</strong>: Lightweight web framework</li>
<li><strong>PyTorch</strong>: Deep learning framework for model loading and gradient calculation</li>
<li><strong>Transformers</strong>: Hugging Face's model library, providing pre-trained model interfaces</li>
<li><strong>Pandas</strong>: Data processing and analysis</li>
</ul>
<h3 id="frontend-technology">Frontend Technology</h3>
<ul>
<li><strong>HTML5/CSS3</strong>: Page structure and styling</li>
<li><strong>JavaScript (ES6+)</strong>: Interactive logic and dynamic content updates</li>
<li><strong>Bootstrap 5</strong>: Responsive layout and components</li>
<li><strong>Chart.js</strong>: Data visualization and charts</li>
<li><strong>Font Awesome</strong>: Icon library</li>
</ul>
<h3 id="key-modules">Key Modules</h3>
<ul>
<li><strong>Gradient Attribution Engine</strong>: Core algorithm implementation for calculating token importance</li>
<li><strong>Web API</strong>: Provides RESTful interfaces supporting text analysis requests</li>
<li><strong>Visualization Renderer</strong>: Generates intuitive highlight display effects</li>
<li><strong>Model Manager</strong>: Handles model loading, switching, and resource management</li>
</ul>
<h2 id="file-structure">File Structure</h2>
<pre class="hljs"><code><div>TIDPO/
├── app.py                    # Flask application main file, handles web requests and routing
├── gradient_attribution.py   # Gradient attribution core algorithm implementation
├── run_visualization.py      # Convenient startup script
├── download_LLM.py           # Script for downloading and saving pre-trained models
├── token_importances.tsv     # Example output file (optional)
├── readme.md                 # Project documentation
├── static/                   # Static resources folder
│   ├── css/
│   │   └── styles.css        # Stylesheet
│   └── js/
│       └── script.js         # Frontend interaction script
├── templates/
│   └── index.html            # Main page template
└── pythia-2.8b/              # Local pre-trained model (generated after download)
    ├── config.json
    ├── tokenizer.json
    └── ...                   # Other model files
</div></code></pre>
<h2 id="installation-guide">Installation Guide</h2>
<h3 id="environment-requirements">Environment Requirements</h3>
<ul>
<li>Python 3.8 or higher</li>
<li>At least 8GB RAM (16GB+ recommended)</li>
<li>NVIDIA GPU with CUDA support (optional, but significantly improves performance)</li>
</ul>
<h3 id="dependency-installation">Dependency Installation</h3>
<ol>
<li>
<p>Clone or download this project to your local machine:</p>
<pre class="hljs"><code><div>git <span class="hljs-built_in">clone</span> https://github.com/username/TIDPO.git
<span class="hljs-built_in">cd</span> TIDPO
</div></code></pre>
</li>
<li>
<p>Install required Python dependency packages:</p>
<pre class="hljs"><code><div>pip install flask torch transformers pandas
</div></code></pre>
</li>
<li>
<p>(Optional) Download local model:</p>
<pre class="hljs"><code><div>python download_LLM.py
</div></code></pre>
<p>Note: The Pythia-2.8B model is about 5.5GB and may take some time to download.</p>
</li>
</ol>
<h2 id="usage-guide">Usage Guide</h2>
<h3 id="starting-the-application">Starting the Application</h3>
<ol>
<li>
<p>Run the convenient startup script:</p>
<pre class="hljs"><code><div>python run_visualization.py
</div></code></pre>
</li>
<li>
<p>The browser will automatically open http://localhost:5000, displaying the application interface.</p>
</li>
<li>
<p>You can also start manually:</p>
<pre class="hljs"><code><div>python app.py
</div></code></pre>
<p>Then visit http://localhost:5000 in your browser.</p>
</li>
</ol>
<h3 id="usage-process">Usage Process</h3>
<ol>
<li><strong>Input Text</strong>: Enter the text content you want to analyze in the text box</li>
<li><strong>Select Model</strong>: Choose a pre-trained model from the dropdown menu</li>
<li><strong>(Optional) Specify Label</strong>: If you need to analyze a specific label, you can enter the target label index</li>
<li><strong>Click Analyze</strong>: Click the &quot;Analyze Text&quot; button to start processing</li>
<li><strong>View Results</strong>: The system will display:
<ul>
<li>Highlighted text with color depth indicating importance</li>
<li>Detailed score table for each token</li>
<li>Importance distribution chart</li>
<li>Analysis-related information</li>
</ul>
</li>
</ol>
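<p>The optional label step can be read as a simple fallback rule: use the index the user entered, otherwise use the class the model itself predicts. The sketch below illustrates that rule under assumptions. <code>resolve_target_label</code> and its parameters are hypothetical names that do not come from <code>app.py</code>, and the logits are plain Python floats rather than tensors.</p>

```python
def resolve_target_label(logits, target_label=None):
    """Return the class index whose logit will be attributed.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if target_label is None:
        # No label given: fall back to the model's own prediction (argmax).
        return max(range(len(logits)), key=lambda i: logits[i])
    if target_label not in range(len(logits)):
        raise ValueError(f"label index {target_label} is out of range")
    return target_label


example_logits = [0.2, 1.7, -0.4]
print(resolve_target_label(example_logits))     # index of the predicted class
print(resolve_target_label(example_logits, 2))  # index chosen by the user
```

<p>Validating the index before running the analysis gives the user a clear error instead of a failure deep inside the gradient computation.</p>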
<h3 id="visualization-mode-switching">Visualization Mode Switching</h3>
<ul>
<li>Click &quot;Heatmap&quot; button: Display importance using color depth gradients</li>
<li>Click &quot;Dual-color Contrast&quot; button: Divide tokens into positive influence (green) and negative influence (red)</li>
</ul>
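<p>One way to picture the two modes is as two small functions that turn a score into a CSS color. This is only a sketch: the function names and RGB values are assumptions chosen for illustration, and the dual-color variant assumes a signed score is available, which the project may compute differently.</p>

```python
def heatmap_color(score):
    """Map a normalized score in [0, 1] to a CSS rgba() string.

    Hypothetical helper; the orange base color is an arbitrary choice.
    """
    alpha = min(1.0, max(0.0, score))  # clamp to the valid opacity range
    return f"rgba(255, 87, 34, {alpha:.2f})"


def dual_color(signed_score):
    """Green for non-negative influence, red for negative influence.

    Hypothetical helper; opacity follows the magnitude of the score.
    """
    alpha = min(1.0, abs(signed_score))
    base = "76, 175, 80" if signed_score >= 0 else "244, 67, 54"
    return f"rgba({base}, {alpha:.2f})"


print(heatmap_color(0.5))
print(dual_color(-0.25))
```

<p>Encoding importance as opacity keeps the text readable, because low-scoring tokens fade toward the page background instead of changing hue.</p>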
<h2 id="how-it-works">How It Works</h2>
<p>TIDPO uses gradient attribution, which quantifies the importance of each input token to the prediction by computing the gradient of the model output with respect to the input embeddings. The workflow is as follows:</p>
<ol>
<li><strong>Text Processing</strong>: Convert input text to token IDs through a tokenizer</li>
<li><strong>Model Forward Propagation</strong>: Calculate the model's prediction results for the input</li>
<li><strong>Gradient Calculation</strong>: Through backpropagation, calculate the gradient of the output (logits) with respect to the input embeddings</li>
<li><strong>Importance Scoring</strong>: Calculate the L2 norm of each token's gradient as its importance score</li>
<li><strong>Normalization</strong>: Map scores to the 0-1 range for easy visualization</li>
<li><strong>Visualization Presentation</strong>: Display the importance of each token with different color depths based on normalized scores</li>
</ol>
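<p>Steps 4 and 5 above can be sketched in plain Python, assuming the per-token gradients from step 3 are already available as lists of floats (one list per token). The function name <code>token_importance</code> is hypothetical, and min-max scaling is only one plausible reading of "map scores to the 0-1 range"; the actual <code>compute_gradient_attribution</code> function may normalize differently.</p>

```python
import math


def token_importance(gradients):
    """Score each token by the L2 norm of its gradient, then min-max scale.

    gradients: one list of floats per token (gradient of the target logit
    with respect to that token's embedding). Hypothetical helper.
    """
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in gradients]
    lowest, highest = min(norms), max(norms)
    if highest == lowest:
        # Assumption: when every token scores the same, report 0.0 for all.
        return [0.0 for _ in norms]
    return [(n - lowest) / (highest - lowest) for n in norms]


# Three tokens with 2-dimensional gradients; their norms are 5, 0 and 10.
print(token_importance([[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]))
# prints [0.5, 0.0, 1.0]
```

<p>Because the L2 norm is never negative, scores computed this way express how strongly a token matters but not the direction of its influence.</p>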
<h2 id="extension-and-customization">Extension and Customization</h2>
<h3 id="adding-new-models">Adding New Models</h3>
<p>Add new models in the <code>get_available_models</code> function in the <code>app.py</code> file:</p>
<pre class="hljs"><code><div>models = [
    <span class="hljs-comment"># Existing models...</span>
    {<span class="hljs-string">"id"</span>: <span class="hljs-string">"new_model_name_or_path"</span>, <span class="hljs-string">"name"</span>: <span class="hljs-string">"Model Display Name"</span>}
]
</div></code></pre>
<h3 id="customizing-visualization-styles">Customizing Visualization Styles</h3>
<p>Modify the relevant style definitions in the <code>static/css/styles.css</code> file:</p>
<pre class="hljs"><code><div><span class="hljs-comment">/* Heatmap color modification example */</span>
<span class="hljs-selector-class">.legend-gradient</span> {
    <span class="hljs-attribute">background</span>: <span class="hljs-built_in">linear-gradient</span>(to right, custom_start_color, custom_end_color);
}
</div></code></pre>
<h3 id="adjusting-analysis-parameters">Adjusting Analysis Parameters</h3>
<p>To adjust analysis parameters, modify the <code>compute_gradient_attribution</code> function in <code>gradient_attribution.py</code>.</p>
<h2 id="common-questions">Common Questions</h2>
<h3 id="why-is-the-analysis-speed-slow">Why is the analysis speed slow?</h3>
<p>Model loading and gradient calculation are time-consuming, especially on a CPU. The first analysis is slowest because the model must be loaded; subsequent analyses are faster. Using a GPU can significantly improve performance.</p>
<h3 id="how-to-handle-long-text">How to handle long text?</h3>
<p>The system currently truncates text that exceeds the model's maximum input length (usually 512 tokens). To analyze longer text, split it into segments and analyze each segment separately.</p>
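<p>A minimal sketch of such segmentation, assuming the text has already been converted to a list of token IDs: the helper below cuts the sequence into overlapping windows so that tokens near a boundary keep some surrounding context. <code>split_into_windows</code>, <code>max_length</code>, and <code>overlap</code> are hypothetical names and defaults that are not part of the project.</p>

```python
def split_into_windows(token_ids, max_length=512, overlap=64):
    """Split a token sequence into overlapping windows of at most max_length.

    Hypothetical helper; consecutive windows share `overlap` tokens.
    """
    if overlap not in range(max_length):
        raise ValueError("overlap must be between 0 and max_length - 1")
    step = max_length - overlap
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # the last window already reaches the end of the text
    return windows


print(split_into_windows(list(range(10)), max_length=4, overlap=1))
# prints [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

<p>Models that add special tokens to every input leave slightly fewer positions for text, so in practice <code>max_length</code> would likely need to be set a little below the model limit.</p>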
<h3 id="can-i-add-custom-trained-models">Can I add custom trained models?</h3>
<p>Yes. Save custom models in Hugging Face Transformers format, then add the local path to the model list.</p>
<h2 id="future-improvement-directions">Future Improvement Directions</h2>
<ul>
<li>Support additional attribution methods, such as Integrated Gradients and LIME</li>
<li>Add batch processing functionality to support simultaneous analysis of multiple texts</li>
<li>Enhance visualization effects with diverse displays like word clouds, heatmaps, etc.</li>
<li>Provide model comparison functionality to compare analysis results of different models on the same text</li>
<li>Add analysis result export functionality (CSV, JSON, PDF formats)</li>
</ul>
<h2 id="license">License</h2>
<p>This project is licensed under the MIT License. See the <a href="LICENSE">LICENSE</a> file for details.</p>
<h2 id="acknowledgments">Acknowledgments</h2>
<ul>
<li><a href="https://github.com/huggingface/transformers">Hugging Face Transformers</a> - Provides pre-trained models and tool libraries</li>
<li><a href="https://pytorch.org/">PyTorch</a> - Deep learning framework</li>
<li><a href="https://flask.palletsprojects.com/">Flask</a> - Python web framework</li>
<li><a href="https://getbootstrap.com/">Bootstrap</a> - Frontend component library</li>
<li><a href="https://www.chartjs.org/">Chart.js</a> - JavaScript chart library</li>
</ul>

</body>

</html>