# TIDPO (Token Importance Detection & Prediction Optimization) Visualization System

The TIDPO demo system is an interactive web application that visualizes importance scores for the tokens (subword units) in a text. It uses gradient attribution to show how strongly each token influences the model's prediction, which helps in understanding and interpreting the model's decision-making process.

![TIDPO System Interface Example](https://via.placeholder.com/800x450/4285F4/FFFFFF?text=TIDPO+System+Interface+Example)

## Features

### Core Features

- **Text Analysis and Importance Calculation**: Uses gradient attribution to calculate each token's importance score for the model's prediction
- **Multiple Intuitive Visualization Methods**:
  - Heatmap mode: Highlights token importance with gradient colors
  - Dual-color contrast mode: Uses different colors to distinguish positive and negative influences
- **Detailed Data Display**: Includes complete token decomposition and corresponding importance score tables
- **Interactive Charts**: Uses Chart.js to dynamically display token importance distribution
- **Multi-model Support**: Integrates various pre-trained language models, including:
  - BERT (English/Chinese)
  - DistilBERT
  - RoBERTa
  - XLM-RoBERTa (Multilingual)
  - Pythia-2.8B (Local Model)

### User Interface Features

- **Intuitive and Friendly Interactive Interface**: Clear layout and operation flow
- **Responsive Design**: Adapts to different screen sizes and devices
- **Beautiful Visualization Effects**: Professional color schemes and layouts
- **Real-time Feedback**: Loading indicators and analysis status prompts
- **Flexible Label Selection**: Specify a target label, or let the system use the model's predicted label

## Technical Architecture

### Backend Technology

- **Python 3.8+**: Core development language
- **Flask**: Lightweight web framework
- **PyTorch**: Deep learning framework for model loading and gradient calculation
- **Transformers**: Hugging Face's model library, providing pre-trained model interfaces
- **Pandas**: Data processing and analysis

### Frontend Technology

- **HTML5/CSS3**: Page structure and styling
- **JavaScript (ES6+)**: Interactive logic and dynamic content updates
- **Bootstrap 5**: Responsive layout and components
- **Chart.js**: Data visualization and charts
- **Font Awesome**: Icon library

### Key Modules

- **Gradient Attribution Engine**: Core algorithm implementation for calculating token importance
- **Web API**: Provides RESTful interfaces supporting text analysis requests
- **Visualization Renderer**: Generates intuitive highlight display effects
- **Model Manager**: Handles model loading, switching, and resource management

## File Structure

```
TIDPO/
├── app.py                    # Flask application main file, handles web requests and routing
├── gradient_attribution.py   # Gradient attribution core algorithm implementation
├── run_visualization.py      # Convenience startup script
├── download_LLM.py           # Script for downloading and saving pre-trained models
├── token_importances.tsv     # Example output file (optional)
├── readme.md                 # Project documentation
├── static/                   # Static resources folder
│   ├── css/
│   │   └── styles.css        # Stylesheet
│   └── js/
│       └── script.js         # Frontend interaction script
├── templates/
│   └── index.html            # Main page template
└── pythia-2.8b/              # Local pre-trained model (generated after download)
    ├── config.json
    ├── tokenizer.json
    └── ...                   # Other model files
```

## Installation Guide

### Environment Requirements

- Python 3.8 or higher
- At least 8GB RAM (16GB+ recommended)
- NVIDIA GPU with CUDA support (optional, but significantly improves performance)

### Dependency Installation

1. Clone or download this project to your local machine:
   ```bash
   git clone https://github.com/username/TIDPO.git
   cd TIDPO
   ```

2. Install the required Python packages:
   ```bash
   pip install flask torch transformers pandas
   ```

3. (Optional) Download the local model:
   ```bash
   python download_LLM.py
   ```
   Note: The Pythia-2.8B model is about 5.5GB and may take some time to download.
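
As a rough sanity check on that figure, the weight size can be estimated from the parameter count. The sketch below assumes a nominal 2.8 billion parameters (taken from the model name, not an exact count) and ignores tokenizer and config files. `estimate_weights_gb` is a hypothetical helper, not part of this project.

```python
def estimate_weights_gb(num_params, bytes_per_param):
    """Approximate size of the model weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumption: nominal parameter count implied by the name "Pythia-2.8B".
NUM_PARAMS = 2.8e9

fp16_gb = estimate_weights_gb(NUM_PARAMS, 2)  # 16-bit weights
fp32_gb = estimate_weights_gb(NUM_PARAMS, 4)  # 32-bit weights

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"32-bit weights: ~{fp32_gb:.1f} GB")
```

At 2 bytes per parameter the estimate is close to the download size quoted above. Loading the same weights in 32-bit precision would need roughly twice as much memory, which is one reason 16GB+ RAM is recommended. Which precision the download script or the application actually uses is not stated here.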

## Usage Guide

### Starting the Application

1. Run the convenience startup script:
   ```bash
   python run_visualization.py
   ```

2. The browser will automatically open http://localhost:5000, displaying the application interface.

3. Alternatively, start the application manually:
   ```bash
   python app.py
   ```
   Then visit http://localhost:5000 in your browser.

### Usage Process

1. **Input Text**: Enter the text content you want to analyze in the text box
2. **Select Model**: Choose a pre-trained model from the dropdown menu
3. **(Optional) Specify Label**: If you need to analyze a specific label, you can enter the target label index
4. **Click Analyze**: Click the "Analyze Text" button to start processing
5. **View Results**: The system will display:
   - Highlighted text, with color intensity indicating importance
   - Detailed score table for each token
   - Importance distribution chart
   - Analysis-related information

### Visualization Mode Switching

- Click the "Heatmap" button: Displays importance as a color-intensity gradient
- Click the "Dual-color Contrast" button: Divides tokens into positive influence (green) and negative influence (red)
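
To illustrate the heatmap idea, here is a minimal sketch that maps normalized scores to background opacity. It is not the project's renderer (which lives in the frontend script and stylesheet): the function name `heatmap_html`, the base color, and the generated markup are all assumptions made for this example. It covers heatmap mode only.

```python
from html import escape


def heatmap_html(tokens, scores, rgb=(255, 99, 71)):
    """Render tokens as <span> elements whose background opacity tracks the score.

    `scores` are expected to be normalized to the 0-1 range; values outside
    that range are clamped. The base color `rgb` is an arbitrary choice.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    r, g, b = rgb
    spans = []
    for token, score in zip(tokens, scores):
        alpha = min(max(score, 0.0), 1.0)  # clamp to [0, 1]
        spans.append(
            f'<span style="background-color: rgba({r}, {g}, {b}, {alpha:.2f})">'
            f"{escape(token)}</span>"
        )
    return " ".join(spans)


print(heatmap_html(["this", "movie", "is", "great"], [0.10, 0.45, 0.05, 1.00]))
```

Using opacity rather than separate hues keeps the mapping monotonic: a higher score always produces a more saturated highlight.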

## How It Works

The TIDPO system is based on gradient attribution, which quantifies how much each input token matters to the prediction by computing the gradient of the model output with respect to the input embeddings. The workflow is as follows:

1. **Text Processing**: Convert input text to token IDs through a tokenizer
2. **Model Forward Propagation**: Calculate the model's prediction results for the input
3. **Gradient Calculation**: Through backpropagation, calculate the gradient of the output (logits) with respect to the input embeddings
4. **Importance Scoring**: Take the L2 norm of each token's gradient vector as its importance score
5. **Normalization**: Map scores to the 0-1 range for easy visualization
6. **Visualization Presentation**: Display the importance of each token with different color depths based on normalized scores
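
The workflow above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the code in `gradient_attribution.py`. `ToyClassifier` and `token_importances` are hypothetical names, and the toy model (an embedding layer, a `tanh`, mean pooling, and a linear head) stands in for a real Transformers model so that the example runs without downloading weights. Tokenization (step 1) is skipped and token IDs are supplied directly. Min-max scaling is assumed for the normalization step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the random toy weights reproducible


class ToyClassifier(nn.Module):
    """Hypothetical stand-in for a pre-trained sequence classifier."""

    def __init__(self, vocab_size=50, dim=8, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, num_labels)

    def forward(self, inputs_embeds):
        # inputs_embeds: (seq_len, dim) -> logits: (num_labels,)
        return self.head(torch.tanh(inputs_embeds).mean(dim=0))


def token_importances(model, token_ids, target_label=None):
    """Return (normalized scores, label used) for one token sequence."""
    # Look up embeddings and make them a leaf tensor that tracks gradients.
    embeds = model.embed(token_ids).detach().requires_grad_(True)

    # Step 2: forward pass.
    logits = model(embeds)
    if target_label is None:
        target_label = int(logits.argmax())  # fall back to the prediction

    # Step 3: backpropagate the chosen logit to the input embeddings.
    logits[target_label].backward()

    # Step 4: L2 norm of each token's gradient vector.
    raw = embeds.grad.norm(p=2, dim=1)

    # Step 5: min-max normalization to [0, 1] (guarding against a zero range).
    span = (raw.max() - raw.min()).item()
    if span == 0:
        normalized = torch.zeros_like(raw)
    else:
        normalized = (raw - raw.min()) / span
    return normalized.tolist(), target_label


model = ToyClassifier()
scores, label = token_importances(model, torch.tensor([3, 17, 42, 8]))
print(label, [round(s, 3) for s in scores])
```

With a Hugging Face model, the same pattern would typically obtain the embeddings through `model.get_input_embeddings()` and pass them to the model via the `inputs_embeds` argument. Whether TIDPO does exactly this is an assumption.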

## Extension and Customization

### Adding New Models

To add a model, append an entry to the model list in the `get_available_models` function in `app.py`:

```python
models = [
    # Existing models...
    {"id": "new_model_name_or_path", "name": "Model Display Name"}
]
```

### Customizing Visualization Styles

Modify the relevant style definitions in the `static/css/styles.css` file:

```css
/* Heatmap color modification example */
.legend-gradient {
    background: linear-gradient(to right, custom_start_color, custom_end_color);
}
```

### Adjusting Analysis Parameters

To adjust analysis parameters, modify the `compute_gradient_attribution` function in `gradient_attribution.py`.

## Common Questions

### Why is analysis slow?

Model loading and gradient calculation are time-consuming, especially on a CPU. The first load of a model takes the longest; subsequent analyses are faster. Running on a GPU can significantly improve performance.

### How is long text handled?

The current system truncates long text to the maximum length supported by the selected model (typically 512 tokens for BERT-family models). To analyze longer text, consider splitting it into segments and analyzing each segment separately.
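
One simple way to segment is a sliding window over the token IDs. The sketch below is an illustration only and is not part of the current system. `split_into_windows` is a hypothetical helper, and the default window size and stride are arbitrary choices.

```python
def split_into_windows(token_ids, max_len=512, stride=256):
    """Split a token ID list into overlapping windows of at most `max_len`.

    Consecutive windows start `stride` tokens apart, so neighbouring windows
    overlap by `max_len - stride` tokens.
    """
    if max_len <= 0 or stride <= 0 or stride > max_len:
        raise ValueError("require 0 < stride <= max_len")
    if not token_ids:
        return []
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # the last window reached the end of the sequence
        start += stride
    return windows


ids = list(range(1200))  # placeholder token IDs
windows = split_into_windows(ids)
print([len(w) for w in windows])
```

Overlapping windows give tokens near a boundary some surrounding context in at least one window. Models that add special tokens (such as `[CLS]` and `[SEP]`) need a slightly smaller `max_len` to leave room for them. Scores from different windows are normalized separately, so they are not directly comparable without further handling.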

### Can I add custom-trained models?

Yes. Save custom models in Hugging Face Transformers format, then add the local path to the model list.

## Future Improvement Directions

- Support more types of attribution methods (such as Integrated Gradients, LIME, etc.)
- Add batch processing functionality to support simultaneous analysis of multiple texts
- Enhance visualization effects with diverse displays like word clouds, heatmaps, etc.
- Provide model comparison functionality to compare analysis results of different models on the same text
- Add analysis result export functionality (CSV, JSON, PDF formats)

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Acknowledgments

- [Hugging Face Transformers](https://github.com/huggingface/transformers) - Provides pre-trained models and tool libraries
- [PyTorch](https://pytorch.org/) - Deep learning framework
- [Flask](https://flask.palletsprojects.com/) - Python web framework
- [Bootstrap](https://getbootstrap.com/) - Frontend component library
- [Chart.js](https://www.chartjs.org/) - JavaScript chart library



