# Code for HE-SNR and Data Curation

This folder contains the core implementation of the algorithms described in the paper.

## File Description

- **`metric_calculation.py`**: Implements the Top-K Truncated Entropy calculation and the HE-SNR metric.
- **`data_filtering.py`**: Implements the AST-based data filtering strategy used to separate valid signal from noise.
- **`utils.py`**: Contains helper functions for tokenization, index extraction, and log-probability processing.
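
As a rough illustration of two ideas named above, the following is a minimal, standard-library-only sketch. It is not the code in `metric_calculation.py` or `data_filtering.py`: the function names `topk_truncated_entropy` and `parses_as_python` are hypothetical, the entropy assumes that truncation keeps the `k` largest probabilities and renormalizes them to sum to 1, and the filter assumes that "AST-based" means keeping only samples that parse as Python. The definitions in the paper may differ, and HE-SNR itself is not reproduced here.

```python
import ast
import math


def topk_truncated_entropy(log_probs, k):
    """Shannon entropy (in nats) over the k most likely tokens.

    Assumption (not taken from the paper): the k largest probabilities
    are kept and renormalized to sum to 1 before computing the entropy.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Drop zero-probability entries; they contribute nothing to the entropy.
    finite = [lp for lp in log_probs if lp != -math.inf]
    if not finite:
        raise ValueError("need at least one token with non-zero probability")
    top = sorted(finite, reverse=True)[:k]
    # Log-sum-exp over the kept entries gives the renormalization constant.
    m = top[0]
    log_z = m + math.log(sum(math.exp(lp - m) for lp in top))
    entropy = 0.0
    for lp in top:
        log_p = lp - log_z  # renormalized log-probability
        entropy -= math.exp(log_p) * log_p
    return entropy


def parses_as_python(source):
    """Return True if `source` parses into a Python AST.

    Assumption (not taken from the paper): a sample counts as valid
    when the standard `ast` module can parse it.
    """
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True
```

Under these assumptions, a uniform distribution over four tokens has a truncated entropy of `log(4)` nats at `k=4` and `log(2)` nats at `k=2`, and a snippet such as `def f(:` is rejected by the parse check.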

## Dependencies

See `requirements.txt`. The listed packages can be installed with `pip install -r requirements.txt`.