# Test Scripts
The three subfolders, `AIME/`, `CharCount/`, and `Knowlogic/`, contain the evaluation code for the corresponding dataset categories. The `AIME/` folder covers both AIME 2024 and AIME 2025, and the `CharCount/` folder contains Chinese and English versions of both the data and the code. All scripts use absolute paths, which have been anonymized, so replace the `path/to` portions with paths from your own setup before running the code.
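
If editing each script by hand is tedious, the placeholder can be substituted in bulk. The following is a minimal sketch, not part of the repository: the helper names `fill_placeholder` and `rewrite_scripts` are hypothetical, and it assumes the anonymized prefix appears literally as `path/to` (optionally with a leading slash) in the `.py` files.

```python
import re
from pathlib import Path

# Assumption: the anonymized prefix is written literally as "path/to",
# possibly preceded by a slash, e.g. "/path/to/...".
PLACEHOLDER = re.compile(r"/?path/to")


def fill_placeholder(source: str, real_root: str) -> str:
    """Replace every occurrence of the anonymized prefix with real_root."""
    root = real_root.rstrip("/")
    # A function replacement avoids backslash-escape issues in the root path.
    return PLACEHOLDER.sub(lambda _match: root, source)


def rewrite_scripts(folder: str, real_root: str) -> list[Path]:
    """Rewrite all .py files under folder in place; return the files that changed."""
    changed = []
    for script in Path(folder).rglob("*.py"):
        original = script.read_text(encoding="utf-8")
        updated = fill_placeholder(original, real_root)
        if updated != original:
            script.write_text(updated, encoding="utf-8")
            changed.append(script)
    return changed
```

Because the substitution is purely textual, it also touches any unrelated string that happens to contain `path/to`, so it is worth reviewing the changed files (for example with `git diff`) before running anything.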

For all datasets, `acc_and_length.py` is used to calculate changes in accuracy and output length before and after applying the MASK, corresponding to the results presented in Table 2 of the paper.  
`length_trend.py` is used to analyze how output length varies with the degree of bias deviation (high vs. low), which corresponds to part of the data shown in Table 1.
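
For orientation, the before/after comparison reported by `acc_and_length.py` can be pictured roughly as follows. This is an illustrative sketch under stated assumptions, not the script itself: the record fields `correct` and `output` are hypothetical, and output length is measured in characters here, whereas the actual script may use a different unit such as tokens.

```python
from statistics import mean


def summarize(records):
    """Return accuracy and mean output length for one set of results.

    Assumption: each record is a dict with a boolean "correct" field
    and a string "output" field (hypothetical format).
    """
    accuracy = mean(1.0 if r["correct"] else 0.0 for r in records)
    avg_length = mean(len(r["output"]) for r in records)
    return {"accuracy": accuracy, "avg_length": avg_length}


def compare(before, after):
    """Change in accuracy and mean length from `before` to `after` MASK."""
    b, a = summarize(before), summarize(after)
    return {
        "accuracy_change": a["accuracy"] - b["accuracy"],
        "length_change": a["avg_length"] - b["avg_length"],
    }
```

A negative `length_change` means outputs became shorter after applying the MASK; a positive `accuracy_change` means accuracy improved.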

## CharCount
The `CharCount/` directory contains a set of `.py` files for generating model results, plus three subfolders: `words/`, `test/`, and `hiddenStates/`. The `words/` folder contains all the word data; the `test/` folder contains the code for the MASK method; and the `hiddenStates/` folder contains the code for plotting the attention variations discussed in Section 5 of the paper.

### Generating Results
The code for generating model results includes:

- `dpsk_en.py`: Calls the API to generate English results for DeepSeek  
- `dpsk_zh.py`: Calls the API to generate Chinese results for DeepSeek  
- `generateQwenAns.py` and `generateQwenDirectAns.py`: Generate full Chinese results and direct answers for the Qwen R1-distilled model  
- `generateQwenENAns.py` and `generateQwenENDirectAns.py`: Generate full English results and direct answers for the Qwen R1-distilled model  
- `generateQwQAns.py` and `generateQwQDirectAns.py`: Generate full Chinese results and direct answers for QwQ  
- `generateQwQENAns.py`: Generates full English results and direct answers for QwQ  

Among these, only the QwQ English script and the DeepSeek API scripts generate all results in a single run. For the others, the full results and the direct answers are generated by separate scripts, and the full results must be generated first.
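
The ordering constraint above can be captured in a small driver. This is a hedged sketch rather than a tool shipped with the repository: the `(model, language)` keys and the `plan`/`run` helpers are hypothetical, and it assumes each script is run from the `CharCount/` directory without command-line arguments.

```python
import subprocess
import sys

# (full-results script, direct-answer script); None means the first
# script already produces both. The dictionary keys are hypothetical labels.
PIPELINES = {
    ("deepseek", "en"): ("dpsk_en.py", None),
    ("deepseek", "zh"): ("dpsk_zh.py", None),
    ("qwen", "zh"): ("generateQwenAns.py", "generateQwenDirectAns.py"),
    ("qwen", "en"): ("generateQwenENAns.py", "generateQwenENDirectAns.py"),
    ("qwq", "zh"): ("generateQwQAns.py", "generateQwQDirectAns.py"),
    ("qwq", "en"): ("generateQwQENAns.py", None),
}


def plan(model, lang):
    """Return the scripts to run, in order: full results first."""
    full, direct = PIPELINES[(model, lang)]
    return [full] if direct is None else [full, direct]


def run(model, lang, dry_run=True):
    """Run (or, by default, only print) the scripts for one model/language."""
    for script in plan(model, lang):
        if dry_run:
            print("would run:", script)
        else:
            # check=True stops the sequence if the full-results step fails.
            subprocess.run([sys.executable, script], check=True)
```

Calling `run("qwen", "zh")` prints the two Qwen Chinese scripts in the required order; pass `dry_run=False` to actually execute them.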

### `test/`
The scripts `en_mask.py` and `zh_mask.py` are used to generate MASK results for the English and Chinese versions, respectively. 

### `hiddenStates/`
The folder `greedyAnswers/` contains the model responses we sampled. The script `mask_or_not_reasoning.py` generates Figure 5 in the paper, and `draw_attention_bars.py` is used to generate Figure 4.

### Drawing Results
The scripts `draw_en.py` and `draw_zh.py` are used to generate the statistical plots presented in Section 4 of the paper. To use them, you need to specify which results to visualize by modifying the answer paths in the scripts.

### Bias Injection
`bias_injection.py` is the script for single-sample bias injection.

## Knowlogic
The subfolder `finaldata/` contains the processed Knowlogic dataset. The data has been reformatted for convenience, but the original content remains unchanged.

- `QwQ.py`: Used to generate the full outputs and direct answers for the QwQ model.  
- `dpsk.py`: Used to call the API and generate the full outputs and direct answers for DeepSeek-R1.  
- `generateAns.py`: Used to generate the full outputs for the R1-distilled model.  
- `generateDirectAns.py`: Used to generate the direct answers for the R1-distilled model.  
- `mask.py`: Used to generate the MASK results for the R1-distilled model.  
- `draw.py`: Used to generate the statistical plots presented in Section 4 of the paper.


## AIME
Every script in this folder exists in a 2024 and a 2025 version, abbreviated below as `2024/2025`. Specifically:

- `aime2024/2025.py`: Generates full answers for the R1-distilled model  
- `direct2024/2025.py`: Generates direct answers for the R1-distilled model  
- `dpsk2024/2025.py`: Calls the API to generate full answers and direct answers for DeepSeek-R1  
- `qwq2024/2025.py`: Generates full answers and direct answers for the QwQ model  
- `mask_aime2024/2025.py`: Generates MASK results for the R1-distilled model

# MitigationTrials
See `MitigationTrials/README.md` for details.