# PeerQA Decontextualization Audit: Results for All Methods

- **Dataset**: Real PeerQA JSONL files
- **Samples processed**: 579 Q&A pairs from 90 papers
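
The sample and paper counts above can be checked with a single pass over the JSONL input. The sketch below assumes one JSON object per line and a hypothetical `paper_id` field naming the source paper; the actual PeerQA field name may differ.

```python
import json
from collections.abc import Iterable
from pathlib import Path


def count_records(lines: Iterable[str], paper_key: str = "paper_id") -> tuple[int, int]:
    """Return (number of Q&A records, number of distinct papers).

    `paper_key` is a hypothetical field name; adjust it to the real schema.
    """
    n_records = 0
    papers: set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        n_records += 1
        papers.add(record[paper_key])
    return n_records, len(papers)


def count_file(path: str, paper_key: str = "paper_id") -> tuple[int, int]:
    """Convenience wrapper that streams one JSONL file from disk."""
    with Path(path).open(encoding="utf-8") as fh:
        return count_records(fh, paper_key)
```

If the data spans several files, chaining their lines (for example with `itertools.chain`) into one `count_records` call de-duplicates papers across files instead of counting them per file.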

## Best Performing Configurations

| Retriever | Best Config | Recall@10 | MRR |
|-----------|-------------|-----------|-----|
| bm25 | paragraph/minimal | 0.011 | 0.015 |
| tfidf | paragraph/minimal | 0.009 | 0.013 |
| dense | sentence/minimal (tied: identical in all 10 configs) | 0.006 | 0.005 |
| colbert | paragraph/aggressive_title | 0.025 | 0.029 |
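
The report does not state how the best configuration was chosen. One rule that reproduces every row of the table is: highest Recall@10, ties broken by MRR, remaining ties resolved by the order in which configurations are listed. The sketch below applies that assumed rule to the per-configuration retrieval scores transcribed from this report. It also makes the dense tie explicit: dense scores Recall@10=0.006 and MRR=0.005 in all ten configurations, so its "best" entry is simply the first one listed.

```python
RETRIEVERS = ("bm25", "tfidf", "dense", "colbert")

# (Recall@10, MRR) per retriever, in RETRIEVERS order, transcribed from the
# "Detailed Results by Configuration" section of this report.
RESULTS = {
    "sentence/minimal":           [(0.006, 0.006), (0.006, 0.005), (0.006, 0.005), (0.003, 0.004)],
    "sentence/title_only":        [(0.006, 0.005), (0.006, 0.005), (0.006, 0.005), (0.003, 0.005)],
    "sentence/heading_only":      [(0.006, 0.005), (0.006, 0.005), (0.006, 0.005), (0.003, 0.004)],
    "sentence/title_heading":     [(0.006, 0.005), (0.006, 0.005), (0.006, 0.005), (0.003, 0.005)],
    "sentence/aggressive_title":  [(0.006, 0.005), (0.006, 0.005), (0.006, 0.005), (0.003, 0.005)],
    "paragraph/minimal":          [(0.011, 0.015), (0.009, 0.013), (0.006, 0.005), (0.024, 0.029)],
    "paragraph/title_only":       [(0.009, 0.013), (0.004, 0.007), (0.006, 0.005), (0.009, 0.014)],
    "paragraph/heading_only":     [(0.009, 0.013), (0.004, 0.007), (0.006, 0.005), (0.009, 0.014)],
    "paragraph/title_heading":    [(0.007, 0.010), (0.002, 0.005), (0.006, 0.005), (0.005, 0.007)],
    "paragraph/aggressive_title": [(0.010, 0.014), (0.009, 0.012), (0.006, 0.005), (0.025, 0.029)],
}


def best_config(retriever: str) -> tuple[str, float, float]:
    """Return (config, Recall@10, MRR) of the best configuration for a retriever.

    Assumed rule: compare (Recall@10, MRR) tuples lexicographically; max()
    keeps the first maximum, so full ties go to the earliest-listed config.
    """
    idx = RETRIEVERS.index(retriever)
    config = max(RESULTS, key=lambda c: RESULTS[c][idx])
    recall, mrr = RESULTS[config][idx]
    return config, recall, mrr


for name in RETRIEVERS:
    print(name, *best_config(name))
```

Running it prints one line per retriever that matches the summary table. Comparing tuples keeps the tie-breaking rule in one place, so switching to an MRR-first ranking would only mean reordering the tuple.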

## Detailed Results by Configuration

### sentence/minimal

**Retrieval Performance:**
- bm25: Recall@10=0.006, MRR=0.006
- tfidf: Recall@10=0.006, MRR=0.005
- dense: Recall@10=0.006, MRR=0.005
- colbert: Recall@10=0.003, MRR=0.004
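
Recall@10 and MRR are standard ranking metrics. The sketch below shows the usual per-question definitions averaged over questions. It assumes each question has a set of relevant passage IDs and a ranked list of retrieved IDs, scores questions without gold evidence as 0, and computes MRR without a rank cutoff; the audit's own evaluation code may differ in these details (for example, by truncating MRR at rank 10).

```python
from collections.abc import Sequence


def recall_at_k(ranked: Sequence[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of a question's relevant passages found in the top-k results."""
    if not relevant:
        return 0.0  # assumed convention: no gold evidence scores 0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: set[str]) -> float:
    """1/rank of the first relevant passage, or 0.0 if none is retrieved."""
    for rank, pid in enumerate(ranked, start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def mean_recall_and_mrr(
    runs: Sequence[tuple[Sequence[str], set[str]]], k: int = 10
) -> tuple[float, float]:
    """Average Recall@k and reciprocal rank over (ranked, relevant) pairs."""
    if not runs:
        raise ValueError("need at least one question")
    recalls = [recall_at_k(ranked, rel, k) for ranked, rel in runs]
    rrs = [reciprocal_rank(ranked, rel) for ranked, rel in runs]
    return sum(recalls) / len(recalls), sum(rrs) / len(rrs)


runs = [
    (["p3", "p1", "p7"], {"p1", "p9"}),  # 1 of 2 relevant found, first hit at rank 2
    (["p2", "p4"], {"p5"}),              # nothing relevant retrieved
]
print(mean_recall_and_mrr(runs))  # (0.25, 0.25)
```

MRR can exceed Recall@10 (as in several rows of this report) when questions have several relevant passages: a single hit at rank 1 contributes a reciprocal rank of 1.0 but only 1/|relevant| to recall.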

**Downstream Tasks:**
- bm25: Accuracy=0.596, F1=0.675
- tfidf: Accuracy=0.611, F1=0.685
- dense: Accuracy=0.597, F1=0.677
- colbert: Accuracy=0.609, F1=0.683
- cross_encoder: Accuracy=0.637, F1=0.703
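
The downstream task behind these Accuracy and F1 figures is not specified in this report. Assuming a binary label per question (for example, answerable vs. not answerable, a hypothetical framing), the two metrics can be computed as below. F1 here is the positive-class F1, which is what scikit-learn's `f1_score` computes with its default `average="binary"`.

```python
from collections.abc import Sequence


def accuracy_and_f1(
    gold: Sequence[int], pred: Sequence[int], positive: int = 1
) -> tuple[float, float]:
    """Return (accuracy, positive-class F1) for binary labels."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    pairs = list(zip(gold, pred))
    correct = sum(g == p for g, p in pairs)
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall (0 when both are 0).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return correct / len(pairs), f1


acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(f"Accuracy={acc:.3f}, F1={f1:.3f}")  # Accuracy=0.600, F1=0.667
```

Because positive-class F1 ignores true negatives, it can sit above accuracy when most gold labels are positive; whether that explains the pattern in these results depends on the unreported label distribution.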

### sentence/title_only

**Retrieval Performance:**
- bm25: Recall@10=0.006, MRR=0.005
- tfidf: Recall@10=0.006, MRR=0.005
- dense: Recall@10=0.006, MRR=0.005
- colbert: Recall@10=0.003, MRR=0.005

**Downstream Tasks:**
- bm25: Accuracy=0.620, F1=0.694
- tfidf: Accuracy=0.646, F1=0.710
- dense: Accuracy=0.628, F1=0.699
- colbert: Accuracy=0.646, F1=0.711
- cross_encoder: Accuracy=0.646, F1=0.713

### sentence/heading_only

**Retrieval Performance:**
- bm25: Recall@10=0.006, MRR=0.005
- tfidf: Recall@10=0.006, MRR=0.005
- dense: Recall@10=0.006, MRR=0.005
- colbert: Recall@10=0.003, MRR=0.004

**Downstream Tasks:**
- bm25: Accuracy=0.621, F1=0.692
- tfidf: Accuracy=0.618, F1=0.691
- dense: Accuracy=0.617, F1=0.689
- colbert: Accuracy=0.612, F1=0.687
- cross_encoder: Accuracy=0.614, F1=0.687

### sentence/title_heading

**Retrieval Performance:**
- bm25: Recall@10=0.006, MRR=0.005
- tfidf: Recall@10=0.006, MRR=0.005
- dense: Recall@10=0.006, MRR=0.005
- colbert: Recall@10=0.003, MRR=0.005

**Downstream Tasks:**
- bm25: Accuracy=0.630, F1=0.700
- tfidf: Accuracy=0.636, F1=0.704
- dense: Accuracy=0.614, F1=0.690
- colbert: Accuracy=0.630, F1=0.701
- cross_encoder: Accuracy=0.636, F1=0.705

### sentence/aggressive_title

**Retrieval Performance:**
- bm25: Recall@10=0.006, MRR=0.005
- tfidf: Recall@10=0.006, MRR=0.005
- dense: Recall@10=0.006, MRR=0.005
- colbert: Recall@10=0.003, MRR=0.005

**Downstream Tasks:**
- bm25: Accuracy=0.630, F1=0.701
- tfidf: Accuracy=0.600, F1=0.680
- dense: Accuracy=0.631, F1=0.703
- colbert: Accuracy=0.637, F1=0.706
- cross_encoder: Accuracy=0.625, F1=0.698

### paragraph/minimal

**Retrieval Performance:**
- bm25: Recall@10=0.011, MRR=0.015
- tfidf: Recall@10=0.009, MRR=0.013
- dense: Recall@10=0.006, MRR=0.005
- colbert: Recall@10=0.024, MRR=0.029

**Downstream Tasks:**
- bm25: Accuracy=0.603, F1=0.681
- tfidf: Accuracy=0.624, F1=0.696
- dense: Accuracy=0.593, F1=0.669
- colbert: Accuracy=0.621, F1=0.693
- cross_encoder: Accuracy=0.608, F1=0.684

### paragraph/title_only

**Retrieval Performance:**
- bm25: Recall@10=0.009, MRR=0.013
- tfidf: Recall@10=0.004, MRR=0.007
- dense: Recall@10=0.006, MRR=0.005
- colbert: Recall@10=0.009, MRR=0.014

**Downstream Tasks:**
- bm25: Accuracy=0.639, F1=0.706
- tfidf: Accuracy=0.602, F1=0.679
- dense: Accuracy=0.628, F1=0.698
- colbert: Accuracy=0.625, F1=0.695
- cross_encoder: Accuracy=0.624, F1=0.696

### paragraph/heading_only

**Retrieval Performance:**
- bm25: Recall@10=0.009, MRR=0.013
- tfidf: Recall@10=0.004, MRR=0.007
- dense: Recall@10=0.006, MRR=0.005
- colbert: Recall@10=0.009, MRR=0.014

**Downstream Tasks:**
- bm25: Accuracy=0.622, F1=0.694
- tfidf: Accuracy=0.602, F1=0.681
- dense: Accuracy=0.609, F1=0.684
- colbert: Accuracy=0.625, F1=0.698
- cross_encoder: Accuracy=0.628, F1=0.700

### paragraph/title_heading

**Retrieval Performance:**
- bm25: Recall@10=0.007, MRR=0.010
- tfidf: Recall@10=0.002, MRR=0.005
- dense: Recall@10=0.006, MRR=0.005
- colbert: Recall@10=0.005, MRR=0.007

**Downstream Tasks:**
- bm25: Accuracy=0.644, F1=0.711
- tfidf: Accuracy=0.644, F1=0.712
- dense: Accuracy=0.653, F1=0.718
- colbert: Accuracy=0.644, F1=0.711
- cross_encoder: Accuracy=0.637, F1=0.703

### paragraph/aggressive_title

**Retrieval Performance:**
- bm25: Recall@10=0.010, MRR=0.014
- tfidf: Recall@10=0.009, MRR=0.012
- dense: Recall@10=0.006, MRR=0.005
- colbert: Recall@10=0.025, MRR=0.029

**Downstream Tasks:**
- bm25: Accuracy=0.617, F1=0.690
- tfidf: Accuracy=0.642, F1=0.710
- dense: Accuracy=0.634, F1=0.703
- colbert: Accuracy=0.634, F1=0.705
- cross_encoder: Accuracy=0.625, F1=0.696

