Shapley NEAR: Norm-based Attention-wise Usable Information in LLMs

This repository contains the official code for the paper "Fact or Hallucination? A Shapley-Based Analysis of Norm-wise Attention Usable Information in LLMs". Our method, Shapley NEAR, detects hallucinations in LLM outputs by attributing entropy-based information gain across all attention layers and heads using Shapley values.

Overview

Large Language Models (LLMs) often generate fluent but hallucinated responses. Shapley NEAR provides:

- A layer-wise and head-wise decomposition of usable information.
- A Shapley-based framework for token-level confidence scoring.
- Detection and attribution of both parametric and context-induced hallucinations.
- A test-time head clipping mechanism for mitigating hallucinations.
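The Shapley-based attribution above can be sketched with standard permutation sampling. This is a toy illustration, not the repository's implementation: `value_fn` stands in for the paper's entropy-based information gain over a subset of attention heads, and the additive toy game below is chosen only so the estimates are exact.

```python
# Hedged sketch: Monte Carlo (permutation-sampling) Shapley values over
# attention heads. `value_fn` is a placeholder for the paper's
# entropy-based usable-information measure; it maps a set of head
# indices to a scalar gain.
import random

def shapley_values(n_heads, value_fn, n_permutations=200, seed=0):
    rng = random.Random(seed)
    phi = [0.0] * n_heads
    for _ in range(n_permutations):
        order = list(range(n_heads))
        rng.shuffle(order)
        included = set()
        prev = value_fn(included)
        # Accumulate each head's marginal contribution along the permutation.
        for h in order:
            included.add(h)
            cur = value_fn(included)
            phi[h] += cur - prev
            prev = cur
    return [p / n_permutations for p in phi]

# Toy additive game: each head contributes a fixed weight, so the
# Shapley value of head h recovers weights[h] exactly.
weights = [0.1, 0.4, 0.2, 0.3]
phi = shapley_values(len(weights), lambda s: sum(weights[h] for h in s))
```

For an additive value function every marginal contribution of a head equals its weight, so `phi` matches `weights` regardless of how many permutations are sampled; real value functions require enough samples for the estimate to converge.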

Files

| File/Folder               | Description                                                                 |
|---------------------------|-----------------------------------------------------------------------------|
| `train.py`               | Optional fine-tuning script if using hallucination-labeled datasets.        |
| `eval.py`                | Main script to compute NEAR scores on QA datasets.                          |
| `shapley_near/`          | Module containing attribution and model loading utilities.                  |
| `run_all.sh`             | Bash script to run evaluations on COQA using LLaMA-3.1-8B.                  |
| `README.txt`             | Placeholder for figures and diagrams used in the paper.                     |
| `__init__.py`            | Module initialization with imports.                                         |
| `load_model.py`          | Utility function to load models and tokenizers.                             |
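The test-time head clipping mentioned in the overview can be sketched as zeroing the per-head outputs of low-attribution heads before the attention out-projection. The shapes and function name below are illustrative assumptions, not the repository's actual API:

```python
# Hedged sketch of test-time head clipping: zero out selected attention
# heads' outputs. In practice this would be applied inside the model's
# attention module (e.g. via a forward hook); here it is a pure array
# operation on illustrative shapes.
import numpy as np

def clip_heads(head_outputs, heads_to_clip):
    """head_outputs: array of shape (n_heads, seq_len, head_dim).
    Returns a copy with the listed heads zeroed out."""
    clipped = head_outputs.copy()
    clipped[list(heads_to_clip)] = 0.0
    return clipped

x = np.ones((4, 3, 2))        # 4 heads, seq_len 3, head_dim 2
y = clip_heads(x, [1, 3])     # heads 1 and 3 contribute nothing downstream
```

Working on a copy leaves the original activations intact, which matters if the same forward pass is also used to compute the NEAR scores that decide which heads to clip.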

Requirements

Install dependencies via:

```bash
pip install -r requirements.txt
```