Step-wise Confidence Estimation with NIBS and GIBS
This repository provides a framework for generating, evaluating, and attributing step-wise confidence scores to the reasoning traces of Large Language Models (LLMs). It includes implementations of two Information Bottleneck (IB) based methods:

NIBS (Non-parametric Information Bottleneck Selection): A non-parametric method that directly computes confidence based on consensus among correct reasoning traces.

GIBS (Graph Information Bottleneck Selection): A trainable method that learns to select important reasoning steps by modeling structural dependencies in the reasoning graph.
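
For orientation, both methods build on the Information Bottleneck principle. Its generic form is shown below; this is the textbook objective, not necessarily the exact loss implemented in this repository. A compressed representation $Z$ of an input $X$ is chosen to discard as much of $X$ as possible while staying predictive of a target $Y$, with $\beta$ controlling the trade-off:

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

One plausible reading in this setting (an assumption for illustration) is that $X$ is the full reasoning trace, $Z$ is the subset of selected steps, and $Y$ is the correctness of the final answer.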

Requirements
Python 3.9+

PyTorch, Transformers, vLLM

Workflow and Usage
The end-to-end workflow consists of four main stages: generating reasoning traces, preprocessing them into graphs and embeddings, evaluating answer correctness, and running the NIBS/GIBS methods.

Step 1: Generate LLM Reasoning Traces
First, generate a large set of reasoning traces (responses) from a base LLM for a given dataset. This is done with a vLLM-based generation script.
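
The generation stage could look roughly like the following sketch. It is not the repository's script: the prompt template, the function names `build_prompt` and `generate_traces`, the sampling settings, and the JSONL output layout are all assumptions chosen for illustration. Only the `LLM`, `SamplingParams`, and `generate` calls come from vLLM itself.

```python
import json


def build_prompt(question: str) -> str:
    """Hypothetical prompt template asking for line-by-line reasoning steps."""
    return (
        "Solve the problem step by step. Write one step per line and end with "
        "'Final answer: <answer>'.\n\n"
        f"Problem: {question}\n"
    )


def generate_traces(questions, model_name, n_samples=8, out_path="traces.jsonl"):
    """Sample several reasoning traces per question and write them as JSONL."""
    # Imported lazily so build_prompt can be used on a machine without vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)
    # Assumed sampling settings; tune them for the dataset at hand.
    params = SamplingParams(n=n_samples, temperature=0.8, top_p=0.95, max_tokens=1024)
    outputs = llm.generate([build_prompt(q) for q in questions], params)

    with open(out_path, "w", encoding="utf-8") as f:
        for question, request_output in zip(questions, outputs):
            # Each request yields n_samples completions.
            for sample_id, completion in enumerate(request_output.outputs):
                record = {
                    "question": question,
                    "sample_id": sample_id,
                    "response": completion.text,
                }
                f.write(json.dumps(record) + "\n")
```

Sampling with a non-zero temperature matters here, since the later stages rely on having many distinct traces per question.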

Step 2: Preprocess Traces into Graphs and Embeddings
The generated text-based reasoning traces need to be converted into reasoning graphs and step embeddings, the structured inputs that the NIBS and GIBS methods operate on.
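
As a rough illustration of what this conversion involves, the sketch below splits a trace into steps, links consecutive steps into a chain graph, and attaches a vector to each step. Everything here is an assumption: the one-step-per-line convention, the chain topology, and the toy hashed bag-of-words embedding (a stand-in for whatever sentence encoder the repository actually uses).

```python
import hashlib
import math
import re


def split_steps(trace: str) -> list[str]:
    """Split a trace into steps, one per non-empty line (assumed convention)."""
    return [line.strip() for line in trace.splitlines() if line.strip()]


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy hashed bag-of-words embedding, L2-normalised. Placeholder only."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def trace_to_graph(trace: str) -> dict:
    """Build a chain graph: node i is step i, edge (i, i + 1) links neighbours."""
    steps = split_steps(trace)
    return {
        "nodes": steps,
        "edges": [(i, i + 1) for i in range(len(steps) - 1)],
        "embeddings": [embed(step) for step in steps],
    }
```

A chain is the simplest possible dependency structure; a real preprocessing step may well infer richer edges between non-adjacent steps.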

Step 3: Evaluate Answer Correctness
Next, a strong LLM (e.g., GPT-4) acts as an evaluator and labels whether the final answer of each reasoning trace is correct.
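
The labeling logic can be sketched independently of any particular API client. In the example below, `judge` is any callable that takes a prompt and returns the evaluator's reply, so a GPT-4 client could be plugged in. The prompt wording, the CORRECT/INCORRECT protocol, and the `is_correct` field name are assumptions, not the repository's actual format.

```python
from typing import Callable

# Hypothetical judge prompt; the real evaluation prompt may differ.
JUDGE_TEMPLATE = (
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Model response: {response}\n"
    "Is the final answer in the model response correct? "
    "Reply with CORRECT or INCORRECT."
)


def parse_verdict(reply: str) -> bool:
    """Map the judge's free-text reply to a boolean label."""
    text = reply.strip().upper()
    # Check INCORRECT first, because the string "INCORRECT" contains "CORRECT".
    if "INCORRECT" in text:
        return False
    if "CORRECT" in text:
        return True
    raise ValueError(f"Unparseable verdict: {reply!r}")


def label_trace(record: dict, reference: str, judge: Callable[[str], str]) -> dict:
    """Return a copy of the record with an added 'is_correct' field."""
    prompt = JUDGE_TEMPLATE.format(
        question=record["question"],
        reference=reference,
        response=record["response"],
    )
    return {**record, "is_correct": parse_verdict(judge(prompt))}
```

Raising on an unparseable reply, rather than defaulting to a label, keeps ambiguous judgments from silently contaminating the set of correct traces.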

Step 4: Run Confidence Estimation Methods
With the labeled data and preprocessed graphs, you can now run the baseline models and the main NIBS/GIBS methods.

4.1. White-Box Baselines (Optional)
You can run white-box baseline methods that use model-internal states (e.g., token logits) to estimate confidence.
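
One common logit-based baseline, shown here as an example rather than as the specific baselines shipped in this repository, scores each step by the geometric mean of its token probabilities. The sketch assumes the per-token log-probabilities have already been grouped by step.

```python
import math


def step_confidence(token_logprobs: list[float]) -> float:
    """Geometric-mean token probability of a step, i.e. exp(mean log-prob)."""
    if not token_logprobs:
        raise ValueError("step has no tokens")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def trace_confidences(steps: list[list[float]]) -> list[float]:
    """Score every step of a trace; `steps` holds one log-prob list per step."""
    return [step_confidence(logprobs) for logprobs in steps]
```

Averaging in log space normalises for step length, so long steps are not penalised merely for containing more tokens.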

4.2. Running NIBS
NIBS is non-parametric and does not require model training. It computes confidence scores by comparing each reasoning trace against a consensus graph constructed from all correct traces.
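
To make the idea of consensus-based scoring concrete, here is a deliberately simplified toy version. It is not the NIBS algorithm: it skips the consensus graph entirely and scores a step by the fraction of correct traces that contain a sufficiently similar step. The cosine threshold and the function names are assumptions.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def consensus_scores(trace_embs, correct_traces_embs, threshold=0.8):
    """Score each step by the share of correct traces with a matching step.

    trace_embs: one embedding per step of the trace being scored.
    correct_traces_embs: for each correct trace, its list of step embeddings.
    """
    if not correct_traces_embs:
        raise ValueError("need at least one correct trace")
    scores = []
    for step in trace_embs:
        hits = sum(
            1
            for other in correct_traces_embs
            if any(cosine(step, s) >= threshold for s in other)
        )
        scores.append(hits / len(correct_traces_embs))
    return scores
```

No parameters are fitted anywhere in this computation, which is the sense in which such a method is non-parametric.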

4.3. Running GIBS
GIBS is a trainable model. The process involves a training step followed by an evaluation step.

(1) Train the GIBS Model:
(2) Evaluate the GIBS Model: Once the model is trained, you can run it on a test set to get the final step-wise confidence scores.
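
The sketch below illustrates the kind of computation a trainable step selector performs: a forward pass that produces per-step gates from graph-aware features, and an IB-style loss that trades prediction quality against how many steps are kept. It is not the GIBS architecture or loss. The single round of mean aggregation, the linear gate and classifier, and the use of the mean gate value as a compression penalty are all simplifying assumptions, and the parameters are passed in explicitly rather than learned.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def neighbor_mean(features, edges):
    """One round of mean aggregation over each node and its undirected neighbours."""
    neighbours = [[i] for i in range(len(features))]
    for src, dst in edges:
        neighbours[src].append(dst)
        neighbours[dst].append(src)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbours
    ]


def step_gates(features, edges, gate_weights, gate_bias):
    """Per-step keep probabilities in (0, 1), read here as step confidences."""
    hidden = neighbor_mean(features, edges)
    return [
        sigmoid(sum(w * h for w, h in zip(gate_weights, row)) + gate_bias)
        for row in hidden
    ]


def ib_style_loss(gates, features, label, clf_weights, clf_bias, beta=0.1):
    """Binary cross-entropy on the gated summary plus beta times the mean gate."""
    dim = len(features[0])
    # Pool the gated step features into a single graph-level vector.
    pooled = [
        sum(g * f[d] for g, f in zip(gates, features)) / len(gates)
        for d in range(dim)
    ]
    p = sigmoid(sum(w * x for w, x in zip(clf_weights, pooled)) + clf_bias)
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # avoid log(0)
    bce = -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    compression = sum(gates) / len(gates)  # crude surrogate for I(X; Z)
    return bce + beta * compression
```

In an actual training run these weights would be optimised by gradient descent (for example with PyTorch autograd) over the labeled traces, and at evaluation time only the gate computation would be needed to produce step-wise scores.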