# BENFORD-QUANT: A BENFORD'S LAW-INSPIRED NON-UNIFORM QUANTIZER FOR EFFICIENT LANGUAGE MODELS

This repository contains the official implementation of the research paper "Benford-Quant". We investigate a novel non-uniform quantization method based on Benford's Law to efficiently compress Large Language Models (LLMs) and other deep neural networks.

## Abstract

The rapid growth of Large Language Models (LLMs) intensifies the need for effective compression, with weight quantization being the most widely adopted technique. Standard uniform quantizers assume that parameters are evenly distributed, an assumption at odds with the highly skewed distributions observed in practice. We propose Benford-Quant, a simple, data-free non-uniform quantizer inspired by Benford's Law, which predicts that leading digits follow a logarithmic distribution. Benford-Quant replaces the uniform grid with a log-spaced codebook, dedicating more resolution to the frequent small-magnitude weights. We provide both theoretical intuition and empirical evidence: (i) weights in transformer transformational layers adhere closely to Benford statistics, while normalization layers systematically deviate; (ii) on Small Language Models (SLMs), Benford-Quant consistently improves perplexity, reducing 4-bit perplexity on Gemma-270M by more than 10%; and (iii) on larger LLMs, it remains competitive, with differences explained by over-parameterization effects. Our results indicate that incorporating a Benford-inspired prior into quantization grids is a low-cost modification that yields accuracy gains in aggressive few-bit regimes.

## Project Structure

*   `/benford_quant`: Core source code for the library.
*   `/configs`: YAML configuration files to replicate experiments.
*   `/scripts`: High-level scripts to run analyses and benchmarks.
*   `/results`: Default output directory for plots, logs, and metrics.

## Installation

1.  Unzip the project archive and open a terminal in its root directory.
2.  Install the dependencies:
    ```bash
    pip install -r requirements.txt
    ```
3.  Install the package in editable mode:
    ```bash
    pip install -e .
    ```

## Models and Benchmarks

We validate our approach on open-source language models spanning several scales and architectures.

### Primary Models Investigated
*   **BLOOM:** Open-source LLMs developed by the BigScience initiative.
*   **OPT:** Open-source LLMs developed by Meta AI.
*   **Qwen:** Open-source LLMs developed by Alibaba Group.
*   **Gemma3:** Open-source LLMs developed by Google.

### Primary Benchmark
*   **Perplexity on WikiText-2:** We use perplexity on the standard WikiText-2 test set as our primary metric for evaluating language modeling performance degradation after quantization. This is a widely accepted benchmark for comparing quantization methods.
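
As a reminder of what the metric measures, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below is a minimal, dependency-free illustration; the function name `perplexity` and its list-of-log-probabilities input are assumptions for this example, not the repository's evaluation code.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every token has perplexity 4:
# it is as uncertain as a fair choice among four tokens.
uniform_guess = [math.log(0.25)] * 8
print(perplexity(uniform_guess))
```

Lower is better: a quantized model whose perplexity stays close to the full-precision baseline has lost little language modeling quality.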

## Usage

### Research Questions

This repository is designed to investigate the following research questions:
*   **RQ1: Benford's Law Compliance.** Do the parameters and activations of large language models adhere to Benford's Law?
*   **RQ1.3: Per-Layer Variance.** How does this compliance vary across different layers and module types within the model?
*   **RQ2: Benford-Quant Efficacy.** Can a non-uniform, Benford-inspired quantization scheme outperform standard uniform methods?
*   **RQ3: Performance Benchmarks.** What is the performance (e.g., perplexity, downstream task accuracy) of a model quantized with Benford-Quant compared to other methods like RTN or GPTQ?

### Running the Scripts

#### Benford's Law Compliance Analysis
To analyze how well a model's weights and activations follow Benford's Law, use the `run_analysis.py` script. It produces a two-level analysis:
1.  **Aggregated Analysis:** A view of the entire model's compliance.
2.  **Per-Layer Analysis:** A granular breakdown of compliance for every individual parameter tensor.

```bash
python scripts/run_analysis.py --config_path configs/analysis_bloom-1b1.yml
```
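
The idea behind a compliance check can be sketched in a few lines: extract the leading digit of each value, build the observed digit frequencies, and compare them with Benford's prediction P(d) = log10(1 + 1/d). The helper names below and the choice of mean absolute deviation as the distance are assumptions for illustration; the metric actually reported by `run_analysis.py` may differ.

```python
import math
import random

# Benford's Law: P(d) = log10(1 + 1/d) for leading digit d in 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit(x):
    """First significant decimal digit of a non-zero finite number."""
    # repr() yields the shortest round-trip decimal string, so the first
    # character in 1..9 is the leading digit (e.g. '0.00123' or '1e-05').
    for ch in repr(abs(float(x))):
        if ch in "123456789":
            return int(ch)
    raise ValueError("value has no leading digit (zero, inf, or nan)")


def benford_mad(values):
    """Mean absolute deviation between observed digit frequencies and Benford's Law."""
    counts = [0] * 9
    for v in values:
        if v != 0 and math.isfinite(v):  # zeros and non-finite values carry no digit
            counts[leading_digit(v) - 1] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no non-zero finite values")
    observed = [c / total for c in counts]
    return sum(abs(o - e) for o, e in zip(observed, BENFORD)) / 9


rng = random.Random(0)
# Magnitudes spread evenly on a log scale over whole decades follow Benford's Law.
log_uniform = [10 ** rng.uniform(-4, 0) for _ in range(20000)]
# Values confined to [1, 2) all start with the digit 1 and deviate strongly.
narrow = [rng.uniform(1, 2) for _ in range(20000)]

print("log-uniform MAD:", benford_mad(log_uniform))
print("narrow-range MAD:", benford_mad(narrow))
```

Applied per parameter tensor, a score of this kind gives the per-layer view; applied to all tensors pooled together, it gives the aggregated view.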

#### Weight Value Analysis

To analyze the distribution of the raw weight values, use the `run_value_analysis.py` script:
```bash
python scripts/run_value_analysis.py --config_path configs/value_analysis_bloom-1b1.yml
```

#### Quantization Benchmark
To quantize a model using Benford-Quant (or other methods) and evaluate its perplexity, use the `run_quantization.py` script:
```bash
python scripts/run_quantization.py --config_path configs/quant_bloom1b1_benford.yml
```
You can compare different quantization methods by simply changing the `method` field in your configuration YAML file (e.g., from `benford-quant` to `uniform-rtn`).

## How It Works (Brief Explanation)

Benford-Quant builds on the observation that weight magnitudes in many neural networks are spread roughly evenly on a logarithmic scale, which is the regime in which Benford's Law holds. Instead of spacing quantization levels uniformly across the value range, the method places them non-uniformly: densely where Benford's Law predicts values are common (e.g., values starting with '1') and sparsely where they are rare (e.g., values starting with '9'). The result is a log-spaced codebook that dedicates more resolution to the frequent small-magnitude weights, which helps preserve model quality at low bit widths.
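
A minimal sketch of a log-spaced codebook and nearest-level quantization follows. It is an illustration under stated assumptions, not the repository's implementation: the function names, the symmetric signed layout with an explicit zero level, and the `dynamic_range` parameter are all hypothetical. Geometric spacing is used because, under a log-uniform prior on magnitudes, each geometric interval carries equal probability mass.

```python
def log_spaced_codebook(max_abs, bits, dynamic_range=1e3):
    """Signed codebook: zero plus geometrically spaced magnitudes up to max_abs.

    `dynamic_range` (largest / smallest non-zero magnitude) is an assumed,
    illustrative parameter. One of the 2**bits codes is left unused so the
    codebook stays symmetric around zero.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed codebook with zero")
    n_mag = (2 ** bits - 1) // 2  # magnitudes per sign
    step = dynamic_range ** (1 / (n_mag - 1)) if n_mag > 1 else 1.0
    # Exponents run from -(n_mag - 1) to 0, so the largest level is exactly max_abs.
    mags = [max_abs * step ** (i - (n_mag - 1)) for i in range(n_mag)]
    return sorted([-m for m in mags] + [0.0] + mags)


def uniform_codebook(max_abs, bits):
    """Evenly spaced baseline with the same number of levels."""
    n_levels = 2 ** bits - 1
    step = 2 * max_abs / (n_levels - 1)
    return [-max_abs + i * step for i in range(n_levels)]


def quantize(weights, codebook):
    """Round each weight to its nearest codebook level."""
    return [min(codebook, key=lambda c: abs(c - w)) for w in weights]


log_cb = log_spaced_codebook(max_abs=1.0, bits=4)
uni_cb = uniform_codebook(max_abs=1.0, bits=4)

print("log-spaced positive levels:", [c for c in log_cb if c > 0])
print("uniform positive levels:   ", [c for c in uni_cb if c > 0])
print("quantized:", quantize([0.0011, 0.04, -0.9], log_cb))
```

Printing both codebooks makes the trade-off visible: the log-spaced levels sit much closer together near zero than the uniform ones, at the cost of wider gaps near the largest magnitudes.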

---

The paper citation will be added here upon publication.
