# Temporal bias metrics: a tool for measuring temporal bias in label distributions

This repository contains the data and code used in the paper "Temporal Misinformation Detection: Simple Ways to Improve Temporal Generalization and Better Evaluate Language Models".

## Data: FC30 dataset

The data consists of 36,619 claims collected from PolitiFact (https://www.politifact.com/) and Snopes (https://www.snopes.com/), covering approximately 30 years, from September 24, 1995 to March 4, 2025.

The data is stored as a JSON file with the following schema:

    {
        "claim": str,
        "label": str,
        "year": int,
        "month": int,
        "day": int,
    }

The repository contains a Python file, `fc30.py`, with the functions required to load the data in either random or temporal order. It also contains the dictionary we used to group fine-grained labels into larger categories for our experiments. Note that the loading function converts the year/month/day fields into a Python `datetime` object to make sorting by date easier.
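As an illustration of the date conversion described above, here is a minimal standalone sketch. It is not the code in `fc30.py`: the function names `parse_records` and `load_sorted` are hypothetical, the sample claims and label strings are placeholders, and it assumes the JSON file holds an array of objects following the schema.

```python
import json
from datetime import datetime
from typing import Any, Dict, List


def parse_records(raw: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Add a `date` field built from year/month/day and sort oldest first."""
    records = []
    for item in raw:
        record = dict(item)  # copy so the input is left untouched
        record["date"] = datetime(item["year"], item["month"], item["day"])
        records.append(record)
    records.sort(key=lambda r: r["date"])
    return records


def load_sorted(path: str) -> List[Dict[str, Any]]:
    # Assumption: the file is a JSON array of objects with the schema above.
    with open(path, encoding="utf-8") as f:
        return parse_records(json.load(f))


# Placeholder records, not taken from FC30.
sample = [
    {"claim": "B", "label": "false", "year": 2025, "month": 3, "day": 4},
    {"claim": "A", "label": "true", "year": 1995, "month": 9, "day": 24},
]
ordered = parse_records(sample)
print([r["claim"] for r in ordered])  # ['A', 'B']
```

Sorting on a single `datetime` key avoids comparing the three integer fields separately and keeps the original fields available for filtering by year.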

![label_distribution](figures/label_distribution.png "Main labels distribution over time in FC30")

## Temporal bias distribution metric

The code for computing the proposed bias metric is given in the `bias_metric.py` file.

It provides two main functions:

 - `temporal_bias(labels: List[int], granularity: int)` computes the proposed bias metric for a temporally ordered list of labels, given as integers in `[0, n_classes)`. Every class should appear at least once in the list. The `granularity` parameter is the number of data points used in the curve. The default value of 100 limits computation for large lists and closely approximates the value obtained with a higher granularity.
 - `temporal_bias_binary_with_plot(labels: List[int], granularity: int)` computes the proposed bias metric for a temporally ordered list of binary labels (0 or 1), and plots the performance bias curves.
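The input requirements above (integer labels, temporal order, every class present) can be checked before calling the metric. The sketch below is one possible way to do this and is not part of `bias_metric.py`: the helper `encode_labels` and the mapping `label_to_id` are hypothetical, and it assumes class ids run from 0 to `n_classes - 1`.

```python
from typing import Dict, List, Sequence


def encode_labels(labels: Sequence[str], label_to_id: Dict[str, int]) -> List[int]:
    """Map string labels (already in temporal order) to integer class ids.

    Raises ValueError if some class id never appears in the result.
    """
    # Assumption: ids in label_to_id cover 0 .. n_classes - 1.
    n_classes = len(set(label_to_id.values()))
    encoded = [label_to_id[label] for label in labels]
    missing = set(range(n_classes)) - set(encoded)
    if missing:
        raise ValueError(f"classes with no instance: {sorted(missing)}")
    return encoded


# Placeholder label names; the real grouping lives in fc30.py.
label_to_id = {"false": 0, "true": 1}
encoded = encode_labels(["false", "true", "true", "false"], label_to_id)
print(encoded)  # [0, 1, 1, 0]

# The encoded list can then be passed to the metric, for example:
# from bias_metric import temporal_bias
# score = temporal_bias(encoded, 100)
```

Failing early on a missing class gives a clearer error than one raised from inside the metric computation.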




## Requirements

The code was tested with Python 3.8.10, `scikit-learn==1.3.2` and `matplotlib==3.7.5`.