# RuleCollection-32K

This repository contains datasets for rule-based reasoning tasks, organized into two main categories:

## In-Distribution (ID) Datasets

- **ar_lsat**: Analytical Reasoning from LSAT
- **clutrr**: CLUTRR (Compositional Language Understanding and Text-based Relational Reasoning)
- **folio**: FOLIO (First-Order Logic in Natural Language)
- **logic_nli**: Logic-based Natural Language Inference
- **logical_deduction**: Logical Deduction tasks
- **logiqa**: LogiQA (Logical Reasoning QA)
- **prontoqa**: ProntoQA (Proof and Ontology-Generated Question Answering)
- **proofwriter**: ProofWriter (Proof Generation and Verification)

## Out-of-Distribution (OOD) Datasets

- **bigbench_extra_hard**: BIG-Bench Extra Hard (BBEH) tasks
- **bigbench_hard**: BIG-Bench Hard (BBH) tasks
- **proverqa**: ProverQA (First-Order Logic Reasoning with Prover-Generated Problems)

## File Format

Each dataset directory contains two splits:
- `train.json`: training data
- `test.json`: test data

The same data is also available in Parquet format.
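If you work from a local clone rather than through the `datasets` library, the JSON files can be read with the standard library alone. The sketch below assumes each file is either a single JSON array of records or JSON Lines (one object per line); this README does not say which. `parse_records` and `read_records` are hypothetical helper names, not part of the repository.

```python
import json
from pathlib import Path


def parse_records(text: str) -> list:
    """Parse file contents that are either one JSON array or JSON Lines.

    ASSUMPTION: the file format is not documented; both layouts are accepted.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document, so treat it as JSON Lines (one object per line).
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A single top-level object is wrapped so callers always get a list.
    return data if isinstance(data, list) else [data]


def read_records(path) -> list:
    """Read a train.json or test.json file from a local copy of the repository."""
    return parse_records(Path(path).read_text(encoding="utf-8"))


# Hypothetical local path, mirroring the layout used in the Usage section:
# records = read_records("id/folio/train.json")
```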

## Usage

```python
from datasets import load_dataset

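# Load one split of a single dataset
dataset = load_dataset("RuleCollection-32K", data_files="id/folio/train.json")

# Load both splits of a single dataset, keeping train and test separate
# (a glob such as "id/folio/*.json" would merge both files into one "train" split)
dataset = load_dataset(
    "RuleCollection-32K",
    data_files={"train": "id/folio/train.json", "test": "id/folio/test.json"},
)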
```
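To load both splits of any dataset listed above, a small helper can build the `data_files` mapping passed to `load_dataset`. This is a sketch: `data_files_for` is a hypothetical helper, and the `ood/` prefix for the out-of-distribution datasets is an assumption made by analogy with the `id/` prefix in the example above, so check the repository tree before relying on it.

```python
# Dataset names as listed in this README, grouped by category.
# ASSUMPTION: OOD files live under "ood/", mirroring the "id/" prefix
# used in the Usage example; verify against the actual repository layout.
CATEGORIES = {
    "id": [
        "ar_lsat", "clutrr", "folio", "logic_nli",
        "logical_deduction", "logiqa", "prontoqa", "proofwriter",
    ],
    "ood": ["bigbench_extra_hard", "bigbench_hard", "proverqa"],
}


def data_files_for(category: str, name: str) -> dict:
    """Build a split -> file path mapping (train/test) for one dataset."""
    if name not in CATEGORIES.get(category, []):
        raise ValueError(f"unknown dataset {category}/{name}")
    return {split: f"{category}/{name}/{split}.json" for split in ("train", "test")}


print(data_files_for("id", "folio"))
# → {'train': 'id/folio/train.json', 'test': 'id/folio/test.json'}
```

The returned dictionary can be passed as `data_files=` to `load_dataset`, which keeps the train and test files as separate splits instead of merging them.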
