Competition: Arabic Natural Language Inference (NLI) — 3-way Classification

Problem statement
Participants must build models that determine the logical relationship between an Arabic premise sentence and an Arabic hypothesis sentence. Each pair belongs to one of three classes: 0 (Entailment), 1 (Neutral), or 2 (Contradiction). Your goal is to predict the label for each pair in the test set.

Data description
- Task: 3-class Natural Language Inference (NLI) on Arabic sentence pairs.
- Train file: train.csv with columns [id, premise, hypothesis, label].
- Test file: test.csv with columns [id, premise, hypothesis]. You will predict the label for each id.
- Sample submission: sample_submission.csv with columns [id, label]. It is a valid example submission with random labels.
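
The column layout above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader: the helper name read_pairs and the in-memory demo rows are hypothetical, and UTF-8 encoding of the real files is an assumption.

```python
import csv
import io

def read_pairs(f, has_label):
    """Parse an open CSV file into a list of dicts with typed id/label fields."""
    rows = []
    for row in csv.DictReader(f):
        item = {
            "id": int(row["id"]),
            "premise": row["premise"],
            "hypothesis": row["hypothesis"],
        }
        if has_label:
            # Only train.csv carries a label column.
            item["label"] = int(row["label"])
        rows.append(item)
    return rows

# Made-up rows standing in for train.csv (not competition data).
demo_train = io.StringIO(
    "id,premise,hypothesis,label\n"
    "1,القطة نائمة,الحيوان نائم,0\n"
    "2,القطة نائمة,القطة مستيقظة,2\n"
)
train_rows = read_pairs(demo_train, has_label=True)
```

For the real files, the same function should work on a handle opened with open("train.csv", newline="", encoding="utf-8"), passing has_label=False for test.csv.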

Labels
- 0: Entailment — the hypothesis logically follows from the premise.
- 1: Neutral — the hypothesis neither follows from nor contradicts the premise.
- 2: Contradiction — the hypothesis contradicts the premise.

Evaluation
- Primary metric: Macro-averaged F1 score across classes {0,1,2}.
  - Each class contributes equally to the final score regardless of frequency.
  - If a class has no predicted samples or no true samples (zero support), its F1 is defined as 0.0 to avoid an undefined value.
- Format requirements for submissions are strictly validated:
  - CSV with header and exactly two columns: id, label
  - ids must exactly match the test set ids, with no duplicates and no missing or extra ids
  - label must be integer in {0,1,2}
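
To make the metric concrete, here is a small sketch of macro-averaged F1 that follows the zero-prediction and zero-support rule stated above. The function name macro_f1 is hypothetical and this is not the organizers' scoring code; it only illustrates the definition.

```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 over the given labels."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)  # times class c was predicted
        support = sum(1 for t in y_true if t == c)    # times class c is the true label
        if predicted == 0 or support == 0 or tp == 0:
            # Undefined or zero precision/recall: the class scores 0.0.
            scores.append(0.0)
            continue
        precision = tp / predicted
        recall = tp / support
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)

# Class 2 is never predicted here, so it contributes 0.0 to the average.
example_score = macro_f1([0, 1, 2], [0, 1, 1])
```

If scikit-learn is available, sklearn.metrics.f1_score with average="macro", labels=[0, 1, 2], and zero_division=0 should give the same value and can serve as a cross-check.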

Files provided to participants
- train.csv — training pairs with labels.
- test.csv — test pairs without labels.
- sample_submission.csv — example of a valid submission format.

Submission format
- CSV with exactly two columns and no additional index column:
  - id: integer, matches the id in test.csv
  - label: integer in {0,1,2}
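
Because submissions are strictly validated, a local pre-check can catch format problems before uploading. The sketch below mirrors the stated rules (header, id coverage, label range). The name validate_submission is hypothetical, and the official validator may differ in details such as whitespace handling.

```python
import csv
import io

def validate_submission(f, test_ids):
    """Return a list of problems; an empty list means these checks passed."""
    reader = csv.reader(f)
    header = next(reader, None)
    if header != ["id", "label"]:
        return [f"header must be exactly id,label but got {header}"]
    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore fully blank lines
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 columns, got {len(row)}")
            continue
        raw_id, raw_label = row
        if raw_label not in {"0", "1", "2"}:
            problems.append(f"line {line_no}: label {raw_label!r} is not 0, 1, or 2")
        try:
            row_id = int(raw_id)
        except ValueError:
            problems.append(f"line {line_no}: id {raw_id!r} is not an integer")
            continue
        if row_id in seen:
            problems.append(f"line {line_no}: duplicate id {row_id}")
        seen.add(row_id)
    expected = set(test_ids)
    if expected - seen:
        problems.append(f"{len(expected - seen)} test id(s) missing")
    if seen - expected:
        problems.append(f"{len(seen - expected)} id(s) not in the test set")
    return problems

# Made-up submissions checked against made-up test ids 10 and 11.
ok_problems = validate_submission(io.StringIO("id,label\n10,0\n11,2\n"), [10, 11])
bad_problems = validate_submission(io.StringIO("id,label\n10,0\n10,3\n"), [10, 11])
```

In the second demo the file has an out-of-range label, a duplicate id, and a missing id, so three problems are reported.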

Notes
- Text comes from aggregated Arabic NLI sources and may vary widely in style and domain, so preprocessing needs to be robust to that variation.
- Class distribution may be imbalanced; because macro-F1 weights every class equally, weak performance on a rare class lowers the score as much as weak performance on a common one.
- The id column has no semantic information and exists only to align predictions to rows.
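
Since the class distribution may be imbalanced, it can help to inspect label counts in the training data before modeling. This is a minimal sketch; label_distribution is a hypothetical helper and the demo labels are made up.

```python
from collections import Counter

def label_distribution(labels, classes=(0, 1, 2)):
    """Map each class to (count, fraction of all labels)."""
    counts = Counter(labels)
    total = len(labels)
    return {
        c: (counts.get(c, 0), counts.get(c, 0) / total if total else 0.0)
        for c in classes
    }

# Made-up labels: class 0 dominates and class 1 is rare.
demo_distribution = label_distribution([0, 0, 0, 1, 2, 2])
```

On the real data, the labels argument would be the label column read from train.csv.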
