Competition: CNN/DailyMail Abstractive News Summarization

Problem statement
Given a full news article, generate a concise, informative abstractive summary that captures the key facts and context of the piece. Your system should read an article and output a short summary in natural English (typically 1–3 sentences), optimized for the evaluation metric described below.

Data description
You are provided with cleaned CSV files derived from the CNN/DailyMail corpus.
- train.csv: columns [id, article, summary]
  - article: the body text of a news article (may include punctuation, quotes, and newlines)
  - summary: the reference highlight(s) written by the original author
- test.csv: columns [id, article]
  - For each test article, you must predict a summary.
- sample_submission.csv: columns [id, summary]
  - A valid example submission file. You should replace the summary content with your model’s predictions while keeping the same columns and id values.

Notes
- The final train.csv merges the training and validation splits of the original corpus to maximize the amount of training data.
- The final test.csv contains only the article texts and ids. There is no label information in test.csv.
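A minimal loading sketch using pandas, assuming the column layout above. The inline `StringIO` data is a hypothetical stand-in so the snippet is self-contained; in practice you would pass the paths to train.csv and test.csv instead.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for train.csv and test.csv;
# replace with pd.read_csv("train.csv") / pd.read_csv("test.csv").
train_df = pd.read_csv(io.StringIO(
    "id,article,summary\n0,Full article body text.,Reference highlight.\n"))
test_df = pd.read_csv(io.StringIO(
    "id,article\n1,Another article body.\n"))

# Sanity-check the schema described above.
assert list(train_df.columns) == ["id", "article", "summary"]
assert list(test_df.columns) == ["id", "article"]  # no labels in test.csv
```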

Submission format
- File name: submission.csv (any name is fine on Kaggle, but it must be a CSV when submitted)
- Columns (in order): id, summary
- Requirements:
  - Every id in test.csv must appear exactly once in your submission.
  - Do not include extra columns or rows.
  - The summary field must be UTF-8 text. It may be empty, but ideally contains 1–3 sentences. There is no hard length limit, but be mindful of inference-time constraints.
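The requirements above can be checked programmatically before submitting. This is a sketch, not an official validator; the `StringIO` data is a hypothetical stand-in for test.csv and your predictions file.

```python
import io
import pandas as pd

# Hypothetical stand-ins; in practice read test.csv and submission.csv.
test_df = pd.read_csv(io.StringIO("id,article\n1,Article one.\n2,Article two.\n"))
sub = pd.read_csv(io.StringIO("id,summary\n1,A short summary.\n2,Another summary.\n"))

# Columns must be exactly [id, summary], in order, with no extras.
assert list(sub.columns) == ["id", "summary"]
# Every test id must appear exactly once; no extra rows.
assert sorted(sub["id"]) == sorted(test_df["id"])
assert not sub["id"].duplicated().any()
# Empty summaries are allowed but score 0.0 against non-empty references.
sub["summary"] = sub["summary"].fillna("")
# sub.to_csv("submission.csv", index=False)
```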

Evaluation
Primary metric: ROUGE-L F1 (higher is better).
- For each test row, we compute ROUGE-L between the predicted summary and the hidden reference summary and average over all rows.
- Preprocessing before scoring:
  - Lowercasing
  - Whitespace normalization (collapse multiple spaces and trim)
  - Whitespace tokenization
- ROUGE-L is computed via the longest common subsequence (LCS) on tokens, then converted to F1 using precision and recall from the LCS length. The final leaderboard score is the mean of per-example ROUGE-L F1 values in [0, 1].
- Edge cases: empty vs. empty yields 1.0; empty vs. non-empty yields 0.0; numeric stability is handled to avoid NaN/Inf.
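The metric above can be sketched directly from the description: normalize, tokenize on whitespace, compute the token-level LCS with dynamic programming, and convert to F1. This is an illustrative reimplementation, not the official scorer.

```python
import re

def _tokens(text):
    # Preprocessing described above: lowercase, collapse whitespace, trim,
    # then tokenize on whitespace.
    return re.sub(r"\s+", " ", text.lower()).strip().split()

def rouge_l_f1(pred, ref):
    p, r = _tokens(pred), _tokens(ref)
    # Edge cases from the description above.
    if not p and not r:
        return 1.0
    if not p or not r:
        return 0.0
    # Standard LCS dynamic program over tokens.
    dp = [[0] * (len(r) + 1) for _ in range(len(p) + 1)]
    for i, pt in enumerate(p, 1):
        for j, rt in enumerate(r, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if pt == rt else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[len(p)][len(r)]
    precision, recall = lcs / len(p), lcs / len(r)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("the cat sat on mat", "the cat on the mat")` has an LCS of 4 tokens over 5-token strings, giving precision = recall = F1 = 0.8.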

Why this is challenging and meaningful
- Long-input summarization: Articles average hundreds of tokens, requiring thoughtful preprocessing (e.g., truncation, sliding windows, extract-then-abstract).
- Abstractive generation: Success requires modeling salience, paraphrase, and factuality, not simple copying.
- Robust evaluation: ROUGE-L rewards capturing key sequences while being tolerant to paraphrasing.
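As a reference point before modeling, a simple extractive "lead-n" baseline (take the article's first few sentences) is a common sanity check on news data. A minimal sketch, using a naive regex sentence splitter:

```python
import re

def lead_n(article, n=3):
    # Naive split on sentence-ending punctuation followed by whitespace;
    # real news text (quotes, abbreviations) will need something sturdier.
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[:n])
```

Any learned abstractive model should comfortably beat this baseline under ROUGE-L F1.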

Deliverables
- Train your model on train.csv using [article -> summary].
- Generate predictions for test.csv and produce a CSV with columns [id, summary].
- Use sample_submission.csv as a template to ensure formatting correctness.

Final files for this competition
- train.csv
- test.csv
- sample_submission.csv

Good luck and have fun!