World Athletics Performance Points Prediction

Problem Statement
Predict the World Athletics result_score (performance points) for elite track and field results using only contextual and metadata features, without using the actual performance mark. The goal is to build a model that generalizes across disciplines, sexes, age categories, venues, and seasons.

Why this is challenging
- Cross-discipline generalization: A single model must learn discipline-, sex-, and age-specific performance patterns.
- Heterogeneous features: Mix of categorical and numerical variables (e.g., nationality, venue, wind, age at event, season) with substantial missingness and noise.
- Real-world noise: Performances span nine decades (1935–2025) and are unevenly distributed across countries and venues.

Data Description
The competition data are provided as three CSV files:
- train.csv: Each row is a historical result with metadata features and the target result_score.
- test.csv: The same schema as train.csv but without the target column.
- sample_submission.csv: A template for submissions with random but valid predictions.

All rows correspond to individual performances from international track and field competitions. The features are:
- id: Unique row identifier.
- wind: Wind reading (m/s). May be missing where not applicable.
- competitor: Athlete name string.
- dob: Athlete date of birth (YYYY-MM-DD). May be missing.
- nationality: Athlete nationality code (ISO-like 3 letters).
- venue: Venue string (stadium/city).
- date: Event date (YYYY-MM-DD).
- discipline: Original discipline label string as scraped; may be slightly verbose.
- normalized_discipline: Normalized discipline label (e.g., 100-metres, marathon, long-jump, shot-put, etc.).
- type: Event category (e.g., sprints, hurdles, jumps, throws, middlelong, road-running, race-walks, relays, combined-events).
- sex: male or female.
- age_cat: Age category (senior, u20, u18).
- track_field: track, field, or mixed.
- venue_country: Venue country code.
- age_at_event: Athlete age (years) at event time (float).
- season: Event season year (int).
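
As a sanity check on the schema above, the sketch below parses dob and date, recomputes an approximate age, and flags missing wind readings. It is only an illustration: the rows are made up, the column names age_years and wind_missing are hypothetical, and dividing by 365.25 is an assumption, since the description does not say how age_at_event was computed.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for train.csv; these rows are invented for illustration.
csv_text = """id,wind,dob,date,normalized_discipline,sex,age_cat
1,1.2,1990-05-01,2015-05-01,100-metres,male,senior
2,,,2016-07-10,marathon,female,senior
"""

df = pd.read_csv(io.StringIO(csv_text))

# Parse the date columns; missing or malformed values become NaT.
df["dob"] = pd.to_datetime(df["dob"], errors="coerce")
df["date"] = pd.to_datetime(df["date"], errors="coerce")

# Approximate age in years at the event (assumption: 365.25 days per year).
# The result stays NaN when dob is missing.
df["age_years"] = (df["date"] - df["dob"]).dt.days / 365.25

# Keep missingness as an explicit feature instead of silently imputing wind.
df["wind_missing"] = df["wind"].isna()
```

With the real files, the same steps would apply after replacing the in-memory text with a path to train.csv.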

Target
- result_score: World Athletics performance points (numeric), present only in train.csv.

Columns intentionally excluded to avoid leakage
- The columns mark, mark_numeric, rank, and position are near-direct outcomes of performance and have been removed from the public files. Do not try to reconstruct them from external sources.

File List
- train.csv
- test.csv
- sample_submission.csv

Evaluation
Submissions are evaluated using Mean Absolute Error (MAE) between your predictions and the ground truth result_score over the test set.
- Lower MAE is better.
- Predictions must be finite real numbers.
- Format: A CSV with two columns: id,result_score, containing one row per id in test.csv.
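
The metric is the average of |prediction - truth| over all test rows. A minimal sketch of that computation follows; the function name and the example values are illustrative, and the official scorer may differ in details such as how it reports invalid input.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Return the mean of |y_true - y_pred| over all rows."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # The rules require finite predictions; reject NaN and +/-inf up front.
    if not np.all(np.isfinite(y_pred)):
        raise ValueError("predictions must be finite real numbers")
    return float(np.mean(np.abs(y_true - y_pred)))


# Made-up scores: absolute errors are 10, 20 and 0, so the MAE is 10.
y_true = [1000.0, 1100.0, 900.0]
y_pred = [1010.0, 1080.0, 900.0]
mae = mean_absolute_error(y_true, y_pred)
```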

Submission File
- The file may have any name, but it must contain exactly the two columns id,result_score.
- The id values must match those in test.csv.
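
Before uploading, a submission can be checked locally against these rules. The helper below is a hypothetical sketch, not the official validator: check_submission is an invented name, and treating the column order as strict is an assumption.

```python
import numpy as np
import pandas as pd


def check_submission(sub, test_ids):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Assumption: the columns must appear in exactly this order.
    if list(sub.columns) != ["id", "result_score"]:
        problems.append("columns must be exactly id,result_score")
        return problems
    if sub["id"].duplicated().any():
        problems.append("duplicate id values")
    if set(sub["id"]) != set(test_ids):
        problems.append("id values do not match test.csv")
    # Non-numeric entries become NaN and are caught by the finiteness check.
    scores = pd.to_numeric(sub["result_score"], errors="coerce")
    if not np.isfinite(scores.to_numpy(dtype=float)).all():
        problems.append("result_score contains non-finite values")
    return problems


# Made-up ids and scores for illustration.
test_ids = [1, 2, 3]
good = pd.DataFrame({"id": [1, 2, 3], "result_score": [1000.0, 950.5, 1100.0]})
bad = pd.DataFrame({"id": [1, 2, 2], "result_score": [1000.0, float("inf"), 900.0]})

good_problems = check_submission(good, test_ids)
bad_problems = check_submission(bad, test_ids)
```

Here good passes with no problems, while bad is flagged for a duplicate id, a missing id, and an infinite score.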

Important Notes
- Every discipline/sex/age_cat combination that has sufficient examples in training also appears in the test set. However, distributions may differ across seasons and venues.
- Columns that directly encode the realized performance (mark, mark_numeric, rank, position) are removed; models should rely on contextual information.
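
One simple reference point that uses only contextual columns is a per-group median: for each discipline/sex/age_cat combination, predict the median training score, falling back to the global median for unseen combinations. The median is chosen because it minimizes absolute error for a constant prediction within a group. This is a naive baseline sketched on made-up rows, not a method recommended by the organizers.

```python
import pandas as pd

GROUP_COLS = ["normalized_discipline", "sex", "age_cat"]

# Invented rows standing in for train.csv and test.csv.
train = pd.DataFrame({
    "normalized_discipline": ["100-metres", "100-metres", "100-metres", "marathon", "marathon"],
    "sex": ["male", "male", "male", "female", "female"],
    "age_cat": ["senior"] * 5,
    "result_score": [1000.0, 1100.0, 1200.0, 900.0, 1000.0],
})
test = pd.DataFrame({
    "id": [10, 11, 12],
    "normalized_discipline": ["100-metres", "marathon", "long-jump"],
    "sex": ["male", "female", "female"],
    "age_cat": ["senior", "senior", "u20"],
})

# Median score per group, plus a global fallback for unseen groups.
group_median = (
    train.groupby(GROUP_COLS)["result_score"]
    .median()
    .rename("group_median")
    .reset_index()
)
global_median = train["result_score"].median()

# A left merge keeps every test row, in its original order.
pred = test.merge(group_median, on=GROUP_COLS, how="left")
pred["result_score"] = pred["group_median"].fillna(global_median)
submission = pred[["id", "result_score"]]
```

Any model built on the remaining features (venue, season, nationality, and so on) can be compared against this baseline's validation MAE.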

Good luck and have fun building a model that understands the structure of world-class athletics performances!