Competition: Predict Disneyland Review Star Ratings (1–5)

Problem statement
Participants must predict the 1–5 star rating that a TripAdvisor reviewer gave, using the full review text and the available metadata. This is an ordinal text classification problem: the reviews are rich but noisy and carry multilingual signals, so robust preprocessing, feature engineering, and modeling are required.

Files provided to participants
- train.csv: Labeled training data with columns Review_ID, Year_Month, Reviewer_Location, Review_Text, Branch, Rating
- test.csv: Unlabeled test data with the same columns as train.csv except Rating
- sample_submission.csv: Example submission with columns Review_ID, Rating

Target
- Rating: integer in {1, 2, 3, 4, 5}

Input features
- Review_ID: unique identifier of each review
- Year_Month: visit time in YYYY-M format (e.g., 2019-4)
- Reviewer_Location: free-text country/region of the reviewer
- Review_Text: free-text review content
- Branch: park being reviewed; one of Disneyland_California, Disneyland_Paris, Disneyland_HongKong

Evaluation metric
Submissions are evaluated using Quadratic Weighted Kappa (QWK) between true and predicted ratings. QWK rewards getting close to the correct star rating and penalizes larger mistakes more heavily than smaller ones, making it suitable for ordinal targets.
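
For local validation, the metric can be reproduced with a few lines of standard-library Python. The sketch below uses the usual QWK definition (disagreement weights (i - j)^2 / (K - 1)^2 with K = 5 rating levels); the function name and the handling of the degenerate constant-rating case are assumptions, not the official scoring code.

```python
from collections import Counter


def quadratic_weighted_kappa(y_true, y_pred, min_rating=1, max_rating=5):
    """Quadratic Weighted Kappa between two sequences of integer ratings."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one rating pair is required")

    n = len(y_true)
    k = max_rating - min_rating + 1

    # Observed counts of (true, predicted) pairs and the two marginals.
    observed = Counter(zip(y_true, y_pred))
    true_hist = Counter(y_true)
    pred_hist = Counter(y_pred)

    weighted_observed = 0.0  # disagreement actually seen
    weighted_expected = 0.0  # disagreement expected if ratings were independent
    for i in range(min_rating, max_rating + 1):
        for j in range(min_rating, max_rating + 1):
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            weighted_expected += weight * true_hist[i] * pred_hist[j] / n

    if weighted_expected == 0.0:
        # Both sequences are the same single rating, so kappa is undefined.
        # Returning 1.0 here is an assumption made for this sketch.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

With true ratings [1, 2, 3, 4, 5], predicting [2, 2, 3, 4, 5] (one review off by one star) scores higher than predicting [4, 2, 3, 4, 5] (the same review off by three stars), which is the ordinal behavior described above.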

Submission format
- A CSV file with columns: Review_ID, Rating
- One row per Review_ID present in test.csv
- Rating must be an integer in {1, 2, 3, 4, 5}. If a submission contains non-integer or out-of-range values, they will be rounded to the nearest integer and clipped to the [1, 5] range during scoring.
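
To avoid relying on the scorer's rounding and clipping, predictions can be made valid before the file is written. The sketch below is one way to do this with the standard library; the helper names are hypothetical, and rounding ties upward (2.5 becomes 3) is an assumption, since the rules above do not say how ties are handled.

```python
import csv
import math


def to_valid_rating(value, low=1, high=5):
    """Round a raw prediction to the nearest integer and clip it to [low, high]."""
    # Half-up rounding is an assumption; Python's built-in round() would
    # send 2.5 to 2 instead.
    rounded = math.floor(float(value) + 0.5)
    return max(low, min(high, rounded))


def write_submission(path, review_ids, predictions):
    """Write a CSV with the columns Review_ID and Rating, one row per review."""
    if len(review_ids) != len(predictions):
        raise ValueError("review_ids and predictions must have the same length")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Review_ID", "Rating"])
        for review_id, prediction in zip(review_ids, predictions):
            writer.writerow([review_id, to_valid_rating(prediction)])
```

Because QWK is sensitive to where continuous scores are cut into star ratings, it can be worth checking the rounded predictions on a validation split rather than only the raw model outputs.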

Data scale and notes
- Use the entire training set; no subsampling is needed or expected.
- Text is provided verbatim and may include punctuation, emojis, and non-ASCII characters.
- Year_Month is an informative temporal feature; you may derive additional features (e.g., month, year, seasonality).
- Reviewer_Location is free-form; normalizing to ISO codes or continents may help.
- Branch encodes park-level differences in experience and language distribution.
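
As an illustration of the Year_Month note above, the following sketch splits a YYYY-M value into year, month, and a season label. The function names are hypothetical, and the season mapping assumes Northern Hemisphere meteorological seasons (all three parks are in the Northern Hemisphere). Values that do not match the format are mapped to None rather than raising, in case some rows are malformed.

```python
# Northern Hemisphere meteorological seasons (an assumption of this sketch).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def parse_year_month(value):
    """Return (year, month) for a 'YYYY-M' string, or (None, None) if invalid."""
    try:
        year_text, month_text = str(value).strip().split("-")
        year, month = int(year_text), int(month_text)
    except ValueError:
        # Wrong number of parts or non-numeric parts.
        return None, None
    if not 1 <= month <= 12:
        return None, None
    return year, month


def temporal_features(value):
    """Derive year, month, and season features from a Year_Month value."""
    year, month = parse_year_month(value)
    return {"year": year, "month": month, "season": SEASON_BY_MONTH.get(month)}
```

For example, `temporal_features("2019-4")` yields year 2019, month 4, and the season "spring".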

Good luck, and have fun building robust, multilingual, ordinal text classifiers!