Sephora Skincare Reviews Rating Prediction

Overview
Participants are challenged to predict the 1–5 star rating a reviewer assigned to a Sephora skincare product, using the full review text together with rich reviewer- and product-level metadata. The dataset contains about one million reviews linked to ~8k products. This task is deliberately realistic and challenging: reviews are imbalanced toward higher ratings, product and reviewer attributes provide complementary signal, and the temporal split introduces natural distribution shift.

Files
- train.csv: Labeled rows with features and the target column rating (integers 1–5).
- test.csv: Unlabeled rows with the same features as train.csv except without rating.
- sample_submission.csv: Sample file showing the expected submission format.

Target
- rating: Integer in {1, 2, 3, 4, 5} representing the number of stars the reviewer gave.

ID
- review_id: Unique identifier for each review. Submissions must cover exactly the review_id values provided in test.csv.

Features (selected examples; columns are consistent between train.csv and test.csv unless noted)
Review-level (from user reviews):
- author_id: Anonymized review author identifier
- is_recommended: 0/1 flag for the reviewer’s recommendation
- helpfulness: Ratio of positive to total feedback on the review (may be 0 when total feedback is 0)
- total_feedback_count, total_neg_feedback_count, total_pos_feedback_count
- submission_time: Date the review was posted (string ISO format)
- review_text, review_title: Free-text content
- skin_tone, eye_color, skin_type, hair_color: Self-reported appearance attributes
- product_id, product_name, brand_name: Product linkage and names
- price_usd: Product price at the time of review
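Several review-level fields benefit from light preprocessing before modeling. As a minimal sketch (on a synthetic two-row frame, not the real data): submission_time parses with pandas, and the helpfulness ratio can be recomputed from the feedback counts with a guard against zero total feedback:

```python
import pandas as pd

# Toy rows mimicking the review-level columns (synthetic values, not real data).
reviews = pd.DataFrame({
    "submission_time": ["2021-03-15", "2022-07-01"],
    "total_pos_feedback_count": [3, 0],
    "total_feedback_count": [4, 0],
})

# submission_time is an ISO-format date string; parse it for time-aware features.
reviews["submission_time"] = pd.to_datetime(reviews["submission_time"])

# helpfulness is positive/total feedback and is 0 when total feedback is 0;
# recomputing it with a zero-division guard reproduces that convention.
total = reviews["total_feedback_count"]
reviews["helpfulness"] = (
    reviews["total_pos_feedback_count"] / total.where(total > 0)
).fillna(0.0)
```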

Product-level (from the product catalog, merged by product_id):
- brand_id
- loves_count
- product_avg_rating: Average product rating across all reviews on site (renamed from the catalog to avoid confusion with the target)
- product_review_count: Number of reviews counted at the catalog level
- size, variation_type, variation_value, variation_desc
- ingredients: Ingredient list (stringified; can be parsed)
- value_price_usd, sale_price_usd
- limited_edition, new, online_only, out_of_stock, sephora_exclusive (0/1 flags)
- highlights: Tag list (stringified)
- primary_category, secondary_category, tertiary_category
- child_count, child_max_price, child_min_price
Note: To avoid ambiguity with review-side columns, overlapping product catalog fields are suffixed with _catalog where applicable (e.g., product_name_catalog, brand_name_catalog, price_usd_catalog).
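Because ingredients and highlights arrive as stringified lists, they need parsing before use. A defensive helper along these lines works (the function name and example values are illustrative, and real cells may be missing or malformed):

```python
import ast

def parse_stringified_list(cell):
    """Return a list parsed from a stringified list, or [] if unparseable."""
    if not isinstance(cell, str):
        return []  # NaN / None / non-string cells yield an empty list
    try:
        parsed = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []  # malformed strings are treated as empty
    return parsed if isinstance(parsed, list) else []

# Example: a highlights cell stored as a stringified Python list.
highlights = parse_stringified_list("['Vegan', 'Hydrating']")
```

ast.literal_eval is preferable to eval here: it only accepts Python literals, so a malformed or hostile cell cannot execute code.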

Train/Test Split
- The split is chronological by submission_time to reflect a realistic forecasting scenario: the most recent reviews form the test set, and earlier reviews form the training set. This enforces a natural distribution shift and discourages leakage across near-duplicate reviews of the same products.
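Because the official split is chronological, local validation should hold out the most recent reviews rather than a random sample. A minimal sketch on synthetic dates (the 80/20 cutoff is an assumption for illustration, not the official split point):

```python
import pandas as pd

# Synthetic reviews with submission dates (illustrative, not the real data).
df = pd.DataFrame({
    "review_id": [1, 2, 3, 4, 5],
    "submission_time": ["2020-01-01", "2020-06-01", "2021-01-01",
                        "2021-06-01", "2022-01-01"],
})
df["submission_time"] = pd.to_datetime(df["submission_time"])

# Mimic the chronological split: the most recent 20% of reviews become
# a local validation set, and earlier reviews the local training set.
df = df.sort_values("submission_time")
cutoff = int(len(df) * 0.8)
local_train, local_valid = df.iloc[:cutoff], df.iloc[cutoff:]
```

Validating this way gives an estimate that reflects the same distribution shift the real test set introduces; a random split would be optimistically biased.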

Submission
- File format: CSV with exactly two columns and a header: review_id,rating
- review_id must match exactly the IDs provided in test.csv (no extra or missing rows; no duplicates).
- rating must be numeric; predictions will be clipped to the range [1, 5] and rounded to the nearest integer before scoring.
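A valid submission file can be assembled as follows (the review_id values and predictions are synthetic; clipping and rounding locally is optional, since scoring applies the same step):

```python
import pandas as pd

# Hypothetical predictions for three test review_ids (synthetic values).
submission = pd.DataFrame({
    "review_id": [101, 102, 103],
    "rating": [4.6, 1.2, 3.0],  # raw model outputs; numeric is acceptable
})

# Optional: clip and round locally so the file matches what scoring will see.
submission["rating"] = submission["rating"].clip(1, 5).round().astype(int)

# Exactly two columns with a header, one row per test review_id.
submission.to_csv("submission.csv", index=False)
```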

Evaluation Metric
- Quadratic Weighted Kappa (QWK) between the true ratings and the submitted predictions, treating ratings as ordinal categories in the range 1–5.
- QWK corrects for chance agreement, making it less sensitive to class imbalance than plain accuracy, and it penalizes near-misses less than large errors, making it well-suited for ordinal star ratings.
- During scoring, predictions are clipped to [1, 5], rounded to the nearest integer, and compared to ground truth via QWK. The final leaderboard is sorted by QWK (higher is better).
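For local validation, the clip-round-QWK pipeline described above can be reproduced with a small NumPy implementation (equivalent to sklearn.metrics.cohen_kappa_score with weights="quadratic" applied after the rounding step):

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, min_rating=1, max_rating=5):
    """Quadratic Weighted Kappa over ordinal ratings in [min_rating, max_rating].

    Applies the same clip-and-round step described for scoring before comparing.
    """
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), min_rating, max_rating)
    y_pred = np.rint(y_pred).astype(int)

    n = max_rating - min_rating + 1
    # Observed confusion matrix over the n rating categories.
    observed = np.zeros((n, n))
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating, p - min_rating] += 1
    # Expected matrix under independence of the true/predicted marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    # Quadratic disagreement weights: (i - j)^2, normalized.
    idx = np.arange(n)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```

Perfect predictions score 1.0, and a constant prediction scores 0.0 (its observed disagreement equals the chance-expected disagreement), so QWK rewards models that actually separate the classes.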


Deliverables
- train.csv, test.csv, and sample_submission.csv are the only files needed to train and submit.
- Do not attempt to infer labels from filenames or paths; none are provided.

Good luck and have fun building robust models that understand text, metadata, and time-aware dynamics in skincare reviews!