Competition: Emotion Understanding from Everyday Text

Problem statement
Build a model that predicts the emotion expressed in short, informal pieces of text. Given a sentence or phrase, assign exactly one emotion label from a fixed set (e.g., anger, love, hate, worry, sadness, happiness, neutral). This is a multi-class, single-label classification problem with noticeable class imbalance and noisy, informal language, which makes it a useful testbed for robust NLP systems.

Data description
- train.csv: Training data with columns:
  - id: Unique identifier for each text example.
  - text: The text content (UTF-8, may include punctuation and emojis; casing may be mixed; length varies).
  - Emotion: The gold label for the text (one of the discrete emotion categories).

- test.csv: Test data with columns:
  - id: Unique identifier for each text example.
  - text: The text content.

- sample_submission.csv: A sample submission file with columns:
  - id: Copy of the ids from test.csv.
  - Emotion: A placeholder prediction for each id. Replace with your model’s predictions.

Label space
- The definitive label set is exactly the set of unique values in the Emotion column of train.csv. Predictions must be one of these values. Any out-of-vocabulary label will be rejected by the evaluation.
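
As a minimal sketch of how the label set could be derived, the snippet below collects the unique Emotion values and their counts using Python's standard csv module. The inline sample rows and the helper name label_counts are illustrative assumptions, not part of the competition data.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows in the train.csv layout (id, text, Emotion).
SAMPLE_TRAIN_CSV = """id,text,Emotion
1,"so tired of this, honestly",sadness
2,best day ever!!,happiness
3,ok,neutral
4,see you at 5,neutral
"""

def label_counts(csv_file):
    """Count how often each Emotion value occurs in a train-format CSV."""
    reader = csv.DictReader(csv_file)
    return Counter(row["Emotion"] for row in reader)

counts = label_counts(io.StringIO(SAMPLE_TRAIN_CSV))
label_set = set(counts)  # the allowed prediction vocabulary, case-sensitive

print(sorted(label_set))
print(counts.most_common())  # most frequent labels first, to inspect imbalance
```

For the real data, the same helper could be called on a file handle opened with open("train.csv", newline="", encoding="utf-8"). The counts also give a quick view of the class imbalance mentioned in the problem statement.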

Evaluation
- Metric: Macro-averaged F1 score (unweighted mean of per-class F1 scores) computed over the classes present in the ground truth. This metric emphasizes balanced performance across majority and minority emotions.
- Submission format: A CSV with a header and exactly two columns in this order: id,Emotion. All test ids must appear exactly once; no extra or missing ids. Labels must be drawn from the train label set.
- Case-sensitivity: Labels are evaluated case-sensitively and must match train.csv exactly.
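
The metric described above can be sketched in plain Python as follows. This is an illustration under the stated definition (unweighted mean of per-class F1 over the classes present in the ground truth), not the official scoring code; the function name macro_f1 and the toy labels are assumptions.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # tp + fn >= 1 because the label occurs in y_true, so the
        # denominator is never zero here.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)

# Toy example: always predicting the majority class.
y_true = ["neutral", "neutral", "neutral", "anger"]
y_pred = ["neutral", "neutral", "neutral", "neutral"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}")                  # accuracy = 0.750
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")  # macro F1 = 0.429
```

In the toy example the majority-class predictor reaches 75% accuracy but a macro F1 of only 3/7, because the missed minority class contributes an F1 of zero with the same weight as the majority class. Because comparison is by exact string equality, a prediction of "Anger" for a true label of "anger" would count as an error.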

Rules and expectations
- Use train.csv for model development. Make predictions for every id in test.csv and submit a CSV with the columns id,Emotion in the required format. The file name is up to you.
- The test set includes only labels that also appear at least once in the training set.
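
Before submitting, the format requirements can be checked locally. The sketch below is one possible pre-submission check, assuming ids are compared as strings; the function name validate_submission and the sample data are hypothetical, and the actual evaluator may perform different or additional checks.

```python
import csv
import io

def validate_submission(csv_file, test_ids, train_labels):
    """Return a list of problems found; an empty list means none were found."""
    reader = csv.reader(csv_file)
    header = next(reader, None)
    if header != ["id", "Emotion"]:
        return [f"bad header: {header!r}"]
    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 fields, got {len(row)}")
            continue
        row_id, label = row
        if row_id in seen:
            problems.append(f"line {line_no}: duplicate id {row_id!r}")
        seen.add(row_id)
        if label not in train_labels:  # exact, case-sensitive match
            problems.append(f"line {line_no}: unknown label {label!r}")
    expected = set(test_ids)
    problems += [f"missing id {i!r}" for i in sorted(expected - seen)]
    problems += [f"unexpected id {i!r}" for i in sorted(seen - expected)]
    return problems

# Hypothetical test ids and train label set for illustration.
test_ids = ["1", "2"]
train_labels = {"neutral", "anger"}

good = "id,Emotion\n1,neutral\n2,anger\n"
bad = "id,Emotion\n1,Neutral\n1,anger\n3,anger\n"

print(validate_submission(io.StringIO(good), test_ids, train_labels))
for problem in validate_submission(io.StringIO(bad), test_ids, train_labels):
    print(problem)
```

The second sample fails on four counts: a label with the wrong casing, a duplicated id, a missing id, and an id that is not in the test set.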

Good luck, and have fun pushing the state of the art in emotion understanding!