Competition: Stack Overflow Question Quality Prediction

Problem statement
Given a Stack Overflow question (title, HTML body, tags, and creation date), predict its quality class among three categories:
- HQ: High-quality posts that received no edits
- LQ_EDIT: Low-quality posts with a negative score that received multiple community edits but remained open
- LQ_CLOSE: Low-quality posts that were closed by the community without receiving a single edit

Your task is to build an ML model that infers question quality from raw text (including code snippets), tags, and temporal signals.

Data description
- train.csv: Labeled training data with columns: Id, Title, Body, Tags, CreationDate, Y
- test.csv: Unlabeled test data with columns: Id, Title, Body, Tags, CreationDate
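A minimal loading sketch using pandas. The column names come from the description above; the inline sample rows are fabricated stand-ins for the real `train.csv` (replace the `StringIO` with the actual file path):

```python
import io

import pandas as pd

# Tiny fabricated sample mirroring the documented train.csv schema.
csv_text = """Id,Title,Body,Tags,CreationDate,Y
1,How to sort a list?,<p>How do I sort?</p>,<python>,2016-01-01 10:00:00,HQ
2,help pls,<p>code no work</p>,<java>,2016-01-02 11:30:00,LQ_CLOSE
"""

# In practice: pd.read_csv("train.csv", parse_dates=["CreationDate"])
train = pd.read_csv(io.StringIO(csv_text), parse_dates=["CreationDate"])
print(train.dtypes)
print(train["Y"].value_counts())
```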

Notes
- Body is HTML and may include code blocks. You may parse/clean HTML, extract code/text, use character/word/HTML features, or build models directly on raw HTML.
- Tags are provided as a string like <python><pandas>; you can parse them into features.
- CreationDate is UTC. Temporal patterns can be useful.
- Id is a unique identifier and has no inherent meaning.
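The notes above can be turned into simple features. Below is a regex-based sketch (one of many possible approaches; an HTML parser such as BeautifulSoup would be more robust) that parses the tag string, strips HTML from the body, and counts `<code>` blocks:

```python
import re

def parse_tags(tag_string):
    # Tags arrive as e.g. "<python><pandas>"; extract the names between angle brackets.
    return re.findall(r"<([^<>]+)>", tag_string)

def strip_html(body):
    # Crude text extraction: drop code-block contents first, then remove remaining tags.
    no_code = re.sub(r"<code>.*?</code>", " ", body, flags=re.DOTALL)
    return re.sub(r"<[^>]+>", " ", no_code)

def count_code_blocks(body):
    # Number of <code> elements, a rough proxy for how much code the question includes.
    return len(re.findall(r"<code>", body))

print(parse_tags("<python><pandas>"))          # ['python', 'pandas']
print(strip_html("<p>Hi <code>x=1</code></p>").strip())  # 'Hi'
```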

Submission format
Provide class probabilities for each test Id. The required CSV columns are exactly:
- Id, HQ, LQ_EDIT, LQ_CLOSE
Each row must contain non-negative probabilities summing to 1 across the three class columns.
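A sketch of assembling a valid submission from raw (non-negative) model scores. The Ids and scores here are hypothetical; the row normalization guarantees each probability triple sums to 1:

```python
import numpy as np
import pandas as pd

ids = [101, 102, 103]  # hypothetical test Ids
raw = np.array([[2.0, 1.0, 1.0],   # hypothetical non-negative model scores
                [0.1, 0.7, 0.2],
                [1.0, 1.0, 1.0]])

# Normalize each row so the three class probabilities sum to 1.
probs = raw / raw.sum(axis=1, keepdims=True)

sub = pd.DataFrame(probs, columns=["HQ", "LQ_EDIT", "LQ_CLOSE"])
sub.insert(0, "Id", ids)
assert np.allclose(sub[["HQ", "LQ_EDIT", "LQ_CLOSE"]].sum(axis=1), 1.0)
sub.to_csv("submission.csv", index=False)
```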

Evaluation
Submissions are evaluated by class-balanced (macro) log loss, which equally weights each class:
- For each class c in {HQ, LQ_EDIT, LQ_CLOSE}, compute the standard log loss using only the examples of class c.
- The final score is the average of the three class-wise log losses.
Lower is better. This metric rewards calibrated probabilities and balances performance across classes, mitigating class imbalance.
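The metric above can be sketched directly from its definition (not an official scorer; the clipping epsilon is an assumption to avoid log(0)):

```python
import numpy as np

CLASSES = ["HQ", "LQ_EDIT", "LQ_CLOSE"]

def macro_log_loss(y_true, probs, eps=1e-15):
    # y_true: sequence of class labels; probs: (n, 3) array aligned with CLASSES.
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    y_idx = np.array([CLASSES.index(y) for y in y_true])
    per_class = []
    for c in range(len(CLASSES)):
        mask = y_idx == c
        # Standard log loss restricted to examples whose true class is c.
        per_class.append(-np.mean(np.log(probs[mask, c])))
    # Final score: unweighted average of the three class-wise losses.
    return float(np.mean(per_class))
```

As a sanity check, a uniform prediction of 1/3 for every example scores ln(3) ≈ 1.0986 regardless of the class distribution, while perfect one-hot predictions score 0.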

Final files provided
- train.csv (labeled)
- test.csv (unlabeled)
- sample_submission.csv (example submission with valid probabilities)


File integrity
- The Ids in test.csv correspond exactly to the Ids expected in the submission file.
- No label information is present in the test.csv file.
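A pre-submission check based on the rules above (an informal sketch, not an official validator):

```python
import pandas as pd

def validate_submission(sub, test_ids, tol=1e-6):
    # Check exact column layout, Id coverage, and that each row is a probability vector.
    assert list(sub.columns) == ["Id", "HQ", "LQ_EDIT", "LQ_CLOSE"], "wrong columns"
    assert set(sub["Id"]) == set(test_ids), "Ids must match test.csv exactly"
    p = sub[["HQ", "LQ_EDIT", "LQ_CLOSE"]]
    assert (p.values >= 0).all(), "negative probabilities"
    assert ((p.sum(axis=1) - 1).abs() < tol).all(), "rows must sum to 1"
    return True
```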

Good luck and have fun building a high-quality, well-calibrated classifier!