<|im_start|>system
You are given a machine learning task. You must solve the task by training a model and running the model on the test set to produce a submission file.<|im_end|>
<|im_start|>user
First outline your proposed solution in natural language (3-5 sentences), followed by a single markdown code block (wrapped in ```). Note:

- Datasets train.csv and test.csv are available in `/workdir/detecting-insults-in-social-commentary/prepared/public/`.
- You MUST produce a submission file at `./submission.csv` by running your model on the test split.
- Your task is to predict the probability (0 to 1) that a comment is an insult. Predictions are evaluated using AUC.
- You can use pre-installed packages like `scikit-learn`, `lightGBM`, `xgboost`, `pandas`, `numpy`, `torch`, etc.
- You can't access the internet so don't use any pre-trained models need downloading.
- `./submission.csv` should have the following format:
```
Insult,Date,Comment
0,,"""Okay stuff"""
0.8,,"""Bad stuff"
```
- Data snippet:
-> /workdir/detecting-insults-in-social-commentary/prepared/public/train.csv has 3947 rows and 3 columns:
Insult,Date,Comment
1,20120618192155Z,"""You fuck your dad."""
->/workdir/detecting-insults-in-social-commentary/prepared/public/test.csv has 2647 rows and 2 columns:
Date,Comment
,"""Okay stuff"""
<|im_end|>
<|im_start|>assistant
