# 2D Robust Regression Dataset Visualization

This repository contains a Python script (`data_2d.py`) that generates and visualizes a synthetic 2D dataset. This dataset is specifically designed to stress-test regression algorithms by introducing **high leverage points** and **proximal outliers**.

## Dataset Composition

The dataset consists of 1000 samples divided into three distinct groups (700, 100, and 200 samples respectively) to simulate real-world data corruption:

1.  **Normal Data (70%)**
    * **Distribution:** Uniformly distributed in $X \in [-2, 2]$.
    * **Label:** Follows the ground truth linear model $Y = 2X + 1 + \epsilon$, where $\epsilon$ is a noise term.
    * **Purpose:** Represents the clean signal.

2.  **High Leverage Points (10%)**
    * **Distribution:** Located far from the center ($X \approx 10$).
    * **Label:** Consistent with the ground truth model.
    * **Purpose:** These are "good" leverage points. They are extreme in feature space but their labels are valid. Non-robust methods might over-rely on them, while a robust method should keep using them as legitimate evidence for the true line rather than discarding them as outliers.

3.  **Proximal Outliers (20%)**
    * **Distribution:** Located near the center of the data ($X \approx 0$).
    * **Label:** Generated with an **inverted slope** ($W = -2$) and a large bias ($+10$).
    * **Purpose:** These are "bad" outliers. They sit in the high-density region of the feature space but carry completely wrong labels, and they are designed to "trap" the model into fitting local noise.
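
The three groups above can be sketched in a few lines of NumPy. This is a minimal illustration, not the contents of `data_2d.py`: the function name `make_dataset`, the seed, the noise standard deviation (`noise_std`), and the cluster spread (`cluster_std`) are all assumptions, and the outlier label is assumed to be $Y = -2X + 10 + \epsilon$, reading the "+10" bias as the intercept.

```python
import numpy as np


def make_dataset(n_samples=1000, seed=0, noise_std=0.1, cluster_std=0.5):
    """Sketch of the three-group dataset (all defaults are assumptions)."""
    rng = np.random.default_rng(seed)

    # Group sizes: 10% leverage, 20% outliers, remainder (70%) normal.
    n_lev = n_samples // 10
    n_out = n_samples // 5
    n_norm = n_samples - n_lev - n_out

    # 1. Normal data: X uniform in [-2, 2], labels from Y = 2X + 1 + eps.
    x_norm = rng.uniform(-2.0, 2.0, n_norm)
    y_norm = 2.0 * x_norm + 1.0 + rng.normal(0.0, noise_std, n_norm)

    # 2. High leverage points: X near 10, labels still follow the true model.
    x_lev = rng.normal(10.0, cluster_std, n_lev)
    y_lev = 2.0 * x_lev + 1.0 + rng.normal(0.0, noise_std, n_lev)

    # 3. Proximal outliers: X near 0, inverted slope and large bias
    #    (assumed here to mean Y = -2X + 10 + eps).
    x_out = rng.normal(0.0, cluster_std, n_out)
    y_out = -2.0 * x_out + 10.0 + rng.normal(0.0, noise_std, n_out)

    x = np.concatenate([x_norm, x_lev, x_out])
    y = np.concatenate([y_norm, y_lev, y_out])
    # Group id per sample: 0 = normal, 1 = leverage, 2 = outlier.
    group = np.repeat([0, 1, 2], [n_norm, n_lev, n_out])
    return x, y, group


x, y, group = make_dataset()

# Compare an ordinary least-squares line on all data with one fit
# on the clean group only; np.polyfit returns (slope, intercept).
slope_all, intercept_all = np.polyfit(x, y, 1)
slope_clean, intercept_clean = np.polyfit(x[group == 0], y[group == 0], 1)
print(f"all data:   slope={slope_all:.3f}, intercept={intercept_all:.3f}")
print(f"clean only: slope={slope_clean:.3f}, intercept={intercept_clean:.3f}")
```

Keeping a `group` array alongside `x` and `y` makes it easy to color each group separately in a scatter plot or to evaluate a fitted model against the clean samples only.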

## Requirements

The script requires the following Python libraries:

* `numpy`
* `matplotlib`

```bash
pip install numpy matplotlib