# Datasets

This folder contains all of the datasets we managed to find online. It is divided into two subfolders: classification and regression.

The [classification](classification/) folder contains the following datasets:
- Heart: [heart.dat](classification/heart.dat) ([Link](https://archive.ics.uci.edu/ml/datasets/Statlog+%28Heart%29))
- Breast cancer: [dataset_13_breast-cancer.arff](classification/dataset_13_breast-cancer.arff) ([Link](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29))
- Haberman: [haberman.data](classification/haberman.data) ([Link](https://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival))
- Ionosphere: [ionosphere.data](classification/ionosphere.data) ([Link](https://archive.ics.uci.edu/ml/datasets/Ionosphere))
- Diabetes: [diabetes.csv](classification/diabetes.csv) ([Link](https://www.kaggle.com/datasets/mathchi/diabetes-data-set))
- German credit: [SouthGermanCredit.asc](classification/SouthGermanCredit.asc) ([Link](https://archive.ics.uci.edu/ml/datasets/South+German+Credit+%28UPDATE%29))
- Juvenile: [juvenile.xpt](classification/juvenile.xpt) ([Link](https://www.icpsr.umich.edu/web/NACJD/studies/3986))
- Recidivism: [compas-scores-two-years.csv](classification/compas-scores-two-years.csv) ([Link](https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis))

The [regression](regression/) folder contains the following datasets:
- Geographical music: [geographical_music.tsv](regression/geographical_music.tsv) ([Link](https://epistasislab.github.io/pmlb/profile/4544_GeographicalOriginalofMusic.html))
- Red wine: [winequality-red.csv](regression/winequality-red.csv) ([Link](https://archive.ics.uci.edu/ml/datasets/Wine+Quality))
- Abalone: [abalone.data](regression/abalone.data) ([Link](https://archive.ics.uci.edu/ml/datasets/Abalone))
- Satellite image: [satellite_image.tsv](regression/satellite_image.tsv) ([Link](https://epistasislab.github.io/pmlb/profile/294_satellite_image.html))
- CA housing: [ca_housing.data](regression/ca_housing.data) ([Link](https://www.kaggle.com/datasets/camnugent/california-housing-prices))


The following three regression datasets were obtained from the Python package `scikit-learn`:
- Friedman1: `sklearn.datasets.make_friedman1()` ([Link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman1.html))
- Friedman3: `sklearn.datasets.make_friedman3()` ([Link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_friedman3.html))
- Diabetes: `sklearn.datasets.load_diabetes()` ([Link](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html))
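
As a minimal sketch, the three `scikit-learn` datasets listed above could be obtained as follows. The sample count, noise level, and random seed are assumptions chosen for illustration; this README does not state which values were actually used.

```python
from sklearn.datasets import load_diabetes, make_friedman1, make_friedman3

# Assumed settings: illustrative values only, not taken from this repository.
N_SAMPLES = 200
NOISE = 1.0
SEED = 0

# Synthetic Friedman #1 problem (10 features requested here).
X_f1, y_f1 = make_friedman1(
    n_samples=N_SAMPLES, n_features=10, noise=NOISE, random_state=SEED
)

# Synthetic Friedman #3 problem (always 4 features).
X_f3, y_f3 = make_friedman3(n_samples=N_SAMPLES, noise=NOISE, random_state=SEED)

# Diabetes regression dataset bundled with scikit-learn.
X_db, y_db = load_diabetes(return_X_y=True)

for name, X, y in [
    ("Friedman1", X_f1, y_f1),
    ("Friedman3", X_f3, y_f3),
    ("Diabetes", X_db, y_db),
]:
    print(f"{name}: X shape {X.shape}, y shape {y.shape}")
```

Fixing `random_state` keeps the two synthetic datasets reproducible across runs. `load_diabetes()` takes no such argument because it returns a fixed dataset shipped with the package.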