Dataloaders: Wine Quality

class wine_dataloader.WineQuality(path, train=True, noise=False, noise_type=None, distribution_data=None, normalize=False, size=None)
Description:
The Wine Quality dataset [1] contains red and white vinho verde wine samples from the north of Portugal. The goal is to model wine quality based on physicochemical tests.
Attribute Information:

Input variables (based on physicochemical tests):

1:fixed acidity
2:volatile acidity
3:citric acid
4:residual sugar
5:chlorides
6:free sulfur dioxide
7:total sulfur dioxide
8:density
9:pH
10:sulphates
11:alcohol
12:quality (score between 0 and 10) [output variable, based on sensory data]
Args:
path (string):A path to the Wine Quality dataset directory.
train (bool):A boolean that controls the selection of training data (True), or testing data (False).
noise (bool):A boolean that controls whether noise is added to the data (True) or not (False).
noise_type (string):
 A variable that controls the type of the noise.
distribution_data (list):
 A list of information that is needed for noise generation.
normalize (bool):
 A boolean that controls if the data will be normalized (True) or not (False).
size (int):Size of dataset (training or testing).
load_data()

Description:

Loads the dataset.
Return:
features, labels.
Return type:
Tuple
Args:
None.
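Example:
A minimal sketch of the features/labels split that load_data returns, not the class's actual implementation. It assumes a semicolon-delimited CSV whose first 11 columns are the input variables and whose last column is quality. The helper name split_features_labels and the two data rows are made up for illustration.

```python
import csv
import io


def split_features_labels(csv_text, delimiter=";"):
    """Split CSV text into (features, labels); hypothetical helper."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    next(reader)  # skip the header row
    features, labels = [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        values = [float(x) for x in row]
        features.append(values[:-1])  # the 11 physicochemical inputs
        labels.append(values[-1])     # the quality score
    return features, labels


# Illustrative rows only; these are not taken from the real dataset.
sample_csv = (
    "fixed acidity;volatile acidity;citric acid;residual sugar;chlorides;"
    "free sulfur dioxide;total sulfur dioxide;density;pH;sulphates;alcohol;quality\n"
    "7.0;0.5;0.1;2.0;0.08;15;40;0.997;3.3;0.6;10.0;6\n"
    "6.5;0.3;0.3;1.5;0.05;20;90;0.994;3.2;0.5;11.0;5\n"
)
features, labels = split_features_labels(sample_csv)
```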
get_uniform_params(mu, v)
Description:

Computes the bounds of the uniform distribution from its mean and variance using the formulas

a = mu - sqrt(3*v)
b = mu + sqrt(3*v)
Return:
Uniform distribution bounds a and b.
Return type:
Tuple.
Args:
mu (float):The mean of the uniform distribution.
v (float):The variance of the uniform distribution.
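Example:
A short sketch of the same computation, written as a standalone function rather than the class method. The name uniform_bounds is hypothetical. It relies on the fact that a uniform distribution on [a, b] has mean (a + b)/2 and variance (b - a)**2/12.

```python
import math


def uniform_bounds(mu, v):
    """Return (a, b) for a uniform distribution with mean mu and variance v."""
    half_width = math.sqrt(3.0 * v)  # (b - a) / 2, because v = (b - a)**2 / 12
    return mu - half_width, mu + half_width


a, b = uniform_bounds(mu=2.0, v=3.0)
print(a, b)  # -1.0 5.0
```

The result can be checked by hand: (a + b)/2 = 2 and (b - a)**2/12 = 36/12 = 3.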
get_gamma_params(mu, v)
Description:

Computes the shape, or concentration, (alpha) and the rate (beta) of the gamma distribution from its mean and variance

alpha = k
beta = 1/theta
where
k = (mu**2)/v
theta = v/mu
Return:
Alpha, beta.
Return type:
Tuple.
Args:
mu (float):The mean of the gamma distribution.
v (float):The variance of the gamma distribution.
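Example:
A short sketch under the standard shape-rate parameterization, in which the gamma mean is alpha/beta and the variance is alpha/beta**2. The function name gamma_shape_rate is hypothetical, and this may differ from the method's actual code.

```python
def gamma_shape_rate(mu, v):
    """Return (alpha, beta) for a gamma distribution with mean mu and variance v."""
    k = mu ** 2 / v   # shape, from mean = k * theta and variance = k * theta**2
    theta = v / mu    # scale
    alpha = k         # shape (concentration)
    beta = 1.0 / theta  # rate is the inverse of the scale
    return alpha, beta


alpha, beta = gamma_shape_rate(mu=2.0, v=4.0)
print(alpha, beta)  # 1.0 0.5
```

Checking the result: alpha/beta = 1.0/0.5 = 2 (the mean) and alpha/beta**2 = 1.0/0.25 = 4 (the variance).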
get_distribution(dist_type, data, is_params_estimated, vmax=False, vmax_scale=1)
Description:
Creates a probability distribution (uniform or gamma).
Return:
A probability distribution.
Return type:
Object.
Args:
dist_type:An argument that specifies the type of the distribution.
data:A list that contains the distribution information.
is_params_estimated:
 An argument that indicates how data is interpreted when creating the probability distribution. The data can hold either distribution statistics (mean and variance) or distribution parameters.
vmax:A boolean that controls if maximum heteroscedasticity will be used or not.
vmax_scale:An argument that specifies the heteroscedasticity scale.
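Example:
A rough sketch of how such a factory could look, using only the Python standard library. Everything here is an assumption made for illustration: the helper name make_sampler, the dist_type strings "uniform" and "gamma", the reading that is_params_estimated=True means data holds (mean, variance), and the choice to return a sampling function instead of a distribution object. The vmax and vmax_scale options are left out.

```python
import math
import random


def make_sampler(dist_type, data, is_params_estimated, rng=None):
    """Return a zero-argument function that draws one sample (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    first, second = data
    if dist_type == "uniform":
        if is_params_estimated:
            # Assumed: data = (mean, variance), so the bounds are estimated.
            half_width = math.sqrt(3.0 * second)
            low, high = first - half_width, first + half_width
        else:
            # Assumed: data = (a, b), the bounds themselves.
            low, high = first, second
        return lambda: rng.uniform(low, high)
    if dist_type == "gamma":
        if is_params_estimated:
            # Assumed: data = (mean, variance); shape-rate parameterization.
            shape, rate = first ** 2 / second, first / second
        else:
            # Assumed: data = (shape, rate).
            shape, rate = first, second
        # random.gammavariate expects shape and scale, so pass 1 / rate.
        return lambda: rng.gammavariate(shape, 1.0 / rate)
    raise ValueError(f"unsupported dist_type: {dist_type!r}")


rng = random.Random(0)  # seeded for repeatability
draw_uniform = make_sampler("uniform", (2.0, 3.0), True, rng)
draw_gamma = make_sampler("gamma", (2.0, 4.0), True, rng)
uniform_samples = [draw_uniform() for _ in range(1000)]
gamma_samples = [draw_gamma() for _ in range(1000)]
```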
gaussian_noise(var_dists, p=0.5)
Description:
Generates Gaussian noise samples with zero mean and heteroscedastic variances that are sampled from a range of distributions.
Return:
Gaussian noise samples and their heteroscedastic variances.
Return type:
Tuple.
Args:
var_dists (object):
 Noise variance probability distributions.
p (float):The contribution ratio of low and high noise variance distributions.
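Example:
A minimal sketch of heteroscedastic Gaussian noise sampling, not the method's actual code. It assumes that var_dists holds two variance distributions (low and high), represented here as zero-argument sampling functions, and that p is the probability of drawing a variance from the low one. The function name sample_heteroscedastic_noise and the parameter n are hypothetical.

```python
import math
import random


def sample_heteroscedastic_noise(var_samplers, n, p=0.5, rng=None):
    """Draw n zero-mean Gaussian noise values, each with its own sampled variance."""
    if rng is None:
        rng = random.Random()
    low, high = var_samplers
    noises, variances = [], []
    for _ in range(n):
        # Assumed meaning of p: probability of using the low-variance distribution.
        draw_variance = low if rng.random() < p else high
        var = draw_variance()
        noises.append(rng.gauss(0.0, math.sqrt(var)))  # gauss takes a standard deviation
        variances.append(var)
    return noises, variances


rng = random.Random(0)  # seeded for repeatability
low_var = lambda: rng.uniform(0.0, 0.1)   # illustrative low-variance range
high_var = lambda: rng.uniform(1.0, 2.0)  # illustrative high-variance range
noises, variances = sample_heteroscedastic_noise((low_var, high_var), n=1000, p=0.5, rng=rng)
```

Returning the variances next to the noise values mirrors the documented return value, so each noisy sample carries the variance it was drawn with.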
generate_noise(norm=False)
Description:
Unpacks the required information and calls gaussian_noise to generate the noise.
Return:
Gaussian noise samples and their heteroscedastic variances.
Return type:
Tuple.
Args:
norm (bool):A boolean that controls whether normalization is applied (True) or not (False).

References

[1]
  P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.