Dataloaders: Bike Sharing

class bike_dataloader.BikeSharing(path, seed=42, train=True, noise=False, noise_type=None, distribution_data=None, normalize=False, size=None)
- Description:
The Bike Sharing Dataset (Fanaee-T & Gama, 2013) [1] records the hourly and daily counts of rental bikes in the Capital Bikeshare system for the years 2011 and 2012, together with the corresponding weather and seasonal information. The hourly data contains 17,379 samples.
In preprocessing, the date was rescaled from day 1 to day 730 onto the range 0 to 4π, so that each year spans one full turn of 2π. The network is then given the sine and cosine of this angle. This gives the same representation to the same day of the year in both years, keeps the distance between any two consecutive days constant, and preserves the cyclic nature of a year.
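A minimal sketch of this cyclic date encoding, assuming the day index d (1 to 730) is mapped linearly as θ = 4π·d/730; the exact offset used by `bike_dataloader` is not documented here, and `encode_day` is an illustrative name. With this mapping, days exactly 365 apart receive the same (sin, cos) pair, and consecutive days are always the same Euclidean distance apart on the unit circle.

```python
import math

# Assumption: linear map theta = 4*pi*day/730, so one year (365 days) spans exactly 2*pi.
TWO_YEARS_IN_DAYS = 730


def encode_day(day: int) -> tuple[float, float]:
    """Return (sin, cos) of the day index mapped onto two turns of the unit circle."""
    theta = 4 * math.pi * day / TWO_YEARS_IN_DAYS
    return math.sin(theta), math.cos(theta)


if __name__ == "__main__":
    # The same calendar day in consecutive years maps to (numerically) the same point.
    print(encode_day(10), encode_day(10 + 365))
    # Consecutive days are equidistant, wherever they fall in the year.
    print(math.dist(encode_day(1), encode_day(2)), math.dist(encode_day(364), encode_day(365)))
```

Feeding both sine and cosine matters: either one alone maps two different days of the year to the same value.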
A similar idea was applied to hours, which were rescaled from 0 to 24 onto 0 to 2π, with their sine and cosine given to the network. The day of the week, being categorical, was given as a one-hot vector of dimension 7. We also removed the season and the month because they are redundant with the date.
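The same idea applied to hours and weekdays, as a sketch: hours are assumed to be integers 0 to 23 mapped as θ = 2π·h/24, and weekdays are assumed to use integer codes 0 to 6 (as in the raw dataset's `weekday` column). The function names are illustrative and not part of `bike_dataloader`.

```python
import math


def encode_hour(hour: int) -> tuple[float, float]:
    """Map an hour of the day (assumed 0-23) onto one turn of the unit circle."""
    theta = 2 * math.pi * hour / 24
    return math.sin(theta), math.cos(theta)


def one_hot_weekday(weekday: int) -> list[int]:
    """One-hot encode a day of the week; assumes integer codes 0-6."""
    if not 0 <= weekday <= 6:
        raise ValueError(f"weekday must be in 0..6, got {weekday}")
    vector = [0] * 7
    vector[weekday] = 1
    return vector


if __name__ == "__main__":
    # 23:00 and 00:00 are neighbours on the circle, just like 00:00 and 01:00.
    print(math.dist(encode_hour(23), encode_hour(0)), math.dist(encode_hour(0), encode_hour(1)))
    print(one_hot_weekday(3))
```

A one-hot vector is used for the weekday because, unlike hours, its integer codes carry no meaningful distance: Monday is not "closer" to Tuesday than to Friday in any way the model should assume.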
- Attribute Information:
After preprocessing, each sample has 19 features:
- f1: Year
- f2-f3: Date (sine and cosine)
- f4-f5: Hour (sine and cosine)
- f6-f12: Day of the week (one-hot vector)
- f13: Holiday (boolean)
- f14: Working day (boolean)
- f15: Weather situation
- f16: Temperature
- f17: Felt temperature
- f18: Humidity
- f19: Wind speed
- Args:
- path (string): Path to the Bike Sharing dataset directory.
- train (bool): Selects the training data (True) or the testing data (False).
- noise (bool): Controls whether noise is added to the data.
- noise_type (string): The type of noise.
- distribution_data (list): The information needed for noise generation.
- normalize (bool): Controls whether the data is normalized (True) or not (False).
- size (int): Size of the dataset (training or testing).
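Because the date and the hour each take two columns, the feature indices are easy to get wrong. A small index map makes the 19-feature layout listed above explicit. This is a hypothetical helper, not part of `bike_dataloader`, and it assumes the columns returned by `load_data()` follow the f1 to f19 order.

```python
# Hypothetical index map for one preprocessed sample (0-based slices),
# assuming the columns follow the f1-f19 order of the attribute list.
FEATURE_SLICES = {
    "year": slice(0, 1),                 # f1
    "date_sin_cos": slice(1, 3),         # f2-f3
    "hour_sin_cos": slice(3, 5),         # f4-f5
    "weekday_one_hot": slice(5, 12),     # f6-f12
    "holiday": slice(12, 13),            # f13
    "working_day": slice(13, 14),        # f14
    "weather_situation": slice(14, 15),  # f15
    "temperature": slice(15, 16),        # f16
    "felt_temperature": slice(16, 17),   # f17
    "humidity": slice(17, 18),           # f18
    "wind_speed": slice(18, 19),         # f19
}
NUM_FEATURES = 19


def split_features(row):
    """Split one 19-element feature row (list or 1-D array) into named groups."""
    if len(row) != NUM_FEATURES:
        raise ValueError(f"expected {NUM_FEATURES} features, got {len(row)}")
    return {name: list(row[s]) for name, s in FEATURE_SLICES.items()}


if __name__ == "__main__":
    # The groups are contiguous and cover all features: 1 + 2 + 2 + 7 + 7 = 19.
    print(sum(s.stop - s.start for s in FEATURE_SLICES.values()))
    print(split_features(list(range(NUM_FEATURES)))["weekday_one_hot"])
```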

load_data()
- Description:
Loads the dataset.
- Return:
- features, labels.
- Return type:
- Tuple
- Args:
- None.

get_uniform_params(mu, v)
- Description:
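Generates the bounds a and b of the uniform distribution from its mean mu and variance v. Since a uniform distribution on [a, b] has mean $(a+b)/2$ and variance $(b-a)^2/12$, solving for the bounds gives
$$a = \mu - \sqrt{3v}, \qquad b = \mu + \sqrt{3v}$$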
- Return:
- Uniform distribution bounds a and b.
- Return type:
- Tuple.
- Args:
- mu (float): The mean of the uniform distribution.
- v (float): The variance of the uniform distribution.
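A quick sketch of this conversion, with a check that the resulting bounds reproduce the requested moments. The helper name `uniform_bounds` is illustrative; it is not the library's implementation.

```python
import math


def uniform_bounds(mu: float, v: float) -> tuple[float, float]:
    """Bounds (a, b) of the uniform distribution with mean mu and variance v.

    Illustrative helper, not the bike_dataloader source.
    """
    if v < 0:
        raise ValueError("variance must be non-negative")
    half_width = math.sqrt(3 * v)  # from (b - a)**2 / 12 = v
    return mu - half_width, mu + half_width


if __name__ == "__main__":
    a, b = uniform_bounds(mu=5.0, v=3.0)
    print(a, b)                            # bounds of the distribution
    print((a + b) / 2, (b - a) ** 2 / 12)  # recovered mean and variance
```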

get_gamma_params(mu, v)
- Description:
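Generates the shape or concentration (alpha) and the rate (beta) of the gamma distribution from its mean mu and variance v. With shape $k$ and scale $\theta$, the gamma distribution has mean $k\theta$ and variance $k\theta^2$, so
$$k = \frac{\mu^2}{v}, \qquad \theta = \frac{v}{\mu}, \qquad \alpha = k, \qquad \beta = \frac{1}{\theta}$$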
- Return:
- alpha, beta.
- Return type:
- Tuple.
- Args:
- mu (float): The mean of the gamma distribution.
- v (float): The variance of the gamma distribution.
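A sketch of the moment matching, using mean $k\theta$ and variance $k\theta^2$ and taking the rate as $\beta = 1/\theta$. It also shows that Python's `random.gammavariate` is parameterized by shape and *scale*, so the rate must be inverted before sampling. The helper name is illustrative, not the library's code.

```python
import random


def gamma_shape_rate(mu: float, v: float) -> tuple[float, float]:
    """Shape alpha and rate beta of the gamma distribution with mean mu and variance v.

    Illustrative helper, not the bike_dataloader source.
    """
    if mu <= 0 or v <= 0:
        raise ValueError("mean and variance must be positive")
    k = mu ** 2 / v      # shape
    theta = v / mu       # scale
    return k, 1 / theta  # (alpha, beta)


if __name__ == "__main__":
    alpha, beta = gamma_shape_rate(mu=4.0, v=2.0)
    print(alpha, beta)
    # random.gammavariate takes (shape, scale), so pass 1 / beta as the scale.
    rng = random.Random(0)
    samples = [rng.gammavariate(alpha, 1 / beta) for _ in range(50_000)]
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
    print(mean, var)  # close to 4.0 and 2.0
```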

get_distribution(dist_type, data, is_params_estimated, vmax=False, vmax_scale=1)
- Description:
- Creates a probability distribution (uniform or gamma).
- Return:
- A probability distribution.
- Return type:
- Object.
- Args:
- dist_type: The type of the distribution (uniform or gamma).
- data: A list containing the distribution information.
- is_params_estimated: Controls how data is used to create the probability distribution; data can hold either distribution statistics (mean and variance) or the distribution parameters themselves.
- vmax: A boolean that controls whether maximum heteroscedasticity is used.
- vmax_scale: The heteroscedasticity scale.
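How the two interpretations of `data` might be dispatched, as a hedged sketch: the string values `"uniform"` and `"gamma"`, the flag name `data_are_moments`, and the (a, b) / (alpha, beta) parameter layout are assumptions, and `vmax`/`vmax_scale` are omitted because their exact effect is not documented here. The sketch returns a plain sampling function rather than a distribution object.

```python
import math
import random
from typing import Callable, Sequence

Sampler = Callable[[random.Random], float]


def make_distribution(dist_type: str, data: Sequence[float], data_are_moments: bool) -> Sampler:
    """Return a sampler for a uniform or gamma distribution.

    Hypothetical stand-in for get_distribution: `data` holds either
    (mean, variance) when data_are_moments is True, or the parameters
    themselves: (a, b) for uniform, (alpha, beta) with beta a rate for gamma.
    """
    first, second = data
    if dist_type == "uniform":
        if data_are_moments:
            half_width = math.sqrt(3 * second)
            a, b = first - half_width, first + half_width
        else:
            a, b = first, second
        return lambda rng: rng.uniform(a, b)
    if dist_type == "gamma":
        if data_are_moments:
            alpha, beta = first ** 2 / second, first / second  # shape, rate
        else:
            alpha, beta = first, second
        return lambda rng: rng.gammavariate(alpha, 1 / beta)  # gammavariate takes a scale
    raise ValueError(f"unsupported dist_type: {dist_type!r}")


if __name__ == "__main__":
    rng = random.Random(0)
    low_noise = make_distribution("uniform", [0.5, 0.01], data_are_moments=True)
    high_noise = make_distribution("gamma", [8.0, 2.0], data_are_moments=False)
    print(low_noise(rng), high_noise(rng))
```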

gaussian_noise(var_dists, p=0.5)
- Description:
- Generates zero-mean Gaussian noise whose heteroscedastic variances are sampled from a set of distributions.
- Return:
- Gaussian noise samples and their heteroscedastic variances.
- Return type:
- Tuple.
- Args:
- var_dists (object): Probability distributions of the noise variance.
- p (float): The contribution ratio of the low- and high-noise variance distributions.
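A hedged sketch of this sampling scheme: each noise value gets its own variance, drawn from one of two variance distributions, and the noise itself is drawn from a zero-mean Gaussian with that variance. The (low, high) pair layout of `var_dists`, the reading of `p` as the probability of picking the low-noise distribution, and the extra `n` and `seed` arguments are assumptions for illustration.

```python
import math
import random
from typing import Callable, Sequence

# A variance sampler draws one (positive) variance from a random generator.
VarianceSampler = Callable[[random.Random], float]


def heteroscedastic_gaussian_noise(
    var_dists: Sequence[VarianceSampler], n: int, p: float = 0.5, seed: int = 0
) -> tuple[list[float], list[float]]:
    """Draw n zero-mean Gaussian noise values, each with its own variance.

    Assumption: var_dists is a (low, high) pair; with probability p the variance
    comes from the low-noise distribution, otherwise from the high-noise one.
    """
    low, high = var_dists
    rng = random.Random(seed)
    noises, variances = [], []
    for _ in range(n):
        var = low(rng) if rng.random() < p else high(rng)
        variances.append(var)
        # random.gauss expects a standard deviation, not a variance.
        noises.append(rng.gauss(0.0, math.sqrt(var)))
    return noises, variances


def low_variance(rng: random.Random) -> float:
    return rng.uniform(0.01, 0.05)


def high_variance(rng: random.Random) -> float:
    return rng.gammavariate(8.0, 0.5)  # shape 8, scale 0.5: mean 4, variance 2


if __name__ == "__main__":
    noise, var = heteroscedastic_gaussian_noise((low_variance, high_variance), n=5, p=0.5)
    for eps, s2 in zip(noise, var):
        print(f"noise={eps:+.3f}  variance={s2:.3f}")
```

Returning the variances alongside the noise is what makes the data useful for heteroscedastic regression: a model can be evaluated against the true per-sample noise level.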

generate_noise(norm=False)
- Description:
- Unpacks the information needed for noise generation and calls gaussian_noise to generate the noise.
- Return:
- Gaussian noise samples and their heteroscedastic variances.
- Return type:
- Tuple.
- Args:
- norm (bool): A boolean that controls normalization.
References
[1] Hadi Fanaee-T and Joao Gama. Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence, pp. 1–15, 2013. ISSN 2192-6352. doi: 10.1007/s13748-013-0040-3.