SampleSelector

Inheritance diagram of gmmvi.optimization.gmmvi_modules.sample_selector.SampleSelector, gmmvi.optimization.gmmvi_modules.sample_selector.VipsSampleSelector, gmmvi.optimization.gmmvi_modules.sample_selector.LinSampleSelector
class gmmvi.optimization.gmmvi_modules.sample_selector.SampleSelector(target_distribution: LNPDF, model: GmmWrapper, sample_db: SampleDB)[source]

Provides the interface for selecting samples for performing the updates at the beginning of every iteration.

The samples are evaluated on the target distribution and used for updating the weights, means and covariance of the GMM.

There are currently two options for selecting the samples:

  1. The VipsSampleSelector uses the procedure described by Arenz et al. [AZN18], Arenz et al. [AZN20] to ensure that we have samples in the vicinity of every component, enabling us to perform a stable update on every component.

  2. The LinSampleSelector uses the procedure described by Lin et al. [LKS19a], which draws samples according to the weights of the current mixture model, aiming for better sample efficiency.

Parameters:
  • target_distribution – LNPDF The target distribution is used for evaluating the newly drawn samples.

  • model – GmmWrapper The wrapped model is used for drawing the samples.

  • sample_db – SampleDB The new samples and their target_densities (and gradients) are stored in the sample database.

static build_from_config(config, gmm_wrapper, sample_db, target_distribution)[source]

This static method provides a convenient way to create a VipsSampleSelector or a LinSampleSelector, depending on the provided config.

Parameters:
  • config – dict The dictionary is typically read from a YAML file, and holds all hyperparameters.

  • gmm_wrapper – GmmWrapper The wrapped model is used for drawing the samples.

  • sample_db – SampleDB The new samples and their target_densities (and gradients) are stored in the sample database.

  • target_distribution – LNPDF The target distribution is used for evaluating the newly drawn samples.
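
The config-based dispatch described above can be sketched as follows. This is only an illustration of the factory pattern, not the library's implementation: the stub classes, the function name build_selector_from_config, and the config keys "selector_type" and "selector_config" (with the values "vips" and "lin") are hypothetical names introduced for this example, and the real config schema may differ.

```python
from dataclasses import dataclass


@dataclass
class StubSelector:
    """Hypothetical stand-in that mirrors the documented constructor arguments."""
    target_distribution: object
    model: object
    sample_db: object
    desired_samples_per_component: int
    ratio_reused_samples_to_desired: float


class StubVipsSampleSelector(StubSelector):
    """Stand-in for VipsSampleSelector (illustration only)."""


class StubLinSampleSelector(StubSelector):
    """Stand-in for LinSampleSelector (illustration only)."""


# Assumption: these type names are invented for the sketch.
SELECTOR_TYPES = {"vips": StubVipsSampleSelector, "lin": StubLinSampleSelector}


def build_selector_from_config(config, gmm_wrapper, sample_db, target_distribution):
    """Pick a selector class by name and pass the hyperparameters through."""
    selector_type = config.get("selector_type")
    if selector_type not in SELECTOR_TYPES:
        raise ValueError(f"Unknown selector type: {selector_type!r}")
    return SELECTOR_TYPES[selector_type](
        target_distribution=target_distribution,
        model=gmm_wrapper,
        sample_db=sample_db,
        **config["selector_config"],
    )


example_config = {
    "selector_type": "lin",
    "selector_config": {
        "desired_samples_per_component": 100,
        "ratio_reused_samples_to_desired": 0.0,
    },
}
selector = build_selector_from_config(
    example_config, gmm_wrapper=None, sample_db=None, target_distribution=None
)
```

Keeping the hyperparameters in a nested dictionary lets both selector classes share one constructor signature, which matches the identical parameter lists documented for the two subclasses.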

select_samples() → [tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor][source]

Selects the samples for the current learning iteration and stores the data in the sample database.

Returns:

samples - a tensor of shape number_of_selected_samples x number_of_dimensions

old_samples_pdf - a tensor of shape number_of_selected_samples, containing the log-densities of the distribution that was effectively used to obtain the selected samples. Needed for importance weighting.

target_lnpdfs - a tensor of shape number_of_selected_samples, containing the log-densities of the target distribution for each selected sample, \log p(\mathbf{x}).

target_grads - a tensor of shape number_of_selected_samples x num_dimensions, containing the gradients of the log-densities of the target distribution for each selected sample, \nabla_{\mathbf{x}} \log p(\mathbf{x}).

Return type:

tuple(tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor)
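
The role of old_samples_pdf in importance weighting can be sketched as follows. This is a minimal illustration in plain Python rather than the library's code; the function name and the argument model_lnpdfs (log-densities of the current model on the selected samples) are assumptions introduced here.

```python
import math


def self_normalized_importance_weights(model_lnpdfs, old_samples_pdf):
    """Return weights proportional to q(x) / q_old(x) that sum to one.

    Both arguments are sequences of log-densities, one entry per sample.
    """
    log_w = [m - o for m, o in zip(model_lnpdfs, old_samples_pdf)]
    # Subtract the maximum before exponentiating to avoid overflow.
    max_log_w = max(log_w)
    unnormalized = [math.exp(lw - max_log_w) for lw in log_w]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


weights = self_normalized_importance_weights(
    model_lnpdfs=[-1.0, -2.0, -0.5],
    old_samples_pdf=[-1.0, -1.5, -1.5],
)
```

Samples that are more likely under the current model than under the distribution that generated them receive larger weights.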

LinSampleSelector

class gmmvi.optimization.gmmvi_modules.sample_selector.LinSampleSelector(target_distribution: LNPDF, model: GmmWrapper, sample_db: SampleDB, desired_samples_per_component: int, ratio_reused_samples_to_desired: float)[source]

Selects the samples according to the procedure described by Lin et al. [LKS19a].

This class follows the procedure described by Lin et al. [LKS19a], drawing new samples from the current mixture model. We also implemented the two-phase procedure of the VipsSampleSelector, which reuses samples from the database and draws new samples based on a desired number of samples. However, in contrast to the VipsSampleSelector, we compute the effective sample size n_eff not per component but for the whole mixture, and draw desired_samples_per_component - n_eff new samples from the mixture model. The exact procedure of Lin et al. [LKS19a] can be reproduced by choosing ratio_reused_samples_to_desired = 0, in which case a fixed number of new samples is always drawn from the mixture model.

Parameters:
  • target_distribution – LNPDF The target distribution is used for evaluating the newly drawn samples.

  • model – GmmWrapper The wrapped model is used for drawing the samples.

  • sample_db – SampleDB The database is used for reusing samples from previous iterations and for storing the new samples and their target_densities (and gradients).

  • desired_samples_per_component – int The desired number of samples for the mixture update.

  • ratio_reused_samples_to_desired – float In the first pass, we reuse the ratio_reused_samples_to_desired * desired_samples_per_component freshest samples from the database.
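
The interplay of the two hyperparameters can be illustrated with a small arithmetic sketch. The function name plan_mixture_sampling and the rounding (flooring, and clipping at zero) are assumptions made for this example; the library may round differently.

```python
import math


def plan_mixture_sampling(desired_samples, ratio_reused_to_desired, n_eff_of_reused):
    """Return (number of reused samples, number of newly drawn samples).

    n_eff_of_reused is the effective sample size of the reused samples under
    the current mixture; it is 0 when nothing is reused.
    """
    # First pass: reuse the freshest samples from the database.
    num_reused = math.floor(ratio_reused_to_desired * desired_samples)
    # Second pass: top up until the desired number is reached (assumed rounding).
    num_new = max(0, math.floor(desired_samples - n_eff_of_reused))
    return num_reused, num_new


# With a ratio of 0 nothing is reused, so a fixed number of samples is drawn.
fixed_plan = plan_mixture_sampling(100, 0.0, 0.0)
# Reusing 200 samples whose effective sample size is 37.4.
reuse_plan = plan_mixture_sampling(100, 2.0, 37.4)
```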

get_effective_samples(model_densities: Tensor, oldsamples_pdf: Tensor) → Tensor[source]

Computes the effective sample size of the mixture model based on the log-densities of the mixture model and the log-densities of the background distribution that was used to obtain the samples.

Parameters:
  • model_densities – tf.Tensor The log-densities of the mixture model, \log q(\mathbf{x}).

  • oldsamples_pdf – tf.Tensor The log-densities of the distribution that was effectively used for obtaining the selected samples

Returns:

the effective number of samples

Return type:

float
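
A common estimator of the effective sample size from importance weights is n_eff = (\sum_i w_i)^2 / \sum_i w_i^2 with w_i = q(\mathbf{x}_i) / q_old(\mathbf{x}_i). The sketch below computes it from log-densities in plain Python. It assumes that this standard estimator is the one intended here, and the function name is introduced only for illustration.

```python
import math


def effective_sample_size(model_lnpdfs, old_samples_pdf):
    """Effective sample size of the importance weights exp(log q - log q_old)."""
    log_w = [m - o for m, o in zip(model_lnpdfs, old_samples_pdf)]
    # The estimator is invariant to rescaling the weights, so shifting by the
    # maximum log-weight only improves numerical stability.
    max_log_w = max(log_w)
    w = [math.exp(lw - max_log_w) for lw in log_w]
    return sum(w) ** 2 / sum(x * x for x in w)


# Identical densities: every sample counts fully.
ess_uniform = effective_sample_size([-1.0] * 4, [-1.0] * 4)
# One sample dominates: the effective sample size collapses towards one.
ess_dominated = effective_sample_size([0.0, -50.0, -50.0], [0.0, 0.0, 0.0])
```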

sample_where_needed() → [tf.Tensor, tf.Tensor, int][source]

Computes the mixture model’s effective sample size for the given set of samples and draws n_{\text{des}} - n_{\text{eff}} new samples from the mixture model.

Parameters:
  • samples – tf.Tensor the samples that were chosen during the first pass

  • oldsamples_pdf – tf.Tensor The log-densities of the distribution that was effectively used for obtaining the selected samples

  • num_desired_samples – int The number of desired samples per component

Returns:

new_samples - a tensor containing the newly drawn samples

new_target_lnpdfs - a tensor containing the log-densities of the target distribution on the newly drawn samples, \log p(\mathbf{x}).

new_target_grads - a tensor containing the gradients of the log-densities for the newly drawn samples.

mapping - a one-dimensional tensor containing, for every sample, the index of the component that was used for drawing that sample.

Return type:

tuple(tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor)
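
Drawing from the mixture while recording the mapping can be sketched as follows. The example uses one-dimensional Gaussian components and Python's random module for brevity; the function name and the representation of the mixture as lists of weights, means and standard deviations are assumptions of this sketch, not the GmmWrapper interface.

```python
import random


def sample_from_mixture(weights, means, stddevs, num_samples, rng):
    """Draw samples from a 1-D Gaussian mixture and record their components."""
    # Choose a component index for every sample according to the mixture weights.
    mapping = rng.choices(range(len(weights)), weights=weights, k=num_samples)
    # Draw each sample from the Gaussian component it was assigned to.
    samples = [rng.gauss(means[o], stddevs[o]) for o in mapping]
    return samples, mapping


rng = random.Random(0)
new_samples, mapping = sample_from_mixture(
    weights=[0.7, 0.3, 0.0],
    means=[-5.0, 0.0, 5.0],
    stddevs=[1.0, 0.5, 1.0],
    num_samples=20,
    rng=rng,
)
```

Because the component indices follow the mixture weights, components with small weights receive few or no samples, which is the trade-off against the per-component procedure of the VipsSampleSelector.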

select_samples() → [tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor][source]

Selects the samples for the current learning iteration and stores the data in the sample database.

Returns:

samples - a tensor of shape number_of_selected_samples x number_of_dimensions

old_samples_pdf - a tensor of shape number_of_selected_samples, containing the log-densities of the distribution that was effectively used to obtain the selected samples. Needed for importance weighting.

target_lnpdfs - a tensor of shape number_of_selected_samples, containing the log-densities of the target distribution for each selected sample, \log p(\mathbf{x}).

target_grads - a tensor of shape number_of_selected_samples x num_dimensions, containing the gradients of the log-densities of the target distribution for each selected sample, \nabla_{\mathbf{x}} \log p(\mathbf{x}).

Return type:

tuple(tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor)

VipsSampleSelector

class gmmvi.optimization.gmmvi_modules.sample_selector.VipsSampleSelector(target_distribution: LNPDF, model: GmmWrapper, sample_db: SampleDB, desired_samples_per_component: int, ratio_reused_samples_to_desired: float)[source]

Selects the samples according to the procedure described by Arenz et al. [AZN18], Arenz et al. [AZN20].

This class uses the procedure described by Arenz et al. [AZN18], Arenz et al. [AZN20] to ensure that we have samples in the vicinity of every component. It uses two passes. In the first pass, it selects a given number of samples from the sample database. In the second pass, it computes the effective sample size for every component (based on the importance weights) and compares the effective sample size with a given desired number of samples. It then draws from every component the respective missing number of samples.

Parameters:
  • target_distribution – LNPDF The target distribution is used for evaluating the newly drawn samples.

  • model – GmmWrapper The wrapped model is used for drawing the samples.

  • sample_db – SampleDB The database is used for reusing samples from previous iterations and for storing the new samples and their target_densities (and gradients).

  • desired_samples_per_component – int The desired number of samples for every component.

  • ratio_reused_samples_to_desired – float In the first pass, we reuse the number_of_components * ratio_reused_samples_to_desired * desired_samples_per_component freshest samples from the database.

get_effective_samples(model_densities: Tensor, oldsamples_pdf: Tensor) → Tensor[source]

Computes the effective sample size of every component based on the log-densities of the individual components and the log-densities of the background distribution that was used to obtain the samples.

Parameters:
  • model_densities – tf.Tensor The log-densities of the individual components, \log q(\mathbf{x}|o)

  • oldsamples_pdf – tf.Tensor The log-densities of the distribution that was effectively used for obtaining the selected samples

Returns:

the effective number of samples

Return type:

float
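
For the per-component case, the same kind of estimate is computed once for every component o, using the weights w_{o,i} = q(\mathbf{x}_i|o) / q_old(\mathbf{x}_i). The sketch below is a plain-Python illustration under the assumption that the estimator (\sum_i w_{o,i})^2 / \sum_i w_{o,i}^2 is used; the function name and the nested-list layout (one row of log-densities per component) are introduced here.

```python
import math


def effective_samples_per_component(component_lnpdfs, old_samples_pdf):
    """Return one effective sample size per component.

    component_lnpdfs[o][i] holds log q(x_i | o); old_samples_pdf[i] holds the
    log-density of the distribution that produced sample i.
    """
    n_eff = []
    for lnpdfs in component_lnpdfs:
        log_w = [m - o for m, o in zip(lnpdfs, old_samples_pdf)]
        max_log_w = max(log_w)  # shift for numerical stability
        w = [math.exp(lw - max_log_w) for lw in log_w]
        n_eff.append(sum(w) ** 2 / sum(x * x for x in w))
    return n_eff


old_pdf = [-1.0, -1.0, -1.0, -1.0]
n_eff = effective_samples_per_component(
    component_lnpdfs=[
        [-1.0, -1.0, -1.0, -1.0],   # all samples equally relevant
        [-1.0, -60.0, -60.0, -60.0],  # only the first sample is relevant
    ],
    old_samples_pdf=old_pdf,
)
```

A component far away from the reused samples ends up with an effective sample size close to one, which signals that it needs fresh samples.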

sample_where_needed(samples: Tensor, oldsamples_pdf: Tensor, num_desired_samples: Optional[int] = None) → [tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor][source]

Computes the components’ effective sample sizes for the given set of samples and draws, for every component i, n_{\text{des}} - n_{\text{eff,i}} new samples.

Parameters:
  • samples – tf.Tensor the samples that were chosen during the first pass

  • oldsamples_pdf – tf.Tensor The log-densities of the distribution that was effectively used for obtaining the selected samples

  • num_desired_samples – int The number of desired samples per component

Returns:

new_samples - a tf.Tensor, the newly drawn samples

new_target_lnpdfs - a tf.Tensor, the log-densities of the target distribution on the newly drawn samples, \log p(\mathbf{x}).

new_target_grads - a tf.Tensor, the gradients of the log-densities for the newly drawn samples, \nabla_{\mathbf{x}} \log p(\mathbf{x}).

mapping - a tf.Tensor, a one-dimensional tensor containing, for every sample, the index of the component that was used for drawing that sample.

Return type:

tuple(tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor)
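
How the per-component shortfall n_{\text{des}} - n_{\text{eff,i}} translates into sample counts and into the mapping can be sketched as follows. The function name and the rounding (flooring, and clipping at zero) are assumptions of this example and may differ from the library's behavior.

```python
import math


def plan_component_sampling(n_eff_per_component, num_desired_samples):
    """Return the number of new samples per component and the resulting mapping."""
    # Components that already have enough effective samples get no new ones.
    counts = [
        max(0, math.floor(num_desired_samples - n_eff))
        for n_eff in n_eff_per_component
    ]
    # mapping[j] is the index of the component that sample j is drawn from.
    mapping = [i for i, count in enumerate(counts) for _ in range(count)]
    return counts, mapping


counts, mapping = plan_component_sampling([4.0, 1.2, 7.5], num_desired_samples=5)
```

In this example the first component is one sample short, the second is missing three (after flooring 3.8), and the third already exceeds the desired number and receives none.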

select_samples() → [tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor][source]

Selects the samples for the current learning iteration and stores the data in the sample database.

Returns:

samples - a tensor of shape number_of_selected_samples x number_of_dimensions

old_samples_pdf - a tensor of shape number_of_selected_samples, containing the log-densities of the distribution that was effectively used to obtain the selected samples. Needed for importance weighting.

target_lnpdfs - a tensor of shape number_of_selected_samples, containing the log-densities of the target distribution for each selected sample, \log p(\mathbf{x}).

target_grads - a tensor of shape number_of_selected_samples x num_dimensions, containing the gradients of the log-densities of the target distribution for each selected sample, \nabla_{\mathbf{x}} \log p(\mathbf{x}).

Return type:

tuple(tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor)