# The Gaussian-Multinoulli Restricted Boltzmann Machine Code Guide

## Overview

This document provides supplementary information on the code for the paper "The Gaussian-Multinoulli Restricted Boltzmann Machine: A Potts Model Extension of the GRBM" (hereafter "the main paper"). The provided Python scripts implement the Gaussian-Multinoulli Restricted Boltzmann Machine (GM-RBM) model and the hetero-associative memory experiments detailed in the main paper. In the codebase the model is named `GMRBM`, a convention kept for consistency with related prior work. The hidden units are Potts (Multinoulli, i.e. categorical) variables.

The code demonstrates:

1.  The implementation of the GM-RBM, including its energy function (as defined in the main paper), conditional probability calculations, Gibbs sampling, and training via Contrastive Divergence (CD).
    
2.  The experimental setup for the hetero-associative memory task, including data preprocessing using Word2Vec, model training, and evaluation of recall accuracy, as described in the main paper.
    

## Code Structure

Three main Python files are relevant to the experiments:

1.  `gmrbm.py`: Contains the core implementation of the GM-RBM model.
    
2.  `kandMemSweep.py`: Contains the script to run the hetero-associative memory experiments, including data loading, Word2Vec embedding, training, and evaluation, designed to sweep over different numbers of Potts states and dataset sizes.
    
3.  `utils.py`: Contains utility functions, such as `cosine_schedule` (imported by `gmrbm.py`) and `setup_logging` (imported by `kandMemSweep.py`). This file can be found in the reference repository: [https://github.com/DSL-Lab/GRBM](https://github.com/DSL-Lab/GRBM).
    

### `gmrbm.py` (GM-RBM Model Implementation)

This file defines the `GMRBM` class, which represents the Gaussian-Multinoulli RBM.

-   **Initialization (`__init__`)**:
    
    -   Sets up model parameters:
        
        -   `W`: A 3D weight tensor connecting visible units, hidden units, and Potts states (`visible_size`, `hidden_size`, `num_potts_states`).
            
        -   `b`: A 2D bias tensor for hidden unit states (`hidden_size`, `num_potts_states`).
            
        -   `mu`: A 1D bias tensor for visible units (`visible_size`).
            
        -   `log_var`: A 1D tensor for the logarithm of the variance of visible units (`visible_size`).
            
    -   Initializes weights using a normal distribution and biases to zero, following the methodology for RBMs described in the main paper.
        
-   **Energy Function (`energy`)**:
    
    -   Computes the energy of a given visible state `v` and one-hot encoded hidden state `h_onehot`. The energy function implemented is consistent with the formulation presented in the main paper (e.g., Equation 2):

        $$E(v, h) = \sum_i \frac{(v_i - \mu_i)^2}{2\sigma_i^2} - \sum_{i,j,k} \frac{v_i}{\sigma_i^2} W_{ijk} h_{jk} - \sum_{j,k} b_{jk} h_{jk}$$

        where $h_{jk}$ is 1 if hidden unit $j$ is in state $k$, $\sigma_i^2$ is `self.get_var()`, $\mu_i$ is `self.mu`, $W_{ijk}$ is `self.W`, and $b_{jk}$ is `self.b`.
        
-   **Conditional Probabilities**:
    
    -   `prob_h_given_v(v, var)`: Computes P(h∣v), the probability of hidden unit states given visible units. This involves a softmax over the Potts states for each hidden unit.
        
    -   `prob_v_given_h(h_onehot)`: Computes the mean of the Gaussian conditional P(v∣h) over the visible units given the hidden states.
        
-   **Sampling**:
    
    -   `Gibbs_sampling_vh(v, num_steps, burn_in)`: Performs block Gibbs sampling by alternating between sampling hidden units from P(h∣v) and visible units from P(v∣h). This is the primary inference method used for GM-RBMs in the experiments, as detailed in the main paper.
        
    -   The code structure also includes references to `Langevin_sampling_v` and `Gibbs_Langevin_sampling_vh` for negative phase sampling if the `inference_method` parameter is set accordingly. However, the provided `kandMemSweep.py` script uses "Gibbs" for the GM-RBM, consistent with the experiments focusing on this sampling method.
        
-   **Training (`CD_grad`)**:
    
    -   Implements Contrastive Divergence (CD) to approximate the gradient of the log-likelihood.
        
    -   `positive_grad(v)`: Computes the positive phase gradients based on data samples.
        
    -   `negative_grad(v)`: Computes the negative phase gradients based on model samples obtained after `CD_step` iterations of Gibbs (or other specified) sampling.
        
-   **Other Utilities**:
    
    -   `reconstruction(v)`: Computes Mean Squared Error (MSE) for reconstruction, used for monitoring training.
        
    -   `_vectorized_h_sample(probs)`: Efficiently samples from multinomial distributions for hidden states.
        
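The energy and the two conditionals described above can be sketched in a few lines. This is a NumPy illustration written for this guide, not the PyTorch code in `gmrbm.py`: the standalone function signatures, the name `mean_v_given_h`, and the toy sizes are assumptions. The conditionals follow from the energy by normalizing over the Potts states of each hidden unit and by completing the square in `v`.

```python
import numpy as np

def energy(v, h_onehot, W, b, mu, log_var):
    """E(v, h) per sample; v is (B, V), h_onehot is (B, H, K), W is (V, H, K)."""
    var = np.exp(log_var)                                   # sigma_i^2, shape (V,)
    quadratic = ((v - mu) ** 2 / (2.0 * var)).sum(axis=1)   # Gaussian term
    interaction = np.einsum("bi,ijk,bjk->b", v / var, W, h_onehot)
    hidden_bias = (b * h_onehot).sum(axis=(1, 2))
    return quadratic - interaction - hidden_bias

def prob_h_given_v(v, W, b, log_var):
    """Softmax over the K Potts states of each hidden unit; returns (B, H, K)."""
    logits = np.einsum("bi,ijk->bjk", v / np.exp(log_var), W) + b
    logits = logits - logits.max(axis=2, keepdims=True)     # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=2, keepdims=True)

def mean_v_given_h(h_onehot, W, mu):
    """Mean of the Gaussian P(v | h): mu_i + sum_jk W_ijk h_jk; returns (B, V)."""
    return mu + np.einsum("ijk,bjk->bi", W, h_onehot)

# Toy instance (sizes are arbitrary): 3 samples, 5 visible, 4 hidden, 3 states.
rng = np.random.default_rng(0)
B, V, H, K = 3, 5, 4, 3
W = 0.1 * rng.standard_normal((V, H, K))
b = np.zeros((H, K))
mu = np.zeros(V)
log_var = np.zeros(V)
v = rng.standard_normal((B, V))

probs = prob_h_given_v(v, W, b, log_var)
h_onehot = np.eye(K)[probs.argmax(axis=2)]   # most likely state of each unit
print(energy(v, h_onehot, W, b, mu, log_var).shape)  # (3,)
```

A useful sanity check on any implementation is that switching one hidden unit between two states changes the energy by exactly the log-ratio of the corresponding softmax probabilities.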

### `kandMemSweep.py` (Experiment Driver for Hetero-associative Memory)

This script orchestrates the hetero-associative memory experiments as described in Section 4 of the main paper.

-   **Dataset Handling (`WordPairsDataset`, `create_dataset`)**:
    
    -   Loads word pairs from a CSV file (e.g., "word_relationships.csv").
        
    -   Trains a Word2Vec (CBOW) model on these pairs to get 200-dimensional embeddings. Parameters: 100 iterations, window size 5, no frequency cutoff.
        
    -   Normalizes word embeddings to zero mean and unit variance.
        
    -   Concatenates stimulus and response embeddings to form 400-dimensional input vectors for the RBM's visible layer.
        
-   **Association Task (`associate`, `test_associations`)**:
    
    -   `associate`: Given a stimulus word, it uses the trained GM-RBM to infer the associated response word. This involves iterative Gibbs sampling where the stimulus part of the visible vector is clamped.
        
    -   `test_associations`: Calculates recall accuracy by comparing predicted response words to actual response words over the dataset.
        
-   **Training Loop (`train_one_epoch`, `run_experiment_sweep_words`)**:
    
    -   `train_one_epoch`: Performs one epoch of training using the `CD_grad` method from the GM-RBM and an Adam optimizer.
        
    -   `run_experiment_sweep_words`:
        
        -   Manages the overall experiment, iterating through specified numbers of Potts states (`POTTS_STATES`) and dataset sizes (`DATASET_SIZES`).
            
        -   Calculates the number of hidden units (`hidden_size`) to maintain a constant total number of weight parameters (`TOTAL_CAPACITY`) across different Potts states (q). The formula used, `hidden_size = TOTAL_CAPACITY // (VISIBLE_SIZE * q)`, aligns with the parameter-matched experimental setup detailed in the main paper (e.g., Table 1, Figure 1). `TOTAL_CAPACITY` is set to $\pi_w = 800{,}000$ and `VISIBLE_SIZE` is $n_v = 400$.
            
        -   Initializes the `GMRBM` model with specified hyperparameters (e.g., `CD_STEP`, `CD_BURNIN`, `INIT_VAR`).
            
        -   Trains the model, logging validation accuracy and implementing early stopping criteria consistent with the main paper: stopping if accuracy ≥0.98, if the standard deviation of accuracy over the last 20 checkpoints is low, or if no improvement is observed for 10 consecutive checkpoints.
            
-   **Plotting (`plot_word_sweep_results`)**:
    
    -   Generates plots of recall accuracy versus the number of word pairs, similar to Figure 1 in the main paper.
        
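The clamped recall performed by `associate` can be illustrated as follows. This is a NumPy sketch under stated assumptions, not the script's code: the function name `recall_response`, the `vocab` dictionary, starting the response half at zero, reading the answer from the final conditional mean, and decoding by cosine similarity are all choices made here for illustration.

```python
import numpy as np

def recall_response(stimulus, W, b, mu, log_var, vocab, steps=20, seed=0):
    """Clamp the stimulus half of v, Gibbs-sample the rest, return the nearest word."""
    rng = np.random.default_rng(seed)
    d = stimulus.shape[0]
    var = np.exp(log_var)
    std = np.sqrt(var)
    v = np.concatenate([stimulus, np.zeros(d)])   # assumption: response starts at 0
    mean = v
    for _ in range(steps):
        # P(h | v): softmax over Potts states for every hidden unit.
        logits = np.einsum("i,ijk->jk", v / var, W) + b
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        # Sample one state per hidden unit by inverting the cumulative distribution.
        u = rng.random((p.shape[0], 1))
        states = np.minimum((p.cumsum(axis=1) < u).sum(axis=1), p.shape[1] - 1)
        h = np.eye(p.shape[1])[states]
        # P(v | h): Gaussian around mu + W h; then restore the clamped stimulus.
        mean = mu + np.einsum("ijk,jk->i", W, h)
        v = mean + std * rng.standard_normal(mean.shape)
        v[:d] = stimulus
    response = mean[d:]                           # assumption: decode from the mean
    words = list(vocab)
    table = np.stack([vocab[w] for w in words])
    norms = np.linalg.norm(table, axis=1) * np.linalg.norm(response) + 1e-12
    return words[int(np.argmax(table @ response / norms))]

# Toy usage with random, untrained parameters (sizes are arbitrary).
rng = np.random.default_rng(1)
d, H, K = 3, 4, 2
toy_vocab = {w: rng.standard_normal(d) for w in ("cat", "dog", "bird")}
W = 0.1 * rng.standard_normal((2 * d, H, K))
guess = recall_response(toy_vocab["cat"], W, np.zeros((H, K)),
                        np.zeros(2 * d), np.zeros(2 * d), toy_vocab)
print(guess)
```

Recall accuracy as computed by `test_associations` would then be the fraction of stimulus words for which the returned word equals the true response.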

## Key Implementation Details

### GM-RBM Model (`gmrbm.py`)

-   **Visible Units**: Gaussian, with learned mean (`mu`) and variance (`log_var.exp()`).
    
-   **Hidden Units**: Potts (Multinoulli) variables, each capable of taking one of `num_potts_states` discrete states. They are represented as one-hot vectors in calculations.
    
-   **Contrastive Divergence**: The `CD_grad` function uses `CD_step` Gibbs sampling steps for the negative phase. The `CD_burnin` parameter allows for discarding initial samples from the Gibbs chain.
    
-   **Inference Method**: While the `GMRBM` class has provisions for 'Langevin' and 'Gibbs-Langevin' inference, `kandMemSweep.py` specifically uses `'Gibbs'` for training the GM-RBM, as stated in the main paper.
    
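The interplay of `CD_step` and `CD_burnin` can be made concrete with a small block Gibbs chain. The sketch below is in NumPy rather than PyTorch, and its burn-in behaviour (keep only the samples drawn at steps `burn_in` and later) is an assumption modelled on the reference GRBM repository, not something this guide confirms for `gmrbm.py`. The names `sample_onehot` and `gibbs_chain` are hypothetical.

```python
import numpy as np

def sample_onehot(probs, rng):
    """Draw one Potts state per hidden unit; probs is (B, H, K), result is one-hot."""
    u = rng.random(probs.shape[:2] + (1,))
    idx = (probs.cumsum(axis=2) < u).sum(axis=2)
    idx = np.minimum(idx, probs.shape[2] - 1)   # guard against cumsum rounding below 1
    return np.eye(probs.shape[2])[idx]

def gibbs_chain(v, W, b, mu, log_var, num_steps=10, burn_in=2, seed=0):
    """Alternate h ~ P(h | v) and v ~ P(v | h); return the (v, h) pairs kept after burn-in."""
    rng = np.random.default_rng(seed)
    var = np.exp(log_var)
    std = np.sqrt(var)

    def p_h(v_batch):
        logits = np.einsum("bi,ijk->bjk", v_batch / var, W) + b
        logits = logits - logits.max(axis=2, keepdims=True)
        weights = np.exp(logits)
        return weights / weights.sum(axis=2, keepdims=True)

    h = sample_onehot(p_h(v), rng)
    samples = []
    for step in range(num_steps):
        mean = mu + np.einsum("ijk,bjk->bi", W, h)
        v = mean + std * rng.standard_normal(mean.shape)   # Gaussian visible units
        h = sample_onehot(p_h(v), rng)                     # Potts hidden units
        if step >= burn_in:                                # discard early samples
            samples.append((v, h))
    return samples

# Toy run with the CD_STEP = 10 and CD_BURNIN = 2 values listed in this guide.
rng = np.random.default_rng(0)
B, V, H, K = 3, 5, 4, 3
W = 0.1 * rng.standard_normal((V, H, K))
v0 = rng.standard_normal((B, V))
samples = gibbs_chain(v0, W, np.zeros((H, K)), np.zeros(V), np.zeros(V),
                      num_steps=10, burn_in=2)
print(len(samples))  # 8
```

Under this reading, the negative-phase statistics would be averaged over the 8 retained samples rather than taken from the last one alone.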

### Hetero-associative Memory Experiment (`kandMemSweep.py`)

-   **Hyperparameters**: The script defines key hyperparameters consistent with those reported in the main paper's experimental setup (Section 4.1):
    
    -   `VISIBLE_SIZE = 400`
        
    -   `BATCH_SIZE = 64`
        
    -   `EPOCHS = 10000` (maximum, with early stopping)
        
    -   `LR = 1e-4` (Adam optimizer)
        
    -   `CD_STEP = 10`
        
    -   `CD_BURNIN = 2`
        
    -   `INIT_VAR = 0.1`
        
-   **Word2Vec Integration**: The `gensim` library is used for Word2Vec model training and inference. The resulting embeddings are normalized before being fed to the RBM.
    
-   **Parameter Matching**: For the parameter-matched q-sweep experiments (comparable to Figure 1 in the main paper), the number of hidden units $n_h$ is adjusted based on the number of Potts states q using $n_h = \lfloor \pi_w / (n_v \cdot q) \rfloor$. This ensures the total number of weight parameters $W_{ijk}$ remains approximately constant.
    
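The parameter-matching rule is easy to check by hand. The snippet below applies the formula quoted above to the example `POTTS_STATES` list used elsewhere in this guide. The helper name `matched_hidden_size` is hypothetical.

```python
TOTAL_CAPACITY = 800_000   # target number of weight parameters
VISIBLE_SIZE = 400         # two concatenated 200-dimensional embeddings
POTTS_STATES = [2, 4, 6, 8, 10]

def matched_hidden_size(q, total=TOTAL_CAPACITY, n_v=VISIBLE_SIZE):
    """Number of hidden units so that n_v * n_h * q stays at or just below total."""
    return total // (n_v * q)

for q in POTTS_STATES:
    n_h = matched_hidden_size(q)
    print(f"q={q:2d}  hidden_size={n_h:4d}  weights={VISIBLE_SIZE * n_h * q}")
```

For this list the hidden sizes are 1000, 500, 333, 250, and 200. Only q = 6 does not divide evenly, giving 799,200 weights instead of 800,000, which is why the match is described as approximate.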

## Connection to the Paper and Adaptability

The provided code directly supports the findings presented in the main paper. The `GMRBM` class in `gmrbm.py` is the core implementation of the Gaussian-Multinoulli RBM (GM-RBM) central to the paper's contributions.

-   **Hetero-associative Memory (Section 4 of the main paper)**: The `kandMemSweep.py` script is designed to reproduce the experiments comparing GM-RBMs with varying numbers of Potts states (q) and dataset sizes (N). This generates results similar to those shown in Figure 1 and supports the analysis for Figure 2 of the main paper. The training procedure, data preprocessing, and evaluation metrics align with the methodology described in Section 4.
    
    -   Visualizations of the learning process for this task, such as plots of recall accuracy over epochs, are generated by `kandMemSweep.py`.
        
-   **Auto-associative Memory and Generative Tasks (Section 5 of the main paper)**: The fundamental `GMRBM` model implemented in `gmrbm.py` is versatile. It also serves as the basis for the auto-associative memory experiments, such as image generation on MNIST and CelebA, as discussed in Section 5.
    
    -   While `kandMemSweep.py` is specific to word associations, adapting the `GMRBM` for generative tasks would involve creating a different experiment script (e.g., a `main_generative.py`). This script would handle image data loading, specific training regimes (e.g., number of epochs mentioned in Table 2 of the main paper), and evaluation methods suitable for image generation.
        
    -   The generative experiments reported in the main paper were conducted with minor adaptations to the code from the reference GitHub repository: `https://github.com/DSL-Lab/GRBM`. The necessary `utils.py` file is also available in this repository.
        
    -   Visualizations of sampling processes for generative tasks (e.g., generating images from noise over successive Gibbs steps, similar to those shown for MNIST and CelebA in related RBM literature) can be created by logging the state of the visible units during the sampling phase of a trained `GMRBM` model.
        

The modular design allows the `GMRBM` class to be integrated into various experimental setups beyond what is explicitly provided in `kandMemSweep.py`.

## Dependencies and Usage

### Dependencies

The code relies on several Python libraries. Key versions identified from the provided environment list are:

-   **Python**: `3.6.8`
    
-   **PyTorch**: `1.10.1` (torch)
-   **torchvision**: `0.11.2`
        
-   **NumPy**: `1.19.5`
    
-   **Pandas**: `1.1.5`
    
-   **Gensim**: `3.8.3`
    
-   **Matplotlib**: `3.3.4`
    
-   **Python Standard Library**: `os`, `time` 
    
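A quick way to compare an environment against the versions listed above is sketched here. The helper `check_versions` is hypothetical and not part of the provided code. It relies only on each package exposing a `__version__` attribute, and it ignores local build suffixes such as `+cu113`.

```python
import importlib

EXPECTED = {
    "torch": "1.10.1",
    "torchvision": "0.11.2",
    "numpy": "1.19.5",
    "pandas": "1.1.5",
    "gensim": "3.8.3",
    "matplotlib": "3.3.4",
}

def check_versions(expected=EXPECTED):
    """Return {package: {"wanted", "found", "matches"}} without raising on missing packages."""
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
            found = str(getattr(module, "__version__", "unknown"))
        except ImportError:
            found = None                      # package is not installed
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = {"wanted": wanted, "found": found, "matches": matches}
    return report
```

Calling `check_versions()` and printing the entries whose `matches` flag is false gives a short list of packages to install or pin before running the scripts.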
### Running Experiments

1.  **Setup**:
    
    -   Ensure all dependencies are installed, preferably in a dedicated virtual environment.
        
    -   Obtain the `utils.py` file from the reference repository ([https://github.com/DSL-Lab/GRBM](https://github.com/DSL-Lab/GRBM)) and place it in the same directory as `gmrbm.py` and `kandMemSweep.py`, or ensure it is in your Python path.
        
    -   A CUDA-enabled GPU is recommended for faster training, as the code supports it (`model.cuda()`).
        
2.  **Data**:
    
    -   For hetero-associative memory: A CSV file named "word_relationships.csv" containing word pairs (one pair per line, comma-separated) is expected in the same directory as `kandMemSweep.py`, or the `file_path` variable within the script should be updated accordingly.
        
    -   For generative tasks (e.g., MNIST, CelebA): Download the respective datasets and ensure they are accessible by your experiment script. For example, CelebA might need to be placed in a `data/celeba` directory relative to your script, as is common practice.
        
3.  **Execution**:
    
    -   **Hetero-associative Memory Experiments**:
        
        -   The main script to run is `kandMemSweep.py`.
            
        -   To perform sweeps as described in the main paper, modify the `POTTS_STATES` list (e.g., `[2, 4, 6, 8, 10]`) and `DATASET_SIZES` list (e.g., `[500, 1000, 1500, 2000, 2500, 3000]`) in `kandMemSweep.py`.
            
        -   The script will output results to a directory (default: "word_sweep_results_multi_k"), including a CSV file with numerical results and plots.
            
        -   Logging information is saved to "benchmark.log" within the output directory.
            
    -   **Running Other Experiments (e.g., Generative Tasks)**:
        
        -   To run other experiments, such as training the `GMRBM` on image datasets like MNIST or CelebA, you would typically use a separate main script (e.g., `python main_generative.py --dataset mnist`).
            
        -   Configuration for these experiments (hyperparameters, dataset paths, etc.) might be managed through command-line arguments or separate configuration files (e.g., JSON or YAML).
            
        -   **Key Hyperparameters for `GMRBM` (from `gmrbm.py` and relevant to general training)**:
            
            -   `visible_size`: Dimensionality of the input data.
                
            -   `hidden_size`: Number of hidden units.
                
            -   `num_potts_states`: Number of states for each Potts hidden unit (q).
                
            -   `CD_step`: Number of Gibbs sampling steps for Contrastive Divergence.
                
            -   `CD_burnin`: Number of initial Gibbs steps to discard in CD.
                
            -   `init_var`: Initial variance for visible unit parameters.
                
            -   `inference_method`: Sampling method for the negative phase. Options in `gmrbm.py` include `'Gibbs'`, `'Langevin'`, `'Gibbs-Langevin'`.
                
                -   `Langevin_step`: Number of inner loop Langevin steps if using Gibbs-Langevin or Langevin.
                    
                -   `Langevin_eta`: Step size for Langevin dynamics.
                    
                -   `is_anneal_Langevin`: Whether to anneal Langevin noise (if applicable).
                    
                -   `Langevin_adjust_step`: Enables Metropolis adjustment for Langevin steps from a specified CD step onwards.
                    
            -   Learning rate, batch size, and number of epochs would be controlled by the training script.
                
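When writing a new experiment script, it can help to gather these hyperparameters in one dictionary and check them before building the model. The sketch below uses the values this guide reports for the hetero-associative experiments, with q = 4 and the matching hidden size of 500. Whether the `GMRBM` constructor accepts exactly these keyword names is an assumption, and `check_gmrbm_kwargs` is a hypothetical helper. The burn-in check assumes that samples are kept only after `CD_burnin` steps.

```python
ALLOWED_INFERENCE = ("Gibbs", "Langevin", "Gibbs-Langevin")

GMRBM_KWARGS = {
    "visible_size": 400,
    "hidden_size": 500,
    "num_potts_states": 4,
    "CD_step": 10,
    "CD_burnin": 2,
    "init_var": 0.1,
    "inference_method": "Gibbs",
}

def check_gmrbm_kwargs(kwargs):
    """Return a list of human-readable problems; an empty list means the settings look sane."""
    problems = []
    for key in ("visible_size", "hidden_size", "num_potts_states", "CD_step"):
        if int(kwargs[key]) < 1:
            problems.append(f"{key} must be a positive integer")
    if not 0 <= kwargs["CD_burnin"] < kwargs["CD_step"]:
        problems.append("CD_burnin must be >= 0 and smaller than CD_step")
    if kwargs["init_var"] <= 0:
        problems.append("init_var must be positive")
    if kwargs["inference_method"] not in ALLOWED_INFERENCE:
        problems.append(f"inference_method must be one of {ALLOWED_INFERENCE}")
    return problems

print(check_gmrbm_kwargs(GMRBM_KWARGS))  # []
```

If the constructor does take these names, the model could then be built with `GMRBM(**GMRBM_KWARGS)`. Learning rate, batch size, and epoch count stay in the training script.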

This setup allows for the reproduction of the hetero-associative memory experiments and provides a basis for conducting other experiments (like generative tasks) using the `GMRBM` model, aligning with the discussions in the main paper.

## License

A license will be added after publication.

## Cite

Please consider citing the main paper if you find this code useful in your research work.

## Questions/Bugs

When the code is made available in a public repository post-publication, please use the issue tracker associated with that repository.
