
# Supplementary Material: ICLR 2026 (Submission Number: 9549)

**Paper Title:** Gibbs Sampling with Simulated Annealing K-Means for
Mixture Regression

**Anonymity Note:** This repository and its contents have been
anonymized to comply with the ICLR double-blind review process. All
identifying information, author details, and affiliations will be added
to the public version of this repository upon the paper’s acceptance.

## 1. Overview

This repository contains the R code to reproduce all simulation studies
presented in our paper, “Gibbs Sampling with Simulated Annealing K-Means
for Mixture Regression”. The code provides functionalities to (1)
generate simulated datasets based on the parameters described in the
“Simulation setup” section, (2) run our proposed Gibbs sampling with
simulated annealing K-means clustering algorithm (Algorithm 1) to obtain
the results, and (3) plot these results to generate the figures
presented in the paper.

## 2. System Requirements & Environment Setup

This section details the necessary hardware and software to run our
experiments.

### 2.1. Hardware

- **CPU:** Any modern multi-core CPU is sufficient.
- **GPU:** Not required. All experiments are run on CPU.
- **RAM:** 32GB is sufficient.

### 2.2. Software Environment

The code was developed and tested on **RStudio 2025.05.0+496** using **R
version 4.4.3**. All required R packages and their specific versions are
managed by the `renv` package and are listed in the `renv.lock` file.
Here is the detailed session information:

    ## R version 4.4.3 (2025-02-28 ucrt)
    ## Platform: x86_64-w64-mingw32/x64
    ## Running under: Windows 11 x64 (build 26100)
    ## 
    ## Matrix products: default
    ## 
    ## 
    ## locale:
    ## [1] LC_COLLATE=Chinese (Simplified)_China.utf8  LC_CTYPE=Chinese (Simplified)_China.utf8   
    ## [3] LC_MONETARY=Chinese (Simplified)_China.utf8 LC_NUMERIC=C                               
    ## [5] LC_TIME=Chinese (Simplified)_China.utf8    
    ## 
    ## time zone: Europe/London
    ## tzcode source: internal
    ## 
    ## attached base packages:
    ## [1] stats     graphics  grDevices datasets  utils     methods   base     
    ## 
    ## loaded via a namespace (and not attached):
    ##  [1] compiler_4.4.3    fastmap_1.2.0     cli_3.6.3         htmltools_0.5.8.1
    ##  [5] tools_4.4.3       rstudioapi_0.17.1 yaml_2.3.10       rmarkdown_2.29   
    ##  [9] knitr_1.50        xfun_0.52         digest_0.6.37     rlang_1.1.4      
    ## [13] renv_1.1.5        evaluate_1.0.5

## 3. Installation

We strongly recommend using the `renv` package for dependency management
to ensure a fully reproducible environment.

1.  Install R (version 4.4 or higher).
2.  Open this project in RStudio.
3.  Run the following command in the R console to install all required
    packages from the `renv.lock` file:

``` r
# This will install all packages at their correct versions.
renv::restore()
```

    ## - The library is already synchronized with the lockfile.
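
As a quick sanity check after restoring, the following minimal sketch
compares the running R version against the stated minimum and asks
`renv` whether the library matches the lockfile. It assumes that
“version 4.4 or higher” means at least 4.4.0 and that the working
directory is the project root; it is not part of the original scripts.

``` r
# Minimal environment check (a sketch, not part of the original scripts).
min_r <- "4.4.0"  # assumption: "4.4 or higher" is read as >= 4.4.0
r_ok <- getRversion() >= min_r
message("R ", as.character(getRversion()),
        if (r_ok) " meets" else " does NOT meet",
        " the minimum of ", min_r)

# renv::status() reports differences between the project library and
# renv.lock. Only call it when renv is installed and a lockfile exists.
if (requireNamespace("renv", quietly = TRUE) && file.exists("renv.lock")) {
  renv::status()
} else {
  message("renv or renv.lock not found; run this from the project root.")
}
```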

## 4. Directory Structure

The code repository is organized as follows:

    ├── plotting_accuracy.R #Script to generate the Figure of classification accuracy of the training set.
    ├── plotting_matrix.R #Script to generate the Figure of estimation error
    ├── plotting_metric.R #Script to generate the Figure of WCSS of the training set.
    ├── plotting_test_accuracy.R #Script to generate the Figure of classification accuracy of the testing set.
    ├── plotting_test_metric.R #Script to generate the Figure of WCSS of the testing set.
    ├── README.md # The generated Markdown README
    ├── README.Rmd # The R Markdown source for this README
    ├── README.html
    ├── renv/ # renv project folder
    ├── renv.lock # R environment lockfile for reproducibility
    ├── results_with_D=20,p=35,q=2,K=4.csv #The 16 CSV files are the simulation results produced by simulate_study_program.R
    ├── results_with_D=20,p=35,q=3,K=4.csv
    ├── results_with_D=20,p=50,q=2,K=3.csv
    ├── results_with_D=20,p=50,q=2,K=4.csv
    ├── results_with_D=20,p=50,q=3,K=3.csv
    ├── results_with_D=20,p=50,q=3,K=4.csv
    ├── results_with_D=20,p=70,q=2,K=3.csv
    ├── results_with_D=20,p=70,q=3,K=3.csv
    ├── results_with_D=40,p=35,q=2,K=4.csv
    ├── results_with_D=40,p=35,q=3,K=4.csv
    ├── results_with_D=40,p=50,q=2,K=3.csv
    ├── results_with_D=40,p=50,q=2,K=4.csv
    ├── results_with_D=40,p=50,q=3,K=3.csv
    ├── results_with_D=40,p=50,q=3,K=4.csv
    ├── results_with_D=40,p=70,q=2,K=3.csv
    ├── results_with_D=40,p=70,q=3,K=3.csv
    ├── simulate_study_program.R #The main R program, implementing the Gibbs sampling with simulated annealing K-means clustering algorithm (Algorithm 1)
    └── The_code_of_simulation_studies.Rproj
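
To confirm that all 16 result files are present before plotting, a
sketch along the following lines may help. The parameter grid is read
off the file names in the listing above (two values of D, four (p, K)
pairs, two values of q); the variable names are illustrative and do not
come from the repository's scripts.

``` r
# (p, K) pairs and the D and q values, as they appear in the listed file names.
pk   <- data.frame(p = c(35, 50, 50, 70), K = c(4, 3, 4, 3))
grid <- merge(merge(data.frame(D = c(20, 40)), pk),
              data.frame(q = c(2, 3)))  # no shared columns: Cartesian product

# Build the expected file names using the naming pattern from the listing.
expected <- sprintf("results_with_D=%d,p=%d,q=%d,K=%d.csv",
                    grid$D, grid$p, grid$q, grid$K)

# Report any file that is absent from the current working directory.
missing_files <- expected[!file.exists(expected)]
if (length(missing_files) == 0) {
  message("All ", length(expected), " result files found.")
} else {
  message("Missing result files:\n", paste(missing_files, collapse = "\n"))
}
```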

## 5. Reproduction Instructions

We provide two pathways for reproducing our results. **We highly
recommend reviewers start with the “Fast Verification” path.**

### 5.1. Path A: Fast Verification from Pre-computed Results (Recommended)

This path regenerates the paper’s main figures using the pre-computed
data from the .csv files in the root directory. This process is fast and
should only take a few seconds.

#### Reproducing Figure 1 and other Figures

To regenerate Figure 1 (estimation error) from the main body of the
paper, run the following script from the project’s root directory:

``` bash
Rscript plotting_matrix.R
```

This will generate 16 different .png files in the matrix/ directory.

The four other figures in the appendix can be generated by running their
corresponding plotting\_\*.R scripts in a similar fashion.

Note that each of the five plotting scripts also calculates the summary
data for its corresponding appendix table and saves it as a single .csv
file in the corresponding output directory.
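
For convenience, all five plotting scripts can also be run in one go
from an R session. The sketch below assumes the scripts take no
arguments, read the .csv files from the working directory, and can be
sourced independently of each other; these are assumptions rather than
documented behaviour.

``` r
# The five plotting scripts named in the directory listing.
plot_scripts <- c("plotting_matrix.R",
                  "plotting_accuracy.R",
                  "plotting_metric.R",
                  "plotting_test_accuracy.R",
                  "plotting_test_metric.R")

for (s in plot_scripts) {
  if (!file.exists(s)) {
    message("Skipping ", s, ": not found in ", getwd())
    next
  }
  message("Running ", s)
  # Evaluate each script in a fresh environment so that objects created
  # by one script do not leak into the next.
  source(s, local = new.env())
}
```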

### 5.2. Path B: Full Reproduction from Scratch (Optional)

This path re-runs the entire experimental pipeline, including data
generation and model fitting.

**WARNING:** This process is computationally expensive.

1.  **Configure Parameters:** To run the simulation for a specific
    parameter group, you must first manually edit the parameters at the
    top of the simulate_study_program.R script. Open the file and modify
    the values for D, p, q, and k (on lines 19-22).

2.  **Run the Main Simulation:** After saving your changes to the
    script, run it from the project’s root directory:

``` bash
Rscript simulate_study_program.R
```

*Estimated runtime: approximately 4 hours per parameter group on an
i9-14900K CPU.*

This process will generate or overwrite the .csv file in the main
directory corresponding to the parameter group you selected in the
script. After it completes, you can follow the steps in Path A to
generate the figures.
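
After a run finishes, it can be useful to confirm that the expected
.csv file was written and to take a first look at it. The sketch below
uses a hypothetical helper, `result_file()`, which rebuilds the file
name from the naming pattern of the provided result files. Since the
column layout is not documented here, it only inspects the structure.

``` r
# Hypothetical helper (not from the repository): file name for one
# parameter group, following the pattern of the provided result files.
result_file <- function(D, p, q, K) {
  sprintf("results_with_D=%d,p=%d,q=%d,K=%d.csv", D, p, q, K)
}

f <- result_file(D = 20, p = 35, q = 2, K = 4)

if (file.exists(f)) {
  # A recent modification time suggests the run overwrote the file.
  cat(f, "last modified:", format(file.mtime(f)), "\n")
  res <- read.csv(f)
  str(res)  # inspect the structure only; no column names are assumed
} else {
  message(f, " not found; check the parameters and the working directory.")
}
```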

## 6. License

Upon acceptance, the code will be made available under an MIT License.
