---
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Supplementary Material: ICLR 2026 (Submission Number: 9549)

**Paper Title:** Gibbs Sampling with Simulated Annealing K-Means for Mixture Regression

**Anonymity Note:** This repository and its contents have been anonymized to comply with the ICLR double-blind review process. All identifying information, author details, and affiliations will be added to the public version of this repository upon the paper's acceptance.

## 1. Overview

This repository contains the R code to reproduce all simulation studies presented in our paper, "Gibbs Sampling with Simulated Annealing K-Means for Mixture Regression". The code (1) generates simulated datasets from the parameters described in the "Simulation setup" section, (2) runs our proposed Gibbs sampling with simulated annealing K-means clustering algorithm (Algorithm 1) on them, and (3) plots the results to produce the figures presented in the paper.

## 2. System Requirements & Environment Setup



This section details the hardware and software needed to run our experiments.

### 2.1. Hardware

* **CPU:** Any modern multi-core CPU is sufficient.
* **GPU:** Not required. All experiments run on the CPU.
* **RAM:** 32 GB is sufficient.

### 2.2. Software Environment

The code was developed and tested in **RStudio 2025.05.0+496** using **R version 4.4.3**. All required R packages and their exact versions are managed by the `renv` package and recorded in the `renv.lock` file. The detailed session information follows:
```{r session-info, echo=FALSE, message=FALSE, warning=FALSE}


# Record the R session (R version, platform, and loaded packages) for reproducibility.
# With echo=FALSE, only the output of this chunk appears in the rendered README.md.
# devtools::session_info() gives a more detailed report if devtools is installed.
sessionInfo()
```


## 3. Installation

We strongly recommend using the `renv` package for dependency management to ensure a fully reproducible environment.

1.  Install R (version 4.4 or higher).
2.  Open this project in RStudio.
3.  Run the following command in the R console to install all required packages from the `renv.lock` file:

```{r, eval=FALSE}
# This will install all packages at their correct versions.
renv::restore()
```

## 4. Directory Structure

The code repository is organized as follows:

```
├── plotting_accuracy.R # Script to generate the figure of classification accuracy on the training set
├── plotting_matrix.R # Script to generate the figure of estimation error
├── plotting_metric.R # Script to generate the figure of WCSS on the training set
├── plotting_test_accuracy.R # Script to generate the figure of classification accuracy on the testing set
├── plotting_test_metric.R # Script to generate the figure of WCSS on the testing set
├── README.md # The generated Markdown README
├── README.Rmd # The R Markdown source for this README
├── README.html
├── renv/ # renv project folder
├── renv.lock # R environment lockfile for reproducibility
├── results_with_D=20,p=35,q=2,K=4.csv # The 16 results_with_*.csv files are the simulation results produced by simulate_study_program.R
├── results_with_D=20,p=35,q=3,K=4.csv
├── results_with_D=20,p=50,q=2,K=3.csv
├── results_with_D=20,p=50,q=2,K=4.csv
├── results_with_D=20,p=50,q=3,K=3.csv
├── results_with_D=20,p=50,q=3,K=4.csv
├── results_with_D=20,p=70,q=2,K=3.csv
├── results_with_D=20,p=70,q=3,K=3.csv
├── results_with_D=40,p=35,q=2,K=4.csv
├── results_with_D=40,p=35,q=3,K=4.csv
├── results_with_D=40,p=50,q=2,K=3.csv
├── results_with_D=40,p=50,q=2,K=4.csv
├── results_with_D=40,p=50,q=3,K=3.csv
├── results_with_D=40,p=50,q=3,K=4.csv
├── results_with_D=40,p=70,q=2,K=3.csv
├── results_with_D=40,p=70,q=3,K=3.csv
├── simulate_study_program.R # Main R program: implements the Gibbs sampling with simulated annealing K-means clustering algorithm (Algorithm 1)
└── The_code_of_simulation_studies.Rproj
```
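
A quick way to confirm that all pre-computed results are present is to rebuild the 16 file names from the listing above and test for each one. The sketch below is only an illustration: the helper names `expected_results` and `missing_results` are hypothetical and not part of the repository, and the check assumes it is run from the project's root directory.

```bash
# Print the 16 result file names shown in the directory listing.
expected_results() {
  for D in 20 40; do
    # The (p, K) pairs that appear in the listing, written as p:K.
    for pk in 35:4 50:3 50:4 70:3; do
      p=${pk%%:*}
      K=${pk##*:}
      for q in 2 3; do
        echo "results_with_D=${D},p=${p},q=${q},K=${K}.csv"
      done
    done
  done
}

# Print each expected file that is absent from the current directory.
missing_results() {
  expected_results | while IFS= read -r f; do
    [ -f "$f" ] || echo "$f"
  done
}

# Prints nothing when all 16 files are present.
missing_results
```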
## 5. Reproduction Instructions

We provide two pathways for reproducing our results. **We highly recommend reviewers start with the "Fast Verification" path.**

### 5.1. Path A: Fast Verification from Pre-computed Results (Recommended)

This path regenerates the paper's main figures using the pre-computed data from the .csv files in the root directory. This process is fast and should only take a few seconds.

#### Reproducing Figure 1 and other Figures


To regenerate Figure 1 (estimation error) from the main body of the paper, run the following script from the project's root directory:

```bash
Rscript plotting_matrix.R
```
This will generate 16 different .png files in the matrix/ directory.

The four other figures, which appear in the appendix, can be generated by running the corresponding `plotting_*.R` scripts in the same way.

Each of the five plotting scripts also computes the summary data for its corresponding appendix table and saves it as a single .csv file in that script's output directory.
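
The five plotting scripts can also be run in one pass. The following is a minimal sketch, assuming `Rscript` is on the `PATH` and the shell's working directory is the project root; `run_all_plots` is a hypothetical helper name, not something the repository provides.

```bash
# Run every plotting_*.R script in the current directory, stopping at the first failure.
run_all_plots() {
  for script in plotting_*.R; do
    # If the glob matched nothing, $script holds the literal pattern and no such file exists.
    [ -f "$script" ] || { echo "No plotting scripts found in $(pwd)" >&2; return 1; }
    echo "Running $script"
    Rscript "$script" || return 1
  done
}
```

After defining the function, call `run_all_plots` from the project root. It relies only on the `plotting_*.R` naming pattern shown in the directory structure.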

### 5.2. Path B: Full Reproduction from Scratch (Optional)


This path re-runs the entire experimental pipeline, including data generation and model fitting.

**WARNING:** This process is computationally expensive.

1. **Configure Parameters:**
To run the simulation for a specific parameter group, you must first manually edit the parameters at the top of the simulate_study_program.R script. Open the file and modify the values for D, p, q, and k (on lines 19-22).

2. **Run the Main Simulation:**
After saving your changes to the script, run it from the project's root directory:

```bash
Rscript simulate_study_program.R
```

*Estimated runtime: approximately 4 hours per parameter group on an Intel Core i9-14900K CPU. At that rate, running all 16 parameter groups sequentially would take roughly 64 hours.*

This process will generate or overwrite the .csv file in the main directory corresponding to the parameter group you selected in the script. After it completes, you can follow the steps in Path A to generate the figures.
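
Because a single parameter group runs for hours, it can help to capture the console output in a log file. The sketch below assumes `Rscript` is on the `PATH`; the helper name `run_simulation` and the log file naming are illustrative choices, not part of the repository.

```bash
# Run the main simulation and save everything it prints to a timestamped log file.
run_simulation() {
  log="simulation_$(date +%Y%m%d_%H%M%S).log"
  echo "Writing output to $log"
  # 2>&1 sends warnings and errors to the same log as regular output.
  Rscript simulate_study_program.R > "$log" 2>&1
}
```

Calling `run_simulation` from the project root, after editing the parameters, writes the log file to that directory.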

## 6. License

Upon acceptance, the code will be made available under the MIT License.