# Cholesky and Lookahead Accelerations for Regression with Interpretable Trees

## Compile and Run the Demo (C++)

To build the project, run:

```bash
make
```

To run the demo with the provided dataset:

```bash
make test
```

To clean up the build:

```bash
make clean
```


## `clari_tree` Python Library

Python bindings for the `clari_tree` regression tree implementation, built with pybind11. To set them up:


1. **Create a virtual environment**

   ```bash
   python3 -m venv claritree-env
   source claritree-env/bin/activate
   ```

2. **Upgrade pip and install in editable mode**

   ```bash
   pip install --upgrade pip
   ./rebuild_clari_tree.sh
   ```

## Main Directories

* **`data/`**
  Place all datasets under the `data/` directory.
  Five-fold splits are pre-generated and stored in each dataset’s subfolder under
  `data/<dataset_name>/splits/`, so there is no need to run `splits_data.py` manually.
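
For readers unfamiliar with five-fold splitting, the idea can be sketched as follows. This is a minimal illustration using only the standard library; the function name `five_fold_indices`, the seed, and the round-robin assignment are assumptions made for this example and are not taken from `splits_data.py`, whose actual procedure and file format may differ.

```python
import random


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Return a list of (train_idx, test_idx) pairs, one per fold.

    Hypothetical helper for illustration only; not the repository's code.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        splits.append((train_idx, test_idx))
    return splits


if __name__ == "__main__":
    for fold_id, (train_idx, test_idx) in enumerate(five_fold_indices(23)):
        print(f"fold {fold_id}: {len(train_idx)} train, {len(test_idx)} test")
```

Each sample lands in exactly one test fold, so the five test sets together cover the whole dataset.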

---

* **`src/`**
  C++ core source code containing algorithm implementations.

* **`include/`**
  Header files (`.hpp`) for the C++ source.

---

* **`script/`**
  Python scripts for data preprocessing, model training, and result aggregation.

  * **`example.py`** – Basic usage demonstration of the Python bindings for the regression-tree models.
  * **`paper_experiment/`** – Scripts for paper experiments:
    * **`parameter_selection/binary_search.py`** – Finds `cost_complexity` ($\lambda$) values for each leaf-count bucket.
    * **`main_experiment/run_const_lambda.py`** – Runs *constant-leaf* methods (**CLARITreeConst**) and baselines across a range of sparsity levels. Records training time, training $R^2$, and test $R^2$.
    * **`main_experiment/run_linear_optuna.py`** – Runs *linear-leaf* methods (**CLARITree**) and baselines across varying sparsity levels. Logs training time, training $R^2$, and test $R^2$.
    * **`ablation/speedup_ablation.py`** – Speedup ablation study comparing the performance of CLARITree and CLARITreeFull.
  * **`processors/`** – Encapsulated wrappers for different baseline implementations.
  * **`utils/`** – Auxiliary helper utilities *(not required for standard runs).*
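
The parameter search performed by `parameter_selection/binary_search.py` can be pictured with the sketch below. It is not the script's actual code: `find_lambda_for_leaves`, its arguments, and the toy `toy_count_leaves` stand-in are hypothetical, and the sketch assumes that the number of leaves does not increase as the cost-complexity penalty grows.

```python
def find_lambda_for_leaves(count_leaves, target_lo, target_hi,
                           lam_lo=1e-6, lam_hi=1.0, max_iter=50):
    """Bisect lambda until count_leaves(lambda) lies in [target_lo, target_hi].

    Assumes count_leaves is non-increasing in lambda.
    Returns None if no suitable lambda is found within max_iter steps.
    """
    for _ in range(max_iter):
        lam = (lam_lo * lam_hi) ** 0.5  # geometric midpoint of the bracket
        n_leaves = count_leaves(lam)
        if n_leaves > target_hi:
            lam_lo = lam  # too many leaves: increase the penalty
        elif n_leaves < target_lo:
            lam_hi = lam  # too few leaves: decrease the penalty
        else:
            return lam
    return None


def toy_count_leaves(lam):
    """Toy stand-in for 'fit a tree with penalty lam and count its leaves'."""
    return max(1, int(0.1 / lam))


if __name__ == "__main__":
    lam = find_lambda_for_leaves(toy_count_leaves, target_lo=8, target_hi=12)
    print(lam, toy_count_leaves(lam))
```

In practice `count_leaves` would train a model with the given penalty and report its leaf count. The geometric midpoint is used here because penalty values typically span several orders of magnitude.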

---

* **`results/`**
  Stores all experimental outputs.

  * **`CR_depth5/`** – Results for depth = 5 experiments on *constant-leaf* methods (**CLARITreeConst**) and baselines. Includes runtime, training $R^2$, and test $R^2$ across multiple sparsity levels.
  * **`LRT_depth4/`** – Results for depth = 4 experiments on *linear-leaf* methods (**CLARITree**) and baselines. Includes runtime, training $R^2$, and test $R^2$ across all threshold values.
  * **`LRF_depth4/`** – Results for depth = 4 experiments on *linear-leaf* methods (**CLARITree**) and baselines (full version).
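
The training and test $R^2$ values reported above follow the usual coefficient-of-determination definition, $R^2 = 1 - \mathrm{SS}_{\text{res}} / \mathrm{SS}_{\text{tot}}$. A minimal sketch of that formula is given below; the function name `r2_score` and the sample values are chosen for this example, and the experiment scripts may compute the metric through a library instead.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.1, 1.9, 3.2, 3.8]
    print(round(r2_score(y_true, y_pred), 4))  # approximately 0.98
```

A perfect fit gives $R^2 = 1$, while always predicting the mean of `y_true` gives $R^2 = 0$.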