# Where `beta` and `use_log` are effectively used (LGB_S)

## Summary

- **beta** is used only for **LGB_S** (stochastic LGB imputation), and only when sampling **continuous** columns: it scales the predicted std before drawing samples.
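- **use_log** is **not used** in the current imputer code: it is passed through but never read in the sampling logic.
- Even the `beta` scaling may never run: the method that passes `beta` and `use_log` into `lgbm_impute` appears never to be called for the current **LGB_S** name (see section 3).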

---

## 1. Beta – where it’s used

**Only in LGB_S, and only in the imputer when sampling continuous columns.**

| Location | What happens |
|----------|----------------|
| **augmask/data/data_prepm.py** | For preproc **LGB_S**, `beta` is converted from its string form to a float (e.g. `'0p7'` → `0.7`). The string form goes into cache paths; the float is passed to `lgbm_impute` (or, on the old path, `lgbm_impute2`). |
| **augmask/data/imputers.py** `lgbm_impute` | For **continuous** columns, when **sample=True** (LGB_S), line **129** computes `std_hat = sigma_test * beta`: the model's predicted std (`sigma_test`) is multiplied by `beta`, and samples are then drawn from `N(mu_test, std_hat)`. **beta** therefore scales the predictive standard deviation (the variance scales by `beta²`): values below 1 (e.g. 0.7) shrink the spread of the imputations, making them more conservative, and values above 1 widen it. |

So **beta** is effectively used only in **LGB_S**, and only in that one line in `imputers.py` when sampling continuous features.
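A minimal sketch of the sampling step described above, assuming NumPy and per-row predicted means `mu_test` and stds `sigma_test` (names taken from the description). The function `sample_continuous` and its signature are hypothetical, not the code in `imputers.py`. It shows that `beta` rescales the standard deviation of the draws directly:

```python
import numpy as np


def sample_continuous(mu_test, sigma_test, beta, rng):
    """Draw one imputed value per row from N(mu_test, (beta * sigma_test)**2).

    Hypothetical sketch of the step `std_hat = sigma_test * beta`;
    not the actual lgbm_impute code.
    """
    mu_test = np.asarray(mu_test, dtype=float)
    sigma_test = np.asarray(sigma_test, dtype=float)
    std_hat = sigma_test * beta  # beta scales the std, so the variance scales by beta**2
    return rng.normal(loc=mu_test, scale=std_hat)


rng = np.random.default_rng(0)
n = 200_000
mu = np.full(n, 5.0)
sigma = np.full(n, 2.0)

draws = sample_continuous(mu, sigma, beta=0.7, rng=rng)
print(draws.std())  # approximately 1.4 (= 0.7 * 2.0)
```

With `beta=1.0` the draws keep the model's predicted spread; `beta=0.0` would collapse every imputation onto `mu_test`.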

---

## 2. use_log – where it’s (not) used

| Location | What happens |
|----------|----------------|
| **augmask/data/data_prepm.py** | `self.use_log` is stored, passed into the imputer, and used in **cache path names** (e.g. `X_imputed_tr_LGB_S_beta0p7_use_logswitch.pkl`). So it affects **cache keys**, not the math. |
| **augmask/data/imputers.py** `lgbm_impute` | Parameter **use_log** exists in the signature and docstring but **is never used in the function body**. The variance model is always fit in log-space (`log_res_sq`, line 116); there is no branch on `use_log`. |
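To illustrate what fitting the variance model "in log-space" means, here is a sketch that regresses `log_res_sq = log(residual² + eps)` on the features and maps predictions back with `sqrt(exp(·))`, which guarantees a positive std. It swaps in ordinary least squares (`np.linalg.lstsq`) for LightGBM; `fit_log_variance` and `eps` are hypothetical. Nothing in it reads a `use_log` flag, mirroring the hard-coded log transform described above.

```python
import numpy as np


def fit_log_variance(X, residuals, eps=1e-12):
    """Fit log(residual**2 + eps) on X with least squares; return a std predictor.

    Hypothetical stand-in: the real variance model is LightGBM, here it is OLS.
    """
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # intercept + features
    log_res_sq = np.log(np.asarray(residuals, dtype=float) ** 2 + eps)  # eps avoids log(0)
    coef, *_ = np.linalg.lstsq(design, log_res_sq, rcond=None)

    def predict_sigma(X_new):
        X_new = np.asarray(X_new, dtype=float)
        d = np.column_stack([np.ones(len(X_new)), X_new])
        # exp undoes the log (giving a variance), sqrt turns it into a std; always > 0
        return np.sqrt(np.exp(d @ coef))

    return predict_sigma


# Two groups whose true residual std differs by a factor of 3.
rng = np.random.default_rng(0)
n = 20_000
x = np.repeat([0.0, 1.0], n)
scale = np.where(x == 0.0, 1.0, 3.0)
residuals = rng.normal(0.0, scale)

predict_sigma = fit_log_variance(x[:, None], residuals)
s0, s1 = predict_sigma(np.array([[0.0], [1.0]]))
print(s1 / s0)  # roughly 3: the ratio of stds survives the log-space fit
```

One caveat on the sketch: fitting the mean of `log(r²)` recovers `log σ²` only up to a constant offset (about −1.27 for Gaussian residuals), so the absolute `sqrt(exp(·))` scale is biased low while ratios between rows are preserved. Whether `lgbm_impute` adjusts for this is not established here.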

So **use_log** is effectively used only in **LGB_S**, and there only to **name cache files**. It does **not** change the imputation logic in the current code.
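The naming effect can be sketched as follows, assuming the file name is a plain concatenation matching the example `X_imputed_tr_LGB_S_beta0p7_use_logswitch.pkl`, that `use_log` is a string such as `'switch'`, and that the `beta` conversion simply replaces `p` with a decimal point. Both helper functions are hypothetical; the real format string in `data_prepm.py` may differ.

```python
def beta_to_float(beta: str) -> float:
    """'0p7' -> 0.7 (assumed: the name-safe form uses 'p' for the decimal point)."""
    return float(beta.replace("p", "."))


def imputed_cache_path(split: str, preproc: str, beta: str, use_log: str) -> str:
    """Hypothetical reconstruction of the cache file name.

    Only the example X_imputed_tr_LGB_S_beta0p7_use_logswitch.pkl is known;
    the real format string in data_prepm.py may differ.
    """
    return f"X_imputed_{split}_{preproc}_beta{beta}_use_log{use_log}.pkl"


print(beta_to_float("0p7"))  # 0.7
print(imputed_cache_path("tr", "LGB_S", "0p7", "switch"))
# X_imputed_tr_LGB_S_beta0p7_use_logswitch.pkl

# Changing only use_log changes the key; since lgbm_impute never reads use_log,
# both files would be produced by the same imputation logic.
a = imputed_cache_path("tr", "LGB_S", "0p7", "switch")
b = imputed_cache_path("tr", "LGB_S", "0p7", "other")
print(a == b)  # False
```

A practical consequence: runs that differ only in `use_log` write separate cache files instead of reusing one.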

---

## 3. LGB_S-specific call sites

- **data_prepm.py**
  - **Lines 381–396** (`_impute_with_lightgbm`): when `self.preproc == 'LGB_S'`, builds cache path with `beta` and `use_log`, then calls `lgbm_impute(..., beta=beta_float, use_log=self.use_log, ...)`.
  - **Lines 607–630**: when `self.preproc == 'lgbs2'` (old name), builds paths with `beta`/`use_log` and calls `lgbm_impute2(...)` (that function no longer exists in this repo; this path is legacy/dead for the current LGB_D/LGB_S naming).
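The branch selection can be modeled with a toy dispatcher, assuming the legacy block's test is the membership check `preproc in ['lgb', 'lgbs2']` and that every other name reaches the `else` branch (`which_branch` is hypothetical). Because the comparison is an exact, case-sensitive string match, the new names never select the legacy block:

```python
def which_branch(preproc: str) -> str:
    """Toy model of the branch selection described for data_prepm.py.

    Only the legacy membership test is modeled; _impute_with_lightgbm is
    not part of the dispatch here because nothing calls it.
    """
    if preproc in ["lgb", "lgbs2"]:  # exact, case-sensitive match on old names
        return "legacy lgbm_impute2 block"
    return "else branch (m/r-style imputation)"


for name in ["lgbs2", "LGB_S", "LGB_D"]:
    print(name, "->", which_branch(name))
# lgbs2 -> legacy lgbm_impute2 block
# LGB_S -> else branch (m/r-style imputation)
# LGB_D -> else branch (m/r-style imputation)
```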

So in the **current** pipeline (preproc **LGB_S**), **`_impute_with_lightgbm`** is the only place intended to handle LGB_S with `beta`/`use_log`. In practice:

- **`_impute_with_lightgbm` is never called.** A code search finds only its definition in this file, with no call sites.
- **The legacy block is never entered for the new names.** It tests `preproc in ['lgb', 'lgbs2']` and still calls `lgbm_impute2`, so **LGB_D** / **LGB_S** do not match it. It is the only LGB-style path with `beta`/`use_log` that can run at all, and only for the old names.
- **LGB_S therefore falls through to the `else` branch** (m/r-style imputation), unless some other code path calls `_impute_with_lightgbm`.

If the pipeline is run with **LGB_S** (new name), no LGB imputation path currently runs, and `beta`/`use_log` are not used at runtime at all for LGB_S.
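The "never called" check can be reproduced with Python's standard `ast` module. This sketch is more precise than a text search because it ignores comments, strings, and the `def` line itself. `find_calls` is a hypothetical helper; it only catches direct calls written as `name(...)` or `obj.name(...)`, so dynamic dispatch (e.g. via `getattr`) or calls from other modules would need a separate search.

```python
import ast


def find_calls(source: str, name: str) -> list:
    """Return sorted line numbers where `name` is called as name(...) or obj.name(...)."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if (isinstance(func, ast.Name) and func.id == name) or (
                isinstance(func, ast.Attribute) and func.attr == name
            ):
                lines.append(node.lineno)
    return sorted(lines)


example = '''
class Prep:
    def _impute_with_lightgbm(self, X):
        return X

    def run(self, X):
        return self._fallback(X)
'''
print(find_calls(example, "_impute_with_lightgbm"))  # [] (the def is not a call)

# On the real file (path from this note):
# with open("augmask/data/data_prepm.py") as f:
#     print(find_calls(f.read(), "_impute_with_lightgbm"))
```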

---

## 4. Other mentions (not LGB_S logic)

- **evaluator.py** / **main.py**: `beta` and `use_log` are used only in **file and experiment naming** (sample path, `exp_name`, etc.) for LGB_S runs, so that the right CSV and report folder are used. They do not affect evaluation math.
- **experiment_cdtd.py** line 731: `predictions_cache_path` includes `beta` and `use_log` in the filename for loading the validation predictions cache (LGB_S only).

So, as written, **beta** is used only in **LGB_S**: in **imputers.py** line 129 for continuous sampling, and in cache/experiment naming. **use_log** is used only in **cache and path naming** for LGB_S, not in the imputation math. Given section 3, the line-129 scaling only takes effect if an LGB_S code path actually reaches `lgbm_impute`.
