# Implementation of Column Generation with DCA Pricing (CG-DCA) for Bayesian Network Structure Learning

This repository implements CG-DCA for Bayesian network structure learning, supporting both continuous and discrete data.

## File Structure and Purpose:

1. **Main Execution Files:**
   - `run_bayesian_network.py`: Main driver for both continuous and discrete data
   - `parallel.py`: Parallel execution for multiple continuous data instances
   - `run_continuous_data.py` / `run_discrete_data.py`: Example scripts for continuous and discrete data

2. **Core Algorithm Files:**
   - `column_generation.py`: Main column generation framework
   - `master_problem.py`: Solves the master problem (RMLP/RMIP)
   - `dca_solver.py`: DC Algorithm implementation for pricing problems
   - `separation_problem.py`: Finds violated cluster constraints

3. **Score Calculation:**
   - `continuous_scores.py`: Cost and determinant calculations for continuous data
   - `discrete_scores.py`: Cost and entropy calculations for discrete data

4. **Alternative Methods:**
   - `pricing_IP.py`: Exact MINLP solver for pricing problems
   - `local_search.py`: Hill climbing implementation

5. **Supporting Files:**
   - `data.py`: Data loading and processing
   - `utils.py`: Utility functions (initialization, cycle finding)
   - `visualization.py`: Graph visualization
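
To illustrate the kind of cycle finding that `utils.py` is described as providing, here is a minimal sketch of directed-cycle detection by depth-first search. The function name `find_cycle` and the parent-set dictionary format are assumptions made for this example and may not match the repository's code.

```python
def find_cycle(parents):
    """Return a list of nodes forming a directed cycle, or None if the graph is acyclic.

    `parents` maps each node to an iterable of its parent nodes
    (hypothetical input format, chosen for this sketch).
    """
    # Build child adjacency lists: each parent p of child c gives an edge p -> c.
    children = {v: [] for v in parents}
    for child, pas in parents.items():
        for p in pas:
            children.setdefault(p, []).append(child)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {v: WHITE for v in children}
    path = []

    def visit(v):
        color[v] = GRAY
        path.append(v)
        for w in children[v]:
            if color[w] == GRAY:
                # Back edge: the cycle is the part of the path starting at w.
                return path[path.index(w):]
            if color[w] == WHITE:
                found = visit(w)
                if found:
                    return found
        path.pop()
        color[v] = BLACK
        return None

    for v in list(children):
        if color[v] == WHITE:
            cycle = visit(v)
            if cycle:
                return cycle
    return None


acyclic = find_cycle({0: [], 1: [0], 2: [1]})   # chain 0 -> 1 -> 2
cyclic = find_cycle({0: [2], 1: [0], 2: [1]})   # 2 -> 0 -> 1 -> 2
```

The recursive form is only suitable for small graphs; a large network would need an iterative traversal to stay within Python's recursion limit.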

## How to Run

### For Continuous Data:
```python
import run_bayesian_network
run_bayesian_network.run('C', n=20, N=5000, d=1, data_index=0)
```
Parameters:
- Data type: 'C' (Continuous)
- n: Number of nodes
- N: Sample size
- d: Average in-degree (graph density)
- data_index: Instance index (0-9)

Parallel execution: run `parallel.py` to process multiple continuous data instances automatically.
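
The following sketch shows one way several instances could be dispatched concurrently. It is not the content of `parallel.py`: `fake_run` is a hypothetical stand-in for `run_bayesian_network.run` (whose return value is not documented here), and `run_instances` is a name invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_run(data_type, n, N, d, data_index):
    """Hypothetical stand-in for run_bayesian_network.run; just echoes its arguments."""
    return {"data_type": data_type, "n": n, "N": N, "d": d, "data_index": data_index}


def run_instances(run_fn, indices, n=20, N=5000, d=1, max_workers=4):
    """Call run_fn once per continuous-data instance and collect results by index."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {i: pool.submit(run_fn, 'C', n, N, d, i) for i in indices}
        # result() blocks until the corresponding instance has finished.
        return {i: f.result() for i, f in futures.items()}


# Instance indices 0-9, matching the data_index range listed above.
results = run_instances(fake_run, range(10))
```

Threads are used here only to keep the sketch simple. For a CPU-bound solver, `concurrent.futures.ProcessPoolExecutor` offers the same `submit` interface and is usually the better fit.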

### For Discrete Data:
```python
import run_bayesian_network
run_bayesian_network.run('D', None, None, None, data_index=1)
```
Parameters:
- Data type: 'D' (Discrete)
- data_index: 0 for LUCAS, 1 for ALARM, 2 for INSURANCE

## Adjustable Hyperparameters:

### CG Parameters (`column_generation.py`):
- `regu_Lambda`: Regularization parameter for BIC score (default: 0.5 * log(ndata))
- `method`: Optimization method ('DCA', 'DCA-HC', or 'MINLP')
- `time_limit`: Maximum runtime in seconds (default: 10800 = 3 hours)
- `pricing_err`: Threshold for negative reduced cost (default: 1e-3)
- `cg_err`: Convergence threshold for RMIP objective (default: 1e-3)
- `count_lim`: Convergence iteration limit (default: 3)
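
As a rough illustration of how `regu_Lambda`, `cg_err`, and `count_lim` might interact, the sketch below computes the default regularization value and applies one plausible stopping rule. The rule itself is an assumption: it stops once the RMIP objective has changed by less than `cg_err` for `count_lim` consecutive iterations, and the repository's actual criterion may differ. Both function names are hypothetical.

```python
import math


def default_regu_lambda(ndata):
    """Default BIC regularization weight listed above: 0.5 * log(ndata)."""
    return 0.5 * math.log(ndata)


def converged_iteration(objectives, cg_err=1e-3, count_lim=3):
    """Return the first iteration index at which the assumed rule fires, else None.

    `objectives` is the sequence of RMIP objective values, one per CG iteration.
    """
    count = 0
    for k in range(1, len(objectives)):
        if abs(objectives[k] - objectives[k - 1]) < cg_err:
            count += 1      # another stalled iteration
        else:
            count = 0       # progress was made, so reset the counter
        if count >= count_lim:
            return k
    return None


lam = default_regu_lambda(5000)
stop_at = converged_iteration([10.0, 8.0, 7.5, 7.5, 7.5, 7.5, 7.0])
```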

### DCA Solver Parameters (`dca_solver.py`):
- `maxit1`: Maximum DCA iterations (default: 1e3)
- `maxit2`: Maximum Kelley's algorithm iterations (default: 1e3)
- `err1`: DCA convergence threshold (default: 1e-2)
- `err2`: Kelley's algorithm convergence threshold (default: 1e-2)
- `init_pattern`: Warm-start initialization pattern
- `random_init_threshold`: Threshold for random initialization (default: 50 patterns)
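
To show the roles of `maxit1` and `err1`, here is a toy DC Algorithm loop on a one-dimensional function. It is not the pricing problem solved in `dca_solver.py`: the objective f(x) = x^4/4 - x^2/2 is chosen only because its convex subproblem has a closed-form solution, so no inner Kelley's algorithm loop (and hence no `maxit2` or `err2`) is needed. The function name `dca_toy` is hypothetical.

```python
import math


def dca_toy(x0, maxit1=1e3, err1=1e-2):
    """Minimize f(x) = g(x) - h(x) with g(x) = x**4 / 4 and h(x) = x**2 / 2.

    Each DCA step linearizes h at the current point and minimizes the
    resulting convex function g(x) - y * x exactly.
    Returns the final iterate and the number of iterations used.
    """
    x = x0
    for it in range(1, int(maxit1) + 1):
        y = x  # gradient of h at x
        # Minimizer of x**4 / 4 - y * x satisfies x**3 = y (real cube root).
        x_new = math.copysign(abs(y) ** (1.0 / 3.0), y)
        if abs(x_new - x) < err1:   # DCA convergence test
            return x_new, it
        x = x_new
    return x, int(maxit1)


x_pos, iters_pos = dca_toy(2.0)
x_neg, iters_neg = dca_toy(-2.0)
```

Starting from 2.0 the iterates approach the critical point x = 1, and from -2.0 they approach x = -1. Like DCA in general, the loop only guarantees convergence to a critical point, which depends on the starting point.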

This implementation follows the approach described in the paper "Inexact Column Generation for Bayesian Network Structure Learning via Difference-of-Submodular Optimization".