# SAT_Solver_Trace

This repository contains the code for generating, solving, and producing execution traces for SAT problems that can serve as input to Transformer models.


# Generating Dataset

There are two scripts for generating SAT datasets. The first, "Random", generates random formulas without any constraints. To use this script, run the following command:

```
python3 generate_formula.py NUM_SAMPLES [-o OUTPUT_FILE] [--seed SEED] [--balanced]
```
where
* `NUM_SAMPLES` is the number of SAT instances to generate
* `OUTPUT_FILE` is the name of the file to write the SAT instances to (default: SAT_Dataset.txt)
* `SEED` is the random seed to use (default: 0). Using a different seed is useful when generating test or validation datasets.
* `--balanced` generates a balanced dataset (i.e. half of the instances are satisfiable and half are unsatisfiable)
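As a rough illustration of what seeded, balanced generation involves, here is a minimal sketch. It is not the code in `generate_formula.py`: the function names, the fixed clause length of 3, the DIMACS-style signed-integer encoding, and the brute-force satisfiability check are all assumptions made for this example.

```python
import itertools
import random


def random_formula(rng, num_vars, num_clauses, clause_len=3):
    """Return a random CNF formula as a list of clauses of signed ints."""
    formula = []
    for _ in range(num_clauses):
        # Pick distinct variables, then negate each with probability 1/2.
        variables = rng.sample(range(1, num_vars + 1), clause_len)
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
    return formula


def is_satisfiable(formula, num_vars):
    """Brute-force check over all assignments; only practical for small num_vars."""
    for bits in itertools.product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in formula):
            return True
    return False


def balanced_dataset(num_samples, seed=0, num_vars=5, num_clauses=21):
    """Rejection-sample until half the formulas are SAT and the rest UNSAT."""
    rng = random.Random(seed)  # same seed, same dataset
    want_sat = num_samples // 2
    want_unsat = num_samples - want_sat
    sat, unsat = [], []
    while len(sat) < want_sat or len(unsat) < want_unsat:
        formula = random_formula(rng, num_vars, num_clauses)
        if is_satisfiable(formula, num_vars):
            if len(sat) < want_sat:
                sat.append(formula)
        elif len(unsat) < want_unsat:
            unsat.append(formula)
    return sat, unsat
```

Calling `balanced_dataset(6, seed=0)` twice returns the same three satisfiable and three unsatisfiable formulas, which is why distinct seeds are needed for the training, validation, and test splits.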

The second script, "Diff", generates formulas that differ by one clause. This is achieved by adding random clauses until the formula becomes unsatisfiable, then negating one literal in the final clause.
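The construction above can be sketched as follows. This is an illustrative version under stated assumptions, not the code in `generate_formula_diff.py`: `generate_diff_pair`, the clause length of 3, and the brute-force `is_satisfiable` helper are hypothetical. The flipped formula is always satisfiable, because every assignment that satisfies the earlier clauses must falsify all literals of the final clause, so negating any one of them makes that clause true.

```python
import itertools
import random


def is_satisfiable(formula, num_vars):
    """Brute-force check over all assignments; only practical for small num_vars."""
    for bits in itertools.product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in formula):
            return True
    return False


def generate_diff_pair(rng, num_vars, clause_len=3):
    """Return (sat_formula, unsat_formula) differing in one literal of the last clause."""
    formula = []
    # Keep adding random clauses until the formula first becomes unsatisfiable.
    while True:
        variables = rng.sample(range(1, num_vars + 1), clause_len)
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
        if not is_satisfiable(formula, num_vars):
            break
    # Negate one randomly chosen literal of the final clause.
    flipped = list(formula[-1])
    idx = rng.randrange(clause_len)
    flipped[idx] = -flipped[idx]
    sat_formula = formula[:-1] + [flipped]
    return sat_formula, formula


if __name__ == "__main__":
    sat_f, unsat_f = generate_diff_pair(random.Random(0), num_vars=4)
    print("UNSAT:", unsat_f)
    print("SAT:  ", sat_f)
```

Both formulas in a pair have the same number of variables and clauses, so the clause-to-variable ratio cannot be used to tell them apart.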

The goal is to ensure that Transformer models do not merely learn statistical features that distinguish satisfiable from unsatisfiable formulas. For example, the ratio of clauses to variables is a very good indicator of whether a random formula is satisfiable, a phenomenon known as the "phase transition" of SAT formulas.

To use this script, run the following command:
```
python3 generate_formula_diff.py NUM_SAMPLES [-o OUTPUT_FILE] [--seed SEED] [--min_n MIN_N] [--max_n MAX_N]
```
where the arguments are the same as the previous script, except for the following:
* `MIN_N` and `MAX_N` specify the range of the number of variables in the formula (min inclusive, max exclusive)

The `--ratio-min` and `--ratio-max` flags specify the allowed range of the clause-to-variable ratio. This ensures that the generated formulas are of "maximum hardness" (i.e. the ratio is close to the phase transition).
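To make the ratio constraint concrete, the sketch below computes the clause-to-variable ratio of a formula and checks it against a range. The helper names and the inclusive bounds are assumptions for illustration; the script may apply the flags differently. For random 3-SAT the phase transition is commonly reported at a ratio of roughly 4.26, so a range such as 4.0 to 4.5 would bracket it.

```python
def clause_variable_ratio(formula, num_vars):
    """Number of clauses divided by number of variables."""
    return len(formula) / num_vars


def ratio_in_range(formula, num_vars, ratio_min, ratio_max):
    """True if the ratio lies in [ratio_min, ratio_max] (bounds assumed inclusive)."""
    return ratio_min <= clause_variable_ratio(formula, num_vars) <= ratio_max


if __name__ == "__main__":
    # 21 placeholder clauses over 5 variables: ratio 4.2, close to the transition.
    near_transition = [[1, 2, 3]] * 21
    # 10 placeholder clauses over 5 variables: ratio 2.0, far below it.
    under_constrained = [[1, 2, 3]] * 10
    print(ratio_in_range(near_transition, 5, 4.0, 4.5))
    print(ratio_in_range(under_constrained, 5, 4.0, 4.5))
```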

# Solving and Obtaining Execution Trace
Use the `sat_solver_raw.py` script to solve the SAT instances and obtain the execution trace. The script uses custom implementations of a DPLL solver and a "CDCL" solver. Note that the latter is not a true CDCL solver: it derives conflict clauses in a simpler way than traditional CDCL solvers do.

```
mkdir outputs
python3 sat_solver_raw.py INPUT_FILE [-o OUTPUT_FILE] [-a ALGORITHM] [--seed SEED]
```
where:
* `INPUT_FILE` is the name of the file containing the SAT instances generated in the previous section
* `ALGORITHM` is the algorithm to use to solve the SAT instances. The options are:
  * `dpll` - DPLL solver
  * `cdcl` - CDCL solver

This produces the output file SAT_dataset.out, which contains the original problem, a separator, the execution trace, and the satisfiability result.
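For readers unfamiliar with how a solver can emit an execution trace, here is a minimal DPLL sketch that records each step. It is not the implementation in `sat_solver_raw.py`, and the trace tokens (`decide`, `propagate`, `conflict`, `backtrack`) are hypothetical; the actual trace format produced by the script may differ.

```python
def simplify(clauses, literal):
    """Assume `literal` is true: drop satisfied clauses, shrink the others."""
    result = []
    for clause in clauses:
        if literal in clause:
            continue  # clause is satisfied
        result.append([lit for lit in clause if lit != -literal])
    return result


def dpll(clauses, trace):
    """Return True if `clauses` is satisfiable, appending each step to `trace`."""
    # Unit propagation: repeat until no unit clause remains or a conflict appears.
    while True:
        if any(len(clause) == 0 for clause in clauses):
            trace.append("conflict")
            return False
        unit = next((clause[0] for clause in clauses if len(clause) == 1), None)
        if unit is None:
            break
        trace.append(f"propagate {unit}")
        clauses = simplify(clauses, unit)
    if not clauses:
        return True  # every clause is satisfied
    # Branch on the first literal of the first clause, then on its negation.
    literal = clauses[0][0]
    for choice in (literal, -literal):
        trace.append(f"decide {choice}")
        if dpll(simplify(clauses, choice), trace):
            return True
        trace.append(f"backtrack {choice}")
    return False


if __name__ == "__main__":
    trace = []
    result = dpll([[1, 2], [-1, 2], [-2]], trace)
    print(" ".join(trace), "UNSAT" if not result else "SAT")
```

Serializing the problem, a separator, the trace tokens, and the final SAT/UNSAT label into one sequence gives the kind of record described above.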

