# Supplemental Material for SymTex: A New Benchmark for Symbolic and Textual Non-monotonic Reasoning Capability of Large Language Models

This is the supplemental material for our paper, provided for reproducibility. It covers the requirements, the MG-SymTex generation pipeline, the SymTex dataset, and evaluation.

# Requirements
## Python Libs
```text
chromadb==0.5.5
nltk
tqdm
torch==2.4.0
transformers==4.43.2
sentence_transformers==3.0.1
scikit-learn==1.5.1
faker==26.2.0
matplotlib
seaborn
numpy
scipy==1.13.1
pandas==2.2.2
openai==1.40.6
zhipuai==2.1.4
wandb
```
## DLV2
Download URL: https://dlv.demacs.unical.it/

# MG-SymTex
## Build vector database
To build the vector database for the "related word" setting, run the following command:
```shell
python bulid_vec_dataset.py
```
This command builds a vector database from WordNet.

## Generate Templates
To generate templates, run the following command:
```shell
bash script/logical_datasets_build.sh

# params:
MAX_OBJNUM=3 # max number of objects in the ASP program
MAX_CNUM=3 # max number of conditions in the ASP program
MAX_PNUM=3 # max number of parameters in predicates
SEED=43 # random seed
I_START=2 # initial number of facts
I_END=5 # final number of facts 
I_ADD_START=5 # initial number of rules
I_ADD_END=8 # final number of rules
GENERATE_NUM=10000 # number of generated templates for each fact and rule settings
```
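
To get a feel for how many templates a run produces, the parameters above can be read as a grid of fact and rule counts. The sketch below assumes both ranges are inclusive and that every (facts, rules) pair receives `GENERATE_NUM` templates; neither assumption is confirmed by the script, and `enumerate_settings` is a hypothetical helper, not part of the repository.

```python
from itertools import product


def enumerate_settings(i_start, i_end, i_add_start, i_add_end):
    """Return every (num_facts, num_rules) pair, ASSUMING inclusive ranges."""
    facts = range(i_start, i_end + 1)
    rules = range(i_add_start, i_add_end + 1)
    return list(product(facts, rules))


# Values copied from the parameter list above.
settings = enumerate_settings(2, 5, 5, 8)
generate_num = 10000

# Assumption: GENERATE_NUM templates are produced per (facts, rules) setting.
total_templates = len(settings) * generate_num
print(len(settings), "settings,", total_templates, "templates in total")
```

If the ranges turn out to be half-open in the actual script, drop the `+ 1` in both `range` calls and the totals shrink accordingly.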

## Modify the templates
To modify the templates, run the following command:
```shell
bash script/modify_templates.sh

# params:
templates_path="logicalDatasets/complex_templates_3_c3_p3_rw999_s43_i2-5_j5-8.jsonl" # path to the templates
generation_path_prefix="logicalDatasets/generation/generated_dataset" # path to the generated dataset
add_fact_range_start=0 
add_fact_range_end=10 # range of the number of facts to add
remove_fact_range_start=0
remove_fact_range_end=10 # range of the number of facts to remove
add_rule_range_start=0
add_rule_range_end=10 # range of the number of rules to add
p_add_rule_constraints=0.1 # probability of adding constraints to the new rules
remove_rule_range_start=0
remove_rule_range_end=10 # range of the number of rules to remove
p_neg=0.3 # probability of adding strong negation to a predicate
p_dneg=0.1 # probability of adding default negation to a predicate
p_add_conclution=0.1 # probability of adding a disjunction
add_conclution_max=1 # max number of disjunctions to add
p_change_variable=0.1 # probability of changing the position of params
use_word=false # whether to use word
use_related_word=false # whether to use related word
generation_num=100 # number of generated templates for each template
seed=42 # random seed
selected_num=10000 # number of selected templates
```
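
The two negation probabilities can be illustrated with a small sketch. In ASP syntax, strong negation is written `-p(X)` and default negation is written `not p(X)`. The function below is a hypothetical illustration of how `p_neg` and `p_dneg` might be applied to a single literal; the actual logic in the modification script may differ, for example in the order of the two random draws or in how rule safety is preserved.

```python
import random


def negate_literal(atom, p_neg, p_dneg, rng):
    """Optionally wrap an ASP atom in strong and/or default negation.

    Hypothetical helper for illustration only; not taken from the repository.
    """
    literal = atom
    if rng.random() < p_neg:
        literal = "-" + literal      # strong (classical) negation
    if rng.random() < p_dneg:
        literal = "not " + literal   # default negation (negation as failure)
    return literal


# A seeded generator keeps the illustration reproducible.
rng = random.Random(42)
body = [negate_literal(a, 0.3, 0.1, rng) for a in ["p(X)", "q(X, Y)", "r(Y)"]]
print("h(X) :- " + ", ".join(body) + ".")
```

The printed rule only shows the surface syntax; it is not checked for safety or for consistency with the rest of a program.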

## Textualization
To textualize the templates, run the following command:
```shell
python build_nl_dataset.py
```
# SymTex
The SymTex dataset is available at `./datasets`:
```text
overall_datasets # the full SymTex datasets
nm_dataset # dedicated datasets for evaluating non-monotonic reasoning

normal_eval # evaluation subsets of overall_datasets for Tri-State Boolean Querying
generation_eval # evaluation subsets of overall_datasets for Answer Set Solving
nm_eval # evaluation subsets of nm_dataset for Tri-State Boolean Querying

small_* # evaluation subsets for o1-mini

tiny_* # evaluation subsets for the temperature effect experiments
```
```
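
For a quick look at the data, a generic JSON Lines reader may be enough. This is a minimal sketch that assumes the dataset files use the same `.jsonl` format as the template file mentioned earlier; the file name `example.jsonl` is a placeholder, and no field names are assumed.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of parsed objects, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Placeholder file name; substitute a real file from ./datasets.
    sample = Path("datasets") / "example.jsonl"
    if sample.exists():
        rows = load_jsonl(sample)
        print(len(rows), "records loaded from", sample)
```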

# Evaluation
To evaluate the performance of LLMs, first set your API key in `src/llm/resources/ampi.json`.

Then run the following commands in `./script`.