# The artifact for the paper "FaSAS: A Feedback-Augmented Stepwise Algorithm Selection for Software Verification"


## Artifact Introduction

### Background

Verification plays a crucial role in enhancing the quality and reliability of software systems. However, selecting appropriate verification algorithms or techniques for input programs often relies on domain expertise and significant human effort, making it a complex and resource-intensive task. In order to address these challenges, it is crucial to develop an automated algorithm selector that can optimize the verification process.


### Goals

Appropriate algorithm selection is a critical challenge in software verification, which typically demands domain expertise and non-trivial manpower. However, existing selectors, either dependent on machine-learned strategies or manually crafted heuristics, encounter issues such as reliance on high-quality samples with ground truth algorithm labels and limited scalability. 

### Approach

In this paper, we propose an automated algorithm selection approach, FaSAS, for software verification. FaSAS embeds the code property graph of a semantic-preserving transformed program to enhance the robustness of the prediction model. Furthermore, our approach decomposes the selection task into the sub-tasks of predicting potentially applicable algorithms and matching the most appropriate verifiers. It further incorporates a feedback mechanism to refine predictions iteratively. Experimental results demonstrate the effectiveness of FaSAS, achieving a prediction accuracy of 91.47% without ground truth algorithm labels provided during the training phase. Moreover, FaSAS exhibits the least resource overhead compared to other approaches while solving the most verification tasks.
We have made the implementation, along with all relevant publicly available data, accessible to facilitate comparison: [https://figshare.com/s/746cb529fab12742644c](https://figshare.com/s/746cb529fab12742644c).
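
As a rough illustration of the stepwise idea described above (predict applicable algorithms, match a verifier, refine on feedback), the following toy sketch uses plain Python dictionaries in place of the trained models. Everything in it is an assumption made for illustration: the function names, the score threshold, the multiplicative penalty, and the verifier and algorithm names are hypothetical and are not taken from the FaSAS implementation in `src/networks`.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def rank_verifiers(
    algo_scores: Dict[str, float],
    verifier_algos: Dict[str, Sequence[str]],
    threshold: float,
) -> List[str]:
    """Keep algorithms scored >= threshold, then rank verifiers by the
    summed score of the applicable algorithms they implement."""
    applicable = {a for a, s in algo_scores.items() if s >= threshold}
    scored = []
    for verifier, algos in verifier_algos.items():
        total = sum(algo_scores[a] for a in algos if a in applicable)
        if total > 0:
            scored.append((total, verifier))
    # Highest score first; ties broken by name so the result is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [verifier for _, verifier in scored]


def select_with_feedback(
    algo_scores: Dict[str, float],
    verifier_algos: Dict[str, Sequence[str]],
    run_verifier: Callable[[str], bool],
    threshold: float = 0.5,   # hypothetical cut-off
    penalty: float = 0.8,     # hypothetical feedback factor
    max_rounds: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Try the top-ranked verifier; on failure, down-weight its algorithms,
    drop it from the candidates, and re-rank (a stand-in feedback rule)."""
    scores = dict(algo_scores)
    remaining = dict(verifier_algos)
    tried: List[str] = []
    for _ in range(max_rounds):
        ranking = rank_verifiers(scores, remaining, threshold)
        if not ranking:
            break
        best = ranking[0]
        tried.append(best)
        if run_verifier(best):
            return best, tried
        for algo in remaining.pop(best):
            if algo in scores:
                scores[algo] *= penalty
    return None, tried
```

Called with `run_verifier=lambda v: v == "VerifierA"` and a table in which a hypothetical `VerifierB` initially ranks first, the function tries `VerifierB`, penalizes its algorithms, and then settles on `VerifierA`. The models and the feedback rule actually used by FaSAS are defined in the paper and in the training scripts.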



### Results

We evaluate FaSAS on 20 verifiers and over 15,000 verification tasks.
Experimental results demonstrate the effectiveness of FaSAS, achieving a prediction accuracy of 91.47% even without algorithm labels provided during the training phase. Moreover, FaSAS exhibits the least resource overhead compared to other approaches while solving the most verification tasks.



## Description of Artifact Structure

### 1. Pre-process Scripts & Processed Datasets
```
.
├── experiment_code         
│   ├── data/                   // Store datasets, CSV files, and generated intermediate files  
│   ├── make_token.py           // Collect unique tokens in datasets
│   ├── obfuscator.py           // The semantic-preserving transformation script for C programs.
│   ├── unify.py                // Script for joint preprocessing of data.
│   ├── predata.py              // Prepare the nodes and their corresponding edges of a graph, and convert them into the input format suitable for Graph Neural Network (GNN) models.
│   ├── readTest.py             // Read experimental results
│   ├── result-split.py         // Divide the dataset into training set, validation set, and testing set
│   ├── svcompile-result.py     // Extract the crawled competition CSV file into a JSON-format dataset
│   ├── test-data.py            // Pre-process data using the Joern tool
│   ├── test-forwhile.py        // Convert the for-loops into while-loops of the programs in the test dataset
│   ├── test_US.py              // Add additional unused statements into the programs in the test set
│   └── train_model.py          // Train the word2vec model used for embedding code property graphs.
├── test.c
└── test_transfer.c
```
### 2. Neural Networks & Evaluation Scripts

```
.
└── src
    └── networks
        ├── algEvaluator.py         // Script for evaluating the algorithm suggestion models.
        ├── dirctEvaluator.py       // Script for direct prediction (for ablation study: Stepwise Prediction).        
        ├── evaluator.py            // Script for evaluation.
        ├── gnn.py                  // Definitions of various neural networks. 
        ├── netAlgTrainer-Edge.py   // Script for the ablation experiment of different graph-based program representations.                        
        ├── netAlgTrainer.py        // Script for training the positive suggestion model.
        ├── netAlgTrainer0.py       // Script for training the negative suggestion model.
        ├── netDirctTool.py         // Verifier matching without stepwise prediction
        ├── netStra.py              // 
        ├── netToolTrainer.py       // Script for stepwise prediction.
        └── utils
            └── utils.py            // Utils for evaluation & model training & dataset creation
```
### 3. Trained Models
```
.
└── model
    ├── algCPG-N.pt         // Negative suggestion model.
    ├── algCPG-P.pt         // Positive suggestion model.
    ├── algCPG-P-AST.pt     // Positive suggestion model of AST-based program representation.
    ├── algCPG-P-CFG.pt     // Positive suggestion model of CFG-based program representation.
    ├── algCPG-P-PDG.pt     // Positive suggestion model of PDG-based program representation.
    ├── toolCPG.pt          // Appropriate verifier matching model (10 verifiers)
    ├── toolCPG-15.pt       // Appropriate verifier matching model (15 verifiers).
    ├── toolCPG-20.pt       // Appropriate verifier matching model (20 verifiers).
    └── toolCPG-Direct.pt   // End-to-end prediction model.
```


## Experimental Results Reproduction

### 1. Installation & Deployment
- Requirements
    - Hardware: workstation or PC with a multi-core processor
    - Operating system: Ubuntu 20.04 LTS or later
- Dependencies (required)
    * python3
    * JDK 19
    * [PyTorch 1.10.*](https://pytorch.org/) (CUDA not required, but highly recommended)
    * [PyTorch-Geometric 1.10*](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html) (CUDA not required, but highly recommended)
    * [Joern](https://github.com/joernio/joern) (v4.0.54)

### 2. Steps for reproduction

- Pre-process the datasets

    - Prepare the benchmark.
        ```
        cd experiment_code/data
        git clone https://gitlab.com/sosy-lab/benchmarking/sv-benchmarks.git
        ```

    - Clean the benchmark (remove redundant boilerplate code from the benchmark programs).

        We pre-processed the dataset by removing the comments that precede functions and the declarations and definitions of common assertion helpers (for example `reach_error`, `assume_abort_if_not`, and `__VERIFIER_assert`).

        For example, `test.c` before cleaning:
        ```C
        /*
         * Benchmarks contributed by Divyesh Unadkat[1,2], Supratik Chakraborty[1], Ashutosh Gupta[1]
         * [1] Indian Institute of Technology Bombay, Mumbai
         * [2] TCS Innovation labs, Pune
         *
         */
            extern void abort(void);
            extern void __assert_fail(const char *, const char *, unsigned int, const char *) __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__noreturn__));
            void reach_error() { __assert_fail("0", "brs1.c", 10, "reach_error"); }
            extern void abort(void);
            void assume_abort_if_not(int cond) {
              if(!cond) {abort();}
            }
            void __VERIFIER_assert(int cond) { if(!(cond)) { ERROR: {reach_error();abort();} } }
            extern int __VERIFIER_nondet_int(void);
            void *malloc(unsigned int size);

            int N;

            int main()
            {
            	N = __VERIFIER_nondet_int();
            	if(N <= 0) return 1;
            	assume_abort_if_not(N <= 2147483647/sizeof(int));

            	int i;
            	int sum[1];
            	int *a = malloc(sizeof(int)*N);

            	for(i=0; i<N; i++)
            	{
            		if(i%1==0) {
            			a[i] = 1;
            		} else {
            			a[i] = 0;
            		}
            	}

            	for(i=0; i<N; i++)
            	{
            		if(i==0) {
            			sum[0] = 0;
            		} else {
            			sum[0] = sum[0] + a[i];
            		}
            	}
            	__VERIFIER_assert(sum[0] <= N);
            	return 1;
            }

        ```
        After cleaning, `test_transfer.c`:

        ```C
            void *malloc(unsigned int size);

            int N;

            int main()
            {
            	N = __VERIFIER_nondet_int();
            	if(N <= 0) return 1;
            	assume_abort_if_not(N <= 2147483647/sizeof(int));

            	int i;
            	int sum[1];
            	int *a = malloc(sizeof(int)*N);

            	for(i=0; i<N; i++)
            	{
            		if(i%1==0) {
            			a[i] = 1;
            		} else {
            			a[i] = 0;
            		}
            	}

            	for(i=0; i<N; i++)
            	{
            		if(i==0) {
            			sum[0] = 0;
            		} else {
            			sum[0] = sum[0] + a[i];
            		}
            	}
            	__VERIFIER_assert(sum[0] <= N);
            	return 1;
            }
        ```
    - Program normalization & CPG generation & Dataset collection

        Thereafter, the pre-processing scripts normalize the cleaned benchmark, generate embedding vectors for the CPGs, and collect the dataset for network training and evaluation.
        ```
        cd experiment_code
        python unify.py
        ```

- Model Training

    - Train models
        ```
        cd FaSAS
        # Train the positive suggestion model
        python3 src/networks/netAlgTrainer.py \
          experiment_code/data/trainRobust-CPG.json \
          experiment_code/data/valRobust-CPG.json \
          experiment_code/data/testRobust-CPG.json \
          experiment_code/data/GNNInput-npz/ \
          --mp-layers 1 --epochs 20 -n GGNN --mode max --task algorithm
        # Train the negative suggestion model
        python3 src/networks/netAlgTrainer0.py \
          experiment_code/data/trainRobust-CPG.json \
          experiment_code/data/valRobust-CPG.json \
          experiment_code/data/testRobust-CPG.json \
          experiment_code/data/GNNInput-npz/ \
          --mp-layers 1 --epochs 20 -n GGNN --mode max --task algorithm
        # Train the verifier ranking model
        python3 src/networks/netToolTrainer.py \
          experiment_code/data/trainRobust-CPG.json \
          experiment_code/data/valRobust-CPG.json \
          experiment_code/data/testRobust-CPG.json \
          experiment_code/data/GNNInput-npz/ --mp-layers 1 \
          --epochs 20 -n GGNN --mode max --task topk --network1 trained-P.pt --network0 trained-N.pt
        ```

- Evaluation
    ```
    # Evaluate the trained models
    python3 src/networks/evaluator.py \
      --test experiment_code/data/testRobust-CPG.json \
      --dataset experiment_code/data/GNNInput-npz/ \
      --networkT toolTrained.pt --network1 trained-P.pt --network0 trained-N.pt
    ```

    - Notice:

        All datasets in all experiments require program normalization. For the robustness experiments, the test sets additionally undergo a for-to-while transformation and unused-code insertion on top of this normalization.

        The tree-sitter environment needed for these code transformations can be set up with the following commands:
        ```
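        conda create -n treesitter python=3.9
        conda activate treesitter
        # build.py below uses Language.build_library, which tree-sitter 0.22 removed
        pip install "tree-sitter<0.22"
        cd experiment_code
        mkdir vendor && cd vendor
        git clone https://github.com/tree-sitter/tree-sitter-c.git
        # build.py refers to vendor/tree-sitter-c, so create it one level up
        cd ..
        gedit build.py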
        ```
        ```python
        # build.py: compile the tree-sitter C grammar into a shared library
        from tree_sitter import Language

        Language.build_library(

          # Store the library in the `build` directory
          'build/my-languages.so',

          # Include one or more languages
          [
            'vendor/tree-sitter-c',
            # 'vendor/tree-sitter-cpp',
            # 'vendor/tree-sitter-java',
            # 'vendor/tree-sitter-python',
          ]
        )
        ```
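        Next, build the grammar library. Once it is built, the robustness perturbations, unused-statement (US) insertion and loop exchange (for-to-while), can be applied to the test set (see `test_US.py` and `test-forwhile.py`).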
        ```
        python build.py
        ```
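
To make the two robustness perturbations concrete, here is a small, self-contained sketch of what an unused-statement insertion and a for-to-while rewrite do to a C program. It is not the code of `test_US.py` or `test-forwhile.py`, which work with tree-sitter. This version is an assumption-laden stand-in that uses regular expressions and brace counting, with hypothetical function names and a hypothetical inserted variable. It only rewrites the first braced `for` loop, and it is only semantics-preserving when the loop body contains no `continue` and the initializer declares no variable.

```python
import re


def insert_unused_statement(src: str, statement: str = "int fasas_unused = 0;") -> str:
    """Insert an unused declaration right after the opening brace of main()."""
    match = re.search(r"\bint\s+main\s*\([^)]*\)\s*\{", src)
    if match is None:
        return src
    return src[: match.end()] + "\n    " + statement + src[match.end():]


def for_to_while(src: str) -> str:
    """Rewrite the first `for (init; cond; step) { body }` as
    `init; while (cond) { body step; }`."""
    header = re.search(r"\bfor\s*\(([^;]*);([^;]*);([^)]*)\)\s*\{", src)
    if header is None:
        return src
    init, cond, step = (part.strip() for part in header.groups())
    # Find the brace that closes the loop body.
    depth, pos = 1, header.end()
    while pos < len(src) and depth > 0:
        if src[pos] == "{":
            depth += 1
        elif src[pos] == "}":
            depth -= 1
        pos += 1
    if depth != 0:
        return src  # unbalanced braces: leave the program unchanged
    body = src[header.end(): pos - 1]
    # An empty condition in a for loop means "always true".
    loop = f"{init}; while ({cond or '1'}) {{{body}{step}; }}"
    return src[: header.start()] + loop + src[pos:]
```

For instance, `for_to_while("for(i=0; i<N; i++) { a[i] = 1; }")` yields `i=0; while (i<N) { a[i] = 1; i++; }`. A parse-tree-based implementation, as used in the artifact, avoids the textual corner cases that this sketch ignores.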
