This directory contains two components:
- **TTG-Bench-Lite:** A benchmark with 500 problems, including all necessary code to reproduce retrieval-based and agent-based experiments.  
- **TTG-Gen:** A tool that selects candidate target locations using coverage-guided fuzzing.

# Environment Setup  
We recommend running the code inside a Docker container. Build the image and start an interactive shell:  
```bash
docker build -t ttg .
docker run -it ttg:latest /bin/bash
```

# TTG-Bench-Lite
## Problems
The 500 problems are stored in `TTG-Bench-Lite/test.jsonl`.  
Below is an example problem:  
```json
{
    "entry_program": "libpng_read_fuzzer",
    "project": "libpng",
    "repo_url": "https://github.com/glennrp/libpng.git",
    "commit": "cd0ea2a7f53b603d3d9b5b891c779c430047b39a",
    "problem_id": 234,
    "target_location": {
        "file_name": "libpng/libpng/pngrutil.c",
        "func_name": "png_handle_sCAL",
        "start_lineno": 2396,
        "start_colno": 4,
        "end_lineno": 2400,
        "end_colno": 5,
        "code": "      png_crc_finish(png_ptr, length);\n      png_chunk_benign_error(png_ptr, \"duplicate\");\n      return;\n",
        "surround_code": "\n   else if (info_ptr != NULL && (info_ptr->valid & PNG_INFO_sCAL) != 0)\n   {\n      png_crc_finish(png_ptr, length);\n      png_chunk_benign_error(png_ptr, \"duplicate\");\n      return;\n   }\n\n",
        "discovered_time": 4798
    }
}
```

### Field Descriptions  
- **entry_program:** Name of the entry program (fuzz driver).  
- **project:** Name of the related software project.  
- **repo_url:** URL of the project's source code repository.  
- **commit:** Specific Git commit ID.  
- **problem_id:** Numeric identifier for the problem.  
- **target_location:** Object describing the target location to reach:  
  - **file_name:** Relative path to the source file, including the directory the repository is cloned into (e.g., `libpng/libpng/pngrutil.c`).  
  - **func_name:** Name of the function containing the target location.  
  - **start_lineno:** Starting line number of the target location.  
  - **start_colno:** Starting column number.  
  - **end_lineno:** Ending line number.  
  - **end_colno:** Ending column number.  
  - **code:** Code snippet at the target location.  
  - **surround_code:** Surrounding code snippet for extra context (including lines before and after).  
  - **discovered_time:** Time in seconds since fuzzing started when the location was discovered.
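
The problems file uses JSON Lines, with one problem object per line. The following is a minimal sketch for loading it and summarizing each target location. `load_problems` and `describe` are hypothetical helpers, not part of the repository, and the default path assumes you run from this directory.

```python
import json
from pathlib import Path


def load_problems(path="TTG-Bench-Lite/test.jsonl"):
    """Read one JSON problem object per non-empty line of a JSONL file."""
    problems = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                problems.append(json.loads(line))
    return problems


def describe(problem):
    """Summarize which function, file, and line range the problem targets."""
    loc = problem["target_location"]
    return (f"{problem['entry_program']}#{problem['problem_id']}: "
            f"{loc['func_name']} in {loc['file_name']} "
            f"lines {loc['start_lineno']}-{loc['end_lineno']}")
```

For the example problem above, `describe` returns `libpng_read_fuzzer#234: png_handle_sCAL in libpng/libpng/pngrutil.c lines 2396-2400`.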

## Result Verification
Result verification is implemented in `TTG-Bench-Lite/CovVerifier.py` with the function:  
```python
def verify_result(problem, python_code):
```
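- `problem`: A problem object, as shown above.  
- `python_code`: A string of Python source code that defines a function `generate_buf()` returning the input buffer as `bytes`.  

The function returns `True` if the generated input passes verification (i.e., it reaches the target location when run through the entry program), and `False` otherwise.  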
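
Before calling `verify_result`, it can help to execute a candidate string locally and inspect the buffer it produces. `preview_buffer` is a hypothetical helper sketched for that purpose. It only runs the code and checks the return type. It does not reproduce how `CovVerifier.py` measures coverage.

```python
def preview_buffer(python_code):
    """Execute candidate code and return the bytes produced by generate_buf().

    Hypothetical helper: it does not measure coverage. It only checks that the
    string defines a callable generate_buf() whose result is bytes-like.
    exec() runs the code with full privileges, so only use it on trusted code.
    """
    namespace = {}
    exec(python_code, namespace)  # defines generate_buf (and its imports) here
    generate_buf = namespace.get("generate_buf")
    if not callable(generate_buf):
        raise ValueError("python_code must define a function generate_buf()")
    buf = generate_buf()
    if not isinstance(buf, (bytes, bytearray)):
        raise TypeError(f"generate_buf() returned {type(buf).__name__}, expected bytes")
    return bytes(buf)
```

With the `code_correct` string from the example below, `preview_buffer(code_correct)[:8]` is the PNG signature `b'\x89PNG\r\n\x1a\n'`.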

Example usage:  
```python
from CovVerifier import verify_result  # run from inside TTG-Bench-Lite/
problem = {"entry_program": "libpng_read_fuzzer", "project": "libpng", "repo_url": "https://github.com/glennrp/libpng.git", "commit": "cd0ea2a7f53b603d3d9b5b891c779c430047b39a", "problem_id": 234, "target_location": {"file_name": "libpng/libpng/pngrutil.c", "func_name": "png_handle_sCAL", "start_lineno": 2396, "start_colno": 4, "end_lineno": 2400, "end_colno": 5, "code": "      png_crc_finish(png_ptr, length);\n      png_chunk_benign_error(png_ptr, \"duplicate\");\n      return;\n", "surround_code": "\n   else if (info_ptr != NULL && (info_ptr->valid & PNG_INFO_sCAL) != 0)\n   {\n      png_crc_finish(png_ptr, length);\n      png_chunk_benign_error(png_ptr, \"duplicate\");\n      return;\n   }\n\n", "discovered_time": 4798}}
code_correct = "import zlib\n\ndef generate_buf():\n    # PNG signature\n    png_sig = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])\n    \n    # IHDR chunk (1x1 RGB)\n    ihdr_data = bytes([\n        0x00, 0x00, 0x00, 0x01,  # width\n        0x00, 0x00, 0x00, 0x01,  # height\n        0x08,  # bit depth\n        0x02,  # color type (RGB)\n        0x00,  # compression\n        0x00,  # filter\n        0x00   # interlace\n    ])\n    ihdr_type = b'IHDR'\n    ihdr_crc = zlib.crc32(ihdr_type + ihdr_data).to_bytes(4, 'big')\n    ihdr_chunk = bytes([0x00, 0x00, 0x00, 0x0D]) + ihdr_type + ihdr_data + ihdr_crc\n    \n    # First valid sCAL chunk\n    scal1_data = bytes([0x01]) + b'1.0\\x001.0'\n    scal1_type = b'sCAL'\n    scal1_crc = zlib.crc32(scal1_type + scal1_data).to_bytes(4, 'big')\n    scal1_chunk = bytes([0x00, 0x00, 0x00, 0x08]) + scal1_type + scal1_data + scal1_crc\n    \n    # Second sCAL chunk (duplicate)\n    scal2_data = bytes([0x01, 0x00, 0x00, 0x00])\n    scal2_type = b'sCAL'\n    scal2_crc = zlib.crc32(scal2_type + scal2_data).to_bytes(4, 'big')\n    scal2_chunk = bytes([0x00, 0x00, 0x00, 0x04]) + scal2_type + scal2_data + scal2_crc\n    \n    # IDAT chunk (minimal valid zlib data for 1x1 RGB)\n    idat_data = zlib.compress(b'\\x00\\x00\\x00\\x00\\x00')\n    idat_type = b'IDAT'\n    idat_crc = zlib.crc32(idat_type + idat_data).to_bytes(4, 'big')\n    idat_chunk = len(idat_data).to_bytes(4, 'big') + idat_type + idat_data + idat_crc\n    \n    # IEND chunk\n    iend_chunk = bytes([0x00, 0x00, 0x00, 0x00]) + b'IEND' + zlib.crc32(b'IEND').to_bytes(4, 'big')\n    \n    return png_sig + ihdr_chunk + scal1_chunk + scal2_chunk + idat_chunk + iend_chunk"
code_wrong = "import zlib\ndef generate_buf():\n    # PNG signature\n    png = bytearray(b'\\x89PNG\\r\\n\\x1a\\n')\n    \n    # IHDR chunk (1x1 RGBA)\n    ihdr_data = bytes.fromhex('00000001 00000001 08020000 00')\n    ihdr_type = b'IHDR'\n    ihdr_crc = zlib.crc32(ihdr_type + ihdr_data).to_bytes(4, 'big')\n    png.extend((len(ihdr_data)).to_bytes(4, 'big') + ihdr_type + ihdr_data + ihdr_crc)\n    \n    # First valid sCAL chunk (unit=1, width=\"1\", height=\"1\")\n    scal1_data = bytes.fromhex('0131003100')\n    scal_type = b'sCAL'\n    scal1_crc = zlib.crc32(scal_type + scal1_data).to_bytes(4, 'big')\n    png.extend((len(scal1_data)).to_bytes(4, 'big') + scal_type + scal1_data + scal1_crc)\n    \n    # Second sCAL chunk (duplicate, minimal data)\n    scal2_data = bytes.fromhex('01000000')\n    scal2_crc = zlib.crc32(scal_type + scal2_data).to_bytes(4, 'big')\n    png.extend((len(scal2_data)).to_bytes(4, 'big') + scal_type + scal2_data + scal2_crc)\n    \n    # IDAT chunk (minimal data)\n    idat_data = bytes([0])\n    idat_type = b'IDAT'\n    idat_crc = zlib.crc32(idat_type + idat_data).to_bytes(4, 'big')\n    png.extend((len(idat_data)).to_bytes(4, 'big') + idat_type + idat_data + idat_crc)\n    \n    # IEND chunk\n    iend_type = b'IEND'\n    iend_crc = zlib.crc32(iend_type).to_bytes(4, 'big')\n    png.extend((0).to_bytes(4, 'big') + iend_type + iend_crc)\n    \n    return bytes(png)"
print(verify_result(problem, code_correct),
        verify_result(problem, code_wrong))
```
Expected output:  
```text
True False
```
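
Both candidates build PNG chunks by hand. Per the PNG specification, each chunk is a 4-byte big-endian data length, a 4-byte type, the data, and a CRC-32 computed over the type and data only. The sketch below factors that layout into a hypothetical `png_chunk` helper and rebuilds the chunk sequence of `code_correct`: IHDR, a valid sCAL, a duplicate sCAL, IDAT, and IEND. It is a readability refactoring for illustration, not a new verified answer.

```python
import struct
import zlib


def png_chunk(chunk_type, data=b""):
    """Build one PNG chunk: length (4 bytes, big-endian), type, data, CRC-32.

    The CRC covers the type and data fields but not the length field.
    """
    if len(chunk_type) != 4:
        raise ValueError("chunk type must be exactly 4 bytes")
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)  # 1x1, 8-bit depth, RGB, no interlace
scal_first = b"\x01" + b"1.0\x001.0"                 # unit=1, width "1.0", height "1.0"
scal_second = b"\x01\x00\x00\x00"                    # second sCAL chunk (the duplicate)
idat = zlib.compress(b"\x00" * 5)                    # same payload as code_correct


def generate_buf():
    return (PNG_SIGNATURE
            + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"sCAL", scal_first)
            + png_chunk(b"sCAL", scal_second)
            + png_chunk(b"IDAT", idat)
            + png_chunk(b"IEND"))
```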

## Retrieval-based Setting
For the retrieval-based setting, the code context retrieved with BM25 for each problem is stored at `TTG-Bench-Lite/retrieval/<entry_program>.<problem_id>.context`.
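
The sketch below builds that path from a problem object and reads the file. `context_path` and `load_context` are hypothetical helpers. It assumes the context files are UTF-8 text and that you run from this directory.

```python
from pathlib import Path


def context_path(problem, root="TTG-Bench-Lite/retrieval"):
    """Path of the BM25 context file: <entry_program>.<problem_id>.context."""
    return Path(root) / f"{problem['entry_program']}.{problem['problem_id']}.context"


def load_context(problem, root="TTG-Bench-Lite/retrieval"):
    """Read the retrieved context as text (assumes the file is UTF-8)."""
    return context_path(problem, root).read_text(encoding="utf-8")
```

For the example problem, `context_path(problem)` points to `TTG-Bench-Lite/retrieval/libpng_read_fuzzer.234.context`.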

## Agent-based Setting
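The code browsing tool is implemented in `TTG-Bench-Lite/agent/CodeBrowser.py`.  
`CodeBrowser(entry_program)` creates a browser for one entry program. `get_query_result(query)` takes a dictionary that maps a query type to a list of names: `func_name` returns the code of the named function, and `called_name` returns the code that calls it.  
Example usage:  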
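```python
from CodeBrowser import CodeBrowser  # run from inside TTG-Bench-Lite/agent/

entry_program = 'libpng_read_fuzzer'
cb = CodeBrowser(entry_program)
# Retrieve the definition of the fuzz driver's entry function
res_1 = cb.get_query_result({
    'func_name': ['LLVMFuzzerTestOneInput'],
})
print(res_1)
# Retrieve the code that calls png_handle_sCAL
res_2 = cb.get_query_result({
    'called_name': ['png_handle_sCAL'],
})
print(res_2)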
```
Expected output:  
```text
The code of function LLVMFuzzerTestOneInput is shown below.
...
The code that calls png_handle_sCAL is shown below.
...
```
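
An agent typically issues these two query types repeatedly as it explores the code. The following sketch combines them: for each function name it fetches the definition and the callers, skipping names already queried. `collect_context` is a hypothetical helper. It assumes only the two query types shown above and that `get_query_result` returns a string.

```python
def collect_context(browser, func_names):
    """Query the definition and the callers of each function, skipping repeats.

    `browser` is expected to behave like CodeBrowser: get_query_result(query)
    takes a dict mapping a query type to a list of names and returns text.
    """
    sections, seen = [], set()
    for name in func_names:
        if name in seen:
            continue
        seen.add(name)
        sections.append(browser.get_query_result({"func_name": [name]}))
        sections.append(browser.get_query_result({"called_name": [name]}))
    return "\n\n".join(sections)
```

For example, `collect_context(CodeBrowser('libpng_read_fuzzer'), ['png_handle_sCAL'])` returns the code of `png_handle_sCAL` followed by the code that calls it.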

# TTG-Gen
## Fuzzing
As an example, we fuzz libpng with AFL using the `libpng_read_fuzzer` driver.
Run the following commands:
```bash
cd code/TTG-Gen/
./afl-fuzz -i libpng/fuzzin/ -o libpng/fuzzout -x libpng/png.dict -- libpng/libpng_read_fuzzer @@
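```

Here `-i` points to the directory of initial seeds, `-o` to the output directory where AFL stores its results (including the seed queue), `-x` to a dictionary of PNG tokens, and `@@` is replaced by the path of each generated input file.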
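
The seed queue inside the output directory is what the `fuzz_queue` parameter of `ttg_gen` expects. Its location depends on the AFL variant: classic AFL writes `<out>/queue`, while AFL++ writes `<out>/default/queue` when no `-M`/`-S` instance name is given. Which variant ships with this repository is an assumption here, so the hypothetical helpers below check both layouts.

```python
from pathlib import Path


def find_queue_dir(out_dir):
    """Locate the AFL seed queue inside an AFL output directory.

    Checks <out>/queue (classic AFL) and <out>/default/queue (AFL++).
    """
    out = Path(out_dir)
    for candidate in (out / "queue", out / "default" / "queue"):
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"no queue directory under {out}")


def list_seeds(queue_dir):
    """Return seed files in the queue, skipping hidden entries such as .state/."""
    return sorted(p for p in Path(queue_dir).iterdir()
                  if p.is_file() and not p.name.startswith("."))
```

For the run above, `find_queue_dir("libpng/fuzzout")` returns whichever queue directory AFL created.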

## Selecting Target Locations
To select target locations, first clone the source code of the target project (`libpng` in this example) and check out the same commit used in TTG-Bench-Lite:

```bash
git clone https://github.com/glennrp/libpng.git /work/TTG-Gen/libpng/libpng
cd /work/TTG-Gen/libpng/libpng
git checkout cd0ea2a7f53b603d3d9b5b891c779c430047b39a
```
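
To sanity-check the checkout, you can print the source lines spanned by a problem's target location and compare them with its `surround_code`. The sketch below assumes that `file_name` is relative to `/work/TTG-Gen`, which is inferred from the clone path above and the example's `libpng/libpng/pngrutil.c`. It also assumes that line numbers are 1-based and inclusive. `read_target_lines` is a hypothetical helper.

```python
from pathlib import Path


def read_target_lines(problem, root="/work/TTG-Gen"):
    """Return source lines start_lineno..end_lineno (1-based, inclusive).

    Assumes file_name is relative to `root`, so libpng/libpng/pngrutil.c
    resolves inside the clone made above.
    """
    loc = problem["target_location"]
    path = Path(root) / loc["file_name"]
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    return lines[loc["start_lineno"] - 1 : loc["end_lineno"]]
```

If the printed lines do not resemble `surround_code`, the checkout is likely at a different commit.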

The TTG-Gen method is implemented in the function:  

```python
def ttg_gen(entry_program, fuzz_queue, threshold_time):
```
- `entry_program`: name of the entry program (fuzz driver), e.g. `libpng_read_fuzzer`.  
- `fuzz_queue`: directory containing the seed pool (queue) produced by fuzzing.  
- `threshold_time`: discovery-time threshold, in seconds.  

The function returns the candidate target locations.
