# Artifacts for Binary Diff Summarization using Large Language Models

## Folder structure

 - `<program>`
   - `<program>-clean-<version1>-<malware>-<version2>`
     - `changelog.txt`: changelog extracted from the program's source or website
     - `ghidriff.json`: binary diff generated by Ghidriff
     - `<model>-summary`
       - `summary.json`: function-wise summary generated by the LLM
       - `llm.jsonl`: log of the LLM prompts and responses
       - `pred_<predmodel>_top<n>_<change>.json`: malware detector output for any LLM
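
The layout above can be traversed with a short script. The following is a minimal sketch, assuming the downloaded data is unpacked into a single root folder that follows this structure exactly. The helper names `iter_summaries` and `load_jsonl` are hypothetical and not part of the artifacts, and the sketch makes no assumption about the schema inside the JSON files.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file (such as llm.jsonl) into a list, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def iter_summaries(root):
    """Yield one record per <model>-summary folder found under root.

    Assumes the three-level layout described above:
    <program>/<program>-clean-.../<model>-summary
    """
    for summary_dir in sorted(Path(root).glob("*/*/*-summary")):
        if not summary_dir.is_dir():
            continue
        pair_dir = summary_dir.parent
        yield {
            "program": pair_dir.parent.name,
            "pair": pair_dir.name,
            # Strip the "-summary" suffix to recover the model name.
            "model": summary_dir.name[: -len("-summary")],
            "changelog": pair_dir / "changelog.txt",
            "ghidriff": pair_dir / "ghidriff.json",
            "summary": summary_dir / "summary.json",
            "llm_log": summary_dir / "llm.jsonl",
            # All detector outputs stored next to this summary.
            "predictions": sorted(summary_dir.glob("pred_*_top*_*.json")),
        }
```

For example, `for rec in iter_summaries("data"): print(rec["program"], rec["model"])` lists every program and model pair, where `"data"` stands for wherever the archive was extracted. The returned paths are not checked for existence, so a caller may want to test `rec["summary"].exists()` before opening a file.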

Data download link: https://zenodo.org/records/17199610?token=eyJhbGciOiJIUzUxMiIsImlhdCI6MTc1ODc5NDE5MywiZXhwIjoxNzY3MTM5MTk5fQ.eyJpZCI6IjAwODVjOGZjLWI4YTctNDBmMS04YWJkLWUzZjBhZTNjOTAxMCIsImRhdGEiOnt9LCJyYW5kb20iOiI3Yzg3MmRiMjBkMGY0NDU2YWM3MzVhOGU1NzY2Mzg5ZiJ9.9lrIuRUT03Qkr40ZY2fgaMwHlyLLRdkWiVLQoMVxFtzUKNU1fuRfYk_t-Y5hEVubaWerFcSUvc92Qng9AB9-hA
