Search and Retrieval in Semantic-Structural Representations of Novel Malware

20 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Malware Analysis, Explainability
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We present a novel representation for binary programs and report experimental results from search based on program semantics and structural properties, showing that we are able to recognize patterns in novel malware with unknown functionality.
Abstract: In this study, we present a novel representation for binary programs that captures semantic similarity and structural properties. The representation is composed bottom-up and enables new methods of analysis. We show that we can perform search and retrieval of binary executable programs based on similarity of behavioral properties, with an adjustable level of feature resolution. We begin by extracting data dependency graphs (DDGs), which are representative of both program structure and operational semantics. We then encode each program as a set of graph hashes representing uniqueness up to isomorphism, a method we have labeled DDG Fingerprinting. Next, we use k-Nearest Neighbors to search in a metric space constructed from examples. This approach allows us to perform a quantitative analysis of patterns of program operation. By evaluating similarity of behavior, we are able to recognize patterns in novel malware with functionality not previously identified. We present experimental results from search based on program semantics and structural properties in a dataset of binary executables, with features extracted using our method of representation. We show that the associated metric space allows an adjustable level of resolution: feature resolution may be decreased for breadth of search and retrieval or, as the search space is reduced, increased for accuracy and fine-grained analysis of malware behavior.
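The pipeline described in the abstract (graph hashing, set-valued fingerprints, nearest-neighbor search) can be illustrated with a minimal sketch. The abstract does not specify the hash function, the distance, or how resolution is adjusted, so everything below is an assumption made for illustration: a Weisfeiler-Lehman-style hash stands in for the graph hash, Jaccard distance stands in for the metric, and the number of refinement rounds stands in for feature resolution. The names `wl_hash`, `fingerprint`, `jaccard_distance`, and `knn` are hypothetical and are not taken from the paper.

```python
import hashlib
from collections import Counter


def wl_hash(nodes, edges, rounds):
    """Isomorphism-invariant hash of a small labeled directed graph.

    nodes:  dict mapping node id -> label (e.g. an opcode string)
    edges:  list of (src, dst) data-dependency pairs
    rounds: refinement rounds; ASSUMED stand-in for "feature resolution"
    """
    labels = dict(nodes)
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)
    for _ in range(rounds):
        refined = {}
        for n in nodes:
            # New label = own label plus sorted labels of in/out neighbors.
            sig = (
                labels[n],
                tuple(sorted(labels[p] for p in preds[n])),
                tuple(sorted(labels[s] for s in succs[n])),
            )
            refined[n] = hashlib.sha256(repr(sig).encode()).hexdigest()[:16]
        labels = refined
    # Hash the multiset of final labels, so node ids never matter.
    multiset = sorted(Counter(labels.values()).items())
    return hashlib.sha256(repr(multiset).encode()).hexdigest()[:16]


def fingerprint(graphs, rounds):
    """Encode one program as the set of hashes of its graphs."""
    return frozenset(wl_hash(n, e, rounds) for n, e in graphs)


def jaccard_distance(a, b):
    """Jaccard distance between two fingerprint sets (ASSUMED metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def knn(query_fp, corpus, k):
    """Brute-force k nearest neighbors; corpus maps name -> fingerprint."""
    scored = [(jaccard_distance(query_fp, fp), name) for name, fp in corpus.items()]
    scored.sort()
    return [(name, dist) for dist, name in scored[:k]]


# Toy graphs: chain_b is chain_a with renamed nodes; fan has the same
# labels as chain_a but a different dependency structure.
chain_a = ({0: "load", 1: "xor", 2: "store"}, [(0, 1), (1, 2)])
chain_b = ({"x": "store", "y": "load", "z": "xor"}, [("y", "z"), ("z", "x")])
fan = ({0: "load", 1: "xor", 2: "store"}, [(0, 1), (0, 2)])
```

In this sketch, `rounds=0` compares only the multiset of node labels, so `chain_a` and `fan` hash identically (coarse, broad retrieval), while `rounds=1` already separates them (finer analysis). Isomorphic graphs such as `chain_a` and `chain_b` hash identically at every setting, although, as with any Weisfeiler-Lehman-style scheme, some non-isomorphic graphs may also collide.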
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2932