Abstract: Highlights
• We show that FbHash, currently the best approximate matching tool in terms of security, is inefficient in its design with respect to memory consumption and run-time performance, and we list the possible reasons for this.
• We then propose a more time- and memory-efficient version of FbHash, which we term FbHash-E.
• We present a novel Bloom filter based document frequency implementation in FbHash-E that reduces the memory requirement to a few MBs compared to its predecessor FbHash.
• We show that the design changes made to improve the performance of FbHash do not greatly affect its ability to detect similar files: the drop is small, with an average difference of only 7.5 between the scores generated by FbHash and FbHash-E.
• We present a detailed security analysis of FbHash-E, perform additional robustness tests that were not previously done for FbHash, and compare the results with those of other state-of-the-art forensic tools. We show that FbHash-E outperforms all the other tools in all the security tests conducted.
• We also introduce two new tests, namely the Consistency Test and the Code Version Identification Test, discuss their significance for the evaluation of a forensic tool, and present their results.
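The highlights do not describe how the Bloom filter based document frequency store is laid out, so the following is only a generic sketch of the idea, not FbHash-E's actual design. It assumes that chunks are grouped into document frequency ranges and that one Bloom filter is kept per range, so that a lookup returns an approximate frequency without storing the chunks themselves. The names `BloomFilter` and `BucketedDocFrequency`, the bucket bounds, and the filter sizes are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over byte strings (illustrative, not FbHash-E's)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class BucketedDocFrequency:
    """Approximate document frequency lookup: one Bloom filter per range.

    ASSUMPTION: the bucketing scheme is invented for illustration only.
    """

    def __init__(self, bucket_bounds, num_bits=1 << 16, num_hashes=4):
        # bucket_bounds: ascending upper bounds of frequency ranges.
        self.bucket_bounds = list(bucket_bounds)
        self.filters = [
            BloomFilter(num_bits, num_hashes) for _ in self.bucket_bounds
        ]

    def insert(self, chunk, doc_freq):
        # Store the chunk in the first bucket whose bound covers doc_freq;
        # frequencies above the last bound fall into the last bucket.
        for bound, filt in zip(self.bucket_bounds, self.filters):
            if doc_freq <= bound:
                filt.add(chunk)
                return
        self.filters[-1].add(chunk)

    def lookup(self, chunk):
        # Return the bound of the first bucket that (probably) holds the
        # chunk, or 0 if no filter reports it. False positives are possible.
        for bound, filt in zip(self.bucket_bounds, self.filters):
            if chunk in filt:
                return bound
        return 0
```

With these assumed defaults each filter occupies 8 KiB regardless of how many chunks are inserted, which illustrates why a Bloom filter representation can be far smaller than an exact chunk-to-frequency table. The cost is that lookups return a range bound instead of an exact count and may occasionally report a false positive.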