Robustness Analysis of Machine Learning-Based And Hashes-Based Malware Detectors
In the current digital era, malware is growing rapidly in complexity, exposing everyone from individuals to government infrastructure to privacy threats, data leaks, and ransomware. While both industry and academic research have proposed various detection methods, malware developers often resort to modification techniques to obfuscate their malware, preserving functionality while altering the syntax and structure of the binary code. This adaptation allows them to evade existing detection methods.
This thesis undertakes a comprehensive analysis of the modification (or obfuscation) techniques employed in malware. Specifically, we investigate the impact of such modified malware on a range of anti-malware detection techniques, including hashing, raw-bytes analysis, and feature-based machine learning models. Our methodologies range from manipulating control flow graphs by displacing instructions to applying virtualisation techniques to generate obfuscated malware.
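As a minimal illustration of why exact-match hashing is sensitive to functionality-preserving modifications, consider the sketch below. It is not drawn from the thesis: the byte string is a toy stand-in for a binary (a few x86-64 opcodes, not real malware), and `hash_detector` and `known_bad` are hypothetical names for a simple blocklist lookup.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Toy stand-in for a binary: push rbp; mov rbp, rsp; 4x nop; pop rbp; ret.
original = bytes.fromhex("554889e5") + b"\x90" * 4 + bytes.fromhex("5dc3")

# "Obfuscated" variant: one extra NOP (0x90) inserted. The toy sequence
# behaves the same, but the bytes differ.
variant = original[:4] + b"\x90" + original[4:]

# Hypothetical blocklist holding the hash of the known sample only.
known_bad = {sha256_hex(original)}


def hash_detector(sample: bytes) -> bool:
    """Flag a sample only if its exact hash is on the blocklist."""
    return sha256_hex(sample) in known_bad


print(hash_detector(original))  # the known sample matches the blocklist
print(hash_detector(variant))   # the one-byte variant does not
```

Because a cryptographic hash changes completely when any input byte changes, a detector of this kind can only recognise samples it has already seen byte for byte.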
The results of our analysis highlight a concerning inadequacy in raw-bytes-based detectors when confronted with today’s sophisticated obfuscated malware. In contrast, our graph-based classifier detects newly transformed malware that relies on displacement-based obfuscation with 98% accuracy. These findings expose vulnerabilities in both conventional and state-of-the-art detection mechanisms when dealing with evolving obfuscated threats.
Our research not only contributes to the current understanding of malware evolution and detection but also paves the way for more proactive threat detection strategies. It serves as an important stepping stone in the ongoing efforts to safeguard society from the multifaceted repercussions of digital threat vectors.