Membership Inference Via Backdooring In Image-Based Malware Classification
Developing malware is labor-intensive and resource-demanding. Malware creators are therefore motivated to obfuscate their malicious code or binaries to evade detection by security tools such as VirusTotal, Process Monitor (Procmon), Regshot, and Wireshark. Some of these detection tools employ machine learning models to identify malicious code or suspicious activities. Consequently, malware developers employ various strategies to avoid detection, including efforts to ascertain whether their malware code was used to train a machine learning model. One technique used for this purpose is known as ‘membership inference’. It determines whether a particular data sample was part of a machine learning model’s training dataset without requiring direct access to that dataset. Membership inference can be carried out through various methodologies, including shadow models, partial training models, and the introduction of backdoors into machine learning models. In backdooring, a data sample is augmented with a particular pattern that can later serve to reveal the presence of a specific malware code within the training dataset used by a machine learning model.
We hypothesize that the machine learning model employed for malware file detection relies on distinct patterns within the files to determine whether they are malicious or benign. In our study, we used grayscale image representations of malware binaries as input data in a black-box setting, in which the attacker has no control over the model architecture or settings. Converting binary malware files into grayscale images made it possible to mount a membership inference attack by strategically introducing backdoors during model training.
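The conversion of a binary file into a grayscale image can be sketched as follows. This is a minimal, dependency-free illustration rather than the exact procedure used in the study: the function name `bytes_to_grayscale`, the fixed row width of 256 pixels, and the zero-padding of the final row are all assumptions made for the example.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte (0-255) to one pixel intensity and wrap into rows.

    The row width and the zero-padding convention are assumptions made
    for illustration; the source does not specify them.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        raise ValueError("data must not be empty")

    # Pad with zero bytes so that the last row is complete.
    remainder = len(data) % width
    if remainder:
        data = data + bytes(width - remainder)

    # Slice the byte string into consecutive rows of `width` pixels.
    return [list(data[i:i + width]) for i in range(0, len(data), width)]
```

In practice the bytes would be read from a file opened in binary mode, and the resulting rows would typically be converted to an array and resized to the input shape expected by the classifier.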
Our hypothesis posits that successfully introducing these backdoors will result in noticeable deviations in the model’s behavior, leading to reduced prediction accuracy and potential misclassification of malware files by the model. The attack requires only a limited number of queries, in which the attacker submits sample backdoored malware images and compares the resulting prediction probabilities with those obtained for clean malware images. This approach enables malware developers to ascertain whether their specific malware files have been used in training the targeted machine learning model.
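The query step can be illustrated with a short sketch. It assumes a hypothetical black-box callable `predict_benign` that returns the model's probability for the benign class, and it uses a simple decision rule (mean probability shift above a threshold) that is an assumption of this example, not a rule stated in the study.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]


def infer_membership(
    predict_benign: Callable[[Image], float],
    clean_images: Sequence[Image],
    triggered_images: Sequence[Image],
    threshold: float = 0.5,
) -> Tuple[bool, float]:
    """Compare prediction probabilities on clean and backdoored queries.

    `predict_benign` is a hypothetical black-box query interface, and
    `threshold` is an assumed cut-off that would need calibration.
    """
    if not clean_images or len(clean_images) != len(triggered_images):
        raise ValueError("need equally sized, non-empty query sets")

    # One pair of queries per sample: clean version and triggered version.
    shifts = [
        predict_benign(triggered) - predict_benign(clean)
        for clean, triggered in zip(clean_images, triggered_images)
    ]

    # A large average shift suggests the model learned the backdoor,
    # i.e. the backdoored samples were present in its training data.
    average_shift = mean(shifts)
    return average_shift > threshold, average_shift
```

Keeping the query sets small reflects the limited-query setting described above; the number of query pairs is left to the caller.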
Our methodology consists of a sequence of steps: converting binary files into grayscale images, then inserting backdoors into a small subset of the same-class training dataset (in this case, the malware class). These backdoored samples are disguised within a different class (Benign), and five distinct neural network architectures (VGG-16, NasNetMobile, MobileNet, VGG-19, and DenseNet121) are subsequently trained on the resulting data. The objective is to evaluate and compare the prediction probability performance for detecting specific malware files within the training dataset.
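The backdoor insertion step might look like the sketch below. It is an illustration under stated assumptions: the trigger is taken to be a small bright square in the bottom-right corner, "disguised within a different class" is interpreted as relabeling the sample as benign, and the poisoning rate is measured against the whole training set. The names `stamp_trigger` and `poison_dataset` are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[int]]


def stamp_trigger(image: Image, size: int = 4, value: int = 255) -> Image:
    """Return a copy of `image` with a size x size patch in the
    bottom-right corner (the trigger shape is an assumption)."""
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(max(0, height - size), height):
        for c in range(max(0, width - size), width):
            stamped[r][c] = value
    return stamped


def poison_dataset(
    samples: List[Tuple[Image, str]],
    rate: float = 0.01,
    seed: int = 0,
    source_label: str = "malware",
    target_label: str = "benign",
) -> Tuple[List[Tuple[Image, str]], List[int]]:
    """Backdoor a fraction `rate` of the dataset, drawn from the source
    class only, and relabel those samples with the target class."""
    rng = random.Random(seed)
    candidates = [i for i, (_, label) in enumerate(samples)
                  if label == source_label]

    # At least one sample, but never more than the source class holds.
    n_poison = min(max(1, round(rate * len(samples))), len(candidates))
    chosen = set(rng.sample(candidates, n_poison))

    poisoned = [
        (stamp_trigger(img), target_label) if i in chosen else (img, label)
        for i, (img, label) in enumerate(samples)
    ]
    return poisoned, sorted(chosen)
```

The returned index list records which samples carry the trigger, so the same files can later be used as queries against the trained model.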
Our experimental findings consistently support our hypothesis, demonstrating that backdooring a mere 1% of the training dataset is sufficient to execute the membership inference attack successfully. The test results provide compelling evidence of the presence of specific malware files within the training dataset, confirming the viability of our approach.