SURE

Sunderland Repository records the research produced by the University of Sunderland including practice-based research and theses.

Detecting Binary Perturbation Using Advanced Machine Learning Techniques

Ajayi, Bamidele (2026) Detecting Binary Perturbation Using Advanced Machine Learning Techniques. Doctoral thesis, The University of Sunderland.

Item Type: Thesis (Doctoral)

Abstract

Traditional malware defences based on signature matching are increasingly ineffective against polymorphic and metamorphic variants, which easily evade exact-match detection. The research community has therefore turned to machine learning models that extract behavioural and structural patterns from both static and dynamic attributes. Conventional detection models, including linear classifiers, nevertheless fail to identify malware when attackers apply minimal binary modifications that preserve its functionality. This research presents a unified method for improving the stability of malware classification, using Variational Autoencoders (VAEs) to extract latent space representations and combining traditional machine learning algorithms with deep learning models, namely Convolutional Neural Networks (CNNs). Compressing 2,381 static features into 32-dimensional embeddings reduces computation cost by 95.80% for BODMAS and 94.42% for EMBER while maintaining high classification accuracy.
The research evaluates the EMBER, BODMAS and MALIMG datasets across binary and multiclass classification tasks. Robustness is assessed under structured perturbations that include both basic stochastic noise (Gaussian, uniform, dropout, salt-and-pepper) and advanced adversarial attacks (FGSM, HopSkipJump, Boundary). Statistical tests (ANOVA, paired t-tests and Mann–Whitney U-tests) show that ensemble models (Random Forest, LightGBM) and CNNs trained on latent representations maintain their performance across the perturbation scenarios. By contrast, the performance of CNNs that use raw features deteriorates under adversarial noise, and logistic regression models suffer major accuracy declines when deprived of detailed feature representations. In the first study of its kind, the research establishes comprehensive statistical evidence of how structured perturbations affect the robustness of malware classifiers.
The findings demonstrate that latent space representations strike a practical trade-off between efficiency and resilience, while full feature sets may still offer added protection against highly structured adversarial attacks. The proposed pipeline not only enhances inference efficiency and model generalisation, but also enables real-world deployment in constrained environments such as IoT and edge devices. Future work will explore adaptive latent space learning and adversarial training to further improve malware detection in adversarial settings.
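
As an illustration only, the four basic stochastic noise types named in the abstract can be sketched on a plain feature vector as follows. This is not the thesis's implementation: the function names, the noise parameters (`sigma`, `eps`, `p`), the `lo`/`hi` extreme values and the toy feature vector are all assumptions made for the example, and the thesis may define or scale these perturbations differently.

```python
import random

# Hypothetical helpers: names and parameters are illustrative assumptions,
# not taken from the thesis.

def gaussian_noise(x, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to each feature."""
    return [v + rng.gauss(0.0, sigma) for v in x]

def uniform_noise(x, eps, rng):
    """Add noise drawn uniformly from [-eps, eps] to each feature."""
    return [v + rng.uniform(-eps, eps) for v in x]

def dropout_noise(x, p, rng):
    """Zero out each feature independently with probability p."""
    return [0.0 if rng.random() < p else v for v in x]

def salt_and_pepper(x, p, lo, hi, rng):
    """With probability p, replace a feature by an extreme value (lo or hi, equally likely)."""
    out = []
    for v in x:
        r = rng.random()
        if r < p / 2:
            out.append(lo)   # "pepper": forced to the minimum value
        elif r < p:
            out.append(hi)   # "salt": forced to the maximum value
        else:
            out.append(v)    # feature left untouched
    return out

rng = random.Random(0)             # fixed seed so runs are repeatable
features = [0.2, 0.5, 0.9, 0.1]    # toy feature vector, assumed scaled to [0, 1]

perturbed = {
    "gaussian": gaussian_noise(features, 0.05, rng),
    "uniform": uniform_noise(features, 0.05, rng),
    "dropout": dropout_noise(features, 0.25, rng),
    "salt_and_pepper": salt_and_pepper(features, 0.25, 0.0, 1.0, rng),
}
for name, vec in perturbed.items():
    print(name, vec)
```

In a robustness evaluation of the kind described, each perturbed vector would be passed to a trained classifier and its accuracy compared with the unperturbed baseline. The adversarial attacks (FGSM, HopSkipJump, Boundary) depend on the model and are not covered by this sketch.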

Bamidele_Ajayi_PhD_Thesis_Final_April2026 (1).pdf
Available under License Creative Commons Attribution Non-commercial Share Alike.


More Information

Depositing User: Delphine Doucet

Identifiers

Item ID: 20211
URI: https://sure.sunderland.ac.uk/id/eprint/20211

Catalogue record

Date Deposited: 15 May 2026 08:32
Last Modified: 15 May 2026 08:33

Contributors

Author: Bamidele Ajayi

University Divisions

Collections > Theses

Subjects

Computing
