Adversarial training for machine learning classifiers against multiple adversaries


Quadri, Hakeem Alade (2025) Adversarial training for machine learning classifiers against multiple adversaries. PhD thesis, Victoria University.

Abstract

Convolutional Neural Networks (CNNs), particularly low-latency models like MobileNet, are widely applied in areas such as image classification, speech recognition, and language processing. Despite their efficiency and accuracy, these models remain vulnerable to adversarial attacks: small, structured perturbations to input data that can lead to misclassification without affecting human perception. Traditional adversarial training techniques, which aim to enhance model robustness, typically treat all data points equally. This uniform treatment does not account for the varying susceptibility of individual samples to adversarial perturbation.

To address this limitation, we propose a Weighted Adversarial Reinforced Stackelberg Learning (WARS) framework, which formulates the training process as a Stackelberg game between a defender (the CNN model) and an adversary. In this setup, we assign greater training weight to data points more likely to be exploited by adversaries, allowing the model to adapt its training focus to the risk level associated with each input. To further enhance robustness, we integrate a reinforcement learning (RL) agent that fine-tunes hyperparameters dynamically throughout training, reducing reliance on manual configuration and improving convergence efficiency. Experimental results on the CIFAR-10 dataset show that the WARS model achieves a robustness of 66.18% after a single epoch of training, compared to 64.72% obtained through standard adversarial training, indicating that the WARS approach can offer measurable improvements in resilience with minimal computational overhead.

Beyond single-adversary settings, we extend our model to account for multiple attackers using a Bayesian Stackelberg game framework. This models the interaction between the classifier and a population of adversaries with different strategies, simulating more realistic deployment conditions.
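The sample-weighting idea at the core of this approach can be illustrated with a minimal sketch. The example below uses a logistic-regression model, FGSM as the attack, and a simple rule that weights each sample by its normalized adversarial loss; these choices are illustrative assumptions, not the exact WARS procedure described in the thesis.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(w, b, X, y, eps):
    """FGSM: perturb each input along the sign of the loss gradient w.r.t. x."""
    p = sigmoid(X @ w + b)
    grad_x = np.outer(p - y, w)          # d(cross-entropy)/dx, per sample
    return X + eps * np.sign(grad_x)

def weighted_adv_step(w, b, X, y, eps=0.1, lr=0.5):
    """One training step: samples with higher adversarial loss get more weight."""
    X_adv = fgsm(w, b, X, y, eps)
    p = sigmoid(X_adv @ w + b)
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    wts = loss / loss.sum()              # more vulnerable -> larger training weight
    grad_w = (wts * (p - y)) @ X_adv
    grad_b = np.sum(wts * (p - y))
    return w - lr * grad_w, b - lr * grad_b

# Toy run on linearly separable 2-D data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w, b = np.zeros(2), 0.0
for _ in range(50):
    w, b = weighted_adv_step(w, b, X, y)
acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
```

The same weighting rule can in principle be attached to any differentiable classifier; here it simply replaces the uniform average of standard adversarial training with a vulnerability-weighted average.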
The defender computes an optimal mixed strategy that considers the distribution of possible attacks. The resulting nested Bayesian Stackelberg formulation provides a scalable foundation for training models robust to varied and unpredictable threats.

Finally, we investigate quantum machine learning as an alternative defence strategy. By employing quantum support vector machines (QSVMs) with ZZ feature maps, we project adversarial inputs into high-dimensional quantum spaces, allowing for enhanced separability between perturbed and unperturbed data. On adversarially perturbed MNIST and CIFAR-10 datasets, the QSVM achieved 70.6% classification accuracy, outperforming a classical SVM with an RBF kernel, which scored 51%. This demonstrates the potential of quantum kernels in defending against adversarial threats, particularly in complex, non-linear domains.

This thesis addresses three key challenges in adversarial machine learning: (1) the inability of traditional adversarial training to adapt to sample-specific vulnerabilities, (2) the inefficiency of static hyperparameter tuning in dynamic adversarial settings, and (3) the limitations of classical models in handling complex, non-linear adversarial perturbations. To overcome these challenges, we propose a Weighted Adversarial Reinforced Stackelberg Learning (WARS) framework that combines sample-weighted adversarial training with reinforcement-based hyperparameter optimization. We extend this to a Bayesian Stackelberg game to model interactions with multiple attackers and improve scalability in real-world threat environments. Finally, we explore quantum-enhanced classification using Quantum Support Vector Machines (QSVMs), demonstrating superior resilience to adversarial perturbations through high-dimensional feature mapping. Collectively, this work presents an integrated defence strategy that enhances the robustness of modern machine learning models against evolving adversarial threats.
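The fidelity-based quantum kernel that underlies a QSVM can be sketched by simulating a small ZZ-style feature map classically. The two-qubit circuit below (Hadamards, single-qubit phases of 2x_i, then an entangling phase of 2(pi - x0)(pi - x1) realised as CX-P-CX, which lands on the odd-parity basis states) is a hand-rolled simplification for illustration, not the thesis's experimental setup or Qiskit's exact ZZFeatureMap decomposition.

```python
import numpy as np

def zz_feature_state(x):
    """Statevector of a 2-qubit ZZ-style feature map for input x = (x0, x1).
    Sketch only: H on both qubits, phases P(2*x_i), then a CX-P-CX
    entangling phase on the odd-parity states |01> and |10>."""
    psi = np.zeros(4, dtype=complex)
    psi[0] = 1.0                                  # start in |00>
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    psi = np.kron(H, H) @ psi                     # H on both qubits
    p0 = np.diag([1.0, np.exp(2j * x[0])])        # P(2*x0) on qubit 0
    p1 = np.diag([1.0, np.exp(2j * x[1])])        # P(2*x1) on qubit 1
    psi = np.kron(p0, p1) @ psi
    phi = 2.0 * (np.pi - x[0]) * (np.pi - x[1])   # data-dependent ZZ angle
    zz = np.diag([1.0, np.exp(1j * phi), np.exp(1j * phi), 1.0])
    return zz @ psi

def quantum_kernel(x, y):
    """Fidelity kernel K(x, y) = |<phi(y)|phi(x)>|^2."""
    return abs(np.vdot(zz_feature_state(y), zz_feature_state(x))) ** 2

x1 = np.array([0.3, 1.2])
x2 = np.array([1.0, 0.1])
k12 = quantum_kernel(x1, x2)
```

A Gram matrix built from `quantum_kernel` over a training set could then be passed to a classical SVM solver as a precomputed kernel, which is the usual way a (simulated) QSVM is trained.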

Additional Information

Doctor of Philosophy

Item type Thesis (PhD thesis)
URI https://vuir.vu.edu.au/id/eprint/49932
Subjects Current > FOR (2020) Classification > 4611 Machine learning
Current > Division/Research > Institute for Sustainable Industries and Liveable Cities
Keywords Convolutional Neural Networks, CNNs, machine learning, training techniques, adversarial machine learning